The tested individual and population statistics. An exploration

Test X predicts achievement Y in school A. Peter scores x on that test, and is therefore denied access to A. How unfair can that be? (Hint: predictive validity is a characteristic of test × population × situation; the test score belongs to one particular human being.)

The case of admission to the Civil Service: Francis Y. Edgeworth (1888). The statistics of examinations. Journal of the Royal Statistical Society, 51, 599-635. https://www.jstor.org/stable/2339898 The exam is a lottery offering better chances to the better prepared, which is not unfair.

626: Even when we have abolished the order of merit within the honour class, there remains an inevitable injustice in excluding those who are just below the boundary line of that class. Can nothing be done to mitigate this hardship? Ought not the excluded candidates to have at least a chance of entering within the pale, corresponding to the probability that they really deserve to be there? Might it not be permitted to those who are in the neighbourhood of the boundary to ballot for a certain number of places? In this lottery the chances should not be equal for all, but graduated according to the probability above determined that each candidate ought by rights to be in the honours class. The Calculus affords a neat expression for this probability. (…) The probability thus measured ranges from 1/2, an even chance, to zero, which represents the certainty that candidates at a certain distance from the Honour line are placed in the right category. It would be easy to contrive a composite sort of ballot box, by means of which the examiners, met in solemn conclave, should settle the position of the candidates about whom there might be any doubt. This elegant piece of mechanism may have attractions for the admirers of educational machinery. But on reflection it will be found that this additional wheel is superfluous. The candidates have already had their chance. [note 39: … In urging the need of some such remedy Mr. Elliott dwelt on the anomaly that, at many of the Civil Service Examinations, a serious difference in the income and position of the candidates turns upon differences in the aggregate of marks which, according to the reasoning in this paper, must be regarded as largely or altogether accidental.] 
A public examination is already a sort of lottery of the graduated species which I have been describing: one in which the chances are not equal, but are better for the more deserving; increasing with the real merit of the candidates up to a degree of probability which amounts to certainty. It is a species of sortition infinitely preferable to the ancient method of casting lots for honours and offices.

On lotteries in education, see http://www.benwilbrink.nl/projecten/lottery.htm In the Netherlands admission to studies with a restricted number of slots was regulated by weighted lottery from 1975-2017 (since then selection is by whatever instruments are deemed opportune, lotteries being outlawed!)

Another way to express the random component in testing goes this way: the cut-off (pass-fail) score x is the score where the decision maker (teacher; school; institution) is indifferent between passing or failing students scoring x. Read this again.
Indifferent: the expected utility of a pass equals that of a fail. Whose utility? That of the institution (or that of the teacher acting on its behalf). For technicalities see http://benwilbrink.nl/publicaties/80bGrensscoresTOR.htm [in Dutch] Expected utility is a group statistic. Is it fair to fail Peter scoring x?
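
The indifference idea can be made concrete with a small sketch. It is not taken from the linked paper: the four utility values, the logistic curve linking score to the probability of mastery, and all function names are hypothetical, chosen only for illustration. What it shows is that the cut-off follows from the institution's utilities and a population-level curve, not from anything known about Peter.

```python
import math

# Hypothetical utilities for the institution (illustrative numbers only).
U_PASS_MASTER = 1.0       # passing a pupil who has mastered the material
U_PASS_NONMASTER = -3.0   # passing a pupil who has not
U_FAIL_MASTER = -1.0      # failing a pupil who has mastered the material
U_FAIL_NONMASTER = 0.0    # failing a pupil who has not


def p_master(score, midpoint=5.5, slope=1.5):
    """Assumed population curve: share of pupils with this score who are masters."""
    return 1.0 / (1.0 + math.exp(-slope * (score - midpoint)))


def expected_utility_pass(score):
    p = p_master(score)
    return p * U_PASS_MASTER + (1.0 - p) * U_PASS_NONMASTER


def expected_utility_fail(score):
    p = p_master(score)
    return p * U_FAIL_MASTER + (1.0 - p) * U_FAIL_NONMASTER


def cutoff_score(midpoint=5.5, slope=1.5):
    """Score at which the expected utility of passing equals that of failing."""
    gain_master = U_PASS_MASTER - U_FAIL_MASTER            # benefit of passing a master
    gain_nonmaster = U_FAIL_NONMASTER - U_PASS_NONMASTER   # benefit of failing a non-master
    p_star = gain_nonmaster / (gain_master + gain_nonmaster)
    # Invert the logistic curve to find the score where p_master equals p_star.
    return midpoint + math.log(p_star / (1.0 - p_star)) / slope


x = cutoff_score()
print(f"cut-off score: {x:.2f}")
print(f"EU(pass) = {expected_utility_pass(x):.3f}, EU(fail) = {expected_utility_fail(x):.3f}")
```

Change the assumed utilities and the cut-off moves, without any pupil having changed. Every pupil scoring exactly x is, from the institution's point of view, a toss-up.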

The (written as well as unwritten) law covers the rights and duties of schools and individual pupils. For the Netherlands see [in Dutch]
C. W. Noorlander 2005 ‘Recht doen aan leerlingen en ouders’
http://www.wolfpublishers.com/book.php?id=291;
M. Job Cohen 1981 ‘Studierechten’
http://benwilbrink.nl/projecten/toetsvragen.8.htm#Cohen_1981

Flunking grades. Dutch schools have detailed regulations (more detailed than just a GPA) on the grades that give pupils a pass to the next class. Peter is 0.1 point short. The team decides to flunk him, without further motivation. Correct? Noorlander:

Noorlander14.5zittenblijven.pdf

The Noorlander quote is in Dutch. The main point of Dutch law here is: schools have a best-efforts obligation, while pupils themselves are responsible for their own achievements or lack thereof. WTF? I’ll have to research its historical roots (sometime, not now ;-).

There is a serious inconsistency between attributing responsibility primarily to pupils, and yet flunking them without giving pupils (and their parents) a voice. Whose fault is it that Peter has to repeat class? The decision was made by an algorithm (the school’s regulations), not by Peter.

Dutch schools are free to choose their own regulations. Retaining pupils must not depend on only the letter of regulations ( = the algorithm), however. Every individual decision needs a motivation by the school team that is sufficient to justify it. Case
https://onderwijsgeschillen.nl/uitspraken/03098-po

Jurisprudence on class retention (Dutch):
https://onderwijsgeschillen.nl/thema/bevordering-doubleren-en-doorstromen-het-voortgezet-onderwijs
Google for case law in your own country, e.g. ‘Who Grades Students? Some Legal Cases, Some Best Practices’
http://umich.edu/~aaupum/Euben.html

It is a general principle of justice that drastic decisions (failing pupils) be motivated materially (i.e., just following a rule is not a motivation), having heard both sides. In seemingly evident cases just applying the rule might be too harsh, given personal circumstances.

Whether or not the team has discussed the case of Peter, if it is communicated that he has to repeat class just because of being 0.1 point short, then Peter has been done an injustice. [Whether grade repetition itself is effective is another question (it is not).]

A difference of 0.1 point (in whatever the metric is) cannot justify treating two persons differently: failing the one, passing the other. A. D. de Groot (internationally known for his work on the thinking of chess grandmasters, ‘Thought and choice in chess’) understood that much, but it was a paradox for him. How else can this kind of decision be justified? He had no answer.

Observe that this case is not the same as that of Edgeworth’s Civil Service examination. Here Peter is denied an educational opportunity (if he and his parents think so) on the basis of arbitrariness. The educational/human rights of Peter are in the balance now.

Just in case you might think this all is just theory: a case hanging on 1 point in a secondary school examination is now being considered by the Supreme Court of the Netherlands. For Dutch readers, info:
https://benwilbrink.wordpress.com/2017/09/20/examenonrecht-en-effet/

The curious thing about this examination: one runs the risk of failing it. Why do we accept the fuss and enormous costs of failing pupils on their exam? After all: they get a diploma and a listing of the grades received. That should be enough! Proposal:
https://www.telegraaf.nl/watuzegt/641511281/geef-eindexamen-een-ceremonieel-karakter

The proposal is nothing new: in the Middle Ages students were admitted to an examination as soon as their master deemed them ready. Our academic promotion (the doctoral defence) is an examination one cannot fail. A history of assessment (in English):
http://benwilbrink.nl/publicaties/97AssessmentStEE.htm
Grading = ranking, really!

A treasury of insights is Cronbach & Gleser’s classic 1965 (2nd) ‘Psychological tests and personnel decisions’ Reviewed:
https://www.journals.uchicago.edu/doi/abs/10.1086/402166
Main distinction: between interests of employers served (mainstream psychometrics) or those of patients, clients, students (advisory).
Find Cronbach & Gleser on https://abebooks.com

Another treasury is Hanson 1993 ‘Testing testing. Social consequences of the examined life’ about abuse of tests (crushing the individual) by employers, institutions, schools. Open access:
https://publishing.cdlib.org/ucpressebooks/view?docId=ft4m3nb2h2;brand=ucpress

This book’s sociocultural perspective on testing has generated two basic theses. One is that tests do not simply report on preexisting facts but, more important, they actually produce or fabricate the traits and capacities that they supposedly measure. The other is that tests act as techniques for surveillance and control of the individual in a disciplinary technology of power. This concluding chapter extends the analysis and critique of these two properties of tests and offers some suggestions as to what might be done about them. [p. 284]

Important case: the lie detector. It will label more testees as liars than there are true liars. That should be an important lesson for everyone using psychological tests or having to sit them.

Representatives of the American Psychological Association explained it in testimony before the House Subcommittee on Employment Opportunities in the following way: “Assume that polygraph tests are 85 percent accurate, a fair assumption based on the 1983 OTA report [on the validity of lie detector tests]. Consider, under such circumstances, what would happen in the case of screening 1,000 employees, 100 of whom (10 percent) were dishonest. In that situation, one would identify 85 of the dishonest employees, but at the cost of misidentifying 135 (15 percent) of the honest employees. As you can see, in this situation the polygraph tester identifies 220 “suspects,” of whom 61 percent are completely innocent. It can be shown mathematically that if the validity of the test drops below 85 percent, then the misidentification rate increases. Similarly, if the base rate of dishonesty is less than 10 percent, and it most likely is, the misidentification rate increases. It is obvious that in the employment screening situation it is a mathematical given that the majority of identified “suspects” are in fact innocent!”
[p. 81-82]
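
The arithmetic in the testimony is easy to check. A minimal sketch in Python, using only the figures quoted above (1,000 employees, 10 percent dishonest, 85 percent accuracy). The function name is mine, and the sketch assumes, as the testimony does, that the same accuracy applies to honest and dishonest employees alike.

```python
def screening_outcome(n_employees, base_rate, accuracy):
    """Expected counts when everyone is screened with a test of the given accuracy."""
    dishonest = n_employees * base_rate
    honest = n_employees - dishonest
    caught = dishonest * accuracy                # dishonest employees correctly flagged
    falsely_flagged = honest * (1.0 - accuracy)  # honest employees wrongly flagged
    suspects = caught + falsely_flagged
    return caught, falsely_flagged, suspects, falsely_flagged / suspects


caught, falsely_flagged, suspects, share_innocent = screening_outcome(1000, 0.10, 0.85)
print(f"{caught:.0f} caught, {falsely_flagged:.0f} falsely flagged, "
      f"{suspects:.0f} suspects, {share_innocent:.0%} of them innocent")

# A lower base rate of dishonesty makes matters worse, as the testimony says.
share_at_5_percent = screening_outcome(1000, 0.05, 0.85)[3]
print(f"{share_at_5_percent:.0%} of suspects innocent at a 5 percent base rate")
```

The first call reproduces the 85, 135, 220 and 61 percent of the testimony. The test's accuracy is a statistic of the screened group; it says very little about whether any one flagged employee actually lied.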

General case: “Collective statistical illiteracy refers to the widespread inability to understand the meaning of numbers.” For example in medical diagnosis (essentially identical to assessment in education!): Gigerenzer et al. 2008 ‘Making sense of health statistics’ pdf:

health_stats.pdf

Tests and assessments are forms of institutional violence, never mind the good intentions expressed by stakeholders. Violence may be justified, like surgery. Or not. Whatever the case, there should not be any secrecy about tests and exams:
https://www.researchgate.net/publication/234728727_The_Prices_of_Secrecy_The_Social_Intellectual_and_Psychological_Costs_of_Current_Assessment_Practice_A_Report_to_the_Ford_Foundation

Documentation on test secrecy in the Netherlands (in Dutch)
http://benwilbrink.nl/projecten/geheimhouding.htm
A special case was the #rekentoets (a math test as part of exit exams in Dutch secondary education, now discontinued), kept secret ‘in the national interest’ (yes, you read that correctly!). Crushing the rights of the individual examinee is no problem, then …..

In the Netherlands most children in primary education get tested a number of times every school year: standardized aptitude tests reported in terms of percentile groups the pupil scores in. Some schools stream children based on those results. Fair? No way.
Telling a child repeatedly that it belongs to the 10% least clever kids in the country is nothing less than psychological abuse, a serious breach of the child’s right to a quality education. Schools should only test with permission of the parents. ‘Refusing the Test: Opting Out of Standardized Testing’
https://www.justiceinschools.org/opting-out-standardized-testing

I came across a passage by Alexander Astin (the one expert knowing all and everything about higher education in the States) on the widespread educational evil of comparing pupils to other pupils. Therefore, supporting my previous argument about the abuse of standardized aptitude tests in Dutch primary schools, this quote:

The use of grades and test scores for admission to higher education has serious equity implications beyond the competitive disadvantage that it creates for certain groups in the college admissions process. Because the lower schools tend to imitate higher education in their choice and use of assessment technology, there is a heavy reliance on school grades and standardized tests all the way down to the primary schools. Given the normative nature of such measures (students are basically being compared with each other, see chapter 3), students who perform below ‘the norm’ are receiving important negative messages about their performance and capabilities. At best, they are being told that they are not working hard enough; at worst, they are being told that they lack the capacity to succeed in academic work. A young person who regularly receives such messages year after year is not likely to view academic work in a positive way and is certainly not likely to aspire to higher education. Why continue the punishment? In other words, it seems reasonable to assume that the use of normative measures such as school grades and standardized test scores cause many students to opt out of education altogether long before they reach an age at which they might consider applying to college.
Alexander W. Astin (1993). Assessment for excellence: the philosophy and practice of assessment and evaluation in higher education. American Council on Education. p. 194-195 (in Ch 10, Assessment and equity)

The problem is labeling the individual child on the basis of group statistics. Remember the lie detector case?

Nobody has a right to be admitted to the Civil Service: an examination is fair.

Children’s rights to a quality education have been agreed upon in international treaties.

Next I’ll try to disentangle the thinking of some behaviour geneticists who hold that heritability of years of schooling (a population statistic) implies that school curricula should be individualised (given the pupil’s genome), using the debate on Plomin’s ‘Blueprint’.

Warming-up, today’s blog in Quillette on correlation ≠ causation:
‘The Other Crisis in Psychology’

(via @RichardPPhelps )

For the behaviour genetics of cognition (IQ), see for example Briley & Tucker-Drob 2013 ‘Explaining the Increasing Heritability of Cognitive Ability Across Development. A Meta-Analysis of Longitudinal Twin and Adoption Studies’
https://www.jstor.org/stable/23484670?seq=1#page_scan_tab_contents
Lots of ‘influences’ in that paper, yet the only data available are correlations. Stick to correlations. “In early childhood, increasing genetic influences on cognitive ability can be attributed to innovative genetic influences.” Influences on = correlations with. Cognitive ability = differences in cognitive ability.
The data are not experimental, only correlational. Be warned: even highly heritable traits like weight are highly malleable. So is IQ: remember the Flynn effect? Stop educating youth and the country’s human capital will drop to archaic lows. So, don’t jump to easy conclusions!

Heritability of intelligence, or of length of schooling, is not a characteristic of intelligence (or length of schooling) per se, only of intelligence in our contemporary society. Health care influences IQ heritability; better health care heightens IQ heritability. Really! [Heritability is the genetic proportion of total variance; health issues contribute to the non-genetic part of that variance; better health care shrinks that part, and therefore heightens the heritability of IQ.]
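
A worked example may help here. It assumes the simplest textbook decomposition, in which total variance is just the sum of a genetic and an environmental part; the numbers and the function name are made up for illustration, and real variance components also involve interactions and measurement error.

```python
def heritability(var_genetic, var_environment):
    """Share of total variance statistically associated with genetic differences."""
    return var_genetic / (var_genetic + var_environment)


# Hypothetical variance components, in arbitrary units.
before = heritability(var_genetic=50.0, var_environment=50.0)

# Better health care removes some environmental sources of IQ differences;
# the genetic variance itself is left unchanged.
after = heritability(var_genetic=50.0, var_environment=30.0)

print(f"heritability before: {before:.3f}, after: {after:.3f}")
```

Nothing about anyone's genes changed between the two calls; only the environment became more equal, and heritability went up.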

Robert Plomin’s ‘Blueprint’, info:
https://www.penguin.co.uk/articles/2018/robert-plomin-on-understanding-our-dna.html

The power of genetic research comes from its ability to detect the effect of these inherited DNA differences on psychological traits without knowing anything about the intervening processes. [p. viii]

This is extremely problematic:
(1) ‘Without knowing anything about the intervening processes’ [#blackbox] claims about causality (‘effects’) are idle, and therefore dangerous.
(2) Talk about differences without considering absolute levels is misleading (these are contingent on societal institutions: NHS, schools).

Plomin, on the Penguin page: “The evidence for the importance of genetics itself calls for a radical rethink about parenting, and education, and society”. This man is dangerous. What will happen to the six-year-old in primary school, having had his genome screened in the Plomin way?
Let me look at commentaries on Plomin’s extreme position by some of his colleagues. For example this one on causation: Eric Turkheimer (July 6, 2019). Behavior causes genes!

[I once asked Denny Borsboom how a DIFFERENCE can be a cause. Surprise!]

Turkheimer refers in his blog to a paper on causation that will serve to defuse the Plomin simplification of (differences in) years of education, intelligence, depression, being ‘caused’ by genes: Craver & Bechtel, Top-down causation without top-down causes. Pdf:

top_down_causation_0.pdf

Turkheimer, on Plomin and his ‘Blueprint’:

But overstating the science of human behavioral genetics comes with the greatest price imaginable: it encroaches on human freedom and justice. [Turkheimer, 2019, ‘The social science blues’ pdf:
http://www.people.virginia.edu/~ent3c/papers2/Social_Science_Blues.pdf ]

In another blog, June 11 2019

Causation and Mechanism

Turkheimer formulates an answer to the problem posed at the start of this thread: is it fair to judge and treat an *individual* pupil (given her IQ, or DNA) on the basis of *population* statistics? It isn’t. Look:

Try it this way. Let’s say your SAT scores are half a standard deviation higher than mine. Your EA is also half a SD higher, and your parents made half a SD more money than mine. Your PGS for SES is also half a SD higher than mine. All of these predictors are correlated with each other in the population. Question: why are you smarter than me? The answer is, we have no freaking idea, unless we are happy with old-fashioned platitudes about genes and environment working together. And given that we don’t know, how could it be a good idea to declare as scientists that my low SATs are caused by my PGS, or have a school make decisions about my curriculum based on them? On the other hand, if we had a working dopamine-sprayer model, and I had the bad sprayers, you would have a basis for attributing my low scores to my genes by way of my neurons. It would be like assigning individuals with Down Syndrome to special curricula. Still some room for unfairness, but understandable and scientifically sound. [my emphasis, b.w.]

Some references on ethics

The references are about ethics in psychology. Do not be mistaken about this restriction: if you as a teacher are doing things or judging pupils that properly belong to the domain of psychology, you are bound by codes of conduct in psychology in the same way that your acting as a teacher is bound by written and unwritten law. You may not know of any codes of conduct or rules of fair treatment, yet you are supposed to treat or judge pupils appropriately.

Gerald P. Koocher & Patricia Keith-Spiegel (Eds.) (1998 2nd). Ethics in psychology. Professional standards and cases. Oxford University Press. Now in its 4th edition, 2016 (see Ch 7. Psychological assessment: testing tribulations 145-170; Ch 14. Ethical dilemmas in specific work settings: juggling porcupines 340-360):
https://global.oup.com/academic/product/ethics-in-psychology-and-the-mental-health-professions-9780199957699

American Psychological Association (1992). Ethical principles of psychologists and code of conduct. American Psychologist, 47, 1597-1611. Online:
https://www.apa.org/ethics/code/

Donald N. Bersoff & Paul J. Hofer (1995). Legal issues in computerized psychological testing. In D. N. Bersoff (Ed.) Ethical conflicts in psychology (291-294). American Psychological Association. Pdf:

b51488673bbeffd09122d8f9c29d0487f3a5.pdf

Edward H. Haertel (2009). Reflections on Educational Testing: Problems and Opportunities. Prepared for the Carnegie Corporation of New York-Institute for Advanced Study Commission on Mathematics and Science Education. Pdf:

b9ca12a8-9d04-404d-87ae-1e0013ff1bcb.pdf

Walter B. Pryzwansky & Donald N. Bersoff (1978). Parental consent for psychological evaluations. Legal, ethical, and practical consideration. Journal of School Psychology, 16, 274-281. Pdf:
https://sci-hub.tw/10.1016/0022-4405(78)90011-0

Rodney L. Lowman (Ed.) (1998). The ethical practice of psychology in organizations. American Psychological Association. Info 2nd edition 2006:
https://www.apa.org/pubs/books/4312006

Jannette Elwood (2013) Educational assessment policy and practice: a matter of ethics. Assessment in Education: Principles, Policy & Practice, 20:2, 205-220. Pdf:
https://sci-hub.tw/10.1080/0969594X.2013.765384

Stephen J. Ceci & Paul B. Papierno (2005). The Rhetoric and Reality of Gap Closing. When the ‘Have-Nots’ Gain but the ‘Haves’ Gain Even More. American Psychologist, 60, 149-160. Pdf:
https://pdfs.semanticscholar.org/6f57/d7ac5f2ea185bc054058aff338e57c487a6f.pdf?_ga=2.197512815.1922410799.1564946654-1581951397.1541585461

Code of Fair Testing Practices in Education (Revised). Working Group of the Joint Committee on Testing Practices. Educational Measurement: Issues and Practice, 24 #1, 23-26
https://sci-hub.tw/10.1111/j.1745-3992.2005.00004.x

Jeannie Oakes (2005). Keeping track. How schools structure inequality. Yale University Press, second edition 2005 (new preface, extra chapters discussing the ‘tracking wars’ of the last 20 years). Review:
http://fcis.oise.utoronto.ca/~daniel_schugurensky/assignment1/1985oakes.html

Keeping Track provides a vast amount of evidence to show that tracking does not alleviate attitude and behavior problems among students, but rather aggravates them, and forcefully demonstrates the ways in which track placements are often inaccurate, inappropriate, biased, and unfair.

Roxanne Amanda Korthals (2015). Tracking Students in Secondary Education. Consequences for Student Performance and Inequality. Dissertation Maastricht University. Open access:
https://cris.maastrichtuniversity.nl/portal/en/publications/tracking-students-in-secondary-education–consequences-for-student-performance-and-inequality(ba5ba4e9-f8b6-4c52-bf75-c6e37e4c6c7e).html

Alison Bernstein (April 4, 2018). Risk In Perspective: Population risk does not equal individual risk. https://scimoms.com/population-risk-individual-risk/

Jonathan Michael Kaplan & Eric Turkheimer (2021). Galton’s Quincunx: Probabilistic causation in developmental behavior genetics. Studies in History and Philosophy of Science Part A https://www.sciencedirect.com/science/article/abs/pii/S0039368121000455

On Galton’s quincunx, see:

Stephen M. Stigler (1989). Francis Galton’s Account of the Invention of Correlation. Statistical Science, 4 #2, 73-79. open access: https://projecteuclid.org/download/pdf_1/euclid.ss/1177012580
