Best Practice

Interpreting the outcomes of standardised tests

Many schools use standardised tests as part of their assessment practices. To help you get the most out of standardised tests, Liz Twist outlines some of the key terms and information.

Thousands of primary schools choose to use standardised tests as part of their approach to assessment. For many, the benefit lies in the reliability of the outcomes: the tests are trialled with a large, nationally representative sample during development. Standardised tests also enable pupil performance to be benchmarked against the national average and meaningfully compared with that of other pupils and with standardised scores from other tests.

While most tests will provide a raw score (the actual mark or score obtained by a pupil), raw scores alone do not enable meaningful comparisons between tests or between pupils. Standardised tests can provide at least three further outcomes: standardised scores, age-standardised scores, and age-related expectations.

It is easy to confuse standardised scores with scaled scores, and to misinterpret the results without appreciating the role that confidence bands have to play. To help you get the most out of standardised tests, below is an outline of the key terms you need to know.

Department for Education scaled scores

At the end of key stage 1 or key stage 2, the scaled score of 100 on the national curriculum tests represents the “expected standard” as defined by the Department for Education (DfE). This is not the average and is not the same as, nor equivalent to, a standardised score of 100. For standardised tests, a score of 100 represents the average performance, based on a normal distribution, of the sample of pupils on which the tests were standardised.

Standardised scores

Standardised scores compare a pupil’s performance to that of a nationally representative sample of pupils from the relevant year group, who will have all taken the same test at the same time of year.

The average score on most standardised tests is 100. Technically a score above 100 is above average and a score below 100 is below average. About two-thirds of pupils will have standardised scores between 85 and 115. Almost all pupils fall within the range 70 to 140, so scores outside this range can be regarded as exceptional.
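The "about two-thirds" figure can be checked with a short calculation. This is a minimal sketch, not NFER's method: it assumes scores follow a normal distribution with a mean of 100 and a standard deviation of 15. The standard deviation is not stated in this article; it is an assumption consistent with the 85 to 115 range quoted above. The function name share_between is purely illustrative.

```python
from statistics import NormalDist

# Assumption: standardised scores are normally distributed with
# mean 100 and standard deviation 15 (the SD is not given in the article).
scores = NormalDist(mu=100, sigma=15)


def share_between(low: float, high: float) -> float:
    """Return the proportion of pupils expected to score between low and high."""
    return scores.cdf(high) - scores.cdf(low)


within_one_sd = share_between(85, 115)
print(f"Share of pupils between 85 and 115: {within_one_sd:.1%}")
```

Under these assumptions the result is roughly 68 per cent, which matches the "about two-thirds" description.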

If you wish to group pupils according to standardised (or age-standardised) scores, the following descriptions may be useful. These may vary between test providers, but this example from NFER tests gives you an idea of what the range of scores may mean:

Confidence bands

Confidence bands (sometimes called confidence intervals) are used to show the extent of the margin of error in the standardised scores; in other words, they show how accurately the test measures a pupil’s attainment. The margin of error is simply a statistical estimate, based on the fact that tests can only sample the particular area of learning which they assess and therefore the score a pupil achieves may vary within a few points of their “true score”.

In NFER tests, to indicate how wide this margin of error is likely to be, a “90 per cent confidence band” has been calculated. This means that you can have 90 per cent certainty that the true score lies within the confidence band.

Age-standardised scores

These follow the same principle as standardised scores in that they compare pupils’ performance based on their raw (total) score. However, age-standardised scores take the pupil’s age into account and compare their performance with that of pupils of the same age at the time of testing (in years and months).

Again, this uses information derived from the large-scale trial. In practice, age-standardised scores mean that, when two pupils have the same raw score, the younger pupil is likely to have the higher age-standardised score.

Age-related expectations

The Standards and Testing Agency (STA) scaled score of 100 on the year 2 and year 6 national curriculum tests represents the “expected standard” for the end of the relevant key stage. It is inappropriate to apply this standard to tests in other year groups when pupils have not been taught all the relevant content.

Instead, in order to provide a curriculum-related outcome, some standardised test providers undertake a standard-setting exercise. NFER uses “bookmarking”, an internationally recognised procedure that combines statistical information from the large-scale trial with the judgements of groups of teachers who scrutinise the new assessments.

As part of this exercise at NFER, teachers worked with the test developers to identify the knowledge, skills and understanding that can be expected by the end of a given year, in the 2014 national curriculum.

This information was combined with statistical information from the large trial to arrive at a guide to the number of marks a pupil needs to achieve on a particular test in order to have achieved an appropriate standard on the curriculum, given that they are part way through the programme of study. A range of marks, rather than a definitive mark, is published.

Continuing with bookmarking, teachers also scrutinised the tests to look at high achievement and this was combined with the statistical information to arrive at a range of marks. This range, generally of three or four marks, gives an indication of a pupil’s standard of achievement not in comparison to his or her peers (which is what standardised scores do) but in relation to the expectations of the national curriculum for that particular year group.

In NFER’s view, it is important that teachers use their professional judgement when interpreting test outcomes and for this reason a range of marks is used to suggest where the age-related threshold lies.


An example of how to interpret results


Emma’s date of birth is November 27, 2008, and she took the year 4 summer maths test on June 12, 2017, scoring 64.

Jay, whose date of birth is March 3, 2009, took the same test on the same day and scored 68.

Emma’s standardised score is 109. With a confidence interval of –5 and +4, there is a 90 per cent likelihood of her “true” score being between 104 and 113 and her performance on the test could broadly be described as “high average”.

Jay’s raw score of 68 converts to a standardised score of 111 which is also “high average”. The confidence band around Jay’s score (also –5 and +4) indicates that his “true” score has a 90 per cent likelihood of being between 106 and 115.
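The arithmetic behind these two bands can be sketched in a few lines. The scores and the –5/+4 margins come from the example above; the names Band, confidence_band and bands_overlap are hypothetical helpers introduced only for illustration. The overlap check is an added illustration rather than part of NFER's guidance: where two bands overlap, a small difference between standardised scores is best read with caution.

```python
from typing import NamedTuple


class Band(NamedTuple):
    """Lower and upper limits of a confidence band."""
    low: int
    high: int


def confidence_band(score: int, below: int, above: int) -> Band:
    """Apply the published margins (points below and above) to a standardised score."""
    return Band(score - below, score + above)


def bands_overlap(a: Band, b: Band) -> bool:
    """Return True if the two bands share at least one score."""
    return a.low <= b.high and b.low <= a.high


# Figures taken from the worked example in the article.
emma = confidence_band(109, below=5, above=4)
jay = confidence_band(111, below=5, above=4)

print(f"Emma: {emma.low} to {emma.high}")
print(f"Jay: {jay.low} to {jay.high}")
print(f"Bands overlap: {bands_overlap(emma, jay)}")
```

This reproduces the ranges quoted above (104 to 113 for Emma and 106 to 115 for Jay) and shows that the two bands overlap.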

Their age-standardised scores are 114 for Emma and 118 for Jay. These take into account the difference in their ages.
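The ages involved can be worked out from the dates of birth and the test date given in the example. The sketch below counts completed years and months at the time of testing. That counting rule is an assumption, since the article does not say exactly how test providers round ages, and the function name age_in_years_and_months is illustrative only. Converting an age and raw score into an age-standardised score requires the provider's own conversion tables, which are not reproduced here.

```python
from datetime import date


def age_in_years_and_months(born: date, tested: date) -> tuple[int, int]:
    """Return age on the test date as (completed years, completed months)."""
    months = (tested.year - born.year) * 12 + (tested.month - born.month)
    # Assumption: a month only counts once the birth day-of-month has been reached.
    if tested.day < born.day:
        months -= 1
    return divmod(months, 12)


test_date = date(2017, 6, 12)
emma_age = age_in_years_and_months(date(2008, 11, 27), test_date)
jay_age = age_in_years_and_months(date(2009, 3, 3), test_date)

print(f"Emma: {emma_age[0]} years {emma_age[1]} months")
print(f"Jay: {jay_age[0]} years {jay_age[1]} months")
```

On this counting rule Emma is 8 years 6 months and Jay is 8 years 3 months on the day of the test, so Jay is the younger of the two by three months.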

A total score of 64 suggests that Emma is comfortably reaching age-related expectations as measured by the summer year 4 maths test. Jay’s 68 suggests that his teacher should consider whether other evidence of his work supports a grading of “high achievement” as he is at the borderline between the age-related expectation and the high achievement band.

Conclusion

By using standardised tests and applying their own professional judgement when interpreting the results, teachers can build a profile of attainment and progress for their pupils and be confident in their conclusions and next steps. Standardised tests should form just one part of a school’s approach to assessment, with ongoing formative assessment informing teaching throughout the year. But when it comes to choosing summative assessments to assess learning at the end of a teaching period, high-quality standardised tests can ensure the data gained is reliable and meaningful.

  • Liz Twist is head of assessment research and product development at the National Foundation for Educational Research (NFER).

Further information

If you found this valuable and would like further guidance to help the teachers in your school to brush up on their understanding of assessment, there is a wealth of free support on the NFER website. You can also sign up to receive a series of free assessment guides direct to your inbox this autumn. Visit www.nfer.ac.uk/assessment-hub

NFER Research Insights

This article was published as part of Headteacher Update’s NFER Research Insights series. A free pdf of the latest Research Insights best practice and advisory articles can be downloaded from the supplements page of this website: www.headteacher-update.com/supplements/