Custom Norms - Getting it Right

Building your own norms for psychometric tests is like assembling kit set furniture: it’s not hard, but if you don’t stick to the instructions you can end up in a right mess. This article explains the most important guidelines for building robust norms that give more meaning to test scores.

Why Custom Norms?

Generic norms are fine when you only need to (a) rank-order candidates, or (b) have an idea of where a candidate sits in a general population. But do you really have a clear picture of what that generic group looks like? And hence what test scores are really telling you about behaviour? You’re probably far more familiar with people in your own organisation or industry. Wouldn’t it be nice to know where a candidate sits relative to that group – a group that you know well? It could give you a much clearer and more concrete idea of how the candidate might behave and what they might be capable of. This is where specialised, custom norms come in.

Does Your Norm Group Represent?

The people in your norm group should be a microcosm of the group you want them to represent. So, don’t just blindly throw all respondents into your custom norm!

Does each respondent fit the bill? If your custom norm is “Acme Clerical Staff”, your norm group should only contain people who are – or at least have been – employees of Acme in a clerical role.

Did respondents sit the test recently enough? It is best not to incorporate data that are obsolete, particularly if the shape of the organisation changes quickly over time.

Is the overall norm group a fair representation? The mix of people should roughly match the group you’re trying to represent. Does it contain approximately the right mix of males and females? Ethnicities? Ages? Education? Job categories? Other potentially important attributes? If it doesn’t, find which groups are clearly over-represented and randomly remove respondents accordingly[1].

Did respondents sit the assessment in the same context that the norm will be used? Candidates who sat assessments in a selection context may respond differently from those who responded in a development context. Ensure you’ve got the right ones for how you’re going to use the norm.
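The random removal of over-represented respondents described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `downsample_to_target`, the dictionary-based respondent records, and the `target_props` shares are all hypothetical, and the sketch balances a single attribute to its exact target share, which is stricter than the "roughly match" the guideline asks for.

```python
import random
from collections import defaultdict


def downsample_to_target(respondents, key, target_props, seed=0):
    """Randomly drop respondents from over-represented groups.

    respondents  : list of dicts, one per respondent (hypothetical format)
    key          : attribute to balance on, e.g. "gender"
    target_props : dict mapping each group to its share of the group
                   the norm is meant to represent (assumed known)
    """
    rng = random.Random(seed)  # fixed seed so the selection is repeatable

    # Bucket respondents by their value on the chosen attribute.
    groups = defaultdict(list)
    for r in respondents:
        groups[r[key]].append(r)

    # The total we can keep is limited by whichever group is scarcest
    # relative to its target share.
    max_total = min(
        len(groups[g]) / p for g, p in target_props.items() if p > 0
    )

    kept = []
    for g, p in target_props.items():
        quota = min(round(max_total * p), len(groups[g]))
        kept.extend(rng.sample(groups[g], quota))
    # Respondents whose group is not listed in target_props are dropped.
    return kept


# Usage: 70 men and 30 women, but the target group is a 50/50 split.
people = [{"id": i, "gender": "M"} for i in range(70)]
people += [{"id": 100 + i, "gender": "F"} for i in range(30)]
balanced = downsample_to_target(people, "gender", {"M": 0.5, "F": 0.5})
print(len(balanced))
```

Because respondents are removed rather than added, the norm group shrinks, so it is worth checking the remaining sample size before going further.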

Are Your Data Clean?

Norms are only as good as their underlying data. You need to look carefully at each respondent’s data, down to the level of individual item responses. Remove respondents who:

Didn’t pay attention. Watch for long sequences of the same response (e.g., 1, 1, 1, …), repeating response patterns (e.g., 1, 2, 3, 1, 2, 3, …), unusually low scores in ability tests, and overly moderate scores in personality tests (with high infrequency scores, if assessed). These indicate inattention.

Didn’t complete enough items. If the number of items completed is extremely low, it suggests that the test was not finished properly.

Were testing out the assessment. Sometimes, people complete assessments for testing or evaluation purposes. If you identify such instances, remove them.

Have highly inaccurate biodata. Sometimes, respondents enter biographical data that are clearly inaccurate. Such instances affect your assessment of group composition. (If response data look fine, you can set these fields to “unknown” if you prefer not to delete the data.)
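Some of the checks above lend themselves to simple automated screening at the item level. The following Python sketch flags long runs of the same response, short repeating patterns, and low completion. The function names and every threshold (`max_run`, `max_period`, `min_repeats`, `min_completion`) are assumptions chosen for illustration, not recommended cut-offs, and flagged respondents should still be reviewed by eye before removal.

```python
def longest_run(responses):
    """Length of the longest run of identical consecutive responses."""
    best = run = 0
    prev = object()  # sentinel that never equals a real response
    for r in responses:
        run = run + 1 if r == prev else 1
        best = max(best, run)
        prev = r
    return best


def has_repeating_pattern(responses, max_period=4, min_repeats=3):
    """True if a block of 2..max_period different-looking responses
    repeats back to back at least min_repeats times (e.g. 1,2,3,1,2,3,1,2,3)."""
    n = len(responses)
    for period in range(2, max_period + 1):
        span = period * min_repeats
        for start in range(n - span + 1):
            window = responses[start:start + span]
            block = window[:period]
            # Blocks made of one repeated value are left to longest_run.
            if len(set(block)) > 1 and all(
                window[i] == block[i % period] for i in range(span)
            ):
                return True
    return False


def flag_respondent(responses, n_items, max_run=10, min_completion=0.8):
    """Return a list of reasons to review this respondent.

    responses : item responses in order, with None for unanswered items
    n_items   : number of items in the test
    """
    answered = [r for r in responses if r is not None]
    flags = []
    if longest_run(answered) >= max_run:
        flags.append("long_run")
    if has_repeating_pattern(answered):
        flags.append("repeating_pattern")
    if len(answered) / n_items < min_completion:
        flags.append("too_few_items")
    return flags


# Usage: a respondent who cycled 1, 2, 3 through a 12-item scale.
print(flag_respondent([1, 2, 3] * 4, n_items=12))
```

Score-based signs of inattention (unusually low ability scores, overly moderate personality scores, high infrequency scores) depend on the specific test and are not covered by this sketch.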

Do You Have Enough Respondents?

If you’ve followed the guidelines above, you can get away with having as few as 100 respondents in your norm group (see Tett et al., 2009) – but no fewer. In fact, for specialised custom norms, there is almost no advantage to having a norm group with more than 300 respondents.
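One way to get a feel for why extra respondents bring diminishing returns is to look at the standard error of the norm group's mean, which shrinks with the square root of the sample size. The sketch below is a generic sampling illustration, not the analysis reported by Tett et al. (2009); the function name `se_of_mean` is hypothetical, and a standard deviation of 2 is assumed because that is the standard deviation of the sten scale.

```python
import math


def se_of_mean(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Assumed scale SD of 2 (sten scale); sample sizes chosen for illustration.
for n in (100, 300, 1000):
    print(n, round(se_of_mean(2.0, n), 3))
```

Under these assumptions the standard error is about 0.20 stens at 100 respondents, 0.12 at 300, and 0.06 at 1000, so more than tripling the group beyond 300 moves the estimated mean by only a small fraction of a sten.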

Using the Norm Correctly


Kline, P. (1993). The handbook of psychological testing. New York: Routledge.

Tett, R. P., Fitzke, J. R., Wadlington, P. L., Davies, S. A., Anderson, M. G., & Foster, J. (2009). The use of personality test norms in work settings: Effects of sample size and relevance. Journal of Occupational and Organizational Psychology, 82, 639-659.

[1] It is even better to use techniques like stratified sampling of all attributes that correlate with test scores, as advocated by Kline (1993). Such techniques are usually reserved for larger, more generic norms, however.
