Notes: The calibration of student judgement through self-assessment

Reference: Boud, D., Lawson, R., & Thompson, D. G. (2015). The calibration of student judgement through self-assessment: disruptive effects of assessment patterns. Higher Education Research & Development, 34(1), 45-59.

Background:

  • Effective judgement of one's own work is an important attribute for HDR students
  • Focuses on self-assessment (also discussed as self-regulation and metacognition in some works)

Problems/ Gaps:

  • Self-assessment is usually not facilitated through systematic activities, but assumed to develop from normal coursework.
  • Students are not given feedback on their judgements.
  • There is a need to ensure that varied assessment patterns do not disrupt students' learning and their development of judgement.

Aims:

  • To investigate whether students' judgement improves over time through criteria-based self-assessment in the given units of study, in two parts:
    1. Replicate the improvement of student judgement over time with more data from different disciplines (Repeat questions 1-4)
    2. Investigate the improvement of judgement across sequential modules, analysed by assessment type and assessment criteria (New questions 1-4)

Method/ Context:

  • Voluntary self-assessment by students in authentic settings, using the online assessment system ReView™.
  • Percentage marks from a continuous sliding scale were stored, for both students and tutors.
  • Data cover a 5-year period in a Design course and a 3-year period in a Business course, from two Australian universities.
  • 182 design students and 1162 business students.

Results and Discussion:

Repeat Q1: Does accuracy in students’ self-assessed grades vary by student performance?

  • Ability levels: high (distinction or high distinction), low (fail), and mid (pass or credit)
  • Analysed via significance tests (p values)
  • Yes. Low-ability students did not improve their judgement over time; mid-ability students showed significantly greater improvement (their marks converged with tutors' grading)
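The ability banding above can be sketched in code. The percentage cut-offs below are assumptions (a typical Australian grade scale); the paper groups students by awarded grade labels, not raw marks:

```python
# Sketch: grouping percentage marks into the study's three ability bands.
# Cut-offs are assumed (typical Australian scale: distinction >= 75,
# pass >= 50, fail < 50); the paper itself groups by awarded grade.
def ability_band(mark: float) -> str:
    if mark >= 75:   # distinction or high distinction
        return "high"
    if mark >= 50:   # pass or credit
        return "mid"
    return "low"     # fail

print([ability_band(m) for m in (82, 68, 45)])  # → ['high', 'mid', 'low']
```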

Repeat Q2: Does students’ accuracy in self-assessment relate to improved performance?

  • Accuracy levels: over-estimators, accurate estimators, and under-estimators
  • One-way ANOVA
  • Accurate estimators showed increased scores over time
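As a rough illustration of the analysis above, a one-way ANOVA F statistic can be computed by hand on synthetic mark changes for the three estimator groups (all numbers invented, not taken from the paper):

```python
# Sketch with synthetic data: a one-way ANOVA F statistic testing whether
# mean score change differs across over-, accurate, and under-estimators.
# The group values are invented for illustration only.
def one_way_anova_f(*groups):
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    # between-group sum of squares (df = number of groups - 1)
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # within-group sum of squares (df = N - number of groups)
    ssw = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ssb / df_between) / (ssw / df_within)

over_estimators  = [1, -2, 0, 3, -1]   # hypothetical mark changes
accurate         = [5, 7, 4, 6, 8]     # accurate estimators improve most
under_estimators = [2, 0, 1, 3, -1]

f_stat = one_way_anova_f(over_estimators, accurate, under_estimators)
print(f"F = {f_stat:.2f}")  # a large F suggests the group means differ
```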

Repeat Q3: Do students’ marks converge with tutors’ within a course module?

  • Series of paired t-tests
  • Convergence was found in the design data but not in the business data
    • Design assessment tasks are scaffolded to lead from one to the next, whereas business uses different modes of assessment within a course module.
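The convergence test above can be sketched as a paired t statistic comparing students' self-assessed marks with tutor marks on the same task; the marks below are synthetic, invented for illustration:

```python
# Sketch with synthetic marks: a paired t statistic for student vs tutor
# marks on the same task. The paper ran a series of such paired t-tests;
# these values are invented and not from the study's data.
from math import sqrt

def paired_t(a, b):
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / sqrt(var_d / n)  # compare to t distribution, df = n - 1

student_marks = [72, 65, 80, 58, 90, 67, 74, 61, 83, 70]
tutor_marks   = [68, 63, 75, 55, 88, 70, 71, 60, 80, 72]

t = paired_t(student_marks, tutor_marks)
print(f"t = {t:.2f}")  # |t| below the critical value reads as convergence
```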

Repeat Q4: Does the gap between student grades and tutor grades reduce over time?

  • Yes. Students’ judgement improves over the course of the degree programme, but this is of limited practical use because convergence occurs only near the end of the programme.

New Q1: Does the gap between student grades and tutor grades reduce across modules designed as a sequence?

  • No data were available for design; for business, sequential modules showed erratic patterns in the gap between student and tutor grades, with no gradual reduction
  • This led the authors to examine the type of assessment

New Q2: Does mode/type of assessment task (e.g., written assignment, project, and presentation) influence students’ judgement of grades?

  • Data were inconsistent: although a few assessment types showed earlier convergence, most converged only at iteration 2 or 3 (refer to Table 1 in the original paper).
    • This could be due to differences in tasks within each assessment type

New Q3: Does analysis of criteria that relate to type of assessment task influence students’ judgement of grades?

  • Consistent, related criteria for a particular assessment type foster faster calibration, at iteration 1 or 2 (refer to Table 2 in the original paper).


Limitations and other notes:

  • Criteria are provided for assessment because students are not experts; however, a holistic evaluation of one's own work is recommended.
  • It is not possible to identify the cause of the improvement from independent measures.
  • The whole student population is not included, particularly less engaged students, who may be low achieving.
  • Other informal factors not measured in this study may have influenced the results, such as comments received from staff, discussions with peers, and students' own aspirations.