How I graded 50 university exams with the help of AI

Karl Philip Lund
27.06.2023

In the spring semester of 2023, I used AI to help me assess 50 student assignments, ranging in length from 30 to 60 pages.

Other assessors told me it took between 30 and 60 minutes to review each assignment. In total, they spent about 50 hours on all the assignments combined.

My approach reached the national press in Norway (VG) and has been debated in an independent newspaper for higher education and research (Khrono). The approach has also been criticized by AI researchers at the Universities of Oslo and Bergen (Morgenbladet).

After assessing 50 student assignments, I am convinced that the use of AI increases the speed, fairness, and quality of the assessment process.

I used a tool called Humata to help me identify and summarize the important parts of the documents. I did this by creating prompts based on the grading criteria. After that, I reviewed each section and made my evaluation.

My process:

A. Find the right AI tool
B. Define basic quality signals and assess the overall quality
C. Ask the document questions using AI
D. Manual evaluation

Here is the story behind how I did it:

A. Finding the right AI tool

I spent a lot of time researching AI tools to assist me in grading student assignments.

New apps emerge every week, and it was difficult to find a good alternative.

Eventually, I found Humata, a ChatGPT for documents. The service, developed by a former Stanford student, is used by educational institutions worldwide.

Humata claims that the tool can help do research 100x faster! Humata focuses on privacy, and document data is not used to train the language models. Humata deletes all data sent to/from the model after 30 days.

While it took some time to learn how to use Humata, the benefits are well worth it. Now, I can assess an assignment in 10-15 minutes, compared to the 30-60 minutes it used to take!
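As a rough back-of-the-envelope check, the per-assignment times quoted above can be scaled up to all 50 assignments. This is only a sketch using the numbers in this post; the function name `total_hours` is my own, and it ignores the time spent learning the tool.

```python
def total_hours(assignments: int, minutes_low: int, minutes_high: int) -> tuple[float, float]:
    """Return the (low, high) total grading time in hours."""
    return (assignments * minutes_low / 60, assignments * minutes_high / 60)

# 30-60 minutes per assignment, as reported by the other assessors
manual = total_hours(50, 30, 60)       # (25.0, 50.0)

# 10-15 minutes per assignment with the AI-assisted process
ai_assisted = total_hours(50, 10, 15)  # roughly 8.3 to 12.5 hours

print(f"Manual:      {manual[0]:.1f} to {manual[1]:.1f} hours")
print(f"AI-assisted: {ai_assisted[0]:.1f} to {ai_assisted[1]:.1f} hours")
```

The upper end of the manual range matches the roughly 50 hours the other assessors reported.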

B. Define basic quality signals and assess the quality

Before using AI in the grading process, I analyzed an "A" assignment that other assessors had graded. From it, I identified quality signals to look for when reviewing the remaining assignments. I borrowed the term from search engine optimization: quality signals are indicators that search engines like Google use to rank content.

Basic quality signals in the A assignment:

1. Title: Does the title of the document describe the content of the document?
2. Table of contents: Does the table of contents provide a good overview of the content of the document? Is it too short or too long?
3. Introduction/summary: Does the assignment contain a brief introduction that gives the reader a short introduction to what the assignment is about?
4. Headings: Does the document contain good, descriptive headings that make it easy for the reader to scan the document? Is the document logically structured?
5. Figures/images: Does the document contain visual elements that make it easier to read the document? Are the visual elements unique (self-developed) and well-designed throughout the document? Are there relevant captions that describe the content of the figures?
6. Tables: Are there good tables that are easy to read? Are there clearly defined column headings?
7. Formatting: Are the paragraphs in the document formatted and organized in a way that makes it easy to read?
8. Spelling errors: Student submissions should not contain spelling errors. Spelling errors are easy to avoid by using spell check and having someone read through and correct spelling errors.
9. References: Does the document refer to the curriculum and other credible sources? (For example, a source containing "gclid" indicates that the student has referred to a paid Google ad. Paid ads are generally not credible sources.)

You can assess the quality of a document by comparing the content against quality signals.
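Most of these signals need a human eye, but the "gclid" check on references is mechanical enough to sketch in code. I did this check by reading the reference lists; the snippet below is only an illustration of how it could be automated, and the function names and example URLs are made up for the example.

```python
from urllib.parse import urlparse, parse_qs

def has_gclid(url: str) -> bool:
    """True if the URL carries a Google Ads click ID (gclid) query parameter."""
    query = urlparse(url).query
    return "gclid" in parse_qs(query, keep_blank_values=True)

def flag_ad_sources(urls: list[str]) -> list[str]:
    """Return the URLs from a reference list that look like paid-ad landing links."""
    return [url for url in urls if has_gclid(url)]

# Hypothetical reference URLs, for illustration only
references = [
    "https://example.com/guide?gclid=abc123",
    "https://example.com/article",
]
flagged = flag_ad_sources(references)
print(flagged)
```

A flagged link is only a hint that the student clicked through from an ad; the source itself still has to be judged on its merits.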

Now let's discuss the use of AI in grading student assignments!

C. Ask the document questions using AI

After assessing the quality of the documents, the next step was to engage with the content using AI.

My initial thought was that AI could evaluate the assignments based on the assessment criteria.

I soon realized it's not that simple.

I asked the Humata founders for help in writing good prompts:

"Hey. I'm testing out Humata. I have uploaded 48 reports written by students. I'm trying to use Humata to evaluate the quality of the reports. Do you have any suggestions for prompts that I can ask to get a quick overview of the quality of the reports?"

I received the following response:

"Hi Karl, this is a novel and creative use case. I don't have any suggestions for evaluation at bulk at the moment. Although, it would be great at evaluating the papers on a one-by-one basis. If there is a specific criteria for evaluation you can ask it that. For example, "Does this paper explain X in relation to Y?"

For instance, does the student discuss [X]'s role in the formation of [Y] and give reasons to support their conclusion?"

I then asked the founder the following question:

"I would like to do the following: Upload a student submission that is considered to be an "A" submission, high quality based on the grading criteria by human evaluators

Then I would like to use this to grade other similar submissions.

Is this a potential use case for Humata?"

I received the following response:

"Hi Karl, yes, this is a potential and very creative use case for the multi-document analysis mode."

Me:

"Can you suggest a few prompts that could be used to test this?"

Answer:

"Since this is such a new use case I don't have any particular prompts that come to mind. If you have criteria that you used to deem the paper an "A", then you can turn such criteria into questions in which to ask other documents. That is the first thing to come to mind.

Another experimental strategy is to ask Humata to generate the prompt given your criteria. Although I would tread carefully with this direction."

I tested many prompts before the tip from the Humata founder led me in the right direction.

I read the grading criteria for the exam assignment and tried to convert the key criteria into sensible prompts.

This was an a-ha moment for me.

I tested different prompts.

For example, the criterion:

"Apply the curriculum and what you have learned throughout the semester"

I converted it to:

"Has the student applied the following curriculum: Iversen, Aalen, Hanlon, Furu"

This prompt should be easy to answer, and it surprised me that Humata did not give me a correct response. Humata told me that the AI model should be able to respond accurately to the prompt, but that the system is under development.
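A question like this can also be cross-checked without AI, because it partly comes down to whether the author names appear in the text at all. The sketch below is not something I used during grading; it is a minimal illustration, with a function name and sample text invented for the example, and it assumes the submission has already been extracted to plain text.

```python
import re

CURRICULUM_AUTHORS = ["Iversen", "Aalen", "Hanlon", "Furu"]

def cited_authors(text: str, authors: list[str] = CURRICULUM_AUTHORS) -> dict[str, bool]:
    """Map each author surname to whether it appears as a whole word in the text."""
    return {
        name: re.search(rf"\b{re.escape(name)}\b", text, flags=re.IGNORECASE) is not None
        for name in authors
    }

# Invented sample text, standing in for an extracted submission
sample = "According to Hanlon, positioning matters. Aalen makes a similar point."
print(cited_authors(sample))
```

A name match only shows that an author is mentioned, not that the student has applied the curriculum, so it complements the prompt rather than replacing it.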

Eventually, I found that Humata worked well to extract relevant excerpts from the documents based on my prompts.

I found that the better the assignment, the better Humata was able to respond! For the Humata founder, this was not a surprise:

"Better submissions will perform better with Humata because there is more context from the writing to pull from."

I was on the right track! I wrote 7 prompts that dealt with different parts of the submissions. For each submission, I used the same prompts. I systematically went through each assignment, read through the texts, and thus gained a good overview of the submissions. The method also made it easier to compare the documents against each other!
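The idea of asking every submission the same questions can be pictured as a simple worksheet with one row per submission and prompt. The sketch below is an illustration only: it does not call Humata, the file names are hypothetical, and the prompt list holds just the two example prompts quoted in this post rather than my full set of 7.

```python
from itertools import product

PROMPTS = [
    "Has the student applied the following curriculum: Iversen, Aalen, Hanlon, Furu",
    "Does this paper explain X in relation to Y?",  # generic example from the Humata founder
]

def build_worksheet(submissions: list[str], prompts: list[str] = PROMPTS) -> list[dict[str, str]]:
    """One row per (submission, prompt) pair, with empty slots for the excerpt and notes."""
    return [
        {"submission": s, "prompt": p, "excerpt": "", "notes": ""}
        for s, p in product(submissions, prompts)
    ]

# Hypothetical file names
rows = build_worksheet(["submission_01.pdf", "submission_02.pdf"])
print(len(rows), "rows to fill in")
```

Pasting each excerpt into its row keeps the answers to the same prompt side by side, which is what makes the submissions easy to compare.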

D. Manual evaluation

After reviewing the quality signals and running my prompts in Humata, I read through the reflection section myself. Reading this small part of the text gave me a sense of what the writing in the rest of the document looked like. This step was about checking whether the student's writing flowed and made sense.

Next steps

AI is a game-changer when it comes to assessing student assignments. With the right AI tool and a set of basic quality signals, you can quickly evaluate the quality of documents and extract relevant excerpts.

Combining AI and manual assessment can significantly reduce the time and effort required to grade large numbers of assignments, making it more efficient for both students and educators. So let's keep exploring how AI can be used in education to maximize its benefits while minimizing its limitations.