My First Attempt to Build an Online Autograder for SWI-Prolog
This is a story about my online autograder project that is designed for SWI-Prolog. As a Logic Programming Teaching Assistant, this is not as simple as giving input and then comparing the output with the expected output. Are you asking why? Let's take a look then.
First thing first, before you read this story, remember that this is a story, not a technical report. At this point, you should understand what to expect and what to not expect, right? Well, whatever.
Logic programming is a programming paradigm. Learning a new paradigm is difficult. That's exactly because we need to change how we usually think when programming a solution. I remember the first time I learn how to program. It is about 10 years ago that I wrote my first program in VB6. At that time, I just followed instructions from a book. I don't know anything about the programming paradigm at all. What I know is that when the user does some events, then my codes will run. That is how event-driven programming works. Later on, I learn how to program a website in PHP. Again, I just followed instructions from a book and I think that it is kinda weird since my code is running by itself. However, I understand that it is followed my instructions from the beginning of the file until the end of the file. In general, this is how imperative programming works. Not so long in the future, I learn how to program with OOP in mind. What I want to say is that I need to change my way of thinking, again and again, every time I learn a new programming paradigm. It took me some time. Yes, even when we already know how to program, when we actually learn a different paradigm and actually use one, we once again learn how to program almost as if we never learn it in the first place.
That is the main reason why my lecturer (Muhammad Arzaki) in the Logic Programming course put the I/O chapter in the middle of the term instead of putting it at the beginning of the term. Ah, have you ever visited the Haskell (programming language) homepage before? What you will see is not a hello world program but a prime number generator program. Wonder why? I'm also wondering why but if you ever learn Haskell and actually look at the theory behind I/O actions such as putStrLn then you will know that it is a complex one. What the hell is Monad anyway? Do you get my point here? Well, the point is that the Haskell (programming language) homepage kinda shows it to you, like, "this is how declarative programming looks like" or something similar. Yeah, because showing main = putStrLn "Hello, World!"
is not cool and it doesn't really show you how declarative programming works either.
Yeah! That's the background as to why we are not teaching the student how to write "hello" and such. Instead, we teach them how to think first. Since it is like this, do you think we can actually build an autograder that is similar to what's available on the internet? Of course not! Let's take a look at one of the earlier problems in this course.
% male(X) denotes that X is a male.
male(benjamin). male(david). male(edward). male(george). male(james). male(lucas). male(mike). male(oscar). male(peter). male(raymond). male(umberto).
% female(X) denotes that X is female.
female(anya). female(clara). female(fiona). female(hannah). female(ira). female(kiana). female(nancy). female(quincy). female(sarah). female(tina). female(victoria).
% parent(X,Y) denotes that X is one of Y's parent.
parent(anya,clara). parent(anya,edward). parent(anya,fiona). parent(benjamin,clara). parent(benjamin,edward). parent(benjamin,fiona). parent(clara,hannah). parent(clara,ira). parent(clara,lucas). parent(david,hannah). parent(david,ira). parent(david,lucas). parent(fiona,mike). parent(fiona,nancy). parent(george,mike). parent(george,nancy). parent(ira,peter). parent(ira,quincy). parent(james,peter). parent(james,quincy). parent(kiana,raymond). parent(kiana,sarah). parent(kiana,tina). parent(lucas,raymond). parent(lucas,sarah). parent(lucas,tina). parent(nancy,umberto). parent(nancy,victoria). parent(oscar,umberto). parent(oscar,victoria).
Given the knowledge base above. Assuming that any couple who have a common child is married, construct the predicate married(X, Y)
which means that X
and Y
are married couples. Here is what the test looks like.
Input: married(anya, benjamin).
Output: true
Input: married(benjamin, anya).
Output: true
Input: married(anya, anya).
Output: false
Input: married(benjamin, benjamin).
Output: false
Input: married(anya, edward).
Output: false
Input: married(benjamin, clara).
Output: false
Input: married(kiana, david).
Output: false
Input: married(clara, david).
Output: true
Input: married(david, clara).
Output: true
Input: married(edward, fiona).
Output: false
Okay, now, we need to think about how to grade the student's solution by calling the predicate directly. There are no input and output operations in the code. Well, fortunately, the predicate must be the exact same which is married
. If they by any chance write something else, then they might get 0 as their scores. So… how? First, before we think further, we need to know how SWI-Prolog looks like.
SWI-Prolog, unlike GNU Prolog, is not compiled. This is definitely an advantage for us since we have this kind of issue. If you have installed SWI-Prolog on your PC, then you may try to run the swipl
command. As you may see from Picture 1, it is an interpreter. By running swipl -f <file name>.pl
, we can load a Prolog code into it. Then, we may try to test the student's solution based on our test cases just like how we did in Picture 2. Well, for simplicity, the test cases are exactly the same as the problem described above.
The thing shown in Picture 2 is the easiest way to test the solution. We can do it manually. But! It is time-consuming and hella tiring. Now, how can we automate this thing? What we need to do is to understand the capability of swipl
itself. Therefore, we just need to run the swipl -h
command! The list of the options is shown in Picture 3. The most interesting options are -t
and -g
. Since I was curious, I tried to run several commands which are shown in Picture 4.
The first try, which is try1
, shows that I can write a Prolog code directly there and it is executed just fine. However, I'm stuck in the interpreter after the code is executed. Notice that ^D
is actually CTRL+D so I can quit from the interpreter. On the second try, I'm also able to run the code just fine but there is a welcome message shown up. Then, I combine the two of them in a single command. Finally, we can see that the third one is exactly what we need. It doesn't show the welcome message and it also closes the interpreter. This is just perfect. Okay, now, let's try to test the student's solution by using this method.
In Picture 5, I first try to run swipl -f ehardi.pl -g 'married(anya,benjamin).' -t 'halt.' > output
command. I save the output into the output
file so we can see clearly what is the actual output we got here. Well, we got nothing. Then, I try to run swipl -f ehardi.pl -g 'married(anya,benjamin) -> writeln(true); writeln(false).' -t 'halt.' > output2
command. We finally got true inside the output2
file. Yes, luckily, a message such as a warning message is ignored. Ah, also, if you are aware that I was using -t 'halt.'
is because I see that from Picture 4 this is how you close the interpreter. By using the halt
predicate, that's it. From this point, the last thing we need to do is to compare the output with the expected output.
Since we know how to actually use swipl
for grading, we may now build the autograder. However, we need to prove that this is actually the way. Hence, I decided to build a CLI program named Grode first before actually developing the web application. You may take a look and then we can skip the detail here. Yes, it's time to think about the web application!
Let me ask you. If you want to build an autograder, what is the basic feature you need to develop? Just think that we are about to build an MVP (Minimum Viable Product). Keep it as simple as possible. Well, there are users so we need to let them log in before actually doing anything. Then, there is should be a page for assignments. An assignment contains several problems. The students may upload their solution which is a Prolog code. Lastly, we need the ability to grade their solution and let them know the scores. As an addition, we may add scoreboards so they might be more competitive. Deal! These are the minimum features we need to develop first. Then, we may ask for feedback from them later.
The next step is to understand how many funds we have. This project is funded by my lecturer since this is a part of our research. Well, as you may know, that the fund is limited. Therefore, my first concern is to make the expense as cheap as possible. For your information, in this class, the number of registered students is 22. However, the students who are actually attending the class are 21. That's mean our autograder will have about 23 concurrent users at most, including me and my lecturer. Alright, we can see that the number of users is not that big so we may rent a cheap server. Anyway, let's continue designing the autograder.
I divide the whole autograder into four parts. First, the backend. Second, the frontend. Third, the grader. Lastly, the storage.
The backend simply provides CRUD operations with REST API as its interface. I wrote this in JavaScript (Node.js) with Koa as the web framework. Why JavaScript? It is simply because I have written something like boilerplate before and I don't have much time so I use this instead of writing it all from scratch. That boilerplate also looks like my own framework which is a good way for me to practice as a software architect. My dream is to be one after all. I can't disclose the code right now because that boilerplate I made is for some projects at Proclub Studio, a place that I work before.
Next, the frontend. This is the main part that will be interacting with the users directly. I build this by using Vue.js. I actually want to use Sapper, a Svelte web framework. However, I do think that this might be the best chance for me to learn something new. Yeah, I already learn and deployed a production level of Sapper web application before when I'm a Software Engineer Intern at Tokopedia.
Next, the grader. This is actually the backend but instead of all endpoints available, we only create one endpoint for triggering a syscall for running the code and then grade the solution. Yes, this is written in JavaScript. Why? Because I need to deliver it as soon as possible. JavaScript sucks for this task and please don't ask me why. I should've (at least) use Golang instead.
Lastly, the storage. I'm using PostgreSQL as the database, MinIO as the file storage, and Redis as the cache. There is no argument as to why I prefer PostgreSQL other than anything else. It is just that I currently feel more comfortable with PostgreSQL. As for the file storage, at first, I want to use Amazon S3. However, I remember that my peer from the previous job introduced me to MinIO and it is quite cool so, in the end, I choose to use MinIO. Well, actually I find out that setting up MinIO using Docker is really easy. Ah, as for the cache, I think Redis is the most suitable one because it has a lot more features than Memcached. I might be want to use Redis Queue or Redis Sentinel in the future. So, yeah Redis is the one I need here.
We have already breakdown the parts of the autograder. Now, let's think about where do we want to deploy these parts. I have mentioned this before, we need to make this as cheap as possible. Finally, I made the decision. I will deploy the backend part with the Heroku hobby plan which is free. Then, I will deploy the frontend part with the Netlify starter plan which is free. Last, I will deploy the grader and the storage part with a virtual private server (cloud compute) from Vultr. I choose the $5 per month plan because I need IPv4. The cheapest plan is $2.5 per month and it doesn't include IPv4. Unfortunately, the $3.5 per month plan is not there when I've deployed this part.
Based on this plan, without a need to buy a new domain, we just need to pay $5 per month! Why do we need a cloud compute or VPS in the first place? It is because the grader needs to perform a syscall! It needs to download the student's code, store it in a temporary directory, run the swipl command and retrieve the output, then compares the output with our expected output that we have set before. Since we need a VPS anyway then I decided to install PostgreSQL, MinIO, and Redis on this server.
Yes, it's time to see how it actually looks!
In Picture 6, you can see that we are using Google as the only way to log in. There is a good reason for it. Here at Telkom University, we have our personal email addresses. We are using Google for Education and it is connected to our SSO service. Why do we need to type a handle and a password again if we already have an SSO service? It is tiring and the students may forget the password over and over. So, I decided to use Google OAuth. Only enrolled students may log in. If your email address is not in the database, that's mean you are not enrolled hence you can't use Teary.
After the students logged in, they will be redirected to the assignment page which contains a list of available assignments as seen in Picture 7. There are upcoming assignments, ongoing assignments, and past assignments. We can only see past assignments since when I write this story the term is already over. Also, please don't comment on the UI/UX here because I don't have much time for doing UX research and actually designing the UI.
Let's say we want to work on Midterm Remedial which is not shown in Picture 7. After you click on the View action, you will be redirected to the assignment detail with a list of problems in it. The problem in the list consists of the problem title, your scores over the problem full scores, and the time limit. In Picture 8, you also can see that there is a View Scoreboard
action. However, we will see the scoreboard later.
Now, let's try to see the problem detail of the "A Join Venture" problem. It can be seen in Picture 9. I show the problem detail as a PDF to make it simple. We don't need anything other than the <object>
tag and the URL for the PDF. Keep it as simple as possible first. On the right side, we can submit the solution there and see the submission history. Let's try to submit one!
We can see in Picture 10 that we got a full score of 20. There are about five test cases and we got an AC (accepted verdict) from all of them. However, please note that each test case might have different scores. Like, the first test case may contain 5pts but the second test case may contain 4pts. Anyway, the sum of all test cases is 20pts. The process here is quite simple. First, we request an upload URL from MinIO via our backend. Then, the frontend uploads the file by using that URL. If the upload succeeds, then the frontend will create a new submission entry by hitting a certain endpoint. Afterward, the grader will be triggered and try to download the uploaded file then run the grading mechanism. The scores are now calculated. The grader then updates the submission entry by setting the scores and the grading logs there. Since we are not implementing WebSocket yet, the frontend asks the backend whether the grading is finished or not in the constant interval of several seconds. Finally, we got the submission result and the submission history is updated as well.
Well, shall we check out the scoreboard page now? The scoreboard page is also quite simple. It can be seen in Picture 11. It has shown the rank, the student ID which is NIM, in this case, the list of the problem including its scores, and then the total scores. I coded the problem title as an alphabet to make things simple. However, the student may see the problem title if they put their cursor there. The order here is based on the total scores (descending) first, then the total time they need to finish it all (ascending).
That's it! Ah, if you are curious as to how I write the test case, please visit the Grode repository and take a look in the inputs directory. Now, it's time to gather feedback from the users! There are at least 16 out of 21 students who give their feedback about Teary.
Based on the feedback, 6.25% of them have no comment, 6.25% need a new feature to grade an interactive problem, 25% of them said that it is okay already, 25% of them need the grader to show them the detail of the grading issue such as when they got compile-error verdict (CE), and then 37.5% of them want the UI/UX to be improved.
Yes, I'm aware that the UI/UX is bad. However, don't worry since I'm started working on version 2 of Teary. It will be a complete rewrite and the code can be opened to the public. There will be a huge improvement for the UI/UX too. I don't know when it will be finished since I need to take care of my undergraduate thesis first.
Anyway, this is my story as a teaching assistant. I'm quite happy since I could make the class much more fun. You know what? When the final test is up, we have a commentary based on the scoreboard. It is fun but might be intimidating for the others since they can see their scores and compare them with the first rank on the scoreboard. For your information, the final test was conducted online since COVID-19 hit Indonesia. I'm glad that I developed this autograder since they definitely need this. I'm also glad that there is no issue when we did the final test back then. It also lessens my burden to score their homework, like, manually.
See ya next time~