How to detect if a crowd worker has returned an assignment of your HIT on MTurk

When you use your own web app to deploy HITs via the MTurk API (ExternalQuestion), it is sometimes important to detect the status of a HIT assignment: whether it is being worked on, or has been returned or abandoned. This can influence what data your task interface renders, which tasks to release next, or other specific requirements.

However, MTurk doesn’t provide such information in the API. Yet, with a good understanding of how MTurk assigns unique IDs and some JavaScript tricks, you can work it out yourself.

Method 1: Check Assignment ID

MTurk usually assigns the Assignment ID of a returned assignment to the next worker who takes the HIT. This is very useful to know, because some requesters might use the assignment ID as the primary key in their tables. Bad idea.

By checking whether an assignment ID has already been recorded in your system, you can tell that the previous task with this assignment ID was returned or abandoned by the previous worker.
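The check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: `AssignmentLog` and `register` are hypothetical names, and the dictionary stands in for whatever table your system uses to record who holds each assignment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class AssignmentLog:
    """Hypothetical in-memory stand-in for the table of current assignment holders."""
    holder: Dict[str, str] = field(default_factory=dict)  # assignment_id -> worker_id

    def register(self, assignment_id: str, worker_id: str) -> Optional[str]:
        """Record the current holder; return the previous worker if the ID was reused."""
        previous = self.holder.get(assignment_id)
        self.holder[assignment_id] = worker_id
        if previous is not None and previous != worker_id:
            # Same assignment ID, different worker: the earlier worker
            # must have returned or abandoned the task.
            return previous
        return None


log = AssignmentLog()
first = log.register("A1", "W1")   # new assignment ID, nothing to report
same = log.register("A1", "W1")    # same worker reloading the page
reused = log.register("A1", "W2")  # a second worker arrives with the reused ID
```

When `register` returns a worker ID, that worker's earlier task can be marked as returned/abandoned before the new one is created.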

However, this is not enough. When the assignment ID is not reused, we need to help ourselves.

Method 2: Send AJAX POST requests repeatedly

You can send an AJAX POST to your server every few minutes to confirm that the worker is still there. If all your tasks are recorded as taken in your database but a new worker still arrives with a new assignment, some previous workers have returned their tasks without you knowing. By checking the last time you received an AJAX POST for each task, you can tell which one is no longer active and mark it as returned/abandoned.

The following is my example implementation:

// Heartbeat interval in milliseconds (3 minutes).
var mytimeout = 180000;

// csrfSafeMethod() and csrftoken are assumed to be defined elsewhere on the
// page; they are not shown here. The setup only needs to run once.
$.ajaxSetup({
  beforeSend: function (xhr, settings) {
    if (!csrfSafeMethod(settings.type) && !this.crossDomain) {
      xhr.setRequestHeader("X-CSRFToken", csrftoken);
    }
  }
});

function working() {
  $.ajax({
    url: "/pipeline/worker/save", // the endpoint
    type: "POST", // http method
    async: true,
    // data sent with the post request
    data: {
      stillhere: true,
      assignmentId: $("input[name=assignmentId]").val(),
      hitId: $("input[name=hitId]").val(),
      workerId: $("input[name=workerId]").val()
    },
    // handle a successful response
    success: function (json) {},
    // handle a non-successful response
    error: function (xhr, errmsg, err) {
      // provide a bit more info about the error to the console
      console.log(xhr.status + ": " + xhr.responseText);
    },
    complete: function () {
      // schedule the next heartbeat whether or not this one succeeded
      setTimeout(working, mytimeout);
    }
  });
}

// the first heartbeat is sent after one full interval
setTimeout(working, mytimeout);
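On the server, the heartbeats need to be stored and later compared against the current time. The following Python sketch shows one way this might look. It is not my actual endpoint code: `record_heartbeat`, `stale_assignments` and the grace factor of two missed intervals are assumptions for illustration, and the dictionary stands in for a timestamp column in your database.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Matches mytimeout = 180000 ms on the client.
HEARTBEAT_INTERVAL = timedelta(minutes=3)
# Assumption: tolerate two missed heartbeats before calling a task inactive.
GRACE_FACTOR = 2


def record_heartbeat(last_seen: Dict[str, datetime], assignment_id: str, now: datetime) -> None:
    """Store the time of the latest heartbeat received for an assignment."""
    last_seen[assignment_id] = now


def stale_assignments(
    last_seen: Dict[str, datetime],
    now: datetime,
    max_silence: timedelta = HEARTBEAT_INTERVAL * GRACE_FACTOR,
) -> List[str]:
    """Return the assignment IDs whose last heartbeat is older than max_silence."""
    return sorted(a for a, seen in last_seen.items() if now - seen > max_silence)


last_seen: Dict[str, datetime] = {}
t0 = datetime(2020, 1, 1, 12, 0)
record_heartbeat(last_seen, "A1", t0)
record_heartbeat(last_seen, "A2", t0 + timedelta(minutes=10))
inactive = stale_assignments(last_seen, now=t0 + timedelta(minutes=11))
```

Running `stale_assignments` only when a new worker arrives while every task is recorded as taken keeps the check cheap, since that is the only moment the answer matters.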

Together, these two methods should usually take care of all returned HITs. If you are still worried, you can always use the MTurk API to retrieve the submitted assignment IDs and compare them with your database records.
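The comparison with the MTurk API might be sketched as follows in Python. It assumes `client` is a boto3 MTurk client (for example `boto3.client("mturk")`) and uses its `list_assignments_for_hit` call. The helper names are hypothetical, and the recorded IDs would come from your own database.

```python
from typing import Iterable, Set


def submitted_assignment_ids(client, hit_id: str) -> Set[str]:
    """Collect the IDs of all submitted assignments of a HIT, following pagination."""
    ids: Set[str] = set()
    kwargs = {
        "HITId": hit_id,
        "MaxResults": 100,
        "AssignmentStatuses": ["Submitted", "Approved", "Rejected"],
    }
    while True:
        page = client.list_assignments_for_hit(**kwargs)
        ids.update(a["AssignmentId"] for a in page.get("Assignments", []))
        token = page.get("NextToken")
        if not token:
            return ids
        kwargs["NextToken"] = token


def never_submitted(recorded_ids: Iterable[str], submitted: Set[str]) -> Set[str]:
    """Assignment IDs known to your system that MTurk does not list as submitted."""
    return set(recorded_ids) - submitted
```

Note that an ID in the result of `never_submitted` may also belong to a worker who is still working, so it is best combined with the heartbeat check rather than used alone.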

Change Primary Key in PostgreSQL Tables via Django API (used as Foreign Key by other tables)

An MTurk assignment ID is not always unique. When a worker returns a HIT, the next worker who takes it will have the same assignment ID.

Such being the case, when designing the data structures for a system of our own that deploys tasks in the MTurk iframe, it’s important not to use the assignment ID as the primary key of any table. However, if you already did this, here’s how I fixed it in Django (using PostgreSQL):
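To make the problem concrete, here is a small self-contained sketch. It uses Python's built-in sqlite3 rather than PostgreSQL or Django so that it runs anywhere, and the table and column names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Old design: the assignment ID itself is the primary key.
conn.execute("CREATE TABLE task_old (assignment_id TEXT PRIMARY KEY, worker_id TEXT)")
# New design: a surrogate integer key; the assignment ID is an ordinary column.
conn.execute(
    "CREATE TABLE task_new ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, assignment_id TEXT, worker_id TEXT)"
)


def try_insert(table: str, assignment_id: str, worker_id: str) -> bool:
    """Insert a row and report whether the database accepted it."""
    try:
        conn.execute(
            f"INSERT INTO {table} (assignment_id, worker_id) VALUES (?, ?)",
            (assignment_id, worker_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# Worker W1 returns the HIT, then worker W2 receives the same assignment ID.
old_results = [try_insert("task_old", "A1", "W1"), try_insert("task_old", "A1", "W2")]
new_results = [try_insert("task_new", "A1", "W1"), try_insert("task_new", "A1", "W2")]
```

With the old design the second insert is rejected, so the second worker's task cannot be stored. With the surrogate key both rows are kept.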

  1. You have an old table (defined in models.py) in which the assignment ID column is the primary key.
    Create a new model with exactly the same columns, except that the model name is different (of course!) and primary_key=True is removed from the assignment ID column.

    Example:
        from django.db import models

        class Task(models.Model):
            # other fields
            assignment_id = models.CharField(max_length=30, primary_key=True)

        class NewTable(models.Model):
            # other fields
            # without primary_key=True, Django adds an auto-incrementing id column as the primary key
            assignment_id = models.CharField(max_length=30)

        class Result(models.Model):
            # other fields
            # on_delete is required since Django 2.0
            task = models.ForeignKey(Task, on_delete=models.CASCADE)
            # add the following line to every model that refers to Task as a foreign key
            newtable = models.ForeignKey(NewTable, null=True, on_delete=models.CASCADE)
  2. Run in the command line:
        python manage.py makemigrations
        python manage.py migrate
  3. You now have an empty new table with the correct schema constraints. The next step is to copy the existing contents into the new table.
      • You can write a custom management command in the [yourappname]/management/commands/ folder, then run python manage.py [your filename without .py] to execute your code.
      • Or you can make the changes directly in the Django shell. The following is an example:
        python manage.py shell

        from [yourappname].models import *
        # copy every old task into the new table
        for t in Task.objects.all():
            NewTable.objects.create(assignment_id=t.assignment_id, ...[other fields])
        # point every result at the matching row of the new table
        for r in Result.objects.all():
            r.newtable = NewTable.objects.get(assignment_id=r.task.assignment_id)
            r.save()
  4. Now that those tables also refer to the new table through a foreign key, you can remove the old Task model, and the foreign key fields that point to it, from your models.py, and migrate the changes.
  5. Rename NewTable, and the foreign key fields that refer to it, in your models.py, then migrate the changes.
  6. Done!