hazzens scribe


Analyzing TurkSort

27 Feb 2014

There is a sorting algorithm I’ve always wanted to implement but never had the patience to deal with all of its fiddly bits: TurkSort. The premise of TurkSort is that some lists are hard to sort, where hard is a highly-technical word meaning “no good comparison function is known for two elements”. TurkSort solves this problem by outsourcing the work to Amazon Mechanical Turk.

First, what makes a list hard to sort? Numbers are not hard to sort; the simple function > or any of its ilk will efficiently compare two numbers. Other kinds of input can be murkier, but still tractable. Consider colors: while there is no obvious sorting algorithm for colors, it is easy to come up with one for any given color space. In RGB space, we could treat each color as a 6-digit hexadecimal number 0xrrggbb and then sort numerically. Likewise, for Y’UV or HSL we can use a simple tuple comparison of the color components (otherwise known as lexicographical order).
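To make that concrete, here is a minimal sketch in Python (my choice of language for these examples) of both ideas: packing RGB into a single 0xrrggbb number, and comparing the components of another color space as a tuple. The standard colorsys module works in the 0..1 range and calls the space HLS rather than HSL, but the principle is the same.

    import colorsys

    # Colors as (r, g, b) tuples with components in 0..255.
    colors = [(255, 0, 0), (0, 128, 255), (34, 139, 34)]

    def rgb_as_number(color):
        # Pack RGB into a single 0xrrggbb integer and sort numerically.
        r, g, b = color
        return (r << 16) | (g << 8) | b

    def hls_key(color):
        # Convert to HLS and compare the components lexicographically.
        r, g, b = (c / 255.0 for c in color)
        return colorsys.rgb_to_hls(r, g, b)

    by_hex = sorted(colors, key=rgb_as_number)
    by_hls = sorted(colors, key=hls_key)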

All of these easy sorts have had a small number of dimensions - three at most. Highly-dimensional data, however, poses a challenge. Consider a photograph represented as a BMP file - at the very least we have an M by N grid of pixels, and each pixel has an R, G, and B component. A naïve algorithm would treat each image as an M * N long string of RGB values and sort lexicographically from there. A potentially better one would first sort all images by size (M * N) and then by average R in the image, then G, then B. However, it is easy to imagine that the right comparison metric depends on the data itself.
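As a sketch of those two strategies - assuming an “image” is simply a list of rows of (r, g, b) tuples rather than an actual parsed BMP:

    def flat_key(image):
        # Naive: flatten to one long sequence of channel values and let
        # Python compare the resulting lists lexicographically.
        return [channel for row in image for pixel in row for channel in pixel]

    def size_then_average_key(image):
        # Potentially better: sort by pixel count, then by average R, G, B.
        pixels = [pixel for row in image for pixel in row]
        n = len(pixels)
        averages = tuple(sum(p[i] for p in pixels) / n for i in range(3))
        return (n,) + averages

    # Usage: sorted(images, key=size_then_average_key)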

Consider a set of images in varying sizes, all of the sky. A human may prefer them sorted by cloud cover (which can be approximated by looking at the number of “non-blue” pixels). If all images have roughly equivalent cloud cover, perhaps by time of day. Or by size. The possibilities are endless, and could drive one insane.
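Even the cloud-cover proxy is easy to sketch, for what it’s worth - the “non-blue pixel” test below is an arbitrary placeholder, not a real sky classifier:

    def cloud_cover_key(image):
        # Rough proxy for cloud cover: the fraction of pixels that are not
        # "mostly blue".  The test is a guess, chosen only for illustration.
        pixels = [pixel for row in image for pixel in row]
        non_blue = sum(1 for (r, g, b) in pixels if not (b > r and b > g))
        return non_blue / len(pixels)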

TurkSort is one solution to this problem. Using Mechanical Turk, one can let real humans decide on an appropriate metric. Instead of writing a comparison function, one would create a unit of work that displays the two elements and asks which is “less than” the other. Again, in the naïve case, this is not a tractable problem - using mergesort requires O(n log n) comparisons to be made. If each comparison requires a real human’s input, and the elements to compare are based on the result of the previous comparison, the time required to sort a list quickly outpaces a reasonable amount of time to wait.
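Here is what that naïve version might look like - ask_worker is a hypothetical stand-in for posting a task and blocking until a human answers, which is exactly the problem, since each comparison waits on the ones before it:

    from functools import cmp_to_key

    def ask_worker(a, b):
        # Hypothetical stand-in for a Mechanical Turk task that shows the
        # worker both elements and asks which is "less than" the other.
        answer = input(f"Is {a!r} less than {b!r}? [y/n] ")
        return -1 if answer.strip().lower().startswith("y") else 1

    # sorted() makes O(n log n) comparisons, one after another, and later
    # comparisons depend on the answers to earlier ones.
    # ordered = sorted(items, key=cmp_to_key(ask_worker))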

One solution is to ask a single human to simply sort the list themselves. Browsing Mechanical Turk, the going rate seems to be around $0.01/minute for smaller tasks and $1/hour for larger tasks. However, a suitably large set may be infeasible for a single worker to sort - the work is, frankly, boring. I would gladly sort 10 items for a small fee; 100 is murkier and 1000 is right out.

For a larger set, we can solve the latency problem in a few ways. The most obvious is to create a unit of work for each pair of elements to sort. Assuming we offered the minimum for these - $0.01/pair - how much would it cost? This is simply a combinatorics problem - a binomial coefficient, to be precise. For N elements, we have N choose 2 = N * (N - 1) / 2 pairs of items, which is O(n^2) comparisons. For a 500 element list, we have 124,750 comparisons to make - or $1,247.50 to spend. Unless one works for a well-funded research department, this comes nowhere close to feasible.
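The back-of-the-envelope math, in case you want to plug in your own list size:

    from math import comb

    n = 500
    pairs = comb(n, 2)      # n choose 2 = n * (n - 1) / 2 = 124,750
    cost = pairs * 0.01     # at $0.01 per pair: $1,247.50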

One way to tackle this is to bundle many pairs into a single task. One would need to find the largest number of comparisons a person will do for a penny, and then put that many pairs into each task. However, the number of pairs grows quadratically while the number of comparisons a person is willing to do is a constant. In Big-O speak, that constant is insignificant to the overall cost.
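A quick sketch shows why - pairs_per_penny is a made-up parameter here, and 50 comparisons per penny is a generous guess:

    from math import ceil, comb

    def bundled_cost(n, pairs_per_penny):
        # Bundle `pairs_per_penny` comparisons into each $0.01 task.
        return ceil(comb(n, 2) / pairs_per_penny) * 0.01

    # bundled_cost(500, 50) is still about $25, and the bill keeps growing
    # quadratically with n; bundling only divides it by a constant.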

Another solution is to not consider every pair. Assuming humans implement a strict weak ordering (a problematic assumption, but a greatly simplifying one), we know A < B and B < C implies A < C. This is, in fact, how efficient sorting algorithms work. The trick, then, is to pick a sorting algorithm such that all of the comparisons we need to make are known ahead of time. As we saw before, mergesort is not one such algorithm, nor are any of the usual O(n log n) algorithms. Of the textbook algorithms, only the O(n^2) ones satisfy this constraint, and that puts us in a bit of a pickle. To make TurkSort efficient, the quadratic growth must be solved. It is the difference between $1,247.50 for every pair and $44.82 for the roughly n log n comparisons an efficient sort would make. That is around 500 days of Top Ramen diet (assuming 400 calories per $0.50 package) - or roughly 3 failed startups.
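To illustrate why knowing the comparisons up front is enough: under a strict total order, an item’s rank is exactly the number of items it beats, so all O(n^2) verdicts could be posted as parallel tasks and assembled into an order once they come back. A sketch, assuming the verdicts arrive as a dict keyed by the same (a, b) pairs that were posted:

    from itertools import combinations

    def order_from_pairwise(items, less_than):
        # less_than[(a, b)] is the worker's verdict: True if a is "less
        # than" b.  Count, for each item, how many others it is greater
        # than; under a strict total order that count is its rank.
        beats = {item: 0 for item in items}
        for a, b in combinations(items, 2):
            greater = b if less_than[(a, b)] else a
            beats[greater] += 1
        return sorted(items, key=lambda item: beats[item])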

Next week, we’ll look at ways to tackle the issue by relaxing our sorting to be approximate rather than exact - which plays well with the fallibility of humans.