My main field is artificial intelligence. In June and July 2014 I had a good opportunity to compare myself with other people in the field and see whether the models and methods we were working on could stand up to strong international competition.

About the Competition

The Multi-label classification of printed media articles to topics competition ran on kaggle.com from 2 June to 15 July. A colleague pointed me to it shortly after it started, on Friday 6 June. My first thought was that it looked interesting and would take a lot of time. In the end, both parts were true.

The task was to assign topics to 35,000 Greek newspaper articles as accurately as possible. More precisely, the competition asked for a computer model that would automatically identify topics from the input representation of Greek newspaper articles obtained by scanning and OCR. I did not even have to learn Greek, because each article had already been preprocessed into a “pile of numbers”. For example:

58,152 833:0.032582 1123:0.003157 1629:0.038548 ...

This means that the article has two topics, numbered 58 and 152, and contains words represented by the numbers 833, 1123, 1629, and so on.
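As a minimal sketch of how such a line can be read, the snippet below splits it into topics and word entries. The function name `parse_article` is made up for illustration, and treating the number after each colon as a weight for that word is an assumption about the format, not something the competition description above confirms.

```python
def parse_article(line):
    """Split one 'topics index:value ...' line into topics and word weights."""
    label_part, *feature_parts = line.split()
    # Topics are comma-separated integers before the first space.
    topics = [int(topic) for topic in label_part.split(",")]
    weights = {}
    for part in feature_parts:
        index, value = part.split(":")
        # Assumption: the value is a numeric weight for the word with this index.
        weights[int(index)] = float(value)
    return topics, weights


# The sample line from above, without the elided tail.
topics, weights = parse_article("58,152 833:0.032582 1123:0.003157 1629:0.038548")
print(topics)
print(sorted(weights))
```

A dictionary keeps the example readable; for 65,000 articles a sparse matrix would be the more practical container.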

There were 65,000 prepared articles available for training. The goal was to prepare a model that would assign topics to the remaining 35,000 articles.

The evaluation was straightforward. Each competitor predicted topics for the remaining articles, saved the result to a file, and uploaded it to kaggle.com. The server compared the predicted topics with topics assigned by humans and computed a score between 0 and 1. The higher the number, the better the model.
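The text above does not name the scoring formula. One common choice for multi-label tasks that also yields a number between 0 and 1 is the F1 score averaged over articles, so the sketch below assumes that metric purely for illustration. The function names are hypothetical.

```python
def f1_for_article(true_topics, predicted_topics):
    """F1 score for one article, comparing two collections of topic ids."""
    true_set, predicted_set = set(true_topics), set(predicted_topics)
    if not true_set and not predicted_set:
        return 1.0  # nothing to find and nothing predicted
    overlap = len(true_set & predicted_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted_set)
    recall = overlap / len(true_set)
    return 2 * precision * recall / (precision + recall)


def mean_f1(true_lists, predicted_lists):
    """Average the per-article F1 over all articles (assumed metric)."""
    scores = [f1_for_article(t, p) for t, p in zip(true_lists, predicted_lists)]
    return sum(scores) / len(scores)


# Two made-up articles: the first prediction misses topic 152, the second is exact.
score = mean_f1([[58, 152], [7]], [[58], [7]])
print(round(score, 4))
```

Under this assumed metric, a missed topic costs less than a wrong guess on a single-topic article, which is one reason the number of topics predicted per article matters.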

Each day it was possible to upload two submissions and compare progress with the others. The catch was that the visible score was based on only about 25,000 articles. The final ranking was computed after the competition ended, using the remaining 10,000 articles.

The first prize was 680 dollars, but as it turned out, the money was beside the point. The whole competition was a solid dose of fun and a chance to measure myself against others.

My Competition

Because I defended my dissertation in June and had used similar methods there, I decided to try them in this competition. Over the first weekend I computed a simple zero-effort model and uploaded it to Kaggle on Monday. To my pleasant surprise, I immediately appeared in fifth place, and that settled it. The top spots were still out of reach, but the gap was small enough to be worth trying.

I then improved the original model day by day. A month turned out to be just enough time to develop a new approach: enough to do a lot of work, but short enough to keep me from exploring every side street on the way to the final solution.

The progress looked roughly like this:

9 June  - 0.70321
11 June - 0.73554
16 June - 0.75425
20 June - 0.77176
24 June - 0.77490
30 June - 0.78003 (the first submission above 78%)
3 July  - 0.78444
6 July  - 0.79023 (the first submission above 79%)
10 July - 0.79307
15 July - 0.79558 (final public score)

The story was more dramatic for me because I had two one-week holidays planned during the competition, one in Austria and one in Croatia. During the holidays, computers in Czech server rooms ran the experiments. I processed the results remotely and uploaded them to Kaggle from a tablet. Many thanks go to MetaCentrum, which provides this infrastructure for Czech academia free of charge.

During the last two weeks, the race for first place settled into a contest among three competitors: me, the KazAnova&Rafa team from Greece, and Stanislav Semenov from Russia.

The Finish

When the peloton of competitors entered the stadium, I was in Croatia. My final experiments were scheduled in MetaCentrum, and as they finished I uploaded the results to Kaggle. Even though I submitted results on the final day and was still first, I knew that randomness was stronger than all of us researchers combined. The visible scores were computed on a different set of articles than the one used for the final ranking.

I decided to wait until two in the morning local time, because the competition ended at midnight UTC. Shortly before two, Alexander D’yakonov moved into second place and the anttip team appeared in fifth. That was still the public leaderboard, computed on 25,000 articles. At midnight UTC the system recomputed the results on the remaining 10,000 articles. Alexander came first, anttip second, and I landed in third place.

What Came Out of It

The reshuffling after switching to the 10,000 unseen articles was interesting. The anttip team attributed it to the leading models being tuned too much to the public 25,000 articles. I think the reason was elsewhere, because my model scored 0.79558 on the public set and 0.79463 on the final one, a drop of less than 0.001, so it behaved almost the same on both sets. Anttip scored 0.79123 and 0.79482, a gain of about 0.0036, which is a much larger difference between the two datasets.
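The comparison above comes down to two subtractions. The snippet below is only a worked check of that arithmetic; the dictionary keys are labels chosen for this example, and the first score of each pair is taken to be the public one, which matches the final public score listed in the progress table.

```python
# (public score, final score) as quoted in the text
scores = {
    "author": (0.79558, 0.79463),
    "anttip": (0.79123, 0.79482),
}

# Positive gap: the model did better on the final set than on the public one.
gaps = {
    name: round(final - public, 5)
    for name, (public, final) in scores.items()
}

for name, gap in gaps.items():
    print(f"{name}: {gap:+.5f}")
```

The sign matters as much as the size: one model lost a little on the unseen articles, while the other gained several times as much.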

After the competition ended, I learned from the forum that an earlier Kaggle competition in 2014, Large Scale Hierarchical Text Classification, had been won by anttip, with Alexander D’yakonov in second place. In that light, my third place out of 121 teams was a strong result, especially because I did not build on an existing text-processing model and started from scratch.

Besides the competition itself, I also ended up with a strong model with a relatively simple structure. It was useful mainly for research projects we were working on at KKY UWB. At the time of writing, I already had experiments running on similar Czech data and the results looked promising.

At the beginning of October, the WISE 2014 conference was planned in Thessaloniki, Greece. I planned to attend, and a joint paper by the first four teams was being prepared to describe and compare the successful solutions.

Final Notes

  • I programmed the whole solution in Python, using the sklearn machine learning library and scipy/numpy.
  • The code had 1,665 lines in total.
  • While running experiments, I used about 2 CPU years of computation, roughly what an ordinary desktop computer would compute in two years.
  • Over the whole competition I created 34 different model versions and uploaded 52 different submissions to Kaggle.
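For readers who want to try the same toolset, here is a minimal sketch of a multi-label baseline in sklearn. It is not the model described in this post: the one-vs-rest logistic regression is a stand-in chosen for brevity, the six sample lines are made up, and the assumption is that the data layout matches what `load_svmlight_file` reads with `multilabel=True`.

```python
import io

from sklearn.datasets import load_svmlight_file
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Made-up sample in the "topics index:value ..." layout (not competition data).
SAMPLE = b"""\
1,2 1:0.5 3:0.2
1 1:0.7 4:0.1
2 2:0.6 3:0.4
2 2:0.9
1,2 1:0.4 2:0.3 3:0.3
1 1:0.8
"""

# X is a sparse feature matrix, y a list of label tuples, one per article.
X, y = load_svmlight_file(io.BytesIO(SAMPLE), multilabel=True)

# Turn the label tuples into a 0/1 indicator matrix, one column per topic.
binarizer = MultiLabelBinarizer()
Y = binarizer.fit_transform(y)

# Stand-in model: one independent binary classifier per topic.
model = OneVsRestClassifier(LogisticRegression())
model.fit(X, Y)

# Map the predicted indicator rows back to tuples of topic ids.
predicted = binarizer.inverse_transform(model.predict(X))
print(predicted)
```

Predicting on the training lines, as done here, only shows that the pipeline runs; a real experiment would hold out part of the 65,000 training articles for validation.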