Entering Kaggle Competitions with Google Predict

By Chris Clark, 01/17/2013, in Code & tutorials

BigML had a great series of posts over the summer pitting some prediction-as-a-service products against each other. One of those was the Google Predict API. I thought it might be fun to enter a Kaggle competition using the API and see how it did against some of the world's top data scientists.

It turns out this was a terrible, terrible waste of time.

If you are expecting a rigorous analysis of the Google Predict API, this post will be a disappointment. In fact, I'll go so far as to ruin the surprise right now: on the Biological Response competition, the Predict API turned in a 0.67245 on the private leaderboard, just edging out the optimized constant value benchmark (in a nutshell: it did badly). It fared a bit better on the Titanic competition, scoring a somewhat-reasonable 0.79426 (tied with 55 other users for 112th place as of this writing).

So rather than focusing on the actual performance of the algorithm, I will share some tips and tricks for using the Google Predict API:

Chris' Tips for Using the Google Predict API:

My terrible, horrible code that does this is on GitHub. It's not in a particularly usable form and does some... questionable things because I didn't learn some of these lessons until it was too late. But if you are absolutely determined to use the Google Predict API (god help you), it might get you started.

Also, kudos to the BigML engineers who are either way smarter than me and got this working right away, or have a remarkable amount of self-restraint and did not complain one jot.

