Saturday, December 11, 2010

Statistics / Machine Learning

Recently I subscribed to a blog by Brendan O'Connor, a student of CMU's Eric P. Xing.
It has brought me quite a few thought-provoking articles with detailed, opinionated discussions. His view of the relationship between machine learning and statistics resonated with me. In fact, I'm taking a class called Elements of Statistical Learning, taught by my supervisor, and I'm benefiting a lot from it.
As listed in his comparison, O'Connor declares that ML people are much (and unfairly) luckier. However, aren't statisticians more welcome in industries other than IT?
Anyway, statistics is far more than just distributions, means, variances and toy probability.

Another recent issue is the choice of tools for research. The options are listed below:
Python: agile, time-saving coding; a real programming language, so it is easy to combine several tasks; Hadoop interface; immature libraries (which may keep code from working)
Matlab: most popular in ML (easy to find reference code and interfaces) and vision (important for me), wonderful debugging; weak at expressing logic, slow execution (seems tricky to speed up)
Mathematica/Maple: brilliant symbolic derivation
R: most popular in statistics, better plotting (visualization); steep learning curve, syntax with an unfamiliar philosophy (so I might often forget some basic usage), poor GUI
C: hard and annoying for prototyping one's ideas
Java: full of redundancy

O'Connor gives positive comments on all of the languages. His comments are funny and well thought out.

I think my choice will be Matlab for numerical and vision-related computing, and Maple for formula derivation: the MM combination! :-)