Wednesday, January 16, 2013

A Detour: Data Mining, Machine Learning and Artificial Intelligence


Data mining, machine learning and artificial intelligence - these seem to be some of the "big data space" buzzwords coming from people, companies, institutions, research analyst groups, media and others. However, I feel these terms are misnomers and would like to take the opportunity to put forth my thoughts on them.

Let's take the term "mining" as an example. Traditionally, mining is associated with the process or activity of extracting minerals from the earth. The mineral is the object of interest; it comes mixed with dirt, mud, sand, clay and other throw-away material excavated from mines. So when one refers to iron-ore mining, gold mining, etc., the name indicates the extraction of the "object of interest" (e.g. gold) and not the "throw-away object" (e.g. mud). Hence gold mining is the term for extracting gold out of the mixture of gold and mud. In the same sense, when we mine data for information and insight, it should be referred to as information mining or data crawling or something more appropriate, but not data mining.

Now let's look at machine learning and artificial intelligence (AI). AI was in vogue in the 80s when there was FUD (Fear, Uncertainty and Doubt) being spread in the U.S. about AI and robotics advances in Japan. See Appendix B of this book. There was so much hype about AI and robotics that well-respected people and media were forecasting the everyday use of robots at home and in the workplace. I don't see that happening even when the 32-bit epoch timestamp doomsday dawns upon us in 2038!

From my perspective, we teach children to walk, talk, read and write by using a variety of things like repeated actions, coercion, reward, enticement, "positive reinforcement" etc. to impart that learning. Children in turn apply that learning in a variety of ways. For example, we just teach children to read and write, but we do not read every book for them - they use their learning and apply it to read a book (or any book). The same goes for, say, walking. Once a child learns to walk, he or she not only goes for treks, but also adapts to walking in space and/or under water! Once the basic skill is learnt, a child uses his or her own "feedback" mechanism to adjust, adapt, improve and enhance that learning.

However, when we use statistical/quantitative and other techniques for supervised and/or unsupervised "machine learning", we are just using computers and software to do some mundane, repetitive task at high speed. The computers/software themselves don't learn anything (at least that's what I believe!). It is the human developer who is "learning" and acting as the "intelligence", the "controller" and the "feedback loop" that adjusts the computer programs/software. Of course, developers often "automate" the feedback loop by additional programming of rules and "fuzzy logic".
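To make the "mundane, repetitive task at high speed" point concrete, here is a minimal sketch of what a supervised "learning" run can look like underneath. It is only an illustration under my own assumptions: the toy data, the function name `fit_line`, the step size and the number of repetitions are all made up for this example, and plain gradient descent on a straight line stands in for whatever technique a real system would use.

```python
# Hypothetical toy data (an assumption for this sketch): points on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def fit_line(xs, ys, rate=0.05, steps=2000):
    """Fit y = w*x + b by repeating the same arithmetic many times.

    The rate and steps values are picked by the human developer, not by
    the program.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Measure how far the current guess is from each known answer.
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # The automated "feedback loop": nudge w and b against the
        # gradient of the mean squared error.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= rate * grad_w
        b -= rate * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

On this toy data the loop should settle very close to a slope of 2 and an intercept of 1. Everything that looks like judgment (the shape of the model, the error measure, the step size, when to stop) was decided by the person who wrote it; the machine only repeats the update.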

Finally, referring to software or computer systems as "Artificial Intelligence" takes credit away from the people who designed, developed and created them. A software system is the embodiment of the collective knowledge, intelligence and smarts of the people who created it and of others (e.g., the developers/designers may in turn have used software modules developed by others).

All the same, I do not intend to "swim against the tide" and will (unwillingly) adopt these misnomers and use them - this was just my digression to voice a disagreement with those terms.
