For many reporters, policy makers and observers, data mining is a black box. The data goes into the system, magic happens in the box, and at the other end out pops a correlation. In some seemingly inexplicable way, the computer analysis tells us that people who purchase beer also buy diapers or that Mr. Jones is associated with Mr. Smith, a known casino cheater.
It isn’t really magic, of course. There is a hard science behind data mining. The science involves complex analysis of probabilities, the identification of social networks and a host of mathematical techniques. It is possible to report on data mining without knowing anything about how it works, but a far more insightful and nuanced set of stories is available to someone who knows some of the details.
In this video, Professor Robert Grossman of the University of Chicago explains some of the science behind data mining.
Video by Susanna Pak
Professor Grossman answers the following questions during the course of this interview. You can watch the entire video or fast-forward to the issues of interest:
00:15 What is data mining?
00:24 Examples of data mining
03:25 How does data mining work?
06:26 Alternatives to data mining
10:47 Origins of data mining
14:43 Who uses data mining?
16:58 Any additional information?
Professor Grossman’s Datamining FAQs (to which he refers at the end of the video) are here.