Recently, IEEE Spectrum interviewed Michael Jordan, a leading researcher in machine learning. He gave his view on the hype around machine learning and big-data analysis, and also commented on several other interesting topics: the technological singularity, P = NP, and the Turing test.
Here are some of the researcher's more interesting points about machine learning and big data:
- Biological interpretations are overused in machine learning. Case in point: the activation function used in the neural-network perceptron model is the same function used in logistic regression, a statistical method dating back to the 1950s, and it has nothing to do with neurons.
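To make the overlap concrete, here is a minimal sketch (function and variable names are my own, purely illustrative) showing that a logistic regression prediction and the output of a single sigmoid "neural" unit are computed by exactly the same formula:

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)): the link function of
    logistic regression and, identically, the 'activation' of a
    sigmoid neural unit."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression_predict(weights, bias, x):
    # Classical statistical model: estimated P(y = 1 | x).
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def sigmoid_unit_output(weights, bias, x):
    # "Neural" unit: the same weighted sum through the same function.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

w, b, x = [0.5, -1.2], 0.3, [2.0, 1.0]
assert logistic_regression_predict(w, b, x) == sigmoid_unit_output(w, b, x)
```

The biological story adds nothing to the math: both computations are a weighted sum passed through the logistic function.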
- Because big-data projects analyze huge amounts of data, it is very easy to find spurious dependencies, and people active in the field are not paying enough attention to this problem. Statistical methods exist to deal with it, such as familywise error-rate tests, but many of them have not been studied computationally, and it will take decades to get them right.
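The spurious-dependency problem can be demonstrated in a few lines. Below is a small simulation (my own illustration, not from the interview): 1000 "features" of pure coin-flip noise are each tested for a nonzero mean. At the usual 5% significance level, dozens look like real "findings"; a familywise (Bonferroni) correction removes nearly all of them:

```python
import random
from statistics import NormalDist

random.seed(0)
n_features, n_samples, alpha = 1000, 100, 0.05

# z threshold for a two-sided test at level alpha, and the stricter
# Bonferroni (familywise) threshold at alpha / n_features.
z_plain = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - alpha / (2 * n_features))

plain_hits = bonferroni_hits = 0
for _ in range(n_features):
    # Pure noise: each "feature" is n_samples fair coin flips (+1 / -1).
    xs = [random.choice((-1, 1)) for _ in range(n_samples)]
    z = (sum(xs) / n_samples) * n_samples ** 0.5  # z-score of the mean
    if abs(z) > z_plain:
        plain_hits += 1
    if abs(z) > z_bonferroni:
        bonferroni_hits += 1

print(plain_hits, bonferroni_hits)  # roughly 5% spurious "findings" vs. almost none
```

The uncorrected test "discovers" structure in noise about 5% of the time per feature, which at big-data scale means thousands of false discoveries unless a multiple-comparisons correction is applied.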
- Data analysis can deliver inferred results only at a certain level of quality, and we need to be explicit about that: the inferences we report should come with error bars. This approach is missing from much of the current machine learning literature.
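One generic way to attach such error bars is the percentile bootstrap. This sketch (my own example; the click-through-rate data is hypothetical) reports an estimate together with a 95% confidence interval rather than a bare number:

```python
import random

random.seed(1)

def bootstrap_ci(data, stat, n_boot=2000, alpha=0.05):
    """Percentile-bootstrap confidence interval for an arbitrary
    statistic: resample the data with replacement, recompute the
    statistic, and read off the alpha/2 and 1 - alpha/2 quantiles."""
    estimates = sorted(
        stat([random.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    lo = estimates[int(n_boot * alpha / 2)]
    hi = estimates[int(n_boot * (1 - alpha / 2))]
    return lo, hi

# Hypothetical inferred quantity: a mean click-through rate.
data = [random.gauss(0.10, 0.02) for _ in range(200)]
mean = sum(data) / len(data)
lo, hi = bootstrap_ci(data, lambda xs: sum(xs) / len(xs))
print(f"estimate = {mean:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Reporting the interval alongside the point estimate is exactly the kind of explicit quality statement Jordan argues is missing from much of the literature.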
- Because big-data analyses often report predictions without any measure of their quality, and are, in more general terms, often not methodologically sound, the field risks a "big-data winter": a general state of disappointment and lack of funding for big data once its hype bubble bursts.