Topics in Machine Learning
Course Description: The course will cover standard concepts in machine learning: PAC learning, boosting, decision trees and rule learning, lazy learning, neural networks, support vector machines, Bayes networks, latent semantic analysis, PCA and ICA, visualization methods, and Parzen window estimation. The general concepts will be introduced briefly, then original articles on the topics will be considered.
In this class, we learned the basics of machine learning, and then every student presented a research paper on a current ML topic. Together with Stan James, I presented a paper on the Set Covering Machine (SCM). Our presentation consisted of an overview of the algorithm, a few easy-to-follow examples, and a performance comparison between SCMs and support vector machines (SVMs). I then implemented the algorithm in Python.
Paper Abstract: We extend the classical algorithms of Valiant and Haussler for learning compact conjunctions and disjunctions of Boolean attributes to allow features that are constructed from the data and to allow a trade-off between accuracy and complexity. The result is a general-purpose learning machine, suitable for practical learning tasks, that we call the set covering machine. We present a version of the set covering machine that uses data-dependent balls for its set of features and compare its performance with the support vector machine. By extending a technique pioneered by Littlestone and Warmuth, we bound its generalization error as a function of the amount of data compression it achieves during training. In experiments with real-world learning tasks, the bound is shown to be extremely tight and to provide an effective guide for model selection.
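To make the abstract's idea more concrete, here is a minimal sketch of a greedy set-covering learner that builds a conjunction of data-dependent balls. It is not the implementation linked on this page and it simplifies the paper's construction: the names `fit_scm`, `predict_scm`, `penalty`, and `max_features` are hypothetical, Euclidean distance is an assumption, and each ball's radius is simply the distance from its center to another training point. A ball centered on a positive example outputs true inside the ball, and a ball centered on a negative example outputs true outside it. At each step the learner picks the ball that rejects the most remaining negatives, minus `penalty` times the number of remaining positives it wrongly rejects, which is one way to express the accuracy versus complexity trade-off.

```python
import numpy as np


def fit_scm(X, y, penalty=1.0, max_features=5):
    """Greedy sketch: learn a conjunction of data-dependent balls (assumed setup)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=bool)
    n = len(X)
    # Pairwise Euclidean distances between all training points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # One candidate ball per (center, border point) pair.
    candidates, outputs = [], []
    for c in range(n):
        for b in range(n):
            if b == c:
                continue
            radius = dist[c, b]
            inside = dist[c] <= radius
            # Positive center: true inside. Negative center: true outside.
            outputs.append(inside if y[c] else ~inside)
            candidates.append((c, radius))
    outputs = np.array(outputs)

    neg_left = ~y          # negatives not yet rejected by a chosen ball
    pos_left = y.copy()    # positives not yet wrongly rejected
    model = []
    while neg_left.any() and len(model) < max_features:
        covered = (~outputs & neg_left).sum(axis=1)
        errors = (~outputs & pos_left).sum(axis=1)
        best = int(np.argmax(covered - penalty * errors))
        if covered[best] == 0:
            break  # no remaining candidate rejects any further negatives
        c, radius = candidates[best]
        model.append((X[c], radius, bool(y[c])))
        # Keep only the examples the chosen ball still labels as true.
        neg_left = neg_left & outputs[best]
        pos_left = pos_left & outputs[best]
    return model


def predict_scm(model, X):
    """A point is positive only if every chosen ball outputs true for it."""
    X = np.asarray(X, dtype=float)
    result = np.ones(len(X), dtype=bool)
    for center, radius, center_is_positive in model:
        inside = np.linalg.norm(X - center, axis=1) <= radius
        result &= inside if center_is_positive else ~inside
    return result
```

Calling `model = fit_scm(X, y)` followed by `predict_scm(model, X_new)` returns a Boolean array of predicted labels. The sketch enumerates every center and border pair, so it is only practical for small data sets; a larger `penalty` makes the learner more reluctant to sacrifice positive examples, and a smaller `max_features` forces a more compact conjunction.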
- The Original Paper
- Our Presentation (minus the cool animations)
- My Implementation of the Algorithm