Artificial intelligence (AI) refers to computer systems trained to perceive their environment, make decisions, and take action. AI systems rely on learning algorithms, such as machine learning and deep learning, along with large sets of sensor data labeled with a well-defined ground truth.
Machine learning: We use machine learning as shorthand for “traditional machine learning,” the workflow in which you manually select features and then train the model. When we refer to machine learning, we exclude deep learning. Common machine learning techniques include decision trees, support vector machines, and ensemble methods.
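As a concrete illustration of that workflow, here is a minimal sketch assuming scikit-learn is available; the iris dataset and the choice of two petal measurements as the manually selected features are our own illustrative assumptions.

```python
# Traditional machine learning: manual feature selection, then training.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]  # manually selected features: petal length and petal width

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # a decision tree
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```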
Deep learning: A subset of machine learning modeled loosely on the neural pathways of the human brain. Deep refers to the multiple layers between the input and output layers. In deep learning, the algorithm automatically learns what features are useful. Common deep learning techniques include convolutional neural networks (CNNs), recurrent neural networks (such as long short-term memory, or LSTM), and deep Q networks.
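For contrast, here is a minimal deep learning sketch assuming PyTorch is installed; the 28×28 grayscale input and the layer sizes are illustrative assumptions, not drawn from a specific application. Note that there is no manual feature selection step: the convolutional layers learn useful features directly from the pixels.

```python
import torch
import torch.nn as nn

# A small CNN: the stacked layers between input and output are what
# make the network "deep"; the filters are learned, not hand-designed.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level learned filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher-level learned features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # output layer: 10 classes
)

x = torch.randn(4, 1, 28, 28)  # a batch of four 28x28 grayscale images
print(model(x).shape)          # torch.Size([4, 10])
```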
Ensemble machine learning makes use of multiple algorithms simultaneously to make a prediction, which often yields lower error and is less prone to overfitting than any single model trained alone. The philosophy behind ensemble learning is to build a prediction model that combines the strengths of a collection of simpler base models (Hastie et al., 2009).
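To make that philosophy concrete, here is a minimal sketch assuming scikit-learn; the three base models and the synthetic dataset from make_classification are illustrative choices. A hard-voting ensemble simply takes the majority vote of its base models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

# Combine the strengths of three dissimilar base models by majority vote.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
])
print(f"Ensemble accuracy: {cross_val_score(ensemble, X, y, cv=5).mean():.2f}")
```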
We divide ensemble machine learning into two major categories: bootstrap aggregating and boosting. Bootstrap aggregating, also known as bagging, trains multiple models of the same learning algorithm on subsets drawn at random, with replacement, from the training dataset. This contrasts with boosting, where the base models, typically trees, are grown in an adaptive way to reduce bias and hence are not identically distributed.
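A minimal bagging sketch, again assuming scikit-learn; the decision-tree base learner, the 50 estimators, and the synthetic data are illustrative assumptions. Each of the 50 copies of the same algorithm is trained on its own bootstrap sample, and their predictions are combined by voting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

# 50 trees, each fit on a subset drawn with replacement (a bootstrap sample).
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           bootstrap=True, random_state=0)
single = DecisionTreeClassifier(random_state=0)

print(f"Single tree:  {cross_val_score(single, X, y, cv=5).mean():.2f}")
print(f"Bagged trees: {cross_val_score(bagged, X, y, cv=5).mean():.2f}")
```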
Boosting is a two-step approach: it uses subsets of the original data to produce a series of average-performing models and then “boosts” their performance by combining them using a cost function. Unlike bagging, the subset creation procedure is not random but depends on the performance of the previous models: every new subset contains the elements misclassified by previous models.
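A minimal boosting sketch, assuming scikit-learn's AdaBoostClassifier. Strictly speaking, AdaBoost reweights the misclassified examples each round rather than building explicit subsets, which is the same adaptive idea in weighted form; the depth-1 trees (stumps) and 100 boosting rounds are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

# Average-performing stumps are trained in sequence; each round focuses on
# the examples the previous models got wrong, then the models are combined.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, random_state=0)
print(f"Boosted stumps: {cross_val_score(boosted, X, y, cv=5).mean():.2f}")
```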