Artificial intelligence (AI) refers to computer systems trained to perceive their environment, make decisions, and take action. AI systems rely on learning algorithms, such as machine learning and deep learning, along with large sets of sensor data labeled with a well-defined ground truth.
Machine learning: We use machine learning as shorthand for “traditional machine learning,” the workflow in which you manually select features and then train the model. When we refer to machine learning, we exclude deep learning. Common machine learning techniques include decision trees, support vector machines, and ensemble methods.
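As a concrete illustration of that workflow, here is a minimal sketch assuming scikit-learn is available; the iris dataset and the choice of two petal measurements as the manually selected features are our own illustrative assumptions.

```python
# Traditional machine learning: manual feature selection, then training.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X = X[:, [2, 3]]  # manually selected features: petal length and petal width

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # a decision tree
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```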
Deep learning: A subset of machine learning modeled loosely on the neural pathways of the human brain. Deep refers to the multiple layers between the input and output layers. In deep learning, the algorithm automatically learns what features are useful. Common deep learning techniques include convolutional neural networks (CNNs), recurrent neural networks (such as long short-term memory, or LSTM), and deep Q networks.
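For contrast, here is a minimal deep learning sketch assuming PyTorch is installed; the 28×28 grayscale input and the layer sizes are illustrative assumptions, not drawn from a specific application. Note that there is no manual feature selection step: the convolutional layers learn useful features directly from the pixels.

```python
import torch
import torch.nn as nn

# A small CNN: the stacked layers between input and output are what
# make the network "deep"; the filters are learned, not hand-designed.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level learned filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher-level learned features
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 10),                   # output layer: 10 classes
)

x = torch.randn(4, 1, 28, 28)  # a batch of four 28x28 grayscale images
print(model(x).shape)          # torch.Size([4, 10])
```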
Ensemble machine learning makes use of multiple algorithms simultaneously to make a prediction, which often yields lower error and is less prone to overfitting than any single model trained alone. The philosophy behind ensemble learning is to build a prediction model that combines the strengths of a collection of simpler base models (Hastie et al., 2009).
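To make that philosophy concrete, here is a minimal sketch assuming scikit-learn; the three base models and the synthetic dataset from make_classification are illustrative choices. A hard-voting ensemble simply takes the majority vote of its base models:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

# Combine the strengths of three dissimilar base models by majority vote.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("nb", GaussianNB()),
    ("dt", DecisionTreeClassifier(max_depth=4, random_state=0)),
])
print(f"Ensemble accuracy: {cross_val_score(ensemble, X, y, cv=5).mean():.2f}")
```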
We divide ensemble machine learning into two major categories: bootstrap aggregating and boosting. Bootstrap aggregating, also known as bagging, trains multiple models of the same learning algorithm on subsets drawn at random, with replacement, from the training dataset. This contrasts with boosting, where the base models, typically trees, are grown in an adaptive way to reduce bias and hence are not identically distributed.
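A minimal bagging sketch, again assuming scikit-learn; the decision-tree base learner, the 50 estimators, and the synthetic data are illustrative assumptions. Each of the 50 copies of the same algorithm is trained on its own bootstrap sample, and their predictions are combined by voting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

# 50 trees, each fit on a subset drawn with replacement (a bootstrap sample).
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                           bootstrap=True, random_state=0)
single = DecisionTreeClassifier(random_state=0)

print(f"Single tree:  {cross_val_score(single, X, y, cv=5).mean():.2f}")
print(f"Bagged trees: {cross_val_score(bagged, X, y, cv=5).mean():.2f}")
```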
Boosting is a two-step approach: it uses subsets of the original data to produce a series of average-performing models and then “boosts” their performance by combining them using a cost function. Unlike bagging, the subset creation procedure is not random but depends on the performance of the previous models: every new subset contains the elements misclassified by previous models.
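A minimal boosting sketch, assuming scikit-learn's AdaBoostClassifier. Strictly speaking, AdaBoost reweights the misclassified examples each round rather than building explicit subsets, which is the same adaptive idea in weighted form; the depth-1 trees (stumps) and 100 boosting rounds are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)  # synthetic data

# Average-performing stumps are trained in sequence; each round focuses on
# the examples the previous models got wrong, then the models are combined.
boosted = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=100, random_state=0)
print(f"Boosted stumps: {cross_val_score(boosted, X, y, cv=5).mean():.2f}")
```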