Machine Learning (ML)
Machine Learning (ML) is an approach to artificial intelligence that combines statistics and data science to develop and apply algorithms that improve their output through experience without being explicitly programmed to do so; in other words, algorithms that can "learn" to detect patterns, make decisions, and predict outcomes.
Machine Learning Terminology
|Data sampling||Systematic creation of smaller representative samples of larger data sets|
|Feature||A variable with high relevancy to the outcome variable|
|Feature selection||Automatic detection of variables most relevant to the outcome variable|
|Imputation||Correction of corrupt and missing values through inference|
|Integer encoding||Assignment of an integer value to a categorical value, e.g. values "red", "green", and "blue" could be assigned integer values of 1, 2, and 3 respectively|
|One-hot encoding||Assignment of a bit-mapped binary value to a set of categorical values, e.g. a "color" category with potential values of "red", "green", and "blue" could be mapped to three bits of 100, 010, and 001, respectively|
|Outcome variable||The value to be predicted by a Machine Learning Model|
|Outlier||An observation significantly different from other observations of the same data|
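As a minimal sketch of three of the terms above (integer encoding, one-hot encoding, and imputation), the following uses only the Python standard library. The `colors` and `ages` lists are hypothetical sample data, and filling missing values with the mean is just one simple form of imputation chosen for illustration, not a method prescribed here.

```python
from statistics import mean

# Hypothetical categorical column; the values mirror the "color" example above.
categories = ["red", "green", "blue"]
colors = ["red", "green", "blue", "green"]

# Integer encoding: map each category to an integer (1, 2, 3).
int_codes = {name: i + 1 for i, name in enumerate(categories)}
int_encoded = [int_codes[c] for c in colors]

# One-hot encoding: one bit per category, with exactly one bit set per value.
def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

one_hot_encoded = [one_hot(c, categories) for c in colors]

# Imputation (assumed strategy: replace missing values with the mean
# of the observed values).
ages = [4.0, None, 6.0]  # hypothetical column with one missing value
observed_mean = mean(a for a in ages if a is not None)
imputed = [observed_mean if a is None else a for a in ages]

print(int_encoded)      # [1, 2, 3, 2]
print(one_hot_encoded)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
print(imputed)          # [4.0, 5.0, 6.0]
```

Integer encoding implies an ordering among categories (3 > 1) that may not exist in the data, which is one common reason to prefer one-hot encoding for unordered categories.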
The Machine Learning Process
Machine learning model evaluation
Typically, when a machine learning model is trained, some portion of the available data is withheld from training for use in model evaluation. The trained model is then used to predict the outcomes of the withheld data. The predictions are compared to the actual values to derive an accuracy rate, which is the proportion of predictions that were correct, and an error rate, which is the proportion of "bad" predictions made by the model.
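The evaluation procedure described above can be sketched roughly as follows. Everything here is an illustrative assumption: the `data` list is made up, the 70/30 split is arbitrary, and the "model" is a trivial majority-class baseline standing in for a real learning algorithm.

```python
from collections import Counter

# Hypothetical labeled data: (feature, label) pairs.
data = [(1, "like"), (2, "like"), (3, "dislike"), (4, "like"), (5, "dislike"),
        (6, "like"), (7, "dislike"), (8, "like"), (9, "like"), (10, "dislike")]

# Withhold the last 30% of the data for evaluation (integer arithmetic
# avoids floating-point rounding when computing the split index).
split = len(data) * 7 // 10
train, held_out = data[:split], data[split:]

# "Train" a stand-in model: always predict the most common training label.
majority_label = Counter(label for _, label in train).most_common(1)[0][0]

def predict(feature):
    # A real model would use the feature; this baseline ignores it.
    return majority_label

# Predict the withheld data and compare against the actual values.
actual = [label for _, label in held_out]
predicted = [predict(feature) for feature, _ in held_out]
correct = sum(p == a for p, a in zip(predicted, actual))

accuracy = correct / len(actual)  # proportion of correct predictions
error_rate = 1 - accuracy         # proportion of incorrect predictions

print(f"accuracy={accuracy:.2f}, error rate={error_rate:.2f}")
```

In practice the data would usually be shuffled before splitting so that the withheld portion is representative; the fixed split here just keeps the sketch deterministic.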
Accuracy and error rates are useful; however, they treat all misclassifications as equally bad. A confusion matrix tabulates predicted classes against actual classes, showing which classes the model confuses with which and so providing more detail on model accuracy.
For example, we may have a classification model that predicts whether a user will "like" or "dislike" a post on social media, and the model correctly predicts the user's response 60% of the time. The model therefore has a 60% accuracy rate and a 40% error rate. The confusion matrix for this model might look something like the following table, illustrating that the model performs better for predicting "like" classes than "dislike" classes.
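To make the idea concrete, here is a small sketch that builds a confusion matrix for a "like"/"dislike" classifier. The counts are hypothetical, chosen only so that overall accuracy comes out to 60% and the "like" class is predicted better than the "dislike" class; they are not taken from any real model.

```python
from collections import Counter

labels = ["like", "dislike"]

# Hypothetical (actual, predicted) pairs for 100 withheld observations.
pairs = ([("like", "like")] * 40 + [("like", "dislike")] * 10 +
         [("dislike", "like")] * 30 + [("dislike", "dislike")] * 20)

# The confusion matrix counts how often each actual class
# was predicted as each class.
matrix = Counter(pairs)

# Overall accuracy: correct predictions lie on the diagonal.
accuracy = sum(matrix[(label, label)] for label in labels) / len(pairs)

# Per-class rate of correct predictions (recall): diagonal / row total.
recall = {
    actual: matrix[(actual, actual)] / sum(matrix[(actual, p)] for p in labels)
    for actual in labels
}

# Print rows as actual classes and columns as predicted classes.
print("actual \\ predicted", *labels)
for actual in labels:
    print(actual, *(matrix[(actual, p)] for p in labels))

print(accuracy)  # 0.6
print(recall)    # {'like': 0.8, 'dislike': 0.4}
```

With these assumed counts, the single 60% accuracy figure hides the fact that 80% of actual "likes" are predicted correctly but only 40% of actual "dislikes" are.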
Machine learning resources
Deeper Knowledge on Machine Learning (ML)
A type of machine learning that classifies entities based on their characteristics
A guide to finding patterns and relationships in data
Transforming "raw" data into a more easily analyzed form through normalization and format standardization
Broader Topics Related to Machine Learning (ML)
Artificial Intelligence (AI)
The mimicking of human cognitive functions and behaviors by machines
The scientific method applied to data analysis