• Create or train ML models

Clustering, Decision Trees, Random Forest, Naïve Bayes, Q-Learning, Time Series

  • Split the data into training and test sets
  • Feed the training data to an ML algorithm to create a model
  • So, to decide whether a photo shows a dog or a cat, the model needs a training phase to learn the task and a testing phase to check how well it learned
  • Collect many dog and cat photos
  • That data must then be split into a training set and a testing set

  • Two data sets in this case: X holds the independent features, y holds the dependent variable (the label)
  • The split produces four pieces: X_train, X_test, y_train and y_test
  • X_train and y_train are used to create the model

  • Once the model is created, input X_test; the output should match y_test
  • The closer the model output is to y_test, the more accurate the model is
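
The split described above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the function name split_train_test, the 25% test fraction, the seed and the toy data are all assumptions made for the example. In practice, scikit-learn's train_test_split does the same job.

```python
import random

def split_train_test(X, y, test_fraction=0.25, seed=0):
    """Shuffle paired samples and split them into training and test sets."""
    if len(X) != len(y):
        raise ValueError("X and y must have the same number of samples")
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # shuffle so the split is not ordered
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Index X and y with the same indices so features stay paired with labels.
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Toy data (hypothetical): one numeric feature per photo, label "dog" or "cat".
X = [[w] for w in range(20)]
y = ["dog" if w >= 10 else "cat" for w in range(20)]

X_train, X_test, y_train, y_test = split_train_test(X, y)
```

X_train and y_train would go to the training step; X_test and y_test are held back to check the model afterwards.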

  • Supervised learning methods
  • Tree based learning algorithms

  • Gradient descent is one of the most widely used optimization algorithms for training models; it has many variants
  • Gradient descent requires a cost function; there are many types of cost functions
  • The cost function is the quantity that gradient descent minimizes
  • Minimizing a function means finding its deepest valley (gradient descent may settle in a local valley rather than the deepest one)
  • The cost function measures the prediction error of an ML model
  • So, minimizing it means reducing the error, which increases the accuracy of the model
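
As a rough sketch of the idea, the code below runs plain (batch) gradient descent on a mean squared error cost for a straight-line model y = w*x + b. The choice of cost function, the learning rate, the step count and the toy data are assumptions made for the example, not something prescribed by these notes.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error of the line y = w*x + b on the data."""
    n = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / n

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly step downhill on the MSE cost, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Move against the gradient, i.e. toward the valley.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(w, b, mse_cost(w, b, xs, ys))
```

A learning rate that is too large makes the steps overshoot the valley and the cost grows instead of shrinking, so this value usually needs tuning for each problem.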

Decision tree algorithms are often referred to as CART (Classification and Regression Trees). The possible solutions to a given problem emerge as the leaves of a tree, and each node represents a point of deliberation and decision.

A decision tree model is a flowchart-like structure in which each internal node represents a test on an attribute (for example, whether a coin comes up heads or tails), each branch represents an outcome of the test, and each leaf node represents a class label (the decision taken after testing the attributes along the path). The paths from root to leaf represent classification rules.
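
The flowchart structure can be sketched with two small classes: internal nodes hold a test on an attribute, and leaves hold a class label. The tree below is built by hand to show the structure only; the attribute names and thresholds are hypothetical and are not learned from data the way a real CART implementation would learn them.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    label: str  # class label reached at the end of a path

@dataclass
class Node:
    attribute: str   # attribute tested at this internal node
    threshold: float  # the test is: sample[attribute] <= threshold
    if_true: Union["Node", Leaf]   # branch taken when the test passes
    if_false: Union["Node", Leaf]  # branch taken when the test fails

def classify(tree, sample):
    """Follow one root-to-leaf path and return the leaf's class label."""
    while isinstance(tree, Node):
        passed = sample[tree.attribute] <= tree.threshold
        tree = tree.if_true if passed else tree.if_false
    return tree.label

# Hand-built tree (hypothetical attributes and thresholds).
tree = Node(
    "weight_kg", 6.0,
    if_true=Node("ear_length_cm", 5.0, Leaf("cat"), Leaf("dog")),
    if_false=Leaf("dog"),
)

print(classify(tree, {"weight_kg": 4.0, "ear_length_cm": 4.0}))
print(classify(tree, {"weight_kg": 30.0, "ear_length_cm": 10.0}))
```

Each root-to-leaf path reads as one classification rule, for example "weight_kg <= 6.0 and ear_length_cm <= 5.0 gives cat".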

Use cases:

  • When the user is trying to maximize profit or optimize cost
  • When there are several courses of action
  • When the benefit of the various alternatives needs to be measured
  • When there are events beyond the control of decision makers (such as environmental factors)
  • When there is uncertainty about which outcome will actually happen

  • Overfitting is more common than underfitting
  • Overfitting means the model is fit too closely to the training dataset, so it does not generalize well to new data
  • Overfitting means the model is too complex; underfitting means the model is too simple

  • To reduce the risk of overfitting, the data should not have too many features/variables compared to the number of observations
  • To avoid underfitting, the data needs enough predictors/independent variables
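
One way to see the difference is to compare accuracy on the training data with accuracy on held-out data. The sketch below is an illustration under stated assumptions: the noisy toy data, the 20% label-flip rate and all three "models" are hypothetical. The memorizer (a 1-nearest-neighbour lookup) stands in for a model that is too complex, and the constant predictor stands in for one that is too simple.

```python
import random

rng = random.Random(0)

def make_data(n):
    """Label is 1 when x > 0.5, but about 20% of labels are flipped (noise)."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < 0.2:
            label = 1 - label
        data.append((x, label))
    return data

train, test = make_data(500), make_data(500)

def accuracy(predict, data):
    """Fraction of samples whose predicted label matches the actual label."""
    return sum(predict(x) == label for x, label in data) / len(data)

def memorizer(x):
    """Too complex: return the label of the nearest training point, noise included."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def constant(x):
    """Too simple: ignore the input and always predict class 1."""
    return 1

def threshold(x):
    """In between: a single cut on the one feature."""
    return int(x > 0.5)

for name, model in [("memorizer", memorizer), ("constant", constant), ("threshold", threshold)]:
    print(name, accuracy(model, train), accuracy(model, test))
```

The memorizer reproduces its own training labels exactly, and would typically score noticeably lower on the held-out data; a large gap between the two numbers is the usual symptom of overfitting. A model that scores poorly on both sets, like the constant predictor, is underfitting.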

accuracy_score() takes two arguments: first the test data labels (the actual labels), then the model’s predictions


from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_test, predictions)
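
For intuition, the fraction that accuracy_score() returns by default can be computed by hand. The helper name manual_accuracy and the toy labels below are assumptions made for this sketch; it is not the library's implementation.

```python
def manual_accuracy(y_true, y_pred):
    """Fraction of predictions that match the actual labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_test = ["dog", "cat", "dog", "cat"]       # actual labels (toy data)
predictions = ["dog", "cat", "cat", "cat"]  # model output (toy data)

accuracy = manual_accuracy(y_test, predictions)  # 3 of 4 predictions are correct
print(accuracy)
```

The argument order matches the call above: actual labels first, predictions second.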

brew install graphviz (installs the Graphviz binaries, which pip does not provide)



Neural Network

