Important statistical concepts for machine learning.

Distribution: A distribution describes how likely each possible value of a measurement is, that is, how the probability is spread across the values.

1. Normal Distribution:- The normal distribution is a symmetric, bell-shaped curve that is always centered on the average value.



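    Standard Deviation (σ): The width of the curve is set by the standard deviation, which measures how far the values spread around the mean. A larger standard deviation gives a wider, flatter curve. The central limit theorem is a separate result: the means of many independent samples tend toward a normal distribution, which is why this curve appears so often.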

    Mean (μ): Center value or average value of the distribution.

The mean and the standard deviation are population parameters: the values that determine how a distribution fits the population data.

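2. Exponential Distribution:- The exponential distribution models the waiting time between events that occur independently at a constant average rate. Its only parameter is the rate.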



3. Gamma Distribution:- The gamma distribution models positive, right-skewed quantities, for example the total waiting time until several events have occurred. Its parameters are shape and rate.
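As a rough illustration of the three distributions, the sketch below draws samples from each with Python's standard `random` module and compares the sample mean with the theoretical mean. The parameter values (mean 10, standard deviation 2, rate 2, shape 3) are arbitrary choices for this example, not values from this post. Note that `random.gammavariate` takes a scale parameter rather than a rate, so the rate is converted with scale = 1/rate.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable
N = 50_000      # number of samples per distribution (arbitrary choice)

# Normal: parameters are the mean (mu) and the standard deviation (sigma).
mu, sigma = 10.0, 2.0
normal_samples = [random.gauss(mu, sigma) for _ in range(N)]

# Exponential: the single parameter is the rate; theoretical mean = 1 / rate.
rate = 2.0
exp_samples = [random.expovariate(rate) for _ in range(N)]

# Gamma: parameters are shape and rate; theoretical mean = shape / rate.
# random.gammavariate expects (shape, scale), where scale = 1 / rate.
shape = 3.0
gamma_samples = [random.gammavariate(shape, 1 / rate) for _ in range(N)]

print("normal      sample mean:", round(mean(normal_samples), 2), "theory:", mu)
print("exponential sample mean:", round(mean(exp_samples), 2), "theory:", 1 / rate)
print("gamma       sample mean:", round(mean(gamma_samples), 2), "theory:", shape / rate)
```

With this many samples, each sample mean should land close to its theoretical value.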



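Variance: The average of the squared deviations from the mean.

    Population variance: σ² = Σ(xᵢ − μ)² / N

OR

    Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)

Standard Deviation: The square root of the variance.

    σ = √( Σ(xᵢ − μ)² / N )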
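A minimal sketch of both calculations in plain Python, using made-up data. It assumes the usual definitions: the population variance divides the summed squared deviations by N, the sample variance divides by n − 1, and the standard deviation is the square root of the variance. The function names are only illustrative.

```python
from math import sqrt

def population_variance(values):
    """Average squared deviation from the mean (divide by N)."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Squared deviations from the sample mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up measurements, mean = 5

pop_var = population_variance(data)
pop_sd = sqrt(pop_var)
samp_var = sample_variance(data)

print("population variance:", pop_var)   # 4.0
print("population std dev: ", pop_sd)    # 2.0
print("sample variance:    ", round(samp_var, 3))
```

The sample version is slightly larger because dividing by n − 1 corrects for estimating the mean from the same data.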


Model: A model is a way to explore the relationship between variables. We use statistics to determine how useful and how reliable the model is.

Hypothesis: A hypothesis is an assumption about the data, for example that two groups are the same. Hypothesis testing uses the data to decide whether that assumption should be rejected.

  1. Null hypothesis: States that there is no difference between the two things being compared, for example no difference between the means of the groups. It is denoted as H0.

  2. Alternative hypothesis: The opposite of the null hypothesis. It states that at least one group mean is different. It is denoted as HA.

There are two types of error:
1] Type 1 error: Rejecting a true null hypothesis, known as a false positive.
2] Type 2 error: Failing to reject a false null hypothesis, known as a false negative.
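One simple way to test a null hypothesis of "no difference between the group means" is an exact permutation test, sketched below. This is an illustrative technique chosen for the example, not a method named in this post; the data, the function name `permutation_p_value`, and the 0.05 threshold are all assumptions.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in group means.

    Under the null hypothesis the group labels are interchangeable, so we
    try every way of splitting the pooled data into groups of the same
    sizes and count how often the difference is at least as large as the
    one actually observed.
    """
    pooled = list(group_a) + list(group_b)
    indices = range(len(pooled))
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    total = 0
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point rounding.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Made-up measurements for two groups.
group_a = [1, 2, 3, 4]
group_b = [5, 6, 7, 8]

p_value = permutation_p_value(group_a, group_b)
alpha = 0.05  # assumed significance level

print("p-value:", round(p_value, 4))
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Here only 2 of the 70 possible splits are as extreme as the observed one, so the p-value is 2/70 and H0 is rejected at the 0.05 level. If H0 were actually true, that rejection would be a Type 1 error.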

Confusion Matrix: A confusion matrix is a table that is used to describe the performance of a classification model or classifier.  

  • True Positive (TP): We predicted yes and the actual result is also yes.
  • True Negative (TN): We predicted no and the actual result is also no.
  • False Positive (FP): We predicted yes but the actual result is no.
  • False Negative (FN): We predicted no but the actual result is yes.
    Accuracy: How often the classifier is correct overall. (TP+TN)/Total
    Misclassification: Misclassification means error rate. (FP+FN)/Total
    True Positive Rate: TP/Actual Yes count
    True Negative Rate: TN/Actual No count
    Precision: When it predicts yes, how often it is correct. TP/Predicted Yes count
    Sensitivity: Another name for the true positive rate, the share of actual yes cases that are predicted yes. TP/(TP+FN)

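Example: a logistic regression model predicting heart disease for 303 patients.

                        Actual: yes    Actual: no
    Predicted: yes      TP = 139       FP = 20
    Predicted: no       FN = 32        TN = 112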
Sensitivity = TP/(TP+FN) = 139/(139+32) = 0.81
0.81*100 = 81% of people with heart disease were correctly identified by the logistic regression.

Accuracy = (TP+TN)/Total = 251/303 = 0.8284
0.8284*100 = about 83% accuracy for this model.
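The same metrics can be computed from the four counts, as in the sketch below. TP = 139 and FN = 32 come from the example; TN = 112 and FP = 20 are derived from its totals (251 − 139 and 303 − 251 − 32) and should be read as an assumption about the underlying table. The function name `classification_metrics` is only illustrative.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return common confusion-matrix metrics as a dictionary."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "precision": tp / (tp + fp),
    }

# TP and FN are taken from the example; TN and FP are derived from its totals.
metrics = classification_metrics(tp=139, fp=20, fn=32, tn=112)

for name, value in metrics.items():
    print(f"{name:18s} {value:.4f}")
```

Accuracy and misclassification always add up to 1, which is a quick sanity check on the counts.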

The size of the confusion matrix is determined by the number of classes we want to predict: n classes give an n×n matrix.

Association Rule:- Association rule learning is a rule-based machine learning method for discovering interesting relationships between variables in large databases, such as regularities between products in large-scale transaction data.
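    Types:
  1. Horizontal Data Format: Each row is a transaction ID followed by the items in that transaction.

  2. Vertical Data Format: Each row is an item followed by the IDs of the transactions that contain it.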
    Apriori Algorithm: Identifies the frequent individual items in the database and extends them to larger itemsets as long as those itemsets still appear often enough. For example:
  • 100% of sets with apple contain banana.
  • 50% of sets with apple and banana also have watermelon.
  • 50% of sets with apple and banana have some other fruit.
    Support: How frequently the itemset appears in the dataset.
                    Support(XY) = count(XY) / Total number of transactions
    Confidence: How often Y appears in the transactions that contain X. Confidence(X→Y) = support(XY) / support(X)
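Support and confidence can be sketched in a few lines of Python. The four transactions below are hypothetical, chosen so that the apple and banana percentages match the rules listed earlier; the function names `support` and `confidence` are illustrative, not from a library.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: support(X and Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# Hypothetical transaction data (horizontal format: one set per transaction).
transactions = [
    {"apple", "banana", "watermelon"},
    {"apple", "banana", "mango"},
    {"banana", "mango"},
    {"watermelon"},
]

print(support({"apple", "banana"}, transactions))                   # 0.5
print(confidence({"apple"}, {"banana"}, transactions))              # 1.0
print(confidence({"apple", "banana"}, {"watermelon"}, transactions))  # 0.5
```

A confidence of 1.0 for apple → banana means every transaction with apple also contains banana.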

Statistics: 
  1. Mean: Average value. Used when the data is roughly normally distributed and has no strong outliers, because the mean is sensitive to outliers. Outliers: Values far from the rest of the data. They can occur by chance in any distribution, but they often indicate either measurement error or a heavy-tailed distribution.
  2. Median: Middle value of the sorted data. Used when the data is skewed or has outliers, and when dealing with ordinal data. Ex. Strongly dislike, dislike, like, strongly like.
  3. Mode: The value that occurs most frequently in the dataset. Used when dealing with unordered categorical data.
  4. Skewness: Measure of the lack of symmetry. A distribution is symmetric, with skewness near zero, if it looks the same to the left and right of the center.
  5. Kurtosis: Measure of whether the data are peaked or flat relative to a normal distribution.
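The five statistics above can be computed with Python's standard `statistics` module plus two small helper functions. This is a minimal sketch on made-up data; `skewness` and `kurtosis` are hypothetical helpers that use the simple moment-based definitions (population standard deviation, no bias correction), under which a normal distribution has kurtosis 3.

```python
from statistics import mean, median, mode, pstdev

def skewness(values):
    """Average cubed z-score; 0 for perfectly symmetric data."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((x - mu) / sigma) ** 3 for x in values) / len(values)

def kurtosis(values):
    """Average fourth-power z-score; about 3 for normally distributed data."""
    mu, sigma = mean(values), pstdev(values)
    return sum(((x - mu) / sigma) ** 4 for x in values) / len(values)

# Made-up data with one outlier (20) that pulls the mean upward.
data = [1, 2, 2, 3, 4, 5, 20]

print("mean:    ", round(mean(data), 2))
print("median:  ", median(data))
print("mode:    ", mode(data))
print("skewness:", round(skewness(data), 2))
print("kurtosis:", round(kurtosis(data), 2))
```

The outlier drags the mean well above the median and gives a positive skewness, which is why the median is the safer summary for this kind of data.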


Thank you 😊 for reading. Please read other blogs. And also share with your friends and family.






        















    


  


  
