Some important concepts for machine learning.

 Distribution: A distribution shows how the probabilities of measurements are spread across their possible values.

1. Normal Distribution:- A normal distribution is a symmetric, bell-shaped curve that is always centered on the average value.



    Standard Deviation (σ): The width of the curve is described by the standard deviation. A larger standard deviation gives a wider, flatter curve, and a smaller one gives a narrower, taller curve.

 Mean (μ): The center value, or average value, of the distribution.

The mean and standard deviation are called population parameters: the parameters that determine how a distribution fits the population data.
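A minimal sketch of how the mean and standard deviation describe a normal distribution, using only Python's standard library. The population values (mean 170, standard deviation 10) and the sample size are assumptions chosen purely for illustration.

```python
import random
import statistics

# Hypothetical population parameters, chosen only for illustration.
POPULATION_MEAN = 170.0
POPULATION_STD = 10.0

random.seed(0)  # fixed seed so the run is repeatable

# Draw measurements from a normal distribution with those parameters.
samples = [random.gauss(POPULATION_MEAN, POPULATION_STD) for _ in range(10_000)]

# Estimate the parameters back from the measurements.
sample_mean = statistics.fmean(samples)
sample_std = statistics.stdev(samples)

# Fraction of measurements within one standard deviation of the mean
# (roughly 68% for a normal distribution).
within_one_std = sum(
    abs(x - POPULATION_MEAN) <= POPULATION_STD for x in samples
) / len(samples)

print(f"estimated mean: {sample_mean:.2f}")
print(f"estimated standard deviation: {sample_std:.2f}")
print(f"fraction within one standard deviation: {within_one_std:.3f}")
```

With enough measurements, the estimates should land close to the parameters used to generate the data.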

2. Exponential Distribution:- It models the waiting time between events that occur independently at a constant average rate. Its single parameter is the rate.



3. Gamma Distribution:- It models positive, continuous values, such as the total waiting time until several events have occurred. In this type of distribution, shape and rate are the parameters.
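The exponential and gamma distributions can be sampled with Python's standard library, as sketched below. The rate and shape values are assumptions for illustration only. Note that `random.gammavariate` takes a scale argument, which is 1 divided by the rate.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

RATE = 2.0   # hypothetical rate: 2 events per unit of time
SHAPE = 3.0  # hypothetical gamma shape parameter

# Exponential: waiting time until the next event. Its mean is 1 / rate.
exp_samples = [random.expovariate(RATE) for _ in range(20_000)]

# Gamma: random.gammavariate takes (shape, scale), where scale = 1 / rate.
# Its mean is shape / rate.
gamma_samples = [random.gammavariate(SHAPE, 1.0 / RATE) for _ in range(20_000)]

exp_mean = statistics.fmean(exp_samples)
gamma_mean = statistics.fmean(gamma_samples)

print(f"exponential mean: {exp_mean:.3f} (theory: {1 / RATE:.3f})")
print(f"gamma mean: {gamma_mean:.3f} (theory: {SHAPE / RATE:.3f})")
```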



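Variance: The average squared distance of the values from the mean.

Population variance: σ² = Σ(xᵢ − μ)² / N

OR

Sample variance: s² = Σ(xᵢ − x̄)² / (n − 1)

Standard Deviation: The square root of the variance. σ = √( Σ(xᵢ − μ)² / N )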
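Variance and standard deviation can be computed by hand in a few lines. This is a small sketch with made-up measurements. It shows both the population form (divide by N) and the sample form (divide by n - 1), and checks them against Python's `statistics` module.

```python
import math
import statistics

# Hypothetical measurements, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(data) / len(data)
squared_distances = [(x - mean) ** 2 for x in data]

# Population variance: average squared distance from the mean (divide by N).
population_variance = sum(squared_distances) / len(data)

# Sample variance: same sum of squares, divided by n - 1.
sample_variance = sum(squared_distances) / (len(data) - 1)

# Standard deviation is the square root of the variance.
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

# The standard library gives the same results.
assert math.isclose(population_variance, statistics.pvariance(data))
assert math.isclose(sample_variance, statistics.variance(data))

print(f"population variance: {population_variance}, std: {population_std}")
print(f"sample variance: {sample_variance:.3f}, std: {sample_std:.3f}")
```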


Model: A model is a way to explore a relationship. We use statistics to determine how useful and how reliable our model is.

Hypothesis: A hypothesis is an assumption about the data, for example that two things are the same or that they are different. Hypothesis testing uses the data to decide whether that assumption can be rejected.

  1. Null hypothesis: It states that there is no difference between the two things, or no difference between the means of the groups. It is denoted as H0.

  2. Alternative hypothesis: It is the opposite of the null hypothesis. It states that at least one group mean is different. It is denoted as HA.

A hypothesis test can make two types of error:
1] Type 1 error: Rejection of a true null hypothesis, known as a false positive.
2] Type 2 error: Non-rejection of a false null hypothesis, known as a false negative.
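One simple way to test the null hypothesis that two groups have the same mean is a permutation test, sketched below with the standard library. This is only one of several possible tests, chosen here because it needs no extra packages. The function name `permutation_test`, the measurements, and the 0.05 significance level are all assumptions for illustration. The significance level is the Type 1 error rate we are willing to accept.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical measurements for two groups.
control = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9]
treated = [5.9, 6.1, 6.0, 5.8, 6.2, 6.0, 5.9, 6.1]

ALPHA = 0.05  # accepted probability of a Type 1 error (false positive)
p_value = permutation_test(control, treated)
reject_null = p_value < ALPHA

print(f"p-value: {p_value:.4f}")
print("reject H0" if reject_null else "fail to reject H0")
```

A small p-value means a difference this large would rarely appear if the null hypothesis were true, so the null hypothesis is rejected.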

Confusion Matrix: A confusion matrix is a table that is used to describe the performance of a classification model or classifier.  

  • True Positive: We predicted yes and the actual result is also yes.
  • True Negative: We predicted no and the actual result is also no.
  • False Positive: We predicted yes but the actual result is no.
  • False Negative: We predicted no but the actual result is yes.
    Accuracy: How often the classifier is correct overall. (TP+TN)/Total
    Misclassification: The error rate, or how often the classifier is wrong. (FP+FN)/Total
    True Positive Rate: TP/Actual Yes count
    True Negative Rate: TN/Actual No count
    Precision: When it predicts yes, how often it is correct. TP/Predicted Yes count
    Sensitivity: Another name for the true positive rate. The percentage of actual yes cases that were predicted as yes. TP/(TP+FN)

Example: 


Sensitivity = TP/(TP+FN) = 139/(139+32) = 0.81
0.81*100 = 81% of people with heart disease were correctly identified by the logistic regression.

Accuracy = (TP+TN)/Total = 251/303 = 0.8284
0.8284*100 = about 83% accuracy for this model.
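The metrics above can be wrapped in a small helper, sketched here. The function name `confusion_metrics` is hypothetical. TP = 139 and FN = 32 come from the example. TN and FP are not stated directly, so they are derived from the example's totals (TP+TN = 251 and Total = 303).

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return common metrics for a two-class confusion matrix."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "precision": tp / (tp + fp),
    }


# TP and FN are taken from the example.
TP, FN = 139, 32
# TN and FP are derived from the example's totals (TP+TN = 251, Total = 303).
TN = 251 - TP
FP = 303 - 251 - FN

metrics = confusion_metrics(TP, TN, FP, FN)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```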

The size of the confusion matrix is determined by the number of classes we want to predict. With two classes it is 2x2, and with three classes it is 3x3.

Association Rule:- Association rule learning is a rule-based machine learning method for discovering interesting relationships between variables in large databases, such as regularities between products in large-scale transaction data.
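    Types of data format:
  1. Horizontal Data Format: Each row is a transaction ID followed by the items in that transaction.

  2. Vertical Data Format: Each row is an item followed by the IDs of the transactions that contain it.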
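    Apriori Algorithm: It first identifies the frequent individual items in the database, then extends them to larger itemsets as long as those itemsets still appear often enough. The frequent itemsets are then used to form rules such as: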
  • 100% of sets with apples contain bananas.
  • 50% of sets with apple, banana have watermelon.
  • 50% of sets with apple, banana have any other fruits.    
    Support: How frequently the itemset appears in the dataset.
                    Support(XY) = count(XY) / Total number of transactions
    Confidence: How often transactions that contain X also contain Y.
                    Confidence(X → Y) = support(XY) / support(X)
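Support and confidence can be computed directly from a list of transactions. The sketch below uses a made-up set of four transactions, built so that the apple and banana rules above hold. The function names `support` and `confidence` are hypothetical helpers, not part of any library.

```python
# Hypothetical transactions in horizontal format: one set of items per transaction.
transactions = [
    {"apple", "banana", "watermelon"},
    {"apple", "banana", "grape"},
    {"banana", "milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of X and Y together, divided by the support of X alone.

    Assumes the antecedent appears in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


print("support(apple, banana):", support({"apple", "banana"}, transactions))
print("confidence(apple -> banana):", confidence({"apple"}, {"banana"}, transactions))
print(
    "confidence(apple, banana -> watermelon):",
    confidence({"apple", "banana"}, {"watermelon"}, transactions),
)
```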

Statistics: 
  1. Mean: Average value. Used when the data is roughly normally distributed and has no extreme outliers, because outliers pull the mean toward them. Outliers: They can occur by chance in any distribution, but they often indicate either a measurement error or a heavy-tailed distribution.
  2. Median: Middle value. Used when the data is skewed or contains outliers, and when dealing with ordinal data. Ex. Strongly dislike, dislike, like, strongly like.
  3. Mode: The value that occurs most frequently in the dataset. Used when dealing with unordered categorical data.
  4. Skewness: A measure of the lack of symmetry. A distribution is symmetric if it looks the same to the left and right of the center.
  5. Kurtosis: A measure of whether the data are peaked or flat relative to a normal distribution.
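These five statistics can be sketched with the standard library. The data below is made up and includes one large outlier, to show how the mean moves while the median does not. Skewness and kurtosis are computed here as the third and fourth standardized moments, which is one common definition. The helper name `standardized_moment` is hypothetical.

```python
import statistics

# Hypothetical data with one large outlier (40).
data = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.fmean(data)     # pulled upward by the outlier
median = statistics.median(data)  # middle value, not affected by the outlier
mode = statistics.mode(data)      # most frequent value


def standardized_moment(values, order):
    """Average of ((x - mean) / population std) raised to the given power."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((x - m) / sd) ** order for x in values) / len(values)


skewness = standardized_moment(data, 3)  # above 0: longer tail on the right
kurtosis = standardized_moment(data, 4)  # equals 3 for a normal distribution

print(f"mean: {mean}, median: {median}, mode: {mode}")
print(f"skewness: {skewness:.3f}, kurtosis: {kurtosis:.3f}")
```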


Thank you 😊 for reading. Please read other blogs. And also share with your friends and family.