Important concepts for machine learning.

September 27, 2021

Distribution: A distribution describes how probability is spread across the possible values of a measurement.

1. Normal Distribution:- A normal distribution is a symmetric, bell-shaped curve that is always centered on the mean (average) value.

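Standard Deviation: The standard deviation sets the width of the curve: the larger it is, the more the values spread out around the mean. (The central limit theorem is a separate result: it says that the means of many independent samples tend toward a normal distribution.)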

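Mean (x̅): The center or average value of the distribution. Strictly, x̅ denotes the mean of a sample and μ the mean of the population.

The mean and the standard deviation together describe the population, so they are called population parameters. More generally, population parameters are the values that determine how a distribution fits the population data.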
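To see how the mean and the standard deviation shape a normal distribution, here is a minimal sketch using Python's standard library. The values of `mu` and `sigma` and the helper `share_within` are assumptions made up for this illustration.

```python
from statistics import NormalDist

# Hypothetical population parameters, chosen only for illustration.
mu, sigma = 170.0, 10.0
heights = NormalDist(mu=mu, sigma=sigma)

def share_within(dist, k):
    """Probability that a value lies within k standard deviations of the mean."""
    upper = dist.cdf(dist.mean + k * dist.stdev)
    lower = dist.cdf(dist.mean - k * dist.stdev)
    return upper - lower

within_one = share_within(heights, 1)   # roughly 0.68
within_two = share_within(heights, 2)   # roughly 0.95
print(f"within 1 sd: {within_one:.3f}, within 2 sd: {within_two:.3f}")
```

Changing `mu` moves the curve left or right, while changing `sigma` makes it wider or narrower. The shares within one and two standard deviations stay the same.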

2. Exponential Distribution:- It models the waiting time between events that occur independently at a constant average rate. Its single parameter is the rate.

3. Gamma Distribution:- A distribution over positive values whose parameters are the shape and the rate. With a shape of 1 it reduces to the exponential distribution.
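As a rough sketch of how the rate and shape parameters behave, the code below draws random samples from both distributions and compares the sample means with the theoretical ones (1/rate for the exponential, shape/rate for the gamma). The parameter values, the seed, and the sample size are assumptions picked for this example.

```python
import random
from statistics import fmean

rng = random.Random(0)          # fixed seed so the run is repeatable
n = 100_000

rate = 2.0                      # hypothetical rate for the exponential distribution
exp_samples = [rng.expovariate(rate) for _ in range(n)]

shape, gamma_rate = 3.0, 2.0    # hypothetical shape and rate for the gamma distribution
# random.gammavariate takes a scale as its second argument, so pass 1 / rate.
gamma_samples = [rng.gammavariate(shape, 1.0 / gamma_rate) for _ in range(n)]

exp_mean = fmean(exp_samples)       # theory: 1 / rate = 0.5
gamma_mean = fmean(gamma_samples)   # theory: shape / rate = 1.5
print(f"exponential mean: {exp_mean:.3f}, gamma mean: {gamma_mean:.3f}")
```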

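Variance: The average of the squared deviations from the mean. It measures how spread out the values are.

Standard Deviation: The square root of the variance, expressed in the same units as the data.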
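A small sketch of how variance and standard deviation can be computed for a whole population, first by hand and then with the `statistics` module. The `data` list is a made-up example.

```python
from statistics import mean, pvariance, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements

mu = mean(data)
# Variance by hand: average of the squared deviations from the mean.
manual_variance = sum((x - mu) ** 2 for x in data) / len(data)

variance = pvariance(data)   # population variance from the standard library
std_dev = pstdev(data)       # population standard deviation = sqrt(variance)
print(mu, manual_variance, variance, std_dev)
```

For a sample instead of a full population, `statistics.variance` and `statistics.stdev` divide by n - 1 rather than n.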

Model: A model is a way to explore a relationship. We use statistics to determine how useful and how reliable our model is.

Hypothesis: A hypothesis is an assumption about the data, for example that two things are the same. Hypothesis testing uses the data to decide whether that assumption holds.

Null hypothesis: It states that there is no difference between the two things, for example no difference between the means of the groups. It is denoted as H0.
Alternative hypothesis: It is the opposite of the null hypothesis: at least one group mean is different. It is denoted as HA.

Hypothesis testing can make two kinds of error:

1] Type 1 error: Rejection of a true null hypothesis, also known as a false positive.

2] Type 2 error: Non-rejection of a false null hypothesis, also known as a false negative.
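One simple way to test a null hypothesis of "no difference between the group means" is a permutation test. The sketch below is only one possible approach, not the only way to do hypothesis testing. The function `permutation_test`, the two groups, and the 0.05 threshold are assumptions for this illustration.

```python
import random
from statistics import fmean

def permutation_test(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate a two-sided p-value for H0: both groups have the same mean."""
    rng = random.Random(seed)
    observed = abs(fmean(group_a) - fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Under H0 the group labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 4.9]
group_b = [6.0, 6.2, 5.9, 6.1, 6.3, 6.0, 5.8, 6.1]

p_value = permutation_test(group_a, group_b)
alpha = 0.05                      # accepted chance of a Type 1 error
reject_null = p_value < alpha
print(f"p-value: {p_value:.4f}, reject H0: {reject_null}")
```

A smaller `alpha` lowers the chance of a Type 1 error but raises the chance of a Type 2 error.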

Confusion Matrix: A confusion matrix is a table that is used to describe the performance of a classification model or classifier.

True Positive: We predicted yes and the actual result is also yes.
True Negative: We predicted no and the actual result is also no.
False Positive: We predicted yes but the actual result is no.
False Negative: We predicted no but the actual result is yes.
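The four cells can be counted directly from the actual and predicted labels. This is a minimal sketch; the helper `confusion_counts` and the two label lists are made up for the example.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, TN, FP and FN for a two-class problem."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical labels.
actual    = ["yes", "yes", "no", "no",  "yes", "no"]
predicted = ["yes", "no",  "no", "yes", "yes", "no"]

counts = confusion_counts(actual, predicted)
print(dict(counts))
```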

Accuracy: (TP+TN)/Total

Misclassification Rate: The error rate. (FP+FN)/Total

True Positive Rate: TP/Actual Yes count

True Negative Rate: TN/Actual No count

Precision: When it predicts yes, how often is it correct. TP/(TP+FP)

Sensitivity: The percentage of actual yes cases that the model correctly predicts as yes. It is the same as the true positive rate.

Example:

Sensitivity = TP/(TP+FN) = 139/(139+32) = 0.81

0.81*100 = 81% of people with heart disease were correctly identified by the logistic regression.

Accuracy = (TP+TN)/Total = 251/303 = 0.8283

0.8283*100 = 82.8%, so this model is about 83% accurate.
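The numbers in the example above can be plugged into all of the formulas at once. In this sketch TP, FN, the total and TP+TN come from the example. TN and FP are not given there, so they are derived from those figures (an assumption that follows from TP+TN = 251 and Total = 303).

```python
# Values taken from the example.
TP, FN = 139, 32
total = 303
correct = 251              # TP + TN from the accuracy calculation

# Derived values (not stated directly in the example).
TN = correct - TP          # 112
FP = total - correct - FN  # 20

accuracy = (TP + TN) / total
misclassification = (FP + FN) / total
sensitivity = TP / (TP + FN)   # true positive rate
specificity = TN / (TN + FP)   # true negative rate
precision = TP / (TP + FP)

for name, value in [("accuracy", accuracy),
                    ("misclassification", misclassification),
                    ("sensitivity", sensitivity),
                    ("specificity", specificity),
                    ("precision", precision)]:
    print(f"{name}: {value:.3f}")
```

Accuracy and the misclassification rate always add up to 1, which is a quick check that the four cells were counted correctly.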

The size of the confusion matrix is determined by the number of classes we want to predict: n classes give an n x n matrix.

Association Rule:- Association rule learning is a rule-based machine learning method for discovering interesting relationships between variables in large databases, for example regularities between the products bought together in large-scale transaction data.

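Data formats:

Horizontal Data Format: Each row is a transaction together with the list of items it contains.
Vertical Data Format: Each row is an item together with the list of transactions that contain it.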
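As a small illustration, the horizontal format maps each transaction to its items and the vertical format maps each item to the transactions that contain it. The transaction IDs, the items, and the helper `to_vertical` below are all hypothetical.

```python
from collections import defaultdict

# Horizontal format: transaction ID -> items in that transaction.
horizontal = {
    "T1": {"apple", "banana", "watermelon"},
    "T2": {"apple", "banana", "orange"},
    "T3": {"banana", "orange"},
    "T4": {"mango"},
}

def to_vertical(horizontal):
    """Convert to vertical format: item -> IDs of transactions containing it."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

vertical = to_vertical(horizontal)

# In the vertical format, transactions with both items are a set intersection.
apple_and_banana = vertical["apple"] & vertical["banana"]
print(sorted(apple_and_banana))
```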

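Apriori Algorithm: It identifies the frequent individual items in the database, then extends them to larger and larger itemsets as long as those itemsets still appear often enough.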

100% of sets with apple contain banana.
50% of sets with apple and banana also contain watermelon.
50% of sets with apple and banana also contain some other fruit.
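The idea behind Apriori can be sketched in a few lines: start with frequent single items, then only build larger itemsets out of smaller ones that are already frequent. This is a simplified sketch, not an optimized implementation. The function name `apriori`, the transactions, and the `min_support` value are assumptions for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items
               if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent itemsets into candidates of size k.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, k - 1))
                   and support(c) >= min_support}
    return frequent

# Hypothetical transactions.
transactions = [
    frozenset({"apple", "banana", "watermelon"}),
    frozenset({"apple", "banana", "orange"}),
    frozenset({"banana", "orange"}),
    frozenset({"mango"}),
]

frequent = apriori(transactions, min_support=0.5)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```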

Support: How frequently the itemset appears in the dataset.

Support(XY) = count(XY) / Total number of transactions

Confidence: How often Y appears in the transactions that contain X. Confidence(XY) = support(XY) / support(X)
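Both formulas are easy to check in code. In this sketch the transactions are made up so that every set with apple also contains banana, and half of the sets with apple and banana contain watermelon. The helper names `support` and `confidence` are also just for this example.

```python
# Hypothetical transactions.
transactions = [
    {"apple", "banana", "watermelon"},
    {"apple", "banana", "orange"},
    {"banana", "orange"},
    {"mango"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(x, y):
    """Of the transactions that contain x, the fraction that also contain y."""
    return support(x | y) / support(x)

support_ab = support({"apple", "banana"})
conf_apple_banana = confidence({"apple"}, {"banana"})
conf_ab_watermelon = confidence({"apple", "banana"}, {"watermelon"})
print(support_ab, conf_apple_banana, conf_ab_watermelon)
```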

Statistics:

Mean: The average value. Used when the data is roughly normally distributed and has no strong outliers, because outliers pull the mean toward them. Outliers: Values that lie far from the rest of the data. They can occur by chance in any distribution, but they often indicate either a measurement error or a heavy-tailed distribution.
Median: The middle value. Used when the data is skewed or contains outliers, and when dealing with ordinal data. Ex. Strongly dislike, dislike, like, strongly like.
Mode: The value that occurs most frequently in the dataset. Used when dealing with unordered categorical data.
Skewness: A measure of the lack of symmetry. A distribution is symmetric if it looks the same to the left and right of the center.
Kurtosis: A measure of whether the data are heavy-tailed (peaked) or light-tailed (flat) relative to a normal distribution.
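These measures can be tried out with the standard library. Mean, median and mode come from the `statistics` module. Skewness and kurtosis are not in that module, so the sketch below computes them from central moments using one common definition (third moment divided by the variance to the power 1.5, and fourth moment divided by the squared variance). The data and the helper names are assumptions for this example.

```python
from statistics import fmean, median, mode

def central_moment(data, k):
    """Average of (x - mean) ** k over the data."""
    mu = fmean(data)
    return fmean([(x - mu) ** k for x in data])

def skewness(data):
    """0 for symmetric data, positive when the right tail is longer."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def kurtosis(data):
    """About 3 for normally distributed data under this definition."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2

skewed = [1, 2, 2, 3, 4, 100]      # hypothetical data with one outlier
symmetric = [1, 2, 3, 4, 5]        # hypothetical symmetric data

print("mean:", fmean(skewed))      # pulled upward by the outlier
print("median:", median(skewed))   # stays near the bulk of the data
print("mode:", mode(skewed))
print("skewness:", skewness(skewed), skewness(symmetric))
print("kurtosis:", kurtosis(symmetric))
```

The outlier moves the mean far away from the median, which is why the median is the safer summary for skewed data.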

Thank you 😊 for reading. Please read other blogs. And also share with your friends and family.
