Model Evaluation

UNIT 3: Evaluating Models

Class X – Artificial Intelligence (AI)

1. Importance of Model Evaluation

What is Model Evaluation?

Evaluation is the process of testing an AI model’s performance on a specific dataset to see how well it has learned and how accurately it can make predictions on new, unseen data.

It helps us understand:

  • Whether the model gives correct predictions
  • How accurate the AI system is
  • Whether improvements are needed

Example

Suppose an AI model predicts whether an email is Spam or Not Spam.

If it predicts 92 out of 100 emails correctly, its accuracy is 92%, which would usually be considered good.


Need for Model Evaluation

Model evaluation is important because it:

  • Measures model performance
  • Detects mistakes and errors
  • Helps compare different AI models
  • Shows where reliability and accuracy need to improve
  • Helps catch wrong predictions before the model is used in real life

Real-life Examples

AI Application     | Why Evaluation is Important
Face Unlock        | Must recognize the correct user
Medical Diagnosis  | Wrong prediction can be dangerous
Self-driving Cars  | Accuracy is very important for safety

2. Splitting the Training Set Data for Evaluation

What is Train-Test Split?

In AI, data is divided into two parts:

Data Type      | Purpose
Training Data  | Used to teach the AI model
Testing Data   | Used to check model performance

This process is called Train-Test Split.


Why is Train-Test Split Needed?

If we test the model using the same data used for training:

  • The model may memorize answers
  • Evaluation will not be fair

Testing on new data checks the real performance.


Common Split Ratio

Training Data | Testing Data
80%           | 20%
70%           | 30%

Example

If there are 1000 records:

  • 800 records → Training
  • 200 records → Testing
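The 80% / 20% split of 1000 records can be sketched in plain Python. This is a minimal illustration, not a standard procedure: it assumes the records are kept in a Python list, and the function name `split_records`, the shuffle and the fixed `seed` are choices made only for this example. (Libraries such as scikit-learn provide a ready-made `train_test_split` function for real projects.)

```python
import random

def split_records(records, train_ratio=0.8, seed=42):
    """Shuffle a copy of the records, then split it into training and testing parts.

    train_ratio=0.8 gives an 80% / 20% split; 0.7 would give 70% / 30%.
    The fixed seed makes the shuffle (and so the split) repeatable.
    """
    shuffled = list(records)                  # copy, so the original list is unchanged
    random.Random(seed).shuffle(shuffled)     # mix the order so both parts are representative
    cut = round(len(shuffled) * train_ratio)  # number of records that go to training
    return shuffled[:cut], shuffled[cut:]

# 1000 hypothetical records, numbered 0 to 999
records = list(range(1000))
train, test = split_records(records)

print(len(train), "records -> Training")  # 800 records -> Training
print(len(test), "records -> Testing")    # 200 records -> Testing
```

Shuffling first matters when the records are stored in some order (for example, sorted by date or by class); without it, the test part might not look like the data the model will meet later.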

Advantages of Train-Test Split

  • Easy to implement
  • Gives fair evaluation
  • Helps detect overfitting (when a model memorizes the training data instead of learning general patterns)


4. Evaluation Metrics for Classification Models

What is Classification?

Classification means placing data into categories.

Examples

Input           | Category
Email           | Spam / Not Spam
Photo           | Cat / Dog
Student Result  | Pass / Fail

Popular metrics used for classification models:

  1. Confusion matrix
  2. Classification accuracy
  3. Precision
  4. Recall
  5. F1 Score

I. Confusion Matrix

A confusion matrix is a performance measurement table used in machine learning to evaluate classification models by comparing predicted values against actual outcomes.

  • True Positive (TP): Predicted ‘Yes’ and it was actually ‘Yes’.
  • True Negative (TN): Predicted ‘No’ and it was actually ‘No’.
  • False Positive (FP): Predicted ‘Yes’ but it was actually ‘No’ (Type I Error).
  • False Negative (FN): Predicted ‘No’ but it was actually ‘Yes’ (Type II Error).

1. True Positive (TP)

Model correctly predicts positive.

Example: Spam email correctly identified as spam.

2. True Negative (TN)

Model correctly predicts negative.

Example: Normal email correctly identified as normal.

3. False Positive (FP)

Model wrongly predicts positive.

Example: Normal email marked as spam.

4. False Negative (FN)

Model wrongly predicts negative.

Example: Spam email marked as normal.
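The four outcomes above can be counted with a short Python sketch. The lists `actual` and `predicted` are made-up labels for ten emails (1 = Spam, treated as the positive class; 0 = Not Spam), and the function name `confusion_counts` is our own, used only for illustration.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP and FN by comparing each prediction with the actual label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1          # predicted Spam, actually Spam
        elif p != positive and a != positive:
            tn += 1          # predicted Not Spam, actually Not Spam
        elif p == positive and a != positive:
            fp += 1          # predicted Spam, actually Not Spam (Type I error)
        else:
            fn += 1          # predicted Not Spam, actually Spam (Type II error)
    return tp, tn, fp, fn

# Hypothetical labels for 10 emails: 1 = Spam, 0 = Not Spam
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 0, 1]

tp, tn, fp, fn = confusion_counts(actual, predicted)
print("TP =", tp, "TN =", tn, "FP =", fp, "FN =", fn)  # TP = 4 TN = 4 FP = 1 FN = 1
```

Every email lands in exactly one of the four boxes, so TP + TN + FP + FN always equals the number of emails tested.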


II. Accuracy and Error

Accuracy

Accuracy tells how many predictions are correct. It is the percentage of correct predictions out of the total number of cases.

Formula

Accuracy = (Correct Predictions / Total Predictions) × 100

Example

A model predicts:

  • Correct answers = 90
  • Total answers = 100

Then,

Accuracy = (90 / 100) × 100 = 90%

So, the model accuracy is 90%.
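The same calculation can be written as a small Python function. In terms of the confusion matrix, the correct predictions are TP + TN and the total is TP + TN + FP + FN, so a second helper works from those counts. Both function names and the counts in the second call are made up for illustration.

```python
def accuracy_percent(correct, total):
    """Accuracy = (Correct Predictions / Total Predictions) x 100"""
    return correct * 100 / total

def accuracy_from_counts(tp, tn, fp, fn):
    """Accuracy from confusion-matrix counts: correct predictions are TP + TN."""
    return accuracy_percent(tp + tn, tp + tn + fp + fn)

print(accuracy_percent(90, 100))           # 90.0
print(accuracy_from_counts(45, 45, 4, 6))  # 90.0
```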


Error

Error means wrong predictions made by the AI model. This represents the gap between the predicted output and the actual output. High error indicates a poorly performing model.

Formula

Error Rate = (Wrong Predictions / Total Predictions) × 100


Example

Wrong predictions = 10
Total predictions = 100

Error Rate = (10/100) x 100 = 10%


Relationship Between Accuracy and Error

Accuracy + Error Rate = 100%
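A short Python sketch of the error-rate formula, using the example numbers above (10 wrong out of 100), also confirms that accuracy and error rate add up to 100%. The function name `error_rate_percent` is hypothetical.

```python
def error_rate_percent(wrong, total):
    """Error Rate = (Wrong Predictions / Total Predictions) x 100"""
    return wrong * 100 / total

total = 100
wrong = 10
correct = total - wrong   # every prediction is either right or wrong

error = error_rate_percent(wrong, total)
accuracy = correct * 100 / total

print("Error rate:", error, "%")                        # Error rate: 10.0 %
print("Accuracy:", accuracy, "%")                       # Accuracy: 90.0 %
print("Accuracy + Error rate:", accuracy + error, "%")  # Accuracy + Error rate: 100.0 %
```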

III. Precision

Precision tells how many of the cases predicted as positive are actually positive.

Formula

Precision = TP/ (TP+FP)

Example

If:

  • TP = 40
  • FP = 10

Precision = 40/ (40+10) = 0.8

Precision = 80%
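The precision example can be reproduced with a small Python function. The function name is our own, and returning 0 when the model predicts no positives at all is an assumption made here to avoid dividing by zero.

```python
def precision(tp, fp):
    """Precision = TP / (TP + FP): share of predicted positives that are truly positive."""
    if tp + fp == 0:
        return 0.0   # assumption: report 0 when the model predicted no positives at all
    return tp / (tp + fp)

p = precision(tp=40, fp=10)
print(p)            # 0.8
print(f"{p:.0%}")   # 80%
```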


IV. Recall

Recall, also known as Sensitivity or True Positive Rate, measures the model’s ability to correctly identify all actual positive cases. It is the ratio of true positives to all actual positives (TP + FN).

Recall tells how many actual positive cases are correctly identified.

Formula

Recall = TP / (TP+FN)
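The recall formula can be sketched in Python the same way. The counts below are hypothetical (100 actual positive cases, of which the model found 40), and returning 0 when there are no actual positives is an assumption made to avoid dividing by zero.

```python
def recall(tp, fn):
    """Recall = TP / (TP + FN): share of actual positives the model found."""
    if tp + fn == 0:
        return 0.0   # assumption: report 0 when there are no actual positives
    return tp / (tp + fn)

# Hypothetical counts: 100 actual positives, of which the model found 40
r = recall(tp=40, fn=60)
print(r)            # 0.4
print(f"{r:.0%}")   # 40%
```

A model can have high precision but low recall: it may be right almost every time it says "positive" while still missing most of the positive cases.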


V. F1 Score

F1 Score is the harmonic mean of Precision and Recall, so it is high only when both are high. It is especially useful when there is an uneven class distribution (one class is much more common than the other).

Formula

F1 = 2 × (Precision × Recall) / (Precision + Recall)
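A minimal Python sketch of the F1 formula follows. It pairs a precision of 0.8 with a hypothetical recall of 0.4; the function name is our own, and treating F1 as 0 when both inputs are 0 is an assumption to avoid dividing by zero.

```python
def f1_score(precision, recall):
    """F1 = 2 x (Precision x Recall) / (Precision + Recall), the harmonic mean."""
    if precision + recall == 0:
        return 0.0   # assumption: F1 is taken as 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)

p = 0.8   # precision, e.g. TP = 40, FP = 10
r = 0.4   # hypothetical recall, e.g. TP = 40, FN = 60

f1 = f1_score(p, r)
print(f"F1 = {f1:.2f}")                        # F1 = 0.53
print(f"Simple average = {(p + r) / 2:.2f}")   # Simple average = 0.60
```

Here F1 (about 0.53) is lower than the simple average (0.60) because the harmonic mean is pulled toward the smaller value, so a model cannot get a high F1 Score by being good at only one of Precision or Recall.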


Choosing the Correct Metric

Situation                                         | Best Metric
General performance (balanced classes)            | Accuracy
Avoid false alarms (false positives)              | Precision
Detect all positive cases (few false negatives)   | Recall
Balance Precision & Recall                        | F1 Score

5. Ethical Concerns Around Model Evaluation

AI systems must be ethical and fair.


1. Bias

Bias happens when AI gives unfair results.

Example

A hiring AI prefers one group unfairly.

Solution

  • Use balanced data
  • Test AI carefully

2. Transparency

Transparency means AI decisions should be understandable.

Users should know:

  • How AI works
  • Why decisions are made

3. Accuracy

AI should make correct predictions.

Low accuracy can:

  • Cause wrong decisions
  • Reduce trust in AI

Important Terms

Term              | Meaning
Model Evaluation  | Checking how well an AI model performs
Accuracy          | Percentage of correct predictions
Error             | Percentage of wrong predictions
Train-Test Split  | Dividing data into training and testing parts
Classification    | Sorting data into categories
Precision         | Share of positive predictions that are correct
Recall            | Share of actual positive cases that are found
F1 Score          | Balance of Precision and Recall
Bias              | Unfair AI behavior

Short Answer Questions

  1. What is model evaluation?
  2. Why is train-test split important?
  3. Define accuracy.
  4. What is a confusion matrix?
  5. Differentiate between Precision and Recall.
  6. What is bias in AI?
  7. Define F1 Score.
  8. What is classification?

Answers to Short Answer Questions

1. What is model evaluation?

Model evaluation is the process of checking how well an AI model performs and how accurately it gives predictions.


2. Why is train-test split important?

Train-test split is important because it helps test the AI model on new data and gives a fair evaluation of its performance.


3. Define accuracy.

Accuracy is the percentage of correct predictions made by an AI model.

Accuracy = (Correct Predictions / Total Predictions) X 100

4. What is a confusion matrix?

A confusion matrix is a table used to measure the performance of a classification model by comparing actual and predicted values.


5. Differentiate between Precision and Recall.

Precision                                                    | Recall
Measures how many predicted positives are actually positive  | Measures how many actual positives are identified
Focuses on reducing false positives                          | Focuses on reducing false negatives

6. What is bias in AI?

Bias in AI means unfair or prejudiced behavior shown by an AI system due to unbalanced or incorrect data.


7. Define F1 Score.

F1 Score is a metric that balances Precision and Recall to measure model performance.

F1 = 2 X ( Precision X Recall) / (Precision + Recall)

8. What is classification?

Classification is the process of grouping data into categories or classes, such as Spam/Not Spam or Cat/Dog.

MCQs

1. Which data is used to teach the AI model?

a) Testing Data
b) Training Data
c) Validation Data
d) Output Data

Answer: b) Training Data


2. Which metric measures the percentage of correct predictions out of all predictions?

a) Recall
b) Precision
c) Accuracy
d) Bias

Answer: c) Accuracy


3. False Positive means:

a) Correct positive prediction
b) Wrong positive prediction
c) Correct negative prediction
d) Wrong negative prediction

Answer: b) Wrong positive prediction


4. Which metric balances Precision and Recall?

a) Accuracy
b) Error
c) F1 Score
d) Bias

Answer: c) F1 Score


5. Unfair AI behavior is called:

a) Transparency
b) Accuracy
c) Bias
d) Recall

Answer: c) Bias

=====================================================

Question: A healthcare-based AI model was trained to predict whether a patient has a certain medical condition (Positive) or is healthy (Negative). Out of a total test group, the model’s predictions were recorded and organized into the Confusion Matrix shown below:

                 | Predicted Positive | Predicted Negative
Actual Positive  | 60                 | 10
Actual Negative  | 5                  | 25

Based on the matrix given above, answer the following questions. (Show all formulas and calculation steps clearly):

  1. Identify the values of TP, TN, FP, and FN. (1 Mark)
  2. Calculate the Accuracy of the AI model. (1 Mark)
  3. Calculate the Precision and Recall of the model. (2 Marks)
  4. Calculate the F1 Score using the calculated Precision and Recall. (1 Mark)

Answer:

Step 1: Identifying the Matrix Values

  • True Positive (TP): 60 (Actual Positive predicted as Positive)
  • True Negative (TN): 25 (Actual Negative predicted as Negative)
  • False Positive (FP): 5 (Actual Negative predicted as Positive)
  • False Negative (FN): 10 (Actual Positive predicted as Negative)
  • Total Observations = 60 + 10 + 5 + 25 = 100

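Step 2: Calculating Accuracy

Accuracy = (TP + TN) / Total × 100
Accuracy = (60 + 25) / 100 × 100 = 85%

Step 3: Calculating Precision and Recall

Precision = TP / (TP + FP) = 60 / (60 + 5) = 60 / 65 ≈ 0.923 (92.3%)

Recall = TP / (TP + FN) = 60 / (60 + 10) = 60 / 70 ≈ 0.857 (85.7%)

Step 4: Calculating F1 Score

F1 = 2 × (Precision × Recall) / (Precision + Recall)
F1 = 2 × (0.923 × 0.857) / (0.923 + 0.857)
F1 = 1.582 / 1.780 ≈ 0.889 (88.9%)

Note: The approximation symbol (≈) is used in the Precision, Recall and F1 Score calculations because the divisions give long recurring decimals, which are rounded off to a practical number of decimal places (usually 2 or 3).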
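The hand calculations for this question can be checked with a short Python sketch. It only plugs the four values from the matrix into the formulas of this unit; the variable names are our own.

```python
# Confusion matrix from the question (rows = actual, columns = predicted)
tp, fn = 60, 10   # Actual Positive: predicted Positive / predicted Negative
fp, tn = 5, 25    # Actual Negative: predicted Positive / predicted Negative

total = tp + tn + fp + fn
accuracy = (tp + tn) * 100 / total          # as a percentage
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("Total     =", total)               # Total     = 100
print(f"Accuracy  = {accuracy:.0f}%")     # Accuracy  = 85%
print(f"Precision = {precision:.3f}")     # Precision = 0.923
print(f"Recall    = {recall:.3f}")        # Recall    = 0.857
print(f"F1 Score  = {f1:.3f}")            # F1 Score  = 0.889
```

Computing F1 from the unrounded Precision and Recall gives the same three-decimal answer as the hand calculation, which is a useful sanity check on the rounding.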
