UNIT 3: Evaluating Models
Class X – Artificial Intelligence (AI)
1. Importance of Model Evaluation
What is Model Evaluation?
Evaluation is the process of testing an AI model’s performance on a specific dataset to see how well it has learned and how accurately it can make predictions on new, unseen data.
It helps us understand:
- Whether the model gives correct predictions
- How accurate the AI system is
- Whether improvements are needed
Example
Suppose an AI model predicts whether an email is Spam or Not Spam.
If it predicts 92 out of 100 emails correctly, its accuracy is 92%, which suggests the model is performing well.
Need for Model Evaluation
Model evaluation is important because it:
- Measures model performance
- Detects mistakes and errors
- Helps compare different AI models
- Improves reliability and accuracy
- Prevents wrong predictions
Real-life Examples
| AI Application | Why Evaluation is Important |
|---|---|
| Face Unlock | Must recognize the correct user |
| Medical Diagnosis | Wrong prediction can be dangerous |
| Self-driving Cars | Accuracy is very important for safety |
2. Splitting the Training Set Data for Evaluation
What is Train-Test Split?
In AI, data is divided into two parts:
| Data Type | Purpose |
|---|---|
| Training Data | Used to teach the AI model |
| Testing Data | Used to check model performance |
This process is called Train-Test Split.
Why is Train-Test Split Needed?
If we test the model using the same data used for training:
- The model may memorize answers
- Evaluation will not be fair
Testing on new data checks the real performance.
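The point above can be illustrated with a small Python sketch. It is only a toy example: the email texts, the labels, and the `memorizer_predict` function are all made up for illustration and are not a real AI model. The "model" simply memorizes its training emails, so it looks perfect on the training data but does worse on new data.

```python
# Hypothetical training and testing emails (made up for illustration).
training_data = {
    "win a prize now": "Spam",
    "meeting at 10 am": "Not Spam",
    "claim your free gift": "Spam",
}
testing_data = {
    "free prize inside": "Spam",
    "lunch at 1 pm": "Not Spam",
}

def memorizer_predict(text):
    # Returns the memorized label if the email was seen during training,
    # otherwise falls back to guessing "Not Spam".
    return training_data.get(text, "Not Spam")

def accuracy_percent(data):
    # Percentage of emails in `data` that the memorizer labels correctly.
    correct = sum(1 for text, label in data.items()
                  if memorizer_predict(text) == label)
    return correct * 100 / len(data)

train_accuracy = accuracy_percent(training_data)
test_accuracy = accuracy_percent(testing_data)

print("Accuracy on training data:", train_accuracy)  # 100.0
print("Accuracy on testing data:", test_accuracy)    # 50.0
```

Testing on the training data alone would hide the fact that this toy model has learned nothing general.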
Common Split Ratio
| Training Data | Testing Data |
|---|---|
| 80% | 20% |
| 70% | 30% |
Example
If there are 1000 records:
- 800 records → Training
- 200 records → Testing
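The 80/20 example above could be sketched in Python as follows. This is a minimal illustration using only the standard library; the record IDs, the seed value, and the variable names are assumptions made for the example.

```python
import random

records = list(range(1, 1001))  # 1000 hypothetical record IDs

random.seed(42)          # fixed seed so the shuffle is repeatable
random.shuffle(records)  # shuffle so both parts get a random mix of records

split_point = int(len(records) * 0.8)   # 80% of 1000 = 800
training_data = records[:split_point]   # first 800 records
testing_data = records[split_point:]    # remaining 200 records

print(len(training_data))  # 800
print(len(testing_data))   # 200
```

Shuffling before splitting is a common precaution so that the testing data is not just the last records in the file.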
Advantages of Train-Test Split
- Easy to implement
- Gives fair evaluation
- Helps detect overfitting (a model that has only memorized the training data scores poorly on the testing data)
4. Evaluation Metrics for Classification Model
What is Classification?
Classification means placing data into categories.
Examples
| Input | Category |
|---|---|
| Email | Spam / Not Spam |
| Photo | Cat / Dog |
| Student Result | Pass / Fail |
Popular metrics used for classification model:
- Confusion matrix
- Classification accuracy
- Precision
- Recall
- F1 Score
I. Confusion Matrix
A confusion matrix is a performance measurement table used in machine learning to evaluate classification models by comparing predicted values against actual outcomes.

- True Positive (TP): Predicted ‘Yes’ and it was actually ‘Yes’.
- True Negative (TN): Predicted ‘No’ and it was actually ‘No’.
- False Positive (FP): Predicted ‘Yes’ but it was actually ‘No’ (Type I Error).
- False Negative (FN): Predicted ‘No’ but it was actually ‘Yes’ (Type II Error).
1. True Positive (TP)
Model correctly predicts positive.
Example: Spam email correctly identified as spam.
2. True Negative (TN)
Model correctly predicts negative.
Example: Normal email correctly identified as normal.
3. False Positive (FP)
Model wrongly predicts positive.
Example: Normal email marked as spam.
4. False Negative (FN)
Model wrongly predicts negative.
Example: Spam email marked as normal.
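The four values can be counted with a short Python sketch. The two lists below are hypothetical data made up for this example, and "Spam" is treated as the positive class.

```python
# Hypothetical actual labels and model predictions for 8 emails.
actual    = ["Spam", "Spam", "Not Spam", "Not Spam", "Spam", "Not Spam", "Spam", "Not Spam"]
predicted = ["Spam", "Not Spam", "Not Spam", "Spam", "Spam", "Not Spam", "Spam", "Not Spam"]

tp = tn = fp = fn = 0
for a, p in zip(actual, predicted):
    if a == "Spam" and p == "Spam":
        tp += 1   # spam correctly identified as spam
    elif a == "Not Spam" and p == "Not Spam":
        tn += 1   # normal email correctly identified as normal
    elif a == "Not Spam" and p == "Spam":
        fp += 1   # normal email marked as spam
    else:
        fn += 1   # spam email marked as normal

print("TP =", tp, "TN =", tn, "FP =", fp, "FN =", fn)
# TP = 3 TN = 3 FP = 1 FN = 1
```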
II. Accuracy and Error
Accuracy
Accuracy tells how many predictions are correct. It is the percentage of correct predictions out of the total number of cases.
Formula
Accuracy = (Correct Predictions / Total Predictions) × 100
In terms of the confusion matrix: Accuracy = (TP + TN) / (TP + TN + FP + FN) × 100
Example
A model predicts:
- Correct answers = 90
- Total answers = 100
Then,
Accuracy = (90/100) × 100 = 90%
So, the model accuracy is 90%.
Error
Error means wrong predictions made by the AI model. This represents the gap between the predicted output and the actual output. High error indicates a poorly performing model.
Formula
Error Rate = (Wrong Predictions / Total Predictions) × 100
Example
Wrong predictions = 10
Total predictions = 100
Error Rate = (10/100) × 100 = 10%
Relationship Between Accuracy and Error
Accuracy + Error Rate = 100%
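The two formulas and their relationship can be checked with a small Python sketch. The function names are chosen only for this example; the numbers are the ones used in the examples above.

```python
def accuracy_percent(correct, total):
    # Accuracy = (Correct Predictions / Total Predictions) x 100
    return correct * 100 / total

def error_rate_percent(wrong, total):
    # Error Rate = (Wrong Predictions / Total Predictions) x 100
    return wrong * 100 / total

accuracy = accuracy_percent(90, 100)
error_rate = error_rate_percent(10, 100)

print("Accuracy:", accuracy)          # 90.0
print("Error Rate:", error_rate)      # 10.0
print("Sum:", accuracy + error_rate)  # 100.0
```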
III. Precision
Precision tells how many predicted positive cases are actually positive.
Formula
Precision = TP / (TP + FP)
Example
If:
- TP = 40
- FP = 10
Precision = 40 / (40 + 10) = 0.8
Precision = 80%
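The same calculation as a minimal Python sketch (the function name `precision_score` is chosen only for this example):

```python
def precision_score(tp, fp):
    # Precision = TP / (TP + FP)
    return tp / (tp + fp)

precision = precision_score(tp=40, fp=10)

print(precision)        # 0.8
print(precision * 100)  # 80.0
```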
IV. Recall
Recall, also known as Sensitivity or True Positive Rate, is a metric that measures the model's ability to correctly identify all relevant instances. It is the ratio of true positives to all actual positives.
In other words, Recall tells how many of the actual positive cases are correctly identified.
Formula
Recall = TP / (TP+FN)
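A worked example of the formula, written as a small Python sketch. The values TP = 40 and FN = 20 are hypothetical numbers chosen for illustration, and the function name `recall_score` is used only in this example.

```python
def recall_score(tp, fn):
    # Recall = TP / (TP + FN)
    return tp / (tp + fn)

# Hypothetical values: 40 positives found, 20 positives missed.
recall = recall_score(tp=40, fn=20)

print(round(recall, 3))        # 0.667
print(round(recall * 100, 1))  # 66.7
```

Here the model finds 40 of the 60 actual positive cases, so Recall is about 66.7%.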
V. F1 Score
F1 Score is the harmonic mean of Precision and Recall, so it balances the two. It is especially useful when the class distribution is uneven.
Formula
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
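A minimal Python sketch of the F1 formula. The values TP = 40 and FP = 10 come from the Precision example earlier; FN = 20 is a hypothetical value added for illustration, and the function name `f1_score` is used only in this example.

```python
def f1_score(precision, recall):
    # F1 = 2 x (Precision x Recall) / (Precision + Recall)
    return 2 * (precision * recall) / (precision + recall)

tp, fp, fn = 40, 10, 20      # FN = 20 is a hypothetical value

precision = tp / (tp + fp)   # 40 / 50 = 0.8
recall = tp / (tp + fn)      # 40 / 60, about 0.667
f1 = f1_score(precision, recall)

print(round(f1, 3))  # 0.727
```

The F1 Score (about 0.727) lies between Precision (0.8) and Recall (about 0.667), and sits closer to the lower of the two.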
Choosing the Correct Metric
| Situation | Best Metric |
|---|---|
| General performance | Accuracy |
| Avoid false alarms | Precision |
| Detect all positive cases | Recall |
| Balance Precision & Recall | F1 Score |
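The sketch below shows why the choice of metric matters. It uses made-up data: 5 positive cases and 95 negative cases, and a hypothetical model that always predicts "Negative". Accuracy looks high, but Recall shows that no positive case was detected.

```python
# Hypothetical, uneven data: only 5 of 100 cases are positive.
actual = ["Positive"] * 5 + ["Negative"] * 95
predicted = ["Negative"] * 100   # a model that always says "Negative"

correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct * 100 / len(actual)

tp = sum(1 for a, p in zip(actual, predicted)
         if a == "Positive" and p == "Positive")
fn = sum(1 for a, p in zip(actual, predicted)
         if a == "Positive" and p == "Negative")
recall = tp / (tp + fn)

print("Accuracy:", accuracy)  # 95.0
print("Recall:", recall)      # 0.0
```

In a situation like medical diagnosis, where every positive case should be detected, Recall reveals the problem that Accuracy hides.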
5. Ethical Concerns Around Model Evaluation
AI systems must be ethical and fair.
1. Bias
Bias happens when AI gives unfair results.
Example
A hiring AI prefers one group unfairly.
Solution
- Use balanced data
- Test AI carefully
2. Transparency
Transparency means AI decisions should be understandable.
Users should know:
- How AI works
- Why decisions are made
3. Accuracy
AI should make correct predictions.
Low accuracy can:
- Cause wrong decisions
- Reduce trust in AI
Important Terms
| Term | Meaning |
|---|---|
| Model Evaluation | Checking AI performance |
| Accuracy | Correct predictions percentage |
| Error | Wrong predictions percentage |
| Train-Test Split | Dividing data for training and testing |
| Classification | Sorting data into categories |
| Precision | Share of predicted positives that are actually positive |
| Recall | Share of actual positives that are correctly identified |
| F1 Score | Balance of Precision and Recall |
| Bias | Unfair AI behavior |
Short Answer Questions
- What is model evaluation?
- Why is train-test split important?
- Define accuracy.
- What is a confusion matrix?
- Differentiate between Precision and Recall.
- What is bias in AI?
- Define F1 Score.
- What is classification?
Answers to Short Answer Questions
1. What is model evaluation?
Model evaluation is the process of checking how well an AI model performs and how accurately it gives predictions.
2. Why is train-test split important?
Train-test split is important because it helps test the AI model on new data and gives a fair evaluation of its performance.
3. Define accuracy.
Accuracy is the percentage of correct predictions made by an AI model.
Accuracy = (Correct Predictions / Total Predictions) × 100
4. What is a confusion matrix?
A confusion matrix is a table used to measure the performance of a classification model by comparing actual and predicted values.
5. Differentiate between Precision and Recall.
| Precision | Recall |
|---|---|
| Measures correct positive predictions | Measures how many actual positives are identified |
| Focuses on reducing false positives | Focuses on reducing false negatives |
6. What is bias in AI?
Bias in AI means unfair or prejudiced behavior shown by an AI system due to unbalanced or incorrect data.
7. Define F1 Score.
F1 Score is a metric that balances Precision and Recall to measure model performance.
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
8. What is classification?
Classification is the process of grouping data into categories or classes, such as Spam/Not Spam or Cat/Dog.
MCQs
1. Which data is used to teach the AI model?
a) Testing Data
b) Training Data
c) Validation Data
d) Output Data
Answer: b) Training Data
2. Which metric measures the percentage of correct predictions out of all predictions?
a) Recall
b) Precision
c) Accuracy
d) Bias
Answer: c) Accuracy
3. False Positive means:
a) Correct positive prediction
b) Wrong positive prediction
c) Correct negative prediction
d) Wrong negative prediction
Answer: b) Wrong positive prediction
4. Which metric balances Precision and Recall?
a) Accuracy
b) Error
c) F1 Score
d) Bias
Answer: c) F1 Score
5. Unfair AI behavior is called:
a) Transparency
b) Accuracy
c) Bias
d) Recall
Answer: c) Bias
=====================================================
Question: A healthcare-based AI model was trained to predict whether a patient has a certain medical condition (Positive) or is healthy (Negative). Out of a total test group, the model’s predictions were recorded and organized into the Confusion Matrix shown below:
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | 60 | 10 |
| Actual Negative | 5 | 25 |
Based on the matrix given above, answer the following questions. (Show all formulas and calculation steps clearly):
- Identify the values of TP, TN, FP, and FN. (1 Mark)
- Calculate the Accuracy of the AI model. (1 Mark)
- Calculate the Precision and Recall of the model. (2 Marks)
- Calculate the F1 Score using the calculated Precision and Recall. (1 Mark)
Answer:
Step 1: Identifying the Matrix Values
- True Positive (TP): 60 (Actual Positive predicted as Positive)
- True Negative (TN): 25 (Actual Negative predicted as Negative)
- False Positive (FP): 5 (Actual Negative predicted as Positive)
- False Negative (FN): 10 (Actual Positive predicted as Negative)
- Total Observations = 60 + 10 + 5 + 25 = 100
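Step 2: Calculating Accuracy
Accuracy = (TP + TN) / Total Observations × 100
Accuracy = (60 + 25) / 100 × 100 = 85%
Step 3: Calculating Precision and Recall
Precision = TP / (TP + FP) = 60 / (60 + 5) = 60/65 ≈ 0.923 (92.3%)
Recall = TP / (TP + FN) = 60 / (60 + 10) = 60/70 ≈ 0.857 (85.7%)
Step 4: Calculating F1 Score
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
F1 Score = 2 × (0.923 × 0.857) / (0.923 + 0.857) ≈ 0.889 (88.9%)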
Note: The approximation symbol (≈) is used in the F1 score calculations because the division results in a long or recurring decimal, which we round off to a practical number of decimal places (usually 2 or 3).
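The answers to this question can be cross-checked with a short Python sketch. It uses the four values from the confusion matrix above; the variable names are chosen only for this example, and results are rounded to 3 decimal places.

```python
# Values read from the confusion matrix in the question.
tp, fn = 60, 10   # Actual Positive row
fp, tn = 5, 25    # Actual Negative row

total = tp + tn + fp + fn
accuracy = (tp + tn) * 100 / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print("Total:", total)                    # 100
print("Accuracy:", accuracy)              # 85.0
print("Precision:", round(precision, 3))  # 0.923
print("Recall:", round(recall, 3))        # 0.857
print("F1 Score:", round(f1, 3))          # 0.889
```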