Navigating the complex world of machine learning can be daunting, but NPTEL's course on "Introduction to Machine Learning" by IITKGP provides a structured approach to understanding key concepts. In Week 2, students are challenged with a series of questions that test their grasp of entropy, bias, decision trees, linear regression, and more. This article provides comprehensive answers to the Week 2 assignment, ensuring you understand both the solutions and the reasoning behind them.
Question 1:
Q: In a binary classification problem, out of 30 data points 10 belong to class I and 20 belong to class II. What is the entropy of the data set?
- A. 0.97
- B. 0.91
- C. 0.50
- D. 0.67
A: B. 0.91
Reasoning: For a binary classification problem, the entropy of the dataset is H = -p1·log2(p1) - p2·log2(p2), where p1 and p2 are the proportions of the two classes. Here p1 = 10/30 and p2 = 20/30, so H = -(1/3)·log2(1/3) - (2/3)·log2(2/3) ≈ 0.918, which rounds to 0.91.
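To double-check the arithmetic, here is a minimal Python sketch (not part of the assignment) that computes the entropy of the 10/20 class split:

```python
import math

def binary_entropy(n1, n2):
    """Entropy (in bits) of a two-class dataset with n1 and n2 examples."""
    total = n1 + n2
    entropy = 0.0
    for count in (n1, n2):
        p = count / total
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

print(round(binary_entropy(10, 20), 3))  # 0.918, i.e. option B (0.91)
```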
Question 2:
Q: Which of the following is false?
- A. Bias is the true error of the best classifier in the concept class
- B. Bias is high if the concept class cannot model the true data distribution well
- C. High bias leads to overfitting
A: C. High bias leads to overfitting
Reasoning: High bias typically leads to underfitting, not overfitting. Overfitting is generally caused by low bias and high variance.
Question 3:
Q: Decision trees can be used for problems where
1. the attributes are categorical.
2. the attributes are numeric valued.
3. the attributes are discrete valued.
- A. 1 only
- B. 1 and 2 only
- C. 1 and 3 only
- D. 1, 2 and 3
A: D. 1, 2 and 3
Reasoning: Decision trees can split on categorical attributes as well as on discrete and continuous numeric attributes, so all three statements are true.
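As a quick illustration (a sketch, not from the course material), scikit-learn's DecisionTreeClassifier accepts numeric attributes directly, while categorical attributes are usually encoded as integers or one-hot vectors first; the toy data below is made up for the example:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy data: one categorical attribute (colour) and one numeric attribute (height).
colour = np.array(["red", "green", "red", "blue"])
height = np.array([1.2, 5.0, 0.9, 3.3])
labels = np.array([0, 1, 0, 1])

# Encode the categorical values as integer codes; numeric values are used as-is.
colour_codes = np.unique(colour, return_inverse=True)[1]
X = np.column_stack([colour_codes, height])

clf = DecisionTreeClassifier().fit(X, labels)
print(clf.predict(X))
```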
Question 4:
Q: In linear regression, our hypothesis is h_θ(x) = θ0 + θ1·x, and the training data is given in the table below. The cost function is J(θ) = (1/2m) · Σ (h_θ(x_i) - y_i)², where m is the number of training data points. What is the value of J(θ) when θ = (1, 1)?
x | y |
---|---|
7 | 8 |
5 | 4 |
11 | 10 |
2 | 3 |
- A. 0
- B. 2
- C. 1
- D. 0.25
A: C. 1
Reasoning: With θ0 = 1 and θ1 = 1 the predictions are h(7) = 8, h(5) = 6, h(11) = 12 and h(2) = 3, giving errors 0, 2, 2 and 0. The sum of squared errors is 0 + 4 + 4 + 0 = 8, and with m = 4 the cost is J = 8 / (2·4) = 1.
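The same calculation can be checked with a few lines of Python (a quick sketch, assuming the hypothesis and cost function as written above):

```python
# Training data from the table and the parameters theta0 = theta1 = 1.
data = [(7, 8), (5, 4), (11, 10), (2, 3)]
theta0, theta1 = 1, 1

m = len(data)
squared_errors = [(theta0 + theta1 * x - y) ** 2 for x, y in data]
cost = sum(squared_errors) / (2 * m)
print(cost)  # 1.0 -> option C
```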
Question 5:
Q: The value of information gain in the following decision tree is:
Decision tree with entropies:
Root entropy = 0.996 (30 examples)
Left child entropy = 0.787 (17 examples)
Right child entropy = 0.391 (13 examples)
- A. 0.380
- B. 0.620
- C. 0.190
- D. 0.477
A: A. 0.380
Reasoning: Information gain is the entropy of the parent node minus the weighted average entropy of its children: IG = 0.996 - (17/30)·0.787 - (13/30)·0.391 ≈ 0.996 - 0.446 - 0.169 ≈ 0.380.
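A short Python sketch of the same weighted-average calculation, using the node sizes and entropies from the figure:

```python
def information_gain(parent_entropy, children):
    """children is a list of (num_examples, entropy) pairs, one per branch."""
    total = sum(n for n, _ in children)
    weighted_child_entropy = sum(n / total * e for n, e in children)
    return parent_entropy - weighted_child_entropy

ig = information_gain(0.996, [(17, 0.787), (13, 0.391)])
print(round(ig, 3))  # ~0.38 -> option A
```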
Question 6:
Q: What is true for Stochastic Gradient Descent?
- A. In every iteration, model parameters are updated based on multiple training samples.
- B. In every iteration, model parameters are updated based on one training sample.
- C. In every iteration, model parameters are updated based on all training samples.
- D. None of the above
A: B. In every iteration, model parameters are updated based on one training sample.
Reasoning: Stochastic Gradient Descent updates the model parameters using a single (typically randomly chosen) training sample per iteration, unlike batch gradient descent, which uses all samples, or mini-batch gradient descent, which uses a small subset.
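Below is a minimal sketch of the idea for simple linear regression; the toy data, learning rate, and iteration count are made-up illustrations, not part of the assignment:

```python
import random

# Toy data generated from y = 2x + 1.
data = [(1, 3), (2, 5), (3, 7), (4, 9)]
theta0, theta1 = 0.0, 0.0
learning_rate = 0.05

for _ in range(5000):
    x, y = random.choice(data)          # one sample per update -> "stochastic"
    error = (theta0 + theta1 * x) - y   # prediction error on that single sample
    theta0 -= learning_rate * error
    theta1 -= learning_rate * error * x

print(round(theta0, 2), round(theta1, 2))  # converges close to 1 and 2
```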
Question 7:
Q: The entropy of the entire dataset is:
Species | Green | Legs | Height | Smelly |
---|---|---|---|---|
M | N | 3 | T | N |
M | Y | 2 | T | N |
M | Y | 3 | T | Y |
M | N | 3 | T | N |
M | N | 3 | T | Y |
H | Y | 2 | T | N |
H | N | 2 | T | Y |
H | Y | 2 | T | N |
H | Y | 2 | T | N |
H | N | 2 | T | Y |
- A. 0.5
- B. 1
- C. 0
- D. 0.1
A: B. 1
Reasoning: The dataset has an equal number of Martians (M) and Humans (H), 5 of each. Hence the entropy is H = -(0.5·log2 0.5 + 0.5·log2 0.5) = 1.
Question 8:
Q: Which attribute will be the root of the decision tree (if information gain is used to create the decision tree) and what is the information gain due to that attribute?
- A. Green, 0.45
- B. Legs, 0.4
- C. Height, 0.8
- D. Smelly, 0.7
A: C. Height, 0.8
Reasoning: The attribute with the highest information gain will be the root. Here, Height has the highest information gain of 0.8.
Question 9:
Q: In Linear Regression the output is:
- A. Discrete
- B. Continuous and always lies in a finite range
- C. Continuous
- D. May be discrete or continuous
A: C. Continuous
Reasoning: Linear Regression predicts a continuous output.
Question 10:
Q: Identify whether the following statement is true or false? "Overfitting is more likely when the set of training data is small"
- A. True
- B. False
A: A. True
Reasoning: With a smaller training dataset, the model might capture noise and peculiarities of the dataset, leading to overfitting.