**Machine learning** is at the heart of modern technology, influencing everything from search engines to recommendation systems. If you're starting the **NPTEL Machine Learning** course, you're on a path to understanding one of the most dynamic fields in computer science. This article is a guide to the Week 1 assignment for the **NPTEL Introduction to Machine Learning course**, running from July to October 2024.

**Course Overview**

The **NPTEL Machine Learning** course introduces students to the fundamental concepts and techniques of machine learning. Structured across several weeks, it aims to equip learners to understand and apply machine learning algorithms. The learning outcomes include an understanding of the algorithms and their applications, and the ability to implement them.

**Assignment Overview**

The **Week 1 assignment** is designed to test your understanding of the basic concepts introduced during the first week. It includes a variety of questions, such as multiple-choice questions, short answer questions, and problem-solving exercises. These questions aim to assess your grasp of the fundamental principles of machine learning and your ability to apply them.

## Detailed Answers to Week 1 Assignment

### Question 1

**Which of the following are supervised learning problems (Multiple Correct)?**

a. Classifying Spotify users based on their listening history

b. Weather forecast using data collected by a satellite

c. Predicting tuberculosis using patient’s chest X-Ray

d. Training a humanoid to walk using a reward system

**Answer:**

- a. Classifying Spotify users based on their listening history
- b. Weather forecast using data collected by a satellite
- c. Predicting tuberculosis using patient’s chest X-Ray

**Explanation:** Supervised learning involves training a model on labeled data. Classifying Spotify users, weather forecasting, and predicting tuberculosis are all examples where the model is trained on input-output pairs. Training a humanoid to walk using a reward system is an example of reinforcement learning, not supervised learning.

### Question 2

**Which of the following are regression tasks (Multiple Correct)?**

a. Predicting the outcome of an election

b. Predicting the weight of a giraffe based on its weight

c. Predicting the emotion conveyed by a sentence

d. Identifying abnormal data points

**Answer:** b. Predicting the weight of a giraffe based on its weight

**Explanation:** Regression tasks involve predicting continuous values. Predicting the weight of a giraffe is a regression task. The other options involve classification tasks.

### Question 3

**Which of the following are classification tasks (Multiple Correct)?**

a. Predicting the outcome of an election

b. Predicting the weight of a giraffe based on its weight

c. Predicting the emotion conveyed by a sentence

d. Identifying abnormal data points

**Answer:**

- a. Predicting the outcome of an election
- c. Predicting the emotion conveyed by a sentence
- d. Identifying abnormal data points

**Explanation:** Classification tasks involve predicting discrete labels. Predicting the outcome of an election, predicting the emotion conveyed by a sentence, and identifying abnormal data points are all classification tasks.

### Question 4

**Which of the two functions overfit the training data?**

a. Both functions F1 & F2

b. Function F1

c. Function F2

d. None of them

**Answer:** c. Function F2

**Explanation:** Overfitting occurs when a model learns the noise in the training data rather than the actual signal. In the given plot, function F2 (the dashed pink line) fits the training data too closely, indicating overfitting.

### Question 5

**Which of the following 2 functions will yield higher training error?**

a. Function F1

b. Function F2

c. Both functions F1 & F2 will have the same training error

d. Can not be determined

**Answer:** a. Function F1

**Explanation:** Function F1 (the dotted blue line) is simpler and likely does not fit the training data as closely as Function F2. Therefore, F1 will have a higher training error compared to F2.
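As a rough illustration of why a more flexible function tends to show lower training error, the sketch below compares a least-squares straight line with a nearest-neighbour lookup that reproduces every training point. The data, the assumed true relationship y = x for the held-out points, and both models are hypothetical stand-ins, not the F1 and F2 from the question.

```python
# Hypothetical training data: roughly linear with noise (not from the assignment).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.1, 0.9, 2.3, 2.8, 4.2, 4.9]

# Hypothetical held-out points, assuming the true relationship is y = x.
test_x = [0.5, 1.5, 2.5, 3.5, 4.5]
test_y = [0.5, 1.5, 2.5, 3.5, 4.5]

def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    """Flexible model: return the target of the closest training input."""
    def predict(x):
        i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
        return ys[i]
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

simple = fit_line(train_x, train_y)
flexible = fit_nearest(train_x, train_y)

simple_train = mse(simple, train_x, train_y)
flexible_train = mse(flexible, train_x, train_y)
simple_test = mse(simple, test_x, test_y)
flexible_test = mse(flexible, test_x, test_y)

print("training error:", simple_train, flexible_train)
print("held-out error:", simple_test, flexible_test)
```

In this toy setup the flexible model has zero training error because it memorizes the noise, while the simpler line has a higher training error but generalizes better to the held-out points.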

### Question 6

**What does the term 'policy' refer to in reinforcement learning?**

a. A set of rules governing the environment

b. The reward function

c. The initial state of the environment

d. The strategy the agent follows to choose actions

**Answer:** d. The strategy the agent follows to choose actions

**Explanation:** In reinforcement learning, a policy is a strategy or a mapping from states of the environment to the actions to be taken when in those states.
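A deterministic policy can be pictured as a plain lookup from states to actions. The sketch below is only an illustration; the robot states and action names are hypothetical and do not come from the course.

```python
# Hypothetical states and actions for a toy cleaning robot.
policy = {
    "battery_low": "recharge",
    "dirt_detected": "clean",
    "idle": "explore",
}

def choose_action(state):
    """Return the action the policy prescribes for the given state."""
    return policy[state]

for state in ["idle", "dirt_detected", "battery_low"]:
    print(state, "->", choose_action(state))
```

A stochastic policy would instead map each state to a probability distribution over actions, and the agent would sample from it.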

### Question 7

**Given the following dataset, for k = 3, use KNN regression to find the prediction for a new data point (2,3) (Use Euclidean distance measure for finding closest points):**

| X1 | X2 | Y |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 3 | 3 |
| 3 | 4 | 3 |
| 4 | 5 | 3 |
| 2 | 4 | 2.5 |

Options:
a. 2.0

b. 2.6

c. 2.8

d. 3.2

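**Answer:** c. 2.8

**Explanation:** Using K-Nearest Neighbors (KNN) regression with k = 3, we find the three closest points to (2,3):

- (2,3) with distance 0
- (2,4) with distance 1
- (1,2) with distance sqrt(2) ≈ 1.41

The point (3,4) is also at distance sqrt(2), but it has the same Y-value (3) as (1,2), so the tie does not affect the result. The average of the three Y-values is (3 + 2.5 + 3)/3 = 8.5/3 ≈ 2.83, which is closest to option c.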
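The neighbour search and averaging can be checked with a short script. This is a minimal sketch in plain Python; the function name `knn_regress` is an illustrative choice, not part of the course material.

```python
import math

# Dataset from the question: (X1, X2, Y)
data = [
    (1, 2, 3.0),
    (2, 3, 3.0),
    (3, 4, 3.0),
    (4, 5, 3.0),
    (2, 4, 2.5),
]

def knn_regress(data, query, k):
    """Average the Y-values of the k rows closest to query (Euclidean distance)."""
    # sorted() is stable, so rows at equal distance keep their table order.
    by_distance = sorted(data, key=lambda row: math.dist(row[:2], query))
    nearest = by_distance[:k]
    return sum(y for _, _, y in nearest) / k

prediction = knn_regress(data, (2, 3), k=3)
print(round(prediction, 2))  # 2.83
```

The rows (1,2) and (3,4) are equally far from the query, but both have Y = 3, so whichever one is picked as the third neighbour gives the same average.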

### Question 8

**For any given dataset, comment on the bias of K-nearest classifiers upon increasing the value of K.**

a. The bias of the classifier decreases

b. The bias of the classifier does not change

c. The bias of the classifier increases

d. Can not be determined

**Answer:** c. The bias of the classifier increases

**Explanation:** As K increases, the classifier uses more neighbors to make a prediction. This tends to smooth out the predictions, increasing the bias (underfitting) while reducing the variance.
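The same trade-off can be observed empirically. The sketch below uses KNN regression rather than classification because bias and variance are easier to measure for numeric predictions; the target function, noise level, and query point are all assumptions chosen for illustration.

```python
import random
import statistics

def true_f(x):
    """Assumed ground-truth function for this toy experiment."""
    return x * x

def knn_predict(xs, ys, x0, k):
    """Average the targets of the k training inputs closest to x0 (1-D inputs)."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    return sum(ys[i] for i in nearest) / k

def bias_variance(k, x0=0.0, trials=500, noise_sd=0.5, seed=0):
    """Estimate squared bias and variance of the KNN prediction at x0."""
    rng = random.Random(seed)
    xs = [i / 10 for i in range(20)]  # fixed inputs 0.0, 0.1, ..., 1.9
    preds = []
    for _ in range(trials):
        # Each trial draws a fresh noisy training set on the same inputs.
        ys = [true_f(x) + rng.gauss(0, noise_sd) for x in xs]
        preds.append(knn_predict(xs, ys, x0, k))
    bias_sq = (statistics.fmean(preds) - true_f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance

bias_k1, var_k1 = bias_variance(k=1)
bias_k20, var_k20 = bias_variance(k=20)
print("k=1 :", bias_k1, var_k1)
print("k=20:", bias_k20, var_k20)
```

With k = 1 the prediction tracks a single noisy point (low bias, high variance); with k = 20 it averages the whole training set (high bias, low variance).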

### Question 9

**Bias and variance are given by:**

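**Answer:** $E[(\hat{f}(x) - E[\hat{f}(x)])^2] + (E[\hat{f}(x)] - f(x))^2$

**Explanation:** Here $\hat{f}(x)$ is the prediction of the learned model and $f(x)$ is the true function. Bias and variance come from decomposing the mean squared error. The first term is the variance, which measures how much the prediction fluctuates around its own average. The second term is the squared bias, which measures how far the average prediction is from the true value.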
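The decomposition can be derived by adding and subtracting the average prediction. In this sketch, $\hat{f}(x)$ denotes the learned model's prediction, $f(x)$ the true function, and the expectation is taken over training sets; label noise is left out for simplicity, which is an assumption of this derivation.

$$
\begin{aligned}
E\big[(\hat{f}(x) - f(x))^2\big]
&= E\big[(\hat{f}(x) - E[\hat{f}(x)] + E[\hat{f}(x)] - f(x))^2\big] \\
&= E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big] + (E[\hat{f}(x)] - f(x))^2 \\
&\quad + 2\,(E[\hat{f}(x)] - f(x))\,E\big[\hat{f}(x) - E[\hat{f}(x)]\big] \\
&= \underbrace{E\big[(\hat{f}(x) - E[\hat{f}(x)])^2\big]}_{\text{variance}} + \underbrace{(E[\hat{f}(x)] - f(x))^2}_{\text{bias}^2}
\end{aligned}
$$

The cross term vanishes because $E\big[\hat{f}(x) - E[\hat{f}(x)]\big] = 0$.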

### Question 10

**Which of the following statements are FALSE regarding bias and variance?**

a. Models which overfit have a high bias

b. Models which overfit have a low bias

c. Models which underfit have a high variance

d. Models which underfit have a low variance

**Answer:**

- a. Models which overfit have a high bias
- c. Models which underfit have a high variance

**Explanation:** Models that overfit have low bias but high variance, as they fit the training data too closely. Models that underfit have high bias but low variance, as they fail to capture the underlying trend in the data.