Please find the source code in the GitHub repository.
Q1 Comparison of Classifiers
Data Description
We use the letter dataset from Statlog. The statistics of the dataset are shown in Table 1. The data has 26 classes, one per capital letter.
Dataset | Size | #Features |
---|---|---|
Train | 15000 | 16 |
Test | 5000 | 16 |
Comparison of Classifiers
You are required to implement the following classifiers and compare the performance achieved by each.
Decision Tree
You should build decision trees on the dataset using both the entropy and gini criteria. For each criterion, set the max depth to 5, 10, 15, 20, and 25 in turn. You need to compare the performance (accuracy, precision, recall, F1 score, and training time) and give a brief discussion.
KNN, Random Forest
Apply two further classifiers, KNN and Random Forest, to the dataset. For each classifier, evaluate the performance (accuracy, precision, recall, F1 score, and training time). You are required to compare the performance of the different classifiers and give a brief discussion.
```python
import pandas as pd
```
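A minimal sketch of the data loading, assuming the train and test splits come as comma-separated files with the letter label in the first column (the file names below are placeholders, not from the report):

```python
import pandas as pd

# File names and column layout are assumptions for illustration.
train = pd.read_csv("letter_train.csv", header=None)
test = pd.read_csv("letter_test.csv", header=None)

# First column: letter label; remaining 16 columns: integer features.
X_train, y_train = train.iloc[:, 1:].values, train.iloc[:, 0].values
X_test, y_test = test.iloc[:, 1:].values, test.iloc[:, 0].values
```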
```python
def evaluate(y_predict, y_test):
    ...
```
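A plausible body for `evaluate`, assuming macro-averaged metrics from `sklearn.metrics` (the averaging choice is an assumption; the tables below report a single precision/recall value across all 26 classes):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

def evaluate(y_predict, y_test):
    # Macro averaging weights all 26 letter classes equally (assumption).
    return (accuracy_score(y_test, y_predict),
            precision_score(y_test, y_predict, average="macro"),
            recall_score(y_test, y_predict, average="macro"),
            f1_score(y_test, y_predict, average="macro"))
```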
Decision Tree
```python
import time
```
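A sketch of the experiment loop behind the table below, assuming the standard scikit-learn API and the `evaluate` helper above (`X_train`, `y_train`, `X_test`, `y_test` come from the loading step):

```python
import time
from sklearn.tree import DecisionTreeClassifier

for criterion in ["gini", "entropy"]:
    for depth in [5, 10, 15, 20, 25]:
        clf = DecisionTreeClassifier(criterion=criterion, max_depth=depth)
        start = time.time()
        clf.fit(X_train, y_train)          # time only the fitting step
        train_time = time.time() - start
        acc, prec, rec, f1 = evaluate(clf.predict(X_test), y_test)
        print(f"{criterion}, depth={depth}: acc={acc:.3f} prec={prec:.3f} "
              f"rec={rec:.3f} f1={f1:.3f} time={train_time:.3f}s")
```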
From the evaluation results above, we summarize the performance of the decision tree with different parameters in the following table (values kept to 3 significant digits).
Criterion | Max Depth | Accuracy | Precision | Recall | F1 | Training Time |
---|---|---|---|---|---|---|
gini | 5 | 0.369 | 0.392 | 0.368 | 0.327 | 0.0942s |
gini | 10 | 0.713 | 0.763 | 0.716 | 0.726 | 0.103s |
gini | 15 | 0.832 | 0.842 | 0.833 | 0.835 | 0.125s |
gini | 20 | 0.867 | 0.869 | 0.867 | 0.867 | 0.126s |
gini | 25 | 0.872 | 0.872 | 0.872 | 0.872 | 0.126s |
entropy | 5 | 0.501 | 0.561 | 0.501 | 0.498 | 0.086s |
entropy | 10 | 0.799 | 0.804 | 0.799 | 0.800 | 0.118s |
entropy | 15 | 0.874 | 0.875 | 0.875 | 0.875 | 0.132s |
entropy | 20 | 0.871 | 0.871 | 0.872 | 0.871 | 0.151s |
entropy | 25 | 0.874 | 0.874 | 0.874 | 0.874 | 0.157s |
From this table, we can see that for every max_depth setting, entropy performs better than gini. Performance (accuracy, precision, recall, and F1) generally improves as max_depth grows, at the cost of longer training time, though the gains level off once max_depth reaches about 15. The best accuracy the classifier achieves is about 0.874.
```python
from sklearn.neighbors import KNeighborsClassifier
```
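A sketch of the corresponding loop, again assuming scikit-learn and the helpers above:

```python
import time
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier

models = [
    ("KNN, n_neighbors=2", KNeighborsClassifier(n_neighbors=2)),
    ("KNN, n_neighbors=5", KNeighborsClassifier(n_neighbors=5)),
    ("KNN, n_neighbors=8", KNeighborsClassifier(n_neighbors=8)),
    ("RF, n_estimators=50", RandomForestClassifier(n_estimators=50)),
    ("RF, n_estimators=100", RandomForestClassifier(n_estimators=100)),
    ("RF, n_estimators=150", RandomForestClassifier(n_estimators=150)),
]
for name, clf in models:
    start = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - start
    acc, prec, rec, f1 = evaluate(clf.predict(X_test), y_test)
    print(f"{name}: acc={acc:.3f} f1={f1:.3f} time={train_time:.3f}s")
```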
We summarize the performance of the different classifiers in the following table. Note that we tune n_neighbors (the number of neighbors used in KNN queries) for KNN and n_estimators (the number of trees in the forest) for Random Forest. This gives 6 different classifiers.
Classifier | Accuracy | Precision | Recall | F1 | Training Time |
---|---|---|---|---|---|
KNN (n_neighbors = 2) | 0.947 | 0.948 | 0.949 | 0.947 | 0.198s |
KNN (n_neighbors = 5) | 0.952 | 0.952 | 0.952 | 0.952 | 0.189s |
KNN (n_neighbors = 8) | 0.948 | 0.949 | 0.948 | 0.948 | 0.187s |
Random Forest (n_estimators = 50) | 0.957 | 0.958 | 0.958 | 0.958 | 0.932s |
Random Forest (n_estimators = 100) | 0.960 | 0.961 | 0.961 | 0.961 | 1.813s |
Random Forest (n_estimators = 150) | 0.962 | 0.963 | 0.963 | 0.963 | 2.734s |
Because Random Forest is an ensemble method that trains multiple base classifiers and combines them, it needs much more time to train than KNN (2.734s is about 15x of 0.187s). The more trees in the forest, the longer the training time, but the better the performance. For KNN, training time is almost unaffected by n_neighbors, since the neighbor count only matters at query time. The best n_neighbors is 5, and the best accuracy KNN achieves is about 0.952, which is lower than Random Forest's best accuracy of 0.962.
In short, Random Forest reduces variance and performs better than KNN, but needs more time to train.
Q2 Implementation of AdaBoost
The following table shows the training dataset. It consists of 10 samples with labels in $\{+1, -1\}$.
x | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|---|
y | 1 | 1 | 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 |
We assume each weak classifier is produced by $x < v$ or $x > v$, where $v$ is a threshold chosen so that the classifier gets the best accuracy on the dataset. You should implement the AdaBoost algorithm to learn a strong classifier. Note that you CANNOT use an AdaBoost library; you must implement it manually.
You should also report the final expression of the strong classifier, such as $C^*(x) = \mathrm{sign}[\alpha_1 C_1(x) + \alpha_2 C_2(x) + \alpha_3 C_3(x) + \cdots]$, where $C_i(x)$ is a base classifier and $\alpha_i$ is its weight. You are also required to describe each base classifier in detail.
For simplicity, the threshold $v$ should be a multiple of 0.5, i.e., $v\%0.5==0$. For example, you can set $v$ to 2, 2.5, or 3, but you cannot set $v$ to 2.1.
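Recall the standard AdaBoost recipe: starting from uniform weights $w_i = 1/N$, each round $m$ picks the base classifier $C_m$ with the lowest weighted error and updates the sample weights:

$$\varepsilon_m = \sum_{i:\,C_m(x_i) \neq y_i} w_i, \qquad \alpha_m = \frac{1}{2}\ln\frac{1-\varepsilon_m}{\varepsilon_m}, \qquad w_i \leftarrow \frac{w_i\, e^{-\alpha_m y_i C_m(x_i)}}{Z_m},$$

where $Z_m$ normalizes the new weights to sum to 1, so misclassified points gain weight for the next round.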
```python
import math
```
We search from both directions to find the best base classifiers. First, we consider classifiers of the form $C(x) = +1$ if $x < v$ and $C(x) = -1$ otherwise.
With $v$ satisfying $v\%0.5==0$, it is clear that $v=2.5$ or $v=8.5$ achieves the lowest error rate, 0.3, misclassifying only 3 samples. We take these two classifiers as $C_1$ and $C_2$:
Then we consider classifiers of the opposite form, $C(x) = -1$ if $x < v$ and $C(x) = +1$ otherwise.
Similarly, we find that the classifier with $v=5.5$ has the smallest error rate, 0.4. We denote it by $C_3$:
```python
# Training Dataset
```
```
Classifier 1 (v = 2.5): x < 2.5 => y = +1, x >= 2.5 => y = -1
prediction: [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
error rate: 0.3
Classifier 2 (v = 8.5): x < 8.5 => y = +1, x >= 8.5 => y = -1
prediction: [1, 1, 1, 1, 1, 1, 1, 1, 1, -1]
error rate: 0.3
Classifier 3 (v = 5.5): x < 5.5 => y = -1, x >= 5.5 => y = +1
prediction: [-1, -1, -1, -1, -1, -1, 1, 1, 1, 1]
error rate: 0.4
```
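Written out as code, the toy dataset and the three base classifiers look like this (the function signatures are an assumption; the report only fixes the thresholds and orientations):

```python
import numpy as np

# Toy training set from the table above.
X = np.arange(10)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

def C1(x):  # x < 2.5 -> +1, else -1
    return np.where(x < 2.5, 1, -1)

def C2(x):  # x < 8.5 -> +1, else -1
    return np.where(x < 8.5, 1, -1)

def C3(x):  # x < 5.5 -> -1, else +1
    return np.where(x < 5.5, -1, 1)
```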
```python
class Adaboost:
    ...
```
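A minimal sketch of such a class, assuming each base classifier is a callable mapping an input array to $\pm 1$ predictions and following the textbook updates above. (The log below was produced with a different weight scaling, so the intermediate error rates and $\alpha$ values this sketch computes differ from the printed ones, though its final predictions agree.)

```python
import numpy as np

class Adaboost:
    """AdaBoost over a fixed pool of candidate base classifiers."""

    def __init__(self, base_classifiers, n_classifiers):
        self.base_classifiers = base_classifiers
        self.n_classifiers = n_classifiers
        self.alphas = []
        self.picked = []

    def fit(self, X, y):
        w = np.full(len(X), 1.0 / len(X))  # uniform initial weights
        for _ in range(self.n_classifiers):
            # Weighted error of every candidate; pick the best one.
            errors = [np.sum(w[clf(X) != y]) for clf in self.base_classifiers]
            best = int(np.argmin(errors))
            eps = errors[best]
            alpha = 0.5 * np.log((1 - eps) / eps)  # classifier weight
            clf = self.base_classifiers[best]
            # Misclassified points gain weight, then renormalize.
            w = w * np.exp(-alpha * y * clf(X))
            w = w / w.sum()
            self.alphas.append(alpha)
            self.picked.append(clf)
        return self

    def predict(self, X):
        score = sum(a * clf(X) for a, clf in zip(self.alphas, self.picked))
        return np.sign(score)

    def get_error_rate(self, X, y, w=None):
        # (Weighted) fraction of points the ensemble misclassifies.
        if w is None:
            w = np.full(len(X), 1.0 / len(X))
        return float(np.sum(np.asarray(w)[self.predict(X) != y]))
```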
```python
ada = Adaboost(base_classifiers=[C1, C2, C3], n_classifiers=3)
ada.fit(X, y)
```
```
C1 error rate = 0.030000000000000006
C2 error rate = 0.030000000000000006
C3 error rate = 0.04
weights = [0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 0.051546391752577324, 1.6666666666666665, 1.6666666666666665, 1.6666666666666665, 0.051546391752577324]
C1 error rate = 0.5
C2 error rate = 0.015463917525773196
C3 error rate = 0.02061855670103093
weights = [0.02617801047120419, 0.02617801047120419, 0.02617801047120419, 1.6666666666666667, 1.6666666666666667, 1.6666666666666667, 0.8464223385689353, 0.8464223385689353, 0.8464223385689353, 0.02617801047120419]
C1 error rate = 0.25392670157068065
C2 error rate = 0.5
C3 error rate = 0.010471204188481676
weights = [1.25, 1.25, 1.25, 0.8421516754850089, 0.8421516754850089, 0.8421516754850089, 0.42768959435626097, 0.42768959435626097, 0.42768959435626097, 1.25]
final ensemble classifier = 1.7380493449176364 * C1 + 2.07683056968926 * C2 + 2.2742999172498486 * C3
```
```python
ada.predict(X)
```

```
array([ 1.,  1.,  1., -1., -1., -1.,  1.,  1.,  1., -1.])
```
```python
ada.get_error_rate(X, y, weights)
```

```
0.0
```
In the end, we get a strong classifier $C^*(x)$ that achieves zero classification error, where

$$C^*(x) = \mathrm{sign}\left[1.738\, C_1(x) + 2.077\, C_2(x) + 2.274\, C_3(x)\right].$$

The classification result of $C^*(x)$ is [1, 1, 1, -1, -1, -1, 1, 1, 1, -1], which matches the true labels exactly.
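As a quick sanity check, take $x = 3$: $C_1(3) = -1$, $C_2(3) = +1$, $C_3(3) = -1$, so the ensemble score is $1.738\cdot(-1) + 2.077\cdot(+1) + 2.274\cdot(-1) = -1.935 < 0$, giving $C^*(3) = -1$, which matches the true label.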