1. Import the OS library.
import os
2. Set the working directory to "C:\Workshop\Data".
os.chdir(r"C:\Workshop\Data")
3. Import the pandas library as "pd".
import pandas as pd
4. Read the Iris CSV file into a data frame named iris.
iris = pd.read_csv("Iris.csv")
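read_csv parses the header row into column names automatically. A minimal self-contained sketch using an in-memory CSV (synthetic rows standing in for Iris.csv, column names assumed to match the workshop file):

```python
import io
import pandas as pd

# Two synthetic rows standing in for the workshop's Iris.csv
csv_text = (
    "Sepal_Length,Sepal_Width,Petal_Length,Petal_Width,Species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "6.3,3.3,6.0,2.5,virginica\n")

# read_csv works on any file-like object, not just a path
demo = pd.read_csv(io.StringIO(csv_text))
```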
1. Inspect the iris data set with the head function.
iris.head()
2. Import the matplotlib.pyplot library as "plt".
import matplotlib.pyplot as plt
3. Create a color palette containing three colors for setosa, versicolor, and virginica.
palette = {
'setosa':'#fb8072',
'versicolor':'#80b1d3',
'virginica':'#b3de69'}
4. Map the colors to each species of iris flower.
colors = iris.Species.apply(lambda x:palette[x])
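The lambda above works, but a pandas Series can also be mapped through a dict directly with Series.map, which does the same lookup. A small sketch on a synthetic series:

```python
import pandas as pd

# Same palette as in the workshop
palette = {
    'setosa': '#fb8072',
    'versicolor': '#80b1d3',
    'virginica': '#b3de69'}

# Synthetic species column for illustration
species = pd.Series(['setosa', 'virginica', 'setosa'])

# Series.map(dict) is equivalent to apply(lambda x: palette[x])
colors = species.map(palette)
```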
5. Create a scatterplot matrix of the iris data set colored by species.
Note: The trailing semicolon suppresses the text output so that only the plot is displayed.
pd.plotting.scatter_matrix(
frame = iris,
color = colors,
alpha = 1,
s = 100,
diagonal = "none");
6. Create a scatterplot of petal width (on the y-axis) vs. petal length (on the x-axis) colored by species.
plt.scatter(
x = iris.Petal_Length,
y = iris.Petal_Width,
color = colors)
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()
1. Create a data frame named X containing all features (i.e. the first four columns).
X = iris.iloc[:, 0:4]
2. Inspect the features data frame X using the head function.
X.head()
3. Create a series named y containing the Species labels.
y = iris.Species
4. Inspect the series of labels y using the head function.
y.head()
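The iloc slice above keeps every row and the first four columns by position. A sketch of the same pattern on a small synthetic data frame:

```python
import pandas as pd

# Synthetic frame: four feature columns plus a label column
demo = pd.DataFrame({
    'a': [1, 2], 'b': [3, 4], 'c': [5, 6], 'd': [7, 8],
    'label': ['x', 'y']})

# iloc[:, 0:4] selects all rows and columns 0 through 3
features = demo.iloc[:, 0:4]
labels = demo.label
```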
1. Import the numpy library as "np".
import numpy as np
2. Set the random number seed to 123.
np.random.seed(123)
3. Import the train_test_split function from sklearn.
from sklearn.model_selection import train_test_split
4. Randomly sample 100 rows for the training set and 50 rows for the test set.
X_train, X_test, y_train, y_test = train_test_split(
X,
y,
train_size = 0.67,
test_size = 0.33)
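One caveat: a purely random split can leave the three species unevenly represented in the training and test sets. train_test_split accepts a stratify argument that preserves the class proportions in both splits. A sketch on synthetic labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data: 30 rows, 3 perfectly balanced classes
X_demo = np.arange(30).reshape(30, 1)
y_demo = np.repeat(['a', 'b', 'c'], 10)

# stratify=y_demo keeps the class mix roughly equal in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.33,
    stratify=y_demo,
    random_state=123)
```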
5. Inspect the shape of the training and test sets using their shape property.
print("X_train: ", X_train.shape)
print("y_train: ", y_train.shape)
print("X_test: ", X_test.shape)
print("y_test: ", y_test.shape)
6. Question: How do you interpret these shapes in terms of columns and rows?
1. Import the KNN classifier class from sklearn.
from sklearn.neighbors import KNeighborsClassifier
2. Create a KNN model with k = 3.
knn_model = KNeighborsClassifier(
n_neighbors = 3)
3. Train the model using the training data.
knn_model.fit(
X = X_train,
y = y_train)
4. Predict the labels of the test set using the model.
knn_predictions = knn_model.predict(X_test)
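With k = 3, each prediction is a majority vote among the three nearest training points. A toy 1-D example (hypothetical data, chosen to make the vote obvious):

```python
from sklearn.neighbors import KNeighborsClassifier

# Toy data: 'low' points cluster near 0, 'high' points near 10
X_toy = [[0.0], [0.5], [1.0], [9.0], [9.5], [10.0]]
y_toy = ['low', 'low', 'low', 'high', 'high', 'high']

# Each query point is labeled by majority vote of its 3 nearest neighbors
model = KNeighborsClassifier(n_neighbors=3).fit(X_toy, y_toy)
pred = model.predict([[0.7], [9.2]])
```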
5. Create a confusion matrix for the predictions.
pd.crosstab(
y_test,
knn_predictions,
rownames = ['Reference'],
colnames = ['Predicted'])
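pd.crosstab builds the table from the raw label arrays; scikit-learn also ships confusion_matrix, which returns a plain array with rows as true labels and columns as predictions, in the order given by labels. A sketch with hypothetical labels:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels
y_true = ['setosa', 'versicolor', 'virginica', 'versicolor']
y_pred = ['setosa', 'virginica', 'virginica', 'versicolor']

# Rows: true labels; columns: predictions; both in the order of `labels`
cm = confusion_matrix(
    y_true, y_pred,
    labels=['setosa', 'versicolor', 'virginica'])
```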
6. Import the accuracy_score function from sklearn.
from sklearn.metrics import accuracy_score
7. Get the prediction accuracy.
knn_score = accuracy_score(
y_true = y_test,
y_pred = knn_predictions)
8. Inspect the prediction accuracy.
print(knn_score)
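accuracy_score is simply the fraction of predictions that match the true labels, i.e. the mean of an elementwise equality check. A sketch on hypothetical labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array(['a', 'b', 'b', 'a'])
y_pred = np.array(['a', 'b', 'a', 'a'])

# accuracy_score and the mean of the match mask agree
acc = accuracy_score(y_true, y_pred)
manual = np.mean(y_true == y_pred)
```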
9. Visualize the KNN predictions, with correct predictions in black and incorrect predictions in red.
plt.scatter(
x = X_test.Petal_Length,
y = X_test.Petal_Width,
color = np.where(
y_test == knn_predictions,
'black',
'red'))
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()
10. Question: Why do you think these two points were misclassified?
1. Import the decision tree classifier from sklearn.
from sklearn.tree import DecisionTreeClassifier
2. Create a decision tree classifier with max_depth = 3.
tree_model = DecisionTreeClassifier(
max_depth = 3)
3. Train the model using the training data.
tree_model.fit(
X = X_train,
y = y_train)
4. Import the tree visualizer from sklearn.
from sklearn.tree import export_graphviz
5. Visualize the decision tree.
import graphviz
tree_graph = export_graphviz(
decision_tree = tree_model,
feature_names = list(X_train.columns.values),
class_names = list(tree_model.classes_),
out_file = None)
graphviz.Source(tree_graph)
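If Graphviz is not installed, recent scikit-learn versions also provide sklearn.tree.plot_tree, which renders the tree with matplotlib alone. A sketch using scikit-learn's built-in copy of the iris data (so it stands alone; the Agg backend is only there to keep the sketch headless):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = load_iris()
model = DecisionTreeClassifier(max_depth=3).fit(data.data, data.target)

# plot_tree draws the tree onto a matplotlib axes; no Graphviz needed
fig, ax = plt.subplots(figsize=(10, 6))
annotations = plot_tree(
    model,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    ax=ax)
```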
6. Question: Are you able to read and follow the logic of this decision tree?
7. Predict the labels of the test set with the model.
tree_predictions = tree_model.predict(X_test)
8. Get the prediction accuracy.
tree_score = accuracy_score(
y_true = y_test,
y_pred = tree_predictions)
9. Inspect the prediction accuracy.
print(tree_score)
10. Visualize the prediction errors (in red).
plt.scatter(
x = X_test.Petal_Length,
y = X_test.Petal_Width,
color = np.where(
y_test == tree_predictions,
'black',
'red'))
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()
1. Import the standard scaler from sklearn.
from sklearn.preprocessing import StandardScaler
2. Create a standard scaler.
scaler = StandardScaler()
3. Fit the scaler to the full feature data frame X (training and test rows together).
scaler.fit(X)
4. Scale the training and test set.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
5. Import the neural network classifier from sklearn.
from sklearn.neural_network import MLPClassifier
6. Create a neural network classifier with one hidden layer of 4 tanh units.
neural_model = MLPClassifier(
hidden_layer_sizes = (4,),
activation = "tanh",
max_iter = 2000)
7. Train the model using the training data.
neural_model.fit(
X = X_train_scaled,
y = y_train)
8. Predict the test set labels using the model.
neural_predictions = neural_model.predict(X_test_scaled)
9. Get the prediction accuracy.
neural_score = accuracy_score(
y_true = y_test,
y_pred = neural_predictions)
10. Inspect the prediction accuracy.
print(neural_score)
11. Visualize the prediction errors (in red).
plt.scatter(
x = X_test.Petal_Length,
y = X_test.Petal_Width,
color = np.where(
y_test == neural_predictions,
'black',
'red'))
plt.xlabel("Petal Length")
plt.ylabel("Petal Width")
plt.show()
1. Compare the accuracy of all three models.
print("KNN: ", knn_score)
print("Tree:", tree_score)
print("NNet:", neural_score)
2. Question: Which of these three classifiers would you choose? Why?
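A single 50-row test set gives a fairly noisy accuracy estimate, so small differences between the three scores may not be meaningful. cross_val_score averages several train/test splits instead. A sketch using scikit-learn's built-in iris data so it stands alone:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X_demo, y_demo = load_iris(return_X_y=True)

# 5-fold cross-validation yields five accuracy estimates instead of one
scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=3),
    X_demo, y_demo, cv=5)
mean_score = scores.mean()
```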