1. Explore the Data

  1. Set the working directory to “C:/Workshop/Data”
setwd("C:/Workshop/Data")
  1. Load the Rates.csv data into a dataframe called “policies”
policies <- read.csv("Rates.csv")
  1. Load the dplyr package
library(dplyr)
  1. Convert the policies data to numeric values with the following code, which encodes Gender as a number and drops the State column.
policiesNumeric <- policies %>%
  # factor() first, so this also works when read.csv() returns strings (R >= 4.0)
  mutate(Gender = as.numeric(factor(Gender))) %>%
  select(-State)
  1. Inspect the results
head(policiesNumeric)
##   Gender State.Rate Height Weight      BMI Age       Rate
## 1      2 0.10043368    184   67.8 20.02599  77 0.33200000
## 2      2 0.14172319    163   89.4 33.64824  82 0.86914779
## 3      2 0.09080315    170   81.2 28.09689  31 0.01000000
## 4      2 0.11997276    175   99.7 32.55510  39 0.02153204
## 5      2 0.11034460    184   72.1 21.29608  68 0.14975000
## 6      2 0.16292470    166   98.4 35.70910  64 0.21123703
  1. Question: Why do you think we are converting all policy data to numeric values before using our clustering algorithms?
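
One way to approach this question: the clustering functions used later compare rows by distance, and distances are only defined for numbers. The sketch below illustrates the idea on a tiny made-up data frame; the name `toy` and its values are assumptions for illustration and do not come from Rates.csv.

```r
# Hypothetical toy data; the values are made up for illustration.
toy <- data.frame(
  Gender = c("Male", "Female", "Male"),
  Age    = c(77, 82, 31),
  stringsAsFactors = FALSE)

# Converting through factor() maps each label to an integer code.
# Levels are sorted alphabetically, so Female = 1 and Male = 2.
toy$Gender <- as.numeric(factor(toy$Gender))

# With every column numeric, pairwise Euclidean distances are defined.
dist(toy)
```

Note that the integer codes are arbitrary labels, so the distance between two genders carries no real-world meaning beyond "same" or "different".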

2. Create Equal Size Clusters using a Single Variable

  1. Load the RColorBrewer package
library(RColorBrewer)
  1. Create a “Set2” color palette with 3 colors
palette <- brewer.pal(3, "Set2")
  1. Cut the policies' mortality rates into 3 equal-width intervals (clusters)
cuts <- cut(policiesNumeric$Rate, 3)
  1. Create a scatterplot matrix colored by mortality rate clusters.
plot(
  x = policiesNumeric, 
  col = palette[cuts],
  pch = 19)

  1. Question: How might these three market segments (based on Mortality Rates) be useful?
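
To see what `cut` does in the steps above, here is a minimal sketch on a short made-up vector. The names `rates`, `bins`, and `colors` and all values are assumptions for illustration, not part of the workshop data.

```r
# Made-up rates for illustration only.
rates <- c(0.01, 0.02, 0.15, 0.21, 0.33, 0.87)

# cut() with a single number splits the range of the data into that
# many equal-width intervals and returns a factor of interval labels.
bins <- cut(rates, 3)

# Count how many values fall into each interval.
table(bins)

# Indexing a vector by a factor uses the factor's integer codes,
# which is why palette[cuts] assigns one color per interval.
colors <- c("red", "green", "blue")[bins]
colors
```

The intervals have equal width, but the number of observations in each one can differ widely when the data are skewed, as they are in this toy vector.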

3. Create Clusters using k-Means

  1. Set the seed to 42 to make randomness reproducible.
set.seed(42)
  1. Create k-means clusters with 3 centers and 10 random starts.
kClusters <- kmeans(
  x = policiesNumeric, 
  centers = 3, 
  nstart = 10)
  1. Create a scatterplot matrix colored by cluster.
plot(
  x = policies, 
  col = palette[kClusters$cluster])

  1. Create a scatterplot of BMI vs Age colored by cluster.
plot(
  x = policiesNumeric$BMI,
  y = policiesNumeric$Age, 
  col = palette[kClusters$cluster])

  1. Plot the centroids of the clusters.
plot(
  x = policiesNumeric$BMI,
  y = policiesNumeric$Age, 
  col = palette[kClusters$cluster])
  
points(
  x = kClusters$centers[, "BMI"], 
  y = kClusters$centers[, "Age"],
  pch = 4, 
  lwd = 4, 
  col = "blue")

  1. Label each centroid with its cluster number.
plot(
  x = policiesNumeric$BMI,
  y = policiesNumeric$Age, 
  col = palette[kClusters$cluster])

text(
  x = kClusters$centers[, "BMI"], 
  y = kClusters$centers[, "Age"],
  labels = c(1, 2, 3),
  cex = 4, 
  lwd = 4, 
  col = "blue")
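
The plots above use two components of the object returned by `kmeans`: `cluster` and `centers`. As a rough sketch of how to inspect them, the example below fits k-means to synthetic data. The data frame `toy`, its two columns, and the choice of 2 centers are assumptions for illustration and stand in for `policiesNumeric`.

```r
set.seed(42)

# Hypothetical stand-in for policiesNumeric: two made-up numeric columns
# drawn from two well-separated groups.
toy <- data.frame(
  BMI = c(rnorm(20, mean = 22, sd = 1), rnorm(20, mean = 34, sd = 1)),
  Age = c(rnorm(20, mean = 30, sd = 3), rnorm(20, mean = 70, sd = 3)))

fit <- kmeans(x = toy, centers = 2, nstart = 10)

fit$size            # number of rows assigned to each cluster
fit$centers         # one row per cluster, one column per variable
table(fit$cluster)  # the same counts, from the per-row assignments
```

The cluster numbers themselves are arbitrary: they identify groups but have no inherent order, so the same data can come back with the labels swapped under a different seed.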