Using unsupervised learning techniques to divide customers into meaningful groups.

| Variable | Median | Mean | SD | Min | Max | NAs |
|---|---|---|---|---|---|---|
| age | 39 | 41.2 | 12.7 | 19 | 80 | 0 |
| female | 1 | 0.5 | 0.5 | 0 | 1 | 0 |
| income | 52,014 | 50,936.5 | 20,137.5 | −5,183 | 114,278 | 0 |
| kids | 1 | 1.3 | 1.4 | 0 | 7 | 0 |
| own_home | 0 | 0.5 | 0.5 | 0 | 1 | 0 |
| subscribe | 0 | 0.1 | 0.3 | 0 | 1 | 0 |

| cluster | n | age | income | kids | own_home | subscribe | female |
|---|---|---|---|---|---|---|---|
| 1 | 100 | 55 | 60,107 | 0 | 79% | 8% | 51% |
| 2 | 101 | 38 | 56,016 | 3 | 41% | 5% | 77% |
| 3 | 99 | 28 | 28,270 | 1 | 21% | 27% | 28% |



| cluster | n | age | income | kids | own_home | subscribe | female |
|---|---|---|---|---|---|---|---|
| 1 | 99 | 28 | 28,270 | 1 | 21% | 27% | 28% |
| 2 | 101 | 38 | 56,016 | 3 | 41% | 5% | 77% |
| 3 | 100 | 55 | 60,107 | 0 | 79% | 8% | 51% |

| cluster | n | age | income | kids | own_home | subscribe | female |
|---|---|---|---|---|---|---|---|
| 1 | 169 | 41 | 52,298 | 1 | 51% | 15% | 52% |
| 2 | 63 | 48 | 73,241 | 0 | 52% | 8% | 63% |
| 3 | 68 | 25 | 22,979 | 1 | 32% | 15% | 43% |

| segment | n | age | income | kids | own_home | subscribe | female |
|---|---|---|---|---|---|---|---|
| 1 | 181 | 45 | 57,037 | 0 | 60% | 13% | 49% |
| 2 | 59 | 38 | 54,509 | 3 | 44% | 2% | 83% |
| 3 | 60 | 25 | 23,116 | 1 | 10% | 27% | 32% |

| segment | n | age | income | kids | own_home | subscribe | female |
|---|---|---|---|---|---|---|---|
| 1 | 96 | 31 | 28,793 | 1 | 21% | 28% | 26% |
| 2 | 103 | 54 | 60,168 | 0 | 80% | 9% | 52% |
| 3 | 101 | 38 | 55,847 | 3 | 39% | 4% | 77% |

| segment | n | age | income | kids | own_home | subscribe | female |
|---|---|---|---|---|---|---|---|
| 1 | 104 | 55 | 58,215 | 0 | 77% | 11% | 42% |
| 2 | 126 | 37 | 55,613 | 2 | 33% | 8% | 72% |
| 3 | 70 | 25 | 24,872 | 1 | 29% | 27% | 31% |

The overlap between segments may come from the choices I made when converting the variables to categorical data. For simplicity, I split each numeric variable at its median, but other cut points might separate the segments better.
```r
# Create a copy of our base dataset
lat_clust_data <- df
# Transform each variable into a categorical (or factor) variable;
# numeric variables are split at their median (1 = below, 2 = at or above)
lat_clust_data$age <- factor(ifelse(lat_clust_data$age < median(lat_clust_data$age), 1, 2))
lat_clust_data$income <- factor(ifelse(lat_clust_data$income < median(lat_clust_data$income), 1, 2))
lat_clust_data$kids <- factor(ifelse(lat_clust_data$kids < median(lat_clust_data$kids), 1, 2))
# The binary variables only need to be converted to factors
lat_clust_data$own_home <- factor(lat_clust_data$own_home)
lat_clust_data$female <- factor(lat_clust_data$female)
lat_clust_data$subscribe <- factor(lat_clust_data$subscribe)
```
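
One alternative to the median split is to cut each numeric variable at quantile breakpoints so that it gets more than two levels. The sketch below is only an illustration of that idea, not what I ran for the results above: `bin_by_quantile` is a hypothetical helper, the choice of three bins is arbitrary, and `toy` is simulated data standing in for the real dataset.

```r
# Hypothetical helper (not part of the original analysis): bin a numeric
# vector into up to n groups using quantile breakpoints.
bin_by_quantile <- function(x, n = 3) {
  probs <- seq(0, 1, length.out = n + 1)
  # unique() drops repeated breakpoints, so heavily tied variables
  # (such as a count of kids) may end up with fewer than n bins
  breaks <- unique(quantile(x, probs = probs, na.rm = TRUE))
  # include.lowest = TRUE keeps the minimum value inside the first bin;
  # labels = FALSE returns integer bin codes, which we convert to a factor
  factor(cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE))
}

# Simulated stand-in data (assumption: NOT the dataset used in this post)
set.seed(42)
toy <- data.frame(
  age    = round(rnorm(300, mean = 41, sd = 12)),
  income = round(rnorm(300, mean = 51000, sd = 20000)),
  kids   = rpois(300, lambda = 1.3)
)

# Replace each numeric column with its binned version
toy_binned <- toy
toy_binned$age    <- bin_by_quantile(toy$age)
toy_binned$income <- bin_by_quantile(toy$income)
toy_binned$kids   <- bin_by_quantile(toy$kids)

# Inspect how many observations fall into each bin
print(table(toy_binned$age))
print(table(toy_binned$kids))
```

More bins keep more of the original information, but they also add parameters to a categorical clustering model, so there is a trade-off worth checking before settling on a number.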



