Using logistic regression approaches to solve a retention question.

Auto Case
| variable | Median | Mean | SD | Min | Max | N | NA |
|---|---|---|---|---|---|---|---|
| engine_size | 4 | 4.06 | 1.53 | 1.00 | 7.00 | 100 | 0 |
| horsepower | 4 | 4.11 | 1.44 | 1.00 | 7.00 | 100 | 0 |
| intent | 5 | 4.48 | 1.45 | 1.00 | 7.00 | 100 | 0 |
| sound_system | 4 | 3.51 | 1.13 | 1.00 | 7.00 | 100 | 0 |
| torque | 4 | 3.75 | 1.39 | 1.00 | 7.00 | 100 | 0 |
| column | intent | engine_size | horsepower | torque | sound_system |
|---|---|---|---|---|---|
| intent | 1.00 | 0.68 | 0.68 | 0.69 | 0.52 |
| engine_size | 0.68 | 1.00 | 0.91 | 0.78 | −0.15 |
| horsepower | 0.68 | 0.91 | 1.00 | 0.68 | −0.08 |
| torque | 0.69 | 0.78 | 0.68 | 1.00 | −0.10 |
| sound_system | 0.52 | −0.15 | −0.08 | −0.10 | 1.00 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | −1.83 | 0.18 | −9.95 | 0.00 | * |
| engine_size | 0.17 | 0.08 | 2.25 | 0.03 | * |
| horsepower | 0.29 | 0.07 | 4.33 | 0.00 | * |
| torque | 0.43 | 0.05 | 9.52 | 0.00 | * |
| sound_system | 0.79 | 0.04 | 22.45 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.93 | 0.93 | 0.39 | 317.54 | 0.00 | 4 |
VIF
| variable | value |
|---|---|
| engine_size | 8.66 |
| horsepower | 6.22 |
| torque | 2.58 |
| sound_system | 1.04 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 1.87 | 0.31 | 6.10 | 0.00 |
| engine_size | 0.64 | 0.07 | 9.13 | 0.00 |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.46 | 0.45 | 1.07 | 83.28 | 0.00 | 1 |
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 2.12 | 0.41 | 5.21 | 0.00 |
| sound_system | 0.67 | 0.11 | 6.11 | 0.00 |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.28 | 0.27 | 1.24 | 37.29 | 0.00 | 1 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | −1.36 | 0.25 | −5.41 | 0.00 | * |
| engine_size | 0.73 | 0.04 | 19.89 | 0.00 | * |
| sound_system | 0.82 | 0.05 | 16.45 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.86 | 0.85 | 0.55 | 291.39 | 0.00 | 2 |
VIF
| variable | value |
|---|---|
| engine_size | 1.02 |
| sound_system | 1.02 |
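
As a quick check on the VIF values above: with exactly two predictors, the VIF of each reduces to 1 / (1 − r²), where r is their pairwise correlation. The sketch below assumes r = −0.15, the rounded engine_size and sound_system correlation from the correlation table; the variable names are illustrative only.

```python
# Pairwise correlation between engine_size and sound_system,
# copied from the correlation table (rounded to two decimals).
r = -0.15

# With exactly two predictors, regressing one on the other gives R^2 = r^2,
# so both predictors share the same VIF = 1 / (1 - r^2).
r_squared = r ** 2
vif = 1.0 / (1.0 - r_squared)

print(f"VIF = {vif:.2f}")  # VIF = 1.02
```

This shortcut only holds for two-predictor models. With three or more predictors, each VIF depends on the R² from regressing that predictor on all the others.
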
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | −1.70 | 0.20 | −8.61 | 0.00 | * |
| engine_size | 0.45 | 0.04 | 9.95 | 0.00 | * |
| torque | 0.41 | 0.05 | 8.27 | 0.00 | * |
| sound_system | 0.81 | 0.04 | 21.29 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.92 | 0.91 | 0.43 | 352.03 | 0.00 | 3 |
VIF
| variable | value |
|---|---|
| engine_size | 2.56 |
| torque | 2.53 |
| sound_system | 1.02 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | −1.44 | 0.25 | −5.76 | 0.00 | * |
| engine_size | 0.56 | 0.09 | 6.24 | 0.00 | * |
| horsepower | 0.20 | 0.09 | 2.18 | 0.03 | * |
| sound_system | 0.80 | 0.05 | 16.37 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.86 | 0.86 | 0.54 | 203.35 | 0.00 | 3 |
VIF
| variable | value |
|---|---|
| engine_size | 6.19 |
| horsepower | 6.10 |
| sound_system | 1.04 |
| column | profit_20 | app | age | inc | tenure | region |
|---|---|---|---|---|---|---|
| profit_20 | 1.00 | 0.01 | 0.14 | 0.15 | 0.17 | 0.00 |
| app | 0.01 | 1.00 | −0.17 | 0.09 | −0.08 | 0.01 |
| age | 0.14 | −0.17 | 1.00 | −0.08 | 0.42 | −0.03 |
| inc | 0.15 | 0.09 | −0.08 | 1.00 | 0.03 | 0.03 |
| tenure | 0.17 | −0.08 | 0.42 | 0.03 | 1.00 | −0.01 |
| region | 0.00 | 0.01 | −0.03 | 0.03 | −0.01 | 1.00 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 19.82 | 6.70 | 2.96 | 0.00 | * |
| app | 18.77 | 5.84 | 3.21 | 0.00 | * |
| region_1200 | 15.10 | 6.55 | 2.30 | 0.02 | * |
| region_1300 | 11.21 | 8.15 | 1.37 | 0.17 | |
| tenure | 0.92 | 0.23 | 4.02 | 0.00 | * |
| profit_20 | 0.83 | 0.01 | 118.55 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.36 | 0.36 | 312.04 | 2,968.07 | 0.00 | 5 |
VIF
| variable | value |
|---|---|
| app | 1.01 |
| region_1200 | 2.04 |
| region_1300 | 2.03 |
| tenure | 1.04 |
| profit_20 | 1.04 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 32.96 | 3.23 | 10.20 | 0.00 | * |
| app | 19.44 | 5.83 | 3.33 | 0.00 | * |
| tenure | 0.90 | 0.23 | 3.94 | 0.00 | * |
| profit_20 | 0.83 | 0.01 | 118.81 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.36 | 0.36 | 312.06 | 4,944.31 | 0.00 | 3 |
VIF
| variable | value |
|---|---|
| app | 1.01 |
| tenure | 1.04 |
| profit_20 | 1.04 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 0.76 | 0.00 | 224.82 | 0.00 | * |
| app | 0.03 | 0.01 | 5.10 | 0.00 | * |
| tenure | 0.01 | 0.00 | 25.57 | 0.00 | * |
| profit_20 | 0.00 | 0.00 | 7.17 | 0.00 | * |
Model Fit
| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.03 | 0.03 | 0.37 | 272.26 | 0.00 | 3 |
| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 1.03 | 0.02 | 42.71 | 0.00 | * |
| app | 0.23 | 0.05 | 4.81 | 0.00 | * |
| profit_20 | 0.00 | 0.00 | 7.38 | 0.00 | * |
| tenure | 0.06 | 0.00 | 24.78 | 0.00 | * |
| term | estimate | std.error | statistic | p.value | odds | p | sig |
|---|---|---|---|---|---|---|---|
| (Intercept) | 1.03 | 0.02 | 42.71 | 0.00 | 2.81 | 0.74 | * |
| app | 0.23 | 0.05 | 4.81 | 0.00 | 1.26 | 0.56 | * |
| profit_20 | 0.00 | 0.00 | 7.38 | 0.00 | 1.00 | 0.50 | * |
| tenure | 0.06 | 0.00 | 24.78 | 0.00 | 1.06 | 0.51 | * |
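
The odds and p columns appear consistent with the usual logit conversions, odds = exp(estimate) and p = odds / (1 + odds). The sketch below applies them to the rounded estimates from the table. The function and variable names are hypothetical, not taken from the original analysis.

```python
import math

# Coefficients (log-odds) as reported in the table, rounded to two decimals.
estimates = {"(Intercept)": 1.03, "app": 0.23, "profit_20": 0.00, "tenure": 0.06}


def to_odds_and_p(log_odds):
    """Convert a logit coefficient to odds and to p = odds / (1 + odds)."""
    odds = math.exp(log_odds)
    return odds, odds / (1.0 + odds)


converted = {term: to_odds_and_p(b) for term, b in estimates.items()}
for term, (odds, p) in converted.items():
    print(f"{term:12s} odds={odds:.2f} p={p:.2f}")
```

Because the inputs are already rounded, the results can differ from the table in the last digit. The intercept, for example, gives odds of about 2.80 here against 2.81 in the table, presumably because the table was computed from the unrounded coefficient.
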
| term | estimate | std.error | statistic | p.value | odds | p | sig |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.92 | 0.03 | 30.75 | 0.00 | 2.51 | 0.72 | * |
| app | 0.23 | 0.05 | 4.81 | 0.00 | 1.26 | 0.56 | * |
| profit_20 | 1.18 | 0.16 | 7.38 | 0.00 | 3.25 | 0.76 | * |
| tenure | 0.06 | 0.00 | 24.78 | 0.00 | 1.06 | 0.51 | * |
In the data used for the initial logistic regression model, 83% of customers were retained. This class imbalance produced a lopsided model that predicted every case in the test set as retained, when in fact 16% were not. To compensate, I trained a new model on a training set with equal proportions of retained and churned customers, created by undersampling the retained cases.
| term | estimate | std.error | statistic | p.value | odds | p | sig |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.08 | 0.02 | 3.40 | 0.00 | 1.08 | 0.52 | * |
| app | 0.07 | 0.02 | 3.21 | 0.00 | 1.08 | 0.52 | * |
| profit_20 | 0.14 | 0.03 | 5.44 | 0.00 | 1.15 | 0.54 | * |
| tenure | 0.46 | 0.03 | 17.20 | 0.00 | 1.58 | 0.61 | * |
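
The undersampling step described above can be sketched as follows. This is a minimal illustration using only the standard library, not the code used for the analysis. The `undersample` helper, the `retained` field name, and the 83/17 split of made-up rows are all assumptions.

```python
import random


def undersample(rows, label_key, seed=0):
    """Randomly drop rows from larger classes so every class matches the smallest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        # Sample without replacement down to the minority-class size.
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


# Made-up training rows: 83 retained customers and 17 churned ones.
train = [{"id": i, "retained": 1} for i in range(83)]
train += [{"id": 83 + i, "retained": 0} for i in range(17)]

balanced_train = undersample(train, "retained")
n_retained = sum(row["retained"] for row in balanced_train)
print(len(balanced_train), n_retained)  # 34 17
```

Only the training rows are rebalanced here. The test set would normally keep its original class proportions, so that evaluation reflects the real retention rate.
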
