Leveraging linear regression to solve a critical customer marketing challenge.
app is the experimental treatment: 1 means the customer used the app, and 0 means they did not. The table below reports a two-sample t-test of profit_20 between the two groups.

| estimate | estimate1 | estimate2 | statistic | p.value | conf.low | conf.high |
|---|---|---|---|---|---|---|
| −5.88 | 110.79 | 116.67 | −1.21 | 0.23 | −15.39 | 3.63 |

| Statistic | N | Mean | SD | Min | Max | NA |
|---|---|---|---|---|---|---|
| age | 31,634 | 4.05 | 1.64 | 1.00 | 7.00 | 8,289 |
| app | 31,634 | 0.12 | 0.33 | 0.00 | 1.00 | 0 |
| id | 31,634 | 15,817.50 | 9,132.09 | 1.00 | 31,634.00 | 0 |
| inc | 31,634 | 5.46 | 2.35 | 1.00 | 9.00 | 8,261 |
| profit_20 | 31,634 | 111.50 | 272.84 | -221.00 | 2,071.00 | 0 |
| profit_21 | 31,634 | 144.83 | 389.99 | -5,643.00 | 27,086.00 | 5,238 |
| region | 31,634 | 1,203.19 | 47.91 | 1,100.00 | 1,300.00 | 0 |
| tenure | 31,634 | 10.16 | 8.45 | 0.16 | 41.16 | 0 |

Dependent variable: profit_20

| | App Only |
|---|---|
| app | 5.88 (4.69) |
| Constant | 110.79*** (1.64) |
| Observations | 31,634 |
| R² | 0.0000 |
| Adjusted R² | 0.0000 |
| Residual Std. Error | 272.84 (df = 31632) |
| F Statistic | 1.57 (df = 1; 31632) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 |

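
The App Only regression restates the t-test: with a single 0/1 predictor, the OLS intercept is the mean of the app = 0 group (110.79) and the slope is the difference between the two group means (116.67 − 110.79 = 5.88). The sketch below checks that identity on a small invented data set. The function name and the toy numbers are assumptions for illustration, not the article's data or code.

```python
from statistics import mean

def ols_single_dummy(y, d):
    """Closed-form OLS of y on an intercept and one 0/1 dummy d."""
    d_bar, y_bar = mean(d), mean(y)
    sxy = sum((di - d_bar) * (yi - y_bar) for di, yi in zip(d, y))
    sxx = sum((di - d_bar) ** 2 for di in d)
    slope = sxy / sxx
    intercept = y_bar - slope * d_bar
    return intercept, slope

# Toy data, invented for illustration only
profit = [100.0, 120.0, 90.0, 130.0, 110.0, 140.0]
app = [0, 0, 0, 1, 1, 0]

intercept, slope = ols_single_dummy(profit, app)
mean_no_app = mean([p for p, a in zip(profit, app) if a == 0])
mean_app = mean([p for p, a in zip(profit, app) if a == 1])

# The intercept equals the app = 0 mean; the slope equals the mean difference
print(intercept, mean_no_app)
print(slope, mean_app - mean_no_app)
```

Because the two approaches estimate the same quantity, the insignificant app coefficient here and the t-test p-value of 0.23 tell the same story.
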
| Dependent variable: | ||
| profit_20 | ||
| App Only | App + Age | |
| (1) | (2) | |
| app | 5.88 (4.69) | 27.19*** (5.52) |
| age | 25.86*** (1.12) | |
| Constant | 110.79*** (1.64) | 17.08*** (5.06) |
| Observations | 31,634 | 23,345 |
| R2 | 0.0000 | 0.02 |
| Adjusted R2 | 0.0000 | 0.02 |
| Residual Std. Error | 272.84 (df = 31632) | 278.29 (df = 23342) |
| F Statistic | 1.57 (df = 1; 31632) | 264.95*** (df = 2; 23342) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | |
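Create a new dummy variable, age_exists, that equals 1 when age is recorded and 0 when it is missing, and add it as a predictor. If its coefficient is statistically significant, customers with missing age differ systematically in profit, so we should be cautious about dropping rows with missing values.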

| Dependent variable: | ||
| profit_20 | ||
| App Only | App + Age Exists | |
| (1) | (2) | |
| app | 5.88 (4.69) | 3.56 (4.68) |
| age_exists | 52.14*** (3.48) | |
| Constant | 110.79*** (1.64) | 72.59*** (3.03) |
| Observations | 31,634 | 31,634 |
| R2 | 0.0000 | 0.01 |
| Adjusted R2 | 0.0000 | 0.01 |
| Residual Std. Error | 272.84 (df = 31632) | 271.88 (df = 31631) |
| F Statistic | 1.57 (df = 1; 31632) | 113.14*** (df = 2; 31631) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | |
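
A minimal sketch of how such an indicator and the filled-in age columns might be built. It assumes, based only on the column names, that age_zero replaces missing ages with 0 and age_avg replaces them with the mean of the observed ages. The helper name and the sample values are hypothetical.

```python
from statistics import mean

def add_missing_indicator(values):
    """Return (exists, zero_filled, mean_filled) for a list that uses None for missing."""
    observed = [v for v in values if v is not None]
    avg = mean(observed)  # mean of the non-missing values only
    exists = [0 if v is None else 1 for v in values]
    zero_filled = [0 if v is None else v for v in values]
    mean_filled = [avg if v is None else v for v in values]
    return exists, zero_filled, mean_filled

# Toy age codes with two missing entries, invented for illustration
age = [3, None, 5, 4, None]
age_exists, age_zero, age_avg = add_missing_indicator(age)

print(age_exists)
print(age_zero)
print(age_avg)
```

Keeping the indicator in the model lets every row stay in the sample while the filled-in value is absorbed by the age_exists coefficient.
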
| Dependent variable: | ||
| profit_20 | ||
| Age Zero | Age Avg. | |
| (1) | (2) | |
| app | 19.65*** (4.69) | 19.65*** (4.69) |
| age_exists | -51.85*** (5.60) | 51.74*** (3.45) |
| age_zero | 25.60*** (1.09) | |
| age_avg | 25.60*** (1.09) | |
| Constant | 70.93*** (3.00) | -32.66*** (5.38) |
| Observations | 31,634 | 31,634 |
| R2 | 0.02 | 0.02 |
| Adjusted R2 | 0.02 | 0.02 |
| Residual Std. Error (df = 31630) | 269.52 | 269.52 |
| F Statistic (df = 3; 31630) | 262.12*** | 262.12*** |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | |
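
Columns (1) and (2) share the same app coefficient, age slope, and fit statistics, and differ only in the Constant and age_exists rows. A short derivation suggests why. It assumes that age_zero fills missing ages with 0 and age_avg fills them with the mean m of the observed ages. The symbols c, a, e, and b (constant, app, age_exists, and age coefficients, subscripted by column) are introduced here for illustration.

```latex
\begin{aligned}
\text{age\_avg} &= \text{age\_zero} + m\,(1 - \text{age\_exists}) \\[4pt]
\hat{y} &= c_2 + a\,\text{app} + e_2\,\text{age\_exists} + b\,\text{age\_avg} \\
        &= (c_2 + b\,m) + a\,\text{app} + (e_2 - b\,m)\,\text{age\_exists} + b\,\text{age\_zero} \\[4pt]
\Rightarrow\quad c_1 &= c_2 + b\,m, \qquad e_1 = e_2 - b\,m
\end{aligned}
```

The two specifications therefore describe the same fitted model. Taking b = 25.60 and m ≈ 4.05 (the mean of age in the summary table) gives b·m ≈ 103.7, which matches the reported gaps of 70.93 − (−32.66) = 103.59 and 51.74 − (−51.85) = 103.59 up to rounding.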
| Dependent variable: | |||
| profit_20 | |||
| Age Zero | Age Avg | Age RF | |
| (1) | (2) | (3) | |
| app | 19.65*** (4.69) | 19.65*** (4.69) | 27.31*** (4.69) |
| age_exists | -51.85*** (5.60) | 51.74*** (3.45) | 47.47*** (3.44) |
| age_zero | 25.60*** (1.09) | ||
| age_avg | 25.60*** (1.09) | ||
| age_rf | 30.61*** (1.07) | ||
| Constant | 70.93*** (3.00) | -32.66*** (5.38) | -49.64*** (5.22) |
| Observations | 31,634 | 31,634 | 31,634 |
| R2 | 0.02 | 0.02 | 0.03 |
| Adjusted R2 | 0.02 | 0.02 | 0.03 |
| Residual Std. Error (df = 31630) | 269.52 | 269.52 | 268.44 |
| F Statistic (df = 3; 31630) | 262.12*** | 262.12*** | 349.40*** |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | ||
| Dependent variable: | ||
| profit_20 | ||
| App Only | App + Inc | |
| (1) | (2) | |
| app | 5.88 (4.69) | 16.17*** (4.64) |
| age_exists | 9.51 (8.20) | |
| age_rf | 31.90*** (1.06) | |
| inc_exists | 35.17*** (8.21) | |
| inc_rf | 21.90*** (0.74) | |
| Constant | 110.79*** (1.64) | -169.90*** (6.53) |
| Observations | 31,634 | 31,634 |
| R2 | 0.0000 | 0.06 |
| Adjusted R2 | 0.0000 | 0.06 |
| Residual Std. Error | 272.84 (df = 31632) | 264.74 (df = 31628) |
| F Statistic | 1.57 (df = 1; 31632) | 394.24*** (df = 5; 31628) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | |
| Dependent variable: | ||
| profit_20 | ||
| App Only | App + Region | |
| (1) | (2) | |
| app | 5.88 (4.69) | 15.79*** (4.64) |
| age_exists | 9.21 (8.20) | |
| age_rf | 32.09*** (1.06) | |
| inc_exists | 35.32*** (8.21) | |
| inc_rf | 21.39*** (0.76) | |
| region1200 | 13.78*** (5.14) | |
| region1300 | 5.94 (6.28) | |
| Constant | 110.79*** (1.64) | -179.11*** (7.70) |
| Observations | 31,634 | 31,634 |
| R2 | 0.0000 | 0.06 |
| Adjusted R2 | 0.0000 | 0.06 |
| Residual Std. Error | 272.84 (df = 31632) | 264.71 (df = 31626) |
| F Statistic | 1.57 (df = 1; 31632) | 282.95*** (df = 7; 31626) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | |
| Dependent variable: | ||
| profit_20 | ||
| App Only | App + All | |
| (1) | (2) | |
| app | 5.88 (4.69) | 15.93*** (4.61) |
| age_exists | 2.21 (8.15) | |
| age_rf | 21.69*** (1.17) | |
| inc_exists | 32.24*** (8.16) | |
| inc_rf | 19.97*** (0.75) | |
| region1200 | 15.19*** (5.10) | |
| region1300 | 6.05 (6.24) | |
| tenure | 4.07*** (0.20) | |
| Constant | 110.79*** (1.64) | -164.75*** (7.69) |
| Observations | 31,634 | 31,634 |
| R2 | 0.0000 | 0.07 |
| Adjusted R2 | 0.0000 | 0.07 |
| Residual Std. Error | 272.84 (df = 31632) | 262.95 (df = 31625) |
| F Statistic | 1.57 (df = 1; 31632) | 304.08*** (df = 8; 31625) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 | |

Dependent variable: profit_21

| | profit_21 |
|---|---|
| demographics | 47.53*** (5.98) |
| Constant | 106.86*** (5.34) |
| Observations | 26,396 |
| R² | 0.002 |
| Adjusted R² | 0.002 |
| Residual Std. Error | 389.54 (df = 26394) |
| F Statistic | 63.19*** (df = 1; 26394) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 |


Dependent variable: profit_21

| | profit_21 |
|---|---|
| app | 18.77*** (5.84) |
| region1200 | 15.10** (6.55) |
| region1300 | 11.21 (8.15) |
| tenure | 0.92*** (0.23) |
| profit_20 | 0.83*** (0.01) |
| Constant | 19.82*** (6.70) |
| Observations | 26,396 |
| R² | 0.36 |
| Adjusted R² | 0.36 |
| Residual Std. Error | 312.04 (df = 26390) |
| F Statistic | 2,968.07*** (df = 5; 26390) |
| Note: | Significance: * p < 0.1, ** p < 0.05, *** p < 0.01 |


| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 32.96 | 3.23 | 10.20 | 0.00 | * |
| app | 19.44 | 5.83 | 3.33 | 0.00 | * |
| tenure | 0.90 | 0.23 | 3.94 | 0.00 | * |
| profit_20 | 0.83 | 0.01 | 118.81 | 0.00 | * |

Model Fit

| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.36 | 0.36 | 312.06 | 4,944.31 | 0.00 | 3 |

VIF

| variable | value |
|---|---|
| app | 1.01 |
| tenure | 1.04 |
| profit_20 | 1.04 |

| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 0.76 | 0.00 | 224.82 | 0.00 | * |
| app | 0.03 | 0.01 | 5.10 | 0.00 | * |
| tenure | 0.01 | 0.00 | 25.57 | 0.00 | * |
| profit_20 | 0.00 | 0.00 | 7.17 | 0.00 | * |

Model Fit

| r.squared | adj.r.squared | sigma | statistic | p.value | df |
|---|---|---|---|---|---|
| 0.03 | 0.03 | 0.37 | 272.26 | 0.00 | 3 |

| term | estimate | std.error | statistic | p.value | sig |
|---|---|---|---|---|---|
| (Intercept) | 1.03 | 0.02 | 42.71 | 0.00 | * |
| app | 0.23 | 0.05 | 4.81 | 0.00 | * |
| profit_20 | 0.00 | 0.00 | 7.38 | 0.00 | * |
| tenure | 0.06 | 0.00 | 24.78 | 0.00 | * |

| term | estimate | std.error | statistic | p.value | odds | p | sig |
|---|---|---|---|---|---|---|---|
| (Intercept) | 1.03 | 0.02 | 42.71 | 0.00 | 2.81 | 0.74 | * |
| app | 0.23 | 0.05 | 4.81 | 0.00 | 1.26 | 0.56 | * |
| profit_20 | 0.00 | 0.00 | 7.38 | 0.00 | 1.00 | 0.50 | * |
| tenure | 0.06 | 0.00 | 24.78 | 0.00 | 1.06 | 0.51 | * |

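
The odds and p columns appear to be transformations of the logistic coefficient: odds = exp(estimate) and p = odds / (1 + odds). The sketch below applies those two formulas to the rounded estimates from the table. The function name is hypothetical, and reading the columns this way is an assumption based on the reported numbers.

```python
import math

def odds_and_probability(log_odds):
    """Convert a logistic-regression coefficient (log-odds) to odds and odds/(1+odds)."""
    odds = math.exp(log_odds)
    return odds, odds / (1.0 + odds)

# Rounded estimates copied from the table above
estimates = {"(Intercept)": 1.03, "app": 0.23, "profit_20": 0.00, "tenure": 0.06}

for term, beta in estimates.items():
    odds, p = odds_and_probability(beta)
    print(f"{term}: odds={odds:.2f}, p={p:.2f}")
```

Small discrepancies, such as an intercept odds of 2.80 here versus 2.81 in the table, come from starting with coefficients already rounded to two decimals. Only the intercept's p is a baseline probability. For the other terms, the odds column is the more interpretable one: an odds ratio of 1.26 for app means the odds of the outcome are about 26% higher for app users, holding the other predictors fixed.
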

| term | estimate | std.error | statistic | p.value | odds | p | sig |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.92 | 0.03 | 30.75 | 0.00 | 2.51 | 0.72 | * |
| app | 0.23 | 0.05 | 4.81 | 0.00 | 1.26 | 0.56 | * |
| profit_20 | 1.18 | 0.16 | 7.38 | 0.00 | 3.25 | 0.76 | * |
| tenure | 0.06 | 0.00 | 24.78 | 0.00 | 1.06 | 0.51 | * |

In the data behind the initial logistic regression model, about 83% of customers were retained. The resulting model was lopsided: it predicted that every case in the test set was retained, when in fact about 16% were not. To compensate, I trained a new model on a training set with equal proportions of retained and churned customers, created by undersampling the retained cases.

| term | estimate | std.error | statistic | p.value | odds | p | sig |
|---|---|---|---|---|---|---|---|
| (Intercept) | 0.08 | 0.02 | 3.40 | 0.00 | 1.08 | 0.52 | * |
| app | 0.07 | 0.02 | 3.21 | 0.00 | 1.08 | 0.52 | * |
| profit_20 | 0.14 | 0.03 | 5.44 | 0.00 | 1.15 | 0.54 | * |
| tenure | 0.46 | 0.03 | 17.20 | 0.00 | 1.58 | 0.61 | * |
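
The undersampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function name, the retained field, and the toy customer list are all hypothetical.

```python
import random

def undersample_majority(rows, label_key, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Toy data: 50 retained and 10 churned customers (about 83% retained)
customers = [{"id": i, "retained": 1 if i % 6 else 0} for i in range(60)]
balanced = undersample_majority(customers, "retained")

print(len(balanced), sum(r["retained"] for r in balanced))
```

Balancing is usually applied to the training data only, so that the model is still evaluated on a test set with the original class proportions.
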