Analyzing Survey Data

Overview

A basic overview of best practices in inspecting, cleaning, formatting, and analyzing survey data.

Presented by: Larry Vincent, Professor of the Practice of Marketing
Presented to: MKT 512
January 27, 2026

Inspect. Clean. Analyze.

Inspect

ID | Q1 | Q2 | Q3 | Q4 | Q5 | Duration | Open-End
001 | 4 | 3 | 5 | 4 | 3 | 8:42 | The staff was friendly but the wait was too long
002 | 3 | 3 | 3 | 3 | 3 | 2:14 | good
003 | 5 | 4 | 4 | 5 | 4 | 9:17 | I love coming here on weekends with my family
004 | 2 | 5 | 1 | 4 | 2 | 0:47 | asdfasdf
005 | 4 | 4 | 5 | 3 | 4 | 7:53 | Coffee is great, parking is terrible
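The rows above suggest three things to look for when inspecting: identical answers on every item (straight-lining), very short completion times, and throwaway open-ends. A minimal sketch of those checks follows. The cutoffs `MIN_SECONDS` and `MIN_WORDS` and the function names are assumptions made for illustration, not rules from the course.

```python
# Rows copied from the sample table: (ID, [Q1..Q5], duration "m:ss", open-end text).
rows = [
    ("001", [4, 3, 5, 4, 3], "8:42", "The staff was friendly but the wait was too long"),
    ("002", [3, 3, 3, 3, 3], "2:14", "good"),
    ("003", [5, 4, 4, 5, 4], "9:17", "I love coming here on weekends with my family"),
    ("004", [2, 5, 1, 4, 2], "0:47", "asdfasdf"),
    ("005", [4, 4, 5, 3, 4], "7:53", "Coffee is great, parking is terrible"),
]

MIN_SECONDS = 120  # assumed speeder cutoff (2 minutes)
MIN_WORDS = 2      # assumed minimum open-end length, in words


def to_seconds(duration):
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = duration.split(":")
    return int(minutes) * 60 + int(seconds)


def inspect(row):
    """Return (respondent ID, list of reasons the row looks suspect)."""
    rid, answers, duration, text = row
    reasons = []
    if len(set(answers)) == 1:            # same answer on every item
        reasons.append("straight-lining")
    if to_seconds(duration) < MIN_SECONDS:
        reasons.append("speeder")
    if len(text.split()) < MIN_WORDS:     # one-word or empty open-end
        reasons.append("thin open-end")
    return rid, reasons


flags = dict(inspect(r) for r in rows)
for rid, reasons in flags.items():
    print(rid, reasons or "ok")
```

These checks only flag rows for review. Whether a flagged respondent is deleted is a separate decision.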

To delete,
or not to delete.
That is the question.

Strategies

Strategy | How It Works | When to Use
Flag with dummy variable | Create a column (e.g., suspect = 1) to mark questionable respondents. Keep them in the dataset but exclude from primary analysis. | When you’re uncertain about data quality and want to run sensitivity checks.
Quarantine and compare | Run your analysis twice: once with all data, once excluding flagged cases. See if conclusions change. | When sample size is tight and you can’t afford to lose cases without knowing the cost.
Weight down rather than delete | Assign lower weights to suspect respondents rather than removing them entirely. | When you have a weighting scheme and want to reduce influence without elimination.
Segment and report separately | Treat suspect respondents as their own group. Report their patterns alongside the clean sample. | When “bad” responses might actually reflect a real population (e.g., disengaged customers).
Set thresholds in advance | Define exclusion rules before looking at the data (e.g., “anyone completing in under 2 minutes”). Document in your analysis plan. | Always. Prevents post-hoc fishing for rules that conveniently support your hypothesis.
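The first three strategies can be sketched side by side. The example below uses the Q1 answers from the five sample respondents. The `suspect` flags on 002 and 004 and the `SUSPECT_WEIGHT` of 0.5 are assumptions chosen for illustration only.

```python
# Q1 answers from the sample respondents; the suspect flags are assumed.
respondents = [
    {"id": "001", "q1": 4, "suspect": 0},
    {"id": "002", "q1": 3, "suspect": 1},
    {"id": "003", "q1": 5, "suspect": 0},
    {"id": "004", "q1": 2, "suspect": 1},
    {"id": "005", "q1": 4, "suspect": 0},
]

SUSPECT_WEIGHT = 0.5  # hypothetical down-weight for flagged respondents


def weighted_mean(values, weights):
    """Weighted average; with all weights equal to 1 this is the plain mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


scores = [r["q1"] for r in respondents]

# Quarantine and compare: everyone vs. unflagged respondents only.
mean_all = weighted_mean(scores, [1] * len(respondents))
clean = [r["q1"] for r in respondents if r["suspect"] == 0]
mean_clean = weighted_mean(clean, [1] * len(clean))

# Weight down rather than delete: flagged respondents count for less.
weights = [SUSPECT_WEIGHT if r["suspect"] else 1 for r in respondents]
mean_weighted = weighted_mean(scores, weights)

print(f"all: {mean_all:.2f}  clean: {mean_clean:.2f}  weighted: {mean_weighted:.2f}")
```

If the three numbers tell the same story, the flagged cases are not driving the conclusion. If they diverge, that gap is worth reporting.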

Which metric should you use?

Metrics

Measure | What It Tells You | When It Misleads
Counts | How many people said X | When you forget that n=6 isn’t a trend
Percentages | What share said X | When you ignore the base (35% of 20 ≠ 35% of 200)
Mean | The mathematical average | When the distribution is skewed or bimodal
Median | The midpoint response | When you care about what’s happening at the extremes, because the median ignores them entirely
Standard Deviation | How much responses vary | When you report it without context (SD of what?)
Top-Two-Box | % who gave the highest responses | When you have a polarized distribution. A high T2B can hide a large group of detractors
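As a rough illustration, all of these measures can be computed from one set of answers. The `responses` list below is made up for the example (ten answers on a five-point scale) and is deliberately polarized. The "bottom-two-box" share is an added helper, not a metric from the table.

```python
import statistics

# Hypothetical answers on a 1-5 scale, invented for illustration.
responses = [5, 5, 4, 1, 1, 5, 4, 1, 5, 2]

n = len(responses)
summary = {
    "n": n,                                            # the base behind every percentage
    "count_5": responses.count(5),                     # how many people said 5
    "pct_5": responses.count(5) / n,                   # what share said 5
    "mean": statistics.mean(responses),
    "median": statistics.median(responses),
    "sd": statistics.stdev(responses),                 # sample standard deviation
    "t2b": sum(1 for r in responses if r >= 4) / n,    # top-two-box: share of 4s and 5s
    "b2b": sum(1 for r in responses if r <= 2) / n,    # bottom-two-box: share of 1s and 2s
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here top-two-box is 60%, yet 40% of respondents sit in the bottom two boxes, which is the pattern the last row of the table warns about.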

Count data

Proportions

The mean satisfaction rating was 3.55.

On a five-point Likert scale.

Distributions

Stats

Distribution | Mean | SD | T2B
Bimodal | 3.55 | 1.50 | 67%
Left Skew | 3.55 | 1.36 | 59%
Right Skew | 3.55 | 1.64 | 57%
Normal | 3.55 | 0.97 | 54%
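To see how one mean can describe very different samples, the sketch below builds two sets of response counts that both average 3.55. The counts are hypothetical, picked only to hit that mean, and are not the data behind the table above.

```python
import statistics

# Hypothetical counts per scale point (1-5), chosen so both means equal 3.55.
# These are NOT the data behind the slide's table.
shapes = {
    "polarized": {1: 20, 2: 10, 3: 5, 4: 25, 5: 40},
    "centered": {1: 2, 2: 10, 3: 35, 4: 37, 5: 16},
}


def expand(counts):
    """Turn {score: count} into a flat list of individual responses."""
    return [score for score, n in counts.items() for _ in range(n)]


results = {}
for name, counts in shapes.items():
    data = expand(counts)
    results[name] = {
        "mean": statistics.mean(data),
        "sd": statistics.pstdev(data),  # population SD of the expanded responses
        "t2b": sum(1 for x in data if x >= 4) / len(data),
    }
    r = results[name]
    print(f"{name}: mean={r['mean']:.2f} sd={r['sd']:.2f} t2b={r['t2b']:.0%}")
```

Both shapes report the same mean, but the polarized one has a much larger SD and a higher T2B, so the mean alone cannot tell them apart.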

Bimodal Data

Metric | Product: Cold Brew | Product: Other
Mean | 4.21 | 3.12
SD | 1.02 | 1.31
T2B | 78% | 44%
N | 80 | 120
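A quick check on how the two segments combine: weighting each segment mean by its N gives the pooled mean. This assumes the two product groups together make up the whole sample, which the slide does not state outright.

```python
# Segment means and sizes taken from the table above.
segments = {
    "Cold Brew": {"mean": 4.21, "n": 80},
    "Other": {"mean": 3.12, "n": 120},
}

total_n = sum(s["n"] for s in segments.values())

# Pooled mean = sum(segment mean * segment n) / total n
pooled_mean = sum(s["mean"] * s["n"] for s in segments.values()) / total_n

print(f"N = {total_n}, pooled mean = {pooled_mean:.3f}")
```

The result is about 3.56, in line with the overall mean of roughly 3.55 reported earlier: a middling average that matches neither segment.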

Celebrity Analysis

Sentiment by demographic

Sentiment by viewing profile