Market Research Surveys and Predictive Modeling: A Case Study


Written by Joseph Ogrodowczyk, edited by Miranda Mills, Carol Huang


Joseph Ogrodowczyk is the Director of Analytic Solutions at FocusKPI, Inc. With more than 18 years of experience in data analytics, Joseph is an expert at providing analytical solutions and using actionable data to produce quality predictive models.

Market Research Survey

While in academia, I studied the market survey strategy, and I applied it during the data collection phase of my doctoral research. My survey population universe was my own students. That was good for keeping expenses down, but it limited the conclusions I could draw about a larger population.

My research examined the willingness-to-pay for specific results from environmental conservation. For example, how much would someone be willing to pay to see 5 miles farther from a mountain top?

I focused on calibrating survey responses using a Certainty scale. A sample survey question asked the students if they would be willing to pay $5 to see 5 miles farther from the top of Mount Washington. This was followed by a Certainty scale question such as “On a scale of 1 to 10 where 10 is very certain, how certain are you about your answer?”.

A week after I collected the answers to these questions, I passed out envelopes and asked the students to take out their wallets and put in the $5 (yes, this was a time when people still carried cash). I recorded who paid and how much, and I correlated those data points with the answers to the Certainty scale question. I found that, for modeling purposes, a respondent who answered 7 or higher on the Certainty scale could be treated as likely to actually pay. This meant the dependent variable for the model could be dichotomous (will pay / will not pay).
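Here is a minimal Python sketch that makes the calibration step concrete. It assumes each respondent is recorded as three values: a stated yes/no answer, a Certainty score from 1 to 10, and whether they actually paid. The sample records and function names are hypothetical. The cutoff is chosen here by simple classification accuracy, which may not be the exact criterion used in the original research.

```python
from typing import List, Tuple

# (said_yes, certainty 1-10, actually_paid); hypothetical records for illustration only
Record = Tuple[bool, int, bool]

SAMPLE: List[Record] = [
    (True, 9, True), (True, 8, True), (True, 7, True), (True, 10, True),
    (True, 6, False), (True, 5, False), (True, 4, False), (True, 3, False),
    (False, 8, False), (False, 2, False),
]


def accuracy_at_cutoff(records: List[Record], cutoff: int) -> float:
    """Share of respondents whose predicted behavior matches what they actually did.

    A respondent is predicted to pay only if they said yes AND rated their
    certainty at or above the cutoff.
    """
    correct = 0
    for said_yes, certainty, paid in records:
        predicted_pay = said_yes and certainty >= cutoff
        correct += predicted_pay == paid
    return correct / len(records)


def best_cutoff(records: List[Record]) -> int:
    """Cutoff (1-10) with the highest accuracy; ties go to the lower cutoff."""
    return max(range(1, 11), key=lambda c: (accuracy_at_cutoff(records, c), -c))


def to_dependent_variable(said_yes: bool, certainty: int, cutoff: int) -> int:
    """Recode a stated answer into a dichotomous 0/1 dependent variable."""
    return int(said_yes and certainty >= cutoff)


if __name__ == "__main__":
    cutoff = best_cutoff(SAMPLE)
    print(cutoff, accuracy_at_cutoff(SAMPLE, cutoff))
```

Once a cutoff is chosen, `to_dependent_variable` turns every stated answer into the 0/1 outcome a binary model needs.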

“Market research surveys are a useful tool when you need very specific data, especially with a well-designed survey and a robust sampling population.”

In my case, the sampling population (college students taking economics classes) was too narrow to support conclusions about the broader population. However, the results did indicate that the methodology was robust.

Case Study in the Real Business World

The client was a financial institution whose geographic marketing area spanned several states. It offered a variety of financial products, including first-time checking, brokerage and retirement services, and personal and business lending. As is typical in the financial industry, its customers were motivated by their personal views of finance and their attitudes about money in general.

Business Challenge

The client wanted to create bundles of products and services tailored to marketing segments based on each customer's personal relationship to money and finances. The client had 3.1 million customers but no data on that relationship, so a market research survey was used to collect it.

The task for Analytics was to assign the new customer segments created from the survey to all of the customers in the database.

The population universe of the survey consisted of 1000 randomly selected non-customers from the general population across the states where the client did business (Group A) and 1000 customers of the client randomly selected from the customer database (Group B).
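The random selection of Group B can be sketched in Python as a simple random sample without replacement. The function name, the use of integer customer IDs, and the optional seed (used for a reproducible draw) are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence


def draw_survey_sample(customer_ids: Sequence[int], n: int = 1000,
                       seed: Optional[int] = None) -> List[int]:
    """Simple random sample without replacement: every customer has the same chance of selection."""
    if n > len(customer_ids):
        raise ValueError("sample size exceeds the number of customers")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(customer_ids, n)


if __name__ == "__main__":
    # Hypothetical IDs 0..3,099,999 standing in for the 3.1 million customers
    group_b = draw_survey_sample(range(3_100_000), n=1000, seed=2024)
    print(len(group_b), group_b[:5])
```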

The survey also included open-ended demographic questions so that the quality of the responses could be checked.

Business Objective:

Segment all customers in the client database into the market segments determined by the survey responses


Step 1: Comparing the respondents

The first analysis compared Group A (non-customers) with Group B (customers). If the responses from Group A had been statistically different from those of Group B, the results would not have applied to the general population (and therefore not to prospective customers). All responses, attitudinal and demographic, were examined, and Group A and Group B were not statistically different.
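One way to run this comparison for a single yes/no question is a two-proportion z-test, sketched below in Python. The article does not say which tests were used, so this is an assumed choice. Multi-category questions would call for something like a chi-square test instead. The counts in the usage example are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(yes_a: int, n_a: int, yes_b: int, n_b: int) -> Tuple[float, float]:
    """Two-sided test of H0: Group A and Group B answer 'yes' at the same rate.

    Returns (z statistic, p-value).
    """
    p_a, p_b = yes_a / n_a, yes_b / n_b
    pooled = (yes_a + yes_b) / (n_a + n_b)  # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # everyone answered the same way in both groups
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts: 412 of 1,000 non-customers vs 430 of 1,000 customers said yes
    z, p = two_proportion_z_test(412, 1000, 430, 1000)
    print(f"z = {z:.2f}, p = {p:.3f}")  # p is well above 0.05: no detectable difference
```

When many questions are compared at once, some will look significant by chance alone. A multiple-comparison correction such as Bonferroni is worth considering.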

Step 2: Validity of responses

The next step was to ensure that the data collected was accurate. For this, I used only Group B (customers) and compared their answers to the demographic questions with similar variables from the database. The matching survey questions and database variables had been chosen during the survey creation process. The responses were not statistically different from the database values.
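Consider a yes/no item recorded both on the survey and in the database, such as a hypothetical home-ownership flag. Because the data is paired, McNemar's test fits: it looks only at customers whose two values disagree. The article does not name the test, so this Python sketch is an assumed choice.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(survey: Sequence[bool], database: Sequence[bool]) -> Tuple[float, float]:
    """McNemar's test for a yes/no item recorded twice per customer.

    Only discordant pairs matter: b = survey yes / database no,
    c = survey no / database yes. Returns (chi-square statistic, p-value), df = 1.
    """
    if len(survey) != len(database):
        raise ValueError("need one survey value and one database value per customer")
    b = sum(1 for s, d in zip(survey, database) if s and not d)
    c = sum(1 for s, d in zip(survey, database) if d and not s)
    if b + c == 0:
        return 0.0, 1.0  # perfect agreement
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


def agreement_rate(survey: Sequence[bool], database: Sequence[bool]) -> float:
    """Share of customers whose survey answer matches the database value."""
    return sum(s == d for s, d in zip(survey, database)) / len(survey)
```

A large p-value means disagreements are balanced in both directions, with no systematic bias. The agreement rate shows how often the two sources match at all.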

Step 3: Estimating the first model

The market segments were created using the data from both Group A (non-customers) and Group B (customers). Because the two groups were not statistically different, I then used only Group B to model the assigned market segment, with all of the survey questions as predictors. In this model, however, I substituted the database values for the answers to the demographic survey questions. The result was a statistically significant model predicting the market segments.
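The article does not name the modeling technique, though its closing notes mention multi-variable logistic regression. Assuming a multinomial (softmax) logistic regression, here is a minimal pure-Python sketch of fitting segment probabilities. The feature encoding is hypothetical. The sketch fits coefficients only; significance testing would normally come from a statistics package.

```python
import math
from typing import List, Sequence, Tuple

Weights = Tuple[List[List[float]], List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(W: List[List[float]], b: List[float], x: Sequence[float]) -> List[float]:
    """Probability of each segment for one customer's predictor vector."""
    logits = [bk + sum(w * xj for w, xj in zip(wk, x)) for wk, bk in zip(W, b)]
    return softmax(logits)


def predict_segment(W: List[List[float]], b: List[float], x: Sequence[float]) -> int:
    probs = predict_proba(W, b, x)
    return max(range(len(probs)), key=probs.__getitem__)


def fit_multinomial(X: Sequence[Sequence[float]], y: Sequence[int], n_classes: int,
                    lr: float = 0.1, epochs: int = 200) -> Weights:
    """Batch gradient ascent on the multinomial log-likelihood.

    X holds one row of predictors per surveyed customer (survey answers plus
    database fields); y holds each customer's segment index.
    """
    n, n_features = len(X), len(X[0])
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, label in zip(X, y):
            probs = predict_proba(W, b, x)
            for k in range(n_classes):
                err = (1.0 if k == label else 0.0) - probs[k]  # observed minus predicted
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] += lr * grad_b[k] / n
            for j in range(n_features):
                W[k][j] += lr * grad_W[k][j] / n
    return W, b


if __name__ == "__main__":
    # Hypothetical encoding: x[j] = 1 if statement j best matched the respondent's view
    X = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
    y = [0, 0, 1, 1, 2, 2]
    W, b = fit_multinomial(X, y, n_classes=3)
    print([predict_segment(W, b, row) for row in X])
```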

Step 4: Estimating the second model

However, most of the client's customers had not been surveyed, so I needed data that could act as a substitute for the attitudinal responses. Only surveyed customers had both survey answers and database records, so my modeling universe for this step could include only Group B.

I started by converting some of the questions into categorical variables and correlating those with database variables. I then ran the remaining questions through factor analysis to reduce the number of variables to around a dozen. Finally, I built models that predicted each of these factors from database variables.

The output from this step was a set of models, fit on Group B using only database variables, that predicted the responses to the attitudinal survey questions.

Step 5: Combining the two models

Recall that the predictors in the first model were database variables and survey responses. I substituted the set of models that predicted the survey responses for all of the response data in the first model. This combination yielded a final model that predicted the attitudinal market segments using only database variables. I then scored the entire database with it.
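The substitution in Step 5 amounts to function composition: predicted factor scores go into the first model wherever survey responses used to go. Below is a minimal Python sketch. The segment names, coefficients, and field names are hypothetical stand-ins for the fitted models.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]

# Hypothetical first-stage coefficients: segment -> {predictor: weight}.
# Predictors mix a database field ("age") and survey factors ("thrift", "risk_appetite").
SEGMENT_WEIGHTS: Dict[str, Dict[str, float]] = {
    "Saver": {"thrift": 1.0, "risk_appetite": -0.5, "age": 0.01},
    "Investor": {"thrift": 0.2, "risk_appetite": 1.0, "age": 0.0},
}

# Hypothetical proxy models: factor -> function of database fields only.
PROXIES: Dict[str, Callable[[Row], float]] = {
    "thrift": lambda row: 0.5 * row["tenure_years"] - 1.0,
    "risk_appetite": lambda row: 0.02 * row["brokerage_balance_k"],
}


def first_model(db_row: Row, factors: Row) -> str:
    """Stage one: score each segment from database fields plus survey factors."""
    features = {**db_row, **factors}
    scores = {seg: sum(w * features.get(name, 0.0) for name, w in weights.items())
              for seg, weights in SEGMENT_WEIGHTS.items()}
    return max(scores, key=scores.get)


def make_final_model(proxies: Dict[str, Callable[[Row], float]]) -> Callable[[Row], str]:
    """Compose the stages: predicted factors replace the survey responses."""
    def final_model(db_row: Row) -> str:
        predicted = {name: proxy(db_row) for name, proxy in proxies.items()}
        return first_model(db_row, predicted)
    return final_model


def score_database(rows: List[Row], model: Callable[[Row], str]) -> List[str]:
    """Assign a segment to every customer row, using database fields only."""
    return [model(row) for row in rows]


if __name__ == "__main__":
    final = make_final_model(PROXIES)
    print(score_database([{"age": 45, "tenure_years": 6, "brokerage_balance_k": 50}], final))
```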


The actionable insight that I provided arose from analyzing the scored database. The most important predictors differed by market segment. This reflected the assumption that all consumers in a segment had similar attitudes and emotions about finances and that these attitudes differed by segment. These sets of important predictors could drive the messaging for each segment.
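One simple way to surface segment-specific predictors from a scored database is to rank each field by its effect size. Here that means how far the segment's mean sits from the overall mean, in standard-deviation units. This is an assumed technique for illustration, not necessarily the one used in the project. The field and segment names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence

Row = Dict[str, float]


def top_predictors(rows: List[Row], segments: Sequence[str], k: int = 3) -> Dict[str, List[str]]:
    """For each segment, rank fields by |segment mean - overall mean| / overall SD."""
    fields = list(rows[0])
    overall = {f: (mean(r[f] for r in rows), pstdev([r[f] for r in rows])) for f in fields}
    ranked: Dict[str, List[str]] = {}
    for seg in sorted(set(segments)):
        members = [r for r, s in zip(rows, segments) if s == seg]
        effect = {}
        for f in fields:
            mu, sd = overall[f]
            if sd == 0:
                continue  # a constant field cannot separate segments
            effect[f] = abs(mean(r[f] for r in members) - mu) / sd
        ranked[seg] = sorted(effect, key=effect.get, reverse=True)[:k]
    return ranked


if __name__ == "__main__":
    # Hypothetical scored rows: database fields plus the segment each customer was assigned
    rows = [
        {"age": 62, "checking_balance_k": 8, "brokerage_balance_k": 5},
        {"age": 58, "checking_balance_k": 9, "brokerage_balance_k": 4},
        {"age": 35, "checking_balance_k": 3, "brokerage_balance_k": 120},
        {"age": 41, "checking_balance_k": 2, "brokerage_balance_k": 150},
    ]
    print(top_predictors(rows, ["Saver", "Saver", "Investor", "Investor"], k=2))
```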

There are basically two methods for creating market segments. The first is to collect a small amount of targeted data and use it to make predictions about a much larger population; this is usually done with market research surveys. The second is to explore vast amounts of data to find insights; this is usually done with cloud-based databases and artificial intelligence algorithms. Which method is chosen depends on the budget, timing, and business challenges.


I enjoyed this project because I learned a lot about leveraging models for non-typical uses. There are definitely benefits to thinking outside of the box and bringing new ideas to the table. Sometimes we need to explore outside of our usual haunts.

Market research surveys need to be carefully crafted because a small sample will determine how a much larger population is treated. That is, an unbiased random sample is a must.

Instrumental variables can be constructed from any database variable as long as the correlation is statistically significant and the field is well-populated.

Multi-variable logistic regression can be difficult to estimate because statistical significance can vary by segment.