Data Science

Predicting CPG Sales: A Comparison of Demographics vs Traditional Segmentation vs PersonaLive

Elizabeth Gilbert
Jack Schroder


August 11, 2021

“How can I best predict my retail sales?”

With so many options for predicting sales outcomes, which is the most powerful choice: demographics, psychographic segmentation, or something else? In this case study, we’ll explore the options and find out how best to predict retail sales.

Input Data

In this study, we’ll compare 3 different datasets’ performance in predicting sales data: 

  1. Basic demographic variables: age and income
  2. A leading psychographic segmentation
  3. PersonaLive™, the new segmentation by 

The new PersonaLive™ segmentation system organizes households into one of 80 behavioral segments based on social, mobile, web, and individual demographic characteristics.


Psychographic segmentation systems categorize people based on demographic traits alone, whereas PersonaLive™ was created using not only demographics but also cell phone movement data, social media, and website visitation. We hypothesize that this added behavioral data improves the segmentation’s ability to predict sales outcomes.

Test Dataset

We will be modeling Iowa liquor sales as our outcome variable. In this dataset, sales are given as dollars per transaction. On average, monthly sales total 2.2 million bottles, 1.8 million liters, and $29 million, and 1,741 stores are included in the final analysis.

Table 1: Monthly volume of Iowa liquor sales data


In preparation for modeling, we filtered the data to focus on 10 high-volume products that vary in who consumes them: Black Velvet, Captain Morgan Original Spiced, Crown Royal Regal Apple, Grey Goose, Hawkeye Vodka, Jose Cuervo Lime Margarita, Malibu Coconut Rum, Smirnoff 80 proof, and Tito’s Vodka. Additionally, we removed high-volume orders to exclude bulk purchases by businesses and focus on consumers.


In this analysis, the state of Iowa was split into hexagonal grid cells using H3. Sales of each product were combined within each cell. By aggregating sales this way, we can model which factors lead to sales in a given area rather than at specific locations, which may have many other variables to consider.
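The aggregation step above can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the records, the cell IDs, and the `aggregate_sales` helper are all hypothetical, and the sketch assumes each transaction has already been assigned to a cell by an H3 library.

```python
from collections import defaultdict

# Hypothetical transaction records: (cell_id, product, sale_dollars).
# The cell IDs are placeholders; in practice an H3 library would derive
# them from each store's latitude and longitude.
transactions = [
    ("cell_a", "Tito's Vodka", 21.50),
    ("cell_a", "Tito's Vodka", 43.00),
    ("cell_a", "Grey Goose", 30.00),
    ("cell_b", "Tito's Vodka", 21.50),
]


def aggregate_sales(records):
    """Sum sale dollars for each (cell, product) pair."""
    totals = defaultdict(float)
    for cell, product, dollars in records:
        totals[(cell, product)] += dollars
    return dict(totals)


cell_sales = aggregate_sales(transactions)
```

Each key in `cell_sales` is one product in one grid cell, which is the unit the models below would predict.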

For each chosen product, we built 3 linear models, one for each input dataset. The outcome is sales scaled by number of stores (to capture average performance per store) and by population. Ultimately, we are evaluating each dataset's ability to predict how well a liquor product does in a given location. These transformations also give us an index of how well a product suits the population of an area.
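One way to read this scaling is as sales per store per person in each cell. The sketch below assumes that reading; the `sales_index` function, its handling of empty cells, and the example numbers are illustrative assumptions rather than details from the study.

```python
def sales_index(total_sales, n_stores, population):
    """Sales per store per person for one grid cell.

    Returns None when a cell has no stores or no residents, because the
    ratio is undefined there. How such cells were treated in the study
    is not stated, so this is only one possible choice.
    """
    if n_stores <= 0 or population <= 0:
        return None
    return total_sales / n_stores / population


# Hypothetical cell: $12,000 of one product, 3 stores, 8,000 residents.
index = sales_index(12_000.0, 3, 8_000)
```

Dividing by both quantities keeps a cell with many stores or many residents from looking like a strong market just because of its size.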

Each dataset's performance will be reported as a percentage improvement on simply predicting the mean value. You can think of this performance metric as answering the question: how much did we reduce mean absolute error from baseline using predictions from this model?
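As a rough sketch of this metric, the percentage reduction in mean absolute error can be computed as below. The function names and the toy numbers are hypothetical, and the sketch computes the baseline mean from the same values it scores, which may differ from how the study split its data.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def pct_error_reduction(actual, predicted):
    """Percent reduction in MAE versus always predicting the mean."""
    baseline = [mean(actual)] * len(actual)
    baseline_mae = mae(actual, baseline)
    model_mae = mae(actual, predicted)
    return 100.0 * (baseline_mae - model_mae) / baseline_mae


# Hypothetical per-cell sales index values and model predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
reduction = pct_error_reduction(actual, predicted)  # 75.0 for these numbers
```

A value of 0 means the model is no better than predicting the mean, and a negative value means it is worse.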

1: Basic demographics model

Our first model uses basic demographics: age and income. This data is available from the US Census, and is our baseline in predicting sales outcomes. The variables we are using are the population percentage per block group for each age range and income range.
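Variables of this kind can be derived from raw counts, as in the short sketch below. The age brackets and counts are made up for illustration and are not the actual Census categories; `to_percentages` is a hypothetical helper.

```python
def to_percentages(counts):
    """Convert category counts for one block group into percentages."""
    total = sum(counts.values())
    if total == 0:
        # An empty block group gets zero in every category.
        return {category: 0.0 for category in counts}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical population counts by age range for one block group.
block_group_ages = {"21-34": 250, "35-54": 500, "55+": 250}
age_pct = to_percentages(block_group_ages)
```

The same transformation would apply to income ranges, and to household counts per segment in the two segmentation models.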

Table 2: Census age & income data

Running the linear regression, we find that on average across all 10 brands, demographics reduce error by 8.3%, compared to guessing mean sales per person per store for each brand. 
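To make the regression step concrete, here is a deliberately simplified sketch with a single predictor. The models in this study use many variables at once, so this only illustrates the idea; the data and the `fit_line` helper are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical cells: percent of population in one age range, and the
# scaled sales index for one brand in the same cells.
pct_in_age_range = [10.0, 20.0, 30.0, 40.0]
brand_sales_index = [1.0, 2.0, 2.0, 3.0]

slope, intercept = fit_line(pct_in_age_range, brand_sales_index)
predictions = [intercept + slope * x for x in pct_in_age_range]
```

The predictions from a fitted model like this are what get compared against the mean-value baseline.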

2: Leading psychographic segmentation model

Our second model uses a leading psychographic segmentation (also referred to as geodemographic). This type of dataset is frequently used in predicting sales outcomes. The variables we are using are the percentage of households per block group in each segment.

Table 3: Format of the leading psychographic segmentation data

On average across all 10 brands, this leading psychographic segmentation reduces error by 21.8%, compared to predicting the mean value for each brand. This is an improvement of 13.5 percentage points over demographics alone. This is good, but can we do better?

3: PersonaLive™ model

Our third model uses the PersonaLive™ segmentation. PersonaLive™ is a new segmentation by that uses social, mobile, web, and individual demographic characteristics to organize households into one of 80 behavioral segments. The variables we are using are the percentage of households per block group in each PersonaLive™ segment.

Table 4: Sample of PersonaLive™ segment data

Across all 10 brands, the PersonaLive™ segmentation reduces error by 25.3%, compared to guessing mean sales per person per store for each brand. This is an improvement of 17.0 percentage points over demographics alone, and of 3.5 percentage points over the leading psychographic segmentation that was used in this analysis.
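The comparisons between models are differences between error-reduction percentages, which the short sketch below reproduces from the three averages reported in this post. The dictionary keys and helper names are just labels chosen for illustration.

```python
# Average error reduction (%) across brands, as reported in this post.
reductions = {"demographics": 8.3, "psychographic": 21.8, "personalive": 25.3}


def point_gain(model, reference):
    """Difference in error reduction, in percentage points."""
    return round(reductions[model] - reductions[reference], 1)


def relative_gain(model, reference):
    """Difference expressed as a percent of the reference model's reduction."""
    gap = reductions[model] - reductions[reference]
    return 100.0 * gap / reductions[reference]


psych_vs_demo = point_gain("psychographic", "demographics")    # 13.5
persona_vs_demo = point_gain("personalive", "demographics")    # 17.0
persona_vs_psych = point_gain("personalive", "psychographic")  # 3.5
```

The `relative_gain` helper is included only to show that a relative comparison is a different quantity from a percentage-point difference.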


This table shows PersonaLive™ outperforming both basic demographics and a leading psychographic segmentation in predicting sales for the liquors in this analysis. These liquors were chosen to represent a breadth of products without any prior knowledge of dataset performance in predicting their sales. Compared to demographics and a leading psychographic segmentation, PersonaLive™ has the highest reduction in error for 8 of 10 brands.


Using PersonaLive™ data and this type of modeling, each of these brands can not only identify locations where their products will succeed but also understand the behaviors that are associated with people who buy their product. This type of analysis has implications that stretch into industries well beyond CPG liquor sales.


In this case study, we investigated three datasets: age and income demographics, a leading psychographic segmentation, and the new PersonaLive™ segmentation. Among these datasets, PersonaLive™ had an edge in predicting liquor sales.

While this post described just one sales dataset, we used a large variety of outcome data from various industries to optimize the performance of PersonaLive™. In the majority of tests, PersonaLive™ outperformed traditional segmentation.

If you’d like to learn more about PersonaLive™ and how to leverage it for your use case, simply contact us.


This post was inspired by How to Enrich POS Data to Analyze & Predict CPG Sales by Argyrios Kyrgiazos and the Carto team. We were impressed not only by the solid approach taken by Argyrios but also by the quality of the data itself. We took the same methodological approach, but applied it as a test in our evaluation of the PersonaLive™ dataset, as compared to traditional segmentation and demographic variables.

How the PersonaLive™ Segmentation System Works

PersonaLive™ segmentation uses social media, mobile foot traffic, online activity, and individual-level demographics to organize every US household into one of 80 behavioral segments. These segments provide visibility into the online and offline preferences of the customers visiting any US property.

01 Append

Draw a polygon around a property to identify the behavioral segment of every visitor.

02 Analyze

Rank the top customer types visiting a location. Then match retailers based on online and offline activities of visitors.

03 Activate

Demonstrate visitor brand affinity to close deals. Activate marketing campaigns to drive target segments to your location.
