
Geosocial Segmentation Whitepaper

A brief introduction to our dataset and the approach for building out the segmentation.

Will Kiessling • Last updated: January 15, 2019


Location intelligence is vital to the success of any business that sells physical products or services. These businesses rely on accurate location data to optimize distribution, plan site selection, and extract market insights. This data comes at a premium. The U.S. census, a primary source of location intelligence across industries, cost the government over $13B. Companies spend millions to conduct surveys of communities just to aggregate a useful sample. Recently, companies have begun using mobile location data to understand human movement, despite the high price and privacy concerns.

Imagine a source of data, composed of billions of data points across the entire U.S. and much of the world, in which people publicly and knowingly share the behaviors, experiences, and personalities that characterize their neighborhoods. This dataset exists and has existed since at least 2009. It is social media data that has a location associated with it, otherwise known as Geosocial data. Unlike the census or surveys, it is contributed organically, which reduces bias. Unlike costly and exclusive mobile location data, it is open and public, and anyone can access it right now for free.

The problem companies face when using this data is that it is a very large source of unstructured text. To make adequate use of it, companies need to extract insight from the content and organize that information into a format that can be mapped or used in analytics. This is the problem Spatial.ai set out to solve.

Testing

Spatial.ai works with some of the largest companies in the U.S. to help them understand location performance. Through this work, and with authorized use of their performance indicators (in most cases, revenue), we created a benchmark for testing the effectiveness of each method we attempted. The results of every test were compared to location-based business performance to see how much of the real-world outcome the method could account for. We also wanted the method to minimize human bias, use as much of the data as possible, and tie its output to actionable insights.
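The source does not say which statistic the benchmark used. As a minimal sketch, assuming a one-variable linear fit and R² as the measure of "how much a method accounts for", the comparison could look like the following. The store scores and revenue figures are hypothetical.

```python
from statistics import mean

def r_squared(scores, revenue):
    """Share of revenue variance explained by a one-variable linear fit."""
    if len(scores) != len(revenue) or len(scores) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = mean(scores), mean(revenue)
    sxx = sum((x - mx) ** 2 for x in scores)
    syy = sum((y - my) ** 2 for y in revenue)
    sxy = sum((x - mx) * (y - my) for x, y in zip(scores, revenue))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series explains nothing
    return (sxy * sxy) / (sxx * syy)

# Hypothetical data: one segment score and one revenue figure per store.
segment_score = [10, 20, 30, 40, 50]
store_revenue = [120, 135, 160, 170, 200]

print(round(r_squared(segment_score, store_revenue), 3))
```

Running the same function over the output of each candidate method gives a single number per method, which is one simple way to rank methods against the same revenue benchmark.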

Approaches

Approach 1 - Human defined segments

Our team consists of ethnographers and data scientists, so as a first approach, we used human researchers to identify posts and terms used on social media that could signify types of behavior. This resulted in a handful of categories of social activity that, when tested against business performance, predicted outcomes with moderate success. We identified two primary problems with this approach. First, it contained human bias, because humans decided up front which information was important. Second, it ignored any data the researcher did not choose, which limited how much information the end product contained.
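A rough illustration of a human-defined segment is a keyword lookup. The segment names and keywords below are hypothetical and are not the categories the researchers actually used.

```python
import re

# Hypothetical researcher-defined segments and keywords (illustration only).
SEGMENT_KEYWORDS = {
    "bicycling": {"bike", "cycling", "trail"},
    "coffee": {"latte", "espresso", "cafe"},
}

def tag_post(text):
    """Return the segments whose keywords appear in the post."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return sorted(seg for seg, kws in SEGMENT_KEYWORDS.items() if words & kws)

posts = [
    "Morning latte before the bike trail",
    "New gears installed today",
    "Espresso at the corner cafe",
]

for post in posts:
    print(post, "->", tag_post(post))
```

The second post receives no segment because "gears" is not on any list, which is the limitation described above: anything the researcher did not anticipate is dropped.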

Approach 2 - Supervised learning

The second approach used supervised machine learning to extend the efforts of the research team. Researchers categorized posts and topics to train a machine learning model to differentiate themes between posts. Over time, the model began to understand more complete behaviors and relationships between topics. For example, it could accurately categorize “gears” as a topic related to bicycling even though the researchers never identified a social media post that included the term. When tested against business performance, it showed a 300% improvement over the previous method and was considered a strong candidate for the permanent solution. However, this approach still contained human bias: people were still deciding which data mattered most. A human can also categorize only a small sample of the data, so we were still missing insights.
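The source does not name the model. The sketch below swaps in a small multinomial Naive Bayes classifier with one round of self-training, as one possible way a model could come to associate an unlabeled term such as "gears" with bicycling. The class, the training posts, and the self-training step are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, labeled_posts):
        # Calling fit again adds to the counts already collected.
        for text, label in labeled_posts:
            tokens = tokenize(text)
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]
        if not tokens:
            return None  # nothing the model has seen before
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical researcher-labeled posts; none of them mention "gears".
labeled = [
    ("weekend bike ride on the river trail", "bicycling"),
    ("fixed my bike chain before the ride", "bicycling"),
    ("best latte at the new cafe", "coffee"),
    ("espresso and a pastry at the cafe", "coffee"),
]
model = NaiveBayes().fit(labeled)

before = model.predict("gears")       # unseen term, so no prediction yet
new_post = "swapped the gears on my bike before the ride"
guess = model.predict(new_post)       # classified from the surrounding words
model.fit([(new_post, guess)])        # self-training: keep the model's own label
after = model.predict("gears")

print(before, guess, after)
```

The model first labels the new post from its context words, then absorbs that post as training data, after which "gears" on its own is assigned to bicycling. The labeled seed set still comes from people, so the human bias noted above remains.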

Approach 3 - Unsupervised learning

The final approach was unsupervised learning. Using this method, we were able to use all of the dimensions of the data, including phrases, terms, time, and proximity. After many iterations, the team produced a dataset of over 70 social media “segments” that were then passed to the research team. The research team could easily identify the themes or behaviors that the machine learning process had surfaced from human conversation. Instead of trying to figure out which data to use, the researchers simply had to help interpret and communicate the data the machine had organized on its own. This satisfied our requirement for reducing human bias. The ultimate test was to compare the results for predicting business outcomes. Not only did this approach outperform the previous best method by 30%, but for some clients, it proved more insightful as a single source of information than data from the census.
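The source does not say which unsupervised algorithm produced the segments. As a stand-in, the sketch below runs a plain k-means over term counts for four hypothetical posts. It uses terms only, whereas the approach described above also drew on phrases, time, and proximity.

```python
import re

def vectorize(posts):
    """Turn posts into term-count vectors over a shared, sorted vocabulary."""
    tokenized = [re.findall(r"[a-z']+", p.lower()) for p in posts]
    vocab = sorted({t for toks in tokenized for t in toks})
    vectors = [[toks.count(term) for term in vocab] for toks in tokenized]
    return vocab, vectors

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iter=20):
    # Deterministic farthest-point start, so the example is repeatable.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def top_terms(vocab, centroid, n=2):
    """Highest-weight terms of a cluster, for a human to interpret and name."""
    ranked = sorted(zip(vocab, centroid), key=lambda tw: (-tw[1], tw[0]))
    return [term for term, _ in ranked[:n]]

# Hypothetical posts with no labels attached.
posts = [
    "bike ride trail",
    "bike trail gears",
    "latte cafe espresso",
    "cafe latte pastry",
]
vocab, vectors = vectorize(posts)
labels, centroids = kmeans(vectors, k=2)

print(labels)
for centroid in centroids:
    print(top_terms(vocab, centroid))
```

No one tells the algorithm which topics exist. The researcher's job moves to the last step: reading each cluster's top terms and giving the cluster a name.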

Conclusion

Based on the results of these tests, we organized Geosocial data with unsupervised learning. It is now available in 70+ segments of activity, provided as percentiles for any geographic unit across the entire U.S. and Canada. Clients are using this data to map behaviors in their markets and predict business performance.
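How the percentiles are computed is not described in the source. One common convention, assumed here, is the mid-rank percentile: the share of units scoring lower, plus half the share of units with the same score. The unit names and raw scores are hypothetical.

```python
def percentile_ranks(scores):
    """Map each unit's raw segment score to a 0-100 mid-rank percentile."""
    values = list(scores.values())
    n = len(values)
    ranks = {}
    for unit, score in scores.items():
        below = sum(1 for v in values if v < score)
        equal = sum(1 for v in values if v == score)  # includes the unit itself
        ranks[unit] = round(100 * (below + 0.5 * equal) / n, 1)
    return ranks

# Hypothetical raw scores for one segment across four geographic units.
raw = {"tract_A": 0.02, "tract_B": 0.10, "tract_C": 0.05, "tract_D": 0.10}

print(percentile_ranks(raw))
```

Percentiles make units comparable across segments with very different raw volumes, because each value only states where a unit stands relative to the others.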

To explore these segments, visit our taxonomy.


To learn more about Geosocial data, visit our Essential Guide to Geosocial Data.
