We are excited to announce that we now provide full Geosocial data coverage for the Mexico market.
While Spatial.ai has always offered custom coverage to our customers, Mexico is the third country (joining the U.S. and Canada) where full data collection and coverage will be provided to any company interested in the market.
Data in Mexico is key for expanding businesses. Site selection and real estate strategies still rely on having up-to-date, accurate data, but many traditional sources that are ubiquitous in the U.S. and Canada are non-existent in Mexico. To ensure communities attract the best possible businesses to meet their interests and needs, it was important for us to provide Geosocial data as an option for data-driven businesses.
The Spatial.ai data team holds every dataset to rigorous minimum standards. We identified two criteria that we had to evaluate in Mexico before we could feel confident providing this product to customers:
To evaluate the first criterion, we measured whether we could translate Spanish-language data into English in a way that let our segmentation system accurately identify behaviors.
The validation test compared segment scores in areas where both languages are frequently used, which means cities with large Hispanic populations such as Miami, Houston, and Los Angeles. We found block groups with a high volume of both languages, split the media within each block group into English and Spanish data, then translated the Spanish data and generated scores across all of our segments. We hoped to find that data originating in Spanish would score similarly to the native English data, because people in the same communities were experiencing the same activities even if they shared those experiences in different languages.
We found that overall the correlations between scores were very strong, signaling that we could rely on translated Spanish data as an input for our segmentation system.
However, a few segments didn't translate well: "Hipster", "Awestruck", and "Dating Life" had low correlations across languages. This could be because different words are used to describe these behaviors in each language, or simply because the behaviors aren't as common. We removed these segments from our final Mexico dataset.
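The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not our production pipeline: the segment names, the scores, and the 0.5 cutoff are all made up for the example, and we assume each segment has one score per block group from each language.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no signal to correlate
    return cov / sqrt(var_x * var_y)


def split_segments_by_agreement(english_scores, spanish_scores, min_corr=0.5):
    """Return (kept, dropped) dicts of segment -> cross-language correlation.

    min_corr is a hypothetical cutoff chosen only for this example.
    """
    kept, dropped = {}, {}
    for segment, english in english_scores.items():
        r = pearson(english, spanish_scores[segment])
        (kept if r >= min_corr else dropped)[segment] = r
    return kept, dropped


# Illustrative scores for four block groups (not real data).
english_scores = {"segment_a": [10, 20, 30, 40], "segment_b": [10, 20, 30, 40]}
spanish_scores = {"segment_a": [12, 19, 33, 41], "segment_b": [30, 10, 40, 20]}

kept, dropped = split_segments_by_agreement(english_scores, spanish_scores)
```

In this toy example, "segment_a" scores move together across languages and would be kept, while "segment_b" shows no relationship and would be dropped.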
After we confirmed that we could translate the data, we took it a step further and compared our understanding of behaviors across cultures. In the U.S., we had already identified which Census data points correlate with each segment. Although the Mexico census is very different, we checked our Mexico segments for correlations with Mexican census variables to see whether they resembled our U.S. census correlations.
This proved to be a challenge, but we were able to tell that the resulting correlations followed consistent patterns. For example, the "Connected Motherhood" segment displays a strong negative correlation with single residents in both Mexico and the U.S. We also often find that certain segments are more or less common in cities than in sparsely populated areas. That pattern held true in Mexico: interests like "Hip Hop Culture" are far more common in cities no matter which country the behavior is observed in.
Not all segments met our criteria for following expected correlation patterns with the census. The behavior "Bookish" had inconsistent correlations across markets, and was removed from the new Mexico dataset.
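One simple way to picture the pattern check is to ask how often a segment's census correlations point in the same direction in both countries. The sketch below assumes the census variables have already been mapped to comparable names across the two countries, which is the hard part in practice; the variable names, the correlation values, and the tolerance are all hypothetical.

```python
def direction(value, tolerance=0.05):
    """Classify a correlation as positive (1), negative (-1), or flat (0)."""
    if value > tolerance:
        return 1
    if value < -tolerance:
        return -1
    return 0


def pattern_agreement(us_corrs, mx_corrs, tolerance=0.05):
    """Share of shared census variables whose correlation direction matches.

    Returns None when the two markets have no variables in common.
    """
    shared = sorted(set(us_corrs) & set(mx_corrs))
    if not shared:
        return None
    matches = sum(
        1
        for name in shared
        if direction(us_corrs[name], tolerance) == direction(mx_corrs[name], tolerance)
    )
    return matches / len(shared)


# Illustrative correlations for one segment (not real results).
us_corrs = {"single_residents": -0.6, "urban_density": 0.4, "median_age": 0.2}
mx_corrs = {"single_residents": -0.5, "urban_density": 0.3, "median_age": -0.3}

agreement = pattern_agreement(us_corrs, mx_corrs)
```

Here two of the three variables agree in direction, so the agreement share is two thirds. A segment with a low share would be a candidate for removal, though where to draw that line is a judgment call.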
Mexico's geographic boundaries are different from the block-based boundaries in the U.S. or the Dissemination Areas of Canada. The standard unit in Mexico is the Área Geoestadística Básica (AGEB), which we found to be generally comparable to U.S. block groups. The Mexican government divides these into urban and rural AGEBs. The urban ones have around 2,500 people, and the rural ones often have very low population. In our analysis, most business activity, particularly the activity relevant to our customers, takes place in the urban AGEBs, so we chose to limit our coverage to the urban geographies.
To provide consistent coverage, we look for each area to meet our minimum thresholds for what we consider to be statistically significant scores for our segmentation. We were able to achieve this threshold in 42,011 urban AGEBs, or 75% of geographies.
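As a rough sketch of the coverage calculation, the snippet below keeps only urban AGEBs that clear a minimum threshold and reports the covered share. It stands in a simple observation count for our actual significance criteria, and the `Ageb` fields, the IDs, and the threshold of 100 are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Ageb:
    ageb_id: str       # hypothetical identifier
    is_urban: bool     # urban vs. rural classification
    observations: int  # stand-in for whatever drives score significance


def covered_urban_agebs(agebs, min_observations):
    """Return (covered urban AGEBs, share of urban AGEBs that are covered)."""
    urban = [a for a in agebs if a.is_urban]
    covered = [a for a in urban if a.observations >= min_observations]
    share = len(covered) / len(urban) if urban else 0.0
    return covered, share


# Illustrative records (not real data).
agebs = [
    Ageb("A1", True, 250),
    Ageb("A2", True, 120),
    Ageb("A3", True, 40),
    Ageb("A4", True, 300),
    Ageb("R1", False, 5),
]

covered, share = covered_urban_agebs(agebs, min_observations=100)
```

Rural AGEBs are excluded before the share is computed, so the denominator matches the urban-only scope described above.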
We are excited to present this comprehensive, accurate dataset to our partners and customers. For the first time, communities in Mexico will be accurately represented in a robust behavioral dataset that will guide the right businesses and products to the right locations.