Personalysis: AI-driven personas can accurately predict consumer ratings
We have recently begun deploying personas derived from survey data, using a variety of the latest AI models.
Personas are a popular explanatory aid for bringing qualitative and quantitative research data to life, but they are time-consuming to generate and require a lot of human thinking. We have found, however, that they can now be generated from primary data using large language model (LLM) AI, which takes much of the heavy lifting out of the process. The question we wanted to answer was: how valid is this approach, and how closely can it mimic archetypal humans for insight purposes?
Almost all clients work with some form of market segmentation, so to prove the concept we created a values-based segmentation of the UK population from our own research data. This surfaced all the usual suspects who show up in commercial segmentations (Tech-Savvy Urbanites, Mainstream Suburban Families, and so on). We settled on 12 clusters for the persona experiment and instructed our model to create 5 detailed personas per segment – a total of 60 ‘artificial people’ – representing different flavours and variations within each segment.
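The generation loop itself is straightforward. Here is a minimal sketch of the 12-segments-times-5-personas structure described above; `generate_persona` is a hypothetical stand-in for whatever LLM call is used, and the segment names shown are illustrative, not our full 12-cluster solution:

```python
# Sketch of the persona-generation loop: N personas per segment.
# `generate_persona` is a placeholder for an LLM call, not a real API.

SEGMENTS = ["Tech-Savvy Urbanites", "Mainstream Suburban Families"]  # 12 in the real solution
PERSONAS_PER_SEGMENT = 5

def generate_persona(segment: str, variant: int) -> dict:
    """Placeholder for an LLM call that writes one detailed persona profile."""
    return {
        "segment": segment,
        "variant": variant,
        "profile": f"Persona {variant} of segment '{segment}'",
    }

personas = [
    generate_persona(seg, i)
    for seg in SEGMENTS
    for i in range(1, PERSONAS_PER_SEGMENT + 1)
]
# With the full 12-segment solution this yields 12 * 5 = 60 'artificial people'.
```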
When reviewing the personas we had generated, we thought of many random questions: what would be their taste in interior design? How do they feel about AI? What music do they like? Were their “personalities” particularly different? And so on.
But the question was really this: could the personas be used as an extrapolator or “expert witness” to generate synthetic consumer-like data, and how accurate would it be?
The results have been surprising, to put it mildly.
We have run analyses using this general model on data from 50 concept-testing studies conducted by a research partner, the global insights consultancy Verve. We generated synthetic data representing our model’s responses to the different stimuli presented in the research (appeal ratings) and compared it with the real-life data collected by Verve. The results are very highly correlated – far more so than we expected.
The chart below amalgamates the data from all these test items. The real-life top-2-box “appeal” results, from a 1-7 scale, are shown along the x-axis. In each instance a text-based concept – a tagline, an idea, a brand promise, etc. – was tested amongst a nationally representative (NatRep) sample of consumers, with an average base size of 400.
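For readers less familiar with the metric: top-2-box appeal is simply the share of respondents scoring a concept 6 or 7 on the 1-7 scale. A minimal sketch, using made-up ratings rather than any real fieldwork data:

```python
# Sketch: top-2-box "appeal" from 1-7 ratings - the percentage of
# respondents scoring 6 or 7. Ratings below are invented placeholders.

def top2box(ratings: list[int]) -> float:
    """Percentage of ratings in the top two boxes (6 or 7) of a 1-7 scale."""
    return 100 * sum(1 for r in ratings if r >= 6) / len(ratings)

ratings = [7, 6, 5, 3, 6, 2, 7, 4]  # 4 of 8 respondents score 6+
print(top2box(ratings))
```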
We then used the synthetic ratings from the personas to estimate NatRep appeal.
The correlation between the synthetic and real ratings was 0.88, an R² of 0.77. In non-stats terms, this means the AI model appears to predict consumer ratings with a high degree of accuracy. The correlation between the synthetic ratings and the risk measure was also strong, at -0.79 (an R² of 0.62).
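As a rough illustration of the arithmetic behind these headline figures, the correlation and R² between real and synthetic top-2-box scores can be computed like this. The score lists below are invented placeholders, not the actual study data:

```python
# Sketch: scoring synthetic vs. real top-2-box appeal.
# The figures are made-up placeholders for illustration only.
import statistics

def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Top-2-box appeal (%), one pair per tested concept (placeholder values).
real      = [34, 52, 41, 60, 28, 47]   # fieldwork results
synthetic = [31, 55, 44, 57, 30, 50]   # persona-model estimates

r = pearson_r(real, synthetic)
r_squared = r ** 2
print(f"r = {r:.2f}, R^2 = {r_squared:.2f}")
```

R² is just the square of the correlation coefficient, which is why an r of 0.88 corresponds to an R² of 0.77.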
A reasonably successful test outcome…
This is with no detailed fine-tuning, i.e. no extra training of the AI models. Obviously, for a client deploying a bespoke segmentation approach, the results would be further improved through enhanced training on much more detailed and specific datasets and enrichment assets. Indeed, we are working on exactly this with a couple of clients.
But it is worth knowing that, technically, we have found this persona-based approach does two things compared with a pure “zero-shot” approach (i.e. asking the LLMs to give a generic appeal rating without any segmental training, something we did a number of times to get an average prediction):
- The persona-based approach materially improved the correlation between our real and artificial appeal ratings across the various surveys, from circa 0.45 to 0.88 – i.e. shifting “predictivity” (R²) from around 20% to 77%.
- Equally importantly, it not only sharpened the picture: it more than doubled the coefficient of variation (the spread of ratings) of the synthetic scores. There seemed to be a lot of bet-hedging going on in the zero-shot test.
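The second effect can be checked with a simple summary statistic. Here is a minimal sketch of the coefficient-of-variation comparison; both score lists are invented placeholders chosen to illustrate a narrow zero-shot band versus a wider persona-based spread, not our actual test output:

```python
# Sketch: comparing the spread of zero-shot vs persona-based synthetic
# scores. Both lists are invented placeholders for illustration.
import statistics

def coeff_of_variation(scores: list[float]) -> float:
    """Coefficient of variation: sample standard deviation over the mean."""
    return statistics.stdev(scores) / statistics.fmean(scores)

zero_shot = [5.1, 5.3, 5.0, 5.4, 5.2, 5.3]   # narrow, bet-hedged band
persona   = [3.8, 5.6, 4.2, 6.1, 3.5, 5.9]   # wider, more differentiated

cv_zero = coeff_of_variation(zero_shot)
cv_persona = coeff_of_variation(persona)
print(f"zero-shot CV = {cv_zero:.3f}, persona CV = {cv_persona:.3f}")
```

A higher CV means the synthetic ratings discriminate more between strong and weak concepts, rather than clustering around a safe middling score.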
Here are the “zero-shot” results. You can see that (a) the scores skew more positive, and (b) the synthetic ratings operate in a narrower band of variation. Moreover, the correlation with the research ratings is materially lower. It is worth noting that there is nonetheless a correlation – so our hypothesis is that the generic LLM predictions must be attempting to evaluate some form of intrinsic “goodness” in the language of the test items.
Overall, these results indicate very strongly that AI Large Language Models can be used to simulate and even predict real world responses of consumers to certain types of stimulus with a high degree of accuracy.
Of course, these are early days. But we think this is potentially a highly significant development for the MR world, one that could radically change the economics of MR globally. For example, concept, brand and communications testing and screening could be streamlined by using AI models to predict real-world consumer responses. This could allow much faster and more efficient shortlisting of concepts to put into research, saving both time and fieldwork money. We also think this approach could add significant value to segmentation outputs in general.
Signoi is actively involved with clients in building bespoke models to achieve this.
* * * * * * * * * * * *
We’ll be writing more on this and other AI related subjects shortly, meanwhile please do get in touch if you found this interesting…
Email us today to say hello and learn more about what Signoi’s AI-driven analysis software can do for you.
Please contact us at email@example.com