Brief
Summary
- Companies are using synthetic customers to accelerate product development, test marketing, and train frontline teams.
- Organizations that build synthetic customers should rely on their first-party data rather than on vendors’ third-party data.
- Improving model accuracy allows teams to test more variables, eliminate weak ideas earlier, and focus human research where it matters most.
- Large language models still lack true empathy, leaving a vital role for human judgment.
Synthetic customers—AI-generated representations of real customers—have reached an inflection point that goes beyond qualitative exploration toward structured, repeatable, and accurate quantitative insights. These proxies can come in the form of one-to-one digital twins of customers or segment-based personas derived from a mix of internal company data (such as transactional, behavioral, demographic, and voice-of-the-customer research data) and external sources (product reviews and social media scraping).
Demand for continuous, always-on insights about product or service performance has outgrown the limits of traditional research methods. Concerns around speed, cost, and risk reduction have spurred adoption of digital proxies that emulate human behavior, preferences, and decision making. For example, US Bank has used synthetic audiences to understand how high-net-worth households and other customer segments think about financial topics, test messaging, and refine creative campaigns before launch. Retailer Target tests products and promotions on synthetic audiences to simulate how various consumers would respond to them before live testing on websites.
Market leaders that can iterate quickly, test more ideas, and kill weak concepts early consistently outperform those tied to slow, episodic, siloed insight cycles.
Where traditional research falls short
Traditional research remains valuable in many situations but is increasingly constrained. Conjoint and discrete choice models are limited by the number of price points, features, or interaction effects that can feasibly be tested. Teams finish studies wishing they had tested more, or wanting to extrapolate beyond what was tested, which slows learning and introduces uncertainty.
Human-based survey research has encountered other problems in recent years. The volume of fraud has increased, and participant engagement has become more variable, forcing researchers to recruit larger samples or deploy costly quality control measures just to get usable data. Bot contamination of surveys has forced constant upgrades to screening and detection tools. Moreover, the classic issue of people saying one thing but doing another persists. And in business-to-business (B2B) markets, there may be too few key customers, such as CFOs in a single industry, to reliably sample.
How synthetic customers perform
It’s not surprising, then, that many product, strategy, and marketing teams are using off-the-shelf AI tools to gather qualitative insights around new features, pricing, and messaging. However, these tools often lack grounding in proprietary customer data, statistical validation, or clear governance. Fortunately, recent generations of large language models (LLMs) demonstrate stronger reasoning, more stable trade-offs, and better alignment with human decision patterns in structured tasks.
Our work with a leading consumer technology company illustrates the step change in performance and accuracy that synthetic customers can produce when paired with a company's own first-party proprietary data. The team backtested synthetic output against a prior large-scale quantitative conjoint study, using the original research as ground truth. We built digital twins from historical respondent-level data and ran the same tasks used in the original study, excluding the study itself from the training inputs. The digital twins replicated about 90% of key outcomes from the original research, including the following (see Figures 1 and 2):
- identification of the most influential features that drive customer choices;
- preference share for most of the products tested;
- correct portfolio-level decisions about which products to launch or retain; and
- preliminary price sensitivity curves that showed promise.
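As a rough illustration of this kind of backtest, the comparison boils down to replaying choice tasks through the twins and scoring agreement with the human ground truth. The sketch below uses entirely invented product names and counts, not the study's data, and two simple agreement checks (share error and rank order) as stand-ins for the fuller validation a research team would run.

```python
# Hypothetical backtest of synthetic-twin choices against a prior conjoint
# study. All products and respondent counts here are invented for illustration.

def preference_shares(choices, products):
    """Share of respondents choosing each product."""
    total = len(choices)
    return {p: sum(1 for c in choices if c == p) / total for p in products}

products = ["A", "B", "C"]

# Ground truth: product choices from the original human conjoint study.
human_choices = ["A"] * 600 + ["B"] * 500 + ["C"] * 400

# The same choice tasks replayed through the digital twins.
twin_choices = ["A"] * 570 + ["B"] * 530 + ["C"] * 400

human = preference_shares(human_choices, products)
twin = preference_shares(twin_choices, products)

# Agreement check 1: mean absolute error on preference shares.
mae = sum(abs(human[p] - twin[p]) for p in products) / len(products)

# Agreement check 2: do the twins rank products in the same order?
same_ranking = (sorted(products, key=human.get, reverse=True)
                == sorted(products, key=twin.get, reverse=True))

print(f"MAE on preference share: {mae:.3f}")
print(f"Rank order preserved: {same_ranking}")
```

The rank-order check matters most for portfolio-level launch decisions, while the share error speaks to how far the twins can be trusted for quantitative extrapolation.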
Figure 1 notes: Average feature importance based on conjoint results; LLM used is Gemini 3.0; n=1,500. Source: Bain & Company

Figure 2 notes: LLM used is Gemini 3.0; n=1,500. Source: Bain & Company

Similar results emerged when we tested synthetic customers against an existing human consumer survey exploring attitudes, usage, and behavior around GLP-1 drugs. We generated synthetic respondents using demographic and attitudinal inputs and evaluated their responses across closed-ended questions, including those answered on a five-point scale. The synthetic outputs tracked closely with human responses, with variance increasing only when prompt questions were more ambiguous.
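A minimal sketch of how such tracking can be scored, using made-up question names and mean ratings rather than the survey's actual data: the per-question gap between human and synthetic means gives a direct read on how closely the synthetic panel follows the real one.

```python
# Illustrative comparison of synthetic vs. human answers on five-point scale
# questions. Question names and ratings below are invented for this sketch.
from statistics import mean

# Mean rating per survey question (1-5 scale) from each panel.
human_means = {"q1_awareness": 3.8, "q2_intent": 2.9, "q3_satisfaction": 4.1}
synthetic_means = {"q1_awareness": 3.7, "q2_intent": 3.1, "q3_satisfaction": 4.0}

# Per-question gap and overall tracking error.
gaps = {q: abs(human_means[q] - synthetic_means[q]) for q in human_means}
overall = mean(gaps.values())

for q, g in gaps.items():
    print(f"{q}: gap {g:.1f} points")
print(f"Mean absolute gap: {overall:.2f} points on a 5-point scale")
```

In practice a team would also compare full response distributions, not just means, and segment the gaps by question ambiguity, since that is where the article notes variance increases.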
The results reinforce that what you ask the LLM to do matters, but synthetic customers are increasingly reliable for quantitative use cases. And using proprietary first-party data to enrich what’s available from third parties adds nuance and reliability.
Looking ahead, synthetic customers have the potential to reshape the entire marketing process and the product development lifecycle. Specifically, for product development, they will add value in several ways:
- extend prior pricing and conjoint research by testing new price points, bundles, or feature combinations without restarting fieldwork;
- refine and stress-test at the customer segment level, exploring how segments respond to changes in product, pricing, or messaging before a company commits to new studies;
- screen early concepts, features, and messaging to rapidly narrow the options so human research focuses on the highest-value questions; and
- enable low-risk testing for hard-to-reach segments before engaging with a scarce pool of human customers.
The same principles shaping marketing in consumer industries also apply in B2B contexts. For instance, synthetic customers can prepare sales teams: simulated buyer personas and interactive avatars help teams rehearse objections, refine value propositions, and test messaging.
For a global services firm, we built synthetic personas based on several years of Net Promoter® loyalty data collected from its clients. Using the same data, we concurrently ran traditional statistical (latent class) segmentation methods and landed in a similar place. Once the personas were created, we trained the LLM on third-party data and published articles to provide proper context. Sales teams could then practice pitching to value-conscious CIOs and other executive personas. The models were scaled and distributed across the firm's global offices within weeks.
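One way to quantify "landed in a similar place" is a simple purity check between the two segmentations: if each LLM-derived persona maps mostly onto one statistical segment, the methods agree. The sketch below uses invented client IDs and labels, and purity as a stand-in for the more formal agreement measures (such as adjusted Rand index) a team might use in practice.

```python
# Hypothetical agreement check between LLM-derived personas and a
# traditional (e.g., latent class) segmentation of the same clients.
# Client IDs and all labels are invented for illustration.
from collections import Counter, defaultdict

persona_labels = {          # persona from the synthetic-customer build
    "c1": "value", "c2": "value", "c3": "innovator",
    "c4": "innovator", "c5": "relationship", "c6": "value",
}
statistical_labels = {      # segment from the statistical method
    "c1": "seg1", "c2": "seg1", "c3": "seg2",
    "c4": "seg2", "c5": "seg3", "c6": "seg2",
}

# For each persona, count which statistical segments its clients fall into.
by_persona = defaultdict(Counter)
for client, persona in persona_labels.items():
    by_persona[persona][statistical_labels[client]] += 1

# Purity: fraction of clients whose persona maps to its dominant segment.
matched = sum(counts.most_common(1)[0][1] for counts in by_persona.values())
purity = matched / len(persona_labels)
print(f"Agreement (purity): {purity:.0%}")
```

High agreement on a shared dataset is what justifies treating the faster LLM-derived personas as a credible substitute for rerunning the statistical segmentation each time.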
To get started, augment rather than replace
Our experience building synthetic customer capabilities across a range of industries shows that it’s most effective to start by augmenting, not replacing, existing research methods. Leading organizations first deploy synthetic customers as an augmentation layer to narrow options, pressure test assumptions, and focus human research on the highest-value questions, or to build proofs of concept that show accuracy.
Success here will depend on treating synthetic customers as a capability, not a tool, which means owning how the company defines personas, simulates decisions, and validates outputs across use cases. Specifically:
- Backtest to prove reliability. This ensures the rest of the organization will trust synthetically generated insights.
- Prioritize proprietary data. The data and context that ground these models (such as historical customer research, pricing and sales data, segmentation attributes, and voice-of-customer inputs) matter more than the choice of model.
- Balance build vs. buy. Most vendors focus on qualitative or lightly structured use cases and can support early experimentation. However, organizations seeking decision-grade applications increasingly combine vendor tools with internally built models to retain control over data, logic, and learning. No off-the-shelf solution currently meets all requirements.
- Adapt the operating model. Using synthetic customers requires changes in workflows, decision rights, and governance. Research teams, for instance, will need to ask questions differently to provide better input to synthetic audiences. Organizations must rethink how insights are generated and how research, product, and marketing teams collaborate.
Leading organizations already benefit from initial learnings in the form of faster iterations, richer data and insights, and increasingly accurate in-market outcomes. Over time, synthetic customers will likely become a reusable decision infrastructure, embedding institutional learning and compounding advantage. As adoption and use cases scale, synthetic customers will function as an always-on insights platform across product, marketing, and customer experience. The cumulative depth of proprietary data and learning embedded in these systems could become a durable competitive advantage.