Expert Commentary

How Machine Learning and Natural Language Processing Produce Deeper Survey Insights

Analyzing open-ended responses from a large group requires tools emerging from the artificial intelligence revolution.

By Ruud Hellemons and Roger Zhu

First published on March 23, 2021

Executive Summary

When surveys of large numbers of people contain open-ended responses, traditional analytical approaches fall short.
However, machine learning and natural language processing can handle the statistical and contextual challenges involved.
That’s how a global retailer was able to derive insights about its culture and values from a survey of tens of thousands of employees around the world.

When assessing consumer or employee sentiment, traditional approaches tend to rely on management interviews, focus groups, and numerically based survey questions as the core basis for insights. Open-ended survey text responses, by contrast, have played only a limited role, given the large analytical effort involved.

The traditional approaches were not sufficient for a global retailer. It wanted to deeply understand employee sentiment and how well employees believed the company and its leaders were living up to the stated values. To understand the root causes, the company worked with Bain to conduct an in-depth diagnostic. The assessment involved management interviews and focus groups, as well as a survey of tens of thousands of frontline and corporate employees in more than 20 languages. The scale of the assessment and the need to understand trends specific to different locations and functions required an in-depth analysis of open-ended survey responses.

We decided to use machine learning (ML) and natural language processing (NLP) techniques to address several challenges:

Selection bias: Due to the diversity of employees, the retailer wanted to ensure that insights were not solely derived from a small group of interviews and focus group participants, but instead captured a broad range of opinions and experiences.
Statistical significance: Given the assessment’s global reach, the insights needed to have a statistically rigorous foundation backed by the survey data.
Lack of nuance when analyzing open-ended text: The survey focused on complex themes around culture and ways of working. Any analysis of the free text responses required a much greater degree of analytical rigor than simple word categorizations, in order to effectively capture sentiment.
Complexity of language: Analyzing open-ended responses involves relatively sparse textual data. Respondents may refer to the same underlying theme using different language that contains no shared words—for instance, “I think my salary is too low” and “They are not paying me enough.” The methodology employed must be able to recognize and manage this complexity.
Unique context: It has become common practice in text analytics to employ pretrained models to classify open-ended responses, based on a predetermined set of themes, and occasionally refined by manually tagging a limited set of responses as examples for new themes. However, our experience with surveys suggests that each use case has a unique context. That makes it difficult to determine a complete set of themes ahead of the survey. This is particularly true for large global enterprises, given the varying cultural nuances, company-specific language, and geographic variations. Instead, one needs an unopinionated, data-driven approach for determining themes, based on the core data set.
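
The lexical gap described under "Complexity of language" can be made concrete with a small check. The sketch below is illustrative only and is not part of the Dialect software; the helper names `tokens` and `jaccard` are assumptions introduced here. It measures word overlap (Jaccard similarity) between the two example responses and finds none, even though both express the same theme.

```python
import re

def tokens(text):
    """Return the set of lowercase words in a response."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a, b):
    """Share of distinct words that two responses have in common."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

first = "I think my salary is too low"
second = "They are not paying me enough"

# Same theme (pay), but no word appears in both responses.
overlap = jaccard(first, second)
print(overlap)  # 0.0
```

A method that only counts shared words would treat these two responses as unrelated, which is why the approach described next relies on learned word representations instead.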

The approach: Dialect text analytics

To tackle these challenges, we deployed our Dialect text analytics software to understand, categorize, and produce visualizations for key themes from the survey responses. This software employs recent breakthroughs in language models and ML, and can stand up to the complexities of open-ended responses from large surveys, including spelling errors and very short or incomplete phrases, such as “Pay is good, management not so much.”

Over the past few years, the “deep learning” revolution in AI and ML has made strides in text analytics, from chatbots to sentiment analysis and text generation. In certain applications, algorithms now match or even exceed human capabilities. Nevertheless, there remains room for improvement in topic modeling to understand common themes mentioned in text. That’s why Bain built a text analytics library that enables this form of unsupervised analytics.

The first phase in our approach involves exploratory modeling to detect themes in the data. This unsupervised topic model ensures that the underlying data informs the identification of themes, without bias toward preconceptions and unstated assumptions. This phase generally consists of four steps:

Preprocessing the text: We first apply textual cleaning practices such as removing punctuation, setting words to lower case, filtering out uninformative words, and correcting spelling errors. Even with large-scale surveys, the volume of text can be relatively low for ML purposes compared with other data sources such as online reviews or social media posts. As a result, it is critical to condense the data by reducing verbs and nouns to their lemmas (the base forms that group a word’s inflected variants). Finally, respondents often provide feedback as a compact list of issues (such as “nice colleagues, long working hours”), which requires tokenization to split responses into sentence parts, so that we can attribute a single response to multiple themes.
Training the language model to learn relationships between words: After preprocessing, we transform words in the texts to vectors called “word embeddings,” which convert the text information into numerical inputs and infer meaning by their contextual similarity. Here, it is critical to tailor the embeddings to the business context, as the same word may have dramatically different connotations in different companies and industries.
Running theme detection using an unsupervised clustering algorithm: The unsupervised algorithm clusters the text into themes without initial human intervention, while also allowing model customizations for different survey populations or questions (see Figure 1).
Theme review: Once aggregated, we review the theme clustering identified by the unsupervised algorithm, and corroborate it against interviews, refining if required.
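
The first three steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Dialect implementation: the stop-word list, the lemma lookup, the two-dimensional `TOY_VECTORS`, the similarity threshold, and the greedy clustering rule are all hypothetical stand-ins. A real pipeline would correct spelling, learn embeddings from the survey corpus, and use a more robust clustering algorithm.

```python
import math
import re
import string

# Hypothetical resources for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "are", "so", "and", "very", "too"}
LEMMAS = {"colleagues": "colleague", "hours": "hour",
          "working": "work", "shifts": "shift"}
# Hand-made 2-D "embeddings"; real ones are learned from the corpus.
TOY_VECTORS = {
    "colleague": (1.0, 0.1), "team": (0.9, 0.2),
    "nice": (0.7, 0.3), "friendly": (0.8, 0.2),
    "hour": (0.1, 1.0), "work": (0.3, 0.8),
    "long": (0.2, 0.9), "shift": (0.1, 0.9),
}

def split_into_parts(response):
    """Split a compact list of issues into separate sentence parts."""
    return [p.strip() for p in re.split(r"[,;.!?\n]+", response) if p.strip()]

def clean(part):
    """Lowercase, strip punctuation, drop stop words, map words to lemmas."""
    text = part.lower().translate(str.maketrans("", "", string.punctuation))
    return [LEMMAS.get(t, t) for t in text.split() if t not in STOP_WORDS]

def embed(tokens):
    """Average the vectors of known tokens; None if no token is known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return None
    return tuple(sum(dim) / len(known) for dim in zip(*known))

def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def cluster(parts, threshold=0.8):
    """Greedy clustering: join the first cluster whose representative
    vector is similar enough, otherwise start a new cluster."""
    clusters = []  # list of (representative_vector, member_texts)
    for text in parts:
        vector = embed(clean(text))
        if vector is None:
            continue  # nothing recognizable in this part
        for representative, members in clusters:
            if cosine(representative, vector) >= threshold:
                members.append(text)
                break
        else:
            clusters.append((vector, [text]))
    return [members for _, members in clusters]

responses = ["Nice colleagues, long working hours",
             "Friendly team", "Shifts are too long"]
parts = [p for r in responses for p in split_into_parts(r)]
themes = cluster(parts)
print(themes)
```

With these toy vectors, the first response contributes one part to a "colleagues" cluster and one to a "working hours" cluster, showing how a single response can be attributed to multiple themes. The fourth step, theme review, remains a human task.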

Figure 1: The algorithm clusters survey responses into themes

Following refinement, an optional second phase uses a supervised ML model that can accurately assign the identified themes to new data. At this stage, the theme definitions are fixed, which can be valuable in use cases with frequent data updates, such as regular employee pulse surveys. The Dialect software can also attach the corresponding sentiment to each theme in an open-ended response.
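
To illustrate assigning fixed themes to new responses, the sketch below trains a tiny multinomial naive Bayes classifier on a handful of labeled examples. The article does not say which supervised model Dialect uses, so naive Bayes is a stand-in chosen for brevity. The labeled examples, the theme names, and the functions `train` and `predict` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical responses already tagged with reviewed themes.
LABELED = [
    ("pay is too low", "compensation"),
    ("salary not competitive", "compensation"),
    ("low salary and no bonus", "compensation"),
    ("manager does not listen", "management"),
    ("poor communication from manager", "management"),
    ("leaders do not listen to staff", "management"),
]

def train(examples):
    """Count words per theme and responses per theme."""
    word_counts = defaultdict(Counter)
    theme_counts = Counter()
    vocab = set()
    for text, theme in examples:
        words = text.lower().split()
        word_counts[theme].update(words)
        theme_counts[theme] += 1
        vocab.update(words)
    return word_counts, theme_counts, vocab

def predict(text, model):
    """Return the theme with the highest log-probability for the text."""
    word_counts, theme_counts, vocab = model
    total = sum(theme_counts.values())
    best_theme, best_score = None, -math.inf
    for theme, count in theme_counts.items():
        score = math.log(count / total)  # prior for the theme
        theme_words = sum(word_counts[theme].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[theme][word] + 1)
                              / (theme_words + len(vocab)))
        if score > best_score:
            best_theme, best_score = theme, score
    return best_theme

model = train(LABELED)
print(predict("my salary is low", model))
print(predict("the manager does not listen to us", model))
```

Because the theme set is frozen after review, each new pulse survey can be scored against the same definitions, which keeps results comparable over time.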

Greater confidence in the insights

For the retailer, the model created with the Dialect text analytics library identified organizational themes that confirmed and complemented those from management interviews and focus groups. It also provided a level of rigor and detail that allowed for customized insights at a geographic and functional level across 80 summary reports.

The text analytics work was critical to the success of a major priority for senior executives, establishing confidence in the identified strengths and challenges related to company culture. This enabled the executive team to rapidly align on organizational priorities through a series of workshops. With growing sophistication, emerging text analytics tools will increasingly unlock faster and deeper insights into employee and customer sentiment.

The authors thank the following colleagues for their help with this expert commentary: Sarah Salzman, Anli Chen, Marion Louvel, Katrijn DePaepe, and Linda Raaijmakers.