Analyzing text data can provide rich insights about customers’ perceptions and experiences, as well as information about the root causes of problems they encounter in their interactions with a company. Gleaning these insights, however, depends on using the right tools in the right ways to harness the power of text. And the landscape of tools and platforms is changing rapidly. Here, I will discuss some of the most useful criteria for evaluating commercial text analytics platforms, based on recent experience.
Currently, the text analytics vendor market is relatively fragmented, and few vendors stand out for their differentiated offerings. No strong industry leader has emerged, as the research and complex algorithms continue to evolve.
Success with advanced analytics requires both technical know-how and a thoughtful approach. In this series, Bain's experts offer practical advice on some of the most common data issues.
Choosing the appropriate tools and methodologies will depend on the particular use case. Corporate customers that have a stable topic domain have different requirements than firms dealing with a shifting array of multiple topics. In our experience, platforms that excel in speed and flexibility of analysis do the best job of serving use cases that do not focus on one domain. Some companies, though, may prefer platforms that integrate with customer relationship management (CRM) systems as production tools and can be tailored to a specific domain. Platforms with completely automated techniques tend to be brittle and too inflexible to incorporate new assumptions, terms or topics. Thus, defining key criteria by use case at the outset is an important step in selecting a text analytics platform.
As a quick primer, here are the main features that companies should look for in a text analytics platform.
- Application of analytical methods that provide accurate topic identification and sentiment analysis…
  - with the ability to incorporate flexible logic to interpret technical terms and local jargon; and
  - the ability to work with small and large data sets effectively.
- Metadata used to complement text data and improve the quality of results.
- Flexible reporting dashboards that help visualize text insights…
  - through customizable word clouds that allow combining of words, exclusion of words and formatting;
  - with the ability to explore topics and sentiment at varying levels of detail; and
  - the ability to export results in flexible formats, including tagged responses with topic and sentiment.
- Ease of use and navigation, and speedy performance when using the platform.
- Multiple language support.
Let’s expand on these points.
Analytics methods that provide accurate topic identification and sentiment analysis. The most advanced platforms apply a combination of methods to analyze text, including machine learning techniques (supervised or unsupervised methods), semantic or natural language processing (NLP), topic identification and business rules.
Platforms built on machine learning methods use advanced algorithms but offer limited customization. They tend to be black-box models that do not show users the vendor’s combination of algorithms, so the user may not always be able to explain results. These platforms can produce results quickly, but they fall short with small data sets, short phrases and comments containing technical terms or sarcasm. Running test data through multiple models and comparing the results helps to identify the model with the highest recall and precision, and this is where more transparency and control over the selection of algorithms can help.
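As a rough illustration of that comparison, the sketch below scores two candidate platforms against a small hand-labeled test set using macro-averaged precision and recall. The platform names, topic labels and sample outputs are hypothetical, and a real evaluation would need a much larger labeled sample.

```python
def macro_precision_recall(gold, predicted):
    """Return macro-averaged (precision, recall) over the labels found in gold."""
    labels = sorted(set(gold))
    precisions, recalls = [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Guard against division by zero when a label is never predicted.
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


# Hypothetical hand-labeled topics for five customer comments.
gold = ["billing", "billing", "delivery", "delivery", "support"]

# Hypothetical topic tags returned by two candidate platforms.
model_outputs = {
    "platform_a": ["billing", "delivery", "delivery", "delivery", "support"],
    "platform_b": ["billing", "billing", "billing", "support", "support"],
}

scores = {
    name: macro_precision_recall(gold, preds)
    for name, preds in model_outputs.items()
}

for name, (precision, recall) in scores.items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Macro averaging gives each topic equal weight, which keeps a rare but important topic from being drowned out by a frequent one; other averaging schemes may suit other use cases better.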
NLP and business-rules-based models tend to focus on industry- or domain-specific taxonomies and classifiers. This approach works well for repeated projects based on consistent topics or industry domains, mainly for larger companies. Additionally, established vendors tend to have documented libraries that work as starter kits for analysis and can be customized for a new use case. But if you stray from the original libraries, performance can degrade.
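To make the rules-based approach concrete, here is a minimal sketch of keyword-driven topic tagging with a starter taxonomy that is then extended with local jargon. The taxonomy, keywords and function name are hypothetical and do not reflect any vendor's library; commercial platforms use far richer rules.

```python
import re

# Hypothetical starter taxonomy: topic -> keywords or phrases.
STARTER_TAXONOMY = {
    "billing": ["invoice", "overcharged", "refund"],
    "delivery": ["late", "shipping", "courier"],
}


def tag_topics(text, taxonomy):
    """Return the sorted topics whose keywords appear as whole words in text."""
    lowered = text.lower()
    found = []
    for topic, keywords in taxonomy.items():
        for keyword in keywords:
            pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
            if re.search(pattern, lowered):
                found.append(topic)
                break  # one matching keyword is enough for this topic
    return sorted(found)


# Customizing the starter kit for a new use case with its own jargon.
telecom_taxonomy = {**STARTER_TAXONOMY, "network": ["dropped call", "no bars"]}

comment = "I had a dropped call and no bars at home"
print(tag_topics(comment, STARTER_TAXONOMY))   # starter kit misses the topic
print(tag_topics(comment, telecom_taxonomy))   # extended taxonomy finds it
```

The same comment goes untagged under the starter taxonomy, which is a small-scale version of the degradation that occurs when the text strays from the domain a library was built for.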
When it comes to sentiment analysis, most platforms provide a tag of positive, negative or neutral for each textual statement. Some platforms go further and provide results on a scale of 1 to 10. However, accuracy varies by platform and may depend on the quality of the text being analyzed. Platforms that produce high-accuracy sentiment tagging are rare, and most platforms still do not accurately interpret sarcasm. In some instances, companies might increase accuracy by tuning sentiment, creating custom rules or hand coding results, but this may not be worth the effort and is not practical for large data sets. Be sure you can measure your false-positive rate here, and tune for your desired levels of sensitivity and specificity.
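One way to measure these rates is to treat a single sentiment class as the class of interest and compare platform tags against hand-coded ones. The sketch below does this for negative sentiment; the function name and sample tags are hypothetical.

```python
def binary_rates(gold, predicted, target="negative"):
    """Score detection of `target` as a yes/no problem and return key rates."""
    tp = fp = tn = fn = 0
    for g, p in zip(gold, predicted):
        if p == target and g == target:
            tp += 1          # correctly flagged
        elif p == target:
            fp += 1          # flagged, but hand coding disagrees
        elif g == target:
            fn += 1          # missed
        else:
            tn += 1          # correctly left alone
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical hand-coded tags and platform tags for six comments.
gold = ["negative", "negative", "positive", "neutral", "positive", "negative"]
pred = ["negative", "positive", "negative", "neutral", "positive", "negative"]

rates = binary_rates(gold, pred)
print(rates)
```

Which rate to prioritize depends on the use case: a team routing complaints to agents may accept more false positives in exchange for missing fewer unhappy customers.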
The ability to use metadata to complement text data. Although this capability is relatively easy to implement, many platforms and models lack it. Structured data such as a Net Promoter Score® (a key metric of customer loyalty) can complement text results and sometimes act as a tie breaker for statements with ambiguous sentiment.
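A minimal sketch of the tie-breaker idea follows. It assumes the platform exposes a confidence value alongside each sentiment tag; that confidence field, the 0.6 threshold and the function names are assumptions made for illustration. The score bands are the standard Net Promoter categories (0 to 6 detractor, 7 to 8 passive, 9 to 10 promoter).

```python
def nps_category(score):
    """Map a 0-10 likelihood-to-recommend score to its standard NPS band."""
    if not 0 <= score <= 10:
        raise ValueError("NPS responses must be between 0 and 10")
    if score >= 9:
        return "promoter"
    if score >= 7:
        return "passive"
    return "detractor"


def resolve_sentiment(text_sentiment, confidence, nps_score, threshold=0.6):
    """Keep a confident text-based tag; otherwise fall back to the NPS band.

    `confidence` and `threshold` are hypothetical: not every platform
    exposes a confidence value for its sentiment tags.
    """
    if confidence >= threshold or nps_score is None:
        return text_sentiment
    fallback = {
        "promoter": "positive",
        "passive": "neutral",
        "detractor": "negative",
    }
    return fallback[nps_category(nps_score)]


# An ambiguous comment from a respondent who gave a score of 2.
print(resolve_sentiment("neutral", confidence=0.4, nps_score=2))
```

Only low-confidence tags are overridden here, so the structured score complements the text result rather than replacing it.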
Flexible reporting dashboards that help visualize text insights. Most platforms provide reporting on results with basic charts. Some go further with customizable reports to better understand the results and improve analytics models, or have an enhanced dashboard. And some can improve the visual aspects of insights by integrating with tools such as Tableau. Word clouds, for instance, provide a quick highlight of key words and themes. Here, two useful features are the ability to combine similar words or topics and the ability to exclude words. Another function, frequency charts, allows users to sort the data on sentiment, topics and subtopics, in order to get a high-level overview of results.
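The combine and exclude features amount to a small preprocessing step before words are counted. The following sketch shows the idea with a hypothetical merge map and exclusion list; the resulting counts could feed either a word cloud or a frequency chart.

```python
import re
from collections import Counter


def word_frequencies(comments, merge=None, exclude=None):
    """Count words after merging similar terms and dropping excluded ones."""
    merge = merge or {}
    exclude = set(exclude or [])
    counts = Counter()
    for comment in comments:
        for token in re.findall(r"[a-z']+", comment.lower()):
            token = merge.get(token, token)  # fold variants into one term
            if token not in exclude:
                counts[token] += 1
    return counts


# Hypothetical comments, merge map and exclusion list.
comments = ["The delivery was late", "Late deliveries again", "Shipping was fast"]
merge = {"deliveries": "delivery", "shipping": "delivery"}
exclude = {"the", "was", "again"}

freqs = word_frequencies(comments, merge, exclude)
print(freqs.most_common(3))
```

Without the merge map, "delivery", "deliveries" and "shipping" would appear as three small words rather than one prominent theme.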
Users also should look for the ability to export results in flexible formats and to add or remove variables. For example, they should be able to tag each row of text with sentiment and topic.
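As a simple illustration of such an export, the sketch below writes tagged rows to CSV and lets the user choose which variables to include. The row contents and column names are hypothetical.

```python
import csv
import io


def export_tagged(rows, columns, out):
    """Write rows to `out` as CSV, keeping only the requested columns."""
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Hypothetical tagged responses, one row per comment.
rows = [
    {"id": 1, "text": "Late delivery, again", "topic": "delivery",
     "sentiment": "negative", "nps": 3},
    {"id": 2, "text": "Refund was quick", "topic": "billing",
     "sentiment": "positive", "nps": 9},
]

# Drop the "nps" variable from this export by leaving it out of the columns.
buffer = io.StringIO()
export_tagged(rows, ["id", "text", "topic", "sentiment"], buffer)
print(buffer.getvalue())
```

Writing to a file instead of an in-memory buffer only requires passing an open file handle (opened with `newline=""`) as `out`.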
A fast and easy-to-use interface. A simple interface makes it easy for analysts to learn and use the platform, navigate through the various functions and visualize results. Users want a fast upload and download of data, and the ability to handle multiple file formats.
Multiple language support. Many platforms still work best only with English, and some do not support other languages at all. Good platforms support multiple languages, using algorithms that take into account each language’s grammar rather than translating the text into English, which strips out valuable nuances embedded in a language. Platforms built by regional vendors handle their main local language (mostly Spanish, French, German, Dutch and Chinese), but most are still limited in sentiment identification.
Platforms that can support the criteria discussed here will be most useful. And the best platforms will continue to improve their features and interface, adopting the latest methods in text analytics.
Harika Guddanti is a senior specialist in Bain & Company’s Advanced Analytics practice, and is based in Chicago.
Net Promoter Score® is a registered trademark of Bain & Company, Inc.; Fred Reichheld; and Satmetrix Systems, Inc.