      Expert Commentary

      Mission Possible: Driver Analysis with Collinear Variables

      Many commonly used methods have serious limitations when assessing the variable importance of collinear drivers.

      By Eleonora Nazander and Ilker Carikcioglu

      At a Glance
      • To determine which drivers have the greatest influence on an outcome variable, many analysts turn to techniques such as multiple linear regression, random forest, or Shapley values.
      • But these methods don’t work well when several drivers are highly collinear.
      • To understand overall variable importance, simple methods such as Pearson correlation can more effectively assess the strength of the relationship between a driver and the outcome variable independently of other potential drivers.
      • With that understanding, managers can then address how to improve performance along the relevant drivers, either singly or in logical clusters.

      Analytical techniques to perform driver analysis come in handy when a company seeks to understand a particular outcome, such as customer satisfaction or profit per store, as a function of several potential drivers. Ranking potential drivers by how strongly they affect the outcome metric allows the company to focus resources on improving performance along the right ones.

      Common techniques include multiple linear regression (MLR), random forest, Shapley values, Johnson’s relative weights, partial correlations, and Pearson correlation. Many of these methods control for the effects of other drivers, which can make them unsuitable for ranking drivers by importance. The reason lies in the distinction, explained below, between overall variable importance and marginal variable importance.

      Elements of Value® in retail banking

      Consider the case of a retail bank trying to understand what drives customer advocacy. Data comes from a survey of 2,500 consumers, asking how likely they are to recommend a certain brand to a friend or colleague, which is the core Net Promoter Score℠ question. This likelihood to recommend becomes the outcome variable, with our goal being to understand which variables have the strongest associations with this metric.

      Potential drivers are a set of 30 attitudinal statements capturing how well a certain brand performs on the Elements of Value as experienced by customers (see Figure 1). The survey asked respondents to rate their experience with the bank on each Element of Value using a scale of 0–10. Delivering on multiple Elements of Value can lift products or services above commodity status.

      Figure 1: The Elements of Value®

      In such a scenario, some analysts would turn to MLR, interpreting standardized coefficients¹ as indicators of variable importance. Anyone familiar with MLR knows that for a model to produce meaningful results, one must first select the appropriate variables. As is common in psychometric research, some of the 30 drivers correlate highly with others, a phenomenon called multicollinearity. Figure 2 contains model coefficients as well as additional statistical metrics for each driver selected by the algorithm.

      Figure 2: With likelihood to recommend as the outcome variable, here’s what MLR produces

      Here we need to acknowledge that the set of variables included in MLR will differ depending on how the analyst selects them. A hypothesis-driven approach ensures the highest possible interpretability of the model. But regardless of the approach, the analyst can include only a subset of variables in the model. After we removed statistically insignificant variables, 7 of our 30 potential drivers remained. The other 23 were excluded either because they lacked relevance for predicting the outcome variable or because they were highly collinear with drivers already in the model.

      Because collinearity was one reason for excluding certain variables, we cannot conclude that the seven drivers included in the model are the only important ones. In other words, MLR didn’t produce a ranking of all potential drivers by their importance.

      As for the seven drivers included in the model, can we interpret their coefficients as relative levels of importance? Coefficients of MLR indicate what increase in the outcome variable is associated with a one-unit increase in each driver, keeping other drivers constant. In this case, though, it’s not possible to keep the other drivers constant. When drivers are collinear, which often happens with psychometric statements, improving one will likely improve others as well. MLR coefficients thus have limited practical value for ranking drivers here.
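
      To make the effect of collinearity on coefficients concrete, here is a minimal sketch for the two-driver case, where standardized coefficients follow directly from the pairwise correlations. The correlation values (0.6 and 0.9) and the function name are illustrative assumptions, not figures or code from the survey analysis.

```python
def standardized_betas(r_y1, r_y2, r_12):
    """Standardized MLR coefficients for two drivers, from pairwise correlations.

    r_y1, r_y2: correlation of each driver with the outcome.
    r_12: correlation between the two drivers (must not be +1 or -1).
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Made-up correlations: both drivers relate equally strongly to the outcome.
independent = standardized_betas(0.6, 0.6, 0.0)  # drivers uncorrelated with each other
collinear = standardized_betas(0.6, 0.6, 0.9)    # drivers highly correlated

print("independent drivers:", independent)
print("collinear drivers:  ", collinear)
```

      In this sketch each driver keeps the same correlation with the outcome in both scenarios, yet its standardized coefficient falls from 0.6 to roughly 0.32 once the two drivers are highly correlated.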

      MLR can still prove useful for predicting how the outcome variable would change if the company improved drivers included in the model.

      How random forest falls short

      Another common predictive method is random forest.² A useful feature of this algorithm is that it estimates how model performance would suffer if a particular variable were left out. We rank our drivers from highest to lowest according to random forest’s metric of variable importance, %IncMSE³ (see Figure 3).
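
      The permutation idea behind %IncMSE can be sketched in a few lines. This is an illustration under stated assumptions rather than the R randomForest computation: the toy data are made up, `predict` is a hypothetical stand-in for a fitted model, and the error is measured on the full data set instead of on each tree's out-of-bag samples.

```python
import random


def mse(predict, rows, y):
    """Mean squared error of `predict` over the rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def percent_inc_mse(predict, rows, y, col, repeats=20, seed=0):
    """Average percentage increase in MSE after randomly permuting column `col`."""
    rng = random.Random(seed)
    base = mse(predict, rows, y)
    increases = []
    for _ in range(repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only the chosen column permuted.
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        increases.append(100.0 * (mse(predict, permuted, y) - base) / base)
    return sum(increases) / repeats


# Made-up toy data: the outcome depends on driver 0 only; driver 1 is unrelated.
rows = [[0, 5], [1, 3], [2, 7], [3, 1], [4, 6], [5, 0], [6, 4], [7, 2]]
noise = [0.5, -0.5, 0.5, -0.5, -0.5, 0.5, -0.5, 0.5]
y = [2 * r[0] + e for r, e in zip(rows, noise)]


def predict(row):
    """Hypothetical stand-in for a fitted model that uses only driver 0."""
    return 2 * row[0]


for col in (0, 1):
    print(f"driver {col}: %IncMSE = {percent_inc_mse(predict, rows, y, col):.1f}")
```

      Permuting a driver the model does not use leaves the error unchanged, so its score is zero, while permuting the driver the model relies on inflates the error sharply.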

      Figure 3: Random forest’s %IncMSE produces this ranking of potential drivers

      As discussed earlier, psychometric data typically features high collinearity between some of the drivers. When choosing a method for analyzing variable importance, we need to make sure it provides robust results even when drivers are highly collinear. An important driver should, in theory, still be important even if it is collinear with others. To test whether random forest produces a reliable ranking of drivers even in cases of high degrees of collinearity, we created a duplicate of the driver ranked second, “quality,” and included this duplicate variable in the model. (A duplicate variable is a copy of the original variable and is the extreme case of collinearity: the correlation coefficient between the variable and the duplicate is 1.)

      We would expect that duplicating a driver would have no effect on its ranking. However, in this experiment, neither “quality” nor its copy ranks second any longer (see Figure 4). Just like the MLR coefficients, the drivers in random forest were penalized for high collinearity with other drivers. (One can minimize the effect of collinearity in random forest by setting mtry=1. Such hyperparameter settings will make random forest consider only one independent variable at each split, making the decision trees less similar to each other. When using random forest for predictive purposes, though, this technique might decrease performance of the model.)
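
      Why a duplicate dilutes a permutation-based score can be illustrated without fitting a forest. The sketch below is a simplification under loud assumptions: the data are made up and noise-free, and the two `predict_*` functions are hypothetical stand-ins for fitted models, the second one assuming that a model with access to both copies ends up leaning on each about equally. It reports the absolute increase in MSE rather than a percentage.

```python
import random


def mse(predict, rows, y):
    """Mean squared error of `predict` over the rows."""
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)


def mse_increase(predict, rows, y, col, repeats=50, seed=1):
    """Average increase in MSE after randomly permuting column `col`."""
    rng = random.Random(seed)
    base = mse(predict, rows, y)
    total = 0.0
    for _ in range(repeats):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        total += mse(predict, permuted, y) - base
    return total / repeats


# Made-up, noise-free toy data: the outcome is exactly twice the "quality" score.
quality = [0, 1, 2, 3, 4, 5, 6, 7]
y = [2 * q for q in quality]

rows_single = [[q] for q in quality]        # only the original driver
rows_with_copy = [[q, q] for q in quality]  # original driver plus an exact duplicate


def predict_single(row):
    """Hypothetical model that relies entirely on the one available driver."""
    return 2 * row[0]


def predict_with_copy(row):
    """Hypothetical model that leans on the driver and its copy equally (assumption)."""
    return row[0] + row[1]


original = mse_increase(predict_single, rows_single, y, col=0)
with_copy = mse_increase(predict_with_copy, rows_with_copy, y, col=0)

print("MSE increase, driver alone:      ", original)
print("MSE increase, driver with a copy:", with_copy)
```

      Under these assumptions the second figure is a quarter of the first, because permuting one copy breaks only half of the prediction's dependence on the underlying signal. The driver is no less related to the outcome; only its marginal contribution has shrunk.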

      Figure 4: After duplicating “quality,” which originally ranked second, here is the new ranking of drivers by %IncMSE

      This result makes sense once we recall how random forest defines variable importance: It indicates how model performance would suffer if we left out a particular driver—an illustration of marginal variable importance. What we want to understand, though, is how strongly each driver relates to our outcome variable independent of the effect of other drivers—the phenomenon of overall variable importance.

      Other methods commonly employed for driver analysis include Shapley values, Johnson’s relative weights, and partial correlations. Similar to MLR coefficients and random forest variable importance measures, scores provided by these methods are affected by the collinearity of drivers. Such collinear drivers often receive lower scores than drivers of similar predictive strength that don’t correlate with other drivers.
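
      The same penalty can be worked through for two of these methods in the two-driver case. The sketch below assumes the R-squared decomposition flavor of Shapley values and uses made-up correlations; the function names are illustrative, not taken from any particular package.

```python
import math


def r2_two_drivers(r_y1, r_y2, r_12):
    """R-squared of a regression on both drivers, from pairwise correlations."""
    return (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)


def shapley_share(r_y1, r_y2, r_12):
    """Shapley share of R-squared credited to driver 1 (two-driver case)."""
    alone = r_y1**2  # gain in R-squared when driver 1 enters first
    after_other = r2_two_drivers(r_y1, r_y2, r_12) - r_y2**2  # gain when it enters second
    return 0.5 * (alone + after_other)


def partial_correlation(r_y1, r_y2, r_12):
    """Correlation of driver 1 with the outcome, controlling for driver 2."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2**2) * (1 - r_12**2))


# Made-up case: driver 1 always correlates 0.6 with the outcome.
for r_12 in (0.0, 0.9):
    print(
        f"r_12={r_12}: Shapley share={shapley_share(0.6, 0.6, r_12):.3f}, "
        f"partial correlation={partial_correlation(0.6, 0.6, r_12):.3f}"
    )
```

      In both scenarios driver 1 has the same Pearson correlation with the outcome, but its Shapley share and partial correlation both drop once the second driver is highly correlated with it.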

      To truly understand overall variable importance, we need to explore methods that assess the strength of the relationship between a driver and the outcome variable independently of other potential drivers. The simplest, most common of such methods is Pearson correlation.

      First we rank potential drivers by the strength of their Pearson correlation with the likelihood to recommend (see Figure 5). Note that Pearson correlations do not give reliable estimates of the strength of relationships between variables if the variables are non-normally distributed, the data contains outliers, or the associations between variables are non-linear. In such cases, one should use other methods, such as Spearman rank correlation, which is more robust in the presence of outliers.
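
      A ranking of this kind takes little code. The sketch below uses made-up 0–10 ratings for eight respondents and hypothetical driver names; it computes Pearson correlation for each driver on its own, adds an exact copy of one driver to show that the scores are unaffected, and includes a Spearman variant for data with outliers or non-linear associations.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up ratings; driver names are hypothetical.
outcome = [9, 7, 10, 3, 6, 8, 2, 5]
drivers = {
    "quality": [8, 7, 9, 4, 6, 8, 3, 5],
    "driver_b": [6, 8, 7, 5, 4, 9, 3, 6],
    "driver_c": [5, 4, 6, 5, 3, 7, 4, 6],
}
drivers["quality_copy"] = list(drivers["quality"])  # exact duplicate

# Each driver is scored independently of all the others.
scores = {name: pearson(values, outcome) for name, values in drivers.items()}
ranking = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for name, r in ranking:
    print(f"{name:13s} pearson={r:.3f} spearman={spearman(drivers[name], outcome):.3f}")
```

      Because each score depends only on one driver and the outcome, the duplicate receives the same correlation as the original and no other driver's score moves.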

      Figure 5: The Pearson correlation produces a different ranking of drivers

      Many companies conclude their driver analysis by shortlisting the top 5 or 10 drivers. We recommend taking an additional step, namely analyzing whether top drivers are interrelated and may be addressed simultaneously. We will explore this topic in an upcoming article.

      Endnotes

        1 Standardized coefficients imply normalization of driver variables for differences in scale, so standardized coefficients are more comparable across variables than raw coefficients.

        2 Random forest is an ensemble method that constructs a large number of decision trees and produces a mean or mode prediction depending on whether the dependent variable is numerical or categorical. We used the programming language R’s randomForest package and kept the default value of the mtry hyperparameter (that is, the number of drivers divided by 3, or in our case mtry=10). Ntree=5,000.

        3 %IncMSE is defined as a percentage increase in Mean Squared Error after a driver was randomly permuted. It indicates a decrease in accuracy associated with leaving a certain driver out of the model.


      Elements of Value® is a registered trademark of Bain & Company, Inc.

      Authors
      • Eleonora Nazander
        Expert Senior Manager, Data Science, Denver
      • Ilker Carikcioglu
        Expert Associate Partner, Boston
      First published in November 2022
      Net Promoter®, NPS®, NPS Prism®, and the NPS-related emoticons are registered trademarks and Net Promoter Score℠ and Net Promoter System℠ are service marks of Bain & Company, Inc., Satmetrix Systems, Inc., and Fred Reichheld.
