Conifer

How to Measure L&D Effectiveness

Intro

"So we made this investment into learning and development, but how do we know it actually made an impact?"

This is a common question I receive from leaders and finance teams. The return on investment for L&D is not always immediately clear.

Investments are generally designed to yield one of two outcomes: increasing revenue, such as hiring a salesperson, or decreasing cost, such as switching to a lower-cost technology vendor. In both cases, profit rises. This is straightforward to measure, making the value of the upfront investment clear.

L&D investments, however, don't always have a clear or direct ROI. The throughline between the upfront investment and the downstream impact is unclear.

This is for a few reasons. First, L&D does not exist in a vacuum. L&D exists to improve outcomes across the organization: helping sales sell more, helping engineering teams become more productive, and so forth. Attributing improvements in these areas to L&D is challenging for two reasons: 1) credit is shared by L&D programs and team leadership, creating tension over who "deserves" credit for performance improvement, and 2) splitting up that credit is a technical (as well as political) exercise that teams are often not equipped for.

Whether it's an organizational challenge or a technical one, the issue is the same: L&D has a seemingly invisible influence on performance, and gleaning insight into the value it provides requires assumptions that often stay at surface level. Other functions in the business don't have this problem, as they directly own the outcomes they drive. L&D is a lubricant, an accelerator of those outcomes, and its influence is hard to break out without deeper technical expertise and measurement rigor that many teams are ill-equipped to bring.

Because of these difficulties, leaders tend to focus on delivery or activity as the metric of success. L&D practitioners focus on how well their training was received and how participants perceived it. While this shows the organization that 1) the team actually delivered something and 2) participants liked it, the true value of the L&D investment is obscured. Because ROI is murky, investing in L&D becomes a more difficult proposition for leaders evaluating where to put limited budget.

If you are a CFO, would you want to make a large investment when the return is unclear? I'd rather invest in something easier to justify and defend, where the dollars spent have a clear throughline to a desirable business outcome. L&D's case usually relies on the "feeling" of its impact rather than hard data. People making financial decisions prefer numbers to anecdotes, and so L&D investment stays nominal -- enough to "have" L&D -- treating it as a cost center rather than a driver of value. It's not uncommon to blame poor sales performance or safety mishaps in a plant on a lack of training, so L&D gets treated like a corporate band-aid when there aren't any other clear solutions.

That said, there are ways to measure L&D effectiveness -- but it requires work and analytical rigor. One of the most common frameworks for L&D effectiveness measurement is the Kirkpatrick model, developed by Dr. Donald Kirkpatrick in the 1950s. Kirkpatrick was a professor at the University of Wisconsin who wrote about this topic for his PhD dissertation, but his ideas were not widely popularized until his 1994 book: Evaluating Training Programs. A descriptive title.

Kirkpatrick asserts there are four levels, or categories, that measure different aspects of L&D programs.

  • Level 1: Reaction: This measures how participants react to the training -- their satisfaction and engagement.
  • Level 2: Learning: This measures what participants learned -- knowledge and skills acquired from the training content.
  • Level 3: Behavior: This measures how participants apply what they learned on the job -- whether they actually do what the training taught them.
  • Level 4: Results: This measures the organizational impact -- the business outcomes the training contributed to.

I like the Kirkpatrick model and think it correctly identifies the categories that L&D programs affect and that teams should measure. However, in my experience, academic-sounding models and approaches are often seen as too abstract and removed from the business to be effective.

I can't say that I blame them -- when I am hearing from people in disciplines I'm less fluent in, I don't need to hear all of the technical jargon and academic principles that inform their POV. I need it explained simply. I need to understand how it is relevant to me and the problem I am trying to solve. While L&D pros want to sound smart by flexing their fluency in various principles, the true value comes from making what they know as easy as possible for teams to digest so that they can make effective investment decisions. Talk about how you know Kirkpatrick and Bloom's Taxonomy in the interview with the L&D leadership, but consider simplifying messaging when interviewing with other folks in the business. As always, tailor for your audience.

In my experience, I've found alternative framing helpful for getting buy-in and support for L&D initiatives. This is also how I categorize different types of measures. It is slightly simpler than what Kirkpatrick recommends, collapsing categories and relabeling them for ease.

Let's review each of these categories -- adoption, quality, and impact -- what they are, the core measures, and why they matter. Driving improvement in each area requires its own approach.

Adoption

Adoption focuses on training completions. I look at this category to measure the participant's activity during the training, from enrolling and joining through completing any activities or assessments. This signals that they "adopted" the training content and completed it.

Adoption is the "top of the funnel" for all other measures in L&D. Training needs to be adopted by the team first before we can measure other categories. In addition, low adoption may signal issues with training availability or awareness.

Specific metrics:

  • Training enrollments: Total number of people enrolled in the training, such as by signing up or adding it to their transcript in an LMS. This could also be the number of people assigned a training by their managers.
  • Participation rate: Percentage of people who enroll that attend the training.
  • Training engagements/views: Total number of people that participated in the training.
  • Training completions: Total number of people that successfully completed the training.
  • Completion rate: Percentage of people that complete trainings after starting (completions / engagements).
  • Total assessment passes: Total number of people that pass an assessment, either part of a training or as a standalone
  • Assessment pass rate: Percentage of people that pass an assessment.
  • Assessment score average: Average assessment scores across all assessment participants
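
To make these concrete, here is a minimal sketch of how you might compute the adoption metrics from an LMS export. It assumes a hypothetical pandas DataFrame with one row per learner and illustrative column names (enrolled, attended, completed, assessment_score) plus an assumed passing score -- adjust everything to match your own system.

```python
import pandas as pd

# Hypothetical LMS export: one row per learner for a single course,
# with flags for each stage of the adoption funnel and an assessment score.
records = pd.DataFrame({
    "learner_id": [1, 2, 3, 4, 5],
    "enrolled": [True, True, True, True, True],
    "attended": [True, True, True, False, True],
    "completed": [True, True, False, False, True],
    "assessment_score": [88, 72, None, None, 95],
})

PASSING_SCORE = 80  # assumed pass threshold

enrollments = int(records["enrolled"].sum())
participation_rate = records["attended"].sum() / enrollments
completion_rate = records["completed"].sum() / records["attended"].sum()
assessment_passes = int((records["assessment_score"] >= PASSING_SCORE).sum())
pass_rate = assessment_passes / records["assessment_score"].notna().sum()
average_score = records["assessment_score"].mean()

print(f"Enrollments: {enrollments}")
print(f"Participation rate: {participation_rate:.0%}")
print(f"Completion rate: {completion_rate:.0%}")
print(f"Assessment pass rate: {pass_rate:.0%}")
print(f"Average assessment score: {average_score:.1f}")
```

The same calculations scale to a full export with one row per learner per course; group by course (or any of the cuts discussed later) before dividing.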

These metrics help L&D teams and department leaders understand whether learners are engaging with training and passing course assessments. They also hold teams accountable for completing assigned training and for actually using what the L&D team, often in partnership with those same teams, is creating.

Driving adoption needs to be a shared responsibility between the teams building training and the teams receiving it. Teams will only prioritize what leaders signal is important. L&D professionals can assign training, but without signals from their managers or department leads that the training matters or is worth prioritizing, team members will not take it seriously. Reporting on adoption helps hold leaders accountable for the training they invest in developing.

Adoption measures are easy to gather and help us understand the size and scope of a training's reach and performance. They contextualize the metrics we cover below and the overall scale of the impact.

Quality

Quality metrics focus on sentiment from participants to assess whether the training is perceived as valuable, or of high quality. These are subjective measures, typically gathered through surveys administered at the conclusion of training.

These measures are guardrails for L&D teams to ensure that what they produce is perceived as useful by participants. This is the primary scorecard for the learning experience designers and subject matter experts who develop the training, helping them understand if the training is meeting the mark. These metrics can also articulate whether specific audiences respond better to certain delivery methods (modalities), training lengths, and other design choices. This helps the team building training improve their designs.

There are many ways to design surveys. Getting direct feedback from training participants, the core customers of training programs, is incredibly valuable. However, there is a balance to strike between thoroughness and ease of completion. We don't want any survey to take too long, because getting a little perspective from many learners is more valuable than a lot of perspective from a few.

I recommend doing two types of surveys: 1) short, reaction-based surveys for individual courses or sets of courses and 2) in-depth, thorough evaluations of training programs holistically.

The in-depth survey should be used to gather broader feedback and assess trends, understand training needs to inform future priorities, and gather data to inform changes. This survey seeks to understand the overall perception of current training programs and new opportunities. These should be conducted once or twice per year as an input to inform hiring and training roadmaps.

The short, reaction-based survey should be used to evaluate the effectiveness of individual training courses. This provides fast, immediate feedback on individual courses, which can then be correlated across programs to identify broad trends. Generally, I recommend keeping these standard across courses, though you should add diverging sections depending on whether the training is self-paced or live.

Survey metrics to use:

  • Net Promoter Score (NPS): Standard measure of participant willingness to recommend the training. This is typically conducted on a 1-10 scale, with 9-10 as promoters, 7-8 as passives, and 6 or below as detractors. Tally these numbers, then do the calculation: percentage of promoters minus percentage of detractors. This gives a number from -100 to 100 that articulates overall sentiment. I also like streamlining this to a 1-5 scale, with 1-2 as detractors, 3 as passive, and 4-5 as promoters. This matches the metrics covered below.

The metrics below should be calculated on a standard 1-5 scale (also called a Likert scale), with 1 representing "Poor" and 5 representing "Excellent". This can also be "strongly disagree" at 1 and "strongly agree" at 5, depending on how you choose to frame the question.

  • Relevance to role: Measure of the relevance of the training to the individual.
  • Clarity of content: Measure the content's ease of understanding, overall presentation and digestibility
  • Organization of the materials: Measure of the content's flow, length, and structure
  • Able to immediately apply: Measure of how likely the participant is to immediately apply what they learned in their job

These should be the backbone of your training quality measurements, providing learning design teams with clear feedback and guidance on whether learners are getting what they need out of the training and feel that participating is a worthwhile experience.
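
If you collect these responses digitally, the scoring is simple to automate. Here is a short sketch, assuming a hypothetical DataFrame of post-course responses on the streamlined 1-5 scale; the column names are illustrative, not from any particular survey tool.

```python
import pandas as pd

# Hypothetical post-course survey responses on the streamlined 1-5 scale:
# 4-5 counts as a promoter, 3 as passive, 1-2 as a detractor.
survey = pd.DataFrame({
    "recommend":    [5, 4, 3, 2, 5, 4, 1, 5],
    "relevance":    [5, 4, 4, 3, 5, 4, 2, 5],
    "clarity":      [4, 4, 3, 3, 5, 5, 2, 4],
    "organization": [5, 3, 4, 3, 4, 5, 3, 4],
    "can_apply":    [5, 4, 3, 2, 5, 4, 1, 5],
})

promoter_share = (survey["recommend"] >= 4).mean()
detractor_share = (survey["recommend"] <= 2).mean()
nps = (promoter_share - detractor_share) * 100  # ranges from -100 to 100

# Average 1-5 score for each quality metric.
likert_averages = survey[["relevance", "clarity", "organization", "can_apply"]].mean()

print(f"NPS: {nps:.0f}")
print(likert_averages.round(2))
```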

Let's walk through sample surveys you can use to gather these metrics.

Survey Sample 1: Course completion survey

Purpose: Gather feedback from training participants on their perceptions of training. This should map back to the metrics above.

Here is a sample list of statements and questions you can use to capture the metrics listed earlier in a survey.

Please respond based on whether you strongly disagree (1) or strongly agree (5) with each statement below.

1) This training was valuable to me
2) This training was relevant to me and my job scope
3) This training was clear and easy to understand
4) The training materials were well-organized
5) I can immediately apply what I learned in the training

Please complete the free response questions below:

6) What was your favorite or most useful part of the training?
7) How can we improve this training?
8) Is there any additional feedback you'd like to share with us?

You can add different questions based on specifics of the training content that you'd like to capture, but this should serve as a baseline for any survey you need to create for training content. For this type of survey, we want it to be easy for participants to complete so that they willingly fill it out and share their honest opinions. The first five questions can be rolled into quantifiable metrics, while the free responses will help you add color to the numeric scores.

This should take most participants under five minutes to complete. These surveys should be administered at the end of training sessions. Surveys should be anonymous, but collated at specific levels we discuss below to assess trends across different cohorts.

In some cases, you may want to make these surveys required to get feedback from all participants. I usually prefer making it easy to complete or incentivizing completion through some sort of prize for participating versus withholding completion status until surveys are completed. Not everyone wants to share their feedback or will be willing to provide us with useful information. We should encourage participants to want to share rather than forcing them to, in my opinion.

While the above should be applied to individual courses and sets of courses, we also need to conduct broader surveys to understand the current sentiment around training as a whole and gather insight into what to focus on next.

Survey Sample 2: L&D Program Survey

Purpose: Gather feedback from leaders and training participants on overall value and effectiveness of training programs, learn about preferences, and gather feedback into training needs to inform future roadmap plans and deliverables.

(All should have drop downs for ease of selection unless noted)

Section 1: Demographic

1) Role Level?
2) Department?
3) Manager or Individual Contributor?
4) Tenure at the organization?
5) Office location?

Section 2: L&D Program Feedback

How would you rate the overall quality of our current L&D programs?

Rate your satisfaction with each of the following aspects of our L&D programs?

(1-5 chart with the metrics we discuss above)

Please stack rank which type of training delivery is most valuable:

__ Self-paced e-learning
__ Live classroom instruction / seminar-style discussions with peers
__ Video
__ Written materials (job aids, guides, wikis, etc.)
__ Shadowing peers
__ On-the-job training / stretch projects

How satisfied are you with our current learning management system?

What type of training content would you like to see more of?

Do you have any additional feedback about L&D programs or training content you would like to share?

Section 3: Organizational Learning Culture

1-5: I feel supported by my manager to invest time in learning
1-5: I feel learning and development is an organizational priority
1-5: I feel like training courses are the appropriate length
1-5: I feel like training courses allow me to build new skills vs. passively learn about theory
1-5: I feel it is easy to balance my day-to-day responsibilities with learning opportunities
1-5: I feel connected to the L&D team and am aware of training offerings
1-5: I feel confident that participating in organizational L&D initiatives will help improve my job performance and grow my career

Which groups would you like to hear more from in training courses?
__ Peers
__ Managers
__ Executive Leadership
__ Internal subject matter experts
__ L&D Representatives
__ Outside consultants

What do you think our organization does particularly well?

What would you like leaders to know about our L&D culture?

Section 4: Future Requests and Needs

Please rank which type of training content you'd like to see more of from 1 to 5:
__ Product
__ Company Culture & Values
__ Job
__ Leadership development
__ Compliance

Why is your top selection most important to you?

What specific areas within your top selections are you most interested in?

What do you think others in the organization need training on to help improve working relationships and outcomes?

Is there any other request you have for the L&D team?

This survey will be a good way to get a pulse on the overall L&D program and needs. This should give the team data to help make future program decisions and justify specific changes or enhancements to L&D programs.

Impact

Training impact is tough to measure. Unlike adoption and quality measures, which are simple counts or averages of numbers generated from the training, impact measures are less straightforward. Impact measures typically involve more advanced calculations and a deeper hypothesis on what behavior change and results training is meant to drive.

In addition, training impact data is inherently noisy. Isolating training as a variable to assess its true impact is a difficult task that involves more sophisticated analysis and experimentation, which is usually outside the scope or skill set of most L&D teams. At a more basic level, the best starting point is to get directional feedback based on correlation. You can do this by assessing "trained" versus "untrained" populations and looking across different measures before and after to get some signal that training may influence improved outcomes. We will go deeper on this below.

For any team looking to start showing the impact of their work, it's important to start with understanding why the training is being built in the first place and to work with stakeholder teams to determine what outcomes the training is designed to help drive. There is nuance here depending on the type of organization you are working in, but if we go back to our five training pillars, we can get some ideas of what this should look like:

  • Company Culture & Values Training
  • Job Training
  • Leadership Development
  • Product
  • Compliance

Correlation is a good first step. The question we should ask is: "Does participating in training lead to better outcomes?" There are a few ways we can do this.

Before and after. Look at how the population of participants performs against a metric prior to training participation. Post-training, track the same measure for long enough to see if there is an improvement. For example, when I was working with sales teams, I looked at the stages of the sales cycle and which parts of the cycle our teams were struggling with. "Struggle" was measured by the percentage of deals that passed from one stage to the next, looking for significant dropoffs where the majority of deals died. For us, that was discovery -- the phase of the sales process where our team meets with customers to understand their pain points and goals and assess fit for our solution. After building a training focused on best-in-class discovery calls, we looked at data for the next three months and saw an overall improvement in the success rate at that stage. Most notably, the sales teams that participated in the training saw greater improvements in aggregate than those that did not participate. Was training the sole reason for this improvement? No, but it certainly played a role.

It may not be immediately clear which metrics training will influence. I recommend starting broadly, using judgment to create a hypothesis about which metrics training could influence. Test and learn, assessing data across different time periods to gauge where training may have had an impact. It's important to account for sample size and seasonality effects in your analysis as well. When doing correlation, we know training isn't the only thing causing a change, but at this stage we are looking to see where training influences change, which helps us understand what behaviors training should help drive. Training is ultimately about building knowledge and skills that change behavior to help drive a desired result. If we know what it takes to get to a desired result, we can reverse engineer the behaviors from that to build effective training.
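
As a sketch of what this looks like in practice, here is a rough before-and-after comparison for the discovery example above. It assumes a hypothetical deal-level extract with columns for whether the rep completed the training, whether the deal happened before or after the training launch, and whether the deal advanced past discovery; all file and column names are illustrative.

```python
import pandas as pd

# Hypothetical deal-level extract: one row per deal at the discovery stage.
# trained: rep completed the discovery training (True/False)
# period: "pre" or "post" relative to the training launch
# advanced: deal progressed past discovery (True/False)
deals = pd.read_csv("discovery_stage_deals.csv")

conversion = (
    deals.groupby(["trained", "period"])["advanced"]
    .mean()
    .unstack("period")[["pre", "post"]]
)
conversion["lift"] = conversion["post"] - conversion["pre"]

# A larger lift for the trained group than the untrained group is the
# directional signal that training may be influencing the outcome.
print(conversion)
```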

However, if you're like me, you may not be satisfied with only correlation. Knowing that training may influence specific areas is a good directional insight, but what about the true impact? It is difficult to isolate all of the other variables to determine what training uniquely drives. However, there are ways we can do this, especially if we have a large enough data set to get stronger signals.

I'll note that this portion is going to get more technical. Causal analysis is not going to get us to 100% truth or certainty, but it will get us closer, which makes it a worthy aim. Understanding where and how training makes its most notable impact will help L&D teams build what is most powerful, driving incremental improvement to results.

There are a few different types of causal analysis that we can conduct to get there. Let's start with difference-in-differences (DiD) analysis. This is useful for understanding the lift in performance of a sales team after participating in a training or earning a certification designed to improve their product knowledge or sales skills. We'll use this as our example case since it's likely the highest-value, and most data-rich, use case for internal L&D.

DiD analysis compares changes in outcomes over time between a treatment group (those who received training) and a control group (those who didn't). This approach helps control for unobservable factors and pre-existing differences between groups.

Let's discuss what's needed.

Data:

  • Sales performance metrics: Pre- and post-training. I recommend starting with broad metrics to explore, drilling down where the lift is most notable to see if there is a stronger signal. These should include pipeline or sales cycle metrics (e.g. time to close, percentage of deals that pass from certain stages), pitch metrics (e.g. number of joint pitches with a partner organization, number of pitches with specific products or offerings, etc.), and revenue metrics (e.g. total deals, average deal size, quota attainment, etc.).
  • Training participation status: Identify which parts of your sales team completed the training courses associated with the outcome. This could be individual courses, series of courses, or badges indicating completion of a certification. Any completion field indicating they demonstrated understanding of the material should work here.
  • Control variables: This includes factors that need to be weighed appropriately to help isolate training's effect. For example, factors such as years of experience, geography, industry or vertical strength the salesperson engages with, and so forth, will need to be accounted for. Take these metrics and standardize to a "z-score" by subtracting the mean and dividing by the standard deviation of values for each. Then create a composite score with different factor weights based on the perceived effect of the item on results. This will require some testing and learning to understand if the weights feel right. You won't know for sure until you experiment and see the formula in action.
  • Timeline data: Know when the training occurred so you can observe before and after effects. If training occurs at different times, you can run controls for different time periods (especially important when considering seasonal effects on performance) or segment the audience by cohorts to see if time plays a significant role in relative performance.
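
Here is a small sketch of the control-variable preparation described above, using made-up factor names and starting weights that you would tune as you test and learn.

```python
import pandas as pd

# Hypothetical table: one row per salesperson with raw control factors.
reps = pd.read_csv("sales_reps.csv")

# Standardize each factor to a z-score: subtract the mean, divide by the std.
factors = ["experience", "territory_quality", "market_strength"]  # assumed columns
for col in factors:
    reps[col + "_z"] = (reps[col] - reps[col].mean()) / reps[col].std()

# Composite control score with assumed weights reflecting the perceived
# effect of each factor on results; adjust as you learn more.
weights = {"experience_z": 0.5, "territory_quality_z": 0.3, "market_strength_z": 0.2}
reps["control_score"] = sum(reps[col] * w for col, w in weights.items())
```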

As you may gather from reading the above, getting access to this data, ensuring its veracity and cleanliness, and preparing it for calculation is a critical step that will take time and resources.

From there, identify your "treatment group" (e.g. salespeople who completed training) and "control group" (e.g. salespeople who did not complete training). We can then plug our numbers into a formula to start getting the analysis and gauge its accuracy.

Performance = β₀ + β₁(Training) + β₂(Period) + β₃(Training × Period) + β₄(Experience_Score) + β₅(Territory_Score) + β₆(Market_Score) + ε

You can add more terms here depending on the controls you want to account for to get a closer picture of reality. The coefficient on the interaction term (β₃) is the difference-in-differences estimate: the lift in performance associated with training participation after accounting for the other variables. By plugging your numbers into Excel or using programming languages like Python or R to run this analysis, we get an output that helps us understand the relative lift of training participation on performance.
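
As one possible implementation, here is a sketch of the regression using statsmodels' formula API in Python. It assumes a hypothetical panel with one row per salesperson per period and illustrative column names: trained and post are 0/1 indicators, and the _z columns are the standardized controls from earlier.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per salesperson per period.
# performance: the outcome metric (e.g. quota attainment)
# trained: 1 if the salesperson completed the training, else 0
# post: 1 for the period after the training launch, else 0
panel = pd.read_csv("sales_panel.csv")

# trained * post expands to trained + post + trained:post,
# matching the regression specification above.
model = smf.ols(
    "performance ~ trained * post + experience_z + territory_quality_z + market_strength_z",
    data=panel,
).fit()

print(model.summary())
# The coefficient on trained:post is the difference-in-differences estimate:
# the lift in performance associated with training, net of the controls.
print(model.params["trained:post"])
```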

You may be thinking, "this sounds like a lot of work" -- and yes, it is. I wouldn't recommend going this deep unless you have a large enough data set to account for the noise and volatility that comes with a small number of data points. And remember, we're looking for directional signal: enough to understand the ROI of our training and to guide improvements to our program based on these measurable outcomes.

Simple correlation may be "good enough" in most environments, but depending on how data-oriented the organization is, going a level deeper will help you speak more concretely to the value of training, because you are accounting for other variables that may take credit for the performance improvements. Just because performance improved after training doesn't necessarily mean the training caused the improvement. Other factors like market changes, personal growth, or territory quality might be responsible. Causal analysis accounts for these factors and helps assign credit where credit is due.

Filters and data views

All of the data we discussed should not only be captured at an aggregate, organizational level, but also be observable across different time periods and cuts. This helps us understand where our impact is greatest and identify trends.

Time periods:

  • Annual
  • Quarterly
  • Monthly
  • Weekly
  • Before and after launch (window varies)

Cuts:

  • By department or role
  • By seniority or job level
  • By tenure
  • By locale or region
  • By training type (e.g. different training pillar, training program, etc.)
  • By training modality (e.g. live, self-paced)
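
In practice, most of these cuts reduce to a group-by or pivot over a completions log joined with HR attributes. Here is a sketch with illustrative file and column names; swap in whichever cut you need.

```python
import pandas as pd

# Hypothetical completions log joined with HR attributes
# (department, region, tenure band, training pillar, modality, etc.).
completions = pd.read_csv("training_completions.csv")
completions["completed_at"] = pd.to_datetime(completions["completed_at"])
completions["quarter"] = completions["completed_at"].dt.to_period("Q")

# Unique completers by department and quarter; change the index or columns
# for any other cut (region, tenure, training pillar, modality, ...).
view = completions.pivot_table(
    index="department",
    columns="quarter",
    values="learner_id",
    aggfunc="nunique",
)
print(view)
```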

How to report on training effectiveness

So we've talked a lot about data. Getting the numbers, ensuring their accuracy, and understanding our own performance is good -- but we also need to take the next step: sharing it with others in a way that is easy to understand.

In general, I recommend automating as much as possible. Standard operating metrics should be captured in such a way that they can be pulled from the data tables in an LMS into a dashboard that is updated daily, if not in real time. Creating the right views for yourself and for leaders is important.

What comes next is how we interpret this data and tell stories. That is slightly outside the scope of this chapter, but it's important to remember that dumping a bunch of data points won't really matter if we can't analyze trends and articulate what they mean. From there, we must also highlight why these data points matter and how they inform future actions. I generally like dashboards to have different views for analyzing at different grains of depth, while picking and choosing the most high-impact views to articulate my point. Accompanying an easy-to-understand view with a short commentary on what the metrics mean and a hypothesis for what drove them is usually a good start.

Whenever looking at measurable results, it's important to remember that data is a signal to help us understand the past and forecast the future, encouraging us to make adjustments and take big swings with confidence.

In Moneyball, one of the key tensions Michael Lewis calls out is the animosity between the "old school" scouts who rely on instinct and visual appraisal and the "new school" analysts who rely on data. This is still a debate in baseball circles, as is how to weigh these qualitative and quantitative factors to make the best decisions. In my opinion, one is not strictly superior to the other; they must be blended to get the keenest insight and mitigate the risk of being completely wrong. We may not be selling jeans, but we shouldn't discount factors that don't show up in the data, either. Combining analytical prowess with functional expertise is usually a winning formula for making good decisions.

So when measuring the performance of your L&D programs, it's important to layer in your knowledge of the business, the trainees, and the training programs when interpreting the data. Combining these makes for more effective analysis and will help you make better decisions -- the reason we bother to measure anything at all.

Measurement is an investment in your business intelligence. The better we are at measuring the impact of our work, the better positioned we are to make the next iteration better. While there is an upfront cost to measurement, the ROI should be clear: smarter decisions will help the business in the long run, and with L&D, turning a seemingly intractable problem into numbers will go a long way toward legitimizing the function in the eyes of others in the business. We all know intuitively that L&D is valuable -- this is why we go to school or clamor for training -- but we need a comprehensive mix of metrics to show it.

I hope this chapter is helpful in showing how you can better articulate the impact of your work.