When using AI to predict pay, context matters

June 29, 2023

A growing number of compensation benchmarking tools use AI to predict pay.

Pay benchmarking algorithms learn complex patterns between what people earn, the work they do, where they do it, and in what economic conditions. To master those patterns, these algorithms rely on data that combine information about compensation, work environment, and the broader economy. Together, these high-quality data sources represent the context in which people work. That rich context helps algorithmic compensation benchmarking tools make accurate estimates for detailed worker profiles, even for locations and jobs with smaller sample sizes.

ADP Research Institute built a simple model to show why context matters.

The ADP Research Institute (ADPRI) built an algorithm similar to those that power many AI-driven compensation benchmarks — but with one major difference: It predicts pay using far less context. ADPRI’s model — which doesn’t drive any ADP product — predicts pay from the characteristics of someone’s job title alone.

Amazingly, this model automatically learns patterns that capture economic realities about pay. But it also learns patterns that reflect little more than the model’s missing context.

In this report, we explain how our transformer works to estimate pay. Then we use the model to show why employers who use AI-driven compensation benchmarking tools must create an environment where compensation experts and AI experts combine their strengths.

Architecture of a transformer built by ADP Research Institute that predicts wage percentiles from job title. Model not used in any ADP product.

Transformers: Insights in disguise

Algorithms called large language models, or LLMs, drive ground-breaking systems such as OpenAI’s ChatGPT. Part of what powers LLMs are machine-learning algorithms called transformers. Given a user’s input text, these transformers output a response that — most of the time — looks like it was written by a human. Transformers do this by learning complex patterns of association between words and their context.

ADPRI’s model uses transformers, as do the algorithms that drive many compensation benchmarking tools on the market. Yet instead of predicting a sequence of human-readable text, our model predicts the percentile of a worker’s pay based only on the characteristics of their job title. A job title’s characteristics include the words it contains, the meaning of those words as used in job titles, and the relative positions of those words in the title.

Because we predict pay based on the characteristics of a job title rather than the job title itself, the transformer can predict pay for alternative titles for the same job, or for jobs with similar titles, even if the title never appeared in the training data.
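To make the idea of a job title’s characteristics concrete, the sketch below breaks a title into lowercase words paired with their positions. It is an illustration only: this report does not describe the tokenizer ADPRI used, so the `title_features` helper and its simple word-splitting rule are assumptions.

```python
import re


def title_features(title: str) -> list[tuple[int, str]]:
    """Return (position, word) pairs for a job title.

    Hypothetical helper: lowercases the title and keeps runs of
    letters, digits, and apostrophes as words. A production model
    would likely use a learned subword tokenizer instead.
    """
    words = re.findall(r"[a-z0-9']+", title.lower())
    return list(enumerate(words))


# Two titles with the same words in a different order.
a = title_features("Senior Data Scientist")
b = title_features("Data Scientist, Senior")

same_words = {word for _, word in a} == {word for _, word in b}
same_positions = a == b
print(a)
print(b)
print(same_words, same_positions)
```

The two titles share every word but not every position, which is why a model that sees both words and positions can treat them as related yet distinct inputs.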

The trouble comes when the user asks the transformer to predict pay for titles too far outside the scope of the data we used to train it.

How the strength of transformers reveals a potential weakness

Because ADPRI’s transformer draws on the meaning and ordering of words in job titles, it can predict pay for almost any title imaginable.

For example, it estimates that the middle 50 percent of workers with the ludicrous and fictional job title professional swan fighter will earn between $28.62 and $34.73 an hour, while an apprentice wizard commands only $16.36 to $21.40 an hour.

These fictitious job titles use words that exist in the titles we used to train the transformer, but in combinations outside the scope of the training data. Even more amazing, the transformer makes plausible-seeming predictions about pay for job titles that contain words that never appear in the training data.

One such word: squirrel. If your dog could read the transformer’s output, they might demand an hourly wage between $30.43 and $35.97 as a senior squirrel chaser. We offer more hilarious examples in an appendix, but for the rest of this case study, we’ll stick to less-whimsical job titles.

No one fights swans for a living in the United States, and few if any workers battle Balrogs for a paycheck. The idea that someone could earn nearly $70,000 per year chasing squirrels seems doubtful, no matter how senior they are. Yet our toy model’s flexibility allows users to engage in such flights of fancy.
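The annual figure can be checked with simple arithmetic. The sketch below assumes a full-time schedule of 2,080 paid hours per year (40 hours for 52 weeks), which this report does not state, and the `annualize` helper is hypothetical.

```python
HOURS_PER_YEAR = 40 * 52  # assumption: full-time work, 2,080 paid hours


def annualize(hourly_wage: float, hours_per_year: int = HOURS_PER_YEAR) -> float:
    """Convert an hourly wage to annual pay, rounded to cents."""
    return round(hourly_wage * hours_per_year, 2)


# Predicted 25th and 75th percentiles for "senior squirrel chaser".
p25, p75 = 30.43, 35.97

low = annualize(p25)
high = annualize(p75)
midpoint = annualize((p25 + p75) / 2)
print(low, high, midpoint)
```

Under that assumption the predicted range works out to roughly $63,294 to $74,818 a year, with a midpoint of about $69,056.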

In quantitative analysis, the term extrapolation describes this practice of making predictions outside the scope of training data. Silly job titles make the risks of extrapolation obvious. Yet the most problematic extrapolations also are the most subtle. For example, if a user-entered job title includes a word that exists in the model’s training data but outside of the word’s original context — such as outside of the industry where it’s well defined — the results can be just as misleading as the fictitious job titles.
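One crude guard against extrapolation is to flag title words the model never saw in training. The sketch below is a minimal illustration: the `unseen_words` helper is hypothetical, and the small vocabulary is a toy stand-in, not ADPRI’s training data.

```python
def unseen_words(title: str, training_vocabulary: set[str]) -> list[str]:
    """List the words in a title that are absent from the training vocabulary."""
    return [word for word in title.lower().split() if word not in training_vocabulary]


# Toy vocabulary for illustration only (assumed, not the real training data).
vocab = {"senior", "chaser", "professional", "swan", "fighter"}

flagged = unseen_words("Senior squirrel chaser", vocab)
not_flagged = unseen_words("Professional swan fighter", vocab)
print(flagged)
print(not_flagged)
```

The second title passes the check even though it is still an extrapolation: every word is known, but the combination is not. A vocabulary check catches only the obvious cases, so the subtle ones described above still call for expert review.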

Below, we use ADPRI’s title-to-pay transformer to illustrate a few of these subtle but problematic extrapolations.

Four takeaways

  1. Context matters. Because our transformer predicts wages based on job titles alone, it can’t pick up on how other factors such as job type or industry relate to wages, or how the relationship between a job title and wages depends on those other factors. Compensation subject-matter experts will help AI developers identify the contextual factors to include in a model.
  2. Transformers learn the conventions of the past, so be cautious when using them to guide policy. Machine-learning algorithms like transformers learn the patterns that emerge from the practices that produced their training data. They don’t know the value or cost of those practices, or whether those conventions are ethical in your present situation.
  3. Correlation doesn’t imply causation. (The opposite also is true: a lack of correlation doesn’t imply a lack of causation.) No matter how much context you feed into your transformer, the patterns it learns don’t necessarily tell you about cause and effect. A job whose title mentions a commercial driver’s license, or CDL, tends to pay higher wages than a baseline driving job. That doesn’t mean removing a license designation such as “Class A” from the title while keeping the license requirement will lower the job’s median wage. Compensation experts must work together with quantitative analysts to accurately interpret a model’s implications.
  4. Transformers show tremendous promise despite these drawbacks. Our simple transformer automatically learned how job title modifiers and other attributes relate to pay. Provided more context about job type, job location, and other factors that predict pay, it could learn more complex patterns with minimal human intervention.

What this means for you

If you’re building AI algorithms to inform compensation policy, consult with subject-matter experts when building the model, defining the acceptable output, and deciding how to incorporate model output into policy. If you’re vetting compensation benchmark providers, involve both subject-matter and AI experts in the process to ask tough questions about what factors the model considers, and how the application helps users avoid the potential pitfalls of extrapolation.

Methods

Pay comparisons and transformer architecture

The visualization slides above contain notes on how we calculate baseline pay for a given job category and baseline pay comparisons for a given job modifier (e.g. “Associate”), as well as a description of the transformer algorithm’s architecture.

Job category definitions

We define three job categories: tech, customer service, and trades. We identified the candidate job titles for each category in the following steps:

  1. For each job category, select two seed jobs from the latest Occupational Information Network (O*NET) taxonomy (accessed online March 27, 2023) that capture the essence of what we want to include in that category. For tech jobs, these are Data Scientist and Computer Programmer; for customer service, they’re Customer Service Representative and Retail Salesperson; for trades, they’re Construction Laborers and General Maintenance and Repair Workers.
  2. For each of those seed jobs, take the 10 most closely related jobs from the O*NET web page for that job (listed at the bottom of the job’s page). This yields about 20 jobs per category, including the seed jobs.
  3. From the O*NET pages for each of the jobs identified in step two, use the “sample job titles” available at the top of the O*NET page for that job. These job titles are the ones included in each category.
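The three steps above can be sketched as a small function. This is an illustration under stated assumptions: `build_category_titles` is a hypothetical helper, the lookup tables stand in for information read from O*NET pages, and dropping duplicates is our guess at why the count is “about 20” rather than exactly 22. The placeholder names are not real O*NET entries.

```python
def build_category_titles(seed_jobs, related_jobs, sample_titles, n_related=10):
    """Collect candidate job titles for one job category.

    seed_jobs:     list of seed occupation names (step 1)
    related_jobs:  dict mapping an occupation to its ordered list of
                   related occupations (step 2)
    sample_titles: dict mapping an occupation to its sample job titles (step 3)
    """
    # Steps 1 and 2: seeds plus their closest related occupations,
    # skipping any occupation already collected (assumed behavior).
    occupations = []
    for seed in seed_jobs:
        for job in [seed] + related_jobs.get(seed, [])[:n_related]:
            if job not in occupations:
                occupations.append(job)

    # Step 3: gather the sample titles for every occupation, in order.
    titles = []
    for job in occupations:
        for title in sample_titles.get(job, []):
            if title not in titles:
                titles.append(title)
    return titles


# Placeholder inputs for illustration only.
seeds = ["Seed A", "Seed B"]
related = {"Seed A": ["Job 1", "Job 2"], "Seed B": ["Job 2", "Job 3"]}
samples = {
    "Seed A": ["Title a1"],
    "Job 1": ["Title j1"],
    "Job 2": ["Title j2"],
    "Seed B": ["Title b1"],
    "Job 3": ["Title j3", "Title j1"],
}
category_titles = build_category_titles(seeds, related, samples)
print(category_titles)
```

Because "Job 2" is related to both seeds, it is counted once, and the repeated "Title j1" appears only once in the result.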

Appendix

Pay percentiles for some hilarious (and fake) job titles

| Title | 25th percentile | 75th percentile |
| --- | --- | --- |
| Apprentice wizard | $16.36 | $21.40 |
| Aquatic basket weaver | $20.35 | $26.06 |
| Chief beverage consumption officer | $48.70 | $54.32 |
| Satellite hit by a softball | $29.54 | $30.43 |
| Bull in a china shop | $22.20 | $34.82 |
| Person keeping a secret | $20.04 | $23.28 |
| Person keeping a great secret | $26.07 | $27.60 |
| You | $26.92 | $30.33 |
| Professional swan fighter | $28.62 | $34.73 |
Selected hourly wage percentile predictions for fictional job titles. Output from the ADP Research Institute’s transformer.
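The values in the table above also show how much a single word can move a prediction. The sketch below copies the table’s numbers and computes the width of each predicted middle-50-percent range, plus the shift from adding “great” to “Person keeping a secret.” The helper names are ours and are not part of ADPRI’s model.

```python
# (25th percentile, 75th percentile) hourly wages copied from the table.
predictions = {
    "Apprentice wizard": (16.36, 21.40),
    "Aquatic basket weaver": (20.35, 26.06),
    "Chief beverage consumption officer": (48.70, 54.32),
    "Satellite hit by a softball": (29.54, 30.43),
    "Bull in a china shop": (22.20, 34.82),
    "Person keeping a secret": (20.04, 23.28),
    "Person keeping a great secret": (26.07, 27.60),
    "You": (26.92, 30.33),
    "Professional swan fighter": (28.62, 34.73),
}


def range_width(title: str) -> float:
    """Width of the predicted middle-50-percent range, in dollars per hour."""
    p25, p75 = predictions[title]
    return round(p75 - p25, 2)


widest = max(predictions, key=range_width)
narrowest = min(predictions, key=range_width)

# Shift in each percentile from adding the word "great" to the title.
base = predictions["Person keeping a secret"]
modified = predictions["Person keeping a great secret"]
shift_p25 = round(modified[0] - base[0], 2)
shift_p75 = round(modified[1] - base[1], 2)

print(widest, range_width(widest))
print(narrowest, range_width(narrowest))
print(shift_p25, shift_p75)
```

Adding one word raises the predicted 25th percentile by $6.03 an hour and the 75th by $4.32, a reminder that these outputs reflect word associations rather than any real job.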