Research questions and technical notes
In the September issue of Today at Work, we examined employee motivation and commitment to their employer. My colleagues Dr. Mary Hayes and Jared Northup developed a tool – the EMC Index – to measure those aspects of worker sentiment, a project described in detail for ADPRI’s Data Lab.
Their work inspired a number of questions. Does a first promotion affect an employee’s chance of leaving their company? And does the impact of a promotion vary by the type of worker promoted?
In this technical note, we describe the methods we used to estimate the impact of promotions. We use footnotes to cite references, highlight limitations of the study, and explain how we plan to address those limitations.
Population of interest
Our study focused on U.S. workers at large, private nonfarm employers. Large employers have greater resources and thus are more likely to keep accurate and up-to-date HR records, which provides us with quality data necessary to address our research questions. We were particularly interested in workers who had recently been promoted for the first time by their employer.
Units of analysis
- Employee-month: Our most granular unit of analysis, defined as a calendar month during which ADP observed a particular person working at a particular employer for at least one day.
- Employment spell: A continuous period of employment from the month of hire to the month of termination (if one has occurred) or to the present month. A person can work more than one employment spell for the same employer.
- Employee: A person working for a particular employer.
- Person: An individual identified by their Social Security number (SSN), which is anonymized via a hashing function so that researchers do not see the actual SSN, and cannot link to other third-party data where individuals are identified by their SSN.
- Employer: A firm, company, or other enterprise identified and de-duplicated using internal and third-party data.
We focused on the impact of an employee’s first promotion achieved within a particular employment spell rather than the impact of promotion over a person’s career across multiple employment spells. [Footnote 1: Future research will consider the impact of any promotion, no matter how many promotions were received prior, as well as how that impact varies based on an employee’s promotion history within – and possibly across – employment spells.]
Outcomes of interest
The risk of leaving a company is based on an event: termination of employment for any reason.
We defined termination as the month when the date of termination occurs (if the date is known) or a month that starts a period of unpaid and inactive employee status that lasts at least three months (if the termination date is unknown). [Footnote 2: Future research will explore the impact of promotion on cause-specific termination, such as voluntary termination (quitting, retiring) vs. involuntary termination (layoff, firing, and the like).]
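The fallback rule for unknown termination dates can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `infer_termination_month`, the boolean `active` sequence (one entry per calendar month, `False` meaning unpaid and inactive), and the treatment of short inactive runs at the end of the data as "not yet terminated" are all assumptions made for the example.

```python
from typing import Optional, Sequence


def infer_termination_month(active: Sequence[bool], min_gap: int = 3) -> Optional[int]:
    """Return the index of the first month that starts a run of at least
    `min_gap` consecutive inactive months, or None if no such run exists.

    `active[i]` is True when the employee was paid or active in month i
    (hypothetical encoding chosen for this sketch).
    """
    run_start: Optional[int] = None
    for month, is_active in enumerate(active):
        if is_active:
            run_start = None          # any active month breaks the inactive run
            continue
        if run_start is None:
            run_start = month         # first inactive month of a new run
        if month - run_start + 1 >= min_gap:
            return run_start          # the run is long enough: it began at run_start
    return None                       # no qualifying run observed (spell is censored)
```

For example, an employee who is active for two months and then inactive for three would be assigned a termination in the third month (index 2), while a two-month gap followed by a return to work would not count as a termination.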
We measured termination risk using a cumulative incidence function (CIF), a curve that measures the chance that a person’s employment spell ends within a given number of months since their first promotion. We measured the CIF over a nine-month post-promotion window for each of the time horizons within it (that is, within one month of promotion, two months, and so on). We chose a nine-month window based on the assumption that this period was long enough to capture the effects of a promotion, but not so long that our findings would be biased by the effects of later promotions. [Footnote 3: When we extend this research to include all promotions (not just the first), we won’t have this problem because the effect of promotion will depend on prior promotion history.]
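Because termination for any reason is the only event considered here, the CIF can be written in terms of monthly termination hazards. The notation below is introduced for illustration only and does not appear in the original report: $T$ is the number of months from first promotion to termination, $h(k)$ is the discrete hazard in month $k$, and $F(t)$ is the CIF at horizon $t$.

```latex
h(k) = \Pr(T = k \mid T \ge k),
\qquad
F(t) = \Pr(T \le t) = 1 - \prod_{k=1}^{t} \bigl(1 - h(k)\bigr),
\qquad t = 1, \dots, 9 .
```

Under this reading, the product term is the probability of still being employed after month $t$, and the CIF is its complement.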
Below is a pair of post-promotion CIFs that describe an employee’s risk of leaving under two scenarios, one in which they were promoted (the factual scenario, given our interest in employees who get promoted), and one in which they received no promotion (the counterfactual scenario, because those employees were in fact promoted).
Treatment definition
Our research asked questions about causation. Causal inference seeks to measure the impact of an intervention, often called the treatment. In our report, the treatment was an event: An employee is awarded their first promotion.
We identify that first promotion as the first month in an employment spell in which an employee’s managerial level is at least one level higher than it was in the previous month. Once a first promotion occurs, we consider an employee to be in an ever-promoted state for all later months in the employment spell.
To capture the potential time-varying effects of our treatment – the first promotion – we define the first promotion treatment variable as the number of months during which an employee has been in the ever-promoted state. Defining the treatment variable in this way allows us to measure how the impact of the treatment evolves over time.
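As a rough sketch of how such a treatment variable could be built from monthly records, the function below takes one employment spell's managerial levels in calendar order and returns the months-in-ever-promoted-state count for each month. The function name, the integer coding of managerial levels, and the choice to count the promotion month itself as month 1 are assumptions made for this example; the original does not specify them.

```python
from typing import List, Sequence


def months_ever_promoted(levels: Sequence[int]) -> List[int]:
    """For each month of a spell, count the months spent so far in the
    ever-promoted state (0 before the first promotion).

    `levels[i]` is the managerial level in month i, coded as an integer
    where a larger value means a higher level (hypothetical coding).
    """
    treatment: List[int] = []
    months_promoted = 0
    for i, level in enumerate(levels):
        if months_promoted > 0:
            # Already promoted once: the state persists for the rest of the spell.
            months_promoted += 1
        elif i > 0 and level > levels[i - 1]:
            # First month whose level exceeds the previous month's level.
            months_promoted = 1
        treatment.append(months_promoted)
    return treatment
```

Note that a later demotion or second promotion does not reset the counter, which mirrors the "ever-promoted" definition above.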
Treatment effect definition
With the first promotion treatment variable in hand, we define the treatment effect based on contrasts in termination risk between the factual scenario (an employee received their first promotion) and a counterfactual scenario (the employee did not get promoted).
Because this treatment effect focuses on employees who got the treatment (a first promotion), it is a conditional average treatment effect among the treated. It is an average effect because it compares two sets of probabilities, which are averages over binary outcomes. It is a conditional average effect because we measure it for a particular subset of the population as defined by employee and job attributes. [Footnote 4: We focus on the treatment effect among the treated because it requires fewer assumptions to estimate using our statistical modeling approach. This treatment effect shows what would have happened if employers did not promote the employees that they promoted. The treatment effect among the untreated would tell us what would have happened if employers had promoted employees they had not promoted.]
To compare these two scenarios, we calculate a risk ratio by dividing the termination CIF in the promotion scenario by an estimate of what the termination CIF would have been without promotion.
We use the risk ratio because we want to compare the promotion effect across multiple groups based on job attributes. Because those groups differ in their baseline termination risk, it makes the risk difference difficult to assess across groups. To transform the risk ratio into a familiar metric, we subtract one, then multiply by 100 to obtain the percent difference in termination risk between the promotion and no-promotion scenarios.
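The arithmetic can be illustrated with a small helper. The function name and the numbers in the usage lines are made up for this example; they are not results from the study.

```python
def percent_difference_in_risk(cif_promoted: float, cif_not_promoted: float) -> float:
    """Convert two termination risks at the same horizon into a percent
    difference: (risk ratio - 1) * 100."""
    if cif_not_promoted <= 0:
        raise ValueError("The no-promotion risk must be positive to form a ratio.")
    risk_ratio = cif_promoted / cif_not_promoted
    return (risk_ratio - 1.0) * 100.0


# Two hypothetical groups with the same risk difference (5 percentage points)
# but very different baseline risks, and therefore different relative effects.
high_baseline = percent_difference_in_risk(0.30, 0.25)   # about +20 percent
low_baseline = percent_difference_in_risk(0.10, 0.05)    # about +100 percent
```

The same five-point gap is a modest relative change for the high-baseline group and a doubling for the low-baseline group, which is why a relative measure is easier to compare across groups.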
Another treatment effect we report is the difference in expected remaining tenure, which tells us how many months of work are gained or lost as a result of a first promotion. To calculate this treatment effect, we sum the differences between promotion and no-promotion termination risk estimates along the corresponding CIFs. [Footnote 5: Ideally, we should have reported the relative difference in expected remaining tenure, as well, because baseline remaining tenure varies widely by job requirements and level.]
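One way to read this calculation, restricted to the nine-month window, is as a difference in areas under the two survival curves. The symbols below are assumptions introduced for illustration: $F^{1}(t)$ and $F^{0}(t)$ are the CIFs with and without promotion, and $S^{a}(t) = 1 - F^{a}(t)$ is the probability of still being employed after month $t$ under scenario $a$. The sign convention (positive means months gained) and the exact summation limits are not stated in the original and may differ from what the authors used.

```latex
\Delta_{\text{tenure}}
  = \sum_{t=1}^{9} \bigl[ S^{1}(t) - S^{0}(t) \bigr]
  = \sum_{t=1}^{9} \bigl[ F^{0}(t) - F^{1}(t) \bigr]
```

Each term is the extra probability of being employed in month $t$ because of the promotion, so the sum is an expected number of months within the window.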
Causal identification
The fundamental problem of causal inference is that we can’t directly observe factual and counterfactual scenarios at the same time. In our case, we can’t see what would happen both with and without promotion. Instead, we compare the termination risk of promoted employees to employees who weren’t promoted.
Because employers don’t randomly promote employees, it’s highly likely that any difference in termination risk between promoted and unpromoted employees is because they differ in a way that is related to both promotion and termination, rather than because of the treatment itself.
For example, the promoted group perhaps includes more people in jobs that require a high level of educational attainment, which conceivably improves promotion opportunity and either reduces or increases termination risk, depending on the circumstances.
Such shared causes of both treatment and outcome are called confounders. Confounders – when left unaccounted for – bias estimates of treatment effect.
In this analysis, we account for four observed confounders, listed below. [Footnote 6: Clearly, these are not the only possible confounders of promotion and termination. Future research will adjust for other observed confounders, such as wage history, industry, and work location. Moreover, there are multiple confounders that we cannot observe from ADP data, such as employee productivity, employer promotion and termination practices, and manager characteristics. Future research will estimate employee/manager/employer effects (either fixed effects or Mundlak-ian random effects) to assess how robust our findings are to adjustment by unobserved confounders.]
- Tenure: The number of months since hire within an employment spell. It’s well-known that employees with less tenure leave at higher rates but are promoted at lower rates.
- Job requirements: The amount of education, experience, and training that a job requires, which influences someone’s promotion prospects as well as their ability to find another job. We measure job requirements using Occupational Information Network (O*NET) Job Zones, a five-point ordinal scale for the level of education, related experience, and on-the-job training that an occupation (as defined by O*NET) requires. A proprietary ADP algorithm matches O*NET Standard Occupational Classification (O*NET-SOC) codes to employee job titles entered by ADP clients. Each O*NET-SOC code is assigned a Job Zone by O*NET researchers.
- Managerial level: A person’s managerial rank might improve promotion prospects and outside job opportunities. For our research, these levels were obtained from direct reporting structure as recorded by ADP clients. We define the following managerial levels:
  - Individual contributors have no direct reports and have no words in their job title that signify management.
  - Managers without direct reports have no direct reports but do have words in their job title that signify management.
  - 1st-level managers have direct reports who do not themselves have any direct reports.
  - 2nd-level managers are a level above 1st-level managers, and 3rd-level managers are a level up again from 2nd-level managers.
  - 4th-level managers and above are one or more levels above 3rd-level managers.
- Gender: Norms and discrimination influence both promotion prospects and job opportunities. For example, discrimination might work against women seeking promotions or access to professional networks that might improve their promotion prospects. Gender norms also tend to increase caretaking burdens among women, which influences promotion and job prospects. Our measure of gender comes from Equal Employment Opportunity Component 1 (EEO-1) survey questions typically asked either of job applicants or new hires. [Footnote 7: Other demographic characteristics, such as age and ethnicity, could have similar effects. Future research will adjust for these confounders. Adjusting for demographic characteristics such as ethnicity will, however, be tricky given the relatively low response rates and high “decline to state” rates for some EEO-1 questionnaire items.]
Data sources
The analysis uses three joined data sets:
- ADP payroll and HR data: Data obtained from ADP human capital management (HCM) systems such as Workforce Now and Global View, taken primarily from businesses with more than 50 workers. Clients who have opted out of analytics have been excluded.
- ADP client data: Information on industries and location supplied by ADP clients. Client data is anonymized and aggregated for all results shown in Today at Work reports.
- O*NET occupational codes and Job Zones.
Sample inclusion criteria
Period inclusion criteria
- Due to issues with the quality of ADP-to-O*NET matching before mid-2018, we include HCM data dating from the beginning of 2019.
- COVID-19 disrupted seasonal patterns, and our post-2018 sample covers only one pre-pandemic year (2019), one year split between the pre- and post-pandemic periods (2020), two full post-pandemic years (2021 and 2022), and one partial post-pandemic year (2023). That is not enough time to reliably measure seasonal effects or how they changed in response to the pandemic shock. We avoid seasonality bias from the partial year of 2023 by excluding HCM data past the end of 2022. [Footnote 8: Because promotion and termination rates vary by year due to economic shocks such as the pandemic, future research will at the very least also adjust for year as a period confounder.]
Employer inclusion criteria
- Meet the study’s period inclusion criteria.
- Not included in the government or resources/mining industry supersectors, as defined by the North American Industry Classification System (NAICS).
- Employ at least 1,000 people. We operationalize this criterion by including an employment spell in the sample only if all the employee-months in that employment spell were spent at a company that meets the headcount criterion.
- Possess at least 90 percent data coverage needed to calculate outcome, treatment, and confounder variables, including confounder variables not covered in this study, but that we intend to include in future analysis extensions (namely industry, wages, age, and work location). We assume that HCM data quality is poor for employers who do not report these variables with high frequency. [Footnote 9: This assumption is problematic for O*NET-SOC codes because the quality of the match between occupation codes and the job titles entered by ADP clients may depend on factors that – left unaccounted for – bias our treatment effect estimate. Future research will try to account for these factors by considering the potential pathways along which O*NET-SOC code measurement error can bias the estimate.]
Employee inclusion criteria
- The full employment spell is observed from the month of hire to termination (if it occurred) or to the present. Without observing an employment spell since the month of hire, it’s impossible to know from our HCM data whether an employee has ever been promoted. That missing information leads to a phenomenon known as immortal time bias.
- Employment spells in which the employee was hired into the highest managerial level are excluded. Our managerial level measure is truncated at the fourth level, so there is no chance of an observed promotion for individuals hired into that level, and the first-promotion treatment does not apply to them.
Employee-month inclusion criteria
We include all employee-months during eligible employment spells except the hire month. We exclude the hire month because we eventually want to adjust for the confounding effects of pre-treatment wage history, which is, at least for the current employer, immeasurable during the hire month. In any case, few promotions occur during the hire month.
We also include only employee-months with non-missing outcome, predictor, and confounder variables. Because we already select companies that maintain quality HCM records, we assume that any remaining missing variables are random and don’t bias our impact estimates. [Footnote 10: Obviously, this assumption is tenuous. Future research will examine how robust our results are when we relax our assumption that data is missing at random at the worker-month level, either through imputation, or perhaps through post-stratification.]
Statistical model
Modeling approach
Our treatment variable and our tenure confounder are approximately continuous. Our outcome variable is a right-censored event, meaning that it may not have yet occurred for a given employment spell. For these reasons, we estimate the conditional average promotion effect among the promoted using model-based standardization and the parametric g-formula within the framework of survival analysis. [Footnote 11: See Hernán MA, Robins JM (2020). Causal Inference: What If. Chapter 17. Boca Raton: Chapman & Hall/CRC.]
The modeling proceeds in the following steps. [Footnote 12: Some researchers may ask why we didn’t use one of the many modern staggered-entry, difference-in-difference (DiD) methods if our interest is in the average treatment effect among the treated, which is the target of DiD. Although we might at some point examine how our results differ under DiD specification, we eventually intend to look also into the average treatment effect among the untreated and the overall average treatment effect. The parametric g-formula makes the estimation of these alternative treatment effects possible and tractable.]
1. Construct a data set of employee-months that meet the sample inclusion criteria. Each employee-month observation has: a binary indicator for whether termination occurred in that month; the treatment variable counting the number of months that the employment spell has been in the ever-promoted state (zero if not yet promoted); and the values of the confounders in that employee-month.
2. Build a generalized additive model (GAM) from the binomial family with a logit link. This GAM estimates the conditional probability of termination for an employee-month as a function of simple linear and smoothing spline terms for the treatment variable and confounders (see model specification, below). When used to estimate probabilities for each employee-month in an employment spell, this GAM estimates a conditional discrete hazard function.
3. For members of the sample who were in fact promoted, use the GAM to predict the discrete hazard of termination during the nine months after the promotion month both in the case that they were promoted (the factual scenario) and in the case that they were not promoted (the counterfactual scenario).
4. Define the groups over which to disaggregate the treatment effect. For example, suppose we want to see how the treatment effect varies by Job Zone.
5. For each month in the post-promotion window within each disaggregation group, take the average of the predicted hazards under each treatment scenario.
6. For each disaggregation group, use the standardized hazards from step 5 to compute the standardized CIF.
7. Use the standardized CIFs from step 6 to compute the treatment effects of interest.
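Steps 5 and 6 above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: the function name `standardized_cif` is hypothetical, and the input is assumed to be a table of already-predicted monthly hazards (one row per promoted employee in a disaggregation group, one column per post-promotion month) for a single scenario.

```python
from typing import List, Sequence


def standardized_cif(hazards: Sequence[Sequence[float]]) -> List[float]:
    """Average predicted hazards across employees month by month (step 5),
    then accumulate them into a cumulative incidence curve (step 6).

    `hazards[i][m]` is the predicted termination hazard for employee i in
    post-promotion month m under one scenario (hypothetical layout).
    """
    if not hazards:
        raise ValueError("Need predicted hazards for at least one employee.")
    n_months = len(hazards[0])
    cif: List[float] = []
    survival = 1.0  # probability of still being employed so far
    for m in range(n_months):
        mean_hazard = sum(row[m] for row in hazards) / len(hazards)
        survival *= 1.0 - mean_hazard
        cif.append(1.0 - survival)
    return cif


# Made-up hazards for two employees over two months, one table per scenario.
cif_promoted = standardized_cif([[0.10, 0.10], [0.30, 0.10]])
cif_not_promoted = standardized_cif([[0.20, 0.20], [0.40, 0.20]])
risk_ratios = [p / q for p, q in zip(cif_promoted, cif_not_promoted)]
```

Running the same function once on the factual predictions and once on the counterfactual predictions gives the pair of curves that step 7 contrasts, here as a risk ratio at each horizon.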
Model specification
We use the bam function in the mgcv package in R to specify the GAM with the following treatment and confounder effects:
- Penalized tensor product smooth representing treatment effect modification by tenure. This accounts for possibilities such as promotion mattering more to people who have waited a long time for it. The penalized tensor product smooth also accounts for the confounding effect of tenure.
- Penalized spline of treatment that varies by Job Zone, plus a dummy-encoded (aka one-hot encoded) Job Zone effect. These effects account for the possibility that promotion has different implications depending on education, training, and experience. It also adjusts for job requirement confounding.
- A similar pair of treatment spline and dummy-encoded effects for managerial level, and another for gender, included for analogous reasons.
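One plausible way to write the linear predictor implied by the terms listed above is shown below. This is a reading of the description, not a formula taken from the report, and every symbol is an assumption: $h_{it}$ is the termination hazard for employment spell $i$ in month $t$, $A_{it}$ is the months-ever-promoted treatment variable, $Z$, $M$, and $G$ denote Job Zone, managerial level, and gender, $f_{\mathrm{te}}$ is the tensor product smooth, and $f_{v,\,\cdot}$ is a treatment spline that varies with the level of factor $v$.

```latex
\operatorname{logit} h_{it}
  = \beta_0
  + f_{\mathrm{te}}\bigl(A_{it}, \mathrm{tenure}_{it}\bigr)
  + \sum_{v \in \{Z,\, M,\, G\}}
      \Bigl[ f_{v,\, v_{it}}(A_{it}) + \beta_{v,\, v_{it}} \Bigr]
```

In mgcv, terms of this kind are commonly expressed with `te()` for the tensor product smooth and `s(..., by = ...)` for factor-specific smooths, though the original does not show the actual model call.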
Sample characteristics
The sample used to build the statistical model comprises 1.25 million employees at 381 employers for 7.19 million employee-months during 1.32 million employment spells. Of those employee-months, 157,379 (2 percent) were spent in the ever-promoted state. Of the total employee-months, there were 949,464 terminations (13 percent), which implies a high monthly termination rate compared to the analogous national average seasonally adjusted job separations rate.
Higher turnover in larger companies is one reason for the higher-than-average termination rate in our sample. Another reason is that the sample is weighted heavily toward the high-turnover leisure and hospitality industry at the expense of lower-turnover industries such as education and health services.
Yet another reason for the high termination rate in our sample is an over-representation of employees at jobs in the lowest two Job Zones. The lower the Job Zone, the higher the turnover. Together with the over-representation of leisure and hospitality, the over-representation of lower-level Job Zones suggests that large employers who maintain quality HR records tend to be large companies that employ sizable frontline workforces with high turnover.
Another reason for the high termination rate in our sample is a slight over-representation of individual contributors, and thus an under-representation of managers. The higher the managerial level, the lower the turnover.
In addition, our sample comprises workers hired no earlier than January 2019. For that reason, half of the employee-months in our sample are for employees who had less than 10 months of tenure in that month. Low-tenure workers have higher termination rates.
Finally, our data contains employee-months from 2020, a year of high separation rates due to the onset of a global coronavirus pandemic.
Where we go from here
This technical note highlights the limitations of our Today at Work research on promotions. In future research, we will extend the analysis described in this note in several ways to overcome those limitations. The five most important extensions will be:
- Calculate the effect of promotion among employees who were not in fact promoted by their employer, as opposed to the effect of promotion of employees who were in fact promoted by their employer. If these effects are different, it can give a clue about whether the assumptions of our causal identification approach are violated. These two effects also can differ if the baseline termination rates between promoted and unpromoted individuals are different.
- Examine the effects of any promotion, not just the first. Perhaps the second promotion matters more than the first, and so on. Perhaps long gaps in promotion history matter to employees. Moreover, by extending the analysis to any promotion, we can worry less about our assumption that nine months is a short enough time to exclude the effects of later promotions.
- Adjust for unobserved confounding at the employee, manager, and employer level. Unobserved confounders in our case would include employee productivity, manager efficacy, and employer-specific norms concerning promotion and termination.
- Adjust for more confounders that we can observe (such as ethnicity, wage history, industry, and work location).
- Explore imputation methods to fill in missing values rather than select on observed values.
It’s possible that our perspective on the impact of promotions might change after extending the analysis in these ways. That’s how science works.