Layoff Logic

Angelica Patlán
Mar 21

Examining Age & Tenure in Workforce Reductions with R

Background

“We have started layoffs today and unfortunately our team has been impacted.” I still remember this 1:1 with my manager when it set in that the HR team was going to lose teammates. I remember the ominously silent feeling that slowly crept into every corner of the virtual workplace and how the air felt heavy for days to come.

Layoffs have a profound effect on belief and trust in the organization for those who left and those who remained. Therefore, HR must ensure these processes are unbiased and that they review the data for concerning trends.

For this project, I acted as a People Data Analyst for the IBM Human Resources Department and assisted my manager with identifying insights into key problems.

The Data

This dataset was created by IBM data scientists, who altered it so that it does not contain 100% real employee data. It has 1,470 rows, one per employee, and 35 columns of employee descriptors.

Key Problems

1. A former employee is suing our company because they believe the layoffs were heavily influenced by ageism. They claim that older employees were let go at a higher rate than younger employees.

2. Another former employee states that the layoffs were based on Employee Number and that newer employees were let go more often than tenured employees.

Key Insights

1. A Welch Two Sample t-test showed that there was a statistically significant difference between employees laid off and those who remained based on age. Younger employees were laid off more than older employees.

2. A second Welch Two Sample t-test showed that we cannot conclude there is a statistically significant difference in employee numbers between those who stayed and those who left. The group means (“mean of x” and “mean of y” in the test output) show that laid-off employees had higher employee numbers on average, which translates to employees with higher tenure at the organization.

R Analysis

To begin my analysis, I wanted to see how the important demographics (age, daily rate, distance from home, education, hourly rate, monthly income, monthly rate, number of companies worked, total working years, and training times last year) correlate with one another, so I created a correlation matrix using the cor function.
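Here is a minimal sketch of how a correlation matrix like this can be built with cor(). It uses a small simulated data frame rather than the IBM data: the column names mirror a few of the demographics listed above, but every value is made up, and the starting working age of 18 is an assumption for illustration only.

```r
# Toy data only: simulated values, NOT the IBM dataset.
set.seed(42)
n <- 200
age <- sample(18:60, n, replace = TRUE)
hr <- data.frame(
  Age = age,
  # Assumption: careers start at 18, minus a few random non-working years.
  TotalWorkingYears = pmax(0, age - 18 - rpois(n, 3)),
  DistanceFromHome = sample(1:29, n, replace = TRUE)
)
hr$MonthlyIncome <- 2000 + 300 * hr$TotalWorkingYears + rnorm(n, sd = 1500)

# cor() returns a symmetric matrix of pairwise Pearson correlations.
# Every column passed to it must be numeric.
cor_matrix <- cor(hr)
print(round(cor_matrix, 2))
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate little or none. With real HR data, non-numeric columns would need to be dropped or encoded before calling cor().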

Based on the correlation matrix, there are relations between:

· Education and age

· Age and monthly income

· Age and number of companies worked

· Age and total working years

· Monthly income and total working years

To learn more about these relationships I created a scatterplot:
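One way to look at several pairwise relationships at once is base R's pairs() function. The sketch below uses simulated values (not the IBM data), with working years capped by the years elapsed since an assumed starting age of 18, so the toy plot has the same kind of logical upper limit.

```r
# Toy data only: simulated values, NOT the IBM dataset.
set.seed(1)
n <- 200
age <- sample(18:60, n, replace = TRUE)
# Assumption: nobody can have worked longer than the years since age 18.
years <- pmax(0, age - 18 - rpois(n, 3))
hr <- data.frame(
  Age = age,
  TotalWorkingYears = years,
  MonthlyIncome = 2000 + 300 * years + rnorm(n, sd = 1500)
)

# pairs() draws every pairwise scatterplot in one grid.
pairs(hr, main = "Pairwise scatterplots (toy data)")

# A fitted slope puts a number on "as age increases, so do working years".
fit <- lm(TotalWorkingYears ~ Age, data = hr)
slope <- unname(coef(fit)["Age"])
print(slope)
```

A positive slope here only reflects how the toy data was generated. On real data, the plot and the fitted line are a quick check that a relationship seen in the correlation matrix is roughly linear and not driven by a few outliers.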

Based on the scatterplot, there is a positive, roughly linear relationship between age and total working years: as age increases, so does the total number of working years. The plot also shows a logical upper limit, since an employee cannot have more working years than their age allows. There is also an interesting relationship between monthly income and age, where employees aged 30–40 appear to be making more money than those aged 50–60. Lastly, monthly income increases as total working years increase.

Now that there is an understanding of the different relationships in the data, I dove into the major question about age and layoffs by creating a boxplot:
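A grouped boxplot like this can be sketched with base R's formula interface. The data frame below is hand-made toy data, not the IBM dataset, and the column name "Attrition" for the laid-off flag ("Yes"/"No") is an assumption. The toy ages are deliberately chosen so that the "Yes" group skews younger, purely to make the two boxes easy to tell apart.

```r
# Toy data only: hand-made ages, NOT the IBM dataset.
# "Attrition" is an assumed column name for the laid-off flag.
hr <- data.frame(
  Age = c(24:59, 20:49),
  Attrition = factor(rep(c("No", "Yes"), times = c(36, 30)))
)

# One box per group; the thick line inside each box is the group median.
boxplot(Age ~ Attrition, data = hr,
        xlab = "Laid off", ylab = "Age",
        main = "Age by layoff status (toy data)")

# The same medians as numbers, so the comparison does not rely on eyesight.
group_medians <- tapply(hr$Age, hr$Attrition, median)
print(group_medians)  # No: 41.5, Yes: 34.5 for this toy data
```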

While these boxplots look relatively similar, the “Yes” boxplot’s median is slightly lower than the “No” boxplot’s. Based on this visual analysis, those who were laid off tend to be younger than those who remained at the company. But our goal is to determine how large the difference is and whether it is statistically significant. To do that, I ran a Welch Two Sample t-test and looked at its p-value (probability value). The test showed a statistically significant difference in mean age between the two samples (those who left and those who didn’t), and the direction contradicts the first employee’s claim: those who were laid off were younger on average, while those who stayed were older.
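In R, t.test() runs the Welch version by default, because var.equal defaults to FALSE. Here is a minimal sketch using the formula interface on hand-made toy data. The "Attrition" column name and all of the values are assumptions for illustration, and the output does not reproduce the result reported above.

```r
# Toy data only: hand-made ages, NOT the IBM dataset.
# "Attrition" is an assumed column name for the laid-off flag.
hr <- data.frame(
  Age = c(24:59, 20:49),
  Attrition = factor(rep(c("No", "Yes"), times = c(36, 30)))
)

# var.equal = FALSE (the default) gives the Welch Two Sample t-test,
# which does not assume the two groups have equal variances.
age_test <- t.test(Age ~ Attrition, data = hr)
print(age_test)

# The pieces usually reported: the group means and the p-value.
print(age_test$estimate)  # mean in group No, mean in group Yes
print(age_test$p.value)

# Conventional 0.05 threshold, used here as an assumption.
significant <- age_test$p.value < 0.05
```

The p-value only says whether the difference in means is unlikely under the null hypothesis of no difference. The direction of the difference comes from comparing the two group means in the estimate.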

I followed a similar process to determine whether newer employees were laid off more than tenured employees, as the second claim about the company using EmployeeNumber suggests, starting with a boxplot.

Visually analyzing this boxplot is more difficult than the previous one because the differences are much smaller. The Welch Two Sample t-test returned a p-value of 0.6768, which means we cannot conclude that there is a statistically significant difference in employee numbers between those who stayed and those who left. Looking at the “mean of x” and “mean of y” values in the test output, the laid-off group had the higher average employee number. Looking at the data, higher employee numbers are associated with employees with higher tenure, so the results do not support the second employee’s claim.
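The “mean of x” and “mean of y” labels appear when t.test() is called with two separate vectors instead of a formula. Below is a minimal sketch of that form on hand-made toy data. The values, the "Attrition" column name, and the choice of which group is x are all assumptions, and the toy p-value is not the 0.6768 reported above.

```r
# Toy data only: hand-made employee numbers, NOT the IBM dataset.
# "Attrition" is an assumed column name for the laid-off flag.
hr <- data.frame(
  EmployeeNumber = 1:60,
  Attrition = rep(c("No", "No", "Yes"), times = 20)
)

# Assumption: x is the laid-off group and y is the group that stayed.
laid_off <- hr$EmployeeNumber[hr$Attrition == "Yes"]
stayed <- hr$EmployeeNumber[hr$Attrition == "No"]

# Two-vector form of the Welch test; the output labels the group
# averages "mean of x" and "mean of y".
id_test <- t.test(laid_off, stayed)
print(id_test)

print(id_test$estimate)  # mean of x = 31.5, mean of y = 30 for this toy data
print(id_test$p.value)
```

When the p-value is above the chosen threshold, the gap between the two means is within what chance alone could produce, so the means are best read as a description of the sample, not as evidence of a pattern.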

Conclusion

This was my first time using R to analyze employee data in this way and it was an interesting way to visualize and test hypotheses. My suggestion to my manager would be to do this analysis before a layoff, as there is usually a list of impacted employees created before actual notices go out. By analyzing the data before executing decisions, IBM can audit for potential biases and identify risks before they occur. Doing due diligence beforehand helps to mitigate reactive work afterwards.

What analysis would you have done on this data?

This project was completed as part of the DAA Boot Camp Projects, for educational purposes.