From Ore To Iron

Analyzing Metal Extractions with Python


Background

They’re there every time I come in, ready to use. They give me a “gnarly pump” when the lift hits just right. What are “they”? Dumbbells! And if you were lucky enough to work out at one of the old Gold’s Gyms, your dumbbells were made of iron!


But before iron ended up at your gym or even helped Arnold Schwarzenegger shape his famous body, it had to be mined!


For this project, I am acting as a data analyst for a mining company called Metals R’ Us. Metals R’ Us collects large chunks of ore-bearing material containing valuable iron deposits mixed with unwanted impurities such as dirt, sand, and silica. The company uses a flotation process: crushed ore is made into a liquid pulp and mixed with chemicals that help detach the iron from those impurities. Air bubbles are then pumped through the mixture, causing iron particles to float to the surface and unwanted minerals to sink to the bottom.

 

The Data

This dataset is real data collected from March to September 2017. It is messy because some columns were sampled every 20 seconds and others every hour. Every row is a time point at 20-second intervals, and the date column lists the day, month, year, and hour but not the minutes.


There are 737453 rows and 24 columns. The data originally came in with commas instead of decimal points, so I cleaned it up with the following functions:
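A minimal sketch of this kind of cleanup, assuming the file is read with pandas. The two sample rows are made up for illustration, and an in-memory string stands in for the real file.

```python
import io

import pandas as pd

# Made-up sample standing in for the real file; the values are illustrative only.
raw_csv = io.StringIO(
    "date,% Iron Concentrate,% Silica Concentrate\n"
    '2017-06-01 00:00:00,"66,91","1,31"\n'
    '2017-06-01 01:00:00,"65,27","2,04"\n'
)

# decimal="," tells pandas to read a value like "66,91" as the float 66.91.
df = pd.read_csv(raw_csv, decimal=",")

print(df.shape)   # (rows, columns)
print(df.dtypes)  # the measurement columns should be floats, not text
```

If the data has already been loaded as text, replacing the comma in each column with a period and then casting the column to float would be another way to get the same result.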


I also wanted to confirm what type the date column imported as, and verified that it came in as a string. To clean up the data, I changed it to a date type using the following function:
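A short sketch of that check and conversion, assuming the column is named date and using two made-up rows:

```python
import pandas as pd

# Made-up rows; the date column arrives as plain text.
df = pd.DataFrame({"date": ["2017-06-01 00:00:00", "2017-06-01 01:00:00"]})

print(df["date"].dtype)  # a text dtype before conversion

# Convert the text column to pandas datetimes.
df["date"] = pd.to_datetime(df["date"])

print(df["date"].dtype)  # a datetime64 dtype after conversion
```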



Now that I had cleaned up the data and identified the variables I was working with, I moved on to the asks from Metals R’ Us.


The Asks

My manager at Metals R’ Us has asked me to provide summary statistics for each of the columns, specifically the average, median, min, and max for every column.

The manager has advised that something “weird” happened on June 1st, 2017, and has asked me to investigate that timeframe. In addition, they want to know how the variables relate to one another.


The manager also wants to see how the % Iron Concentrate changes throughout the day.


The Insights

1. The most important aspect of this process is the % Iron Concentrate, which showed a max of 68.01, a min of 62.05, and a median of 65.05.

2. Based on the visualizations of all variables during the requested timeframe, there is an inverse relationship between % Iron Concentrate and % Silica Concentrate. When % Silica Concentrate increases dramatically, the % Iron Concentrate decreases dramatically.

3. Throughout the day, the % Iron Concentrate is not constant; it increases and decreases sharply.


Python Analysis

To provide my manager with summary statistics for each of the columns, including average, median, min, and max, I used df.describe().
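As a small illustration on made-up numbers, here is what describe() returns, along with an agg() call that keeps only the four requested statistics. The agg() variant is an optional extra, not part of the original analysis.

```python
import pandas as pd

# Made-up values for illustration only.
df = pd.DataFrame({
    "% Iron Concentrate": [66.9, 65.3, 64.1, 67.2],
    "% Silica Concentrate": [1.3, 2.0, 3.4, 1.1],
})

# describe() reports count, mean, std, min, quartiles, and max;
# the "50%" row is the median.
stats = df.describe()
print(stats)

# Only the four requested statistics, one row per statistic.
summary = df.agg(["mean", "median", "min", "max"])
print(summary)
```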


To understand what happened on June 1st, 2017, we first need to know what time span this dataset covers, so I used the following function to get the minimum and maximum date.
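A sketch of that lookup, assuming the date column has already been converted to datetimes. The three timestamps are made up and only roughly mimic a March to September range.

```python
import pandas as pd

# Made-up timestamps for illustration only.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-03-10 01:00:00",
        "2017-06-01 12:00:00",
        "2017-09-09 23:00:00",
    ])
})

# Earliest and latest timestamps in the column.
start, end = df["date"].min(), df["date"].max()
print(start, end)
```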


Now that we know the timeframe this dataset covers, we can isolate the date our manager asked us about.


I then filtered the rows using a Boolean mask and created a new dataframe, df_june. To select the columns, I used the variable important_cols.
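A minimal sketch of the filtering step. The names df_june and important_cols come from the description above, but the columns placed in important_cols and all of the sample values are assumptions made for illustration.

```python
import pandas as pd

# Made-up rows around the date of interest.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2017-05-31 23:00:00", "2017-06-01 00:00:00",
        "2017-06-01 13:00:00", "2017-06-02 00:00:00",
    ]),
    "% Iron Concentrate": [66.2, 65.8, 63.9, 66.0],
    "% Silica Concentrate": [1.4, 1.9, 4.1, 1.6],
    "Ore Pulp pH": [9.8, 9.9, 10.1, 9.7],  # hypothetical extra column
})

# Hypothetical selection of columns to keep.
important_cols = ["date", "% Iron Concentrate", "% Silica Concentrate"]

# True for rows from midnight on June 1 up to, but not including, midnight on June 2.
mask = (df["date"] >= "2017-06-01") & (df["date"] < "2017-06-02")

# Keep only the matching rows and the chosen columns.
df_june = df.loc[mask, important_cols]
print(df_june)
```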



To help my manager understand what (if any) correlations exist, I wanted to use scatterplots. However, covering the number of variables and their relationships requires 6 different plots. For visualization, I did the following:
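One way to arrive at six plots is to assume four variables of interest, since four variables form six distinct pairs. The sketch below takes that assumption; the four column names and all values are made up for illustration.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import pandas as pd

# Made-up values; the choice of four columns is an assumption.
df_june = pd.DataFrame({
    "% Iron Feed": [55.2, 56.1, 54.8, 57.0, 55.5],
    "% Silica Feed": [16.9, 15.8, 17.4, 14.6, 16.2],
    "% Iron Concentrate": [66.5, 65.9, 64.2, 63.8, 66.1],
    "% Silica Concentrate": [1.2, 1.8, 3.6, 4.0, 1.5],
})

# Every unordered pair of columns: 4 columns give 6 pairs.
pairs = list(combinations(df_june.columns, 2))

# One scatterplot per pair on a 2 x 3 grid.
fig, axes = plt.subplots(2, 3, figsize=(15, 8))
for ax, (x_col, y_col) in zip(axes.flat, pairs):
    ax.scatter(df_june[x_col], df_june[y_col], s=10)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
fig.tight_layout()
```

In a notebook the figure renders on its own; in a script, calling plt.show() afterwards displays it. seaborn's pairplot would be an alternative that builds a similar grid in a single call.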


To confirm whether there is a correlation, I created a correlation matrix using corr().
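A small sketch of corr() on made-up values that move in opposite directions. By default, corr() computes Pearson correlation coefficients between every pair of numeric columns.

```python
import pandas as pd

# Made-up values for illustration only.
df_june = pd.DataFrame({
    "% Iron Concentrate": [66.5, 65.9, 64.2, 63.8, 66.1],
    "% Silica Concentrate": [1.2, 1.8, 3.6, 4.0, 1.5],
})

# Pairwise Pearson correlations; values near -1 indicate an inverse relationship.
corr_matrix = df_june.corr()
print(corr_matrix.round(2))
```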


To help my manager understand the changes that % Iron Concentrate went through during that day, I created a line plot.
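A sketch of such a line plot with matplotlib, assuming one reading per hour. The 24 values are made up and include a dip purely for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# 24 evenly spaced, made-up hourly readings for June 1, 2017.
df_june = pd.DataFrame({
    "date": pd.date_range("2017-06-01 00:00", "2017-06-01 23:00", periods=24),
    "% Iron Concentrate": [66.0] * 10 + [64.0, 63.0, 63.5] + [66.0] * 11,
})

# Time on the x-axis, concentrate percentage on the y-axis.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df_june["date"], df_june["% Iron Concentrate"])
ax.set_xlabel("Time (June 1, 2017)")
ax.set_ylabel("% Iron Concentrate")
ax.set_title("% Iron Concentrate over the day")
fig.autofmt_xdate()  # tilt the time labels so they do not overlap
```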



To visualize the other variables across the same time frame, I used a for loop to create separate graphs.
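A sketch of that loop, assuming every column other than date gets its own figure. The column names and values are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up readings; the non-date columns are hypothetical examples.
df_june = pd.DataFrame({
    "date": pd.date_range("2017-06-01 00:00", "2017-06-01 05:00", periods=6),
    "% Silica Concentrate": [1.3, 1.5, 3.9, 4.2, 1.8, 1.4],
    "Ore Pulp pH": [9.8, 9.9, 10.1, 10.0, 9.7, 9.8],
})

# Every column except the timestamp gets its own graph.
other_cols = [col for col in df_june.columns if col != "date"]

figures = []
for col in other_cols:
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(df_june["date"], df_june[col])
    ax.set_xlabel("Time (June 1, 2017)")
    ax.set_ylabel(col)
    ax.set_title(f"{col} over the day")
    figures.append(fig)
```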



Conclusions

The visualizations of all of the variables showed an inverse relationship between % Iron Concentrate and % Silica Concentrate on this particular day. When % Silica Concentrate increases dramatically, the % Iron Concentrate decreases dramatically. Therefore, I would recommend that my manager investigate why there was such a sharp increase in % Silica Concentrate. This appears to be the “weird” event that happened on June 1st, 2017, and I wonder whether it was human error or a system malfunction. Either way, we would want to identify the source of this increase in % Silica Concentrate and correct it so that we maximize the % Iron Concentrate.


This was my first time looking at data from a manufacturing environment, and it is fascinating! What variables would you have wanted to explore more?


This project was done as part of the DAA Boot Camp Projects, for educational purposes.
