
The Data Behind The Discharge

SQL Insights Into Healthcare


Background 

“You need to come into the hospital right away. Our radiologist read a brain bleed.” 

Time suddenly stood still, and the walls of that San Jose Target started closing in around me. I couldn’t believe that an ER visit for a headache led to this.  


Soon, I’d find myself whisked into a Kaiser ER getting labs and CT scans done. Thoughts of what came next filled my mind as I prepared myself for the possibility of brain surgery. But to my relief the Chief of Neurology declared that I didn’t have a brain bleed and sent me home.  


Although that experience happened 10 years ago (whew!), I still vividly remember it and the healthcare workers who helped me. That is why I was excited to dive into healthcare data and help healthcare professionals get answers to their questions.  


The Data 

The dataset used for this project comes from the UC Irvine Machine Learning Repository and was assembled to predict hospital readmissions among diabetes patients. 


Key Questions 

The questions that I sought to answer through this analysis were: 

  1. What does the distribution of time spent in the hospital look like? Do most patients stay less than 7 days? 

  2. What is the average number of procedures per specialty and which specialties have more than 2.5 average procedures? 

  3. Based on the number of lab procedures done, is the hospital treating patients differently by race? 

  4. How does the number of lab procedures correlate with the number of days in the hospital? 

  5. What were the biggest success stories (emergency patients who stayed less than the average time in the hospital) for Cardiology? 

  6. What do the summaries look like for the 50 patients with the most medications? 


Key Insights 
  1. A histogram built in SQL shows that most patients stay less than 7 days at this hospital, and the most common stays are 2-3 days. 

  2. The specialties that average more than 2.5 procedures are vascular surgery, cardiology, radiology, cardiovascular surgery, and thoracic surgery. The latter two average the most procedures.  

  3. The average number of lab procedures is similar across races, so this measure shows no clear sign that race affects care.  

  4. As the lab procedure frequency goes up, so does the average time in the hospital.  

  5. The 50 patients with the most medications had between 64 and 81 medications and between 55 and 98 lab procedures.  


SQL Analysis 

This dataset contained 50 columns and 101,766 rows of patient data.  

To answer the first question about the distribution of time in the hospital, I created a histogram in SQL using this query.  





Each star in the histogram represents one hundred patients. Based on this visualization, most patients do stay within the 7-day window, and the most common stay is 3 days.  
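
A minimal sketch of a text histogram of this kind, run through Python's built-in sqlite3 module so it is self-contained. The table name `encounters` and the rows are made up for illustration; only the column name `time_in_hospital` comes from the dataset. On this toy data one star stands for one patient, whereas on the full dataset the count would be divided by 100 first.

```python
import sqlite3

# Toy in-memory table; `encounters` is a hypothetical name, not the project's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (time_in_hospital INTEGER)")

# Made-up lengths of stay, in days.
stays = [1, 2, 2, 3, 3, 3, 4, 7, 14]
conn.executemany("INSERT INTO encounters VALUES (?)", [(d,) for d in stays])

# One row per length of stay, with a bar of stars whose length is the patient count.
HISTOGRAM_SQL = """
SELECT time_in_hospital AS days,
       COUNT(*) AS patients,
       substr('********************', 1, COUNT(*)) AS bar
FROM encounters
GROUP BY time_in_hospital
ORDER BY time_in_hospital
"""

histogram = conn.execute(HISTOGRAM_SQL).fetchall()
for days, patients, bar in histogram:
    print(f"{days:>2} | {bar} ({patients})")
```

The `substr` trick keeps the sketch portable to SQLite; dialects with a padding function such as `RPAD` can build the bar more directly.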


A new Hospital Director wanted to know which medical specialties perform the most procedures on average. They asked for a list of specialties that had at least 50 patients and averaged more than 2.5 procedures. To provide this list, I used this query and found that Thoracic Surgery and Cardiovascular Surgery were the top two of the five qualifying specialties.
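
A request like this maps naturally onto `GROUP BY` with a `HAVING` filter. The sketch below assumes a hypothetical `encounters` table and made-up rows; `medical_specialty` and `num_procedures` are column names from the dataset. The minimum group size is lowered from 50 to 3 so the toy data can pass the filter, and rows are counted per encounter, which is an assumption about what "patients" means here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE encounters (medical_specialty TEXT, num_procedures INTEGER)"
)

# Made-up rows for illustration only.
rows = [
    ("Thoracic Surgery", 4), ("Thoracic Surgery", 3), ("Thoracic Surgery", 4),
    ("Cardiology", 3), ("Cardiology", 2), ("Cardiology", 3),
    ("Pediatrics", 0), ("Pediatrics", 1), ("Pediatrics", 0),
    ("Radiology", 5),  # high average, but too few rows to qualify
]
conn.executemany("INSERT INTO encounters VALUES (?, ?)", rows)

MIN_PATIENTS = 3  # the request in the post used 50

# Keep only specialties with enough rows and a high enough average.
SPECIALTY_SQL = """
SELECT medical_specialty,
       ROUND(AVG(num_procedures), 1) AS avg_procedures,
       COUNT(*) AS patients
FROM encounters
GROUP BY medical_specialty
HAVING COUNT(*) >= ? AND AVG(num_procedures) > 2.5
ORDER BY avg_procedures DESC
"""

specialties = conn.execute(SPECIALTY_SQL, (MIN_PATIENTS,)).fetchall()
for name, avg_procedures, patients in specialties:
    print(name, avg_procedures, patients)
```

Filtering in `HAVING` rather than `WHERE` matters because both conditions apply to the aggregated groups, not to individual rows.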



The Chief of Nursing wanted to know whether the hospital was treating patients differently based on race, as measured by the number of lab procedures performed. To answer this, I used the following query and found no significant variances that would definitively point to preferential or disparate treatment. Recommendations on this point appear in the conclusion of this analysis.
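
A comparison like this can be sketched as a grouped average. The table name `encounters`, the group labels, and the numbers below are all made up; `race` and `num_lab_procedures` are column names from the dataset. The sketch also returns the group size, since an average over very few rows is easy to over-read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE encounters (race TEXT, num_lab_procedures INTEGER)")

# Made-up rows with placeholder group labels.
rows = [
    ("GroupA", 40), ("GroupA", 44),
    ("GroupB", 41), ("GroupB", 45),
    ("GroupC", 43),
]
conn.executemany("INSERT INTO encounters VALUES (?, ?)", rows)

# Average lab procedures per group, alongside how many rows back each average.
RACE_SQL = """
SELECT race,
       ROUND(AVG(num_lab_procedures), 1) AS avg_lab_procedures,
       COUNT(*) AS encounters
FROM encounters
GROUP BY race
ORDER BY race
"""

by_race = conn.execute(RACE_SQL).fetchall()
for race, avg_labs, n in by_race:
    print(race, avg_labs, n)
```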




Another ask was to identify how the number of lab procedures might correlate with the number of days in the hospital. In other words, did patients who received more lab procedures stay in the hospital longer? The answer I found was yes: as lab procedures increased, so did the average time spent in the hospital.  
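
One way to look at this relationship in SQL is to bucket patients by lab procedure volume with `CASE` and compare average stays per bucket. In the sketch below, the `encounters` table, the rows, and the bucket boundaries (25 and 55) are assumptions chosen for illustration; `num_lab_procedures` and `time_in_hospital` are column names from the dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE encounters (num_lab_procedures INTEGER, time_in_hospital INTEGER)"
)

# Made-up (lab procedures, days in hospital) pairs.
rows = [(10, 2), (20, 4), (30, 4), (50, 6), (60, 7), (90, 9)]
conn.executemany("INSERT INTO encounters VALUES (?, ?)", rows)

# Bucket boundaries are illustrative assumptions, not values from the post.
BUCKET_SQL = """
SELECT CASE
           WHEN num_lab_procedures < 25 THEN 'few'
           WHEN num_lab_procedures < 55 THEN 'average'
           ELSE 'many'
       END AS lab_volume,
       AVG(time_in_hospital) AS avg_days
FROM encounters
GROUP BY lab_volume
ORDER BY avg_days
"""

buckets = conn.execute(BUCKET_SQL).fetchall()
for lab_volume, avg_days in buckets:
    print(lab_volume, avg_days)
```

Bucketed averages show the direction of the trend but are not a correlation coefficient, and they say nothing about which factor drives the other.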



The Hospital Administrator wanted to highlight some of the hospital's biggest success stories: cases where patients came in with an emergency but stayed less than the average time in the hospital. I decided to focus on cardiology patients and was able to pull a list using this query: 
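
A filter of this shape can be sketched with a scalar subquery for the average stay. The `encounters` table and rows below are made up. The sketch assumes that `admission_type_id = 1` marks an emergency admission, as in the dataset's ID mapping, and that "average" means the average across all patients rather than across cardiology only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE encounters (
           patient_nbr INTEGER,
           medical_specialty TEXT,
           admission_type_id INTEGER,
           time_in_hospital INTEGER
       )"""
)

# Made-up rows; the overall average stay here is 4 days.
rows = [
    (101, "Cardiology", 1, 2),
    (102, "Cardiology", 1, 9),
    (103, "Cardiology", 3, 1),   # short stay, but not an emergency admission
    (104, "Orthopedics", 1, 2),  # emergency, but not cardiology
    (105, "Cardiology", 1, 3),
    (106, "Orthopedics", 3, 7),
]
conn.executemany("INSERT INTO encounters VALUES (?, ?, ?, ?)", rows)

# Emergency cardiology patients whose stay was shorter than the overall average.
SUCCESS_SQL = """
SELECT patient_nbr, time_in_hospital
FROM encounters
WHERE medical_specialty = 'Cardiology'
  AND admission_type_id = 1
  AND time_in_hospital < (SELECT AVG(time_in_hospital) FROM encounters)
ORDER BY time_in_hospital
"""

success_stories = conn.execute(SUCCESS_SQL).fetchall()
for patient_nbr, days in success_stories:
    print(patient_nbr, days)
```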


Finally, there was a request to summarize the patients with the most medications, and I used this query to provide the list.  
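
A summary list like this can be sketched by sorting on medication count and building one sentence per patient with string concatenation. The `encounters` table, the rows, and the wording of the sentence are made up for illustration; `patient_nbr`, `num_medications`, and `num_lab_procedures` are column names from the dataset. The limit is 3 here instead of 50 to fit the toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE encounters (
           patient_nbr INTEGER,
           num_medications INTEGER,
           num_lab_procedures INTEGER
       )"""
)

# Made-up rows for illustration only.
rows = [(201, 70, 60), (202, 81, 98), (203, 64, 55), (204, 10, 20)]
conn.executemany("INSERT INTO encounters VALUES (?, ?, ?)", rows)

TOP_N = 3  # the post summarized the top 50

# One sentence per patient, highest medication counts first.
SUMMARY_SQL = """
SELECT 'Patient ' || patient_nbr
       || ' had ' || num_medications || ' medications and '
       || num_lab_procedures || ' lab procedures.' AS summary
FROM encounters
ORDER BY num_medications DESC, num_lab_procedures DESC
LIMIT ?
"""

summaries = [row[0] for row in conn.execute(SUMMARY_SQL, (TOP_N,))]
for line in summaries:
    print(line)
```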




Conclusion 

By doing this analysis, I learned that time in hospital is a very important metric for hospital operations. Most patients at this hospital stayed within the desired 7-day range; however, there were outliers, such as 14-day stays. I recommend reviewing these 14-day stays to understand whether operational inefficiencies or solely health reasons led to the extended stays.


In addition, while the variance was not large enough to definitively suggest that patients were receiving different care by race, some variance remains. Therefore, I recommend reviewing these cases to understand why some patients were referred for lab procedures more often than others. This can help practitioners understand whether medical bias is involved or whether the gap simply reflects differences in diagnoses.


Overall, I really enjoyed analyzing this dataset and found the patient data fascinating! I welcome your feedback on my analyses and would love to know: what analysis would you have wanted to do on this data?

This project was completed as part of the DAA Boot Camp Projects, for educational purposes.
