I conducted a medical cost analysis using a Kaggle dataset, applying data preprocessing, cleaning, visualization, and transformation techniques with SQL and Python libraries. Additionally, I used Python for predictive modeling, specifically creating a multiple linear regression model. The key findings are as follows:
Correlation Matrix: The analysis revealed correlations between medical charges and various factors, including:
Children: Very weak positive correlation.
Sex: Very weak positive correlation.
Age: Moderate correlation.
BMI: Weak correlation.
Smoking Habits: Strong correlation.
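A correlation matrix like the one summarized above could be computed roughly as follows. This is a minimal sketch, not the project's actual code: the column names and the six sample rows are made-up assumptions, and the 0/1 encoding of sex and smoker is one common choice rather than the confirmed method.

```python
import pandas as pd

# Tiny made-up sample; column names are assumptions about the dataset's schema.
df = pd.DataFrame({
    "age": [19, 33, 45, 52, 28, 61],
    "sex": ["female", "male", "male", "female", "male", "female"],
    "bmi": [27.9, 22.7, 30.1, 33.4, 25.0, 29.5],
    "children": [0, 1, 2, 3, 0, 1],
    "smoker": ["yes", "no", "no", "yes", "no", "yes"],
    "charges": [16884.0, 4500.0, 8200.0, 42000.0, 3900.0, 47000.0],
})

# Encode the two binary categorical columns as 0/1 so they can enter the matrix.
encoded = df.assign(
    sex=(df["sex"] == "male").astype(int),
    smoker=(df["smoker"] == "yes").astype(int),
)

# Pearson correlation between every pair of columns.
corr = encoded.corr()

# Correlation of each factor with charges, strongest first.
charges_corr = corr["charges"].drop("charges").sort_values(ascending=False)
print(charges_corr)
```

Sorting the `charges` column of the matrix gives a quick ranking of factors, which is how a summary such as "smoking: strong, age: moderate" is typically read off.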
Factors Impacting Medical Charges: The analysis highlighted that being a smoker and increasing age are the most influential factors contributing to higher medical charges. Sex and BMI have a relatively weaker impact.
Model Performance: The multiple linear regression model achieved an R-squared score of 0.749, meaning that age, sex, children, BMI, and smoking habits together explain roughly 75% of the variation in medical charges. This indicates a moderate-to-strong relationship between these factors and medical charges.
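To make the R-squared figure concrete, the sketch below fits a multiple linear regression on synthetic data and computes R-squared as 1 - SS_res / SS_tot. It uses NumPy's least-squares solver as a stand-in, since the original code is not shown; the synthetic data, its coefficients, and the variable names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (assumed encodings: sex and smoker as 0/1).
age = rng.integers(18, 65, n)
sex = rng.integers(0, 2, n)
bmi = rng.normal(30, 6, n)
children = rng.integers(0, 5, n)
smoker = rng.integers(0, 2, n)

# Synthetic target with noise; these generating weights are invented.
charges = (250 * age + 300 * bmi + 400 * children
           + 24000 * smoker + rng.normal(0, 6000, n))

# Design matrix with an intercept column, then ordinary least squares.
X = np.column_stack([np.ones(n), age, sex, bmi, children, smoker])
coef, *_ = np.linalg.lstsq(X, charges, rcond=None)

# R-squared: share of the variance in charges explained by the fit.
pred = X @ coef
ss_res = np.sum((charges - pred) ** 2)
ss_tot = np.sum((charges - charges.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"R-squared: {r_squared:.3f}")
```

In practice the score is usually computed on a held-out test split rather than on the training rows, so that it reflects performance on unseen data.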
Prediction: When applied to new data, the model predicted that a 35-year-old male smoker with a BMI of 30 (the threshold for obesity) would incur a medical charge of $30,863. This estimate exceeds the median medical charge, which is consistent with smokers generally having higher medical costs. Additionally, being male, having a higher BMI, and smoking all contribute to expected increases in medical charges.
These findings offer valuable insights into the factors influencing medical charges, contributing to the understanding and prediction of healthcare costs.
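Scoring a new profile with a fitted linear model amounts to encoding the inputs and taking a dot product with the coefficients. The sketch below illustrates this under stated assumptions: the coefficients are placeholders and NOT the values fitted in this project (so the result will not equal $30,863), the helper `encode` is hypothetical, the number of children is assumed to be 0, and the sample charges are made up.

```python
import numpy as np

def encode(age, sex, bmi, children, smoker):
    """Hypothetical helper: turn one profile into a feature vector."""
    return np.array([
        1.0,                             # intercept term
        float(age),
        1.0 if sex == "male" else 0.0,   # assumed 0/1 encoding
        float(bmi),
        float(children),
        1.0 if smoker else 0.0,          # assumed 0/1 encoding
    ])

# Placeholder coefficients for illustration only, not the project's fit.
# Order: intercept, age, sex, bmi, children, smoker.
coef = np.array([-12000.0, 260.0, -100.0, 320.0, 470.0, 23800.0])

# Profile from the text; children=0 is an assumption.
new_patient = encode(age=35, sex="male", bmi=30.0, children=0, smoker=True)
predicted = float(new_patient @ coef)

# Compare against the median of some made-up observed charges.
sample_charges = np.array([3900.0, 4500.0, 8200.0, 16884.0, 42000.0, 47000.0])
median_charge = float(np.median(sample_charges))
above_median = predicted > median_charge
print(predicted, median_charge, above_median)
```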
Cervical Cancer Analysis with Python and Data Science Tools
Summary:
I conducted an analysis of cervical cancer data using Python and popular data science libraries, including NumPy, Pandas, Seaborn, and Matplotlib. The dataset was sourced from the UCI Repository, and the goal was to create a comprehensive plan to address cervical cancer as a public health concern.
About the Disease:
Cervical cancer is a significant health issue, ranking as the fourth most common cancer among women worldwide. In 2018 alone, approximately 570,000 women were diagnosed with cervical cancer and about 311,000 died from the disease.
Disease Development:
Cervical cancer typically progresses slowly over time. Early-stage cervical cancer often presents no noticeable symptoms, but as it advances, symptoms like vaginal bleeding, watery and bloody discharge, and pelvic pain may occur.
Analysis Highlights:
I explored the relationship between several risk factors and cervical cancer, including age, HPV infection, number of sexual partners, STDs, smoking status, and hormonal contraceptives. Notably, persistent HPV infections were strongly associated with cervical cancer risk.
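A first pass at exploring such risk factors might look like the sketch below. It assumes that missing values are stored as "?" and that the columns are named as shown (both are assumptions about the UCI file); the six rows are made up, so the perfect split they produce is illustrative and not a result of this project.

```python
import pandas as pd

# Made-up rows imitating the assumed raw layout, with "?" for missing values.
raw = pd.DataFrame({
    "Age": ["18", "25", "34", "41", "29", "52"],
    "Number of sexual partners": ["2", "?", "3", "5", "1", "4"],
    "Smokes": ["0", "1", "?", "1", "0", "0"],
    "Dx:HPV": ["0", "0", "1", "1", "0", "0"],
    "Biopsy": ["0", "0", "1", "1", "0", "0"],
})

# Convert every column to numeric; anything unparseable ("?") becomes NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")

# Share of missing values per column, useful before deciding how to impute.
missing_share = clean.isna().mean()

# Rate of positive biopsy results with and without an HPV diagnosis.
biopsy_rate_by_hpv = clean.groupby("Dx:HPV")["Biopsy"].mean()

print(missing_share)
print(biopsy_rate_by_hpv)
```

Comparing the outcome rate across groups of a binary risk factor is a simple way to see an association before moving on to plots or formal tests.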
Screening Test:
Effective and accessible screening tests, such as the Hinselmann test (a form of colposcopy), play a crucial role in early detection. Colposcopy, in particular, aids in precise diagnosis for women with abnormal Pap smear results.
Diagnosis and Treatment:
The treatment approach for cervical cancer depends on factors like cancer stage, patient health, and preferences. Treatment options may include chemotherapy, radiation therapy, surgery, or a combination of these modalities.
Superstore Sales Analysis using Python
Summary:
This portfolio project is a comprehensive sales analysis of a major U.S. Superstore. Using Python and its libraries, the analysis explores sales trends and offers insights into past performance, market potential, customer behavior, and product performance.
Background:
The dataset contains 9,994 rows and originally had 20 columns that provide insights into customer purchase habits. Feature engineering was applied to enhance the dataset, with a focus on sales trends from 2014 to 2017.
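One common form of feature engineering for sales trends is deriving calendar fields from the order date. The sketch below shows the idea; the column names (`Order Date`, `Sales`), the month/day/year date format, and the four rows are assumptions, and the source does not say which features were actually engineered.

```python
import pandas as pd

# Made-up orders; column names and date format are assumptions.
orders = pd.DataFrame({
    "Order Date": ["11/8/2016", "6/12/2014", "10/11/2017", "3/5/2015"],
    "Sales": [261.96, 14.62, 957.58, 22.37],
})

# Parse the text dates, then derive calendar features for trend analysis.
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%Y")
orders["Order Year"] = orders["Order Date"].dt.year
orders["Order Month"] = orders["Order Date"].dt.month
orders["Order Quarter"] = orders["Order Date"].dt.quarter

print(orders)
```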
Objectives:
The primary aim is to assess sales growth across various influencing factors. The analysis answers key questions about peak sales periods, top-selling product categories, target customer demographics, and key sales regions. The process involves data cleaning, manipulation, feature engineering, and data visualization using Python libraries.
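Questions of this kind typically reduce to group-by aggregations. The following is a minimal sketch on made-up rows; the column names are assumptions, and the sample was constructed only to show the mechanics, not to reproduce the project's figures.

```python
import pandas as pd

# Made-up rows; column names are assumptions about the dataset's schema.
orders = pd.DataFrame({
    "Order Year": [2014, 2015, 2016, 2016, 2017, 2017],
    "Category": ["Furniture", "Office Supplies", "Technology",
                 "Furniture", "Technology", "Office Supplies"],
    "Segment": ["Consumer", "Corporate", "Consumer",
                "Home Office", "Consumer", "Corporate"],
    "State": ["California", "Texas", "California",
              "New York", "California", "Washington"],
    "Sales": [100.0, 120.0, 150.0, 50.0, 300.0, 100.0],
})

# Total sales per year and year-over-year growth rate.
sales_by_year = orders.groupby("Order Year")["Sales"].sum()
yoy_growth = sales_by_year.pct_change()

# Best-selling category, customer segment, and state by total sales.
top_category = orders.groupby("Category")["Sales"].sum().idxmax()
top_segment = orders.groupby("Segment")["Sales"].sum().idxmax()
top_state = orders.groupby("State")["Sales"].sum().idxmax()

print(sales_by_year)
print(yoy_growth)
print(top_category, top_segment, top_state)
```

The same aggregated series can be passed directly to a plotting library to produce the bar and line charts used for visualization.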
Key Findings:
The analysis reveals that sales have significantly increased, particularly between 2016 and 2017, driven by high-demand technical appliances. The majority of customers are individual consumers located in California, with a keen interest in technology products. Recommendations include maintaining product quality, availability, and customer relations to secure future sales.