Ritik Naresh Raut
DATA ANALYST
Alabama, USA 205-***-**** *****.*.*****@*****.*** LinkedIn
SUMMARY
• Skilled Data Analyst with 4 years of experience specializing in data ingestion, transformation, and advanced analytics to support data-driven business decisions across healthcare and SaaS domains.
• Proficient in developing robust ETL pipelines and managing large-scale data workflows using modern cloud platforms including AWS Glue, Redshift, and S3 for scalable data processing.
• Experienced in applying statistical methods and machine learning techniques such as logistic regression, survival analysis, and gradient boosting to build predictive models for customer behavior and risk assessment.
• Adept at feature engineering and exploratory data analysis using Python libraries (pandas, NumPy, matplotlib, seaborn) and SQL to derive actionable insights from complex datasets.
• Strong expertise in designing interactive dashboards and visualizations using Tableau and Power BI to communicate analytical findings effectively to business stakeholders and cross-functional teams.
• Skilled in implementing explainable AI frameworks, leveraging tools like SHAP to enhance model transparency and drive trust among users and decision-makers.
• Collaborative team player with experience working in Agile environments, coordinating closely with Data Engineering, Product, and Customer Success teams to translate analytics into impactful business strategies.
• Knowledgeable in data governance, quality assurance, and compliance standards, ensuring data integrity and adherence to organizational policies throughout the analytics lifecycle.
SKILLS
Data Analysis & Statistics: Descriptive & Inferential Statistics, Hypothesis Testing, ANOVA, T-Test, Chi-Square Test, A/B Testing, Correlation & Regression Analysis, Time Series Forecasting, Cohort Analysis
Programming Languages: Python, R, SQL
IDEs: Visual Studio Code, PyCharm, Jupyter Notebook
Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow, Seaborn, dplyr, ggplot2, Keras
Database Technologies: MySQL, PostgreSQL, SQL Server, MongoDB, Cassandra, Snowflake
Data Visualization: Tableau, Power BI, Looker, Google Data Studio, Matplotlib, Seaborn, Plotly
ETL & Data Pipeline Tools: Apache Airflow, Alteryx, SSIS, Informatica, Talend, dbt (Data Build Tool)
Cloud Platforms: AWS (S3, Redshift, Athena, Lambda)
Business Intelligence: KPI Design, OLAP, Dashboarding, Ad-hoc Analysis, Funnel Analysis, Root Cause Analysis, Churn Analysis, Marketing Mix Modeling
Machine Learning: Supervised & Unsupervised Learning, Clustering, Decision Trees, Logistic Regression, Random Forests, Model Validation, Feature Engineering
Excel & Productivity Tools: Advanced Excel (Power Query, Power Pivot, Index-Match, PivotTables, VBA), Google Sheets, Microsoft Office Suite, Notion, Confluence
Data Governance & Security: Data Quality Management, Data Privacy (HIPAA, GDPR), PII Masking, Access Control, Audit Logs, Compliance Reporting
Agile & Collaboration: Agile Methodology, Scrum Framework, Jira, Trello, Azure Boards, Git, GitHub, Bitbucket, Slack, Zoom, Microsoft Teams
Soft Skills: Cross-Functional Collaboration, Problem Solving, Critical Thinking, Stakeholder Communication, Requirements Gathering, Presentation Skills, Documentation, Mentorship
WORK EXPERIENCE
Centene Corporation Alabama, USA
Data Analyst Jan 2024 – Present
• Consolidated over 12 million Medicaid and Medicare claim, encounter, and pharmacy records from SQL Server and Snowflake into a single analytical dataset, achieving 99.8% data accuracy through rigorous validation.
• Cleaned and standardized large-scale healthcare datasets using Python (pandas, NumPy) to resolve missing values, normalize diagnosis/procedure codes, and align provider identifiers, reducing preparation time by 35%.
• Designed SQL-based transformation logic to create patient-level timelines, enabling accurate sequencing of claims, pharmacy fills, and care management events for predictive modeling.
• Developed over 25 patient risk indicators, including 90-day readmission counts, chronic disease scores, emergency visit frequency, and medication adherence metrics, significantly improving model input quality.
• Collaborated with data science teams to train and validate XGBoost and LightGBM models, achieving an AUC of 0.87 for predicting 7-, 14-, and 30-day unplanned readmissions.
• Utilized SHAP (SHapley Additive Explanations) to identify key readmission drivers such as uncontrolled diabetes, recent ER visits, and multiple chronic conditions, translating results into actionable insights for clinical staff.
• Integrated Social Determinants of Health (SDoH) data including housing instability, food insecurity, and transportation gaps into the scoring pipeline, increasing high-risk patient identification recall by 12%.
• Automated daily ETL workflows using SSIS and SQL stored procedures to refresh risk scores and deliver them to the care coordination platform within two hours of claim ingestion, enabling same-day outreach.
• Designed interactive Tableau dashboards displaying patient risk rankings, regional heat maps, and top contributing factors, allowing care managers to target 28% more high-risk members without additional staffing.
• Tracked program impact through KPI monitoring in Tableau and Excel, demonstrating a 6.5% reduction in 30-day readmissions in pilot regions and providing ROI metrics to executive leadership.
• Created comprehensive data dictionaries, transformation documentation, and audit-ready artifacts to maintain compliance with NCQA and CMS quality reporting standards.
Druva Software India
Data Analyst Oct 2020 – Mar 2023
• Ingested multi-source customer data including backup frequency logs, restore activity records, license utilization metrics, billing history, and NPS survey results from Druva’s SaaS platform into AWS S3 using scheduled extraction jobs.
• Designed and maintained AWS Glue ETL pipelines to clean, standardize, and merge telemetry, support ticket histories, and subscription data, loading the processed datasets into Amazon Redshift for modeling and analytics.
• Utilized Amazon Redshift Spectrum to query over 500 million historical records directly from S3, reducing analytical query runtimes by 60% and enabling rapid model retraining.
• Conducted detailed exploratory data analysis in Python (pandas, NumPy, matplotlib, seaborn) and SQL to uncover churn indicators such as declining backup success rates, increasing restore latency, and unresolved high-priority support cases.
• Created over 50 engineered features from operational metrics, including backup job failure ratios, license underutilization percentages, and SLA adherence scores, enhancing model input quality and improving predictive accuracy.
• Developed churn prediction models using logistic regression, survival analysis, and XGBoost in Python (scikit-learn, xgboost), achieving an 87% accuracy rate and detecting churn risk 22% earlier than previous methods.
• Implemented SHAP-based model explainability to generate account-level churn drivers, enabling Customer Success teams to apply targeted retention strategies with quantifiable business impact.
• Built interactive Tableau dashboards presenting churn probabilities, customer health scores, retention trends, and upsell opportunities, reducing account review time by 40% for Customer Success Managers.
• Deployed automated churn alerts via AWS Lambda integrated with Salesforce CRM, triggering notifications within 24 hours of risk detection, improving proactive outreach response times by 35%.
• Partnered with Data Engineering, Product, and Customer Success teams to operationalize model outputs, contributing to an 18% increase in renewal rates among high-risk accounts.
EDUCATION
Master of Science in Computer Science
University of Alabama at Birmingham, Alabama, USA
Bachelor of Engineering in Computer Science and Engineering
Nagpur University, Nagpur, India