Sai Sumanth Kolanupaka
Data Analyst
Austin, TX 78717 • **************@*****.*** • 816-***-**** • www.linkedin.com/in/sai-sumanth-13b600210
SUMMARY
Data Analyst with 3+ years of experience delivering actionable insights and improving business efficiency through data analytics and engineering solutions, including 2+ years in healthcare data analytics, working with STARs, HEDIS, RAF, utilization metrics, and MLR/HBR. Proficient in Python, SQL, and .NET, with hands-on expertise in geodatabase design, ETL workflows, and GIS application development. Experienced in developing, deploying, and maintaining enterprise-scale GIS and asset management solutions. Skilled in building automated ETL pipelines, developing interactive dashboards, performing statistical testing, designing A/B experiments, and implementing machine learning models to streamline workflows. Demonstrated experience using SQL and DOMO to manage and visualize data effectively. Competent in deploying data-driven solutions to optimize resources, reduce costs, and improve decision-making. Strong at collaborating cross-functionally to analyze complex data, ensuring high-quality reporting and real-time analytics.
SKILLS
● Programming & Scripting: Python (Pandas, NumPy, Matplotlib), SQL, Django, R
● GIS Development: ArcGIS Pro, ArcGIS Server, ArcGIS Portal, ArcGIS Online
● Big Data & Cloud Tools: Kafka, Spark, PySpark, Alteryx, Snowflake, Azure Data Factory (ADF), MongoDB, MS Azure, AWS
● Data Analytics: Data Cleaning, ETL Pipelines, Machine Learning Algorithms, Survey/Likert Data Analysis, Alteryx
● Data Visualization: DOMO, Power BI (DAX, Data Modeling), Tableau, R Shiny
● Databases: MySQL, MongoDB, Oracle, Snowflake
● Operations & Process Analysis: Lean Six Sigma, Kaizen
● Version Control & Automation: Git, MS Office, JIRA, Jenkins, Docker, Apache Airflow
● Data Governance: Data Quality Checks, HIPAA Compliance, Data Integrity Management
PROFESSIONAL EXPERIENCE
MCKESSON, KS
Software Engineer / Data Analyst Jul 2024 – Current
● Designed and maintained complex SQL queries to support data extraction, transformation, and loading processes, improving query efficiency and reducing processing time by 30%.
● Utilized advanced Python libraries such as Pandas, NumPy, and Matplotlib for data manipulation, analysis, and visualization, enabling actionable insights for healthcare operations.
● Conducted comprehensive data cleansing workflows, including handling missing data, outlier detection, and normalization, improving dataset accuracy and reliability by 40%.
● Developed Python-based scripts for automated data preprocessing, reducing manual effort and ensuring consistency across datasets containing over 50,000 records.
● Streamlined data validation processes using SQL triggers and stored procedures, ensuring data accuracy and compliance with regulatory standards.
● Leveraged PySpark to process large-scale clinical datasets, optimizing computation times and enabling real-time analytics for critical business decisions.
● Built reusable Python modules for data cleansing tasks, including imputation and scaling techniques, enhancing team productivity and consistency.
● Integrated SQL-based analytics with Python workflows for seamless data pipeline execution and reporting, improving end-to-end processing efficiency by 25%.
● Applied statistical analysis and machine learning techniques using SciPy and Statsmodels to uncover key trends, aiding in decision-making for patient care and resource planning.
● Enhanced database indexing and optimization strategies in SQL, reducing query execution times for high-volume datasets by 50%.
● Created Python scripts for data reconciliation tasks, ensuring alignment between production and backup databases and reducing errors by 20%.
● Visualized cleansed and processed data through interactive Power BI dashboards, improving stakeholder understanding and supporting data-driven decisions.
UBER, MO
Software Developer / Data Analyst Intern Jan 2024 – May 2024
● Assisted in developing and optimizing ETL workflows using PySpark and Snowflake, processing 100,000+ records to support ride and vehicle analytics dashboards.
● Built Django-based applications for backend data ingestion and real-time processing of ride metrics, improving analysis efficiency by 25%.
● Collaborated with the team to implement Apache Kafka for streaming real-time ride and vehicle data into MongoDB, ensuring seamless and fault-tolerant data pipelines.
● Performed data preprocessing with Python libraries (Pandas, NumPy), applying imputation, scaling, and feature engineering to enhance data accuracy by 90%.
● Supported the integration of Django with big data tools like PySpark and Snowflake to extract insights on fleet optimization, contributing to 12% cost savings.
● Assisted in demand forecasting using Python-based time-series analysis, improving peak-hour fleet availability by 15%.
● Participated in creating 3 Power BI dashboards, sourcing data from Snowflake and PySpark pipelines to provide actionable insights for decision-making.
● Contributed to building real-time analytics pipelines using Kafka and PySpark, reducing latency and manual data updates by 40%.
ADANI, INDIA
Data Analyst Feb 2021 – Jun 2022
● Conducted data mining and analysis on large-scale energy datasets using SQL, improving data accuracy and optimizing resource allocation by 15%.
● Developed and optimized SQL queries for data extraction, validation, and transformation, ensuring adherence to data governance standards and reducing redundancies by 20%.
● Built and automated data pipelines to streamline reporting processes, improving data availability and operational efficiency.
● Performed anomaly detection using SQL-based analysis, identifying equipment inefficiencies and improving operational uptime by 20%.
● Designed and implemented data quality checks to ensure clean and reliable datasets for energy performance reporting.
● Leveraged SQL insights to support the development of energy demand forecasting models, improving accuracy and enabling data-driven decision-making.
● Automated the generation of performance dashboards and reports, reducing manual effort and improving real-time monitoring capabilities.
CIPLA, INDIA
Data Analyst Intern Aug 2020 – Jan 2021
● Cleaned and analyzed pharmaceutical sales data using Python (Pandas, NumPy), improving data accuracy and enabling 10% more reliable demand forecasting.
● Developed automated ETL workflows using Azure Data Factory, streamlining data integration processes and reducing manual intervention by 25%.
● Applied data mining and cleansing techniques to preprocess large datasets, ensuring high-quality data for advanced analytics and reporting.
● Created 2 interactive Power BI dashboards to track inventory trends and supply chain metrics, enhancing planning efficiency for stakeholders.
● Designed Python-based analytical reports, reducing turnaround time for supply chain analysis by 4 hours/week.
● Proposed a MongoDB-based design for data storage and processing, enabling scalable solutions for clinical data management.
● Performed time-series analysis to identify demand trends, optimizing inventory management with 8% fewer stockouts.
EDUCATION
UNIVERSITY OF CENTRAL MISSOURI Missouri, USA
Master of Science in Computer Science 2022 – 2024
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY India
Bachelor of Technology in Electronics and Communication Engineering 2017 – 2021
CERTIFICATIONS
● Internship on Internet of Things (IoT) / Adobe Red Hat
● Certification in Python Programming / Michigan State University, Coursera
● Certification in SQL Programming / W3Schools
● IEEE certification; also hosted the IEEE conference (2018-2019)
● Certification in NASA Space Apps Challenge 2018