Sai Krishna Pudi
Data Analyst
Georgetown, TX (Open to Relocate) +1-302-***-**** ***********@*******.*** LinkedIn
PROFESSIONAL SUMMARY
Data Engineer & Data Analyst with 4+ years of experience designing data pipelines, analyzing complex datasets, and building cloud-native solutions across healthcare, tech, and IT services.
Specialized in building scalable ETL pipelines, real-time data ingestion workflows, and data lake architectures using tools like Databricks, Apache Spark, Azure Data Factory, and AWS Glue.
Proficient in writing and optimizing complex SQL and Spark SQL queries for large-scale datasets, improving query performance and enabling real-time decision-making.
Hands-on experience with cloud data platforms including AWS (S3, Redshift, Glue, Athena) and Azure (Data Lake Gen2, Synapse) for distributed processing and secure storage.
Delivered executive-level insights through Tableau and Power BI dashboards, enhancing cross-functional reporting, KPI tracking, and stakeholder engagement.
Applied data modeling techniques (Star and Snowflake schema), data normalization, and data quality validation to support scalable and accurate business intelligence.
Collaborated closely with data scientists, analysts, and engineers to deliver feature-ready datasets, automate reporting, and reduce data prep time by up to 40%.
Strong foundation in data governance, HIPAA/GDPR compliance, and metadata management, ensuring ethical and secure data use across industries.
EXPERIENCE
McKesson Georgetown, TX
Data Engineer Jul 2024 – Present
Built and optimized real-time and batch data pipelines using Databricks, Apache Spark, and Azure Data Factory, enabling analytics on millions of healthcare transactions.
Engineered robust ETL workflows for ingesting structured and semi-structured data (JSON, CSV, Parquet) from APIs and third-party systems, accelerating data ingestion by 40%.
Developed scalable pipelines using SQL and PySpark for data transformation, normalization, and enrichment of clinical and operational datasets.
Designed secure and scalable architecture using Azure Data Lake Gen2, supporting both batch processing and streaming analytics.
Orchestrated event-driven ingestion workflows via Kafka and Azure Event Hubs, powering real-time alerting and reporting.
Implemented data quality validation using Great Expectations, reducing data inconsistencies in production pipelines by 35%.
Modeled Star and Snowflake schemas within Snowflake, optimizing BI query performance and enabling enterprise-level data analysis.
Delivered interactive executive dashboards in Power BI, enabling real-time visibility into inventory, logistics, and patient metrics.
Ensured compliance with HIPAA and GDPR by applying encryption, RBAC, and row-level security across all Azure-hosted datasets.
Collaborated with data science and analytics teams to provide feature-ready datasets, cutting manual data preparation time by 30%.
Capgemini India
Data Analyst Jun 2021 – Mar 2023
Orchestrated end-to-end data workflows using SQL, Spark SQL, and Python (Pandas, NumPy) to extract insights from enterprise-scale datasets and enhance client reporting capabilities.
Designed scalable ETL pipelines with AWS Glue to automate ingestion from legacy systems and third-party APIs, significantly reducing manual intervention.
Architected cloud-based data models in Amazon Redshift using star and snowflake schemas, enabling efficient business intelligence and smoother system performance.
Spearheaded the creation of dynamic dashboards in Tableau, delivering executive-level reporting that accelerated business decision-making.
Optimized complex SQL queries and Spark jobs to reduce processing time and improve system reliability across distributed environments.
Collaborated with business analysts and developers to translate complex business needs into structured data insights.
Conducted advanced behavioral segmentation and cohort analysis to support targeted customer retention strategies and marketing initiatives.
Ensured end-to-end data quality and governance across pipelines using automated validation scripts and schema monitoring to maintain data consistency and trust.
Trigent Software India
Data Analyst Intern May 2020 – Jun 2021
Assisted in extracting and analyzing structured datasets using SQL and Excel, supporting client reporting needs for IT service projects.
Created weekly and monthly data summary reports to track key operational metrics, improving visibility into project performance for delivery managers.
Cleaned and preprocessed raw data using Python (Pandas, NumPy) to support data visualization and stakeholder analysis.
Built simple dashboards and charts in Excel to communicate trends and patterns across project timelines, resourcing, and budget allocation.
Documented data quality issues and worked with engineering teams to validate data pipelines, ensuring consistency across internal reports.
Participated in internal knowledge-sharing sessions on data visualization tools and ETL processes, gaining exposure to tools like Tableau.
TECHNICAL SKILLS
Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL, Shell Scripting
Data Engineering Tools: Apache Spark, Apache Airflow, Kafka, AWS Glue, Azure Data Factory, Databricks
ETL & Data Pipelines: ETL design and orchestration, data ingestion, transformation, and scheduling
Databases & Warehousing: PostgreSQL, MySQL, MongoDB, Oracle, Amazon Redshift, Google BigQuery, Snowflake
Big Data Technologies: Hadoop, Hive, Spark SQL, HDFS
Data Modeling & Optimization: Star/Snowflake Schema, Normalization, Indexing, Query Optimization
Cloud Platforms: AWS (Glue, S3, Redshift, Lambda), Azure (Data Factory, Blob Storage), GCP (BigQuery, Cloud Functions)
BI & Visualization Tools: Power BI, Tableau, Looker, Google Data Studio, Matplotlib, Seaborn
Analytics & Reporting: Exploratory Data Analysis (EDA), KPI dashboards, Statistical Analysis, A/B Testing
Data Governance: Data quality validation, data lineage tracking, metadata management, compliance (GDPR, HIPAA)
Version Control & Collaboration: Git, GitHub, Bitbucket, Jira
Workflow & Scheduling Tools: Apache Airflow, Luigi, Cron
EDUCATION
Wilmington University, Delaware, USA
Master of Science in Computer and Information Systems, Jan 2023 – Apr 2024
TKR College of Engineering & Technology, Hyderabad, India
Bachelor of Technology in Information Technology, May 2016 – Apr 2020