Data Analyst Automation Engineer

Location:

Texas

Posted:

September 10, 2025

Resume:

Sai Krishna Pudi

Data Analyst

Georgetown, TX (Open to Relocate) +1-302-***-**** ***********@*******.*** LinkedIn

PROFESSIONAL SUMMARY

Data Engineer & Data Analyst with 4+ years of experience designing data pipelines, analyzing complex datasets, and building cloud-native solutions across healthcare, tech, and IT services.

Specialized in building scalable ETL pipelines, real-time data ingestion workflows, and data lake architectures using tools like Databricks, Apache Spark, Azure Data Factory, and AWS Glue.

Proficient in writing and optimizing complex SQL and Spark SQL queries for large-scale datasets, improving query performance and enabling real-time decision-making.

Hands-on experience with cloud data platforms including AWS (S3, Redshift, Glue, Athena) and Azure (Data Lake Gen2, Synapse) for distributed processing and secure storage.

Delivered executive-level insights through Tableau and Power BI dashboards, enhancing cross-functional reporting, KPI tracking, and stakeholder engagement.

Applied data modeling techniques (Star and Snowflake schema), data normalization, and data quality validation to support scalable and accurate business intelligence.

Collaborated closely with data scientists, analysts, and engineers to deliver feature-ready datasets, automate reporting, and reduce data prep time by up to 40%.

Strong foundation in data governance, HIPAA/GDPR compliance, and metadata management, ensuring ethical and secure data use across industries.

EXPERIENCE

McKesson Georgetown, TX

Data Engineer Jul 2024 – Present

Built and optimized real-time and batch data pipelines using Databricks, Apache Spark, and Azure Data Factory, enabling analytics on millions of healthcare transactions.

Engineered robust ETL workflows for ingesting structured and semi-structured data (JSON, CSV, Parquet) from APIs and third-party systems, accelerating data ingestion by 40%.

Developed scalable pipelines using SQL and PySpark for data transformation, normalization, and enrichment of clinical and operational datasets.

Designed secure and scalable architecture using Azure Data Lake Gen2, supporting both batch processing and streaming analytics.

Orchestrated event-driven ingestion workflows via Kafka and Azure Event Hubs, powering real-time alerting and reporting.

Implemented data quality validation using Great Expectations, reducing data inconsistencies in production pipelines by 35%.

Modeled Star and Snowflake schemas within Snowflake, optimizing BI query performance and enabling enterprise-level data analysis.

Delivered interactive executive dashboards in Power BI, enabling real-time visibility into inventory, logistics, and patient metrics.

Ensured compliance with HIPAA and GDPR by applying encryption, RBAC, and row-level security across all Azure-hosted datasets.

Collaborated with data science and analytics teams to provide feature-ready datasets, cutting manual data preparation time by 30%.

Capgemini India

Data Analyst Jun 2021 – Mar 2023

Orchestrated end-to-end data workflows using SQL, Spark SQL, and Python (Pandas, NumPy) to extract insights from enterprise-scale datasets and enhance client reporting capabilities.

Designed scalable ETL pipelines with AWS Glue to automate ingestion from legacy systems and third-party APIs, significantly reducing manual intervention.

Architected cloud-based data models in Amazon Redshift using star and snowflake schemas, enabling efficient business intelligence and smoother system performance.

Spearheaded the creation of dynamic dashboards in Tableau, delivering executive-level reporting that accelerated business decision-making.

Optimized complex SQL queries and Spark jobs to reduce processing time and improve system reliability across distributed environments.

Collaborated cross-functionally with business analysts and developers to translate complex business needs into structured data insights.

Conducted advanced behavioral segmentation and cohort analysis to support targeted customer retention strategies and marketing initiatives.

Ensured end-to-end data quality and governance across pipelines using automated validation scripts and schema monitoring to maintain data consistency and trust.

Trigent Software India

Data Analyst Intern May 2020 – Jun 2021

Assisted in extracting and analyzing structured datasets using SQL and Excel, supporting client reporting needs for IT service projects.

Created weekly and monthly data summary reports to track key operational metrics, improving visibility into project performance for delivery managers.

Cleaned and preprocessed raw data using Python (Pandas, NumPy) to support data visualization and stakeholder analysis.

Built simple dashboards and charts in Excel to communicate trends and patterns across project timelines, resourcing, and budget allocation.

Documented data quality issues and worked with engineering teams to validate data pipelines, ensuring consistency across internal reports.

Participated in internal knowledge-sharing sessions on data visualization tools and ETL processes, gaining exposure to tools like Tableau.

TECHNICAL SKILLS

Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL, Shell Scripting

Data Engineering Tools: Apache Spark, Apache Airflow, Kafka, AWS Glue, Azure Data Factory, Databricks

ETL & Data Pipelines: ETL design and orchestration, data ingestion, transformation, and scheduling

Databases & Warehousing: PostgreSQL, MySQL, MongoDB, Oracle, Amazon Redshift, Google BigQuery, Snowflake

Big Data Technologies: Hadoop, Hive, Spark SQL, HDFS

Data Modeling & Optimization: Star/Snowflake Schema, Normalization, Indexing, Query Optimization

Cloud Platforms: AWS (Glue, S3, Redshift, Lambda), Azure (Data Factory, Blob Storage), GCP (BigQuery, Cloud Functions)

BI & Visualization Tools: Power BI, Tableau, Looker, Google Data Studio, Matplotlib, Seaborn

Analytics & Reporting: Exploratory Data Analysis (EDA), KPI dashboards, Statistical Analysis, A/B Testing

Data Governance: Data quality validation, data lineage tracking, metadata management, compliance (GDPR, HIPAA)

Version Control & Collaboration: Git, GitHub, Bitbucket, Jira

Workflow & Scheduling Tools: Apache Airflow, Luigi, Cron

EDUCATION

Wilmington University, Delaware, USA

Master of Science in Computer and Information Systems, Jan 2023 – Apr 2024

TKR College of Engineering & Technology, Hyderabad, India

Bachelor of Technology in Information Technology, May 2016 – Apr 2020
