SATEESH REDDY
Data Engineer
Atlanta, GA | +1-210-***-**** | ***********@*****.*** | LinkedIn
SUMMARY
Senior Data Engineer with 8+ years of experience in data pipeline development, data lake/warehouse design, and big data processing across Retail, Healthcare, and Airlines.
Expertise in ETL/ELT pipelines, data ingestion, transformation, and orchestration using SQL, Python, Spark, Databricks, Airflow, Azure Data Factory, and BigQuery.
Skilled in cloud platforms: Azure (ADF, Databricks, Synapse, ADLS), GCP (BigQuery, Dataflow, Composer), and AWS (Redshift, EMR, S3, Lambda).
Strong background in data modeling (star, snowflake), schema design, and performance tuning for analytics, BI, and ML workloads.
Hands-on with real-time data streaming using Kafka, Event Hubs, and Flume, enabling low-latency processing and insights.
Implemented CI/CD pipelines (Jenkins, GitHub, Docker, Terraform, CloudFormation), improving deployment speed and reliability.
Proven ability to reduce ETL runtime by 50%, optimize query performance by 40%, and migrate 100+ TB of enterprise data to cloud platforms.
Experienced in data governance, data quality frameworks, metadata management, and compliance reporting for enterprise systems.
Agile/Scrum practitioner with strong collaboration, stakeholder management, and leadership skills in cross-functional teams.
Holds AWS Certified Solutions Architect – Associate and Microsoft Certified: Azure Fundamentals (AZ-900) certifications.
TECHNICAL SKILLS
Programming & Data Tools: SQL, Python, PySpark, Java, PL/SQL, Pandas, NumPy, Scala, Shell Scripting, R
Cloud Platforms: Azure (ADF, Databricks, Synapse, ADLS, HDInsight), GCP (BigQuery, Dataflow, Composer, Cloud Storage, Pub/Sub), AWS (Redshift, EMR, S3, Glue, Lambda, Athena, Kinesis)
Data Engineering: ETL/ELT Pipelines, Data Warehousing, Data Modeling (Star/Snowflake), Data Lakehouse, Data Migration, Real-Time Streaming (Kafka, Event Hubs, Pub/Sub, Kinesis), Batch Processing, API Integration
Big Data Frameworks: Hadoop, Hive, Sqoop, MapReduce, Spark (Structured Streaming, MLlib, GraphX), Flink, Storm
DevOps & Automation: Jenkins, GitHub/GitLab, Docker, Kubernetes, Terraform, CloudFormation, Airflow, CI/CD, Monitoring (Prometheus, Grafana, Datadog), Apache NiFi
Visualization & BI: Power BI, Tableau, Excel, Looker, QlikView, Google Data Studio, Reporting Solutions
Databases: Oracle, MySQL, PostgreSQL, MongoDB, Cassandra, DynamoDB, Snowflake, dbt
Other Tools & Practices: Data Governance, Metadata Management, Data Quality (Great Expectations, Deequ), Logging/Monitoring, Security/Compliance (HIPAA, GDPR, SOC2), API/REST integration
Soft Skills: Agile/Scrum, Cross-Functional Collaboration, Stakeholder Communication, Technical Documentation, Leadership
PROFESSIONAL EXPERIENCE
Walmart, Senior Data Engineer Oct 2024 – Present
Leading data engineering initiatives on GCP for large-scale retail analytics and reporting.
Designed and deployed scalable GCP pipelines (Cloud Storage, Dataflow, BigQuery) to process terabytes of retail data, enabling faster analytics and reporting.
Built a data lake architecture with partitioned datasets, secure access policies, and lifecycle management, reducing storage cost by 20%.
Developed Airflow DAGs (Cloud Composer) for workflow orchestration, ensuring 99% pipeline reliability and improved monitoring visibility.
Optimized BigQuery SQL queries and partition strategies, reducing reporting latency by 40% and improving stakeholder decision-making (representative Composer DAG and partitioned load sketched below).
Implemented CI/CD pipelines (Jenkins + GitHub), automating deployments and cutting manual effort by 50%.
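Representative Cloud Composer DAG for the daily BigQuery loads described above; the project, dataset, table, and schedule values are illustrative placeholders rather than actual Walmart resources.
```python
# Illustrative Composer (Airflow) DAG: orchestrates a daily aggregation of raw
# retail sales into a date-partitioned BigQuery table. All resource names and
# the query are placeholders for the pattern described above.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="retail_daily_sales_load",
    schedule_interval="0 6 * * *",        # run daily after upstream files land
    start_date=datetime(2024, 10, 1),
    catchup=False,
    default_args=default_args,
) as dag:

    load_daily_sales = BigQueryInsertJobOperator(
        task_id="load_daily_sales",
        configuration={
            "query": {
                "query": """
                    SELECT store_id, sku, SUM(sale_amount) AS sale_amount
                    FROM `example-project.raw.pos_transactions`
                    WHERE sale_date = '{{ ds }}'
                    GROUP BY store_id, sku
                """,
                "destinationTable": {
                    "projectId": "example-project",
                    "datasetId": "analytics",
                    "tableId": "daily_sales${{ ds_nodash }}",  # write into the day's partition
                },
                "writeDisposition": "WRITE_TRUNCATE",
                "useLegacySql": False,
            }
        },
    )
```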
PepsiCo, Senior Azure Data Engineer Feb 2020 – Sep 2024
Owned enterprise-scale Azure data engineering solutions for reporting, analytics, and migration programs.
Migrated 100+ TB of enterprise data from Teradata and Oracle into Azure Synapse/SQL DB, leveraging Sqoop, ADF, and Infoworks.
Designed end-to-end ETL/ELT pipelines in Databricks (PySpark, Spark SQL), standardizing ingestion, cleansing, and transformation processes (representative PySpark pattern sketched below).
Enhanced data processing speed by 25% through pipeline optimization and parallelization techniques in Spark.
Built star and snowflake data models, improving BI performance in Power BI and Tableau dashboards by 30%.
Implemented ADF pipelines with monitoring and error handling, reducing downtime and enabling seamless global reporting.
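Representative PySpark ingestion-and-cleansing pattern of the kind used in these Databricks pipelines; the storage path, schema, and table names are illustrative placeholders.
```python
# Illustrative Databricks (PySpark) ELT step: land raw CSV files from ADLS,
# standardize types, drop duplicates, and publish a curated Delta table.
# Paths and column names are placeholders for the pattern described above.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("sales_ingest").getOrCreate()

# Ingest raw files from the data lake (container/path are placeholders)
raw = (
    spark.read
    .option("header", True)
    .csv("abfss://raw@examplelake.dfs.core.windows.net/sales/")
)

# Cleanse and standardize
clean = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
       .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
       .filter(F.col("amount").isNotNull())
)

# Publish a curated Delta table partitioned by order date for BI consumption
(
    clean.write
    .format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("curated.sales")
)
```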
United Airlines, Azure Data Engineer Aug 2018 – Dec 2019
Built ETL and cloud migration pipelines supporting airline operations and customer analytics.
Designed and implemented ETL workflows with ADF, Databricks, and PL/SQL, supporting airline operations, flight analytics, and customer insights.
Migrated legacy Oracle pipelines into Azure Synapse Analytics, improving scalability and reducing maintenance overhead.
Developed data lake ingestion pipelines (Azure Blob + ADLS) to handle structured/unstructured data from multiple sources.
Built PySpark transformation logic in Azure Databricks, streamlining reporting workflows and reducing processing time from 6 hours to 2 hours.
Collaborated with analysts to deliver ad-hoc SQL/Spark queries, enabling real-time insights into flight delays, revenue, and customer satisfaction.
UnitedHealthcare, Hadoop Developer Feb 2017 – Sep 2018
Supported healthcare analytics systems using Hadoop ecosystem tools.
Developed Java MapReduce jobs for large-scale healthcare data preprocessing and cleaning within the Hadoop ecosystem.
Implemented Flume & Sqoop pipelines for ingesting multi-source data into HDFS and exporting into Oracle/Sybase systems.
Built Hive partitioned tables and custom UDFs, reducing query execution time and improving efficiency for analysts (representative partitioned-table pattern sketched below).
Automated workflows with TWS scheduler, reducing manual intervention by 40% and improving reliability of data pipelines.
Partnered with compliance teams to ensure HIPAA data governance and secure data processing.
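Representative partitioned Hive table pattern of the kind built here, expressed through Spark SQL against the Hive metastore for consistency with the other sketches; database, table, column, and partition names are illustrative placeholders.
```python
# Illustrative partitioned Hive table: analysts' queries prune by service month,
# which is the source of the query-time reductions described above.
# All names are placeholders, not actual UnitedHealthcare objects.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("claims_hive_tables")
    .enableHiveSupport()
    .getOrCreate()
)

# Claims table partitioned by service month
spark.sql("""
    CREATE TABLE IF NOT EXISTS healthcare.claims (
        claim_id      STRING,
        member_id     STRING,
        provider_id   STRING,
        claim_amount  DECIMAL(18,2)
    )
    PARTITIONED BY (service_month STRING)
    STORED AS ORC
""")

# Load one month's cleansed data into its own partition
spark.sql("""
    INSERT OVERWRITE TABLE healthcare.claims
    PARTITION (service_month = '2018-06')
    SELECT claim_id, member_id, provider_id, claim_amount
    FROM healthcare.claims_staging
    WHERE service_month = '2018-06'
""")
```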
Purple Talk, SQL Developer May 2012 – Jul 2015
Contributed to database development and support for enterprise applications.
Designed schemas, optimized SQL queries, and developed stored procedures, triggers, and functions to improve system performance.
Performed database migrations, backups, and recovery operations, ensuring 99.9% uptime for production systems.
Created and maintained ER diagrams, data dictionaries, and technical documentation, supporting consistent database standards.
Supported application teams by providing ad-hoc SQL reporting and query optimization, reducing reporting turnaround by 30%.
Assisted in tuning transaction-heavy queries, improving application response time for end-users.
EDUCATION
Master of Science in Computer Science, University of the Cumberlands, USA
Bachelor of Technology in Computer Science, JNTU, India
CERTIFICATIONS
AWS Certified Solutions Architect – Associate
Microsoft Certified: Azure Fundamentals (AZ-900)
KEY ACHIEVEMENTS
Retail Analytics (Walmart): Increased pipeline efficiency by 30% through GCP workflow optimization.
ETL Runtime Reduction (PepsiCo): Cut processing time from 8 hrs to 4 hrs using Spark optimizations.
Data Migration (PepsiCo): Migrated 100+ TB of historical data to Azure Synapse, enabling unified reporting.
Automation (Walmart & PepsiCo): Reduced manual deployment effort by 50% with CI/CD.
Healthcare Analytics (UHC): Streamlined patient data ingestion, improving compliance reporting accuracy.