SAINATH NAINI
DATA ENGINEER
TEXAS ************@*****.*** 469-***-****
linkedin.com/in/sainath-patel-6a8a7a2b2/
PROFESSIONAL SUMMARY
Experienced and results-oriented AWS Data Engineer with 6 years of progressive experience in building, automating, and optimizing scalable data pipelines in cloud-native and hybrid environments. Specialized in AWS big data services including Glue, Redshift, EMR, S3, Lambda, and Athena, with strong expertise in data lake architecture, streaming data, ETL development, and data warehouse solutions. Adept at working in Agile teams, collaborating with cross-functional stakeholders, and delivering robust data solutions to support analytics, machine learning, and real-time reporting needs.
TECHNICAL SKILLS
Cloud Platforms: AWS (Glue, Redshift, S3, EMR, Athena, Lambda, Kinesis, CloudFormation, CloudWatch), Azure (ADF)
Data Engineering: Apache Spark, AWS Glue, Apache Airflow, Kafka, Hive, DBT, Step Functions, Hudi, Delta Lake
Programming Languages: Python, SQL, PySpark, Shell Scripting
Databases: Redshift, PostgreSQL, MySQL, DynamoDB, MongoDB, Aurora, Oracle
Big Data & Streaming: Hadoop, Spark, Kafka, Flink (basic), AWS Kinesis, Storm (basic)
Data Modeling: Star/Snowflake Schema, Normalization, Dimensional Modeling, Data Vault
DevOps & Tools: Git, Jenkins, Docker, Terraform, CloudWatch, GitHub Actions, Jira, Confluence
EDUCATION
Master's in Advanced Data Analytics May 2024
University of North Texas, USA
PROFESSIONAL EXPERIENCE
AWS Data Engineer Jul 2024 – Present
MASTERCARD USA
Architected and maintained scalable ETL pipelines using AWS Glue and PySpark, processing over 5 TB of data daily from multiple structured and semi-structured data sources including APIs, databases, and flat files.
Designed and implemented a fully automated data lake solution on AWS S3, integrating data from 25+ internal and external sources for a customer 360 analytics platform.
Developed and deployed real-time data streaming pipelines using AWS Kinesis Data Streams, Kinesis Firehose, and Lambda, enabling near-instant reporting for fraud detection use cases.
Built and scheduled Apache Airflow DAGs to orchestrate and monitor complex multi-stage ETL workflows with conditional branching and retries.
Optimized Redshift performance by partitioning large fact tables, implementing workload management (WLM) queues, and tuning sort/dist keys, improving query performance by over 65%.
Automated data quality checks and validation logic using Great Expectations and custom Python scripts to ensure high accuracy and consistency of ingested data.
Worked closely with DevOps teams to manage infrastructure as code using Terraform and set up monitoring/logging through CloudWatch and SNS for ETL failure alerts.
Mentored 3 junior data engineers and conducted internal workshops on PySpark, AWS Glue, and best practices in cloud data engineering.
Partnered with data science and analytics teams to expose clean, modeled datasets through Redshift for use in predictive modeling and business dashboards.
Tools & Tech: AWS Glue, Redshift, S3, Athena, EMR, Kinesis, Lambda, Airflow, Python, PySpark
Data Engineer Jun 2020 – Dec 2022
CGI INDIA
Built robust data pipelines using Apache Spark on EMR to ingest, transform, and store massive datasets (~3 TB/day) from operational systems and third-party APIs.
Designed a streaming ingestion framework using Kafka and AWS Glue for processing IoT data from connected devices in real time.
Maintained a multi-zone S3 data lake, establishing clear standards for raw, processed, and curated data layers following lakehouse architecture principles.
Developed modular and reusable Python and SQL scripts for ETL transformations, error handling, and logging using AWS Glue Jobs and PySpark scripts.
Performed data partitioning, bucketing, and compression (Parquet, ORC) to optimize query performance and reduce storage costs by 30%.
Wrote HiveQL scripts for ad hoc analytics on EMR, working with analysts to support urgent reporting needs.
Created parameterized Airflow DAGs to support dynamic pipeline configurations and implemented SLA monitoring to track job performance.
Implemented row-level and column-level access controls on Redshift and S3 using AWS Lake Formation and IAM policies to ensure secure data access.
Tools & Tech: AWS EMR, S3, Glue, Kafka, Hive, HDFS, Python, SQL, Airflow
Jr. Data Engineer Feb 2019 – May 2020
TRANSOL SYSTEMS INDIA
Assisted in the design, development, and maintenance of ETL pipelines to support the ingestion and transformation of operational data from multiple systems into a centralized data warehouse.
Developed and optimized complex SQL queries and stored procedures to clean, join, and aggregate large datasets for reporting and dashboarding.
Participated in the migration of legacy Excel-based reporting workflows to automated ETL pipelines using Python scripts and SQL, improving efficiency and reducing manual errors.
Worked with senior data engineers to move data from on-premises systems to AWS S3 for initial cloud storage and analysis, learning the fundamentals of cloud-based data engineering.
Built and maintained scheduled data refresh jobs using cron and shell scripts to load and validate data into MySQL and PostgreSQL environments.
Conducted data profiling and validation to identify anomalies, missing data, and inconsistencies; implemented initial data quality rules using Python and SQL checks.
Created and updated technical documentation, including data dictionaries, workflow diagrams, and standard operating procedures for recurring ETL tasks.
Developed and maintained business dashboards using Tableau, transforming raw data into actionable visual insights for sales, finance, and operations teams.
Provided ad hoc data extracts and reporting support to business users using SQL, Excel, and Tableau.
Collaborated with QA teams to define test cases for data validation and contributed to UAT testing of newly deployed pipelines.
Participated in Agile development processes, including daily standups, sprint planning, and retrospectives to track tasks and enhance team collaboration.
Shadowed senior engineers on performance tuning initiatives, learning about indexing, partitioning, and query optimization techniques.
Supported the deployment of basic data masking techniques to protect sensitive customer and financial data in non-production environments.
Tools & Tech: SQL, MySQL, Python, Tableau, AWS S3, Shell Scripting, ETL Tools, Excel