Raviteja Nemali
Email: ****************@*****.*** Mobile: 659-***-**** Location: Alabama, United States Visa: GC-EAD
PROFESSIONAL SUMMARY
Over 5 years of experience as a Data Engineer, specializing in ETL transformations and data pipeline orchestration for analytics.
Proficient in implementing Change Data Capture (CDC) using Debezium and other tools to ensure accurate data hydration for data lakes.
Extensive experience with Apache Spark, including DataFrames, Spark SQL, and Spark Streaming for both batch and streaming data processing.
Skilled in developing and managing ETL jobs with Apache Airflow, ensuring efficient data workflows and transformations.
Strong knowledge of AWS services, including S3, EMR, Glue Data Catalog, and Lambda functions for scalable data solutions.
Proven ability to performance-tune big data applications, improving processing efficiency and reducing latency.
Experience in utilizing Python and PySpark for data manipulation and analysis, contributing to robust data engineering solutions.
Committed to continuous learning and professional development, holding multiple relevant certifications in data engineering and cloud architecture.
SKILLS
Programming Languages: Java (advanced), Python (intermediate; PySpark), Scala (basic)
Operating Systems: Windows, Linux
Cloud Platforms: AWS (S3, EMR, Glue Data Catalog, Lambda, Step Functions, MWAA), Azure
Workflow Orchestration & Batch: Apache Airflow, AWS Batch
Development Tools: Apache Spark (DataFrames, Spark SQL, Spark Streaming), Apache Hudi (basic knowledge), Apache Griffin (basic knowledge)
Reporting Tools: Power BI, Tableau, Excel
Frameworks & Libraries: Apache Kafka, Apache Flink
Databases & Data Warehousing: SQL Server, MySQL, PostgreSQL
Big Data & Streaming: Change Data Capture (CDC), ETL Pipelines, Data Lakes
Testing & QA: Data Validation, Performance Testing
Security & Compliance: Data Governance, Compliance Standards
Monitoring & Observability: AWS CloudWatch, Apache Spark Monitoring
Collaboration Tools: JIRA, Confluence
Documentation Tools: Microsoft Office Suite, Markdown
CERTIFICATIONS
Azure Data Engineer Associate (DP-203)
Databricks Certified Data Engineer Associate
AWS Certified Solutions Architect – Associate
SnowPro Specialty Certifications
EDUCATION
Master's in Information Systems, Faulkner University — GPA: 4.0
Bachelor's in Computer Science, Bharath University of Technology — 85%
WORK EXPERIENCE
Humana – Louisville, KY
Senior Data Engineer – Jun 2024 to Present
Spearheaded the implementation of Change Data Capture (CDC) using Debezium, enhancing data ingestion processes and improving data lake hydration efficiency by 40%.
Engineered robust ETL pipelines utilizing Apache Spark, optimizing batch processing and streaming data transformations, resulting in a 30% reduction in processing time.
Automated data orchestration workflows with Apache Airflow, leading to a 25% increase in operational efficiency and minimizing manual intervention in data processing tasks.
Collaborated with cross-functional teams to design scalable data solutions on AWS, leveraging S3 and EMR for high-performance data storage and processing.
Delivered comprehensive data analytics solutions by implementing Spark SQL, enabling real-time insights and improving decision-making processes across departments.
Performance-tuned Spark DataFrame workloads, achieving a 20% increase in query performance and enhancing overall system responsiveness.
Developed and maintained documentation for data engineering processes, ensuring compliance with best practices and facilitating knowledge transfer within the team.
Mentored junior engineers on best practices in data engineering, fostering a culture of continuous learning and innovation within the team.
Conducted regular code reviews and performance assessments, ensuring adherence to coding standards and improving code quality by 15%.
Implemented AWS Lambda functions for automated data processing tasks, reducing operational costs by 10% and enhancing system scalability.
Technologies Used: Java, Python, Apache Spark, Apache Airflow, AWS S3, AWS EMR, Spark SQL, Spark Streaming, AWS Glue, AWS Lambda
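Illustrative sketch of the CDC ingestion described above: routing Debezium change events (the standard op/before/after envelope) into upserts and deletes for data-lake hydration. This is a minimal, dependency-free example; the member-table payload is hypothetical, and in the actual pipelines this logic would run inside Spark against events consumed from Kafka.

```python
import json

def route_change_event(raw: str) -> tuple[str, dict]:
    """Classify a Debezium change event for data-lake hydration.

    Returns ("upsert", row) for creates, updates, and snapshot reads,
    and ("delete", key_image) for deletes.
    """
    envelope = json.loads(raw)
    op = envelope["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    if op in ("c", "u", "r"):
        return "upsert", envelope["after"]
    if op == "d":
        return "delete", envelope["before"]
    raise ValueError(f"unknown op: {op}")

# Example: an update event for a hypothetical members table
event = json.dumps({"op": "u",
                    "before": {"id": 7, "plan": "basic"},
                    "after":  {"id": 7, "plan": "gold"}})
action, row = route_change_event(event)
# action == "upsert", row == {"id": 7, "plan": "gold"}
```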
Synchrony Financial – Stamford, CT
Big Data Engineer – Jan 2023 to Apr 2024
Delivered high-impact data solutions by designing and implementing ETL pipelines using Apache Spark, improving data processing speed by 35%.
Automated data ingestion processes through CDC frameworks, significantly enhancing data accuracy and reducing latency in data availability for analytics.
Collaborated with data scientists to develop machine learning models, integrating them into data pipelines for predictive analytics, which improved customer targeting by 20%.
Optimized data storage solutions on AWS, utilizing S3 and EMR Serverless to achieve cost-effective data management and scalability for big data applications.
Engineered data transformation processes using Spark DataFrames, resulting in a 30% increase in data processing efficiency and reliability.
Implemented monitoring and logging solutions for data pipelines, ensuring high availability and performance, and reducing downtime incidents by 15%.
Participated in Agile ceremonies, contributing to sprint planning and retrospectives, which improved team collaboration and project delivery timelines.
Conducted training sessions on Apache Spark and AWS services for team members, enhancing overall team skill sets and project capabilities.
Developed data quality checks using Deequ (AWS Labs), ensuring data integrity and compliance with business requirements.
Led initiatives to migrate legacy data systems to cloud-based solutions, achieving a 25% reduction in operational costs and improving system performance.
Technologies Used: Java, Python, Apache Spark, AWS S3, AWS EMR, Apache Airflow, Spark SQL, AWS Glue, Deequ, AWS Batch
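Deequ itself expresses constraints over Spark DataFrames; as a dependency-free illustration of the completeness and uniqueness checks described above (column names hypothetical, not the Deequ API):

```python
def check_completeness(rows, column):
    """Fraction of rows with a non-null value in `column` (Deequ-style completeness)."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def check_uniqueness(rows, column):
    """True when every non-null value in `column` appears exactly once."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))

# Hypothetical account records with one missing balance
accounts = [{"account_id": "A1", "balance": 120.0},
            {"account_id": "A2", "balance": None},
            {"account_id": "A3", "balance": 55.5}]
assert check_completeness(accounts, "account_id") == 1.0  # no missing IDs
assert check_completeness(accounts, "balance") == 2 / 3   # one null balance
assert check_uniqueness(accounts, "account_id")           # IDs are distinct
```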
Macy’s Technology – San Francisco, CA
Data Engineer – Mar 2020 to Dec 2022
Optimized ETL processes using Apache Spark, improving data transformation efficiency by 30% and enabling faster access to analytics for business stakeholders.
Developed and maintained data pipelines for real-time data processing, leveraging Spark Streaming to support dynamic reporting and analytics needs.
Collaborated with data architects to design scalable data models, enhancing data accessibility and usability across various business units.
Implemented data quality frameworks to ensure the accuracy and reliability of data used in analytics, resulting in a 15% increase in data trustworthiness.
Automated data workflows using Apache Airflow, reducing manual intervention and improving operational efficiency by 20%.
Conducted performance tuning for Spark applications, achieving a 25% reduction in resource consumption and enhancing overall system performance.
Worked within cross-functional teams to gather requirements and deliver data solutions aligned with business objectives, improving stakeholder satisfaction by 30%.
Created comprehensive documentation for data engineering processes, facilitating onboarding and knowledge sharing within the team.
Mentored junior data engineers, providing guidance on best practices in data engineering and fostering a collaborative team environment.
Participated in code reviews and contributed to the development of coding standards, improving code quality and maintainability across projects.
Technologies Used: Java, Python, Apache Spark, Spark SQL, Spark Streaming, Apache Airflow, AWS S3, AWS Glue, AWS EMR, AWS Lambda
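The real-time pipelines above relied on windowed aggregation in Spark Streaming; the underlying tumbling-window logic can be sketched without Spark (event schema hypothetical, standing in for groupBy(window(...), key).count()):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Assign (timestamp, key) events to fixed, non-overlapping windows
    and count occurrences per (window_start, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical page-view events as (epoch_seconds, page) pairs
clicks = [(0, "home"), (4, "home"), (12, "cart"), (13, "home")]
result = tumbling_window_counts(clicks, window_seconds=10)
# {(0, "home"): 2, (10, "cart"): 1, (10, "home"): 1}
```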