Raviteja Nemali
Email: ****************@*****.*** Mobile: 659-***-**** Location: Alabama, United States Visa: GC-EAD
PROFESSIONAL SUMMARY
Over 5 years of experience as a Data Engineer, specializing in ETL transformations and data pipeline orchestration for analytics.
Proficient in implementing Change Data Capture (CDC) using Debezium and other tools to ensure accurate data hydration for data lakes.
Extensive experience with Apache Spark, including DataFrames, Spark SQL, and Spark Streaming for both batch and streaming data processing.
Skilled in developing and managing ETL jobs with Apache Airflow, ensuring efficient data workflows and transformations.
Strong knowledge of AWS services, including S3, EMR, Glue Data Catalog, and Lambda functions for scalable data solutions.
Proven ability to performance-tune big data applications, improving processing efficiency and reducing latency.
Experience in utilizing Python and PySpark for data manipulation and analysis, contributing to robust data engineering solutions.
Committed to continuous learning and professional development, holding multiple relevant certifications in data engineering and cloud architecture.
SKILLS
Programming Languages: Java (advanced), Python (intermediate; PySpark), Scala (basic)
Operating Systems: Windows, Linux
Cloud Platforms: AWS (S3, EMR, Glue Data Catalog, Lambda, Step Functions, MWAA), Azure
Workflow Orchestration & Batch: Apache Airflow, AWS Batch
Development Tools: Apache Spark (DataFrames, Spark SQL, Spark Streaming), Apache Hudi (basic knowledge), Apache Griffin (basic knowledge)
Reporting Tools: Power BI, Tableau, Excel
Frameworks & Libraries: Apache Kafka, Apache Flink
Databases & Data Warehousing: SQL Server, MySQL, PostgreSQL
Big Data & Streaming: Change Data Capture (CDC), ETL Pipelines, Data Lakes
Testing & QA: Data Validation, Performance Testing
Security & Compliance: Data Governance, Compliance Standards
Monitoring & Observability: AWS CloudWatch, Apache Spark Monitoring
Collaboration Tools: JIRA, Confluence
Documentation Tools: Microsoft Office Suite, Markdown
CERTIFICATIONS
Azure Data Engineer Associate (DP-203)
Databricks Certified Data Engineer Associate
AWS Certified Solutions Architect – Associate
SnowPro Specialty Certifications
EDUCATION
Master's in Information Systems, Faulkner University — GPA: 4.0
Bachelor's in Computer Science, Bharath University of Technology — 85%
WORK EXPERIENCE
Humana – Louisville, KY
Senior Data Engineer – Jun 2024 to Present
Spearheaded the implementation of Change Data Capture (CDC) using Debezium, enhancing data ingestion processes and improving data lake hydration efficiency by 40%.
Engineered robust ETL pipelines utilizing Apache Spark, optimizing batch processing and streaming data transformations, resulting in a 30% reduction in processing time.
Automated data orchestration workflows with Apache Airflow, leading to a 25% increase in operational efficiency and minimizing manual intervention in data processing tasks.
Collaborated with cross-functional teams to design scalable data solutions on AWS, leveraging S3 and EMR for high-performance data storage and processing.
Delivered comprehensive data analytics solutions by implementing Spark SQL, enabling real-time insights and improving decision-making processes across departments.
Performance-tuned Spark DataFrame workloads, achieving a 20% increase in query performance and enhancing overall system responsiveness.
Developed and maintained documentation for data engineering processes, ensuring compliance with best practices and facilitating knowledge transfer within the team.
Mentored junior engineers on best practices in data engineering, fostering a culture of continuous learning and innovation within the team.
Conducted regular code reviews and performance assessments, ensuring adherence to coding standards and improving code quality by 15%.
Implemented AWS Lambda functions for automated data processing tasks, reducing operational costs by 10% and enhancing system scalability.
Technologies Used: Java, Python, Apache Spark, Apache Airflow, AWS S3, AWS EMR, Spark SQL, Spark Streaming, AWS Glue, AWS Lambda
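Illustrative sketch of the CDC ingestion described above: routing Debezium change events (the standard op/before/after envelope) into upserts and deletes for data-lake hydration. This is a minimal, dependency-free example; the member-table payload is hypothetical, and in the actual pipelines this logic would run inside Spark against events consumed from Kafka.

```python
import json

def route_change_event(raw: str) -> tuple[str, dict]:
    """Classify a Debezium change event for data-lake hydration.

    Returns ("upsert", row) for creates, updates, and snapshot reads,
    and ("delete", key_image) for deletes.
    """
    envelope = json.loads(raw)
    op = envelope["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    if op in ("c", "u", "r"):
        return "upsert", envelope["after"]
    if op == "d":
        return "delete", envelope["before"]
    raise ValueError(f"unknown op: {op}")

# Example: an update event for a hypothetical members table
event = json.dumps({"op": "u",
                    "before": {"id": 7, "plan": "basic"},
                    "after":  {"id": 7, "plan": "gold"}})
action, row = route_change_event(event)
# action == "upsert", row == {"id": 7, "plan": "gold"}
```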
Synchrony Financial – Stamford, CT
Big Data Engineer – Jan 2023 to Apr 2024
Delivered high-impact data solutions by designing and implementing ETL pipelines using Apache Spark, improving data processing speed by 35%.
Automated data ingestion processes through CDC frameworks, significantly enhancing data accuracy and reducing latency in data availability for analytics.
Collaborated with data scientists to develop machine learning models, integrating them into data pipelines for predictive analytics, which improved customer targeting by 20%.
Optimized data storage solutions on AWS, utilizing S3 and EMR Serverless to achieve cost-effective data management and scalability for big data applications.
Engineered data transformation processes using Spark DataFrames, resulting in a 30% increase in data processing efficiency and reliability.
Implemented monitoring and logging solutions for data pipelines, ensuring high availability and performance, and reducing downtime incidents by 15%.
Participated in Agile ceremonies, contributing to sprint planning and retrospectives, which improved team collaboration and project delivery timelines.
Conducted training sessions on Apache Spark and AWS services for team members, enhancing overall team skill sets and project capabilities.
Developed data quality checks using Deequ (AWS Labs), ensuring data integrity and compliance with business requirements.
Led initiatives to migrate legacy data systems to cloud-based solutions, achieving a 25% reduction in operational costs and improving system performance.
Technologies Used: Java, Python, Apache Spark, AWS S3, AWS EMR, Apache Airflow, Spark SQL, AWS Glue, Deequ, AWS Batch
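Deequ itself expresses constraints over Spark DataFrames; as a dependency-free illustration of the completeness and uniqueness checks described above (column names hypothetical, not the Deequ API):

```python
def check_completeness(rows, column):
    """Fraction of rows with a non-null value in `column` (Deequ-style completeness)."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if r.get(column) is not None) / len(rows)

def check_uniqueness(rows, column):
    """True when every non-null value in `column` appears exactly once."""
    values = [r[column] for r in rows if r.get(column) is not None]
    return len(values) == len(set(values))

# Hypothetical account records with one missing balance
accounts = [{"account_id": "A1", "balance": 120.0},
            {"account_id": "A2", "balance": None},
            {"account_id": "A3", "balance": 55.5}]
assert check_completeness(accounts, "account_id") == 1.0  # no missing IDs
assert check_completeness(accounts, "balance") == 2 / 3   # one null balance
assert check_uniqueness(accounts, "account_id")           # IDs are distinct
```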
Macy’s Technology – San Francisco, CA
Data Engineer – Mar 2020 to Dec 2022
Optimized ETL processes using Apache Spark, improving data transformation efficiency by 30% and enabling faster access to analytics for business stakeholders.
Developed and maintained data pipelines for real-time data processing, leveraging Spark Streaming to support dynamic reporting and analytics needs.
Collaborated with data architects to design scalable data models, enhancing data accessibility and usability across various business units.
Implemented data quality frameworks to ensure the accuracy and reliability of data used in analytics, resulting in a 15% increase in data trustworthiness.
Automated data workflows using Apache Airflow, reducing manual intervention and improving operational efficiency by 20%.
Conducted performance tuning for Spark applications, achieving a 25% reduction in resource consumption and enhancing overall system performance.
Worked within cross-functional teams to gather requirements and deliver data solutions aligned with business objectives, improving stakeholder satisfaction by 30%.
Created comprehensive documentation for data engineering processes, facilitating onboarding and knowledge sharing within the team.
Mentored junior data engineers, providing guidance on best practices in data engineering and fostering a collaborative team environment.
Participated in code reviews and contributed to the development of coding standards, improving code quality and maintainability across projects.
Technologies Used: Java, Python, Apache Spark, Spark SQL, Spark Streaming, Apache Airflow, AWS S3, AWS Glue, AWS EMR, AWS Lambda
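The real-time pipelines above relied on windowed aggregation in Spark Streaming; the underlying tumbling-window logic can be sketched without Spark (event schema hypothetical, standing in for groupBy(window(...), key).count()):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds):
    """Assign (timestamp, key) events to fixed, non-overlapping windows
    and count occurrences per (window_start, key)."""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)

# Hypothetical page-view events as (epoch_seconds, page) pairs
clicks = [(0, "home"), (4, "home"), (12, "cart"), (13, "home")]
result = tumbling_window_counts(clicks, window_seconds=10)
# {(0, "home"): 2, (10, "cart"): 1, (10, "home"): 1}
```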