NARENDRA BABU G
*******@*****.*** | 469-***-****
PROFESSIONAL SUMMARY
• 5+ years of experience designing and implementing scalable ETL/ELT pipelines in cloud and on-prem environments.
• Hands-on expertise in setting up Change Data Capture (CDC) processes using Debezium and AWS-native tools.
• Skilled in building Spark-based ETL pipelines for both streaming and batch data processing.
• Strong experience with Apache Airflow for orchestrating and automating data workflows (see the DAG sketch following this summary).
• Proficient in Python (PySpark) and Java for large-scale data transformation and integration.
• Experienced in data lake hydration, schema evolution, and data modeling for analytics.
• Deep understanding of AWS services including S3, Glue, EMR, Lambda, and Step Functions.
• Adept at optimizing Spark DataFrame performance, partitioning, and caching strategies.
• Knowledgeable in data quality frameworks and validation processes for CDC-driven pipelines.
• Skilled in debugging data ingestion issues and implementing fault-tolerant recovery mechanisms.
• Collaborated with cross-functional teams to design data pipelines for analytics and BI systems.
• Committed to building efficient, maintainable, and performance-driven data solutions.
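A minimal sketch of the Airflow orchestration pattern referenced in the summary above. This is an illustration only, not a pipeline from either role below: the DAG id, task names, and job scripts are hypothetical, and a recent Airflow 2.x API is assumed.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "owner": "data-engineering",
    "retries": 2,                        # retry transient failures
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="claims_etl",                 # hypothetical pipeline name
    start_date=datetime(2024, 7, 1),
    schedule="@daily",                   # Airflow 2.4+ keyword
    catchup=False,
    default_args=default_args,
) as dag:
    extract = BashOperator(
        task_id="extract_claims",
        bash_command="spark-submit extract_claims.py",    # hypothetical script
    )
    transform = BashOperator(
        task_id="transform_claims",
        bash_command="spark-submit transform_claims.py",  # hypothetical script
    )
    validate = BashOperator(
        task_id="validate_output",
        bash_command="python validate_claims.py",         # hypothetical script
    )

    extract >> transform >> validate     # explicit dependency chain
```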
TECHNICAL SKILLS
• Languages: Python (PySpark), Java, SQL
• Big Data & Processing: Apache Spark (DataFrames, Spark SQL, Streaming), Apache Airflow, Debezium, Kafka
• Cloud Platforms: AWS (S3, EMR, Glue, Lambda, Step Functions, MWAA, Batch)
• ETL/ELT & Orchestration: Airflow, Glue Data Catalog, Spark Jobs, AWS Step Functions
• Data Quality & Governance: Deequ (AWS Labs), custom data validation scripts, schema checks
• Additional Exposure: Apache Hudi, Apache Griffin, Scala (basic)
• Version Control / CI/CD: Git, GitHub, Jenkins
• Databases: MySQL, PostgreSQL, SQL Server, Oracle (CDC configurations)
• Performance Tuning: Spark optimization, partitioning, caching, cluster scaling (see the sketch following this list)
• Monitoring: CloudWatch, Airflow DAG logs, S3 event triggers
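A minimal sketch of the partitioning and caching techniques listed under Performance Tuning above. The S3 paths, column names, and partition count are hypothetical placeholders, not values from a real pipeline.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

claims = spark.read.parquet("s3://example-bucket/claims/")  # hypothetical path

# Repartition on the aggregation key so the shuffle is balanced.
claims = claims.repartition(200, "provider_id")

# Cache a DataFrame that several downstream aggregations reuse.
claims.cache()

daily_totals = claims.groupBy("provider_id", "claim_date").agg(
    F.sum("amount").alias("total_amount")
)

# Partition the output on date so consumers can prune partitions at read time.
daily_totals.write.mode("overwrite").partitionBy("claim_date").parquet(
    "s3://example-bucket/marts/daily_totals/"
)

claims.unpersist()  # release cached blocks once downstream reuse is done
```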
Master’s Degree – Management Information Systems – Lamar University – 2024 PROFESSIONAL EXPERIENCE
DATA ENGINEER | ANTHEM BLUE CROSS AND BLUE SHIELD | FRISCO, TEXAS | JULY 2024 – PRESENT
• Built and optimized ETL pipelines using PySpark and Airflow to integrate claims and provider data.
• Developed incremental CDC processes using Debezium and AWS Glue for real-time data ingestion (see the streaming sketch at the end of this role).
• Designed and maintained AWS data lake architecture using S3, Redshift, and Glue Data Catalog.
• Created Spark jobs for data transformation, schema evolution, and aggregation for analytics.
• Improved pipeline runtime by 65% through optimized partitioning and caching strategies.
• Automated workflows using AWS Lambda and Step Functions for end-to-end orchestration.
• Implemented data validation and monitoring using CloudWatch and custom Python scripts.
• Worked closely with analytics teams to define CDC logic and maintain synchronization integrity.
• Supported batch and streaming ingestion from multiple source systems.
• Built Airflow DAGs for scheduling and dependency management across AWS environments.
• Enhanced pipeline reliability through data quality checks and fault-tolerant designs.
• Documented architecture, lineage, and operational playbooks for ongoing maintenance.
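A minimal sketch, in the spirit of the Debezium CDC ingestion described in this role, of consuming change events with Spark Structured Streaming. The brokers, topic, schema, and S3 paths are hypothetical; Debezium's JSON converter with schemas disabled (a flat before/after/op envelope) and the spark-sql-kafka package on the classpath are assumed.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-ingest-sketch").getOrCreate()

# Hypothetical row shape for the captured table.
row_schema = StructType([
    StructField("claim_id", StringType()),
    StructField("status", StringType()),
])

# Debezium change-event envelope: op is c (create), u (update), d (delete).
envelope = StructType([
    StructField("before", row_schema),
    StructField("after", row_schema),
    StructField("op", StringType()),
])

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical brokers
    .option("subscribe", "cdc.claims")                 # hypothetical topic
    .load()
)

changes = (
    raw.select(F.from_json(F.col("value").cast("string"), envelope).alias("e"))
    .select("e.op", "e.after.*")
    .where(F.col("op").isin("c", "u"))  # this sketch lands inserts/updates only
)

query = (
    changes.writeStream.format("parquet")
    .option("path", "s3://example-bucket/raw/claims_changes/")        # hypothetical
    .option("checkpointLocation", "s3://example-bucket/chk/claims/")  # hypothetical
    .start()
)
query.awaitTermination()
```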
DATA ENGINEER | RELIANCE GENERAL INSURANCE | INDIA | MAY 2018 – JULY 2022
• Developed ETL pipelines using SSIS and Python for extracting and transforming policy data.
• Created CDC-based incremental loads for financial and claims data updates.
• Designed and deployed Spark jobs for large-scale batch data processing.
• Automated job dependencies and scheduling using Airflow and SQL Server Agent.
• Built Power BI dashboards and data models for real-time analytics.
• Migrated on-prem data to AWS S3 and automated data refresh processes.
• Collaborated with cross-functional teams to ensure end-to-end data accuracy.
• Improved query performance using optimized joins and data partitioning techniques.
• Implemented reusable PySpark scripts for standardized data transformations (sketched at the end of this document).
• Ensured data integrity through validation and reconciliation checks.
• Maintained version control through Git and documented pipeline architecture.
• Delivered reliable, CDC-driven data solutions supporting analytics and reporting teams.
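A minimal sketch of the reusable-transformation and reconciliation patterns described in this role. The column names, cast types, and tolerance are hypothetical placeholders.

```python
from pyspark.sql import DataFrame
from pyspark.sql import functions as F


def standardize_policy(df: DataFrame) -> DataFrame:
    """Shared cleanup rules applied to every policy feed (hypothetical columns)."""
    return (
        df.withColumn("policy_id", F.upper(F.trim("policy_id")))
        .withColumn("premium", F.col("premium").cast("decimal(18,2)"))
        .dropDuplicates(["policy_id"])
    )


def reconcile_totals(source: DataFrame, target: DataFrame) -> None:
    """Fail fast if source and target premium totals drift beyond tolerance."""
    src = source.agg(F.sum("premium")).first()[0] or 0
    tgt = target.agg(F.sum("premium")).first()[0] or 0
    if abs(src - tgt) > 0.01:  # hypothetical tolerance
        raise ValueError(f"Reconciliation failed: source={src}, target={tgt}")
```

In this pattern, standardize_policy runs in every load, and reconcile_totals compares the staged source against the published target before a load is marked complete.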