
Data Processing Solutions

Location:
Fort Worth, TX
Salary:
$110,000
Posted:
October 15, 2025


Resume:

NARENDRA BABU G

*******@*****.*** 469-***-****

PROFESSIONAL SUMMARY

• 5+ years of experience designing and implementing scalable ETL/ELT pipelines in cloud and on-prem environments.

• Hands-on expertise in setting up Change Data Capture (CDC) processes using Debezium and AWS-native tools (a streaming ingestion sketch appears at the end of this summary).

• Skilled in building Spark-based ETL pipelines for both streaming and batch data processing.

• Strong experience with Apache Airflow for orchestrating and automating data workflows.

• Proficient in Python (PySpark) and Java for large-scale data transformation and integration.

• Experienced in data lake hydration, schema evolution, and data modeling for analytics.

• Deep understanding of AWS services including S3, Glue, EMR, Lambda, and Step Functions.

• Adept at optimizing Spark DataFrame performance, partitioning, and caching strategies.

• Knowledgeable in data quality frameworks and validation processes for CDC-driven pipelines.

• Skilled in debugging data ingestion issues and implementing fault-tolerant recovery mechanisms.

• Collaborated with cross-functional teams to design data pipelines for analytics and BI systems.

• Committed to building efficient, maintainable, and performance-driven data solutions.
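Illustrative example – a minimal sketch of the Debezium-style CDC ingestion summarized above, assuming change events arrive as Debezium JSON on a Kafka topic and are landed in S3 as Parquet. The broker, topic, schema, and bucket names are placeholders, and the Spark Kafka connector package is assumed to be available on the cluster.

```python
# Sketch: consume Debezium CDC events from Kafka with Spark Structured Streaming
# and append them to S3 as Parquet. All names below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("cdc-ingest-sketch").getOrCreate()

# Simplified Debezium envelope: only the fields this sketch needs.
payload_schema = StructType([
    StructField("op", StringType()),     # c = create, u = update, d = delete
    StructField("ts_ms", LongType()),
    StructField("after", StringType()),  # row image kept as a raw JSON string here
])
envelope_schema = StructType([StructField("payload", payload_schema)])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")      # placeholder broker
    .option("subscribe", "dbserver1.claims.policies")       # placeholder topic
    .option("startingOffsets", "latest")
    .load()
    .select(F.from_json(F.col("value").cast("string"), envelope_schema).alias("e"))
    .select("e.payload.op", "e.payload.ts_ms", "e.payload.after")
)

# Append raw change events to the lake; downstream jobs merge them into tables.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3a://example-data-lake/raw/policies_cdc/")           # placeholder
    .option("checkpointLocation", "s3a://example-data-lake/chk/policies_cdc/")
    .trigger(processingTime="1 minute")
    .start()
)
query.awaitTermination()
```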

TECHNICAL SKILLS

• Languages: Python, Java, SQL, PySpark

• Big Data & Processing: Apache Spark (DataFrames, Spark SQL, Streaming), Apache Airflow, Debezium, Kafka

• Cloud Platforms: AWS (S3, EMR, Glue, Lambda, Step Functions, MWAA, Batch)

• ETL/ELT & Orchestration: Airflow, Glue Data Catalog, Spark Jobs, AWS Step Functions

• Data Quality & Governance: Deequ (AWS open-source library), data validation scripts, schema checks

• Additional Frameworks (working knowledge): Apache Hudi, Apache Griffin, Scala (basics)

• Version Control / CI-CD: Git, GitHub, Jenkins

• Databases: MySQL, PostgreSQL, SQL Server, Oracle (CDC configurations)

• Performance Tuning: Spark optimization, partitioning, caching, cluster scaling

• Monitoring: CloudWatch, Airflow DAG logs, S3 event triggers

EDUCATION

Master’s Degree – Management Information Systems – Lamar University – 2024

PROFESSIONAL EXPERIENCE

DATA ENGINEER – ANTHEM BLUE CROSS AND BLUE SHIELD – FRISCO, TEXAS – JULY 2024 – PRESENT

• Built and optimized ETL pipelines using PySpark and Airflow to integrate claims and provider data.

• Developed incremental CDC processes using Debezium and AWS Glue for real-time data ingestion.

• Designed and maintained AWS data lake architecture using S3, Redshift, and Glue Data Catalog.

• Created Spark jobs for data transformation, schema evolution, and aggregation for analytics.

• Improved pipeline runtime by 65% through optimized partitioning and caching strategies.

• Automated workflows using AWS Lambda and Step Functions for end-to-end orchestration.

• Implemented data validation and monitoring using CloudWatch and custom Python scripts.

• Worked closely with analytics teams to define CDC logic and maintain synchronization integrity.

• Supported batch and streaming ingestion from multiple source systems.

• Built Airflow DAGs for scheduling and dependency management across AWS environments (a minimal DAG sketch follows this role's bullets).

• Enhanced pipeline reliability through data quality checks and fault-tolerant designs.

• Documented architecture, lineage, and operational playbooks for ongoing maintenance.
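Illustrative example – a hedged sketch of the kind of Airflow DAG referenced above. The DAG id, schedule, and task callables are assumptions made for illustration, not the production pipeline (which also used Glue, Lambda, and Step Functions); Airflow 2.4+ is assumed for the `schedule` parameter.

```python
# Illustrative Airflow DAG: a daily CDC pull followed by a PySpark merge step.
# Task names, schedule, and the callables' contents are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def pull_cdc_window(**context):
    # Placeholder: read the Debezium/Glue CDC output for the run's logical date.
    print(f"pulling CDC window for {context['ds']}")


def merge_into_lake(**context):
    # Placeholder: spark-submit or Glue job call that merges changes into S3 tables.
    print("merging change records into curated S3 tables")


default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="claims_cdc_merge_sketch",   # hypothetical DAG id
    start_date=datetime(2024, 7, 1),
    schedule="@daily",                  # assumes Airflow 2.4+
    catchup=False,
    default_args=default_args,
) as dag:
    pull = PythonOperator(task_id="pull_cdc_window", python_callable=pull_cdc_window)
    merge = PythonOperator(task_id="merge_into_lake", python_callable=merge_into_lake)

    pull >> merge  # the merge runs only after the CDC window has been pulled
```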

DATA ENGINEER – RELIANCE GENERAL INSURANCE – INDIA – MAY 2018 – JULY 2022

• Developed ETL pipelines using SSIS and Python for extracting and transforming policy data.

• Created CDC-based incremental loads for financial and claims data updates.

• Designed and deployed Spark jobs for large-scale batch data processing.

• Automated job dependencies and scheduling using Airflow and SQL Server Agent.

• Built Power BI dashboards and data models for real-time analytics.

• Migrated on-prem data to AWS S3 and automated data refresh processes.

• Collaborated with cross-functional teams to ensure end-to-end data accuracy.

• Improved query performance using optimized joins and data partitioning techniques.

• Implemented reusable PySpark scripts for standardized data transformations (see the sketch at the end of this section).

• Ensured data integrity through validation and reconciliation checks.

• Maintained version control through Git and documented pipeline architecture.

• Delivered reliable, CDC-driven data solutions supporting analytics and reporting teams.
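Illustrative example – a minimal sketch of the reusable PySpark transformation and reconciliation pattern described above; the column names, paths, and cleanup rules are hypothetical placeholders.

```python
# Sketch: a reusable PySpark cleanup function plus a simple row-count reconciliation.
# Column names, paths, and rules are illustrative assumptions.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("policy-transform-sketch").getOrCreate()


def standardize_policies(df: DataFrame) -> DataFrame:
    """Apply the same cleanup rules to every policy extract."""
    return (
        df.withColumn("policy_id", F.trim(F.col("policy_id")))
          .withColumn("premium", F.col("premium").cast("decimal(12,2)"))
          .withColumn("effective_date", F.to_date("effective_date", "yyyy-MM-dd"))
    )


def reconcile_counts(source: DataFrame, target: DataFrame) -> None:
    """Fail the job if the loaded row count does not match the source extract."""
    src, tgt = source.count(), target.count()
    if src != tgt:
        raise ValueError(f"reconciliation failed: source={src}, target={tgt}")


# Usage sketch with placeholder S3 paths.
raw = spark.read.parquet("s3a://example-bucket/raw/policies/")
curated = standardize_policies(raw)
curated.write.mode("overwrite").parquet("s3a://example-bucket/curated/policies/")
reconcile_counts(raw, spark.read.parquet("s3a://example-bucket/curated/policies/"))
```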


