Rahul Padma
Data Engineer
Corpus Christi, TX | +1-205-***-**** | *************@*****.*** | LinkedIn

SUMMARY
Data Engineer with 5+ years building production-grade pipelines, lakehouse architectures, and analytics-ready datasets across healthcare and enterprise domains. Deep expertise in PySpark, Airflow, dbt, Kafka, and AWS, with a strong track record in CDC-based ingestion, HIPAA-compliant data governance, and data observability using Monte Carlo. Equally comfortable enabling downstream analytics, delivering pipeline-ready datasets for KPI reporting, dashboards, and stakeholder insights via Power BI and Tableau.

TECHNICAL SKILLS
Programming & Querying: Python (pandas, PySpark), SQL (complex joins, window functions, CTEs, performance tuning), Bash
Data Engineering & Architecture: ETL/ELT, medallion architecture (Bronze/Silver/Gold), lakehouse architecture, CDC, schema evolution, data lineage, SCD Type 1/2, star/snowflake schema, data contracts, data mesh, metadata management, batch & streaming pipelines
Big Data & Processing: Apache Spark, PySpark, Apache Kafka (Confluent Cloud), Apache Flink, Spark Structured Streaming, Delta Lake, Apache Iceberg, Parquet, Avro, ORC, partition pruning, broadcast joins
Orchestration & Transformation: Apache Airflow (DAG authoring, scheduling, SLA monitoring), AWS Step Functions, dbt Core & dbt Cloud (models, tests, snapshots, macros), Azure Data Factory
Cloud Platforms: AWS (Glue, EMR, S3, Redshift, Lake Formation, Glue Catalog, CloudWatch, Step Functions), Azure (Databricks)
Databases & Storage: Amazon Redshift, Snowflake, Databricks Lakehouse, Azure Synapse, PostgreSQL, MySQL, S3 / ADLS
Data Quality & Governance: Great Expectations, Monte Carlo, dbt tests, Debezium (CDC), OpenLineage, AWS Glue Catalog, Azure Purview, HIPAA compliance, PII masking, anomaly detection
DevOps, BI & Tooling: Docker, Kubernetes, Terraform, CI/CD (GitHub Actions), Git, Power BI, Tableau

PROFESSIONAL EXPERIENCE
Change Healthcare, TX Jan 2025 – Present
Data Engineer
• Designed scalable PySpark pipelines on AWS Glue with Delta Lake, implementing medallion architecture across bronze, silver, and gold layers for claims and member ingestion. Resolved schema mismatch and late-arriving data issues at ingestion and integrated Monte Carlo to establish end-to-end data observability across the platform.
• Implemented Change Data Capture (CDC) using Debezium and AWS Glue with S3 event triggers for incremental ingestion from upstream healthcare systems, eliminating full-reload processing and cutting daily pipeline runtime.
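The incremental pattern behind that CDC work can be sketched in miniature. This is a hedged, pure-Python illustration only: the production pipeline consumed Debezium change events and merged them via AWS Glue into Delta Lake, and all field names below (op, key, after) are assumptions modeled on Debezium's change-event envelope.

```python
def apply_cdc_events(table: dict, events: list[dict]) -> dict:
    """Apply Debezium-style change events to a keyed table in order.

    Each event carries an op code ('c' create, 'u' update, 'd' delete),
    a primary-key value, and the row's after-image (None for deletes).
    Applying only the changes avoids reprocessing the full table.
    """
    for event in events:
        key = event["key"]
        if event["op"] in ("c", "u"):   # upsert: write the after-image
            table[key] = event["after"]
        elif event["op"] == "d":        # delete: drop the row if present
            table.pop(key, None)
    return table


claims = {101: {"status": "open"}}
events = [
    {"op": "u", "key": 101, "after": {"status": "paid"}},
    {"op": "c", "key": 102, "after": {"status": "open"}},
    {"op": "d", "key": 101, "after": None},
]
print(apply_cdc_events(claims, events))  # {102: {'status': 'open'}}
```

The same upsert/delete semantics carry over to a Delta Lake MERGE, where the after-image drives `WHEN MATCHED THEN UPDATE` and the delete op drives `WHEN MATCHED THEN DELETE`.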
• Built Airflow DAGs to orchestrate batch and streaming pipeline workflows across the data platform, incorporating retry logic, SLA callbacks, and dependency management to improve pipeline reliability and reduce incident resolution time.
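The retry behavior those DAGs relied on can be sketched outside Airflow. This is an illustrative stand-in, not the Airflow implementation: Airflow configures this declaratively via `retries`, `retry_delay`, and failure/SLA callbacks on a task, and the function below only mimics that policy in plain Python.

```python
import time


def run_with_retries(task, retries=3, delay=0.1, backoff=2.0, on_failure=None):
    """Sketch of a retry policy: re-run a failing task with exponential
    backoff, invoking a callback (akin to an Airflow failure/SLA callback)
    only after every attempt has been exhausted."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception as exc:
            if attempt == retries:
                if on_failure:
                    on_failure(exc)        # last resort: alert, then surface
                raise
            time.sleep(delay * (backoff ** attempt))  # back off before retrying
```

A transient upstream failure (e.g. a flaky API) succeeds on a later attempt without paging anyone; only a persistent failure triggers the callback.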
• Configured data pipeline monitoring dashboards and CloudWatch alerting across all pipeline layers, enabling proactive SLA breach detection and improving data reliability and observability across the platform.
• Developed dbt models for claims, provider, and eligibility domains with star schema mart design, SCD Type 2 dimension handling, and built-in dbt tests covering null checks, uniqueness, and referential integrity, enforcing data contracts between producer and consumer teams to catch quality issues before they reach dashboards.
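The SCD Type 2 handling mentioned above follows a standard pattern that can be sketched in plain Python. In production this was implemented with dbt snapshots, not hand-written code; the record shape below (key, attrs, valid_from/valid_to, is_current) is an assumption for illustration.

```python
from datetime import date


def scd2_upsert(dim_rows: list[dict], incoming: dict, today: date) -> list[dict]:
    """Sketch of SCD Type 2: when a dimension record changes, close the
    current version (set valid_to, clear is_current) and append a new
    current version, so history is preserved rather than overwritten."""
    for row in dim_rows:
        if row["key"] == incoming["key"] and row["is_current"]:
            if row["attrs"] == incoming["attrs"]:
                return dim_rows            # unchanged: keep the current version
            row["is_current"] = False      # close the old version
            row["valid_to"] = today
            break
    dim_rows.append({                      # open the new current version
        "key": incoming["key"],
        "attrs": incoming["attrs"],
        "valid_from": today,
        "valid_to": None,
        "is_current": True,
    })
    return dim_rows
```

A provider whose state changes ends up with two rows: the old one bounded by valid_to, and a new is_current row, which is exactly what a dbt snapshot's `dbt_valid_from`/`dbt_valid_to` columns record.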
• Deployed Great Expectations suites covering schema validation, null rates, and referential integrity across all ingestion pipelines, establishing a foundation for data reliability and trust across the platform.
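The classes of checks those suites covered can be illustrated in plain Python. This is a sketch only: the real pipelines used Great Expectations expectation suites (e.g. expect_column_values_to_not_be_null, expect_column_values_to_be_unique), and the column names claim_id, member_id, and amount below are assumptions.

```python
def run_quality_checks(rows, key_col, required_cols, fk_col, ref_keys):
    """Illustrate three classes of data-quality checks: key uniqueness,
    null checks on required columns, and referential integrity of a
    foreign key against a reference key set. Returns check -> passed."""
    keys = [r.get(key_col) for r in rows]
    results = {
        "keys_unique": len(keys) == len(set(keys)),
        "fk_resolves": all(r.get(fk_col) in ref_keys for r in rows),
    }
    for col in required_cols:
        null_count = sum(r.get(col) is None for r in rows)
        results[f"{col}_not_null"] = null_count == 0
    return results


claims = [
    {"claim_id": "C1", "member_id": "M1", "amount": 120.0},
    {"claim_id": "C2", "member_id": "M9", "amount": None},
]
print(run_quality_checks(claims, "claim_id", ["amount"], "member_id", {"M1", "M2"}))
```

Here the second claim fails both the null check (missing amount) and referential integrity (M9 is not a known member), which is the kind of issue the suites were deployed to catch at ingestion.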
• Established data governance practices including PII classification, OpenLineage-based data lineage tracked via AWS Glue Data Catalog, and HIPAA-compliant fine-grained access controls using AWS Lake Formation across all ingestion pipelines.
• Packaged dbt projects and pipeline code in Docker containers and deployed via GitHub Actions CI/CD with automated test gates, moving the team from weekly to daily releases and significantly reducing production incidents.

Hexaware Technologies, India Sep 2021 – Dec 2023
Data Engineer
• Designed AWS-based ETL/ELT pipelines using PySpark on EMR and Glue, ingesting data from enterprise APIs and relational databases into Redshift. Improved data availability by removing manual handoffs, automating full ingestion flows, and aligning pipeline ownership with data mesh domain principles to support decentralized data platform engineering.
• Introduced Apache Kafka managed via Confluent Cloud for event-driven ingestion of high-frequency transactional data and built Python consumers streaming events into S3 landing zones, replacing brittle batch polling with a reliable, low-latency streaming ingestion layer.
• Built Apache Flink and Spark Structured Streaming jobs for real-time transformations, unifying batch and streaming pipelines under a single data platform for consistent, low-latency data delivery.
• Migrated reporting pipelines to a lakehouse architecture on S3 and Delta Lake, enabling ACID transactions, time-travel, and schema evolution. Reduced reprocessing incidents and cut analytics query costs using Parquet partitioning on Athena.
• Built a data catalog on AWS Glue Catalog with schema docs, ownership tagging, and metadata management covering lineage, classification, and business glossary. Ran Great Expectations checks at ingestion boundaries and deployed via Docker and GitHub Actions CI/CD, bringing deployment failure rate down from 15% to under 2%.

Mphasis, India Nov 2019 – Aug 2021
Associate Data Engineer
• Built ETL pipelines in Python and SQL to extract and load data from PostgreSQL, MySQL, and flat files into a central data warehouse — automating ingestion from multiple source systems to support business reporting.
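A stripped-down version of that extract-and-load step can be shown with the standard library's sqlite3 as a stand-in for the PostgreSQL/MySQL sources and the warehouse. All database, table, and column names here are illustrative, not the actual systems.

```python
import sqlite3


def etl_copy(source_db, warehouse_db, query, target_table, columns):
    """Minimal extract-and-load step: pull rows from a source database
    and insert them into a warehouse staging table in one transaction,
    so a mid-load failure leaves the target untouched."""
    src = sqlite3.connect(source_db)
    wh = sqlite3.connect(warehouse_db)
    rows = src.execute(query).fetchall()               # extract
    placeholders = ", ".join("?" for _ in columns)
    wh.execute(
        f"CREATE TABLE IF NOT EXISTS {target_table} ({', '.join(columns)})"
    )
    with wh:                                           # atomic load (commit/rollback)
        wh.executemany(
            f"INSERT INTO {target_table} VALUES ({placeholders})", rows
        )
    src.close()
    wh.close()
    return len(rows)
```

Chaining one such step per source system, driven by a schedule, is the shape of the ingestion automation described above; the production version added incremental watermarks rather than full copies.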
• Improved extraction and transformation runtime by rewriting slow SQL queries with indexing, partition awareness, and Python batch processing; wrote reconciliation checks that reduced recurring data issues.
• Gained hands-on experience with AWS S3 and Redshift pipeline patterns including incremental load design, Parquet file formats, and cloud storage best practices.
• Maintained data flow documentation in Confluence and handled on-call support to restore failed pipelines, keeping data availability above 98% and enabling consistent KPI reporting, ad-hoc analysis, and dashboard reliability for business stakeholders.

EDUCATION
Master’s Degree in Computer Science Dec 2025
Auburn University at Montgomery, Alabama
Bachelor’s Degree in Electrical & Electronics Engineering May 2021
MVSR Engineering College, Hyderabad, India