Post Job Free

Data Engineer Governance

Location:
Aubrey, TX
Salary:
70000
Posted:
October 15, 2025


Resume:

Kalyani Chittipolu

Data Engineer | *******************@*****.*** | LinkedIn | +1-945-***-****

SUMMARY

Data Engineer with 4+ years of experience designing and deploying scalable data pipelines, cloud-native architectures, and lakehouse solutions across healthcare and enterprise environments.

Strong expertise in AWS and Azure ecosystems with hands-on experience in PySpark, SQL, dbt, Airflow, and containerized deployments (Docker, Kubernetes).

Skilled in IaC (Terraform), CI/CD for data pipelines, and modern orchestration frameworks.

Experienced in data governance, audit logging, and HIPAA-compliant solutions for secure clinical and claims reporting.

Experienced across multiple clouds (AWS, Azure, GCP) and adept at collaborating with cross-functional teams to deliver production-ready, analytics-driven solutions.

SKILLS

Languages & Tools: SQL, Python, PySpark, Bash, Git, dbt, Pandas, NumPy

Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, Kinesis, EMR, DataSync), Azure (Data Factory, Synapse, Data Lake, Blob Storage), GCP (BigQuery, Dataflow, Pub/Sub – exposure)

Data Technologies: Databricks, Snowflake, Hadoop, Hive, Delta Lake, Kafka, Spark Structured Streaming, Apache Flink

Orchestration & CI/CD: Apache Airflow, Jenkins, GitHub Actions, Azure DevOps, Terraform, Docker, Kubernetes, Argo Workflows (exposure), Prefect (exposure)

Databases: SQL Server, PostgreSQL, Oracle, MySQL, MongoDB

Visualization & Reporting: Power BI, Tableau

Data Governance & Security: Great Expectations, AWS Glue Data Catalog, Collibra, IAM, audit logging, access control, encryption policies, HIPAA compliance

EDUCATION

Master's in Computer Science, Texas Tech University, TX

EXPERIENCE

Elevance Health (Anthem) Aug 2024 – Present

Data Engineer

Designed and optimized AWS Glue + PySpark pipelines on S3, processing 200K+ records/day with 99.9% accuracy.

Integrated dbt models in Redshift and modularized transformations, reducing query time by 30%.

Automated ingestion from SFTP/REST APIs via AWS Lambda & Step Functions, cutting manual effort by 60%.

Deployed containerized ETL workflows (Docker, Kubernetes) for claims processing workloads, ensuring scalability.

Built and optimized analytical data models in Redshift with stored procedures for faster claims and medical analytics.

Integrated on-premises data into AWS Data Lake via DataSync, enabling unified patient/claims analysis.

Implemented real-time fraud detection with Kinesis Data Streams, Spark Structured Streaming, and Flink.

Enforced HIPAA-compliant data governance with IAM, audit logging, and encryption in transit/at rest.

Managed CI/CD for pipelines using AWS CodePipeline, GitHub Actions, and Terraform.

Tools & Tech: AWS Glue, Redshift, S3, dbt, PySpark, Docker, Kubernetes, Terraform, Lambda, Step Functions, Kinesis, Power BI, Airflow, SQL Server, REST API, JIRA
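The real-time fraud detection work above can be sketched at a conceptual level. The sketch below is illustrative only: the account IDs, thresholds, and window size are hypothetical, and plain Python stands in for the Kinesis / Spark Structured Streaming stack named in the resume.

```python
from collections import defaultdict, deque

# Hypothetical sketch: flag an account when it submits more than
# `max_events` claims inside a sliding `window_s`-second window.
# In production this logic would run in Spark Structured Streaming
# or Flink over a Kinesis stream; here plain Python illustrates it.

def make_detector(window_s=60, max_events=3):
    seen = defaultdict(deque)  # account_id -> recent event timestamps

    def process(account_id, ts):
        q = seen[account_id]
        q.append(ts)
        # Drop timestamps that have fallen out of the window.
        while q and ts - q[0] > window_s:
            q.popleft()
        return len(q) > max_events  # True => suspicious burst

    return process

detect = make_detector(window_s=60, max_events=3)
events = [("A", 0), ("A", 10), ("A", 20), ("A", 30), ("B", 15)]
flags = [detect(acct, ts) for acct, ts in events]
# The fourth event for account "A" is its 4th within 60 seconds,
# so only that event is flagged.
```

Stateful windowing like this is what the streaming engines provide out of the box (e.g. watermarked event-time windows); the point of the sketch is the per-key sliding-window count, not the transport.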

Cerner India (Oracle Health) Sep 2020 – Jul 2023

Data Engineer

Developed Spark + Airflow pipelines integrating EHR and clinical data into Redshift, enabling centralized reporting.

Built schema-on-read structures in Hadoop/Hive and transformed raw data in Glue/Databricks, cutting latency by 30%.

Implemented incremental CDC workflows & parameterized DAGs, reducing ETL runtime by 50%.

Automated data quality checks with Great Expectations + Slack alerts, resolving 90% of mismatches proactively.

Optimized SQL queries and Redshift schema designs, improving execution times by 25%.

Deployed containerized Spark jobs on EMR with Docker, scaling workloads for multi-tenant use cases.

Documented ETL pipelines, schema definitions, and data dictionaries for team knowledge sharing.

Collaborated in Agile sprints, code reviews, and cross-functional meetings to enhance delivery speed and standards.

Tools & Tech: Spark, Hive, Airflow, Databricks, AWS Glue, Redshift, Hadoop, PostgreSQL, Docker, SQL, Python, JIRA
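The incremental CDC workflows mentioned above typically follow a high-watermark pattern: each run pulls only rows changed since the last recorded watermark, then advances it. The sketch below is a hypothetical, database-free illustration of that pattern, not the actual Cerner pipeline; in-memory lists and a dict stand in for the source tables and the metadata store.

```python
# Hypothetical sketch of watermark-based incremental extraction.
# A real pipeline would query the source database with a
# `WHERE updated_at > :watermark` predicate and persist the
# watermark in a metadata store between Airflow DAG runs.

def incremental_extract(source_rows, state, key="updated_at"):
    watermark = state.get("watermark", 0)
    new_rows = [r for r in source_rows if r[key] > watermark]
    if new_rows:
        # Advance the watermark to the newest change we extracted.
        state["watermark"] = max(r[key] for r in new_rows)
    return new_rows

rows = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 200},
]
state = {}
first = incremental_extract(rows, state)   # initial run: both rows
rows.append({"id": 3, "updated_at": 300})
second = incremental_extract(rows, state)  # next run: only the new row
```

Parameterizing the watermark column and table name is what makes the corresponding Airflow DAGs reusable across sources, which is the runtime win the bullet describes.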


