Jagadeesh Kolla
845-***-**** *********.*.********@*****.*** LinkedIn
SUMMARY
Data Engineer with 5+ years of experience designing ETL pipelines and implementing Change Data Capture across diverse databases. Expertise in Apache Spark, including Spark SQL and Spark Streaming, for both batch and real-time data processing. Proven ability in automating workflows with Airflow and AWS Batch to optimize data lake hydration and analytics.
WORK EXPERIENCE
Zions Bancorporation Jul 2024 - Present
Azure/Big Data Engineer Salt Lake City, Utah, USA
• Designed scalable ETL pipelines with Azure Data Factory and Databricks for mortgage and credit operations, integrating Change Data Capture techniques for multi-source banking data ingestion to achieve 40% faster loan-risk analytics.
• Enhanced financial data query performance by 30% through the development of Snowflake data warehouses with secure RBAC controls, columnar storage, and row-level security for multi-tenant analytics.
• Boosted underwriting analytics accuracy by 25% by creating OLAP and dimensional data models in Azure Synapse and applying dbt transformations for regulatory compliance and loan risk reporting.
• Optimized real-time fraud detection by building streaming architectures using Apache Kafka and Azure Event Hubs, reducing detection latency by 35% for transaction monitoring and customer behavior analysis.
• Strengthened data privacy compliance by automating Azure SQL data masking with Python-based scripts, reducing manual masking errors by 60% for enhanced PII protection and auditing.
• Enabled real-time customer analytics by integrating MongoDB and Cosmos DB, achieving 50% faster data retrieval for banking applications and personalized financial insights.
CHG Healthcare Aug 2023 - May 2024
AWS Data Engineer Midvale, Utah, USA
• Engineered HIPAA-compliant analytics by designing secure AWS S3 data lakes with Lake Formation and Terraform, utilizing Change Data Capture principles for structured and unstructured healthcare data to achieve 40% faster query responses.
• Improved ETL efficiency by 35% by developing AWS Glue and Lambda-based pipelines with Boto3, incorporating CDC strategies to ensure clinical data integrity and faster patient record processing.
• Enhanced workflow scheduling by orchestrating automated healthcare processes in Apache Airflow for ETL monitoring, error handling, and schema validation of clinical datasets, achieving a 30% improvement in accuracy.
• Reduced query runtime by 45% by implementing dimensional models in Redshift and Snowflake using star and snowflake schemas for medical claims analysis and patient outcome dashboards.
• Enabled real-time patient monitoring with AWS Kinesis and Kafka streaming pipelines, achieving sub-second alert generation for critical healthcare events and device data.
Philips Oct 2021 - Dec 2022
GCP Data Engineer Bangalore, India
• Developed scalable cloud ETL automation with Dataflow and Apache Beam pipelines integrated with Cloud Composer, incorporating CDC techniques to accelerate data delivery by 40% across large-scale IT systems.
• Optimized query performance by 35% through the design of ETL workflows integrating BigQuery, Cloud Storage, and Cloud SQL for efficient data transformation, governance, and analytics.
• Enhanced real-time monitoring accuracy by 30% by implementing Kafka and Google Dataflow streaming solutions, reducing event latency through Redis caching and precise pipeline tuning.
• Reduced infrastructure costs by 25% by migrating on-prem Hadoop clusters to Google Dataproc, optimizing cluster configuration with Hive, Impala, and Phoenix integration.
• Accelerated data analytics by 40% by engineering BigQuery datasets with advanced partitioning, clustering, and SQL optimization for enterprise dashboards and reporting.
Siemens Healthineers Jan 2020 - Sep 2021
Data Engineer Bangalore, India
• Designed and optimized ETL pipelines using Java, Python, and shell scripts under HIPAA compliance, integrating Change Data Capture methodologies to achieve a 40% improvement in healthcare data ingestion.
• Boosted data processing accuracy by 30% by developing PL/SQL packages, stored procedures, and triggers to manage sensitive medical data from diagnostic and imaging systems.
• Increased data pipeline reliability by 35% by automating batch workflows through UNIX scripts, ensuring timely delivery of clinical data to analytical systems and reporting dashboards.
• Enhanced ETL scalability by 40% through the implementation of Informatica PowerCenter workflows that transformed multi-source medical data with advanced data quality and validation rules.
• Reduced integration errors by 25% by designing SOA-based web services for secure data exchange between lab systems, imaging devices, and healthcare applications.
TECHNICAL SKILLS
• Cloud Platforms: AWS (S3, EMR, Redshift, Lambda, Glue, Kinesis, Athena, Data Lake), Azure (Data Lake, Data Factory, Databricks, Azure SQL), GCP (BigQuery)
• Programming & Scripting Languages: Python, Scala, Java, Shell Scripting (Bash), Hibernate, JDBC, JSON, HTML, CSS
• Big Data & Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, PySpark, Airflow, Cloudera Manager, Kerberos, Apache Spark, Change Data Capture, Apache Hudi, Apache Griffin, AWS Deequ
• Databases: Oracle, MySQL, SQL Server, PostgreSQL, Snowflake, Cassandra, MongoDB, HBase
• ETL & Middleware Tools: Talend, Informatica, SSIS, Azure Data Factory, Azure Databricks
• Data Visualization Tools: Tableau, Power BI
• Version Control & Build Tools: Git, Maven, SBT, CBT
• Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
• Operating Systems: Windows, Unix, Linux
• Development Tools & IDEs: Eclipse, Visual Studio, SQL Developer, Azure Data Studio, TOAD, SoapUI, Dreamweaver, SSMS, Teradata SQL Assistant
• CI/CD & Automation: Jenkins, GitHub, SharePoint
• Generative AI & NLP: LLMs (GPT, BERT, T5), Prompt Engineering, Text Generation, Code Generation, Diffusion Models
EDUCATION
SUNY New Paltz, USA
Master's, Computer and Information Sciences