Jagadeesh Kolla
845-***-**** *********.*.********@*****.*** LinkedIn
SUMMARY
Data Engineer with 5+ years of experience designing ETL pipelines and implementing Change Data Capture across diverse databases. Expertise in Apache Spark, including Spark SQL and Spark Streaming, for both batch and real-time data processing. Proven ability in automating workflows with Airflow and AWS Batch to optimize data lake hydration and analytics.
WORK EXPERIENCE
Zions Bancorporation Jul 2024 - Present
Azure/Big Data Engineer Salt Lake City, Utah, USA
• Designed scalable ETL pipelines with Azure Data Factory and Databricks for mortgage and credit operations, integrating Change Data Capture techniques for multi-source banking data ingestion to achieve 40% faster loan-risk analytics.
• Enhanced financial data query performance by 30% through the development of Snowflake data warehouses with secure RBAC controls, columnar storage, and row-level security for multi-tenant analytics.
• Boosted underwriting analytics accuracy by 25% by creating OLAP and dimensional data models in Azure Synapse and applying dbt transformations for regulatory compliance and loan risk reporting.
• Optimized real-time fraud detection by building streaming architectures using Apache Kafka and Azure Event Hubs, reducing detection latency by 35% for transaction monitoring and customer behavior analysis.
• Strengthened data privacy compliance by automating Azure SQL data masking with Python-based scripts, reducing manual masking errors by 60% for enhanced PII protection and auditing.
• Enabled real-time customer analytics by integrating MongoDB and Cosmos DB, achieving 50% faster data retrieval for banking applications and personalized financial insights.
CHG Healthcare Aug 2023 - May 2024
AWS Data Engineer Midvale, Utah, USA
• Engineered HIPAA-compliant analytics by designing secure AWS S3 data lakes with Lake Formation and Terraform, utilizing Change Data Capture principles for structured and unstructured healthcare data to achieve 40% faster query responses.
• Improved ETL efficiency by 35% by developing AWS Glue and Lambda-based pipelines with Boto3, incorporating CDC strategies to ensure clinical data integrity and faster patient record processing.
• Enhanced workflow scheduling by orchestrating automated healthcare processes in Apache Airflow for ETL monitoring, error handling, and schema validation of clinical datasets, achieving a 30% improvement in accuracy.
• Reduced query runtime by 45% by implementing dimensional models in Redshift and Snowflake using star and snowflake schemas for medical claims analysis and patient outcome dashboards.
• Enabled real-time patient monitoring with AWS Kinesis and Kafka streaming pipelines, achieving sub-second alert generation for critical healthcare events and device data.
Philips Oct 2021 - Dec 2022
GCP Data Engineer Bangalore, India
• Developed scalable cloud ETL automation with Dataflow and Apache Beam pipelines integrated with Cloud Composer, incorporating CDC techniques to accelerate data delivery by 40% across large-scale IT systems.
• Optimized query performance by 35% through the design of ETL workflows integrating BigQuery, Cloud Storage, and Cloud SQL for efficient data transformation, governance, and analytics.
• Enhanced real-time monitoring accuracy by 30% by implementing Kafka and Google Dataflow streaming solutions, reducing event latency through Redis caching and precise pipeline tuning.
• Reduced infrastructure costs by 25% by migrating on-prem Hadoop clusters to Google Dataproc, optimizing cluster configuration with Hive, Impala, and Phoenix integration.
• Accelerated data analytics by 40% by engineering BigQuery datasets with advanced partitioning, clustering, and SQL optimization for enterprise dashboards and reporting.
Siemens Healthineers Jan 2020 - Sep 2021
Data Engineer Bangalore, India
• Designed and optimized ETL pipelines using Java, Python, and shell scripts under HIPAA compliance, integrating Change Data Capture methodologies to achieve a 40% improvement in healthcare data ingestion.
• Boosted data processing accuracy by 30% by developing PL/SQL packages, stored procedures, and triggers to manage sensitive medical data from diagnostic and imaging systems.
• Increased data pipeline reliability by 35% by automating batch workflows through UNIX scripts, ensuring timely delivery of clinical data to analytical systems and reporting dashboards.
• Enhanced ETL scalability by 40% through the implementation of Informatica PowerCenter workflows that transformed multi-source medical data with advanced data quality and validation rules.
• Reduced integration errors by 25% by designing SOA-based web services for secure data exchange between lab systems, imaging devices, and healthcare applications.
TECHNICAL SKILLS
• Cloud Platforms: AWS (S3, EMR, Redshift, Lambda, Glue, Kinesis, Athena, Data Lake), Azure (Data Lake, Data Factory, Databricks, Azure SQL), GCP (BigQuery)
• Programming & Scripting Languages: Python, Scala, Java, Shell Scripting (Bash), Hibernate, JDBC, JSON, HTML, CSS
• Big Data & Hadoop Ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Impala, ZooKeeper, Flume, Kafka, YARN, PySpark, Airflow, Cloudera Manager, Kerberos, Apache Spark, Change Data Capture, Apache Hudi, Apache Griffin, AWS Deequ
• Databases: Oracle, MySQL, SQL Server, PostgreSQL, Snowflake, Cassandra, MongoDB, HBase
• ETL & Middleware Tools: Talend, Informatica, SSIS, Azure Data Factory, Azure Databricks
• Data Visualization Tools: Tableau, Power BI
• Version Control & Build Tools: Git, Maven, SBT, CBT
• Web/Application Servers: Apache Tomcat, WebLogic, WebSphere
• Operating Systems: Windows, Unix, Linux
• Development Tools & IDEs: Eclipse, Visual Studio, SQL Developer, Azure Data Studio, TOAD, SoapUI, Dreamweaver, SSMS, Teradata SQL Assistant
• CI/CD & Automation: Jenkins, GitHub, SharePoint
• Generative AI & NLP: LLMs (GPT, BERT, T5), Prompt Engineering, Text Generation, Code Generation, Diffusion Models
EDUCATION
SUNY New Paltz, USA
Master's, Computer and Information Sciences