JAYANTH KUMAR KOTNI
Dallas, TX | +1-469-***-**** | *************@*****.***
Data Engineer
PROFESSIONAL SUMMARY
Highly skilled Data Engineer with 5 years of experience building data lake pipelines and real-time streaming solutions using Apache Spark, PySpark, Java, Scala, and AWS Cloud. Proven expertise in Change Data Capture (CDC), Debezium, and data lake hydration for enterprise analytics. Hands-on experience with Apache Airflow, AWS Glue, EMR, Step Functions, and Lambda (Python). Adept at designing and optimizing ETL/ELT workflows, Spark DataFrame transformations, and performance tuning. Strong foundation in big data concepts, data modeling, and orchestration of large-scale batch and streaming pipelines.
PROFESSIONAL EXPERIENCE
Data Engineer Kroger, Dallas, TX Jan 2024 – Present
• Designed and deployed CDC pipelines using Debezium, integrating multi-source databases into AWS data lakes.
• Built high-throughput Spark ETL pipelines for both batch and streaming data using PySpark and Scala (see the sketch after this list).
• Orchestrated end-to-end workflows with Apache Airflow and AWS Step Functions for automated scheduling and recovery.
• Developed and managed AWS Lambda (Python) functions for event-driven processing and CDC stream enrichment.
• Implemented data quality validation using Apache Griffin and AWS Glue Data Catalog.
• Managed AWS EMR clusters, tuning Spark jobs for performance and cost efficiency.
• Collaborated with business and analytics teams to deliver reliable, analytics-ready datasets across data lake zones.
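A minimal, illustrative sketch of the kind of Debezium-to-lake streaming pipeline described above. The broker address, Kafka topic, and S3 paths are assumed placeholders, and the Debezium envelope is simplified; a production job would use table-specific schemas and a downstream merge into curated zones.
```python
# Requires the spark-sql-kafka connector on the job's classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-lake-hydration").getOrCreate()

# Simplified Debezium envelope; real schemas are table-specific.
envelope = StructType([
    StructField("op", StringType()),     # c=create, u=update, d=delete
    StructField("after", StringType()),  # row image after the change
    StructField("ts_ms", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
       .option("subscribe", "dbserver1.inventory.orders")  # assumed topic
       .load())

changes = (raw.selectExpr("CAST(value AS STRING) AS json")
           .select(from_json(col("json"), envelope).alias("e"))
           .select("e.op", "e.after", "e.ts_ms"))

# Append micro-batches to the raw zone; a downstream job merges into curated.
(changes.writeStream
 .format("parquet")
 .option("path", "s3a://example-lake/raw/orders/")              # assumed bucket
 .option("checkpointLocation", "s3a://example-lake/chk/orders/")
 .start()
 .awaitTermination())
```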
Data Engineer Accenture (Client: Axis Bank), Bengaluru, India Jul 2021 – Dec 2023
• Engineered and maintained Spark streaming and batch ETL pipelines for high-volume banking data.
• Developed scalable CDC frameworks for relational databases and hydrated AWS data lakes.
• Wrote PySpark and Scala transformations integrated into AWS Glue and EMR Serverless jobs.
• Leveraged Apache Airflow to orchestrate data workflows and automate error alerts (see the DAG sketch after this list).
• Deployed Lambda functions (Python) for event-driven ETL triggers and pipeline automation.
• Implemented Apache Griffin for data-quality validation and anomaly detection across CDC streams.
• Partnered with DevOps teams to enhance CI/CD pipelines using Jenkins and Terraform.
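A minimal, illustrative Airflow DAG in the spirit of the orchestration and failure alerting described above. The DAG id, schedule, and alert address are assumptions; email alerts presuppose a configured SMTP connection, and the `schedule` keyword assumes Airflow 2.4+.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_transform():
    # Placeholder for the PySpark/Glue submission logic.
    print("running CDC batch transform")

with DAG(
    dag_id="banking_cdc_batch",  # assumed name
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "email": ["data-alerts@example.com"],  # assumed alert address
        "email_on_failure": True,
        "retries": 2,
    },
) as dag:
    PythonOperator(
        task_id="extract_and_transform",
        python_callable=extract_and_transform,
    )
```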
Data Analyst BEPEC Solutions, Hyderabad, India Jan 2019 – Jun 2021
• Developed automated ETL pipelines using Python, SQL, and Airflow for business analytics.
• Built and optimized Spark DataFrames and streaming jobs for real-time data ingestion.
• Managed schema and metadata through AWS Glue Data Catalog and S3 bucket lifecycle rules.
• Created Power BI and Tableau dashboards for near-real-time business metrics.
• Supported data lake hydration and incremental updates using AWS Batch and Step Functions (see the trigger sketch after this list).
• Tuned SQL queries and Spark jobs for faster execution and reduced data-processing costs.
• Maintained data integrity and lineage through Griffin-based validation and documentation.
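A minimal, illustrative Lambda-style handler for the event-driven incremental updates described above: a new S3 object starts a Step Functions execution. The state-machine ARN is an assumed placeholder, and the handler presumes the Lambda is wired to the bucket's s3:ObjectCreated notifications.
```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Assumed placeholder ARN for the lake-hydration state machine.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:lake-hydration"

def handler(event, context):
    # Each S3 event record names one newly landed object to process incrementally.
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({"object_key": key}),
        )
```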
TECHNICAL SKILLS
Programming & Frameworks: Java, Python, Scala, PySpark, SQL, Shell Scripting
Big Data & Streaming: Apache Spark (DataFrames, SQL, Streaming), Kafka, Debezium, Hudi, Griffin
ETL & Orchestration: Apache Airflow, AWS Step Functions, AWS Batch, CDC, Glue, Lambda (Python)
Cloud Platforms: AWS (S3, EMR, Glue, Lambda, MWAA), Azure (ADF, Synapse), GCP (BigQuery)
Databases: Oracle, PostgreSQL, MySQL, MongoDB
DevOps Tools: Jenkins, Terraform, Docker, GitHub
Visualization: Power BI, Tableau, Microsoft Office Suite
Concepts: Data Lake Architecture, Data Modeling, Performance Tuning, Data Hydration
EDUCATION
Master of Science (M.S.) in Data Engineering
University of North Texas (UNT), Denton, TX — Graduated Dec 2024