JAYANTH KUMAR KOTNI
Dallas, TX | +1-469-***-**** | *************@*****.***
Data Engineer
PROFESSIONAL SUMMARY
Highly skilled Data Engineer with 5 years of experience building data lake pipelines and real-time streaming solutions using Apache Spark, PySpark, Java, Scala, and AWS Cloud. Proven expertise in Change Data Capture (CDC), Debezium, and data lake hydration for enterprise analytics. Hands-on experience with Apache Airflow, AWS Glue, EMR, Step Functions, and Lambda (Python). Adept at designing and optimizing ETL/ELT workflows, Spark DataFrame transformations, and performance tuning. Strong foundation in big data concepts, data modeling, and orchestration of large-scale batch and streaming pipelines.
PROFESSIONAL EXPERIENCE
Data Engineer Kroger, Dallas, TX Jan 2024 – Present
• Designed and deployed CDC pipelines using Debezium, integrating multi-source databases into AWS data lakes.
• Built high-throughput Spark ETL pipelines for both batch and streaming data using PySpark and Scala (see the sketch after this list).
• Orchestrated end-to-end workflows with Apache Airflow and AWS Step Functions for automated scheduling and recovery.
• Developed and managed AWS Lambda (Python) functions for event-driven processing and CDC stream enrichment.
• Implemented data quality validation using Apache Griffin and AWS Glue Data Catalog.
• Managed AWS EMR clusters, tuning Spark jobs for performance and cost efficiency.
• Collaborated with business and analytics teams to deliver reliable, analytics-ready datasets across data lake zones.
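A minimal, illustrative sketch of the kind of Debezium-to-lake streaming pipeline described above. The broker address, Kafka topic, and S3 paths are assumed placeholders, and the Debezium envelope is simplified; a production job would use table-specific schemas and a downstream merge into curated zones.
```python
# Requires the spark-sql-kafka connector on the job's classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("cdc-lake-hydration").getOrCreate()

# Simplified Debezium envelope; real schemas are table-specific.
envelope = StructType([
    StructField("op", StringType()),     # c=create, u=update, d=delete
    StructField("after", StringType()),  # row image after the change
    StructField("ts_ms", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # assumed broker
       .option("subscribe", "dbserver1.inventory.orders")  # assumed topic
       .load())

changes = (raw.selectExpr("CAST(value AS STRING) AS json")
           .select(from_json(col("json"), envelope).alias("e"))
           .select("e.op", "e.after", "e.ts_ms"))

# Append micro-batches to the raw zone; a downstream job merges into curated.
(changes.writeStream
 .format("parquet")
 .option("path", "s3a://example-lake/raw/orders/")              # assumed bucket
 .option("checkpointLocation", "s3a://example-lake/chk/orders/")
 .start()
 .awaitTermination())
```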
Data Engineer Accenture (Client: Axis Bank), Bengaluru, India Jul 2021 – Dec 2023
• Engineered and maintained Spark streaming and batch ETL pipelines for high-volume banking data.
• Developed scalable CDC frameworks for relational databases and hydrated AWS data lakes.
• Wrote PySpark and Scala transformations integrated into AWS Glue and EMR Serverless jobs.
• Leveraged Apache Airflow to orchestrate data workflows and automate error alerts (see the DAG sketch after this list).
• Deployed Lambda functions (Python) for event-driven ETL triggers and pipeline automation.
• Implemented Apache Griffin for data-quality validation and anomaly detection across CDC streams.
• Partnered with DevOps teams to enhance CI/CD pipelines using Jenkins and Terraform.
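A minimal, illustrative Airflow DAG in the spirit of the orchestration and failure alerting described above. The DAG id, schedule, and alert address are assumptions; email alerts presuppose a configured SMTP connection, and the `schedule` keyword assumes Airflow 2.4+.
```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_and_transform():
    # Placeholder for the PySpark/Glue submission logic.
    print("running CDC batch transform")

with DAG(
    dag_id="banking_cdc_batch",  # assumed name
    start_date=datetime(2023, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "email": ["data-alerts@example.com"],  # assumed alert address
        "email_on_failure": True,
        "retries": 2,
    },
) as dag:
    PythonOperator(
        task_id="extract_and_transform",
        python_callable=extract_and_transform,
    )
```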
Data Analyst BEPEC Solutions, Hyderabad, India Jan 2019 – Jun 2021
• Developed automated ETL pipelines using Python, SQL, and Airflow for business analytics.
• Built and optimized Spark DataFrames and streaming jobs for real-time data ingestion.
• Managed schema and metadata through AWS Glue Data Catalog and S3 bucket lifecycle rules.
• Created Power BI and Tableau dashboards for near-real-time business metrics.
• Supported data lake hydration and incremental updates using AWS Batch and Step Functions (see the trigger sketch after this list).
• Tuned SQL queries and Spark jobs for faster execution and reduced data-processing costs.
• Maintained data integrity and lineage through Griffin-based validation and documentation.
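A minimal, illustrative Lambda-style handler for the event-driven incremental updates described above: a new S3 object starts a Step Functions execution. The state-machine ARN is an assumed placeholder, and the handler presumes the Lambda is wired to the bucket's s3:ObjectCreated notifications.
```python
import json

import boto3

sfn = boto3.client("stepfunctions")

# Assumed placeholder ARN for the lake-hydration state machine.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:lake-hydration"

def handler(event, context):
    # Each S3 event record names one newly landed object to process incrementally.
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        sfn.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps({"object_key": key}),
        )
```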
TECHNICAL SKILLS
Programming & Frameworks: Java, Python, Scala, PySpark, SQL, Shell Scripting
Big Data & Streaming: Apache Spark (DataFrames, SQL, Streaming), Kafka, Debezium, Hudi, Griffin
ETL & Orchestration: Apache Airflow, AWS Step Functions, AWS Batch, CDC, Glue, Lambda (Python)
Cloud Platforms: AWS (S3, EMR, Glue, Lambda, MWAA), Azure (ADF, Synapse), GCP (BigQuery)
Databases: Oracle, PostgreSQL, MySQL, MongoDB
DevOps Tools: Jenkins, Terraform, Docker, GitHub
Visualization: Power BI, Tableau, Microsoft Office Suite
Concepts: Data Lake Architecture, Data Modeling, Performance Tuning, Data Hydration
EDUCATION
Master of Science (M.S.) in Data Engineering
University of North Texas (UNT), Denton, TX — Graduated Dec 2024