Charan Datti
Charleston, IL
+1-217-***-**** | ************@*****.***
Profile Summary
• Data Engineer with 4+ years of experience designing scalable CDC and ETL frameworks using Debezium, Apache Spark, and AWS to deliver real-time, analytics-ready data.
• Strong background in streaming and batch data pipelines, Spark SQL optimization, and AWS data-lake design leveraging S3, Glue, EMR, and Step Functions.
• Hands-on expertise in building data ingestion systems for relational and non-relational databases while ensuring performance, reliability, and schema evolution.
• Proven success orchestrating workloads in Airflow (MWAA), automating deployments, and reducing pipeline runtime through partition tuning and resource profiling.
• Collaborative engineer skilled in troubleshooting complex Spark jobs, enforcing data quality frameworks, and translating raw CDC data into business insights.
Experience
Stifel Financial Corp Feb 2025 – Present
Data Engineer Decatur, Illinois (Remote)
– Implemented Debezium-based CDC pipelines across Oracle and PostgreSQL to hydrate the enterprise data lake, enabling near real-time analytics for finance teams.
– Developed Spark ETL jobs (batch & streaming) in PySpark to transform transactional data into partitioned Parquet tables stored in AWS S3.
– Optimized Spark SQL queries using broadcast joins and caching, improving job performance by 30% on EMR Serverless clusters.
– Orchestrated complex multi-stage workflows with Apache Airflow (MWAA), integrating Glue Catalog and Step Functions for dependency management.
– Built data-quality checks in Python using AWS Deequ, ensuring CDC feed completeness and integrity across multiple schemas.
– Implemented monitoring and cost-optimization strategies using CloudWatch and Auto-Optimize, cutting pipeline compute costs by 18%.
– Collaborated with analytics and DevOps teams to design CDC ingestion patterns supporting both append-only and merge-based incremental updates.
– Documented data lineage and recovery procedures, improving audit traceability and system resiliency during high-volume ingestion cycles.
ERGO Group Jan 2022 – Nov 2022
Analytics Engineer Mumbai, India
– Designed ETL pipelines in Spark SQL & Python to consolidate policy, claims, and premium data into curated Delta tables for analytical consumption.
– Automated data ingestion from SQL Server to S3 via AWS Glue Jobs and incremental CDC logic, reducing refresh latency by 40%.
– Built streaming dataflows using Spark Structured Streaming to process claim updates in near real time for actuarial monitoring dashboards.
– Applied performance-tuning techniques such as partition pruning, coalescing, and caching to lower job runtime by 25%.
– Implemented Airflow DAGs for daily and event-triggered workflows, increasing process reliability and on-time delivery of datasets.
– Collaborated with data scientists to structure feature sets for risk-modeling pipelines leveraging historical CDC snapshots.
– Ensured regulatory compliance through audit-ready metadata, schema validation, and secure S3 bucket policies.
– Documented ETL logic, dependencies, and data-quality checks to enhance team onboarding and production support.
H&M Jun 2020 – Dec 2021
Data Analyst Mumbai, India
– Developed automated ETL scripts in Python & SQL for integrating sales, logistics, and pricing data into centralized reporting datasets.
– Implemented incremental extraction logic to update daily sales feeds, reducing redundant data pulls by 35%.
– Built transformation jobs in Spark DataFrames for cleansing POS and e-commerce feeds, improving data accuracy across dashboards.
– Created Power BI dashboards visualizing margin trends and stock movement, supporting data-driven retail pricing decisions.
– Optimized SQL queries and indexing strategies to reduce warehouse load time by 28%.
– Partnered with IT teams to migrate reporting workflows to AWS Glue and S3-based storage.
– Defined data validation rules and reconciliation scripts to ensure consistency between transactional and analytics layers.
– Authored process documentation for ingestion & error-handling routines to improve transparency & supportability.
Technical Skills
• Big Data & ETL: Apache Spark (Python/PySpark, Spark SQL, Streaming), Debezium, Airflow (MWAA), AWS Glue, CDC pipelines, ETL frameworks.
• AWS Ecosystem: S3, EMR & EMR Serverless, Lambda, Step Functions, Glue Catalog, CloudWatch, AWS Batch, IAM, Deequ (basic).
• Programming Languages: Python, Java, SQL, Scala (basic), Bash.
• Data Modeling & Governance: Delta Lake, Hudi (familiar), Data Quality Frameworks, Metadata Management, Schema Evolution.
• Visualization & Analytics: Power BI, Tableau, Excel (Pivots, VBA Macros).
• Version Control & CI/CD: GitHub Actions, Azure DevOps, Jenkins, Terraform (basic infrastructure as code).
• Certifications: AWS Certified Data Engineer – Associate (in progress), Azure Data Engineer Associate, Databricks Lakehouse Fundamentals.
Education
Eastern Illinois University Charleston, IL, USA
Master's in Computer Technology