Charan Datti
Charleston, IL
+1-217-***-**** | ************@*****.***
Profile Summary
• Data Engineer with 4+ years of experience designing scalable CDC and ETL frameworks using Debezium, Apache Spark, and AWS to deliver real-time, analytics-ready data.
• Strong background in streaming and batch data pipelines, Spark SQL optimization, and AWS data-lake design leveraging S3, Glue, EMR, and Step Functions.
• Hands-on expertise in building data ingestion systems for relational and non-relational databases while ensuring performance, reliability, and schema evolution.
• Proven success orchestrating workloads in Airflow (MWAA), automating deployments, and reducing pipeline runtime through partition tuning and resource profiling.
• Collaborative engineer skilled in troubleshooting complex Spark jobs, enforcing data quality frameworks, and translating raw CDC data into business insights.
Experience
Stifel Financial Corp Feb 2025 – Present
Data Engineer Decatur, Illinois (Remote)
– Implemented Debezium-based CDC pipelines across Oracle and PostgreSQL to hydrate the enterprise data lake, enabling near real-time analytics for finance teams.
– Developed Spark ETL jobs (batch & streaming) in PySpark to transform transactional data into partitioned Parquet tables stored in AWS S3.
– Optimized Spark SQL queries using broadcast joins and caching, improving job performance by 30% on EMR Serverless clusters.
– Orchestrated complex multi-stage workflows with Apache Airflow (MWAA), integrating Glue Catalog and Step Functions for dependency management.
– Built data-quality checks in Python using AWS Deequ, ensuring CDC feed completeness and integrity across multiple schemas.
– Implemented monitoring and cost-optimization strategies using CloudWatch and Auto-Optimize, cutting pipeline compute costs by 18%.
– Collaborated with analytics and DevOps teams to design CDC ingestion patterns supporting both append-only and merge-based incremental updates.
– Documented data lineage and recovery procedures, improving audit traceability and system resiliency during high-volume ingestion cycles.
ERGO Group Jan 2022 – Nov 2022
Analytics Engineer Mumbai, India
– Designed ETL pipelines in Spark SQL & Python to consolidate policy, claims, and premium data into curated Delta tables for analytical consumption.
– Automated data ingestion from SQL Server to S3 via AWS Glue Jobs and incremental CDC logic, reducing refresh latency by 40%.
– Built streaming dataflows using Spark Structured Streaming to process claim updates in near real time for actuarial monitoring dashboards.
– Applied performance-tuning techniques such as partition pruning, coalescing, and caching to lower job runtime by 25%.
– Implemented Airflow DAGs for daily and event-triggered workflows, increasing process reliability and on-time delivery of datasets.
– Collaborated with data scientists to structure feature sets for risk-modeling pipelines leveraging historical CDC snapshots.
– Ensured regulatory compliance through audit-ready metadata, schema validation, and secure S3 bucket policies.
– Documented ETL logic, dependencies, and data-quality checks to enhance team onboarding and production support.
H&M Jun 2020 – Dec 2021
Data Analyst Mumbai, India
– Developed automated ETL scripts in Python & SQL for integrating sales, logistics, and pricing data into centralized reporting datasets.
– Implemented incremental extraction logic to update daily sales feeds, reducing redundant data pulls by 35%.
– Built transformation jobs in Spark DataFrames for cleansing POS and e-commerce feeds, improving data accuracy across dashboards.
– Created Power BI dashboards visualizing margin trends and stock movement, supporting data-driven retail pricing decisions.
– Optimized SQL queries and indexing strategies to reduce warehouse load time by 28%.
– Partnered with IT teams to migrate reporting workflows to AWS Glue and S3-based storage.
– Defined data validation rules and reconciliation scripts to ensure consistency between transactional and analytics layers.
– Authored process documentation for ingestion & error-handling routines to improve transparency & supportability.
Technical Skills
• Big Data & ETL: Apache Spark (Python/PySpark, Spark SQL, Streaming), Debezium, Airflow (MWAA), AWS Glue, CDC pipelines, ETL frameworks.
• AWS Ecosystem: S3, EMR & EMR Serverless, Lambda, Step Functions, Glue Catalog, CloudWatch, AWS Batch, IAM, Deequ (basic).
• Programming Languages: Python, Java, SQL, Scala (basic), Bash.
• Data Modeling & Governance: Delta Lake, Hudi (familiar), Data Quality Frameworks, Metadata Management, Schema Evolution.
• Visualization & Analytics: Power BI, Tableau, Excel (Pivots, VBA Macros).
• Version Control & CI/CD: GitHub Actions, Azure DevOps, Jenkins, Terraform (basic infrastructure as code).
• Certifications: AWS Certified Data Engineer – Associate (in progress), Azure Data Engineer Associate, Databricks Lakehouse Fundamentals.
Education
Eastern Illinois University Charleston, IL, USA
Master's in Computer Technology