
Senior Data Engineer (AWS/Azure) - Real-Time & Analytics Orchestration

Location: Chicago, IL
Posted: February 26, 2026


Resume:

Shanmukh Sai Madhu

Data Engineer

Chicago, IL, USA | +1-501-***-**** | ***********@*****.*** | LinkedIn | GitHub

SUMMARY

• Data Engineer with 5 years of experience across finance, AI, and public sector projects, specializing in real-time pipelines, ETL orchestration, and cloud analytics on AWS and Azure.

• Build and optimize pipelines using ADF, Databricks, PySpark, Kafka, Airflow, and dbt, processing 25 million+ interactions and high-volume transactional data.

• Manage Snowflake and Synapse environments with strong skills in schema design, SCD patterns, partitioning, and performance tuning for large-scale dashboards.

• Improve governance and reliability using Great Expectations, Purview, and data lineage workflows, supporting 12+ audits and reducing reconciliation effort.

• Develop analytics assets with Power BI, Tableau, Pandas, and NumPy, driving insights for operations, reporting, and community-focused initiatives.

WORK EXPERIENCE

JP Morgan Chase & Co., Chicago, IL | February 2025 – Present
Data Engineer

• Built real-time streaming pipelines using Kafka (MSK), AWS S3, and EMR, processing 25 million+ financial transactions daily to modernize fraud monitoring across lines of business.

• Orchestrated batch ingestion pipelines via Azure Data Factory and Apache Airflow, aligning historical refresh cycles with real-time streams for consistent analytics.

• Refactored legacy PySpark processes into dbt models on Databricks, reducing pipeline failures and cutting daily data refresh time from 2 hours to under 30 minutes.

• Designed Snowflake and Redshift data models using star schema and data vault patterns, delivering 200+ standardized KPIs across fraud, treasury, and wealth use cases.

• Embedded data quality and lineage validation with Great Expectations, Lake Formation, and Azure Purview, supporting 12+ successful audit reviews with zero discrepancies.

• Streamlined CI/CD releases using Azure DevOps with containerized Spark runtimes, reducing deployment cycles from 5 hours to under 1 hour across Dev, UAT, and Prod.

• Developed AI agents using Snowflake Cortex to automate fraud-risk scoring, validate KPIs, and summarize anomalies, reducing analyst review time by 35%.

• Automated multi-cloud infrastructure provisioning via Terraform, eliminating environment drift across AWS and Azure resources.

Florida Data Science for Social Good, Jacksonville, FL | June 2024 – August 2024
Data Science Intern

• Integrated 67 county-level census datasets into a Databricks Lakehouse, using Unity Catalog for governance and collaborating with civic partners on standardized data access.

• Standardized and deduplicated 5M+ population records with PySpark, reducing data preparation workload by 120 staff hours per refresh cycle.

• Supported predictive model development using Python, Pandas, and scikit-learn to identify vulnerable communities and guide public resource allocation.

• Built Tableau dashboards and clustering models to surface 150+ undercount-prone neighborhoods, informing targeted outreach strategies for nonprofit agencies.

University of South Dakota, Vermillion, SD | August 2023 – May 2024
Research Assistant

• Engineered machine learning models for breast cancer and pneumonia detection using TensorFlow, PyTorch, and scikit-learn, improving accuracy on 12k+ medical images.

• Processed and transformed large-scale imaging datasets in Python, applying feature extraction and augmentation techniques, shortening model training cycles by 72 hours.

• Integrated Hugging Face transformers and LLM methods to combine imaging and text data, improving multi-modal analysis across 5 clinical studies.

• Built statistical analysis and visualization pipelines with Seaborn and Matplotlib, producing 15 research reports and mentoring students on reproducible ML workflows.

• Implemented a medical RAG workflow that retrieved clinical notes and literature to contextualize model predictions, improving interpretability and supporting research presentations.

Purecode Software Inc, India | November 2022 – July 2023
Software Engineer / Data Engineer

• Streamlined ingestion workflows with Azure Data Factory and Databricks, collaborating with ML engineers to prepare 5TB+ AI training datasets, eliminating 200+ hours of manual intervention per quarter.

• Processed interaction data in ADLS Gen2 using PySpark joins, filters, and time-window aggregations, partnering with product managers to enable personalization features for 2M+ active users.

• Orchestrated data and AI pipelines in Apache Airflow with dependencies, retries, SLAs, and alerts to maintain reliable production runs.

• Built a centralized Snowflake warehouse with SQL models, enabling 30+ features and reducing reporting time by 3 hours.

• Designed AI workflows by integrating model inference, vector embeddings, and semantic search into production pipelines to support recommendation and retrieval-based use cases.

• Deployed containerized pipelines on AKS with Azure DevOps CI/CD, coordinating with DevOps teams to streamline promotions across 4 environments, reducing deployment time from 6 hours to under 1 hour.

iMerit Technology Services Pvt. Ltd., India | November 2020 – November 2022
ETL Developer / ITES

• Built ADF orchestration flows linking S3 and ADLS Gen2 to load 10TB+ into Synapse Analytics, reducing prep time by 15 hours per cycle.

• Developed PySpark pipelines scheduled through ADF, handling 500K+ NLP and image-classification records per day across S3 and ADLS.

• Containerized ETL pipelines with Docker and deployed to AKS via Azure DevOps, reducing rollback issues by 12 per quarter.

• Built Power BI dashboards visualizing 20+ SLA metrics and KPIs, helping project managers accelerate issue detection by 2 days per release.

• Standardized metadata, schema evolution, versioning, and ETL best practices to support audit-ready workflows across 8 reviews.

Dell Technologies, India | August 2019 – August 2020
Data Analyst Intern

• Built ADF pipelines to extract and load 100K+ records from SQL Server and Oracle, improving data availability for reporting.

• Developed SQL and Python scripts for data cleaning and validation, increasing accuracy of recurring operational reports.

• Created Power BI dashboards with 10+ KPIs, helping sales and product teams track performance trends.

SKILLS

• Programming & Scripting: Python, SQL, R, JavaScript, Bash, Shell Scripting

• Data Engineering & Big Data: Apache Spark (PySpark), Airflow, Kafka, dbt, Delta Lake, ETL/ELT, Schema Evolution, Slowly Changing Dimensions (SCD), Partitioning, Performance Tuning, Data Lineage, Great Expectations

• Cloud & Data Platforms: Azure (Data Factory, Databricks, Synapse, ADLS, Azure SQL, Functions, Event Hub, AKS, Purview, Key Vault, DevOps), AWS (S3, EMR, Glue Catalog, MSK, Lambda, Redshift), Snowflake, GCP

• AI Tools & Frameworks: Hugging Face Transformers, TensorFlow, PyTorch, Scikit-learn, LangChain, Vector Databases (FAISS, Chroma), OpenAI API, RAG Pipelines, Prompt Engineering, Model Serving on AKS/SageMaker

• Databases & Storage: SQL Server, T-SQL, Oracle, PostgreSQL, MySQL, MongoDB, Parquet, ORC, Avro, JSON, CSV

• Visualization & Analysis: Power BI, Tableau, QuickSight, Pandas, NumPy

• DevOps & Tools: GitHub, Docker, Jenkins, Linux CLI, Agile (Scrum, Kanban), Jira, Infrastructure as Code (Terraform) EDUCATION

University of South Dakota, Vermillion, SD | August 2023 – December 2024
Master's in Computer Science

Jawaharlal Nehru Technological University, India | June 2016 – December 2020
Bachelor of Technology in Computer Science and Engineering

PROJECTS

Data Pipeline Implementation for Retail Analytics

• Built an end-to-end data pipeline using ADF, ADLS, and Synapse to process sample retail sales data and generate Power BI dashboards.

Text Mining on Electronic Health Records for Cancer Care

• Developed Python-based NLP scripts to clean and analyze open-source EHR text data, applying Random Forest and SVM models to explore cancer-related term patterns.

Crypto Tracker

• Created a React web app using Material UI and the CoinGecko API to display real-time cryptocurrency prices and trends.

CERTIFICATIONS

• SnowPro Associate: Platform Certification

• HackerRank SQL Advanced Certificate


