SAI SAHITYA
Data Engineer
*************@*****.*** 224-***-**** Chicago, IL
Data Engineer with 4+ years of experience designing and optimizing large-scale data pipelines and cloud infrastructure across Azure and AWS environments. Proven expertise in building scalable ETL/ELT workflows using Azure Data Factory, Databricks, and Synapse Analytics, and developing real-time and batch processing solutions with Apache Spark, PySpark, and Flink. Skilled in data modeling, data lake architecture, and workflow orchestration using Apache Airflow and Terraform. Strong command of SQL, Python, and Power BI to deliver clean, accurate, and actionable insights.
PROFESSIONAL EXPERIENCE
Data Engineer Northern Trust Jan 2024 – Present
Engineered scalable ETL and ELT workflows using Azure Data Factory, Databricks, and Synapse Analytics to unify diverse data sources across the enterprise.
Designed and optimized PySpark-based data pipelines to process high-volume structured and unstructured datasets with reduced latency.
Executed cloud migration of on-premise data systems to Azure-native platforms, increasing processing throughput by 30% and reducing infrastructure overhead.
Built hybrid data processing solutions utilizing Azure Stream Analytics and Blob Storage for both real-time and batch workloads.
Automated deployment workflows and release cycles using Azure DevOps (ADO), enhancing code reliability and deployment speed.
Partnered with cross-functional stakeholders to translate business requirements into reliable data models, ensuring alignment with project goals and delivery timelines.
Azure Data Engineer Illinois State Treasurer's Office Aug 2022 – Oct 2023
Developed custom SQL stored procedures and views to support data validation, transformation, and downstream analytics across multiple financial systems.
Conducted data quality assessments and implemented automated anomaly detection scripts to improve data integrity and reduce reporting errors by 40%.
Designed metadata-driven frameworks to manage dynamic ingestion logic for varying file structures in ADLS Gen2.
Supported Power BI data modeling efforts by creating optimized DAX measures and calculated columns for executive-level dashboards.
Implemented role-based access control (RBAC) policies and managed user permissions across Azure resources to ensure secure data handling.
Collaborated with compliance teams to document data flow diagrams, lineage, and control measures as part of annual audit readiness.
Participated in Agile ceremonies and sprint planning to align development tasks with treasury reporting and modernization objectives.
Cloud Engineer Sri Sai Infra Developers Pvt. Ltd. Aug 2020 – Nov 2021
Spearheaded the migration of 15+ legacy servers and databases to Azure Cloud, raising system uptime to 99.5% and reducing deployment failures by 35%.
Streamlined ETL operations using Azure Data Factory for over 20 data sources, reducing processing time by 40% and enabling near real-time data availability.
Created 10+ interactive Power BI dashboards using optimized SQL views, providing real-time visibility into 30+ infrastructure KPIs and improving executive decision-making.
Developed and scheduled 25+ Apache Airflow DAGs to automate complex ETL pipelines, enhancing workflow visibility and reducing manual intervention by 60%.
Configured Azure Monitor and Log Analytics across 50+ resources to proactively track system performance, cutting incident resolution time by 45%.
Deployed secure and scalable infrastructure by provisioning 20+ Azure Virtual Networks and Storage Accounts and managing RBAC for 100+ users.
PROJECTS
Data Pipeline Automation using AWS Services
Car Price Prediction using SAS
EDUCATION
MS in Management Information Systems, University of Illinois, Springfield, USA Jan 2022 – Oct 2023
CERTIFICATES
DP-203 Azure Data Engineer Associate
Azure Databricks Data Engineer Associate
TECHNICAL SKILLS
Programming: Python, Java, Scala, SQL, PySpark
Data Processing & Frameworks: Apache Spark, Flink, Hadoop MapReduce
Big Data & Databases: Azure Synapse, Snowflake, Hive, Cassandra, PostgreSQL, SQL Server
Cloud Platforms: Azure (Data Factory, ADLS Gen2, Databricks), AWS (S3, Redshift, Lambda)
Workflow Orchestration & Tools: Apache Airflow, Terraform, Kubernetes, Git, Jenkins
Data Visualization & Reporting: Power BI, Tableau
API & Messaging: RESTful APIs, MQTT
Development Methodologies: Agile, Scrum