ANJALI JAGDISH TAVHARE
Chicago, IL, USA | 773-***-**** | ***************@*******.*** | LinkedIn | GitHub

SUMMARY
Data Engineer and Analyst with 3 years of experience building batch and streaming pipelines using Azure Synapse, Databricks, Airflow, and Snowflake. Skilled in data lakehouse design across healthcare, manufacturing, and finance domains using Delta Lake, SQL-based dimensional modeling, and Spark-based orchestration. Experienced in CI/CD implementation, pipeline monitoring, and governance using Azure DevOps, Unity Catalog, and logging frameworks. Also supported cross-functional initiatives involving data validation, transformation logic, and BI reporting using Power BI and dbt.

PROFESSIONAL EXPERIENCE
Getinge New Jersey, USA
Data Engineer Intern July 2024 – Present
● Constructed 35+ data pipelines across batch and streaming workloads using Synapse, Data Factory, Databricks, and Airflow, supporting ingestion from 10+ internal sources and cutting weekly maintenance by 12 hours.
● Orchestrated distributed PySpark workflows in Databricks to process 180+ REST API sources, validated using Postman, and delivered structured outputs to Azure Data Lake Storage (ADLS) for analytics teams.
● Built real-time ingestion pipelines using Apache Kafka, configuring 6 producers and 8 consumers across 4 partitions, which enabled near real-time processing of 500K+ log events daily via Kafka Connect.
● Designed a Lakehouse architecture using Delta Lake and Synapse Dedicated/Serverless Pools with Medallion layering, optimizing query performance on 3 million+ curated records through partitioning and indexing.
● Developed 25+ SQL pipelines across SSMS and PostgreSQL (DBeaver), incorporating stored procedures and entity models to synchronize the Gold Layer with downstream applications and regulatory reports.
● Implemented metadata governance with Unity Catalog and Hive Metastore, introducing lineage tracking and validation rules that reduced average root cause analysis from 6 hours to under 2 hours.
● Deployed modular Airflow DAGs to automate Databricks jobs and Synapse workflows, integrating simulated AWS S3 storage via MinIO and ensuring consistency in CI/CD releases through Azure DevOps.

Dell Technologies India
Data Engineer / Analyst July 2022 – August 2023
● Created 15+ automated ETL pipelines using Python, SQL, and Apache Airflow to ingest data from 10+ REST APIs and 6 external CSV log sources into Snowflake and Redshift, reducing manual ingestion cycles by 4 hours/week.
● Migrated 7 legacy reporting workflows to AWS Glue and Lambda, decreasing job runtime from 28 to 9 minutes and eliminating 20+ recurring QA issues through event-triggered monitoring.
● Modeled 12+ star schema datasets incorporating SCD Type-2 logic to support attribution analysis for 5 regional campaigns, enabling accurate reporting across 3 internal dashboards and 2 executive reviews.
● Integrated Snowpipe and leveraged Snowsight to automate continuous data ingestion from S3 into Snowflake tables, streamlining near real-time analytics and supporting 3 high-traffic dashboards used in campaign performance reporting.
● Engineered data validation checks and pre-aggregation logic using dbt, supporting 6 BI pipelines and reducing dashboard load latency by 12 seconds across 3 key stakeholder dashboards.
● Configured alerting and SLA monitoring with SNS and custom logs, improving reliability of 4 mission-critical workflows and flagging anomalies within 5 minutes of failure events.
● Integrated predictive model outputs for customer churn into 2 production pipelines, aligning outputs with Power BI dashboards used by 4 business units and supporting weekly retention review cycles.

Adani Group India
Data Analyst Intern May 2021 – July 2022
● Architected structured workflows in Python (pandas) to extract, clean, and standardize raw data from 8+ source systems, reducing manual preparation by 6 hours per week for recurring reports.
● Optimized complex SQL queries in PostgreSQL to support 12+ business dashboards, decreasing data retrieval time by over 20 seconds per query.
● Provisioned reusable data validation scripts using Python to process 30,000+ production records daily, ensuring accuracy before visualization and executive review.
● Collaborated with BI developers and business analysts to define KPIs and align data models across 3 operational departments, improving cross-functional reporting consistency.
● Documented data dictionaries, table joins, and validation logic in Confluence, enabling faster onboarding for 2 new analysts and supporting smoother audit handoffs.
● Designed and deployed Data Factory Mapping Data Flows for 5+ pipelines ingesting energy production data, enabling trigger-based transformations and supporting scale-up across 3 departments.

TECHNICAL SKILLS
Programming: Python, SQL, PySpark
Big Data & Processing: Apache Spark, Delta Lake, Kafka, Databricks
Cloud Platforms: Azure (ADF, Synapse, ADLS, DevOps, Key Vault), AWS (S3, Glue, Lambda, Redshift), Snowflake
ETL & Orchestration: Airflow, dbt, Azure Data Factory, SSIS, Informatica
Data Modeling: Star Schema, Snowflake Schema, Medallion Architecture, SCD Types, Partitioning
Monitoring & Governance: Unity Catalog, Great Expectations, Logging, Alerting
Visualization: Power BI, Tableau
DevOps & Version Control: Git, Azure DevOps, GitHub Actions, Docker
APIs & Automation: REST APIs, Postman, CI/CD
Machine Learning: scikit-learn, MLflow, Feature Engineering, Model Deployment
Additional: Distributed System Design, Stakeholder Communication, Agile (JIRA, Confluence)

PROJECTS
Automated ETL Pipeline for Music Streaming Analytics Using Airflow Link
● Orchestrated 5 scalable ETL pipelines using Apache Airflow, automating ingestion and transformation of 2K+ daily S3-based JSON/CSV logs into Redshift, reducing 12+ hours/week of manual effort.
● Engineered 3 custom Airflow operators with embedded integrity checks, stabilizing failure-prone tasks and reducing QA review cycles by over 18 hours/month across 4 dependent teams.
Azure-Based Clinical Data Lakehouse & Analytics Platform Link
● Integrated over 1 million EHR, EMR, and claims records using Azure Data Factory and Databricks, implemented in a HIPAA-compliant Medallion Architecture supporting 120+ reporting dashboards.
● Enabled pipeline observability via Spark logs, alerting triggers, and validation checks, resolving 8 recurring pipeline failures and improving downstream dashboard reliability for 3 clinical units.

Sales Insights & Forecasting Dashboard
● Established a centralized reporting system using SQL Server, Power BI, and Azure Synapse, processing 1M+ sales transactions weekly to deliver sub-5 second response times across 5 executive dashboards.
● Executed KPI monitoring and trend forecasting using DAX measures, custom visuals, and row-level security, identifying 25+ sales anomalies/month and enabling regional teams to respond before quarterly impact.

EDUCATION
Illinois Institute of Technology USA
Master of Applied Science in Data Science May 2025
Dr. Babasaheb Ambedkar Technological University India
Bachelor of Technology in Computer Engineering May 2022
Maharashtra State Board of Technical Education India
Diploma in Computer Engineering May 2019