Sai Deekshith Chinthalwar
Data Engineer
360-***-**** | **************@*****.*** | linkedin.com/in/sai-deekshith02/
SUMMARY
Data Engineer with 5+ years of experience designing, developing, and optimizing scalable, cloud-based data pipelines and analytics platforms across the healthcare, finance, and software industries. Skilled in the Azure and AWS ecosystems, with expertise in ETL/ELT workflows, Spark, Python, SQL, and big data processing. Experienced in DataOps, MLOps, and machine learning pipeline integration, with a focus on automation, scalability, and reliability. Proven ability to reduce ETL runtimes by up to 30% and process multi-terabyte datasets daily. Seeking a Data Engineer role focused on building analytics-ready data systems that enable data-driven decision-making.
TECHNICAL SKILLS
Cloud Platforms: Azure (Data Factory, Databricks, Synapse, ADLS, Key Vault), AWS (Glue, S3, EMR, Lambda, Redshift, RDS, DynamoDB)
Big Data & Processing: Apache Spark, Hadoop, Kafka, Delta Lake, Snowflake
Programming & Scripting: Python, PySpark, SQL, Scala, Shell scripting
ETL/ELT & Orchestration: Azure Data Factory, AWS Glue, Apache Airflow, dbt, SSIS, Oozie
Data Modeling & Storage: Star Schema, Snowflake Schema, Dimensional Modeling, Data Lakehouse Architecture, Cosmos DB
DevOps & Monitoring: Git, Azure DevOps, CI/CD, Terraform, CloudWatch, Datadog, Prometheus
Analytics & Visualization: Power BI, Tableau, Alteryx
AI/ML Integration: MLOps, Machine Learning pipelines, Model Deployment, Data Preparation for ML
PROJECT HIGHLIGHTS & ACHIEVEMENTS
Built cloud-native data lakehouse architectures integrating Delta Lake, Databricks, and Snowflake for large-scale analytics.
Developed machine learning data pipelines enabling model training and inference within production workflows.
Implemented CDC (Change Data Capture) and incremental loading strategies for high data availability.
Reduced ETL processing time by up to 30% through query tuning, partitioning, and caching strategies.
Automated CI/CD deployments ensuring high-quality, repeatable releases across multiple environments.
Contributed to DataOps and MLOps initiatives, improving model deployment and monitoring.
Delivered secure, compliant, and scalable data platforms across Azure and AWS ecosystems.
PROFESSIONAL EXPERIENCE
Data Engineer – Medline Industries | Northfield, IL | July 2024 – Present
Designed and implemented Azure Data Factory (ADF) pipelines to ingest, transform, and load multi-terabyte healthcare supply chain data from SQL Server, REST APIs, and flat files into Azure Data Lake Storage (ADLS Gen2).
Built real-time data streaming pipelines using Azure Event Hubs and Databricks Structured Streaming, reducing data latency by 40% for operational reporting.
Developed PySpark-based ETL workflows in Azure Databricks, implementing partitioning, caching, and incremental loads using Delta Lake to optimize performance and reliability.
Created data validation and reconciliation frameworks comparing record counts and transformation outputs across Cosmos DB, Synapse Analytics, and ADLS, ensuring data integrity.
Integrated Azure Key Vault for secure credential management and enforced RBAC for all pipelines, enhancing security and compliance across datasets.
Collaborated with business and analytics teams to define KPIs and design Power BI dashboards, enabling near real-time insights into healthcare supply chain operations.
Developed metadata-driven pipelines in ADF, reducing manual intervention and enabling automated schema evolution for changing source systems.
Built Azure Databricks ML pipelines to support predictive analytics for inventory demand forecasting, integrating model inference directly into production workflows.
Implemented end-to-end monitoring using Azure Monitor, Log Analytics, and Databricks metrics, proactively detecting pipeline failures and SLA breaches.
Designed Snowflake-integrated pipelines for long-term storage and analytical processing, ensuring high availability and consistency across datasets.
Engineered message acknowledgment and traceability logic for Event Hubs streams, updating Cosmos DB to maintain full lineage and auditability.
Built interim ETL solutions to replace legacy MuleSoft integrations, maintaining enterprise-level performance while reducing operational dependencies and costs.
Led an early-stage proof-of-concept (POC) migration of portions of the platform to open-source, on-premises frameworks, reducing cloud dependency and cost.
Partnered with cross-functional teams (developers, architects, and analysts) to gather requirements, troubleshoot complex issues, and ensure timely delivery of analytics-ready pipelines.
Documented system architecture, pipeline workflows, and best practices to enable knowledge transfer, reproducibility, and maintainability across the data engineering team.
Implemented role-based data access policies and automated pipeline security audits across Azure data services to ensure HIPAA compliance.
Optimized Databricks job performance by tuning Spark configurations, caching, and broadcast joins, reducing ETL runtimes by 25%.
Designed dynamic partitioning and schema evolution strategies to handle incremental ingestion of multi-terabyte datasets with zero downtime.
Created self-service data pipelines for analysts using parameterized ADF datasets and notebooks, reducing dependency on engineering for ad-hoc reports.
Standardized data lineage and metadata tracking using Azure Purview, enabling faster impact analysis and change management.
Conducted data quality root-cause analysis for recurring pipeline issues, implementing automated anomaly detection and alerting workflows.
Supported ML model deployment into production pipelines for predictive analytics use cases.
Data Engineer – Wells Fargo | San Francisco, CA | September 2022 – June 2024
Designed and implemented an enterprise-scale data lake on Azure Data Lake Storage (ADLS Gen2) to support analytics, reporting, and rapidly changing financial data workloads.
Maintained high-quality reference data through data cleansing, transformation, and validation using Azure Data Factory (ADF) and Databricks.
Built a security and access control framework using Azure Key Vault and RBAC, ensuring fine-grained object-level permissions across data assets.
Conducted end-to-end architecture and implementation assessments of Azure Synapse Analytics, Databricks, and ADF, optimizing data movement and query performance.
Implemented predictive analytics pipelines using PySpark and MLflow on Databricks, detecting transaction anomalies and forecasting customer trends for proactive insights.
Processed large-scale financial and operational datasets using Spark SQL on Azure Databricks, enabling near real-time analytics for reporting teams.
Automated ETL frameworks using ADF, Spark, and Synapse, reducing manual data ingestion time by 40%.
Integrated Apache Airflow with Azure ML and Databricks Jobs API to orchestrate multi-stage ML workflows for model training and deployment.
Developed SQL-based data validation scripts and reconciliation processes for Snowflake and Synapse, ensuring high data accuracy and consistency.
Built Power BI dashboards and datasets for executive reporting, connecting directly to curated Azure datasets and Synapse views.
Data Engineer – Accenture | India | February 2020 – July 2022
Designed and implemented data streaming and batch processing solutions using AWS Glue, Apache Spark, and Apache Flink, enabling scalable ingestion of large enterprise datasets.
Built and managed data ingestion pipelines for diverse sources such as application logs, web logs, and clickstream data, improving data availability for analytics.
Automated and scheduled ETL workflows using AWS Data Pipeline and AWS Batch, ensuring timely data delivery across environments.
Developed data validation and verification frameworks using Python and Spark to ensure accuracy, consistency, and completeness across AWS regions.
Implemented real-time streaming solutions using AWS Kinesis and Apache Kafka to process high-velocity event data in near real time.
Engineered data transformation and aggregation pipelines using AWS Glue and Spark SQL, optimizing data models for analytical workloads.
Designed data partitioning and indexing strategies in Hive and Redshift, improving query performance and reducing runtime by 25%.
Built serverless data processing pipelines using AWS Lambda and API Gateway, reducing infrastructure overhead and costs.
Developed hybrid storage solutions integrating AWS DataSync, Storage Gateway, and on-prem systems for seamless data mobility.
Established data cataloging and governance frameworks using AWS Glue Data Catalog and Collibra, ensuring compliance and lineage tracking.
Worked with AWS Lake Formation, CloudTrail, and KMS to implement data encryption, auditing, and access control policies.
Deployed ETL processes using Sqoop, Flume, and Hive, moving data from multiple relational and unstructured sources into HDFS.
Built Spark Streaming applications to process Kafka topics with stateful and stateless transformations for near real-time analytics.
Developed custom transformation scripts in Scala and PySpark to optimize existing MapReduce workflows and improve efficiency.
Designed and maintained ETL pipelines for Snowflake using Python and SnowSQL, supporting enterprise reporting and analytics use cases.
Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Amazon SageMaker, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, MapReduce, Snowflake, Apache Pig, Python, SSRS, Tableau.
EDUCATION
Master of Science in Data Science
State University of New York at Albany, NY | Graduated: May 2024