Haritha Danda
Data Engineer
*************@*****.*** +1-913-***-**** Seattle, WA LinkedIn
SUMMARY
Results-driven Data Engineer with 5+ years of experience building scalable ETL/ELT pipelines, real-time data streaming frameworks, and cloud-native data systems across AWS, Azure, and GCP. Proficient in Apache NiFi, PySpark, Kafka, Snowflake, and Databricks, with expertise in dbt, Airflow, Great Expectations, and CI/CD workflows. Strong background in integrating diverse data sources, optimizing ingestion and transformation logic, and supporting advanced analytics and ML initiatives in production-grade environments.
PROFESSIONAL EXPERIENCE
Data Engineer, Shopify 01/2023 – Present USA
• Developed and managed robust ETL/ELT data pipelines using Apache NiFi, PySpark, and dbt to extract structured and semi-structured data from Shopify APIs and Microsoft Dynamics 365 ERP, transforming and loading it into Snowflake, Azure Blob Storage, and BigQuery for use by analytics, marketing, and product teams.
• Built real-time streaming data workflows with Kafka, Spark Structured Streaming, and Databricks to power machine learning-driven recommendation engines and personalized customer experiences, reducing event latency by 18% and enhancing the responsiveness of insights.
• Contributed to a large-scale cloud migration from Amazon Redshift to Snowflake by assisting in redesigning SQL logic, optimizing compute usage, and applying cost-aware storage strategies, which resulted in a 30% improvement in dashboard load performance and enabled near real-time analytics across business units.
• Collaborated with ML engineering teams to integrate Azure ML pipelines with Databricks for real-time and batch deployment of recommendation models, improving inference efficiency and scaling personalization capabilities across the e-commerce platform.
• Designed and deployed CI/CD pipelines using Azure DevOps, Git, and YAML configuration, ensuring standardized, version-controlled deployment of data pipelines across dev, test, and production environments, and reducing release lead times by 35%.
• Implemented robust data validation using Great Expectations, dbt test suites, and schema enforcement at both batch and streaming layers, significantly improving anomaly detection rates and increasing pipeline reliability.
• Created real-time, interactive Power BI dashboards by sourcing data from Snowflake, providing executive and operational stakeholders with visibility into KPIs such as fulfillment efficiency, customer retention, and conversion trends.
• Engineered enhancements to Kafka data ingestion workflows using Airflow DAG orchestration with custom retry logic, schema registry integration, and back-pressure handling, resulting in a 40% decrease in ingestion lag and improved system resilience under load.
Data Engineer, Amazon 05/2019 – 04/2022 India
• Contributed to the design and development of high-performance ETL pipelines using AWS Glue, Python, and Redshift to process large-scale warehouse and logistics datasets that supported key business intelligence initiatives.
• Collaborated with senior engineers to build real-time data pipelines using Kafka, Kinesis, and Apache Flink, enabling accurate tracking of inventory events and delivery statuses, which helped reduce system latency by 25%.
• Assisted in creating ingestion workflows for IoT data (location, temperature, fuel) using Apache Beam, AWS Lambda, and S3, which supported predictive analytics for fleet monitoring and improved logistics visibility.
• Helped implement reusable SQL-based data validation layers integrated within Airflow DAGs to perform automated data checks and anomaly detection, contributing to a 20% improvement in early error identification.
• Supported the development of operational dashboards in Amazon QuickSight and contributed Python-based alerting scripts to monitor distribution center bottlenecks, which reduced response times to incidents by 17%.
• Worked closely with the DevOps team to develop and maintain CI/CD workflows using Jenkins, Git, Lambda, and CircleCI, enabling automated testing and deployment of data pipelines across environments.
• Contributed to containerizing pipeline components using Docker and deploying them with Kubernetes, which helped improve scalability and reduced manual operational overhead.
• Assisted in implementing AWS Lake Formation for data access control and encryption, ensuring compliance with data governance policies such as GDPR and CCPA through secure and auditable pipeline configurations.
EDUCATION
Master’s in Computer Science
University of Central Missouri
SKILLS
Programming & Scripting: Python, SQL (T-SQL, PL/SQL), PySpark, Java, Scala, Shell Scripting, R, YAML, JSON, XML
Big Data & Distributed Systems: Apache Spark, Spark Structured Streaming, Apache Kafka, Apache Flink, Apache Beam, Hadoop, Hive, HDFS, Databricks
ETL & Data Pipeline Orchestration: Apache NiFi, AWS Glue, Azure Data Factory, Apache Airflow, dbt (data build tool)
Cloud Platforms: AWS, Google Cloud Platform (GCP), Microsoft Azure, Snowflake, Redshift, D365 ERP
Cloud Data Warehousing & Databases: Snowflake, Amazon Redshift, Google BigQuery, Azure Synapse, PostgreSQL, MySQL, SQL Server, Oracle
CI/CD & DevOps Tools: Git, GitHub, Jenkins, Azure DevOps, CircleCI, Docker, Kubernetes, YAML-based deployment, Infrastructure as Code (IaC)
Data Quality, Testing & Governance: Great Expectations, dbt tests, Data Validation Frameworks, Unit Testing (PyTest, unittest), Data Lineage, Schema Registry, GDPR, CCPA, AWS Lake Formation
Visualization & Reporting Tools: Power BI, Amazon QuickSight, Tableau (basic), Custom Python Dashboards, KPI Reporting
Data Modeling & Analytics: Dimensional Modeling, Star & Snowflake Schemas, OLAP/OLTP, Data Mapping, Data Wrangling, Data Cleansing, Anomaly Detection
Workflow Methodologies: Agile, Scrum, DevOps, CI/CD Best Practices, Cross-Functional Collaboration, Technical Documentation, Version Control
CERTIFICATIONS
Microsoft Certified: Azure Data Engineer Associate
AWS Certified Data Engineer - Associate