Sushmita Das
Indore | +91-934******* | ***************@*****.*** | LinkedIn: linkedin.com/in/sushmita-das-43574a226/
Professional Summary
Results-oriented Data Engineer with hands-on experience designing and implementing scalable data pipelines and distributed data systems using Apache Spark, Databricks, and cloud platforms such as AWS and Azure. Proficient in real-time and batch data ingestion, ETL workflows, Delta Lake optimization, orchestration with Databricks Jobs, and performance tuning. Passionate about transforming raw data into reliable, high-quality datasets that support data-driven decision-making.
Technical Skills
Programming & Querying: Python (Pandas, NumPy), SQL, PySpark
Big Data & Processing: Apache Spark (RDD, SQL, DataFrame API), Databricks (Jobs, DLT/Workflows, Autoloader, Unity Catalog), Kafka
Data Architecture: Delta Lake, Medallion Architecture, Structured Streaming, ETL
Cloud Platforms:
- AWS: S3, Lambda
- Azure: Azure Databricks, ADF
Databases: SQL Server, MySQL, Databricks SQL
Tools & DevOps: Git, GitHub Actions, Power BI, Tableau, VS Code, draw.io
Soft Skills: Collaboration, Problem-Solving, Quick Learning, Time Management
Languages: English, Hindi, Bengali
Professional Experience
Junior Technical Consultant (Data Engineer)
Digivate Labs | Oct 2023 – Present (1 yr 10 mos)
- Developed scalable batch and real-time data pipelines using Databricks, Autoloader, and Apache Spark (see the ingestion sketch after this list).
- Implemented Unity Catalog for robust data governance and secure access controls.
- Built and deployed data jobs orchestrated via Databricks Jobs, enhancing pipeline automation.
- Partnered with analysts and data scientists to ensure data quality and lineage across projects.
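A minimal sketch of the Autoloader-based ingestion pattern behind these pipelines, assuming a JSON landing zone; the paths, checkpoint locations, and bronze.orders table name are illustrative placeholders, not client configuration:

```python
# Hedged sketch: incremental file ingestion with Databricks Auto Loader.
# All paths and table names below are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Auto Loader discovers and ingests only files that are new since the last run.
raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/mnt/checkpoints/orders/schema")
    .load("/mnt/landing/orders/")
)

# Write the stream to a Delta table; the checkpoint tracks ingestion progress.
(
    raw.writeStream.format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders/bronze")
    .trigger(availableNow=True)  # process all available files, then stop (batch-style)
    .toTable("bronze.orders")
)
```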
Project: E-commerce Data Platform Migration
- Automated the migration of 35TB+ of historical and real-time data from Vertica through S3 to Databricks, ensuring data integrity and minimal downtime.
- Re-engineered and optimized 150+ ETL pipelines and batch workflows using Databricks notebooks and Delta Lake.
- Refactored 1,100+ SQL scripts to align with updated naming standards (lowercase table names, camelCase column names).
- Implemented robust data governance and access control with Unity Catalog.
- Automated data quality validation and monitoring to ensure accuracy and consistency post-migration.
- Achieved a 40% reduction in infrastructure costs and improved query performance by 3x.
- Collaborated with cross-functional teams to align migration with business requirements and SLAs.
- Utilized best practices in schema evolution, data transformation, and cost optimization.
- Built metadata-driven batch processing for scalable table migration (see the sketch after this list).
Key Skills: Databricks, AWS, ETL, Data Migration, Delta Lake, Python, SQL, Data Quality, Data Governance, Big Data, Real-time Data Processing, CI/CD, Stakeholder Management
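A hedged sketch of the metadata-driven migration loop, assuming a hypothetical control table with one row per table to migrate; the control table, its columns, and all paths are illustrative:

```python
# Hedged sketch: metadata-driven batch migration of many tables into Delta Lake.
# migration.control_table and its columns are assumptions for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Control table: one row per table (source path/format, target name, load mode).
tables = spark.table("migration.control_table").collect()

for t in tables:
    # Read the exported snapshot from S3 (e.g. Parquet dumped from Vertica).
    df = spark.read.format(t["source_format"]).load(t["source_path"])

    # Write into Delta Lake, tolerating additive schema evolution.
    (
        df.write.format("delta")
        .mode(t["load_mode"])            # "overwrite" for history, "append" for increments
        .option("mergeSchema", "true")   # accept new columns without failing the load
        .saveAsTable(t["target_table"])
    )
```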
Project: Data Pipeline Modernization
- Refactored and optimized 10+ legacy Python scripts into scalable PySpark jobs, significantly improving data processing performance for large-scale IoT datasets.
- Migrated complex ETL pipelines to Databricks, leveraging advanced Spark features (partitioning, caching, optimized joins) to enable distributed, real-time analytics.
- Integrated geospatial data processing (GeoPandas, DBSCAN) and implemented robust data enrichment, aggregation, and alerting mechanisms (Slack integration); see the clustering sketch below.
- Enhanced data quality and reliability by implementing comprehensive error handling, logging, and monitoring.
- Collaborated cross-functionally to ensure seamless migration, validation, and deployment of new data workflows, contributing to improved operational efficiency and analytics capabilities.
- Refactored inefficient Spark jobs, reducing runtime by over 40% via caching, partitioning, and code modularization (see the tuning sketch below).
- Applied best practices in Delta Lake storage layout and metadata handling.
Key Skills: PySpark, Databricks, Distributed Data Processing, ETL, Kafka, Delta Lake, Geospatial Analytics, Python, Real-Time Data Pipelines, Data Engineering, Automation, Monitoring & Alerting
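An illustrative sketch of the tuning patterns above (broadcast join, caching a reused result, partitioned Delta output); the iot.events and iot.devices tables and their columns are hypothetical:

```python
# Hedged sketch: common Spark tuning moves on a large IoT fact table.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

events = spark.table("iot.events")    # large fact table (hypothetical)
devices = spark.table("iot.devices")  # small dimension table (hypothetical)

# Broadcast the small side to avoid a full shuffle join, and cache the
# joined result because several downstream aggregations reuse it.
enriched = events.join(F.broadcast(devices), "device_id").cache()

daily = (
    enriched
    .groupBy("device_id", F.to_date("event_ts").alias("day"))
    .agg(F.avg("reading").alias("avg_reading"))
)

# Partition the Delta output by day so downstream queries prune files.
(
    daily.write.format("delta")
    .mode("overwrite")
    .partitionBy("day")
    .saveAsTable("iot.daily_readings")
)
enriched.unpersist()
```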
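And a hedged sketch of the geospatial clustering step, using GeoPandas with scikit-learn's DBSCAN; the sample coordinates and the eps threshold are illustrative values, not project parameters:

```python
# Hedged sketch: density-based clustering of device coordinates.
import geopandas as gpd
import pandas as pd
from sklearn.cluster import DBSCAN

# Small illustrative sample of enriched IoT points.
pdf = pd.DataFrame({
    "device_id": ["a", "b", "c"],
    "lon": [75.85, 75.86, 76.40],
    "lat": [22.71, 22.72, 23.10],
})
gdf = gpd.GeoDataFrame(pdf, geometry=gpd.points_from_xy(pdf.lon, pdf.lat), crs="EPSG:4326")

# DBSCAN on raw degrees; eps of ~0.01 degrees is roughly 1 km (illustrative).
labels = DBSCAN(eps=0.01, min_samples=2).fit_predict(gdf[["lon", "lat"]])
gdf["cluster"] = labels  # -1 marks noise/outliers that could trigger alerts
print(gdf[["device_id", "cluster"]])
```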
Project: Real-Time Streaming Analytics Platform
- Designed and implemented a real-time streaming data pipeline for an e-commerce client using Databricks Delta Live Tables (DLT) and PySpark, processing real-time data from Azure Data Lake Storage.
- Developed and deployed a scalable Medallion architecture (Bronze, Silver, Gold layers) to optimize ingestion, transformation, and analytics workflows, implemented separately in both Python (PySpark) and SQL.
- Built automated data quality monitoring using expectation-based validations, quarantining invalid records and achieving 99.9% data accuracy for downstream reporting.
- Implemented SCD Type 1 & Type 2 transformations for both dimension and fact tables, enabling historical tracking of changes and enhancing data warehouse integrity (see the DLT sketch below).
- Integrated a real-time analytics dashboard that refreshes automatically as new data arrives in ADLS, providing instant insights into total revenue, customer retention, discount impact, and product performance KPIs, accelerating business decision-making and market responsiveness.
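A minimal Delta Live Tables sketch of the expectation-based validation and SCD Type 2 handling described above; it assumes execution inside a DLT pipeline, and all dataset names, columns, and rules are illustrative:

```python
# Hedged sketch: DLT expectations plus SCD Type 2 via apply_changes.
# Runs only inside a Delta Live Tables pipeline; names are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="silver_orders")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
@dlt.expect_or_drop("positive_amount", "amount > 0")
def silver_orders():
    # Promote clean records from Bronze; rows failing expectations are dropped
    # (a parallel quarantine table can capture the inverse predicates).
    return dlt.read_stream("bronze_orders").withColumn("ingested_at", F.current_timestamp())

# SCD Type 2: keep the full history of customer dimension changes.
dlt.create_streaming_table("dim_customers")
dlt.apply_changes(
    target="dim_customers",
    source="silver_customers",
    keys=["customer_id"],
    sequence_by=F.col("updated_at"),
    stored_as_scd_type=2,
)
```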
Associate Trainee Software Engineer
Techcoopers Software Solutions Pvt. Ltd. | Sep 2022 – Jan 2023 (5 mos)
Trainee Data Analyst
Zivaya Wellness Pvt. Ltd. | Mar 2023 – Apr 2023 (1 mo)
Certifications
- Databricks Certified Data Engineer Associate
- Databricks Certified Data Engineer Professional
- Databricks Accredited Platform Architect – AWS, Azure
Education
MSc, Data Science and Analytics | DAVV University | 11/2020 – 08/2022
BSc | DAVV University | 05/2017 – 06/2020
Diploma in Computer Applications | Pioneer Institute of Technology and Management | 03/2018 – 06/2019
12th | Neeraj Bal Mandir Higher Secondary School | 03/2016 – 04/2017
10th | Daily Mirror Public Higher Secondary School | 03/2014 – 04/2015