Data Engineer Machine Learning

Location:

Arlington, TX

Posted:

July 16, 2025

Resume:

DEVASHISH NALAPAREDDY

Arlington, TX | +1-682-***-**** | *********.***********@*****.*** | LinkedIn

Summary

Data Engineer with 3+ years of experience designing scalable data pipelines, real-time streaming architectures, and cloud-native solutions across Azure, AWS, and hybrid environments. Skilled in ETL/ELT workflows, big data processing with Hadoop, Hive, and Spark, and data warehousing with Azure SQL, Redshift, Databricks, and Microsoft Fabric. Adept at collaborating with cross-functional teams to deliver analytics-ready data and support machine learning deployment pipelines.

Education

The University of Texas at Arlington, TX May 2025

Master of Science in Data Science CGPA: 4.0

Skills

Programming & Scripting: Python, SQL, Bash, HTML

Data Engineering & Cloud: Azure Data Factory, Azure Data Lake (ADLS), AWS Glue, Redshift, S3, Kinesis, Hive, Hadoop, Databricks, Apache Spark, Microsoft Fabric, ETL/ELT Pipelines

Cloud Platforms: Microsoft Azure, AWS (Lambda, Cognito, RDS, QuickSight)

Streaming & Messaging: Apache Kafka, AWS Kinesis

Data Modeling & Warehousing: Azure SQL, PostgreSQL, MySQL, Snowflake (basic), Star/Snowflake Schema

Monitoring & CI/CD: Azure DevOps, Git, Great Expectations, Power BI, Streamlit, Tableau

Frameworks & Tools: Pandas, NumPy, OpenCV, LangChain, Django

Experience

Azure Data Engineer, Tata Consultancy Services, July 2020 – July 2023

Built and optimized ETL pipelines using Azure Data Factory, ADLS, and Azure SQL to process structured and unstructured data across internal and external sources.

Integrated a legacy Hadoop and Hive data lake with the Azure cloud ecosystem to support analytics, ML training pipelines, and reporting.

Improved pipeline efficiency by 30% via SQL optimization, parallel processing, and robust alerting mechanisms.

Developed Power BI dashboards for business stakeholders to monitor KPIs and data quality metrics.

Engineered and maintained cloud-native data lakes used for business intelligence and predictive analytics.

Collaborated with analysts and data scientists to ensure post-production validation and quality assurance.

Projects

Unified Sales Analytics Lakehouse on Microsoft Fabric

Designed a modern data lakehouse using Microsoft Fabric to centralize sales, customer, and transaction data from various business units.

Ingested data from flat files and external APIs into OneLake, then built Dataflows Gen2 pipelines to cleanse, transform, and enrich the data.

Created semantic models using Fabric Lakehouse and integrated them with Power BI for dynamic reporting across departments.

Implemented role-based access and workspace governance to support cross-functional analytics and security.

Tools: Microsoft Fabric, OneLake, Dataflows Gen2, Power BI, KQL Notebooks

Scalable Data Pipeline for Real-Time Traffic Sign Recognition & Enhancement

Designed a cloud-based pipeline for ingesting traffic images, detecting noise using a VGG19 classifier, enhancing degraded images with a CNN, and identifying traffic signs using YOLOv5.

Integrated Azure Functions for event-driven processing and stored enhanced images and logs in ADLS for downstream analytics.

Developed Power BI dashboards to monitor system accuracy across adverse weather scenarios and track latency.

Tools: Azure Functions, Python, YOLOv5, VGG19, ADLS, Power BI

Real-Time Fraud Detection Platform Using Kafka and AI Agents

Simulated real-time transaction streams with Kafka producers and built modular LangChain-based agents for fraud detection, anomaly detection, and rule-based filtering.

Served fraud models using TensorFlow Lite and monitored predictions and stream latency via a real-time Streamlit dashboard.

Streamed flagged transactions to AWS S3 and loaded them into Redshift using AWS Glue for long-term trend analysis.

Tools: Apache Kafka, LangChain, TensorFlow Lite, Streamlit, AWS Glue, Redshift, Python

MediTrack: Cloud-Based Health Data Platform (AWS)

Developed a modular, cloud-native hospital information system using AWS Lambda, API Gateway, and S3 to manage patient records, doctor schedules, and appointments via REST APIs.

Designed a secure pipeline to stream logs through Amazon Kinesis and transform data via AWS Glue into Redshift for centralized analytics.

Created ETL workflows to cleanse multi-branch hospital data and enabled executive dashboards via Amazon QuickSight.

Used Amazon RDS for OLTP workloads and integrated AWS Cognito for secure role-based access control.

Tools: AWS Lambda, API Gateway, S3, Kinesis, Glue, Redshift, RDS, Cognito, QuickSight, Python

Certifications

Microsoft Azure Fundamentals (AZ-900)

Microsoft Azure Data Fundamentals (DP-900)

Udemy certification: The Complete Data Structures and Algorithms Course in Python

LinkedIn skill endorsement: Python (Programming Language)
