DEVASHISH NALAPAREDDY
Arlington, TX +1-682-***-**** *********.***********@*****.*** LinkedIn
Summary
Data Engineer with 3+ years of experience designing scalable data pipelines, real-time streaming architectures, and cloud-native solutions across Azure, AWS, and hybrid environments. Skilled in ETL/ELT workflows, big data processing with Hadoop, Hive, and Spark, and data warehousing with Azure SQL, Redshift, Databricks, and Microsoft Fabric. Adept at collaborating with cross-functional teams to deliver analytics-ready data and support machine learning deployment pipelines.
Education
The University of Texas at Arlington, TX May 2025
Master of Science in Data Science CGPA: 4.0
Skills
Programming & Scripting: Python, SQL, Bash, HTML
Data Engineering & Cloud: Azure Data Factory, Azure Data Lake (ADLS), AWS Glue, Redshift, S3, Kinesis, Hive, Hadoop, Databricks, Apache Spark, Microsoft Fabric, ETL/ELT Pipelines
Cloud Platforms: Microsoft Azure, AWS (Lambda, Cognito, RDS, QuickSight)
Streaming & Messaging: Apache Kafka, AWS Kinesis
Data Modeling & Warehousing: Azure SQL, PostgreSQL, MySQL, Snowflake (basic), Star/Snowflake Schema
Monitoring & CI/CD: Azure DevOps, Git, Great Expectations, Power BI, Streamlit, Tableau
Frameworks & Tools: Pandas, NumPy, OpenCV, LangChain, Django
Experience
Azure Data Engineer Tata Consultancy Services July 2020 – July 2023
Built and optimized ETL pipelines using Azure Data Factory, ADLS, and Azure SQL to process structured and unstructured data across internal and external sources.
Integrated a legacy Hadoop and Hive data lake with the Azure cloud ecosystem to support analytics, ML training pipelines, and reporting.
Improved pipeline efficiency by 30% via SQL optimization, parallel processing, and robust alerting mechanisms.
Developed Power BI dashboards for business stakeholders to monitor KPIs and data quality metrics.
Engineered and maintained cloud-native data lakes used for business intelligence and predictive analytics.
Collaborated with analysts and data scientists to ensure post-production validation and quality assurance.
Projects
Unified Sales Analytics Lakehouse on Microsoft Fabric
Designed a modern data lakehouse using Microsoft Fabric to centralize sales, customer, and transaction data from various business units.
Ingested data from flat files and external APIs into OneLake, then built Dataflows Gen2 pipelines to cleanse, transform, and enrich the data.
Created semantic models using Fabric Lakehouse and integrated them with Power BI for dynamic reporting across departments.
Implemented role-based access and workspace governance to support cross-functional analytics and security.
Tools: Microsoft Fabric, OneLake, Dataflows Gen2, Power BI, KQL Notebooks
Scalable Data Pipeline for Real-Time Traffic Sign Recognition & Enhancement
Designed a cloud-based pipeline for ingesting traffic images, detecting noise using a VGG19 classifier, enhancing degraded images with a CNN, and identifying traffic signs using YOLOv5.
Integrated Azure Functions for event-driven processing and stored enhanced images and logs in ADLS for downstream analytics.
Developed Power BI dashboards to monitor system accuracy across adverse weather scenarios and track latency.
Tools: Azure Functions, Python, YOLOv5, VGG19, ADLS, Power BI
Real-Time Fraud Detection Platform Using Kafka and AI Agents
Simulated real-time transaction streams using Kafka producers and modular LangChain-based agents for fraud detection, anomaly detection, and rule-based filtering.
Served fraud models using TensorFlow Lite and monitored predictions and stream latency via a real-time Streamlit dashboard.
Streamed flagged transactions to AWS S3 and loaded them into Redshift using AWS Glue for long-term trend analysis.
Tools: Apache Kafka, LangChain, TensorFlow Lite, Streamlit, AWS Glue, Redshift, Python
MediTrack: Cloud-Based Health Data Platform (AWS)
Developed a modular, cloud-native hospital information system using AWS Lambda, API Gateway, and S3 to manage patient records, doctor schedules, and appointments via REST APIs.
Designed a secure pipeline to stream logs through Amazon Kinesis and transform data via AWS Glue into Redshift for centralized analytics.
Created ETL workflows to cleanse multi-branch hospital data and enabled executive dashboards via Amazon QuickSight.
Used Amazon RDS for OLTP workloads and integrated AWS Cognito for secure role-based access control.
Tools: AWS Lambda, API Gateway, S3, Kinesis, Glue, Redshift, RDS, Cognito, QuickSight, Python
Certifications
Microsoft Azure Fundamentals (AZ-900)
Microsoft Azure Data Fundamentals (DP-900)
Udemy certification: The Complete Data Structures and Algorithms Course in Python
LinkedIn endorsed in Python (programming language)