NAVEENKUMAR DEVARAPALLI
+1-636-***-**** | ***************@*****.*** | www.linkedin.com/in/naveen098

PROFESSIONAL SUMMARY
Experienced Data Engineer with 5+ years of hands-on expertise in designing and optimizing large-scale data pipelines on cloud platforms (AWS, Azure, GCP) and big data technologies including PySpark, Databricks, and Apache Spark. Proficient in building robust ETL/ELT frameworks for both batch and streaming workloads, including real-time data ingestion with Apache Kafka and event-driven architectures. Proven track record of accelerating query execution by 30%, reducing operational costs by 20%, and improving data processing efficiency by 60% while managing 500,000+ records daily. Strong experience with CI/CD automation, data quality frameworks, and observability tooling. Demonstrated ability to collaborate with cross-functional teams, including data scientists and ML engineers, to deliver secure, scalable, and cost-effective data solutions aligned with business SLAs and KPIs.

EDUCATION
Master of Science in Applied Computer Science, GPA: 3.7
Southeast Missouri State University, Missouri    Aug 2022–May 2024
Bachelor of Technology in Engineering, GPA: 3.0
Acharya Nagarjuna University, India    Aug 2017–Jul 2021

TECHNICAL SKILLS
Programming Languages: Python, SQL, PySpark, Spark SQL, R
Databases & Data Sources: MySQL, Snowflake, PostgreSQL, BigQuery, SQL Server, Cosmos DB, MongoDB, Structured & Unstructured Data
Data Engineering Tools & Technologies: ETL/ELT Pipelines, Apache Spark, Databricks, Azure Synapse Analytics, AWS Glue, Delta Lake, API Integration
Cloud & Big Data Technologies: AWS (EC2, S3, Lambda, Step Functions, EMR, CloudFormation), Azure (Data Factory, Data Lake Storage Gen2, Databricks), GCP (BigQuery, Cloud Composer, Dataflow, Cloud Functions)
Frameworks & Orchestration: Apache Airflow, Apache Kafka, Docker, CI/CD Pipelines, Infrastructure as Code
Analytics & Visualization: Power BI, Tableau, Statistical Analysis, Data Modeling
Additional Skills: Data Pipeline Optimization, Query Performance Tuning, Spark Performance Tuning, Medallion Architecture, Agile Methodologies, Stakeholder Communication

EXPERIENCE
AWS Data Engineer—E-Shift Inc., Missouri    Aug 2024–Jul 2025
Developed production-grade, event-driven data workflows using Lambda, Step Functions, and Apache Kafka streams, automating financial transaction processing for 100,000+ daily records and reducing manual intervention by 80%
Engineered dynamic data lake solutions on Amazon S3 with intelligent partitioning, schema evolution, and Delta Lake integration, decreasing data retrieval latency by 45% and improving compliance reporting efficiency by 60%
Built comprehensive fraud intelligence pipelines using EMR, PySpark, and advanced Spark tuning techniques (broadcast joins, caching, partitioning; see the sketch after this section), enhancing anomaly detection accuracy by 35% and reducing processing time from 90 minutes to 40 minutes
Implemented Infrastructure as Code practices using CloudFormation and integrated CI/CD pipelines with version-controlled deployments, reducing deployment failures by 70% and accelerating release cycles by 50%
Collaborated with data science and ML teams to deliver feature store pipelines and machine learning-ready data solutions, improving model training workflows and risk assessment capabilities by 40%
Designed RBAC-controlled access layers and implemented comprehensive monitoring using CloudWatch and alerting systems, ensuring 99.9% pipeline reliability and reducing incident response time by 35%
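A minimal PySpark sketch of the broadcast-join, caching, and partitioning techniques noted in this role; bucket paths, table contents, and column names are illustrative assumptions rather than production code:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("fraud-pipeline-sketch").getOrCreate()

    # Large fact table: raw card transactions (path is illustrative)
    txns = spark.read.parquet("s3://example-bucket/transactions/")

    # Small dimension table: flagged merchants, small enough to broadcast
    merchants = spark.read.parquet("s3://example-bucket/flagged_merchants/")

    # Broadcasting the small side avoids shuffling the large transaction table
    scored = (
        txns.join(broadcast(merchants), on="merchant_id", how="left")
            .withColumn("is_flagged", F.col("risk_tier").isNotNull())
    )

    # Cache because the scored frame feeds several downstream aggregations
    scored.cache()

    # Partition output by date so files align with downstream query patterns
    (scored.repartition("txn_date")
           .write.mode("overwrite")
           .partitionBy("txn_date")
           .parquet("s3://example-bucket/scored_transactions/"))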
Cloud Data Engineer (GCP)—Southeast Missouri State University, Missouri    Aug 2023–May 2024
Architected and orchestrated production-scale data pipelines using Google Cloud Composer, Dataflow, and BigQuery, processing 500,000+ scholarship and academic records with a 40% improvement in end-to-end processing efficiency
Implemented medallion architecture (bronze, silver, gold) using BigQuery and Cloud Storage, enabling incremental data processing and reducing storage redundancy by 30% while achieving 60% faster query performance (see the sketch after this section)
Developed graph-based data visualization frameworks and interactive dashboards using Python and BigQuery ML, improving scholarship eligibility assessment accuracy by 35% and supporting data-driven funding allocation decisions
Automated real-time data ingestion workflows with Cloud Functions, Pub/Sub, and Cloud Storage triggers, ensuring 99.9% data consistency and enabling live updates of financial aid records
Implemented data quality validation layers using Cloud Dataflow and custom PySpark libraries, reducing data discrepancies by 90% and ensuring compliance with educational data privacy regulations
Created parameterized ETL jobs with dynamic execution and dependency control, reducing manual errors by 75% and boosting deployment speed across the analytics team
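A minimal sketch of the incremental, medallion-style bronze-to-silver promotion referenced above, using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    # MERGE promotes only new or changed rows from the bronze (raw) layer
    # into the silver (cleaned) layer, keeping each load incremental.
    merge_sql = """
    MERGE `example_project.silver.scholarship_records` AS s
    USING (
      SELECT student_id, award_amount, updated_at
      FROM `example_project.bronze.scholarship_records_raw`
      WHERE ingest_date = CURRENT_DATE()
    ) AS b
    ON s.student_id = b.student_id
    WHEN MATCHED AND b.updated_at > s.updated_at THEN
      UPDATE SET award_amount = b.award_amount, updated_at = b.updated_at
    WHEN NOT MATCHED THEN
      INSERT (student_id, award_amount, updated_at)
      VALUES (b.student_id, b.award_amount, b.updated_at)
    """

    # result() blocks until the job finishes and surfaces any errors
    client.query(merge_sql).result()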
Azure Data Engineer—HCL Technologies, India    Aug 2021–Jul 2022
Architected enterprise-scale data pipelines using Azure Databricks, Data Factory, and PySpark, migrating legacy on-premises systems and reducing ETL runtime by 30% while decreasing data quality issues by 25%
Engineered secure data lakes with Azure Data Lake Storage Gen2 and implemented layered zone organization with schema enforcement, enabling real-time BI analytics and driving 20% infrastructure cost savings
Developed scalable ingestion pipelines supporting both batch and streaming workloads (see the sketch after this section), successfully processing 10,000+ daily records from multiple data sources (SQL databases, APIs, on-premises systems) with 99.5% accuracy
Built custom transformation libraries in Python to handle JSON, CSV, and Parquet formats, improving pipeline reusability by 50% and enabling standardized data processing across healthcare and financial domains
Implemented CI/CD integration using Azure DevOps Pipelines and Databricks Repos, ensuring version-controlled deployments, automated testing, and comprehensive audit trails via Azure Log Analytics
Collaborated with data science teams to enable feature engineering pipelines using Kafka and Snowflake, supporting predictive analytics that improved operational efficiency by 25% and reduced model training time by 40%
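An illustrative PySpark Structured Streaming sketch of the batch-plus-streaming ingestion pattern described in this role; the storage paths, schema fields, and checkpoint location are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

    # Explicit schema: streaming file sources cannot infer schema at runtime
    schema = StructType([
        StructField("record_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Stream new JSON files as they land in the raw zone of the data lake
    raw = (spark.readStream
                .schema(schema)
                .json("abfss://raw@exampleaccount.dfs.core.windows.net/events/"))

    # Write to a Delta table; the checkpoint makes the stream restartable
    query = (raw.writeStream
                .format("delta")
                .option("checkpointLocation", "/checkpoints/events")
                .outputMode("append")
                .start("/delta/bronze/events"))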
KEY PROJECTS

Credit Card Fraud Detection System
Developed high-performance ETL pipelines using Python and pandas to process 1+ million transaction records (see the sketch following this project), improving data processing efficiency by 60% and reducing processing time from hours to minutes
Built and optimized a Random Forest classification model achieving 98% fraud detection precision, representing $2M+ in potential annual fraud prevention
Designed optimized MySQL database schemas supporting sub-second query performance for real-time transactional analytics and fraud scoring
Containerized application using Docker and deployed on AWS/Azure, enabling scalable cloud-native architecture with 99.9% uptime
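A minimal pandas sketch of the chunked ETL approach that keeps a 1M+ row load memory-bounded; the file name, columns, and threshold are illustrative:

    import pandas as pd

    cleaned_chunks = []

    # Stream the transaction file in fixed-size chunks instead of loading
    # the full 1M+ rows into memory at once.
    for chunk in pd.read_csv("transactions.csv", chunksize=100_000):
        chunk = chunk.dropna(subset=["amount", "card_id"])    # drop unusable rows
        chunk["amount"] = chunk["amount"].astype("float64")   # normalize types
        chunk["is_high_value"] = chunk["amount"] > 1_000      # simple derived feature
        cleaned_chunks.append(chunk)

    transactions = pd.concat(cleaned_chunks, ignore_index=True)
    transactions.to_parquet("transactions_clean.parquet")     # columnar output for fast scans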
User Authentication System
Engineered secure, enterprise-grade authentication system with role-based access control (RBAC) supporting 1,000+ concurrent users and ensuring zero security breaches
Built and optimized PySpark ETL pipelines for user activity log processing and normalization, enabling comprehensive audit trails and compliance reporting
Deployed scalable infrastructure on AWS EC2 with Docker containerization and S3 storage, achieving 40% cost reduction compared to traditional hosting
Automated ETL workflows using Apache Airflow (see the sketch below), ensuring 99.9% pipeline reliability and reducing manual monitoring effort by 80%
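A short Apache Airflow sketch of the kind of scheduled ETL automation described above; the DAG id, schedule, and task callables are hypothetical placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_logs():
        # Placeholder: pull raw user-activity logs from storage
        ...

    def transform_logs():
        # Placeholder: normalize logs for the audit-trail tables
        ...

    with DAG(
        dag_id="user_activity_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@hourly",   # scheduler retries and alerting replace manual monitoring
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_logs", python_callable=extract_logs)
        transform = PythonOperator(task_id="transform_logs", python_callable=transform_logs)
        extract >> transform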
CERTIFICATIONS
AWS Data Analytics Specialty—Amazon Web Services
Google Cloud Associate Cloud Engineer—Google Cloud Platform
Python for Data Science, AI & Development—IBM (Coursera), Credential ID: IU5FQ6E22XQK