NAVEENKUMAR DEVARAPALLI
+1-636-***-**** | ***************@*****.*** | www.linkedin.com/in/naveen098

PROFESSIONAL SUMMARY
Experienced Data Engineer with 5+ years of hands-on expertise in designing and optimizing large-scale data pipelines on cloud platforms (AWS, Azure, GCP) and big data technologies including PySpark, Databricks, and Apache Spark. Proficient in building robust ETL/ELT frameworks for both batch and streaming workloads, including real-time data ingestion with Apache Kafka and event-driven architectures. Proven track record of accelerating query execution by 30%, reducing operational costs by 20%, and improving data processing efficiency by 60% while managing 500,000+ records daily. Strong experience with CI/CD automation, data quality frameworks, and observability tooling. Demonstrated ability to collaborate with cross-functional teams, including data scientists and ML engineers, to deliver secure, scalable, and cost-effective data solutions aligned with business SLAs and KPIs.

EDUCATION
Master of Science in Applied Computer Science, GPA: 3.7
Southeast Missouri State University, Missouri    Aug 2022–May 2024
Bachelor of Technology in Engineering, GPA: 3.0
Acharya Nagarjuna University, India    Aug 2017–Jul 2021

TECHNICAL SKILLS
Programming Languages: Python, SQL, PySpark, Spark SQL, R
Databases & Data Sources: MySQL, Snowflake, PostgreSQL, BigQuery, SQL Server, Cosmos DB, MongoDB, Structured & Unstructured Data
Data Engineering Tools & Technologies: ETL/ELT Pipelines, Apache Spark, Databricks, Azure Synapse Analytics, AWS Glue, Delta Lake, API Integration
Cloud & Big Data Technologies: AWS (EC2, S3, Lambda, Step Functions, EMR, CloudFormation), Azure (Data Factory, Data Lake Storage Gen2, Databricks), GCP (BigQuery, Cloud Composer, Dataflow, Cloud Functions)
Frameworks & Orchestration: Apache Airflow, Apache Kafka, Docker, CI/CD Pipelines, Infrastructure as Code
Analytics & Visualization: Power BI, Tableau, Statistical Analysis, Data Modeling
Additional Skills: Data Pipeline Optimization, Query Performance Tuning, Spark Performance Tuning, Medallion Architecture, Agile Methodologies, Stakeholder Communication

EXPERIENCE
AWS Data Engineer—E-Shift Inc., Missouri    Aug 2024–Jul 2025
Developed production-grade, event-driven data workflows using Lambda, Step Functions, and Apache Kafka streams, automating financial transaction processing for 100,000+ daily records and reducing manual intervention by 80%
Engineered dynamic data lake solutions on Amazon S3 with intelligent partitioning, schema evolution, and Delta Lake integration, decreasing data retrieval latency by 45% and improving compliance reporting efficiency by 60%
Built comprehensive fraud intelligence pipelines using EMR, PySpark, and advanced Spark tuning techniques (broadcast joins, caching, partitioning; see the sketch after this section), enhancing anomaly detection accuracy by 35% and reducing processing time from 90 minutes to 40 minutes
Implemented Infrastructure as Code practices using CloudFormation and integrated CI/CD pipelines with version-controlled deployments, reducing deployment failures by 70% and accelerating release cycles by 50%
Collaborated with data science and ML teams to deliver feature store pipelines and machine learning-ready data solutions, improving model training workflows and risk assessment capabilities by 40%
Designed RBAC-controlled access layers and implemented comprehensive monitoring using CloudWatch and alerting systems, ensuring 99.9% pipeline reliability and reducing incident response time by 35%
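A minimal PySpark sketch of the broadcast-join, caching, and partitioning techniques noted in this role; bucket paths, table contents, and column names are illustrative assumptions rather than production code:

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("fraud-pipeline-sketch").getOrCreate()

    # Large fact table: raw card transactions (path is illustrative)
    txns = spark.read.parquet("s3://example-bucket/transactions/")

    # Small dimension table: flagged merchants, small enough to broadcast
    merchants = spark.read.parquet("s3://example-bucket/flagged_merchants/")

    # Broadcasting the small side avoids shuffling the large transaction table
    scored = (
        txns.join(broadcast(merchants), on="merchant_id", how="left")
            .withColumn("is_flagged", F.col("risk_tier").isNotNull())
    )

    # Cache because the scored frame feeds several downstream aggregations
    scored.cache()

    # Partition output by date so files align with downstream query patterns
    (scored.repartition("txn_date")
           .write.mode("overwrite")
           .partitionBy("txn_date")
           .parquet("s3://example-bucket/scored_transactions/"))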
Cloud Data Engineer (GCP)—Southeast Missouri State University, Missouri    Aug 2023–May 2024
Architected and orchestrated production-scale data pipelines using Google Cloud Composer, Dataflow, and BigQuery, processing 500,000+ scholarship and academic records with a 40% improvement in end-to-end processing efficiency
Implemented medallion architecture (bronze, silver, gold) using BigQuery and Cloud Storage, enabling incremental data processing and reducing storage redundancy by 30% while achieving 60% faster query performance (see the sketch after this section)
Developed graph-based data visualization frameworks and interactive dashboards using Python and BigQuery ML, improving scholarship eligibility assessment accuracy by 35% and supporting data-driven funding allocation decisions
Automated real-time data ingestion workflows with Cloud Functions, Pub/Sub, and Cloud Storage triggers, ensuring 99.9% data consistency and enabling live updates of financial aid records
Implemented data quality validation layers using Cloud Dataflow and custom PySpark libraries, reducing data discrepancies by 90% and ensuring compliance with educational data privacy regulations
Created parameterized ETL jobs with dynamic execution and dependency control, reducing manual errors by 75% and boosting deployment speed across the analytics team
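A minimal sketch of the incremental, medallion-style bronze-to-silver promotion referenced above, using the google-cloud-bigquery client; the project, dataset, table, and column names are hypothetical:

    from google.cloud import bigquery

    client = bigquery.Client()

    # MERGE promotes only new or changed rows from the bronze (raw) layer
    # into the silver (cleaned) layer, keeping each load incremental.
    merge_sql = """
    MERGE `example_project.silver.scholarship_records` AS s
    USING (
      SELECT student_id, award_amount, updated_at
      FROM `example_project.bronze.scholarship_records_raw`
      WHERE ingest_date = CURRENT_DATE()
    ) AS b
    ON s.student_id = b.student_id
    WHEN MATCHED AND b.updated_at > s.updated_at THEN
      UPDATE SET award_amount = b.award_amount, updated_at = b.updated_at
    WHEN NOT MATCHED THEN
      INSERT (student_id, award_amount, updated_at)
      VALUES (b.student_id, b.award_amount, b.updated_at)
    """

    # result() blocks until the job finishes and surfaces any errors
    client.query(merge_sql).result()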
Azure Data Engineer—HCL Technologies, India    Aug 2021–Jul 2022
Architected enterprise-scale data pipelines using Azure Databricks, Data Factory, and PySpark, migrating legacy on-premises systems and reducing ETL runtime by 30% while decreasing data quality issues by 25%
Engineered secure data lakes with Azure Data Lake Storage Gen2 and implemented layered zone organization with schema enforcement, enabling real-time BI analytics and driving 20% infrastructure cost savings
Developed scalable ingestion pipelines supporting both batch and streaming workloads (see the sketch after this section), successfully processing 10,000+ daily records from multiple data sources (SQL databases, APIs, on-premises systems) with 99.5% accuracy
Built custom transformation libraries in Python to handle JSON, CSV, and Parquet formats, improving pipeline reusability by 50% and enabling standardized data processing across healthcare and financial domains
Implemented CI/CD integration using Azure DevOps Pipelines and Databricks Repos, ensuring version-controlled deployments, automated testing, and comprehensive audit trails via Azure Log Analytics
Collaborated with data science teams to enable feature engineering pipelines using Kafka and Snowflake, supporting predictive analytics that improved operational efficiency by 25% and reduced model training time by 40%
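An illustrative PySpark Structured Streaming sketch of the batch-plus-streaming ingestion pattern described in this role; the storage paths, schema fields, and checkpoint location are assumptions:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import (StructType, StructField, StringType,
                                   DoubleType, TimestampType)

    spark = SparkSession.builder.appName("ingest-sketch").getOrCreate()

    # Explicit schema: streaming file sources cannot infer schema at runtime
    schema = StructType([
        StructField("record_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Stream new JSON files as they land in the raw zone of the data lake
    raw = (spark.readStream
                .schema(schema)
                .json("abfss://raw@exampleaccount.dfs.core.windows.net/events/"))

    # Write to a Delta table; the checkpoint makes the stream restartable
    query = (raw.writeStream
                .format("delta")
                .option("checkpointLocation", "/checkpoints/events")
                .outputMode("append")
                .start("/delta/bronze/events"))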
KEY PROJECTS

Credit Card Fraud Detection System
Developed high-performance ETL pipelines using Python and pandas to process 1+ million transaction records (see the sketch following this project), improving data processing efficiency by 60% and reducing processing time from hours to minutes
Built and optimized a Random Forest classification model achieving 98% fraud detection precision, representing $2M+ in potential annual fraud prevention
Designed optimized MySQL database schemas supporting sub-second query performance for real-time transactional analytics and fraud scoring
Containerized application using Docker and deployed on AWS/Azure, enabling scalable cloud-native architecture with 99.9% uptime
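A minimal pandas sketch of the chunked ETL approach that keeps a 1M+ row load memory-bounded; the file name, columns, and threshold are illustrative:

    import pandas as pd

    cleaned_chunks = []

    # Stream the transaction file in fixed-size chunks instead of loading
    # the full 1M+ rows into memory at once.
    for chunk in pd.read_csv("transactions.csv", chunksize=100_000):
        chunk = chunk.dropna(subset=["amount", "card_id"])    # drop unusable rows
        chunk["amount"] = chunk["amount"].astype("float64")   # normalize types
        chunk["is_high_value"] = chunk["amount"] > 1_000      # simple derived feature
        cleaned_chunks.append(chunk)

    transactions = pd.concat(cleaned_chunks, ignore_index=True)
    transactions.to_parquet("transactions_clean.parquet")     # columnar output for fast scans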
User Authentication System
Engineered secure, enterprise-grade authentication system with role-based access control (RBAC) supporting 1,000+ concurrent users and ensuring zero security breaches
Built and optimized PySpark ETL pipelines for user activity log processing and normalization, enabling comprehensive audit trails and compliance reporting
Deployed scalable infrastructure on AWS EC2 with Docker containerization and S3 storage, achieving 40% cost reduction compared to traditional hosting
Automated ETL workflows using Apache Airflow (see the sketch below), ensuring 99.9% pipeline reliability and reducing manual monitoring effort by 80%
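A short Apache Airflow sketch of the kind of scheduled ETL automation described above; the DAG id, schedule, and task callables are hypothetical placeholders:

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_logs():
        # Placeholder: pull raw user-activity logs from storage
        ...

    def transform_logs():
        # Placeholder: normalize logs for the audit-trail tables
        ...

    with DAG(
        dag_id="user_activity_etl",
        start_date=datetime(2024, 1, 1),
        schedule="@hourly",   # scheduler retries and alerting replace manual monitoring
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_logs", python_callable=extract_logs)
        transform = PythonOperator(task_id="transform_logs", python_callable=transform_logs)
        extract >> transform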
CERTIFICATIONS
AWS Data Analytics Specialty—Amazon Web Services
Google Cloud Associate Cloud Engineer—Google Cloud Platform
Python for Data Science, AI & Development—IBM (Coursera), Credential ID: IU5FQ6E22XQK