NAVEEN SIDDINENI
Naveennagsiddineni@gmail.com | (813) 934-2526 | https://www.linkedin.com/in/siddineninaveen/
PROFESSIONAL SUMMARY
Results-driven Data Engineer with 5+ years of experience designing and managing scalable data pipelines, distributed data architectures, and cloud-based data solutions. Proficient in Python, SQL, and Databricks, with a strong track record of optimizing data processing systems and schema modeling to drive business insights. Hands-on expertise in AWS, GCP, and Azure ecosystems, including database design, query optimization, and cloud-native data services. Experienced in infrastructure as code (Terraform), automation, and performance tuning to enhance data reliability and scalability. Adept at working in hybrid environments, collaborating cross-functionally to implement high-impact data strategies.
PROFESSIONAL EXPERIENCE
AZURE DATA ENGINEER Feb 2024 – Present
STRYKER, Kalamazoo, MI
• Designed and deployed real-time data pipelines with AWS infrastructure and improved data accessibility for 5 data scientists, accelerating model development by 25%.
• Developed Power BI dashboards with DAX and custom visuals to support business intelligence and reporting.
• Optimized SQL Server queries, resulting in a 25% improvement in query performance.
• Built real-time streaming solutions utilizing Kafka and Spark Streaming for monitoring medical device performance.
• Collaborated with cross-functional teams to ensure compliance with FDA, ISO 13485, and GMP regulations.
• Constructed fully automated CI/CD pipelines with Jenkins, reducing deployment time by 30% and enabling faster iteration cycles for the engineering team.
AWS DATA ENGINEER March 2023 – Jan 2024
AUTO-OWNERS INSURANCE, Lansing, MI
• Developed real-time data pipelines using AWS Glue, Redshift, and Kinesis for risk scoring and underwriting models.
• Built an AWS data lake for claims processing, reducing data retrieval time by 40% and improving fraud detection accuracy by 30%.
• Configured Spark Streaming to process real-time data from Kafka for fraud detection and claims monitoring.
• Automated infrastructure provisioning using Terraform and set up CI/CD pipelines with Jenkins and GitHub.
• Created datasets from Amazon S3 using AWS Athena and generated visual insights with AWS QuickSight.
GCP DATA ENGINEER April 2021 – August 2022
MORGAN STANLEY, Bangalore, India
• Designed and optimized financial data pipelines using GCP BigQuery, DataProc, and Airflow for risk analytics and regulatory reporting.
• Built ETL workflows for loan processing, credit risk analysis, and fraud detection, ensuring compliance with Basel III and GDPR.
• Orchestrated financial data transformations and reporting pipelines with PySpark and Hive, processing 500+ GB of data daily while improving data accuracy for regulatory filings by 15%.
• Implemented machine learning techniques (scikit-learn, TensorFlow) for predictive analytics and anomaly detection.
AWS DATA ENGINEER June 2019 – March 2021
PHILIPS HEALTHCARE, Bangalore, India
• Developed AWS Data Pipeline to extract, transform, and load (ETL) medical data from S3 into Redshift for healthcare reporting.
• Built real-time dashboards using Power BI and Tableau, providing KPI insights for operational and supply chain data.
• Optimized over 150 SQL queries within healthcare data systems through advanced index management and query tuning techniques; achieved a 40% reduction in query execution time, enhancing overall system performance for end users.
EDUCATION
Lewis University, Romeoville, Illinois, USA
Master’s in Data Science, Graduation Date: May 2024
VNR Vignana Jyothi Institute of Engineering and Technology, Hyderabad, India
Bachelor of Technology, Mechanical Engineering, Graduation Date: Sep 2020
TECHNICAL SKILLS
Cloud Technologies: AWS (S3, Glue, Redshift, Lambda, Kinesis), Azure (Data Factory, Databricks, Synapse), GCP (BigQuery, Dataflow)
Programming Languages: Python, Scala, Java, SQL, PySpark
Python Libraries: Pandas, NumPy, PySpark, Polars
Databases: Oracle, MySQL, SQL Server, PostgreSQL, Snowflake, HBase, MongoDB
Big Data Tools: Hadoop (HDFS, Hive, HBase), Spark (Core, SQL, Streaming), Kafka, Airflow
ETL Tools: Azure Data Factory, SSIS, Talend, Informatica
Visualization Tools: Power BI, Tableau, Grafana
Data Formats: Parquet, Avro, JSON, CSV, Protobuf
DevOps Tools: Jenkins, Terraform, Docker, Kubernetes, Git, Maven
PROJECTS
Liver Disease Prediction Using Machine Learning Algorithms
• Led a team of 3 in building an application to predict the occurrence of liver disease, achieving 94% accuracy with the selected machine learning algorithm. Emphasized ongoing maintenance and performance tuning to sustain accuracy levels.
• Improved diagnostic classification accuracy by 25% by implementing machine learning models and scripting for efficient troubleshooting. Managed databases to ensure accurate and efficient data handling.
Development of a Petrol Flow Authenticity Check Device Using Arduino
• Designed and developed an Arduino-based IoT device with advanced sensors to monitor petrol flow in real time, achieving 98% accuracy in detecting anomalies and tampering. Enabled real-time alerting on tampering events, ensuring fair fuel dispensing and improving operational transparency.
CERTIFICATIONS
• AWS Certified Data Analytics - Specialty
• Microsoft Azure Data Engineer Associate
• Google Cloud Professional Data Engineer