Prudhvi Kuchipudi
Data Engineer
********************@*****.***
+1-513-***-**** Prudhvi LinkedIn
SUMMARY
• Results-driven Data Engineer with 5+ years of experience building scalable data pipelines, ETL/ELT workflows, and cloud-based data solutions across AWS, Azure, and GCP.
• Proficient in Python, SQL, PySpark, and Apache Airflow for large-scale data processing, orchestration, and workflow automation.
• Hands-on expertise in Big Data technologies including Apache Spark, Kafka, and Hadoop, ensuring optimized data transformation and streaming solutions.
• Strong knowledge of data modeling (Star/Snowflake schemas) and data warehousing platforms such as Snowflake, Redshift, and Azure Synapse.
• Skilled in DataOps and CI/CD automation using Git, Docker, Kubernetes, and Terraform, ensuring seamless deployment and monitoring of data pipelines.
• Adept at implementing data quality, governance, and compliance standards (GDPR, HIPAA) to ensure secure and reliable data ecosystems.
• Collaborative team player experienced in working with data scientists, analysts, and stakeholders to deliver analytics-ready datasets and business insights.
SKILLS
Programming & Scripting: Python, SQL, Scala, Java, Bash
Big Data Ecosystem: Apache Spark, PySpark, Hadoop, Kafka, Flink, Hive, HDFS, Databricks, Snowflake
Data Pipelines & ETL/ELT: Apache Airflow, Prefect, dbt, AWS Glue, Azure Data Factory, Cloud Data Fusion, Kafka Connect
Cloud Platforms: AWS (S3, Glue, Redshift, EMR, Lambda), Azure (Synapse, Data Factory, Blob Storage), GCP (BigQuery, Dataflow, Dataproc, Vertex AI)
Databases: MySQL, PostgreSQL, Oracle, MongoDB, Cassandra, DynamoDB, SQL Server
Data Modeling & Warehousing: Dimensional Modeling, Star/Snowflake Schema, OLAP/OLTP, Data Marts, Fact/Dimension Tables
DataOps & MLOps: Git, GitHub, CI/CD (GitHub Actions, Jenkins), Docker, Kubernetes, Terraform, MLflow, Data Observability, Pipeline Monitoring
Data Governance & Security: Data Quality Management, Data Lineage, Role-Based Access Control (RBAC), Encryption (At-Rest & In-Transit), GDPR & HIPAA Compliance
Analytics & Visualization: Power BI, Tableau, Looker, Matplotlib, Seaborn, Plotly
Other Tools & Concepts: JSON, API Integration, RESTful Services, Shell Scripting, Agile/Scrum Methodology, Performance Tuning, Logging & Monitoring (CloudWatch, Datadog)
Soft Skills: Analytical Thinking, Problem Solving, Cross-Functional Collaboration, Communication, Continuous Learning, Stakeholder Management
EXPERIENCE
Modak Analytics LLP, India | Data Engineer | Oct 2020 – Aug 2024
• Designed and developed scalable ETL/ELT pipelines using Apache Airflow, PySpark, and AWS Glue to automate data ingestion from heterogeneous sources (APIs, RDBMS, flat files) into Snowflake and AWS S3.
• Optimized Spark-based data transformation workflows, reducing data processing time by 30% and improving overall pipeline efficiency and reliability.
• Implemented data modeling techniques including Star and Snowflake schemas to support data warehousing and business intelligence initiatives using Power BI and Tableau.
• Built real-time data streaming solutions using Apache Kafka and AWS Lambda, enabling near real-time analytics and event-driven processing.
• Managed data quality and governance frameworks, applying validation rules, lineage tracking, and role-based access control (RBAC) for GDPR-compliant operations.
• Collaborated with data scientists and business analysts to prepare and deliver clean, high-quality datasets for machine learning and analytics workloads.
• Deployed containerized data pipelines using Docker and automated deployment workflows through GitHub Actions for CI/CD integration.
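The validation-rule approach used in the data-quality framework above can be sketched in miniature; the rule names and record fields below are hypothetical placeholders, not Modak production code:

```python
# Minimal data-quality validation sketch (hypothetical rules and fields,
# illustrating completeness, range, and domain checks on ingested records).

def validate_record(record):
    """Return a list of rule violations for one ingested record."""
    errors = []
    if not record.get("customer_id"):                        # completeness rule
        errors.append("customer_id missing")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:   # range rule
        errors.append("amount must be a non-negative number")
    if record.get("country") not in {"US", "IN", "DE"}:      # domain rule
        errors.append("country outside allowed set")
    return errors

def partition_by_quality(records):
    """Split a batch into clean rows and quarantined rows with reasons."""
    clean, quarantined = [], []
    for rec in records:
        errs = validate_record(rec)
        (clean if not errs else quarantined).append((rec, errs))
    return clean, quarantined
```

In a production pipeline, rules like these would typically run as an Airflow task, with quarantined rows written to a separate table for lineage and audit.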
HCL, India | Data Engineer Intern | June 2019 – Oct 2020
• Assisted in the development of ETL workflows using Python and SQL to migrate on-premises data to AWS Redshift and Azure Blob Storage.
• Created and maintained data ingestion scripts using Pandas and PySpark, improving data refresh frequency and reliability for internal analytics dashboards.
• Supported senior engineers in configuring Apache Airflow DAGs for job scheduling, monitoring, and alerting across multiple environments.
• Participated in data validation and profiling activities, identifying schema mismatches and improving data accuracy by 20%.
• Documented pipeline workflows, data dictionaries, and source-to-target mappings to streamline future enhancements and audits.
EDUCATION
Master’s degree in Computer Science, University of Cincinnati
Bachelor’s degree in Computer Science, B. V. Raju Institute of Technology, Information Technology
CERTIFICATIONS
• Microsoft – Certified Azure Data Fundamentals, Azure Fundamentals
• AWS – Certified Solutions Architect – Associate
• Coursera Certifications – Python, SQL
• Red Hat – Linux Certification
• Organized – College Induction 2K19, Promethean 2K19
• Cloudera – Certified Technical Professional
• HackerRank Certifications – Problem Solving
• Services – National Service Scheme (NGO), Blood Donation Camp
PROJECTS
Hadoop Administrator Expertise in Pharmaceutical & CML Projects
• Supported a pharmaceutical project for AbbVie as Hadoop Administrator and delivered results on critical CML initiatives. Designed, managed, and optimized databases, implemented robust access-management controls, and created IAM policies to ensure secure and efficient system operations.
• Leveraged AWS services, including S3, EC2, EFS, IAM, and EKS, and used Hive and SQL for advanced data analysis and processing. Streamlined data workflows and managed complex data ecosystems in high-stakes environments.
Parkinson’s Disease Detection using MRI Data | Feb 2021 – June 2021
• Developed a custom machine-learning model based on convolutional neural networks, composed of convolutional layers followed by pooling and flattening layers.
• Applied a novel CNN-based approach to detect Parkinson’s disease from MRI scans, predicting whether a given subject is affected. Extracted features from brain regions associated with dopamine deficiency, which substantially improved the model’s predictive accuracy.
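The layer pipeline named above (convolution, then pooling, then flattening) can be illustrated with a toy 1-D pure-Python sketch; the kernel values and sizes are illustrative only and are not the project's actual CNN:

```python
# Toy 1-D convolution -> max-pooling -> flatten pipeline, illustrating
# the layer types mentioned above (values are illustrative only).

def conv1d(signal, kernel):
    """Valid (no-padding) 1-D cross-correlation, as used in CNN conv layers."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def max_pool(values, size=2):
    """Non-overlapping max pooling: downsample while keeping strong activations."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]

def flatten(feature_maps):
    """Flatten a list of feature maps into one vector for a dense layer."""
    return [v for fmap in feature_maps for v in fmap]

signal = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0]
features = max_pool(conv1d(signal, [1.0, -1.0]))  # simple edge-detecting kernel
vector = flatten([features])
```

A real 2-D CNN over MRI slices applies the same three operations with learned kernels over image patches rather than a fixed 1-D kernel.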