Post Job Free

Data Engineer Machine Learning

Location:
Overland Park, KS, 66211
Posted:
October 15, 2025

Contact this candidate

Resume:

INDU CHINTAPALLI

KS, USA (Open to Relocate) +1-334-***-**** *****************@*****.*** LinkedIn GitHub Portfolio

PROFESSIONAL SUMMARY

Data Engineer with 4+ years of experience in big data engineering, cloud-native analytics, and machine learning within healthcare and e-commerce domains. Proven expertise in architecting and deploying secure, high-performance ETL data pipelines, optimizing AWS-based architectures, and operationalizing ML models to enable real-time decision-making. Proficient in Databricks, Apache Spark/PySpark, AWS (S3, RDS, Redshift, Lambda, Glue, DynamoDB), Kafka, and Python, achieving measurable results such as faster ETL workflows, cost savings, and 95%+ model accuracy. Expert in processing structured and unstructured datasets, integrating multi-source data, and developing scalable, analytics-ready pipelines to improve patient outcomes and support data-driven innovation.

SKILLS

•Programming Languages: Python, Java, JavaScript, TypeScript, SQL, HTML5, CSS3

•Data Analysis & Querying: SQL (Advanced Joins, CTEs, Stored Procedures, Data Reconciliation), Python (Pandas, NumPy, Jupyter Notebooks)

•Data Visualization & Reporting: Tableau, Power BI, Excel (Pivot Tables, VLOOKUP, Macros, Advanced Charting), Informatica, Matplotlib, Seaborn

•Data Warehousing & Storage: Snowflake, AWS Redshift, AWS S3, EC2

•Data Integration & Transformation: Source-to-Target Mapping, Data Dictionary Creation, Data Cleansing, Data Pipeline Management, Data Transformation

•Big Data & Cloud Technologies: Databricks, Spark, AWS (S3, RDS, Redshift, Lambda, DynamoDB, Glue, EMR), Kafka, RabbitMQ

•Data Engineering, ETL & Automation: ETL, Data Pipeline Architecture, Data Modeling, Alteryx, AWS Glue, Lambda

•EDI Transactions & File Formats: EDI, XML, Flat Files, Delimited Files

•Data Security & Compliance: HIPAA Compliance, PHI Data Handling, Data Privacy Compliance

•Frameworks & Libraries: Spring Boot, Hibernate, Angular, React, Django, Express.js

•Databases and Methodologies: PostgreSQL, MySQL, Agile (Scrum)

•ML Frameworks & Paradigms: PyTorch, TensorFlow, OOP, Deep Learning, Neural Networks, ML Models, EDA, MVC, REST APIs

•CI/CD & DevOps: Jenkins, GitHub Actions, Docker, Kubernetes, Jira, Confluence (Agile/Scrum Frameworks)

•Documentation: SQL Query Documentation, Functional & Technical Requirements, User Manuals, Process Workflows

•Testing Tools & Data Quality: Scenario-Based Testing, JUnit, Mockito, PyTest, Jasmine, Jest, UAT, Claims Data Reconciliation, Data Validation

PROFESSIONAL EXPERIENCE

Walmart Inc. Dec 2024 – Present

NLP Data Engineer

•Developed a Kubernetes-based ML pipeline for clinical documentation analysis, performing PDF data extraction and processing for real-time insights

•Optimized SQL queries for a clinical data warehouse, boosting BI reporting speed by 30% using process mapping and data processing techniques

•Automated report generation using Python, ETL tools, and Kanban; reduced manual effort by 40% via efficient data processing and PDF handling

•Implemented Master Data Management, ensuring GDPR compliance and data quality for reliable model training; leveraged statistical concepts, Agile, and process mapping for a key clinical solution with robust data extraction procedures

•Constructed and optimized scalable ETL pipelines using AWS Lambda, S3, RDS, and API Gateway, reducing data ingestion time by 65% and supporting on-demand analytics

•Designed and developed 12+ RESTful and GraphQL APIs integrated with AWS S3, Lambda, and RDS, enabling 30% faster data retrieval for analytics dashboards

•Engineered high-performance Spark jobs on Databricks to process 10M+ daily records, cutting ETL runtimes by 55% and enabling faster business decisions

•Optimized data models on AWS Redshift and MongoDB, improving query execution speed by 50% and reducing storage costs by 20%

•Integrated streaming data frameworks with Kafka for low-latency data delivery, improving data freshness from T+1 day to near real-time

•Implemented CI/CD pipelines with GitHub Actions and Jenkins, automating deployments and achieving a 60% reduction in downtime

•Collaborated with cross-functional teams to ensure HIPAA-compliant data handling for healthcare applications, supporting 100% audit readiness

Tech Mahindra Jan 2022 – Jul 2023

Data Engineer Associate India

•Developed 12+ RESTful and GraphQL APIs integrated with AWS S3, Lambda, and RDS, enabling 30% faster data retrieval for analytics dashboards

•Built live streaming pipelines using Kafka, Spark, and Python, reducing data processing latency by 70% and enabling sub-second insights

•Constructed and released data ingestion frameworks to handle 1TB+ daily data, increasing data accuracy by 35% and cutting manual validation efforts by 40%

•Aligned efforts with data scientists to deploy machine learning models into production pipelines, reducing inference latency by 45% and improving prediction accuracy to 95%

•Containerized applications with Docker and deployed on Kubernetes, improving pipeline scalability and reducing environment setup time by 50%

•Partnered with QA teams to implement automated testing frameworks, reducing production defects by 40% and boosting release quality

•Developed reports and dashboards in Power BI by integrating data from various sources, including on-premises databases and cloud platforms, ensuring comprehensive insights

•Utilized Power Query for efficient data binding and transformation, enabling seamless data preparation for reporting

•Created reports by importing, directly querying, and connecting live data from multiple sources, ensuring real-time insights for stakeholders

•Applied advanced data transformation techniques, such as calculated columns, measures, row manipulations, and handling of date and time columns, to meet complex business requirements

•Provided ongoing support and troubleshooting for multiple Power BI reports in production workspaces, ensuring smooth operations and accurate reporting

Cognizant Jun 2021 – Dec 2022

Data Analytics Engineer India

•Contributed to cross-functional teams in Scrum calls to integrate frontend components with robust backend APIs, testing 100+ endpoints using tools like Postman to ensure efficient and secure data flow

•Conducted Test Driven Development (TDD), covering 95% of critical e-commerce platform functionalities

•Ensured code quality through unit testing and peer code reviews, and deployed containerized microservices to Kubernetes clusters for scalable cloud infrastructure

•Integrated Firebase solutions such as Firestore and Authentication

•Enhanced user interfaces for a facility management tool, improving user experience and functionality

•Upgraded application components from Angular 7 to Angular 11, boosting performance and usability

•Customized Material UI components to integrate seamlessly with the Angular framework

EDUCATION

University of Central Missouri, Lee's Summit, MO, USA

Master of Information Technology

Shri Vishnu Engineering College, India

Bachelor of Engineering, Electronics and Communication Engineering

PROJECTS

Credit Card Fraud Detection Using PySpark

•Developed a credit card fraud detection model using PySpark, achieving 95% accuracy in classifying transactions; preprocessed the full dataset, including cleaning, encoding, and feature engineering with PySpark

•Improved model performance by 20% through feature standardization using VectorAssembler, and built Tableau dashboards for responsive fraud-trend visualization highlighting a 3% fraud occurrence rate

Sales Data Analysis Using MySQL and Snowflake

•Analyzed sales data using MySQL and Snowflake, optimizing database performance by 25%, and calculated total sales and shipping costs, improving reporting accuracy by 30%

•Identified the top 10% of high-spending customers for targeted strategies, enhancing credit card transaction analysis and reducing query processing time by 20%

Los Angeles Wildfire Prediction System Using AWS

•Architected a Kubernetes-based multi-tenant system with JWT authentication and RBAC, achieving 90% system uptime and 20% faster response times.

•Integrated ChatGPT-based automated responses, improving customer query resolution speed by 35%.

•Engineered a data pipeline and machine learning model to assess wildfire risk using AWS services.

•Collected and stored historical wildfire data from NOAA, NASA FIRMS, and USDA Vegetation Index in AWS S3

•Performed data cleaning and transformation using AWS Glue, ensuring high-quality datasets for analysis.

•Utilized AWS Athena to query and process large datasets efficiently for real-time risk assessment.

Healthcare: Heart Attack Possibility

•Analyzed and visualized heart attack prevalence data using Tableau, revealing that 54.45% of patients have heart disease, with a gender-based prevalence of 58% in males and 47% in females.

•Developed interactive dashboards to present insights on heart disease demographics, highlighting that males constitute 68% of the dataset, contributing to improved decision-making in healthcare analytics.

CERTIFICATIONS

•Microsoft Certified: Power BI Data Analyst Associate (Certification number: T30BD9-836A62)

•Data Warehouse ETL Testing & Data Quality Management A-Z (Udemy)
