Madhurya Vemparala
Full Stack Data Engineer
+1-972-***-**** | *****************@*****.*** | LinkedIn

SUMMARY
Results-driven engineer with 4+ years of experience in data analytics, cloud development, and data lake solutions. Proficient in managing product and pricing data across multiple systems, supporting POS configurations, and optimizing invoice processes. Adept at designing and streamlining batch and real-time pipelines using Python and SQL. Strong collaborator with experience in technical support, troubleshooting, and vendor data integration. Skilled in building HIPAA-compliant data pipelines, API-driven integrations, and lakehouse architectures across healthcare and HR domains.

EXPERIENCE
Infinite Health Solutions Jan 2025 – Present
• Implemented HIPAA-compliant ETL pipelines to process sensitive healthcare transactions, incorporating EDI parsers, schema validations, and audit logging to ensure regulatory compliance and end-to-end traceability.
• Developed automated workflows for file ingestion, parsing, and transformation using Python, Spark, and Azure Data Factory, enabling scalable processing of large EDI datasets with strict data privacy safeguards.
• Integrated version control systems (Git, Bitbucket) into pipeline workflows, enabling branching strategies, commit tracking, and CI/CD automation for secure, auditable, and collaborative healthcare data engineering.
• Designed audit and monitoring frameworks for healthcare ETL pipelines with error handling, logging, and data lineage tracking, ensuring integrity, reproducibility, and compliance across multiple environments.
• Developed AI-driven Python services to generate predictive insights from de-identified healthcare data, building REST API endpoints for model inference and integrating them into full-stack applications for real-time clinical decision support.

Data Engineer, Technology – Evonsys July 2020 – May 2022
• Designed and implemented scalable ETL/ELT pipelines in Azure Data Factory and Databricks to process timesheet, time-off, expense, and service request data, orchestrating schema evolution and transformations using Spark, Delta Lake, and Parquet.
• Developed and integrated REST and SOAP APIs to exchange employee and HR transactions between front-end applications and back-end systems, ensuring secure payload delivery, schema validation, and real-time synchronization.
• Automated workflows for HR data ingestion, validation, and payroll reconciliation using Databricks jobs, Azure Functions, and event-driven triggers, embedding data quality checks, anomaly detection, and logging frameworks.
• Built and optimized SQL queries and stored procedures to manage large HR datasets across relational databases, supporting employee expense tracking, leave balances, and service request processing with high reliability.
• Created and published Power BI dashboards with Row-Level Security (RLS), DAX measures, and scheduled refresh pipelines, enabling HR teams to monitor workforce analytics, payroll compliance, and employee engagement trends.
• Contributed to system integration testing and debugging of HR workflows by validating API calls, monitoring payload transformations, and using Pega Tracer, SQL validation scripts, and log monitoring to ensure compliance with enterprise data governance standards.
• Implemented data lakehouse architecture for HR data, integrating structured (payroll, timesheets) and semi-structured (API payloads, JSON logs) datasets into Azure Data Lake Storage, improving scalability and enabling advanced analytics.
• Enhanced data governance and access controls by configuring role-based security, audit logging, and metadata-driven lineage tracking across HR pipelines, ensuring compliance with HIPAA and enterprise security standards.

Associate Engineer – Evonsys Jan 2020 – May 2020
• Built and maintained database connections from UI to backend in Pega PRPC, integrating SQL queries, JDBC connectors, and REST APIs for secure and efficient data persistence across retail, HR, and finance workflows.
• Developed and automated Pega workflows to streamline approvals, batch data processing, and exception handling, integrating ETL logic, transaction validations, and audit trails to reduce manual intervention and turnaround time.
• Implemented event-driven agents and job schedulers in Pega to trigger ETL pipelines, email notifications, and escalation alerts, leveraging batch jobs, data transformations, and orchestration across multiple systems in real time.
• Designed Pega reporting and dashboard components to extract operational insights from employee datasets, integrating SQL queries, KPIs, and data quality checks to improve finance, HR, and compliance-driven decision-making.
• Partnered with POS and product data teams to troubleshoot ingestion pipelines, resolving API errors, data mapping mismatches, and schema conflicts to ensure consistent synchronization between upstream databases and downstream systems.

SKILLS
Programming & Data Engineering: Python, SQL, PySpark, Scala, Ab Initio, Hive, EDI Parsing
Cloud & Big Data: Azure (Data Factory, Databricks, Data Lake Storage, Functions), AWS (Redshift, S3, Glue, EMR, Lambda), Snowflake
Data Warehousing & Modeling: Star & Snowflake Schema, Data Lakes, Lakehouse Architecture, ELT/ETL Pipelines, Data Governance & Lineage
DevOps & CI/CD: Git, Bitbucket, Jenkins, Docker, Kubernetes, Terraform, CI/CD Automation
Visualization & Analytics: Power BI (RLS, DAX), Tableau, SQL Reporting
AI/ML & Advanced Analytics: Predictive Modeling, Time Series Forecasting, Anomaly Detection, Model Deployment via REST APIs

EDUCATION
The University of Texas at Dallas – MS in Business Analytics & MBA (May 2025)
Institute of Aeronautical Engineering – B.Tech in Computer Science (July 2019)

ACADEMIC PROJECTS
Time Series Forecasting for Energy Consumption Trends:
• Developed ARIMA & SARIMAX models to predict energy demand, leveraging statistical tests (Dickey-Fuller, KPSS) to ensure accuracy.
• Automated forecasting workflows with Apache Airflow and AWS Lambda, reducing processing time and improving predictive efficiency.

Crime Pattern Analysis Using Apache Spark:
• Processed millions of crime records with Apache Spark on Hadoop, optimizing real-time analytics with Kafka and NoSQL (Cassandra).
• Implemented geospatial clustering (K-Means) and anomaly detection, enabling law enforcement to identify high-risk areas with 20% higher accuracy.
• Developed a real-time crime dashboard (Tableau + Power BI), integrating predictive analytics to assist agencies in optimizing patrol routes and reducing response times by 25%.
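The geospatial clustering step above can be illustrated with a minimal sketch. The project itself ran K-Means at scale on Spark; this standalone version uses plain NumPy (Lloyd's algorithm), and the incident coordinates and hotspot locations are hypothetical, for illustration only.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm: cluster 2-D (lat, lon) points into k groups."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest current center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points;
        # keep the old center if a cluster ends up empty.
        new_centers = np.array([
            points[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

if __name__ == "__main__":
    # Hypothetical incident coordinates scattered around two hotspots.
    rng = np.random.default_rng(42)
    hotspot_a = rng.normal([32.78, -96.80], 0.01, size=(100, 2))
    hotspot_b = rng.normal([32.90, -96.65], 0.01, size=(100, 2))
    pts = np.vstack([hotspot_a, hotspot_b])
    centers, labels = kmeans(pts, k=2)
    print(np.round(centers, 2))  # recovered hotspot centers
```

In the Spark version, the same assign/recompute loop is distributed across partitions; the per-cluster means become a reduce step over the cluster assignments.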