Data Engineer

Location:
Worcester, MA
Salary:
$60,000
Posted:
September 10, 2025

Resume:

MADHURI ALLUMALLA

Data Engineer

+1-203-***-**** ****************@*****.***

Allumalla Madhuri - LinkedIn

PROFESSIONAL SUMMARY

3+ years of experience in data engineering, data warehousing, and big data analytics, with expertise in designing and implementing scalable data pipelines and ETL processes.

Proficient in cloud platforms including AWS, Azure, and GCP, with hands-on experience in data lake architectures and real-time streaming solutions.

Extensive experience with Apache Spark, Hadoop ecosystem, Kafka, and distributed computing frameworks for processing large-scale datasets.

Strong expertise in SQL, Python, Scala, and Java for data manipulation, transformation, and automation of data workflows.

Skilled in data modeling, schema design, and optimization techniques for both relational and NoSQL databases including PostgreSQL, MySQL, MongoDB, and Cassandra.

Experience with containerization technologies like Docker and Kubernetes for deploying data applications in a microservices architecture.

Proficient in CI/CD pipelines, version control systems (Git), and infrastructure as code using Terraform and CloudFormation.

Strong background in data governance, data quality frameworks, and implementing security best practices in data engineering solutions.

Experience with business intelligence tools such as Tableau and Power BI, and with data visualization techniques for stakeholder reporting.

Excellent communication, problem-solving, and adaptability skills; a team player passionate about learning innovative technologies.

TECHNICAL SKILLS

Programming Languages: Python, SQL, Scala, Java, R, Shell Scripting, PL/SQL

Big Data Technologies: Apache Spark, Hadoop (HDFS, MapReduce, YARN), Apache Kafka, Apache Airflow, Apache NiFi, Databricks

Cloud Platforms: AWS (S3, EC2, EMR, Glue, Lambda, Redshift, Kinesis), Azure (Data Factory, Synapse, Databricks), GCP (BigQuery, Dataflow, Pub/Sub)

Databases: PostgreSQL, MySQL, Oracle, SQL Server, MongoDB, Cassandra, Redis, DynamoDB, Snowflake

Data Warehousing: Amazon Redshift, Snowflake, Azure Synapse Analytics, Google BigQuery, Teradata

ETL/ELT Tools: AWS Glue, Apache Airflow, Talend, Informatica, SSIS, Azure Data Factory, Google Dataflow

Streaming Technologies: Apache Kafka, Amazon Kinesis, Apache Storm, Apache Flink, Azure Event Hubs

Containerization & Orchestration: Docker, Kubernetes, Apache Mesos, Docker Compose

Version Control & CI/CD: Git, GitHub, GitLab, Jenkins, Azure DevOps, AWS CodePipeline

Infrastructure as Code: Terraform, AWS CloudFormation, Azure Resource Manager, Ansible

Operating Systems: Windows 10, Windows 11, Windows Server, Linux (Ubuntu, CentOS, RHEL), macOS

Data Visualization: Tableau, Power BI, Grafana, Apache Superset, Matplotlib, Seaborn

Monitoring & Logging: Splunk, ELK Stack (Elasticsearch, Logstash, Kibana), CloudWatch, Azure Monitor, Prometheus

CERTIFICATIONS

AWS Certified Cloud Practitioner.

PROFESSIONAL WORK EXPERIENCE

GE Healthcare | Data Engineer | Chicago, IL | Feb 2024 – Present

Responsibilities:

Architected and implemented end-to-end data pipelines using AWS Glue and Apache Airflow to process 500+ GB of daily medical imaging data, improving data processing efficiency by 40%.
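
As a rough illustration of the orchestration pattern described above, here is a minimal Airflow DAG sketch (assuming Airflow 2.4+ with the Amazon provider package installed); the DAG id, Glue job name, and script arguments are hypothetical placeholders, not the actual production configuration.

    # Hypothetical DAG triggering a daily Glue transform; all names are placeholders.
    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.providers.amazon.aws.operators.glue import GlueJobOperator

    default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

    with DAG(
        dag_id="daily_imaging_etl",          # hypothetical DAG id
        start_date=datetime(2024, 2, 1),
        schedule="@daily",
        default_args=default_args,
        catchup=False,
    ) as dag:
        GlueJobOperator(
            task_id="transform_imaging_data",
            job_name="imaging-transform-job",       # hypothetical Glue job name
            script_args={"--input_date": "{{ ds }}"},
        )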

Designed and developed real-time streaming solutions using Apache Kafka and AWS Kinesis to handle patient monitoring data from 200+ medical devices across multiple healthcare facilities.
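
A minimal sketch of the producer side of such a stream, using the kafka-python client; the broker address, topic name, and payload fields are illustrative assumptions.

    # Hypothetical telemetry producer; broker, topic, and fields are placeholders.
    import json
    import time

    from kafka import KafkaProducer  # kafka-python package

    producer = KafkaProducer(
        bootstrap_servers="broker:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    reading = {"device_id": "monitor-042", "heart_rate": 72, "ts": time.time()}
    producer.send("patient-vitals", value=reading)  # hypothetical topic
    producer.flush()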

Built scalable ETL processes using Python and Spark on AWS EMR to transform and load healthcare data into Redshift data warehouse, reducing data latency by 35%.
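
A condensed PySpark sketch of the transform step in a job of this shape; the S3 paths and column names are hypothetical, and a real Redshift load would typically go through a connector or a staged COPY rather than the Parquet write shown.

    # Sketch of an EMR-style PySpark transform; paths and columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("healthcare-etl").getOrCreate()

    raw = spark.read.parquet("s3://example-bucket/raw/encounters/")
    cleaned = (
        raw.dropDuplicates(["encounter_id"])
           .withColumn("admit_date", F.to_date("admit_ts"))
           .filter(F.col("patient_id").isNotNull())
    )
    # A real job would COPY into Redshift; curated Parquet is written here instead.
    cleaned.write.mode("overwrite").parquet("s3://example-bucket/curated/encounters/")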

Implemented data quality frameworks and validation rules using Great Expectations and custom Python scripts to ensure 99.9% data accuracy for critical patient care analytics.
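
Great Expectations' API has changed considerably across versions, so the sketch below expresses the same kinds of rules as plain pandas checks; the column names and thresholds are hypothetical.

    # Illustrative validation rules in plain pandas; columns are hypothetical.
    import pandas as pd

    def validate_vitals(df: pd.DataFrame) -> list[str]:
        """Return human-readable descriptions of any rule violations."""
        failures = []
        if df["patient_id"].isnull().any():
            failures.append("patient_id contains nulls")
        if not df["heart_rate"].between(20, 300).all():
            failures.append("heart_rate outside plausible range")
        if df.duplicated(subset=["patient_id", "ts"]).any():
            failures.append("duplicate (patient_id, ts) readings")
        return failures

    sample = pd.DataFrame({"patient_id": ["p1", "p2"], "heart_rate": [72, 410], "ts": [1, 2]})
    print(validate_vitals(sample))  # -> ['heart_rate outside plausible range']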

Created automated data monitoring dashboards using Tableau and CloudWatch, providing real-time visibility into data pipeline performance and healthcare KPIs.

Optimized database queries and implemented partitioning strategies in PostgreSQL and Redshift, resulting in 50% faster query execution times.
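
For the PostgreSQL side, a sketch of declarative range partitioning issued through psycopg2 (Redshift instead relies on DISTKEY/SORTKEY choices); the table, columns, and connection string are hypothetical.

    # Hypothetical PostgreSQL range partitioning; all names are placeholders.
    import psycopg2

    ddl = """
    CREATE TABLE IF NOT EXISTS observations (
        obs_id     bigint,
        patient_id text,
        obs_date   date NOT NULL
    ) PARTITION BY RANGE (obs_date);

    CREATE TABLE IF NOT EXISTS observations_2024q1
        PARTITION OF observations
        FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
    """

    with psycopg2.connect("dbname=analytics") as conn:  # hypothetical DSN
        with conn.cursor() as cur:
            cur.execute(ddl)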

Collaborated with data scientists and healthcare analysts to develop machine learning models for predictive analytics and patient outcome optimization.

Established data governance policies and implemented security measures including encryption, access controls, and HIPAA compliance protocols.

Managed containerized applications using Docker and Kubernetes for scalable deployment of data processing microservices.

Environment: AWS (S3, EC2, EMR, Glue, Redshift, Kinesis), Python, Apache Spark, Apache Kafka, Apache Airflow, PostgreSQL, Tableau, Docker, Kubernetes, Git, Terraform

DXC Technology | Associate Data Engineer | India | May 2021 – Jun 2023

Project: Digital Banking Data Platform for Risk Analytics and Customer Intelligence

Responsibilities:

Developed and maintained ETL pipelines using Apache Spark and Python to process 1TB+ of daily financial transaction data, achieving 99.5% data processing accuracy.

Implemented data lake architecture on Azure using Data Lake Storage Gen2 and Azure Data Factory, enabling efficient storage and processing of structured and unstructured banking data.

Created real-time fraud detection data streams using Apache Kafka and Azure Event Hubs, processing 10,000+ transactions per second with sub-second latency.
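
A minimal Spark Structured Streaming sketch of this consumer pattern (Azure Event Hubs also exposes a Kafka-compatible endpoint, so the same reader applies); the broker, topic, schema, and the simple amount threshold standing in for a real fraud model are all assumptions.

    # Sketch of a streaming fraud filter; requires the spark-sql-kafka package.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType, StructType

    spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

    schema = (
        StructType()
        .add("txn_id", StringType())
        .add("account", StringType())
        .add("amount", DoubleType())
    )

    txns = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
        .option("subscribe", "transactions")               # hypothetical topic
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("t"))
        .select("t.*")
    )

    # A simple threshold stands in for a model-driven fraud score.
    flagged = txns.filter(F.col("amount") > 10000)
    flagged.writeStream.format("console").outputMode("append").start()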

Built automated data validation and cleansing processes using SQL and Python, reducing manual data quality checks by 60%.

Designed dimensional data models and star schemas in Azure Synapse Analytics for financial reporting and regulatory compliance requirements.

Developed interactive dashboards and reports using Power BI and SQL Server Reporting Services for executive leadership and regulatory reporting.

Optimized Hadoop cluster performance and managed HDFS storage, resulting in a 25% reduction in processing time for batch jobs.

Implemented CI/CD pipelines using Azure DevOps and Git for automated deployment of data engineering solutions.

Collaborated with cross-functional teams including risk analysts, compliance officers, and business stakeholders to deliver data-driven insights.

Performed database administration tasks including backup, recovery, and performance tuning for SQL Server and MongoDB instances.

Created comprehensive documentation and conducted knowledge transfer sessions for data pipeline maintenance and troubleshooting.

Environment: Azure (Data Factory, Synapse Analytics, Data Lake Storage), Apache Spark, Apache Kafka, Python, SQL Server, MongoDB, Power BI, Hadoop, HDFS, Azure DevOps, Git, Shell Scripting

EDUCATION

Master of Science in Computer Science from Clark University, Worcester, MA, USA.

Bachelor of Technology in Electronics and Communication Engineering from Gayatri Vidya Parishad College of Engineering, Visakhapatnam, India.


