
Senior Data Engineer - Big Data & Cloud Platforms

Location:
Houston, TX
Posted:
December 18, 2025


Resume:

SRI LAKSHMI PARVATHI ALLU

United States +1-806-***-**** ***************@*****.*** LinkedIn GitHub SRI Profile

SUMMARY

Experienced Data Engineer specializing in enterprise-level big data platforms and large-scale pipeline design. Skilled in Python, Spark, PySpark, and SQL, with proven success processing and analyzing high-volume datasets. Expertise in integrating cloud services such as AWS and GCP within agile teams to optimize data workflows and system performance. Driven to deliver innovative, efficient data solutions that meet business objectives.

TECHNICAL SKILLS

• Programming Languages: Python (Pandas, NumPy, PySpark), SQL, Java, Shell Scripting (Bash), R, Scala

• Big Data Technologies: Apache Hadoop (HDFS, MapReduce), Apache Spark (SparkSQL, Spark Streaming), Apache Kafka, Apache Hive, Apache Flink, Apache HBase, Apache Pig

• Databases: MySQL, PostgreSQL, MS SQL Server, Oracle, DB2, MongoDB, Cassandra, Amazon Redshift, Google BigQuery, Snowflake

• ETL Tools: Apache NiFi, Talend, Informatica PowerCenter, AWS Glue

• Cloud Platforms: AWS (S3, EC2, RDS, Lambda, Redshift, Athena, Kinesis), Google Cloud Platform (BigQuery, Pub/Sub, Dataflow), Microsoft Azure (Azure Data Factory)

• Data Processing Frameworks: Apache Spark (PySpark), Apache Flink, Apache Beam (Google Cloud Dataflow)

• Data Pipelines & Workflow Orchestration: Apache Airflow, Luigi, AWS Step Functions, Oozie, Cron Jobs

• Data Integration & Message Queues: Apache Kafka, RabbitMQ, AWS Kinesis, Apache Camel

• Version Control & CI/CD: Git, GitHub, GitLab, Bitbucket, Jenkins, CircleCI, Travis CI

• Data Modeling: Dimensional Modeling (Star schema, Snowflake schema), ER Modeling, Data Lake & Data Warehouse design

• Data Visualization & Reporting: Tableau, Power BI, Looker

• Containerization & Orchestration: Docker, Kubernetes

• Monitoring & Logging: ELK Stack (Elasticsearch, Logstash, Kibana), Prometheus, Grafana, Datadog

• Data Governance & Security: Data Masking, GDPR & HIPAA compliance, Data Encryption

• Operating Systems: Linux (Ubuntu, CentOS, Red Hat), Windows Server

• Job Scheduling & Automation: Apache Airflow, Cron Jobs, Oozie

• Core Competencies: Data Engineering, Agile Methodologies

WORK EXPERIENCE

Baylor Scott & White Health Feb 2024 - Present

Data Engineer Washington

• Engineered scalable data pipelines for healthcare data processing, optimizing data flow from source systems to the data warehouse, which improved data retrieval speed and accuracy.

• Integrated data from various sources such as EHR systems, clinical databases, and external APIs to create a unified view of patient information, enhancing data accessibility for healthcare providers.

• Utilized Apache Spark and PySpark for distributed data processing over large-scale healthcare datasets, resulting in faster data processing times and improved analytical capabilities.

• Leveraged AWS services (S3, Lambda, Athena) for real-time processing and analysis of patient data, enabling timely insights for healthcare decision-making.

• Developed and maintained real-time streaming solutions using Apache Kafka to continuously monitor patient data, improving tracking of health metrics.

• Implemented automated data pipelines with Apache Airflow for continuous ingestion and transformation of healthcare data.

• Applied data governance principles to ensure HIPAA compliance and secure management of sensitive patient data.

• Optimized ETL workflows using AWS Glue and Apache NiFi to automate data extraction, transformation, and loading processes.

• Designed and implemented data models for healthcare analytics, enhancing insights into patient care outcomes.

• Created interactive dashboards with Tableau for real-time analytics, empowering stakeholders with actionable insights.

• Collaborated with the data science team in an agile environment to integrate machine learning models, thereby enhancing the accuracy of patient readmission predictions.

• Implemented data encryption and masking techniques to comply with privacy regulations and reinforce data security.

• Leveraged Google Cloud Platform (BigQuery, Pub/Sub) for large-scale data storage and processing, significantly enhancing data accessibility and efficiency.

• Developed robust data pipelines for ingesting patient data from various medical devices, ensuring seamless data flow and integration.

• Built custom APIs to extract data from external sources such as pharmacies and insurance providers, expanding data availability.

• Enhanced performance by optimizing SQL queries and data processing workflows, which accelerated analytics.

• Constructed data lakes to store unstructured health data, enabling advanced analytics and supporting research teams.

• Configured automated alerts using Prometheus and Grafana to monitor data pipeline performance.
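Illustrative of the Airflow-based ingestion pipelines described in this role, the sketch below outlines a minimal daily DAG that lands an EHR extract and triggers a transformation step. It assumes Airflow 2.x; the DAG id, task names, and callables are hypothetical placeholders rather than the production pipeline.

```python
# Minimal sketch, assuming Airflow 2.x; DAG id, task names, and logic are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_ehr_extract(**context):
    # Placeholder: pull the daily EHR extract and land it in the raw zone.
    print(f"Ingesting EHR extract for {context['ds']}")

def transform_to_warehouse(**context):
    # Placeholder: run the transformation into the curated warehouse layer.
    print(f"Transforming extract for {context['ds']}")

with DAG(
    dag_id="ehr_daily_ingestion",        # hypothetical name
    start_date=datetime(2024, 2, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_ehr_extract", python_callable=ingest_ehr_extract)
    transform = PythonOperator(task_id="transform_to_warehouse", python_callable=transform_to_warehouse)
    ingest >> transform
```

Pacific Western Bank May 2021 - Jan 2023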

Data Engineer CA

• Ensured data quality and consistency through regular validation checks embedded in the data pipeline.

• Troubleshot pipeline failures and implemented corrective measures to enhance system reliability.

• Developed ETL pipelines for processing financial transaction data and integrating it into the bank’s data warehouse, improving accessibility for analytics teams.

• Utilized Apache Kafka and AWS Kinesis for real-time streaming of transactional data, thereby improving data synchronization and operational efficiency.

• Optimized SQL queries to accelerate extraction and loading from transactional databases, reducing data processing time.

• Automated data extraction from multiple banking systems using AWS Glue and Apache NiFi, which reduced manual processing time and increased accuracy.

• Designed data models for tracking financial transactions and customer behavior, enhancing analysis capabilities for strategic decision-making.

• Built data lakes and managed the integration of structured and unstructured financial data, improving data accessibility for reporting.

• Created interactive dashboards with Power BI to monitor daily operations and key performance indicators.

• Developed automated reporting systems to track critical financial data, thereby reducing manual effort.

• Integrated third-party APIs to enrich internal financial data, enabling more robust reporting frameworks.

• Conducted performance tuning of SQL queries to ensure swift data retrieval and reporting.

• Managed version control of data models and scripts using Git and GitHub for consistent updates.

• Collaborated with business analysts to interpret data needs and deliver actionable insights through custom reports.

• Implemented CI/CD pipelines with Jenkins and GitLab, reducing deployment times and minimizing errors in data engineering solutions.

• Ensured data security by applying encryption and masking techniques, maintaining compliance with financial regulations.

• Performed root-cause analysis on pipeline issues and optimized workflows to minimize downtime and improve reliability.

• Constructed AWS Redshift clusters to enhance data warehousing and analytical processing capabilities.

• Developed an automated system for transaction fraud detection using integrated machine learning models.

• Reinforced data consistency with robust quality checks across the data pipeline.

• Collaborated with IT teams to deploy and scale data processing infrastructure using Docker and Kubernetes.

• Mentored junior team members on data engineering best practices, thereby improving team performance and project delivery.
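As a rough illustration of the streaming and data-quality work described above, the sketch below consumes transaction events from Kafka and applies a basic completeness check before downstream loading. It assumes the kafka-python client; the topic, broker address, and field names are hypothetical.

```python
# Minimal sketch, assuming the kafka-python package; topic, brokers, and schema are hypothetical.
import json

from kafka import KafkaConsumer

REQUIRED_FIELDS = {"transaction_id", "account_id", "amount", "timestamp"}  # hypothetical schema

consumer = KafkaConsumer(
    "transactions",                                   # hypothetical topic
    bootstrap_servers=["localhost:9092"],
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    record = message.value
    # Validation check embedded in the pipeline: reject records with missing fields.
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        print(f"Rejected record {record.get('transaction_id')}: missing {sorted(missing)}")
        continue
    # Downstream load (e.g., staging table or S3 landing zone) would happen here.
    print(f"Accepted transaction {record['transaction_id']} for amount {record['amount']}")
```

FedEx Express Jan 2019 - May 2021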

Data Engineer Memphis

• Built and maintained ETL pipelines for processing logistics and shipping data from multiple systems, resulting in improved data accuracy and operational efficiency.

• Optimized data pipelines for managing large-scale data related to package tracking, delivery performance, and customer feedback.

• Leveraged Apache Flink for real-time stream processing of parcel delivery statuses, enhancing delivery update timeliness and customer satisfaction.

• Integrated external data sources and APIs to enrich logistics data, providing deeper insights into operational performance.

• Employed AWS services such as S3 and Lambda for scalable data storage and processing, improving accessibility and reducing processing times.

• Designed comprehensive data models for shipping analytics, thereby bolstering operational efficiency and decision-making.

• Created dynamic dashboards using Tableau and Looker to track logistics performance.

• Developed automated reporting systems for real-time insights into shipping volumes, delays, and customer satisfaction.

• Ensured data quality through validation checks and automated cleansing processes while working with large datasets.

• Containerized applications using Docker to streamline development and deployment processes.

• Utilized Apache Kafka for real-time monitoring of parcel statuses across multiple shipping channels.

• Enhanced SQL query performance and workflows to ensure prompt data retrieval for reporting.

• Collaborated with operations teams to design solutions that minimized delivery delays.

• Built a data warehouse with AWS Redshift for comprehensive global logistics reporting.

• Automated routine tasks through Cron Jobs for periodic data extractions and transformations.

• Improved data pipeline performance by tuning processes with Apache Spark, resulting in more efficient system resource usage and faster processing.

• Participated actively in cross-functional teams to define data requirements and optimize infrastructure.

• Architected cloud-based data solutions to ensure scalable, efficient, and secure logistics data processing.

• Enhanced data security by implementing encryption and access controls on sensitive shipping data.

• Coordinated with IT and security teams to ensure GDPR compliance in data handling and processing.
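The sketch below illustrates the kind of real-time parcel-status monitoring described in this role. The resume cites Kafka and Flink; Spark Structured Streaming is used here purely as a self-contained illustration, assuming the spark-sql-kafka connector is available, with topic and column names as hypothetical placeholders.

```python
# Minimal sketch; requires pyspark plus the spark-sql-kafka connector. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("parcel-status-stream").getOrCreate()

schema = StructType([
    StructField("tracking_id", StringType()),
    StructField("status", StringType()),
    StructField("scanned_at", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "parcel-status")             # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("event"))
    .select("event.*")
)

# Surface delayed or exception statuses; a real job would alert on or persist them.
query = (
    events.filter(col("status").isin("DELAYED", "EXCEPTION"))
    .writeStream.format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```

Education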

Texas Tech University

Master's, Computer & Information Science Lubbock

• Achievements: Developed an MPI-based distributed matrix processing system for optimized computation; built an intelligent web-based job search assistant using SPARC and ASP solvers; designed a wireless network analysis tool using Kismet/Airodump-ng with XML output; created a course registration query agent with speech-to-text and AI reasoning; completed aerial computing simulations in MATLAB for dynamic path planning

• Coursework: Advanced Database Systems, Big Data & Cloud Computing, Intelligent Systems, Data Mining & Machine Learning, Network Security, Distributed & Parallel Processing, Aerial Computing, Advanced Algorithms, Wireless & Mobile Computing, Artificial Intelligence, Software Engineering, Web Technologies, Project Management

Projects

Migration to Cloud-Based Data Platform

Baylor Scott & White Health

• The existing on-premises data infrastructure was fragmented, leading to inefficiencies in data access and reporting across departments.

• Led the migration of legacy data systems to a unified cloud-based platform using Snowflake and DBT. This involved designing and implementing ETL pipelines, ensuring data integrity, and optimizing performance.

• Achieved a 40% improvement in data processing speeds and reduced data retrieval times by 50%.

• Enhanced decision-making capabilities through faster and more reliable data access, supporting better patient care and operational efficiency.
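As a rough sketch of one load step in such a Snowflake migration, the example below copies staged extract files into a raw table using the Snowflake Python connector; account settings, stage, and table names are hypothetical placeholders, with dbt models assumed to handle downstream transformation.

```python
# Minimal sketch, assuming the snowflake-connector-python package and credentials in the
# environment; warehouse, database, stage, and table names are hypothetical placeholders.
import os

import snowflake.connector

conn = snowflake.connector.connect(
    account=os.environ["SNOWFLAKE_ACCOUNT"],
    user=os.environ["SNOWFLAKE_USER"],
    password=os.environ["SNOWFLAKE_PASSWORD"],
    warehouse="ANALYTICS_WH",
    database="CLINICAL_DB",
    schema="RAW",
)

try:
    cur = conn.cursor()
    # Load staged legacy extracts into the raw layer; dbt models transform them downstream.
    cur.execute(
        "COPY INTO RAW.ENCOUNTERS FROM @LEGACY_EXTRACT_STAGE "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)"
    )
    print(cur.fetchall())  # per-file load results
finally:
    conn.close()
```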

Cloud Data Lakehouse Implementation for Financial Analytics

Pacific Western Bank

• The bank's legacy on-premises data systems were siloed, resulting in delayed reporting, regulatory compliance risks, and rising infrastructure maintenance costs.

• Led the design and development of a cloud-native data lakehouse architecture on AWS integrating S3, Glue, Redshift, and Athena. Built automated ETL pipelines using Python, SQL, and Apache Airflow to ingest, clean, and unify financial transaction, customer, and compliance data.

• Reduced data processing and report generation time by 60%, improved data accuracy for internal audits, and enabled self-service analytics across finance and compliance teams.

• Lowered infrastructure and operational costs by approximately $500K annually through cloud migration, improved regulatory compliance, and enhanced decision-making speed, directly supporting strategic growth initiatives.
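A minimal sketch of how such a lakehouse pipeline might be orchestrated from Python: it triggers a Glue ETL job and then runs a validation query through Athena via boto3. The job name, database, and S3 output location are hypothetical placeholders.

```python
# Minimal sketch, assuming boto3 with AWS credentials configured; job, database, and
# bucket names are hypothetical placeholders.
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Kick off the Glue ETL job that curates raw transaction files landed in S3.
run = glue.start_job_run(JobName="curate_transactions")
print("Glue job run id:", run["JobRunId"])

# Run a simple validation query against the curated data through Athena.
query = athena.start_query_execution(
    QueryString="SELECT COUNT(*) FROM transactions WHERE trade_date = current_date",
    QueryExecutionContext={"Database": "finance_lakehouse"},
    ResultConfiguration={"OutputLocation": "s3://pwb-athena-results/"},
)
print("Athena query id:", query["QueryExecutionId"])
```

COVID-19 Vaccine Logistics Data Pipeline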

FedEx Express

• Urgent need for accurate, real-time tracking of PPE and vaccine shipments during the COVID-19 crisis, while maintaining HIPAA compliance.

• Developed emergency ETL pipelines using AWS Glue and Python to prioritize and monitor critical healthcare shipments, with strict data privacy measures.

• Enabled a 99% on-time delivery rate for critical shipments during the pandemic response.

• Strengthened FedEx's healthcare logistics reputation, leading to multi-million dollar government contracts for vaccine distribution.
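To illustrate the prioritization and privacy masking such a pipeline involves, the sketch below flags vaccine and PPE shipments and masks recipient names before handoff. The resume describes the production pipelines as AWS Glue jobs; plain PySpark is used here only for a self-contained illustration, and the paths and column names are hypothetical.

```python
# Minimal sketch in plain PySpark; source/target paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("critical-shipment-priority").getOrCreate()

shipments = spark.read.parquet("s3://shipments/raw/")  # hypothetical source path

critical = (
    shipments
    # Flag vaccine and PPE shipments for priority routing and monitoring.
    .withColumn("is_critical", F.col("commodity_type").isin("VACCINE", "PPE"))
    .filter(F.col("is_critical"))
    # Mask recipient names as a privacy measure before the data leaves the restricted zone.
    .withColumn("recipient_name", F.lit("***MASKED***"))
)

critical.write.mode("overwrite").parquet("s3://shipments/critical/")  # hypothetical target
```

Certifications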

• Microsoft Certified: Fabric Data Engineer Associate (Issued by Microsoft, April 2025)

• AWS Certified Solutions Architect - Associate (Issued by Amazon Web Services, April 2025)

• AWS APAC Solutions Architecture Job Simulation (Issued by Forage, March 2025)

• Risk Management - Goldman Sachs (Issued by Forage, March 2025)


