
Data Engineer

Location:
Melissa, TX
Posted:
October 15, 2025

Contact this candidate

Resume:

Saniya Malyala

Data Engineer

469-***-**** *************@*****.*** LinkedIn Melissa, TX

SUMMARY

Results-driven Data Engineer with 3+ years of experience in designing, developing, and optimizing real-time and batch ETL/ELT workflows across AWS and Azure cloud platforms. Proficient in big data technologies such as Kafka, Spark, Airflow, Databricks, and dbt, with expertise in data governance, data lakes, and both SQL/NoSQL databases including PostgreSQL and Oracle. Adept at building secure, scalable data architectures, optimizing performance, and enabling analytics and ML solutions that drive actionable business insights. Collaborative team player with strong analytical skills and a proven track record in delivering high-impact data engineering solutions using Agile methods.

SKILLS

Data Engineering: Apache Spark, Apache Kafka, Databricks, AWS Glue, AWS Kinesis, HDFS, Snowflake, AWS Redshift, BigQuery, Data Lake, Dimensional Modeling, Star & Snowflake Schema, OLAP

ETL/ELT: Python (Pandas, PySpark), SQL, dbt, Data Cleansing, Data Standardization, ETL Processes, CDC, Partitioning & Indexing, API Integrations, Workflow Automation, Real-Time Streaming Pipelines, Schema Design & Optimization

Cloud: AWS (S3, IAM, KMS, EC2, CloudWatch, EKS, Lake Formation), Azure Data Factory, Azure Synapse, Azure Event Hubs, Azure Data Lake, Azure Blob Storage, Azure Insight Hub, Data Governance

Databases: PostgreSQL, Oracle, TimescaleDB, InfluxDB, NoSQL (MongoDB, Cassandra), Complex SQL Queries

Orchestration: Apache Airflow, AWS Step Functions, DAG Design, Job Scheduling, Monitoring

Security: Data Masking, RBAC, HIPAA Compliance, Governance Policies, Encryption Standards (KMS, TLS)

DevOps: Docker, Kubernetes, CI/CD, Git, Linux, Unix, Bash, Containerized ETL, Prometheus, Monitoring & Alerting

Analytics & ML: Analytical Skills, Data Storytelling, Trend Analysis, ML Integration, Scikit-learn, TensorFlow, Predictive Risk Scoring

Agile: Scrum, Sprint Planning, Cross-functional Collaboration, Continuous Improvement

EXPERIENCE

HCA Healthcare, USA Data Engineer Mar 2025 to Present

• Devised and maintained automated data pipelines using SQL and Snowflake on Azure Synapse Analytics to integrate pharmacy, claims, and member survey data, enabling care teams to assess utilization trends and chronic condition management gaps, processing 5M+ patient records daily.

• Architected and orchestrated end-to-end pipelines integrating EMR, LIS, pharmacy, billing, and operational systems into Azure Data Lake and Event Hubs with Azure Data Factory for ingestion, ensuring standardized medical codes and improving data consistency across 120+ facilities.

• Developed scalable ETL/ELT workflows using Apache Spark, dbt, and Python for cleansing, de-duplication, and enrichment, delivering HIPAA-compliant datasets through HCA Insight Hub and supporting downstream ML models for predictive risk scoring.

• Implemented real-time lab result ingestion with Event Hubs streaming, Spark-based processing, and Azure Blob tiering, improving dashboard refresh rates by 35% and enabling physicians to make decisions within minutes of test completion.

• Optimized Synapse and Snowflake queries via schema design, indexing, and partitioning strategies, reducing report generation time by 28% and enabling faster predictive analytics and data-driven decisions for hospital administrators.

Experion Technologies, India Data Engineer Feb 2021 to Jul 2023

• Built structured ETL pipelines using SQL and Python (Pandas) to extract, clean, and merge app usage logs with insurance claims data, standardizing timestamps and medical codes for consistent analysis of 8M+ records per month.

• Designed and deployed a hybrid ingestion framework combining mobile app field sales data, IoT sensor streams, and port logistics events via Apache Kafka, REST APIs, and MQTT, storing 15TB+ raw data in an AWS S3 Data Lake for staging and archival.

• Developed transformation workflows with Apache Spark and Airflow to unify product codes, enrich shipment events, and produce analytics-ready datasets in Snowflake and Amazon RDS (PostgreSQL/TimescaleDB), enabling faster insights.

• Implemented real-time aggregation pipelines for container tracking and sales transactions, streaming metrics to Grafana and BI dashboards, improving operational response times across retail and logistics teams.

• Optimized Snowflake warehouse schema and data models, cutting dashboard query latency by 31% and allowing stakeholders to access live metrics without delays.

• Automated anomaly detection and alerting through Prometheus and AWS CloudWatch, reducing incident resolution time by 26% and preventing workflow disruptions in ports and distribution centers.

• Applied AWS IAM, KMS encryption, and Lake Formation policies to protect sensitive sales and logistics records while meeting compliance standards.

• Deployed containerized Kafka, Spark Streaming, and ETL services on Amazon EKS (Kubernetes), ensuring scalable pipeline performance as data sources and integration points expanded.

EDUCATION

Master of Science in Computer and Information Science May 2025 Southern Arkansas University, USA

Bachelor of Technology in Computer Science and Engineering May 2022 Jawaharlal Nehru Technological University, India
