Sowmya Kotagiri
Data Engineer
***************@*****.*** +1-254-***-**** LinkedIn
Summary
Data Engineer with over 5 years of experience building and maintaining data pipelines, databases, and cloud-based solutions. Skilled in SQL, Python, and big data technologies, with hands-on expertise in AWS, Azure, and GCP. Strong background in healthcare and IT projects, ensuring data accuracy, security, and compliance. Proficient in ETL processes, data modeling, and reporting that support business decisions.
Skills
Programming & Scripting: Python (Pandas, NumPy, PySpark), SQL (T-SQL), Bash
Cloud Platforms: Azure (Data Factory, Databricks, Synapse), AWS (S3, Redshift, Glue, EMR, Athena), GCP (BigQuery, Dataflow, Pub/Sub), Teradata
Hadoop Ecosystem: HDFS, YARN, MapReduce, Hive, Sqoop, Spark (1.x/3.x), Pig
Big Data & Streaming Tools: Apache Spark, Apache Kafka, Apache Flink, Spark Streaming, PySpark
ETL & Orchestration: Informatica, Talend, Apache Airflow, dbt, ETL/ELT Pipeline Development, Data Warehousing, Data Modeling
Databases: PostgreSQL, DB2, SQL Server, MySQL, Oracle, Cosmos DB, Snowflake, Delta Lake, Redis
Visualization Tools: Tableau, Power BI (DAX, Power Query), IBM Cognos Analytics (Framework Manager, Report Studio, Transformer)
Healthcare Data Standards & Compliance: HL7, FHIR, HIPAA Compliance, Claims Data, Healthcare Data Platforms
DevOps Tools: Git/GitHub, CI/CD pipelines, Docker, Kubernetes, Terraform
Soft Skills: Stakeholder Communication, Agile/Scrum Collaboration
Experience
UnitedHealth Group, IL Jan 2023 – Current
Data Engineer
Designed and automated ETL pipelines with Informatica to integrate claims and patient encounter data (HL7, FHIR, EDI 837/835) from diverse sources, reducing data ingestion time by 35% while ensuring HIPAA compliance.
Built real-time streaming workflows using Apache Kafka to process high-volume eligibility and claim transactions, scaling throughput to 500K+ events per hour for downstream analytics.
Developed optimized data warehouse structures in AWS Redshift, leveraging partitioning and query optimization to accelerate financial and clinical reporting by 30% and cut query runtime from minutes to under 40 seconds.
Delivered interactive Power BI dashboards that provided payers and providers with actionable insights into cost forecasting, claims adjudication, and patient outcomes, increasing stakeholder adoption by 60%.
Infosys, India Aug 2021 – Sep 2022
Data Engineer
Built scalable ingestion pipelines on Apache Flink and Kafka Streams, enabling near-real-time data processing of 90K+ daily records with <5s latency for IT service monitoring.
Designed and implemented a Delta Lake architecture on the Databricks Lakehouse, improving query performance by 35% and providing ACID transactions for critical IT operational datasets.
Automated ETL workflows using dbt and Airflow, reducing manual intervention by 40% while enhancing pipeline transparency and lineage tracking.
Deployed a Snowflake-based data warehouse integrating multiple ITSM sources, enabling centralized analytics and reducing reporting turnaround time from 2 days to 18 hours.
Partnered with cross-functional teams to build data quality and governance checks within pipelines, aligning with ITIL standards and minimizing data inconsistencies across IT support systems.
Hexaware Technologies, India Jul 2020 – Jul 2021
Junior Data Engineer
Assisted in building and maintaining ETL pipelines using Talend and Pentaho Data Integration (PDI) to ingest and process data from multiple operational systems.
Developed and optimized SQL queries, stored procedures, and functions to improve data retrieval speed by 25% for reporting teams.
Performed data cleansing, validation, and transformation tasks to ensure accuracy and consistency across enterprise applications.
Collaborated with senior engineers to design normalized and star schema data models, enabling efficient storage and faster reporting for BI users.
Education
University of New Haven, CT Aug 2023
Master’s in Business Analytics