Data Engineer Machine Learning

Location: Richardson, TX

Salary: 70000

Posted: September 11, 2025


Resume:

Shravan Ambati

************@*****.*** +1-469-***-**** Richardson, TX LinkedIn

Professional Summary

Data Engineer with 5+ years of experience building scalable ETL pipelines, working with big data tools such as Spark and Airflow, and deploying solutions on Azure and AWS. Skilled in handling healthcare and finance data and delivering end-to-end data solutions.

Technical Skills

Programming Languages: Python, SQL, Scala, PL/SQL, Java, Shell

Frameworks & Tools: NumPy, Pandas, Matplotlib, Git, GitHub, Docker, CI/CD, Statistics, Hadoop, PySpark, ggplot2, Snowflake, Azure Cloud, AWS Cloud, Apache Airflow, Apache Kafka, Apache Spark, ETL, SSIS, SSRS, SSAS, Jupyter, Streamlit, MS 365 Suite

Databases: MySQL, Redshift, Snowflake, BigQuery, PostgreSQL, MongoDB, SparkSQL, NoSQL, Oracle

Data Analysis / ETL: Power BI, Tableau, Databricks, MS Excel, Google Analytics, Machine Learning, Data Modeling, Data Mapping, Data Mining, Data Extraction, Transformation, Informatica, Looker, Data Warehousing, Data Lake

Cloud Platforms: Microsoft Azure (Data Factory, Databricks, Synapse Analytics, Azure SQL Database, Cosmos DB, Monitor), AWS (S3, Glue, Redshift, RDS, EMR, CloudWatch), GCP (BigQuery, Cloud Storage)

Soft Skills: Problem-Solving, Communication, Teamwork & Collaboration, Adaptability, Attention to Detail

Work Experience

Data Engineer, Optum Mar 2024 – Present USA

● Designed scalable ETL pipelines to ingest and process Electronic Health Records (EHR), insurance claims, and patient demographics from heterogeneous sources including SQL Server, HL7/FHIR APIs, and lab report files, resulting in a 50% increase in data processing efficiency.

● Built data ingestion workflows using Apache Airflow, Azure Data Factory, and GitLab CI/CD, supporting dynamic schedules (hourly to monthly), reducing manual intervention by 70%, and enabling end-to-end traceability.

● Transformed structured and semi-structured data using Python (Pandas, NumPy) and stored it in Azure Data Lake and Blob Storage, facilitating downstream analytics and ensuring schema consistency across systems.

● Architected a secure Data Warehouse and Data Lake using star and snowflake schemas, enabling historical and real-time analytics and improving query performance in Azure Synapse Analytics by 35%.

● Enabled compliance and operational insights by integrating clean datasets with Power BI dashboards and Azure Synapse, reducing report generation time by 40% and supporting HIPAA regulatory requirements.

● Developed proactive monitoring and alerting mechanisms using Azure Monitor and Airflow sensors, leading to a 40% reduction in ETL failures and ensuring high availability for mission-critical healthcare analytics.

● Streamlined data workflow orchestration using SSIS for legacy systems, ensuring a secure and reliable migration to a modern cloud-based architecture and minimizing downtime during the transition.

Data Engineer, TCS Apr 2018 – Feb 2022 India

● Implemented a financial data warehouse using PostgreSQL and Amazon Redshift to integrate and standardize transactional, market, and credit datasets, improving data accessibility by 80% for analytics and compliance teams.

● Orchestrated scalable ETL pipelines with Apache Airflow, PySpark, and Python to ingest and transform large-scale datasets from trading systems, credit bureaus, and third-party APIs, reducing pipeline runtime by 75%.

● Optimized data models in Snowflake and Databricks by applying partitioning, indexing, and Z-Ordering techniques, accelerating daily P&L and risk analysis queries by 60%.

● Architected a cloud-native data lake using AWS S3, Glue, Lambda, and EMR, enabling processing of both structured and unstructured financial data and reducing storage costs by 30%.

● Implemented a data quality validation framework using Great Expectations, automating over 100 validation checks for schema compliance, transaction accuracy, and regulatory reporting aligned with SOX and Basel III requirements.

● Collaborated with financial analysts, data scientists, and compliance teams to define data contracts, standardize data dictionaries, and enable real-time analytics with improved data reliability and lineage.

● Deployed infrastructure and pipeline updates through CI/CD workflows with Docker, Terraform, and Git, resulting in a 40% reduction in deployment errors and improved operational efficiency.

Education

Master of Science, Lindsey Wilson University Mar 2022 – Aug 2023 Kentucky, USA Computer Science and Information Technology

Bachelor of Technology, Telangana University Jun 2015 – Jul 2018 Telangana, India Computer Science

Certificates

● Azure DP-203
● SnowPro Core
● AWS Data Analytics Specialty
