Post Job Free

Data Engineer | Big Data Developer | AI & ML Engineer

Location:
New York City, NY
Posted:
October 19, 2025

Contact this candidate

Resume:

Koteswar Enamadni

New Haven, CT | 475-***-**** | *********.**@*****.*** | LinkedIn | GitHub | Portfolio

Data Engineer | Big Data Developer | AI & ML Engineer

PROFESSIONAL SUMMARY

Results-driven Senior Data Engineer and AI/ML Engineer with 5 years of experience designing, building, and optimizing cloud-native data platforms, big data pipelines, and MLOps solutions across healthcare, aviation, and telecom domains. Specialized in AWS, Azure, Snowflake, and Redshift for architecting scalable data lakes, warehouses, and real-time analytics pipelines processing millions of records weekly. Skilled in PySpark ETL/ELT development, CDC workflows, Airflow orchestration, and Terraform-based infrastructure automation. Experienced in MLOps, model deployment, and monitoring using TensorFlow, SageMaker, and MLflow. Strong focus on data governance, data quality (Great Expectations), HIPAA/HEDIS compliance, and SQL performance optimization for enterprise-grade AI and analytics initiatives.
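The Airflow orchestration mentioned above comes down to declaring task dependencies and letting the scheduler execute them in topological order. A minimal standard-library stand-in for that idea (the task names here are hypothetical, not from the actual pipelines):

```python
# Minimal stand-in for an Airflow DAG: tasks mapped to their upstream
# dependencies, executed in topological order. In Airflow each entry would
# be an operator and the edges would be set with >>; task names are
# illustrative assumptions only.
from graphlib import TopologicalSorter

dag = {
    "extract_api":    set(),
    "extract_sftp":   set(),
    "transform":      {"extract_api", "extract_sftp"},
    "validate":       {"transform"},
    "load_snowflake": {"validate"},
}

# Predecessors always come before their dependents in the schedule.
run_order = list(TopologicalSorter(dag).static_order())
```

Both extracts run before the transform, and the Snowflake load is always last, which is exactly the ordering guarantee the scheduler provides.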

TECHNICAL SKILLS

Cloud & MLOps: AWS (S3, EMR, Glue, Redshift, SageMaker, CloudFormation), Azure (Data Factory, Databricks, Synapse, ADLS Gen2), Snowflake, GCP, Terraform, Docker, Kubernetes, MLflow, Airflow, GitHub Actions, Azure DevOps

Big Data & AI/ML: Apache Spark (PySpark/Scala), Apache Kafka, Delta Lake, TensorFlow, PyTorch, scikit-learn, Pandas, NumPy, XGBoost, Generative AI

Data & Programming: Python, Scala, SQL, PL/SQL, PostgreSQL, MySQL, Oracle, MongoDB, Cassandra, Hive, HDFS, REST APIs, Microservices

BI & Quality: Great Expectations, Power BI, Tableau, Looker, Confluence, Git, JIRA

PROFESSIONAL EXPERIENCE

Data Engineer | Optum, CT | 08/2024 – Present

Value-Based Care Analytics Platform: AWS & Snowflake Healthcare Data Platform – Designed and managed an enterprise-scale healthcare analytics platform integrating AWS S3, Glue, Redshift, and Snowflake to unify claims, clinical, and provider data for value-based care initiatives.

● Architected and deployed high-performance ETL/ELT pipelines in Python/SQL on AWS Glue and Snowflake, processing 5M+ healthcare records weekly from 10+ EHR/payer systems.

● Optimized Snowflake data warehouses with clustering keys and materialized views, boosting query performance for critical HEDIS quality and regulatory reports.

● Engineered Change Data Capture (CDC) workflows using Snowflake Tasks and Streams for incremental data refresh, enabling near real-time analytics.

● Secured PHI with RBAC, encryption, and tokenization, ensuring strict HIPAA and HEDIS compliance.

● Migrated legacy workloads from on-premises SQL Server to Snowflake and AWS Redshift, cutting infrastructure costs.

● Implemented Infrastructure-as-Code with Terraform and AWS CodePipeline, and integrated CloudWatch/SNS alerting, reducing pipeline downtime.

● Partnered with the Data Science team to deliver AI/ML-ready datasets for risk adjustment and predictive models.

Environment: AWS S3, Glue, Redshift, EMR, Lambda, CloudWatch, Snowflake, PySpark, SQL, Terraform, Great Expectations, Power BI, Confluence, Agile/Scrum.

Data Engineer | TCS – Hyderabad, IN | 01/2022 – 07/2023

Azure & Databricks Cloud Data Lake for Telecom Analytics.

● Engineered large-scale ETL pipelines using Azure Data Factory to ingest over 3TB of telecom data daily into ADLS Gen2 from diverse sources (APIs, SFTP, on-premises systems).

● Created scalable PySpark transformation jobs in Azure Databricks, implementing Delta Lake ACID transactions for CDC.

● Modeled analytical datasets in Azure Synapse Analytics, optimizing star schemas to support BI and ad-hoc query performance for churn prediction models.

● Applied Azure Key Vault for encryption key management and implemented RBAC to secure sensitive data, ensuring GDPR compliance.

● Deployed infrastructure with Terraform and managed CI/CD with Azure DevOps to enable repeatable, version-controlled environment provisioning.

● Delivered 40+ Power BI dashboards with live Synapse connections for operational and financial KPI monitoring.
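The Delta Lake CDC pattern described above amounts to an upsert merge against a keyed target table. A minimal pure-Python stand-in for that logic (the record shapes, the `op` field, and the subscriber key are illustrative assumptions; in Databricks the same step is a single `MERGE INTO` against a Delta table):

```python
# Minimal stand-in for a Delta Lake CDC upsert: apply a batch of change
# records (upserts and deletes) to a target keyed by primary key. The
# "op" field and column names are illustrative, not from the real feed.

def apply_cdc_batch(target, changes, key="subscriber_id"):
    """target: dict keyed by primary key; changes: list of change rows."""
    for row in changes:
        op = row.pop("op")            # "upsert" or "delete"
        pk = row[key]
        if op == "delete":
            target.pop(pk, None)      # WHEN MATCHED ... THEN DELETE
        else:
            # MATCHED -> UPDATE, NOT MATCHED -> INSERT
            target.setdefault(pk, {}).update(row)
    return target

target = {101: {"subscriber_id": 101, "plan": "basic"}}
batch = [
    {"op": "upsert", "subscriber_id": 101, "plan": "premium"},
    {"op": "upsert", "subscriber_id": 102, "plan": "basic"},
    {"op": "delete", "subscriber_id": 101},
]
apply_cdc_batch(target, batch)
# target now holds only subscriber 102
```

Delta Lake's value here is that the real merge runs as one ACID transaction, so readers never observe a half-applied batch.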

Environment: Azure Data Factory, Azure Databricks, Azure Synapse, Delta Lake, ADLS Gen2, PySpark, SQL, Azure DevOps, Key Vault, Terraform, Great Expectations, Power BI, GDPR.

Data Engineer | Corizo | 06/2020 – 12/2021

AI Workflows and Data Pipelines for Analytics & AI Applications.

● Developed and maintained end-to-end data ingestion pipelines using Apache Airflow and Python for both analytics and ML workflows.

● Integrated streaming and batch data into AWS S3 and transformed datasets via PySpark before modeling them in Snowflake to support AI-driven recommendation engines.

● Engineered feature engineering pipelines for AI/ML models, integrating data from multiple relational and NoSQL databases.

● Automated CI/CD using GitHub Actions and Docker for PySpark job deployments, significantly improving deployment reliability.

● Designed partitioning and compression strategies, reducing S3 storage costs by 30%.

● Implemented Great Expectations for schema validation and built REST APIs to expose curated datasets to downstream analytics platforms.
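Great Expectations expresses checks like the schema validation above declaratively (e.g., expectations that columns exist and are non-null). A minimal hand-rolled analogue of that gate, with hypothetical column names, looks like:

```python
# Minimal analogue of a Great Expectations suite: verify that required
# columns exist and are non-null before a batch is published downstream.
# Column names here are illustrative, not from the actual pipelines.

def validate_schema(rows, required_columns):
    """Return a list of failure messages; an empty list means the batch passes."""
    failures = []
    for i, row in enumerate(rows):
        for col in required_columns:
            if col not in row:
                failures.append(f"row {i}: missing column '{col}'")
            elif row[col] is None:
                failures.append(f"row {i}: null value in '{col}'")
    return failures

batch = [
    {"user_id": 1, "event": "click"},
    {"user_id": None, "event": "view"},
    {"event": "view"},
]
failures = validate_schema(batch, ["user_id", "event"])
# two failures: a null user_id and a missing user_id
```

In a pipeline, a non-empty failure list would fail the run before the curated dataset is exposed over the REST APIs.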

Environment: AWS S3, Snowflake, PySpark, DBT, SQL, Airflow, Docker, GitHub Actions, Power BI, Tableau, Glue, Great Expectations, Confluence, Agile/Scrum.

EDUCATION

Master of Science (MS), Data Science – University of New Haven, CT, USA
Recipient of the Dean's Scholarship award

CERTIFICATIONS

● Generative AI Fundamentals – Databricks

● AWS Educate: Introduction to Generative AI – AWS

● Data Science Foundations – Level 2 (V2) – IBM


