
Data Engineer Machine Learning

Location:
Rolla, MO, 65401
Salary:
75000
Posted:
September 10, 2025


Resume:

JYOTHIRADITYA GARIKIPATI

Data Engineer

Missouri, USA | 573-***-**** | *************.***@*****.*** | LinkedIn

SUMMARY

Data Engineer with 3+ years of experience designing and optimizing data pipelines across cloud and on-premise environments. Skilled in building scalable ETL/ELT workflows using Python, SQL, Spark, and Airflow, with proven expertise in AWS, Snowflake, and GCP. Experienced in healthcare and e-commerce domains, enabling secure, HIPAA-compliant, and audit-ready data solutions. Strong background in data modeling, governance, and real-time streaming with Kafka and Kinesis. Adept at collaborating with cross-functional teams to deliver analytics-ready datasets, optimize cloud infrastructure costs, and support machine learning pipelines through feature engineering and automation.

SKILLS

• Methodologies: SDLC, Agile, Waterfall

• Languages & Scripting: Python, SQL, Bash, Scala

• Big Data & ETL Tools: Apache Spark (PySpark), Hive, Kafka, Airflow, AWS Glue, NiFi

• Cloud Platforms: AWS (S3, Redshift, EMR, Lambda, Kinesis), Azure, GCP (BigQuery, Cloud Functions)

• Data Warehousing: Snowflake, Redshift, BigQuery, PostgreSQL, MySQL

• Workflow Orchestration: Apache Airflow, Prefect, Luigi

• DevOps & CI/CD: Docker, Jenkins, GitHub Actions, Terraform

• Data Modeling: Star/Snowflake Schema, SCD, Normalization

• Data Governance & Quality: Great Expectations, Data Catalogs, dbt, AWS Glue Data Catalog

• Others: REST APIs, JSON, Parquet, Avro, FHIR, HL7, Jira, Confluence

PROFESSIONAL EXPERIENCE

Data Engineer Jan 2025 – Present

Syneos Health US

• Engineered scalable ETL/ELT pipelines using PySpark and AWS Glue to process 15+ TB of EMR and claims data, ensuring HIPAA compliance and secure PHI handling.

• Designed real-time ingestion systems with Kafka, Kinesis, and Lambda, enabling ICU patient vitals monitoring with automated alerting and FHIR-compliant record updates.

• Modeled healthcare datasets in Snowflake using SCD Type 2 and dimensional schemas, empowering clinical decision-making and advanced analytics.

• Automated 100+ workflows in Apache Airflow, implementing SLA monitoring, Slack alerts, and email notifications to improve reliability and reduce manual oversight.

• Established data quality frameworks using Great Expectations, ensuring consistent accuracy and trust in downstream analytics.

• Developed CI/CD pipelines with Jenkins, GitHub Actions, and Docker, reducing deployment time for Spark jobs by 40%.

• Implemented IAM policies, KMS encryption, and AWS Glue Data Catalog for secure, compliant data management.

• Partnered with Data Scientists to build feature pipelines for patient risk prediction models, enabling faster ML experimentation.

• Optimized AWS and Snowflake usage, cutting cloud infrastructure costs by 25%.

• Containerized Spark applications and deployed them on AWS EKS (Kubernetes) for elastic, scalable processing.

Data Engineer July 2021 – July 2023

Cybage Software India

• Built and maintained data pipelines using Apache Spark, NiFi, and Python to process 8+ TB/month of clickstream, IoT sensor, and e-commerce transaction data.

• Designed star and snowflake schemas in BigQuery and PostgreSQL, improving BI query performance by 50% and accelerating insights for reporting dashboards.

• Orchestrated ETL workflows with Apache Airflow, including REST API ingestion, S3-to-Redshift transfers, and SLA-driven error handling.

• Developed real-time streaming pipelines using Kafka and Spark Structured Streaming for fraud detection, operational alerts, and log analytics.

• Standardized transformation rules across staging, warehouse, and reporting layers using dbt and pytest for automated testing.

• Provisioned and automated infrastructure with Terraform, ensuring reproducible environments and compliance.

• Managed metadata and lineage with Apache Atlas, improving traceability and governance for enterprise data assets.

• Migrated legacy on-prem ETL jobs to GCP BigQuery and Cloud Functions, modernizing infrastructure and reducing latency.

• Delivered real-time dashboards in Looker/Tableau to provide business teams with live KPI tracking.

• Mentored junior engineers on Spark tuning, Airflow DAG design, and Terraform best practices, improving overall team efficiency.

EDUCATION

Master in Information Technology Aug 2023 – May 2025
Missouri University of Science and Technology, USA

Bachelor of Technology Aug 2019 – April 2023

Velagapudi Ramakrishna Siddhartha Engineering College, AP, India


