
Senior Data Engineer Cloud ETL/ELT, Big Data, MLOps

Location:
New York City, NY
Salary:
70000
Posted:
December 11, 2025


Sai Rohith Tulasi

Arlington, TX 682-***-**** ************@*******.*** LinkedIn

SUMMARY

Data Engineer with 4+ years of experience building scalable, cloud-native ETL/ELT pipelines and data architectures that improve data reliability, reduce latency, and accelerate ingestion across Snowflake, Redshift, and Azure Synapse. Skilled with Big Data tools such as Apache Spark, Hadoop, Kafka, Hive, and Airflow for handling large-scale workflows efficiently. Led automation of data validation, monitoring, and CI/CD pipelines with Jenkins, Airflow, and Terraform, reducing errors and deployment times while embedding security into the SDLC. Experienced in accelerating ML workflows with feature stores, MLflow, and workflow orchestration, and in driving cloud migrations with Kubernetes and Terraform to boost performance and cut costs. Proficient with AWS services (S3, Glue, Lambda, Kinesis, Redshift) for scalable, serverless, event-driven pipelines, and in building REST APIs for real-time and batch ingestion with Kafka and MongoDB. Collaborative Agile team player focused on delivering data solutions that drive business growth and improve operational efficiency.

SKILLS

• Big Data & Distributed Computing: Apache Spark, Hadoop, Hive, HDFS, MapReduce, Kafka, Airflow, Databricks, Azure Data Factory, Talend, PySpark

• Cloud Platforms & Data Services: AWS (S3, Glue, Lambda, Kinesis, Redshift), Azure Data Lake, Azure Synapse Analytics, AWS Data Stack

• Programming & Scripting Languages: Python, SQL, Java, Scala (plus TensorFlow for ML workloads)

• Data Warehousing & Storage Solutions: Snowflake, Redshift, Azure Synapse, Databricks, data file formats

• DevOps, Automation & Containerization: Terraform, Jenkins, Docker, Kubernetes, CI/CD, Git, DevSecOps

• Databases & Data Management: PostgreSQL, MySQL, MongoDB, Oracle

• Data Engineering & Machine Learning Operations (MLOps): Data Modeling, ETL/ELT Pipeline Design, Feature Stores, MLflow, Azure ML, Data Quality & Validation Automation, Data Engineering Ecosystems, AI Automation, Statistical Data Analysis

• Tools & Methodologies: Tableau, Jira, Confluence, Agile/Scrum, REST API Development, Investment Banking Experience

EXPERIENCE

Clairvoyant Jul 2024 - Present

Data Engineer

• Led migration of legacy batch pipelines from HDFS and MapReduce to near real-time processing using AWS Glue ETL, S3, and Redshift, reducing data latency by 70% and enhancing analytics for 10+ business teams.

• Engineered and maintained scalable cloud-native ETL/ELT pipelines using Databricks, Spark, Python, SQL, and Azure Data Factory, processing over 5TB daily with 99.9% reliability and 35% faster ingestion.

• Enhanced data quality by implementing automated validation and cleansing with PySpark, Python, and MongoDB, reducing downstream errors by 50% and mitigating the risk of manual mistakes.
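
The validation-and-cleansing rules described above can be sketched as follows. This is a hypothetical illustration, not the production code: field names are assumed, and the same checks would be expressed as DataFrame filters in PySpark; plain Python keeps the sketch self-contained.

```python
REQUIRED_FIELDS = ("id", "event_time")  # assumed schema, not from the resume

def validate_and_cleanse(records):
    """Drop records missing required fields, then dedupe on 'id'."""
    seen = set()
    clean = []
    for rec in records:
        # Validation step: reject rows with missing or empty required fields.
        if any(rec.get(f) in (None, "") for f in REQUIRED_FIELDS):
            continue
        # Cleansing step: keep only the first occurrence of each id.
        if rec["id"] in seen:
            continue
        seen.add(rec["id"])
        clean.append(rec)
    return clean
```

In a Spark job, the same logic would run in parallel across partitions rather than in a single loop.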

• Architected and optimized data models and warehouses on Snowflake, Azure Data Lake, Redshift, and Oracle, improving query performance by 40%, ensuring data integrity, and reducing costs by 30%.

• Automated infrastructure provisioning with Terraform and CI/CD pipelines using Git and Jenkins, cutting deployment times by 40% and reducing pipeline failures by 20%.

• Optimized complex SQL queries and Spark jobs using T-SQL, Scala, and TensorFlow, boosting system reliability and lowering processing times by 15%.

• Constructed scalable machine learning feature pipelines leveraging Databricks, Azure ML, and MLflow, improving feature availability and model retraining efficiency by 30%.

• Collaborated with data scientists and stakeholders to deploy reusable feature stores in Databricks and orchestrate Airflow workflows, accelerating model training by 30%.
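
The orchestration pattern behind the Airflow workflows above can be sketched as a dependency-ordered task run. Task names are hypothetical; real pipelines would define these as Airflow operators, but the scheduling logic is the same topological ordering shown here with the standard library.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical feature-pipeline task graph: each task maps to the set of
# tasks it depends on.
dag = {
    "extract_features": set(),
    "validate_features": {"extract_features"},
    "publish_feature_store": {"validate_features"},
    "train_model": {"publish_feature_store"},
}

def run(dag, runner):
    """Execute tasks in dependency order, as an orchestrator would."""
    order = list(TopologicalSorter(dag).static_order())
    for task in order:
        runner(task)  # real code: trigger the operator / Spark job
    return order
```

Airflow adds retries, scheduling, and backfills on top of this ordering; the sketch shows only the dependency resolution.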

Avenir Technologies Mar 2019 - Jul 2022

Associate Data Engineer

• Developed and optimized SQL and Python scripts for data extraction, transformation, and feature engineering using Spark, improving query efficiency by 35% and accelerating machine learning model deployment cycles by 20%.

• Constructed scalable data lakes and pipelines with Talend, Azure Data Lake, AWS, Azure, and Databricks, boosting processing throughput by 40% while ensuring secure multi-structured storage.

• Designed robust data models and dimensional schemas with Snowflake, Redshift, and Databricks, enabling faster query performance and reducing report generation times by 30% for analytics and BI teams.
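
A dimensional schema of the kind described above can be sketched as a minimal star schema: one fact table joined to a dimension, rolled up the way a BI query would. Table and column names are illustrative, not taken from the resume; SQLite stands in for Snowflake/Redshift.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table: descriptive attributes keyed by a surrogate key.
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, segment TEXT);
    -- Fact table: measures plus foreign keys into the dimensions.
    CREATE TABLE fact_sales (
        sale_id INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer(customer_key),
        amount REAL
    );
    INSERT INTO dim_customer VALUES (1, 'retail'), (2, 'enterprise');
    INSERT INTO fact_sales VALUES (10, 1, 100.0), (11, 2, 250.0), (12, 2, 50.0);
""")

# BI-style rollup: aggregate fact measures by a dimension attribute.
rows = conn.execute("""
    SELECT d.segment, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d USING (customer_key)
    GROUP BY d.segment ORDER BY d.segment
""").fetchall()
```

Keeping measures in narrow fact tables and attributes in dimensions is what makes this join-then-aggregate pattern fast in columnar warehouses.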

• Implemented automated data quality monitoring and validation frameworks using Jenkins CI/CD and Airflow, incorporating security best practices to reduce data errors by 25%.

• Led on-premises to hybrid cloud migration efforts using Kubernetes and Terraform; established Jenkins dashboards for pipeline monitoring and optimized Spark jobs, cutting runtime by 35% and costs by 15%.

• Engineered REST APIs in Java and Python for real-time and batch data ingestion, facilitating system interoperability and near real-time analytics with Kafka, MongoDB, and event-driven architecture.
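
The ingestion endpoint described above can be sketched as a validate-then-publish handler. This is a hedged stand-in: the required field is hypothetical, and an in-memory queue replaces the Kafka producer so the sketch stays runnable without a broker.

```python
import json
from queue import Queue

# Stand-in for a Kafka topic: a real service would publish validated
# payloads via a producer; a queue keeps the sketch self-contained.
events = Queue()

def ingest(raw_body: bytes) -> int:
    """Validate a JSON ingestion payload, enqueue it, return an HTTP status."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400  # malformed JSON
    if "event_type" not in payload:  # hypothetical required field
        return 422  # missing required field
    events.put(payload)  # real code: producer.send(topic, payload)
    return 202  # accepted for asynchronous processing
```

Returning 202 rather than 200 reflects the event-driven design: the API acknowledges receipt, and downstream consumers process the event later.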

• Drove MLOps initiatives by integrating automated model testing, monitoring, and retraining workflows with MLflow, Azure Machine Learning, Hadoop, Apache Spark, and Hive, improving model accuracy and operational uptime.

• Collaborated with product teams, ML engineers, and analysts in Agile SCRUM to translate requirements into effective solutions, enhancing customer segmentation accuracy by 20% and achieving 95% sprint delivery success.

• Mentored junior engineers on Python, SQL, DevOps tools (Git, JIRA, Confluence), data engineering fundamentals, and cloud architecture, elevating code quality and accelerating team productivity.

EDUCATION

Kakatiya University

Bachelor of Science, Computer Science

University of Texas at Arlington, Arlington, TX

Master of Science, Computer Science


