Data Engineer Machine Learning

Location:

Texas City, TX

Posted:

October 15, 2025

Resume:

Tejeswar Raju Vempalli

Data Engineer

Houston, TX – 77077 *****************@*****.*** 216-***-**** linkedin.com/in/tejeswar-vempalli/

SUMMARY

Data Engineer with 5+ years of experience designing and implementing scalable data pipelines and architectures in the healthcare and technology sectors. Experienced in ETL development, data warehousing, and deploying cloud-based solutions using Python, SQL, Apache Spark, and AWS. Works closely with data scientists, analysts, and business teams to deliver high-quality, analysis-ready data. Skilled in automating data workflows, maintaining data integrity, and optimizing system performance in Agile development settings. Strong focus on solving complex data issues and applying data governance and security best practices.

SKILLS

Languages: Python, C++, R programming, SQL

Clouds: AWS (Glue, EC2, S3, Lambda, DynamoDB, Redshift, Kinesis), Azure (Synapse Analytics, Data Factory, Azure MySQL, Azure Data Lake, EventHub, Databricks), GCP

Databases & BigData: SQL Server, MySQL, PostgreSQL, Oracle, MongoDB, Pyspark, MapReduce, Kafka, SSIS, SSMS, Talend, Airflow, DBT, Informatica, Apache Flink, Splunk

Machine Learning: Supervised & Unsupervised Learning, Neural Networks, NLP, Time-series analysis

DevOps: CI/CD, GitHub, Terraform, Ansible, Jenkins, Docker, Kubernetes

Visualizations: Tableau, SAS, Google, PowerBI, MS Excel

Environments: SDLC, Agile, Scrum, Waterfall, Windows, Mac OS, Linux

EXPERIENCE

Data Engineer

CVS Health, USA Oct 2023 – Present

Designed, developed, and maintained ETL pipelines to ingest and transform 500M+ healthcare records from diverse sources into AWS Redshift using AWS Glue and S3, improving data accessibility across departments.

Partnered with analysts, data scientists, and product teams to gather data requirements and delivered scalable data architectures that supported real-time analytics and reporting.

Optimized SQL queries and Spark jobs, reducing processing time by 40% and lowering cloud compute costs by 20%.

Implemented automated data validation and quality checks, increasing data accuracy to 99.7% and minimizing downstream errors in reporting tools.

Used AWS CloudWatch and structured logging to monitor pipeline health and proactively resolve failures, maintaining 99.9% data pipeline uptime.

Automated 70% of manual data processing tasks using Python scripts and Airflow DAGs, freeing up analyst time and increasing pipeline deployment speed.

Contributed to Agile development cycles, participating in sprint planning, daily stand-ups, and peer code reviews, ensuring continuous delivery of data engineering features.

Created and maintained detailed documentation for data models, ETL workflows, and operational procedures, improving team onboarding time by 30%.

Tech Mahindra, India Nov 2018 – Aug 2021

Supported development and maintenance of ETL workflows to extract data from 10+ client systems into enterprise data lakes and warehouses, enabling unified analytics across business units.

Wrote advanced SQL queries and Python scripts to clean, transform, and aggregate data, improving report accuracy and reducing data processing errors by 40%.

Assisted in migrating legacy on-prem pipelines to Azure Data Factory and Databricks, reducing infrastructure costs by 20% and improving scalability.

Resolved data inconsistencies by collaborating with data analysts and business users, increasing data quality scores by 30%.

Created ETL job monitoring dashboards and automated alerts using Azure Monitor and Python, reducing incident response times by 35%.

Participated in data modeling efforts and contributed to the design of star and snowflake schemas, optimizing data for self-service BI tools.

Developed unit tests and validation scripts to ensure end-to-end data integrity, catching critical data issues before production deployment.

Authored documentation including data pipeline process flows, architecture diagrams, and deployment guides, improving team onboarding efficiency by 25%.

Worked with DevOps teams to automate pipeline deployment using Git and CI/CD tools (e.g., Azure DevOps), reducing manual release errors.

Implemented incremental data loading strategies in ADF and Spark, reducing pipeline runtime by 30% and improving resource utilization.

Contributed to defining data governance policies and implemented compliance checks to meet internal and external data security standards (e.g., HIPAA, GDPR).

EDUCATION

Master's in Computer Science, Rivier University, Nashua, NH, Aug 2023

CERTIFICATIONS

• Achieved Databricks Certified Data Engineer Associate.

• Achieved AWS Certified Solutions Architect Associate.
