Vishal Reddy Data Engineer
TN, USA +1-901-***-**** ******.******@*****.*** LinkedIn
SUMMARY
Data Engineer with 4 years of experience delivering high-impact data solutions across healthcare and financial domains. Proven ability to optimize data workflows, reduce processing times, and improve data quality for business reporting and decision-making. Strong track record of working in fast-paced, Agile environments and collaborating with cross-functional teams to support large-scale data initiatives. Committed to delivering accurate, timely, and secure data to drive strategic outcomes.
PROFESSIONAL EXPERIENCE
Data Engineer Oct 2024 – Current
Elevance Health TN, USA
Developed PySpark and AWS Glue pipelines to process over 3 TB of member and claims data weekly, reducing data availability lag from 12 hours to 3 hours for downstream analytics teams.
Automated data workflows using Apache Airflow, achieving a 98% SLA adherence rate and reducing manual interventions during critical overnight processing windows by 40%.
Improved Redshift performance by 30% by restructuring data models, defining sort and distribution keys, and refining the partitioning strategy for high-volume tables.
Integrated 75+ dbt models with embedded data tests and documentation, ensuring accuracy and lineage tracking across financial reporting datasets used by the actuarial and operations teams.
Implemented CloudWatch monitoring and alerting across 20+ pipeline jobs, reducing troubleshooting time by 50% and enabling faster root-cause analysis during pipeline failures.
Contributed to HIPAA-compliant architecture by applying IAM-based role controls, KMS encryption at rest, and S3-level access policies for protected data assets.
Data Engineer Jan 2020 – Jan 2023
Hexaware Technologies India
Built PySpark batch ETL pipelines to process over 10M records/day from PostgreSQL, MongoDB, and MySQL, supporting unified sales and finance dashboards across three client projects.
Designed parameterized pipelines in Azure Data Factory for daily incremental loads, reducing end-to-end pipeline latency from 8 hours to under 2 hours for time-sensitive financial metrics.
Enabled cross-platform analytics by transforming semi-structured data and storing curated datasets in Amazon Redshift and Snowflake, improving dashboard response time by 35%.
Resolved 15+ recurring data quality issues through integration of Great Expectations for schema validation and data completeness checks across ingestion points.
Reduced ETL job failures by 45% through better error handling, automated retries, and visibility improvements using CloudWatch Logs and Python-based notifiers.
Contributed consistently to Agile sprints tracked in Jira, averaging 25+ story points per sprint while supporting 4 product owners across multiple data domains.
EDUCATION
Master's in Computer Science Dec 2024
The University of Memphis, Memphis, TN, USA
Bachelor of Technology in Computer Science and Engineering May 2022
Gandhi Institute of Technology & Management, India
TECHNICAL SKILLS
Programming & Scripting: Python, SQL (Advanced), Bash, PySpark
Big Data Technologies: Hadoop, PySpark
ETL & Data Engineering Tools: Apache Airflow, AWS Glue, Apache Spark, dbt; experience building and maintaining ETL/ELT pipelines
Cloud Platforms: AWS (S3, Glue, Redshift, Lambda, CloudWatch), Azure (Data Factory, Synapse), GCP (BigQuery – basic)
Databases & Warehousing: Amazon Redshift, Snowflake, PostgreSQL, MySQL, MongoDB, data modeling (Star/Snowflake), partitioning, indexing
Data Formats: Parquet, JSON, Avro, CSV
DevOps & Automation: Git, Jenkins, GitHub Actions, Docker (intro level), basic Terraform
Data Quality & Monitoring: Great Expectations, dbt tests, CloudWatch, ELK Stack (basic)
Security & Compliance: IAM, data encryption (in transit/at rest), familiarity with HIPAA and GDPR
Visualization Tools: Power BI, Tableau, Amazon QuickSight (basic), MS Excel
Project Management & Methodologies: Agile, Jira
Machine Learning Algorithms: Linear Regression, Logistic Regression, Decision Trees, Random Forests, SVM, Naïve Bayes, KNN, K-Means, Classification, Supervised & Unsupervised Learning
ML Libraries & Packages: NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, TensorFlow