
Data Engineer II

Location:
Little Elm, TX
Salary:
$85,000
Posted:
October 15, 2025


Resume:

Harshavardhan Reddy A

Email: ***************@*****.***

PH: +1-201-***-****

Data Engineer

Professional Summary

Results-driven Data Engineer with 3+ years of experience designing, building, and optimizing large-scale data platforms across AWS, Google Cloud Platform, and Microsoft Azure environments. Proven track record in developing robust ETL pipelines, automating real-time data workflows, and enabling actionable insights for enterprise applications in the healthcare and financial sectors. Proficient in Python, SQL, and PySpark, and in orchestration and infrastructure tools such as Airflow and Terraform, with a strong emphasis on performance tuning, cost optimization, and data integrity. Adept at aligning data architecture with compliance standards such as HIPAA and financial regulations, ensuring secure, scalable solutions that transform raw data into strategic business value.

Technical Skills:

Languages: Python, SQL, PySpark, Spark SQL, HTML, JavaScript, CSS

Big Data & Data Processing: Apache Spark, Databricks, Apache Hive, Apache Kafka, Apache Airflow, AWS Glue, EMR, Dataproc, Dataflow

Data Warehousing: Snowflake, SQL Server Integration Services (SSIS), Redshift, BigQuery, Azure Synapse, Azure Data Factory (ADF)

Cloud: AWS (Glue, EMR, Redshift, S3, Lambda, EC2, RDS, DynamoDB), GCP (BigQuery, Dataflow, Dataproc), Azure (Data Factory, Synapse, Azure SQL)

Databases: MySQL, SQL Server, Amazon RDS, Redshift, DynamoDB

Operating Systems: Ubuntu Linux, UNIX, Windows Server (2008–2016), macOS

Reporting Tools: Power BI, Tableau

DevOps & CI/CD: Git, GitHub Actions, Jenkins, Terraform, Docker, Kubernetes

Professional Experience:

Data Engineer II July 2024 to Present

Client: Vanguard

Responsibilities:

• Designed and deployed distributed ETL pipelines using AWS Glue, PySpark, and EMR, processing high-volume financial data including transactions, account activity, and portfolio details.

• Optimized Spark jobs on EMR, reducing historical trade data processing time by 35%, accelerating insights for risk and investment teams.

• Built real-time and batch ingestion workflows into Amazon Redshift, powering portfolio analytics, financial reports, and executive dashboards.

• Developed custom data validation frameworks with Python and Great Expectations, enforcing critical compliance and business rules across datasets (illustrative sketch after this role).

• Automated alerting and anomaly detection using AWS Lambda, enhancing fraud monitoring capabilities with real-time triggers.

• Managed structured and semi-structured data in Amazon S3, applying encryption, partitioning, and lifecycle policies to optimize cost and performance.

• Implemented CI/CD pipelines using Jenkins and Terraform, automating infrastructure deployment and job releases in a secure, audit-friendly environment.

• Applied fine-grained IAM roles, KMS encryption, and audit logging to ensure secure access to sensitive financial data, aligning with regulatory frameworks.

• Mentored junior engineers on AWS architecture best practices, PySpark tuning, and financial data modeling techniques.

Environment: Python, SQL, PySpark, Spark SQL, AWS (S3, Glue, DynamoDB, EMR, Lambda, EC2, Redshift, KMS), Great Expectations, Git, Jenkins, Terraform, CI/CD.
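To give a flavor of the validation work described above, here is a minimal sketch using the classic (pre-1.0) Great Expectations pandas API. The column names, bounds, and sample data are hypothetical stand-ins, not the actual rules used at the client.

```python
import great_expectations as ge
import pandas as pd

# Hypothetical transactions extract; real pipelines read from S3/Redshift.
df = ge.from_pandas(pd.DataFrame({
    "account_id": ["A100", "A101", "A102"],
    "trade_amount": [2500.00, 130.75, 980.10],
    "trade_date": ["2024-08-01", "2024-08-01", "2024-08-02"],
}))

# Compliance-style rules: no orphan accounts, amounts within sane bounds,
# dates in the expected ISO format.
df.expect_column_values_to_not_be_null("account_id")
df.expect_column_values_to_be_between("trade_amount", min_value=0, max_value=1_000_000)
df.expect_column_values_to_match_strftime_format("trade_date", "%Y-%m-%d")

# Fail fast so a bad batch never reaches the downstream load step.
result = df.validate()
if not result.success:
    raise ValueError("Validation failed; halting downstream load.")
```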

Data Engineer I December 2022 to June 2024

Client: Centene Corporation

Responsibilities:

• Designed scalable ETL workflows using GCP Dataproc, BigQuery, and PySpark, processing high-volume claims, eligibility, and provider data.

• Migrated batch workloads from on-prem to Dataproc, reducing processing time by 50% and lowering compute costs using preemptible VMs.

• Developed event-driven pipelines with Cloud Functions, enabling near real-time ingestion of pharmacy transactions and member updates (illustrative sketch after this role).

• Built analytical datasets in BigQuery for quality metrics, patient outreach, and care coordination dashboards, significantly improving decision-making for clinical teams.

• Monitored pipeline performance and latency using Stackdriver, reducing unplanned downtime and SLA violations by over 40%.

• Engineered ingestion and transformation pipelines in Azure Data Factory, integrating third-party partner data and EHR extracts.

• Modeled structured datasets in Azure Synapse and Azure SQL to support centralized reporting and analytics across departments.

• Implemented encryption, versioning, and lifecycle policies in GCS for secure, HIPAA-compliant storage.

• Used Terraform to automate infrastructure provisioning across GCP and Azure, standardizing deployment pipelines and reducing manual configuration errors.

• Partnered with compliance, QA, data governance, and analytics teams to align cloud data workflows with healthcare regulatory and operational requirements.

Environment: Python, SQL, PySpark, GCP (Dataproc, BigQuery, Cloud Functions, Cloud Storage, Stackdriver), Azure (Data Factory, Synapse, Blob Storage, Azure SQL, Azure Monitor), Terraform, Kafka, GitHub Actions, CI/CD.
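A minimal sketch of the event-driven ingestion pattern referenced above, written as a first-generation background Cloud Function triggered when a new file lands in a GCS bucket. The dataset and table names are hypothetical placeholders.

```python
from google.cloud import bigquery

def load_pharmacy_file(event, context):
    """Triggered by a GCS object-finalize event; loads the new
    newline-delimited JSON file into a BigQuery staging table."""
    client = bigquery.Client()
    uri = f"gs://{event['bucket']}/{event['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        autodetect=True,
    )
    load_job = client.load_table_from_uri(
        uri, "analytics_staging.pharmacy_transactions",  # hypothetical table
        job_config=job_config,
    )
    load_job.result()  # block until the load completes so failures surface
```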

Software Developer January 2022 to August 2022

Client: Onpassive, Hyderabad, TG, India

Responsibilities:

• Designed and developed data warehouse schemas and ETL processes for transportation data, supporting business analytics and reporting.

• Created and optimized SQL queries and stored procedures to improve database performance and maintain data integrity.

• Developed web-based dashboards and reports using JavaScript and Tableau to visualize key business metrics.

• Participated in the development and deployment of backend services using C++ and SQL Server.

• Implemented data validation and quality checks within the ETL processes to ensure data accuracy and reliability.

• Collaborated with cross-functional teams to gather requirements and translate them into effective data solutions.

• Developed REST APIs using Node.js to expose data from the database to other applications.

• Contributed to the design and implementation of user interfaces using HTML, CSS, and JavaScript.

Environment: HTML, CSS, C, C++, SSIS, SSRS, SQL Server, MySQL, Oracle, Tableau, MongoDB, CI/CD, Git, Jenkins, Docker, JavaScript (UX/UI).

Education Details:

Bachelor's Degree: Information Technology, Malla Reddy College of Engineering and Technology, Hyderabad, India

Master's Degree: Computer Science, Lindsey Wilson College, Kentucky

Projects:

Automated Data ETL Pipeline for Healthcare Data Integration

• Designed and implemented an automated ETL pipeline using Python, Apache Airflow, Spark, and Snowflake to integrate and streamline disparate healthcare data sources for improved analytics, improving data reliability and reducing processing time by X% (illustrative sketch below).
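A compact sketch of how such a pipeline can be wired together in Apache Airflow 2.x. The task bodies are stubs, and the DAG id, schedule, and retry policy are illustrative assumptions rather than the project's actual configuration.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    """Pull raw extracts from the source systems (hypothetical sources)."""
    print("extracting...")


def transform():
    """Standardize codes, deduplicate, and conform records."""
    print("transforming...")


def load():
    """Load curated records into the warehouse (e.g., Snowflake)."""
    print("loading...")


with DAG(
    dag_id="healthcare_etl",  # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    t_extract >> t_transform >> t_load
```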

Online Payment Fraud Detection using Machine Learning in Python

• Built and deployed a real-time fraud detection model using ML algorithms (Random Forest, XGBoost) in Python, improving fraud detection accuracy while reducing false positives.

• Designed a scalable ETL pipeline integrating Kafka, Spark Streaming, and Snowflake/Redshift for real-time transaction monitoring and fraud alerting (illustrative sketch below).
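A minimal Spark Structured Streaming sketch of the Kafka ingestion side of this pipeline. The broker address, topic name, schema, and the simple amount-threshold rule standing in for the trained model are all hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("fraud-stream").getOrCreate()

# Hypothetical transaction schema; real feeds carry many more fields.
schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

# Read the raw transaction stream from Kafka (placeholder broker/topic).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers bytes; decode the value and parse the JSON payload.
txns = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("t")
).select("t.*")

# Stand-in for model scoring: flag unusually large transactions.
alerts = txns.filter(F.col("amount") > 10_000)

# Write alerts to the console sink for demonstration; a production job
# would target Snowflake/Redshift or an alerting topic instead.
query = alerts.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```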


