Data Engineer A

Location:

Bloomington, IN

Salary:

90000

Posted:

September 10, 2025

Resume:

Harsh Sunil Patel

Bloomington, IN 1-908-***-**** *************@*****.*** www.linkedin.com/in/hsp1

SUMMARY

Data Engineer with 3+ years of experience designing and optimizing large-scale data pipelines, ETL workflows, and cloud-native analytics solutions across healthcare and retail domains. Skilled in delivering regulatory-compliant, high-throughput data platforms using AWS, Azure Databricks, and Snowflake. Adept at collaboration, problem-solving, and cross-functional communication, with a proven track record of reducing data processing times by up to 45% and enabling real-time business insights for global stakeholders.

SKILLS

Languages:

Python, SQL, PySpark

Libraries & ML:

Scikit-learn, NumPy, Pandas, SciPy, TensorFlow, ML Algorithms, Statistical Methods, Advanced Analytics, Data Mining

Visualization:

Tableau, Power BI, Plotly

Cloud & Big Data:

AWS (Glue, Redshift, S3, Athena, Lambda, Kinesis), Azure Databricks, Snowflake, Kafka, Hive

Databases:

MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, DynamoDB, Google BigQuery

ETL & BI Tools:

Apache Airflow, Informatica, Talend, SSIS

DevOps & VCS:

Docker, Kubernetes, Jenkins, Terraform, Git, GitHub

Data Governance:

Data Quality and Governance, Data Management, Compliance (GDPR, HIPAA)

Operating Systems:

Windows, Linux, macOS

WORK EXPERIENCE

Data Engineer Pfizer, IN, USA Aug 2024 – Present

Automated the ingestion of clinical trial datasets from S3 to Redshift using AWS Glue, enabling seamless integration of over 12 data sources and reducing manual processing time by 45% across analytics teams.

Engineered scalable data transformation pipelines in PySpark within Azure Databricks to cleanse over 2 TB of patient-reported outcomes and lab datasets, ensuring consistency with CDISC and regulatory data standards.

Created 16 subject-level analytics datasets using Snowflake, leveraging dynamic partitioning and metadata tagging to facilitate real-time clinical monitoring, leading to improved protocol deviation detection across oncology trials.

Built and maintained 20+ production-grade Apache Airflow DAGs for scheduling extraction from PostgreSQL, MongoDB, and Oracle systems, ensuring timely data availability for machine learning model training and validation.

Developed interactive monitoring dashboards in Power BI and Plotly to visualize patient enrollment, visit trends, and dropout rates, supporting clinical operations teams across 5 global study locations.

Containerized micro-batch ETL services using Docker and orchestrated deployments with Jenkins, reducing release failure rates by 28% and improving deployment frequency across Pfizer’s cloud-native data platform.

Data Engineer Cognizant, India Jun 2021 – Jul 2023

Designed and executed parameterized ETL workflows in Informatica, integrating over 25 million retail transaction records daily from diverse POS systems into MySQL for downstream financial analytics with minimal latency.

Implemented schema validation, transformation logic, and error logging in Talend, ensuring 99.8% data accuracy across multi-source ingestion pipelines supporting cross-functional merchandising, sales, and customer insights teams.

Modeled OLAP cubes in SQL Server Analysis Services (SSAS), incorporating calculated measures and aggregation strategies to reduce report generation time from 15 minutes to under 3 minutes for quarterly executive dashboards.

Optimized high-volume analytical queries in Hive through bucketing, partition pruning, and vectorized execution, achieving a 27% performance improvement for ad-hoc seasonal demand forecasting processes across retail operations.

Developed and deployed a streaming ingestion framework using Kafka, configuring partition schemes and schema registry enforcement to handle over 120,000 replenishment alerts per hour from distributed warehouse hubs.

Built interactive sales and revenue dashboards in Tableau, leveraging row-level security and calculated fields to deliver store-specific KPIs without exposing confidential corporate or operational metrics to unauthorized users.

Engineered automated data load pipelines with SSIS, incorporating parallel processing and checkpoint features to improve nightly batch reliability by 18% while reducing recovery time in the event of failures.

EDUCATION

Master of Science in Data Science, Indiana University, Bloomington, IN, USA, Aug 2023 – May 2025

Bachelor of Engineering in Mechanical Engineering, University of Mumbai, India, Aug 2018 – Jun 2022
