Data Engineer Quality

Location: Arlington, TX, 76013

Posted: October 15, 2025


Resume:

PALLAVI Data Engineer

********@*****.*** | +1-817-***-**** | LinkedIn: www.linkedin.com/in/pallavi-y203874

SUMMARY

Data Engineer with 3+ years of experience in AWS, Azure, Snowflake, and Databricks. Skilled at building ETL/ELT pipelines, streaming solutions, and data warehouses that reduce costs, improve data quality, and enable self-service analytics. Delivered measurable results, including 20% infrastructure cost savings, 25% faster reporting, and 98% data validation accuracy.

TECHNICAL SKILLS

Programming: Python (Pandas, PySpark), SQL, Shell

Big Data: Spark, Hadoop, Hive, Kafka, Airflow

Cloud: AWS (S3, Redshift, Glue, Lambda, DMS), Azure (Data Factory, Databricks, Synapse)

Databases/Warehousing: Snowflake, Redshift, PostgreSQL, MySQL, Oracle

Data Quality & Governance: Great Expectations, IAM, KMS

Visualization: Power BI, Tableau

DevOps/CI-CD: Git, Jenkins, GitHub Actions, CodePipeline

EXPERIENCE

Purevisitx Austin, TX

AWS Data Engineer Jan 2025 – Present

Designed and deployed end-to-end ETL workflows with PySpark, Airflow, and AWS Glue, cutting batch times by 40%.

Migrated on-prem datasets into AWS S3 and Redshift using DMS, reducing annual infrastructure costs by 20%.

Partnered with analysts to expose self-service datasets on Redshift Spectrum and Athena, improving compliance reporting speed by 25%.

Implemented partitioning, compression, and sort keys in Redshift, so queries ran in minutes instead of hours.

Automated data validation and schema checks using Python, Lambda, and Glue Crawlers, achieving 98% accuracy.

Built CI/CD workflows using Jenkins, GitHub Actions, and AWS CodePipeline for zero-downtime releases.

Configured role-based access controls and encryption with IAM & KMS for governance and compliance.

Integrated CloudWatch, CloudTrail, and SNS alerts for proactive job monitoring and troubleshooting.

Created workflow documentation and lineage diagrams in Confluence for cross-team usage.

Delivered ad-hoc SQL queries on Redshift and Athena for regulatory and compliance teams.

Virtusa Hyderabad, India

Big Data Engineer Jul 2022 – Jun 2023

Built real-time streaming pipelines with Kafka, Spark Streaming, and Azure Event Hubs, achieving sub-2-second latency.

Integrated Azure Data Lake Storage, APIs, and RDBMS into Synapse & Snowflake for centralized analytics.

Designed reusable PySpark/Scala scripts on Databricks, reducing manual effort by 60%.

Tuned Spark jobs with dynamic partitioning & adaptive query execution, cutting compute cost by 15%.

Implemented Great Expectations checks in ADF pipelines, boosting data accuracy to 99%.

Automated pipeline orchestration using Airflow DAGs and ADF pipelines with retries and recovery.

Delivered Power BI & Tableau datasets from Synapse/Snowflake for BI teams.

Built monitoring dashboards with Grafana, Prometheus, and Azure Monitor.

Migrated legacy Informatica ETL to PySpark/Databricks, reducing licensing costs.

Documented architecture, lineage, and coding standards in Confluence/DevOps Wiki.

DBS Bank Mumbai, India

Associate Data Engineer Jan 2021 – Jun 2022

Developed ETL jobs in Python, PySpark, and SQL using AWS Glue and Azure Data Factory.

Reduced manual SQL work by 50% with Hive/Spark SQL queries.

Supported warehouse modeling in AWS Redshift & Azure Synapse for risk reporting.

Performed data profiling and cleansing in Python and PySpark to improve data reliability.

Automated error handling and logging with Python and AWS Lambda.

Configured Sqoop & ADF copy jobs to move data into HDFS/ADLS.

Optimized MapReduce & Spark jobs via partitioning and caching.

Created basic shell scripts & CloudWatch alerts to monitor pipelines.

Assisted in migration from on-prem Hadoop to AWS/Azure data lakes.

Participated in Agile sprints and code reviews, learning industry best practices.

EDUCATION

M.S. Computer & Information Science – University of Texas at Arlington

B.E. Computer Science – Sphoorthy Engineering College
