PALLAVI Data Engineer
********@*****.*** +1-817-***-**** LinkedIn: www.linkedin.com/in/pallavi-y203874
SUMMARY
Data Engineer with 3+ years of experience in AWS, Azure, Snowflake, and Databricks. Skilled at building ETL/ELT pipelines, streaming solutions, and data warehouses that reduce costs, improve data quality, and enable self-service analytics. Delivered measurable results, including 20% infrastructure cost savings, 25% faster reporting, and 98% data validation accuracy.
TECHNICAL SKILLS
Programming: Python (Pandas, PySpark), SQL, Shell
Big Data: Spark, Hadoop, Hive, Kafka, Airflow
Cloud: AWS (S3, Redshift, Glue, Lambda, DMS), Azure (Data Factory, Databricks, Synapse)
Databases/Warehousing: Snowflake, Redshift, PostgreSQL, MySQL, Oracle
Data Quality & Governance: Great Expectations, IAM, KMS
Visualization: Power BI, Tableau
DevOps/CI-CD: Git, Jenkins, GitHub Actions, CodePipeline
EXPERIENCE
Purevisitx Austin, TX
AWS Data Engineer Jan 2025 – Present
Designed and deployed end-to-end ETL workflows with PySpark, Airflow, and AWS Glue, cutting batch processing time by 40%.
Migrated on-prem datasets into AWS S3 and Redshift using DMS, reducing annual infrastructure costs by 20%.
Partnered with analysts to expose self-service datasets on Redshift Spectrum and Athena, improving compliance reporting speed by 25%.
Implemented partitioning, compression, and sort keys in Redshift, reducing query runtimes from hours to minutes.
Automated data validation & schema checks using Python, Lambda, and Glue Crawlers, achieving 98% data accuracy.
Built CI/CD workflows using Jenkins, GitHub Actions, and AWS CodePipeline for zero-downtime releases.
Configured role-based access controls and encryption with IAM & KMS for governance and compliance.
Integrated CloudWatch, CloudTrail, and SNS alerts for proactive job monitoring and troubleshooting.
Created workflow documentation and lineage diagrams in Confluence for cross-team usage.
Delivered ad-hoc SQL queries on Redshift & Athena for regulatory and compliance teams.
Virtusa Hyderabad, India
Big Data Engineer Jul 2022 – Jun 2023
Built real-time streaming pipelines with Kafka, Spark Streaming, and Azure Event Hubs (<2s latency).
Integrated data from Azure Data Lake Storage, APIs, and relational databases into Synapse & Snowflake for centralized analytics.
Designed reusable PySpark/Scala scripts on Databricks, reducing manual effort by 60%.
Tuned Spark jobs with dynamic partitioning & adaptive query execution, cutting compute cost by 15%.
Implemented Great Expectations validation in ADF pipelines, boosting data accuracy to 99%.
Automated pipeline orchestration using Airflow DAGs and ADF pipelines with retry and recovery logic.
Delivered Power BI & Tableau datasets from Synapse/Snowflake for BI teams.
Built monitoring dashboards with Grafana, Prometheus, and Azure Monitor.
Migrated legacy Informatica ETL to PySpark/Databricks, reducing licensing costs.
Documented architecture, lineage, and coding standards in Confluence/DevOps Wiki.
DBS Bank Mumbai, India
Associate Data Engineer Jan 2021 – Jun 2022
Developed ETL jobs in Python, PySpark, and SQL using AWS Glue & Azure Data Factory.
Reduced manual SQL work by 50% with Hive/Spark SQL queries.
Supported warehouse modeling in AWS Redshift & Azure Synapse for risk reporting.
Performed data profiling & cleansing in Python & PySpark to improve data reliability.
Automated error handling & logging with Python & AWS Lambda.
Configured Sqoop & ADF copy jobs to move data into HDFS/ADLS.
Optimized MapReduce & Spark jobs via partitioning and caching.
Created basic shell scripts & CloudWatch alerts to monitor pipelines.
Assisted in migration from on-prem Hadoop to AWS/Azure data lakes.
Participated in Agile sprints & code reviews, learning industry best practices.
EDUCATION
M.S. Computer & Information Science – University of Texas at Arlington
B.E. Computer Science – Sphoorthy Engineering College