Data Engineer Quality

Location: St. Petersburg, FL
Posted: September 10, 2025


Megha Dharwad Chandrashekar

774-***-**** | *******.**.***@*****.*** | LinkedIn

Professional Summary

Data Engineer with 5+ years of experience designing and optimizing large-scale ETL pipelines across the AWS, Azure, and GCP ecosystems. Proven ability to process multi-terabyte datasets using PySpark, Spark SQL, and Hive while ensuring performance, scalability, and data quality. Specialized in orchestrating workflows with Apache Airflow and Azure Data Factory, managing schema evolution via Avro, and enforcing enterprise-grade security with IAM, Key Vault, and RBAC. Demonstrated success in improving Spark job performance by up to 40% and leading high-impact data migration and integration projects. Adept at collaborating with cross-functional teams in Agile environments to deliver SLA-compliant data solutions that enable downstream analytics and business reporting.

Professional Experience

NYU Langone Health – Data Engineer

Jun 2024 – Present | California, US

Project: Clinical Trial Data Integration

• Built 10+ production-grade ETL pipelines on AWS EMR to process and integrate clinical trial data from 5+ source systems.

• Developed PySpark jobs transforming 1+ TB/day of structured and semi-structured data ingested from S3.

• Tuned Spark configurations, reducing job runtime by 30% and improving resource utilization by 25%.

• Managed GCP Dataproc clusters to meet SLA requirements for nightly transformations across 3+ regions.

• Implemented Hive schema evolution using Avro, ensuring compatibility with downstream BI tools.

• Created 20+ SQL-based validation rules, raising data quality accuracy to 98%.

• Collaborated with 5+ cross-functional team members to align ETL logic with clinical and reporting needs.

• Orchestrated daily ETL workflows using Airflow and enabled 50% faster failure detection via Azure Monitor (a minimal DAG sketch follows this list).
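
For illustration only, a minimal sketch of the daily orchestration pattern described in the last bullet, assuming Airflow 2.4+; the DAG id, task names, callables, and retry policy are hypothetical stand-ins rather than details from the project:

```python
# Hypothetical daily ETL DAG: extract -> transform -> validate.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    ...  # hypothetical: land raw clinical trial files from S3


def transform():
    ...  # hypothetical: run the PySpark transformation job


def validate():
    ...  # hypothetical: apply the SQL-based validation rules


with DAG(
    dag_id="clinical_trial_etl",  # hypothetical name
    start_date=datetime(2024, 6, 1),
    schedule="@daily",
    catchup=False,
    # Retries give the monitoring hook a clean failure signal to alert on.
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="validate", python_callable=validate)

    t1 >> t2 >> t3
```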

Utthunga Technologies Pvt Ltd – Data Engineer

Feb 2021 – Jan 2024

Project: Advanced Data Migration Technologies

• Designed and implemented 12+ scalable PySpark ETL pipelines on Azure Databricks, migrating over 5 TB of data from MySQL to ADLS in Parquet format.

• Optimized Spark jobs by fine-tuning shuffle partitions, caching, and joins, achieving up to 40% performance improvement and reducing migration timelines by 30% (see the tuning sketch after this list).

• Built ADF pipelines with Mapping Data Flows and custom activities to automate incremental and full data loads into Hive, supporting 50+ production tables.

• Automated end-to-end workflows with triggers, dependency chains, and retry logic in ADF, achieving a 99.5% job success rate.

• Developed and executed Python/SQL-based validation scripts to ensure 100% data integrity, with automated reconciliation and error logging.

• Managed schema evolution in Hive using Avro, ensuring seamless integration with Power BI and improving reporting reliability by 25%.

• Applied enterprise-grade security with RBAC, Azure Key Vault, and Managed Identities, eliminating plaintext secrets and reducing access-related incidents by 80%.

• Authored and maintained 50+ pages of ETL design documentation in Confluence and used Git for version control to improve collaboration.

• Partnered with DBAs, QA teams, and business analysts (6+ stakeholders) to finalize data mappings and meet SLA driven transformation deadlines.

• Led Spark performance optimization workshops, reducing overall cluster costs by 20% across 3 projects.

• Supported Agile practices by setting up Jira projects, enabling sprint tracking, and improving team productivity by 25%.
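
A hedged sketch of the tuning levers named in the shuffle/caching/joins bullet above; the app name, storage paths, partition count, and table names are illustrative assumptions, not values from the migration:

```python
# Illustrative PySpark tuning: shuffle partitions, caching, broadcast join.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = (
    SparkSession.builder
    .appName("mysql_to_adls_migration")             # hypothetical app name
    # Size shuffle partitions to the data volume instead of the default 200.
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

# Cache an expensive intermediate so repeated actions don't recompute it.
orders = spark.read.parquet("abfss://raw@acct.dfs.core.windows.net/orders")  # hypothetical path
orders.cache()

# Broadcast the small dimension table to avoid shuffling the large side.
dims = spark.read.parquet("abfss://raw@acct.dfs.core.windows.net/dims")      # hypothetical path
enriched = orders.join(broadcast(dims), on="dim_id", how="left")

enriched.write.mode("overwrite").parquet(
    "abfss://curated@acct.dfs.core.windows.net/orders_enriched"              # hypothetical path
)
```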

Cognizant Technology Solutions – Software Engineer

Sep 2019 – Dec 2020

Project: Gannett Co., Inc.

• Contributed to PySpark ETL pipelines on AWS EMR, processing 1 TB+/week into S3 as Parquet.

• Built Apache NiFi workflows automating MySQL-to-Hive/S3 ingestion, streamlining 2 TB migrations.

• Developed SQL validation scripts ensuring 99.5% post-migration data consistency (see the validation sketch after this list).

• Tuned Spark jobs, reducing execution time by 25%.

• Actively collaborated via Git (5+ reviews/sprint) and contributed to workflow documentation/best practices.
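
A minimal sketch of the kind of post-migration consistency check referenced above, assuming a MySQL source read with the pymysql client and a Parquet target read with PySpark; the host, credentials, table, and column names are hypothetical placeholders:

```python
# Hypothetical post-migration check: compare row counts and an amount checksum
# between the MySQL source and the migrated Parquet data.
import pymysql
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("migration_validation").getOrCreate()

# Source-side totals via plain SQL (connection details are placeholders).
conn = pymysql.connect(host="mysql-host", user="etl", password="***", database="sales")
with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM orders")
    src_count, src_sum = cur.fetchone()
conn.close()

# Target-side totals from the migrated Parquet files (hypothetical path).
tgt = spark.read.parquet("s3://bucket/curated/orders")
row = tgt.agg(
    F.count(F.lit(1)).alias("n"),
    F.coalesce(F.sum("amount"), F.lit(0)).alias("s"),
).first()

# Fail loudly on any mismatch so the discrepancy is logged and reconciled.
assert row["n"] == src_count, f"row count mismatch: {row['n']} != {src_count}"
assert float(row["s"]) == float(src_sum), "amount checksum mismatch"
print("validation passed")
```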

Education

University of Massachusetts – Dartmouth, MA

M.S. in Data Science (GPA: 4/4)

Technical Skills

• Big Data & Processing: Apache Spark, PySpark, Spark SQL, Hive, Sqoop, Apache Kafka, Apache NiFi, Flink, Delta Lake

• Cloud Platforms: AWS (EMR, S3, Databricks, EC2, CloudWatch, Glue, Redshift), GCP (Dataproc, GCS, BigQuery, Compute Engine), Azure (Data Factory, Azure Databricks, Synapse Analytics, Blob Storage, Virtual Machines)

• Programming & Scripting: Python, SQL, Shell Scripting, Scala (basic), C# (.NET, WPF), Bash, Java (basic)

• Data Formats: Parquet, Avro, ORC, JSON, CSV, XML

• Databases & Warehousing: MySQL, Oracle, PostgreSQL, Snowflake, MongoDB, Cassandra, HBase, Redshift, BigQuery

• Orchestration & Workflow: Apache Airflow, Azure Data Factory, Luigi, Oozie

• DevOps & CI/CD: Git, GitHub Actions, Jenkins, Docker, Kubernetes (basic), Terraform (basic)

• Visualization & BI Tools: Power BI, Tableau

• Tools & Others: Confluence, JIRA, Linux, VS Code, Postman

• Security & Governance: IAM, RBAC, Azure Key Vault, Secrets Manager, Data Encryption


