DevOps Engineer

Location:
Scottsdale, AZ
Posted:
June 05, 2025

Udaya Mehta

DATA ENGINEER

+1-860-****-**** **********@*****.*** www.linkedin.com/in/udaya-mehta-03725341

ABOUT ME

Skilled and results-oriented Data Engineer and Data Architect with 18+ years of IT experience, including 10+ years in Big Data and 5+ years in Data Engineering, specializing in building and optimizing scalable data pipelines and architectures for both batch and real-time data processing.

TECHNICAL SKILLS

Programming Languages Python, SQL, Shell Scripting, PySpark

Big Data Frameworks Hadoop, HDFS, Spark, Hive, MapReduce, Sqoop, Oozie

Cloud Platform Google Cloud Platform (GCP)

Google Cloud Platform (GCP) Services Databricks, Google Cloud Storage (GCS), BigQuery, Dataproc

Relational Databases MySQL, DB2, Oracle, MS SQL Server

NoSQL Databases HBase

Version Control & Documentation Git, GitHub, Jira, Confluence

ETL Tools Snowflake

Software Methodology Agile, Waterfall

BI tools Tableau, Power BI

Streaming Kafka, Spark streaming

Orchestration Tidal, Airflow

Data Platform Databricks, GCP, Snowflake

PROFESSIONAL SUMMARY

Led and managed a team of data engineers, ensuring timely and high-quality delivery of data pipelines and platform enhancements in a fast-paced cloud environment.

Adept at leveraging Databricks, Kafka, PySpark, and Airflow in GCP and AWS cloud environments to deliver high-performance data solutions.

Played a key role in migrating legacy data infrastructure from on-prem Hadoop and Oracle systems to GCP and AWS, including schema conversion, incremental loading, and historical data validation.

Designed and developed scalable, cloud-native, end-to-end data pipelines using Databricks, PySpark, and Kafka, handling both real-time streaming and batch ETL workloads.

Architected and implemented Big Data solutions on Databricks, leveraging Unity Catalog, Delta Live Tables, and Service Principals for robust governance, automation, and access control.

Adopted Delta Lake Medallion architecture (Bronze/Silver/Gold layers) for improved pipeline modularity, schema enforcement, and ACID-compliant transactions across streaming and batch workflows.
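
For illustration, a minimal PySpark/Delta sketch of the Bronze/Silver/Gold flow described above; the paths, table names, and columns are assumptions, not the actual production pipeline.

```python
# Minimal Bronze/Silver/Gold (Medallion) sketch with Delta Lake.
# Paths, table names, and columns are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()

# Bronze: land raw files as-is, preserving source fidelity.
raw = spark.read.json("gs://example-bucket/raw/orders/")          # hypothetical source path
raw.write.format("delta").mode("append").save("/delta/bronze/orders")

# Silver: cleanse, deduplicate, and enforce a schema.
bronze = spark.read.format("delta").load("/delta/bronze/orders")
silver = (bronze
          .dropDuplicates(["order_id"])
          .withColumn("order_ts", F.to_timestamp("order_ts"))
          .filter(F.col("order_id").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/delta/silver/orders")

# Gold: business-level aggregate for reporting.
gold = (silver.groupBy("customer_id")
              .agg(F.sum("order_amount").alias("lifetime_spend")))
gold.write.format("delta").mode("overwrite").save("/delta/gold/customer_spend")
```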

Applied advanced Spark optimization techniques such as Catalyst Optimizer, Predicate Pushdown, Dynamic Partition Pruning, Caching, and SMB/Map Joins to significantly improve query performance and resource utilization.

Developed config-driven PySpark ETL frameworks with modular logic for SCD Type 1 and 2 handling, enabling reuse and scalability for large-scale data transformation.
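
As a sketch of the SCD Type 2 handling mentioned above, assuming a Delta Lake dimension table and illustrative column names (customer_id, email, tier, is_current, start_date, end_date):

```python
# Hedged SCD Type 2 sketch using a Delta Lake MERGE; table and column names are assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2-sketch").getOrCreate()
updates = spark.read.format("delta").load("/delta/silver/customer_updates")  # hypothetical staging data

dim = DeltaTable.forPath(spark, "/delta/gold/dim_customer")

# Step 1: expire current rows whose tracked attributes changed.
(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id AND t.is_current = true")
    .whenMatchedUpdate(
        condition="t.email <> s.email OR t.tier <> s.tier",
        set={"is_current": "false", "end_date": "current_date()"})
    .execute())

# Step 2: insert new versions for changed customers and rows for brand-new customers.
# After step 1, neither group has a remaining current row, so a left_anti join finds them.
current = spark.read.format("delta").load("/delta/gold/dim_customer").filter("is_current = true")
to_insert = (updates.join(current.select("customer_id"), "customer_id", "left_anti")
             .withColumn("is_current", F.lit(True))
             .withColumn("start_date", F.current_date())
             .withColumn("end_date", F.lit(None).cast("date")))
to_insert.write.format("delta").mode("append").save("/delta/gold/dim_customer")
```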

Utilized Google Cloud Storage (GCS), BigQuery, and Dataproc to build scalable analytics solutions in GCP, integrating structured and semi-structured data sources including CSV, Parquet, ORC, Avro, and JSON.

Designed dimensional data models including fact/dimension tables, star and snowflake schemas, to support enterprise BI and reporting across Snowflake, BigQuery, and Tableau.

Extensive experience with Snowflake and BigQuery for cloud-native data warehousing and analytics; implemented efficient ELT pipelines, partitioning, clustering, and materialized views to support high-performance reporting.

Developed and optimized SQL queries in both Snowflake and BigQuery, applying cost-based optimization, query partitioning, and resource tuning to process large volumes of structured and semi-structured data.
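
A hedged example of the partitioning, clustering, and materialized-view pattern referenced above, issued through the BigQuery Python client; the dataset, table, and column names are assumptions.

```python
# Hedged sketch: partitioned, clustered BigQuery table plus a materialized view
# for high-volume reporting. Dataset/table/column names are assumptions.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

client.query("""
CREATE TABLE IF NOT EXISTS analytics.orders
PARTITION BY DATE(order_ts)          -- prune scans to the relevant days
CLUSTER BY customer_id, store_id     -- co-locate rows for common filters
AS SELECT * FROM staging.orders_raw
""").result()

client.query("""
CREATE MATERIALIZED VIEW IF NOT EXISTS analytics.daily_sales AS
SELECT DATE(order_ts) AS order_date, store_id, SUM(order_amount) AS total_sales
FROM analytics.orders
GROUP BY DATE(order_ts), store_id
""").result()
```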

Created actionable analytics dashboards in Tableau, integrating with real-time and batch pipelines for data-driven decision-making across operational and customer domains.

Automated and orchestrated pipelines using Tidal Workload Automation, building event-driven workflows with SLA monitoring and dependency resolution for batch and streaming jobs.

Conducted performance tuning and query optimization in Databricks SQL and PySpark, executing complex distributed queries to support reporting and machine learning readiness.

Promoted cloud-agnostic and multi-cloud strategies by building pipelines that leverage both GCP and AWS components for maximum flexibility and performance.

Championed best practices in data wrangling, enrichment, and quality checks, delivering curated, high-quality datasets for consumption by data scientists, analysts, and business teams and for advanced analytics and machine learning initiatives.

EDUCATION

• Bachelor of Engineering in Electronics & Telecommunications Engineering, India

PROFESSIONAL EXPERIENCE

Project: Customer Data Engineering – Client: PetSmart
Data Engineer, Jan 2023 – Present, Phoenix, AZ

Led the development of a Customer 360 platform, delivering a unified shopper profile to drive personalized recommendations, targeted promotions, and enhanced loyalty program experiences. Played a key role in optimizing the Loyalty Points System to improve customer retention via tailored incentives.

Spearheaded the design and implementation of end-to-end batch and real-time data pipelines using Databricks, PySpark, GCP and AWS, ensuring high-throughput and low-latency processing of pet commerce and behavioral data.

Built scalable Spark-based solutions in Databricks, creating and optimizing DataFrames and Datasets with PySpark for in-memory transformations and aggregations on large datasets.

Developed custom Python UDFs and SQL expressions for advanced transformation logic in both Spark SQL and Databricks SQL environments.

Integrated Kafka and GCS-based streaming inputs into data pipelines for real-time ingestion and analytics, ensuring timely insights for operational dashboards.

Implemented Delta Lake for ACID transactions and schema enforcement across curated layers, supporting scalable and governed data lake operations.

Orchestrated jobs using GCP Workflows and Cloud Composer (Airflow) to schedule, monitor, and manage dependencies across complex ETL workflows.
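
For illustration, a minimal Cloud Composer (Airflow) DAG of the kind used to order such ETL steps; the DAG id, schedule, and task commands are assumptions.

```python
# Hedged Airflow (Cloud Composer) sketch ordering a daily ETL chain.
# DAG id, schedule, and task commands are illustrative assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {"retries": 2, "retry_delay": timedelta(minutes=10)}

with DAG(
    dag_id="customer_360_daily",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 5 * * *",   # daily at 05:00
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = BashOperator(task_id="ingest_raw", bash_command="python ingest_raw.py")          # hypothetical script
    transform = BashOperator(task_id="transform_silver", bash_command="python transform.py")  # hypothetical script
    publish = BashOperator(task_id="publish_gold", bash_command="python publish_gold.py")     # hypothetical script

    ingest >> transform >> publish   # explicit dependency chain
```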

Leveraged a mix of file formats (Parquet, ORC, JSON, Delta) for ingestion and storage optimization, ensuring compatibility and performance for downstream analytics.

Tuned Spark job execution by applying caching, broadcast joins, Map Joins, Sort-Merge Bucket (SMB) Joins, and resource configuration (executors, driver memory), resulting in a 40% improvement in processing efficiency.
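
A simplified sketch of the broadcast-join, caching, and resource-configuration tuning described above; dataset names and executor settings are illustrative, not the production values.

```python
# Hedged Spark tuning sketch: broadcast join, caching, and example resource settings.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("tuning-sketch")
         .config("spark.executor.memory", "8g")          # example values, not production settings
         .config("spark.executor.cores", "4")
         .config("spark.sql.shuffle.partitions", "400")
         .getOrCreate())

orders = spark.read.format("delta").load("/delta/silver/orders")   # large fact table (assumed)
stores = spark.read.format("delta").load("/delta/silver/stores")   # small dimension table (assumed)

# Broadcast the small dimension to avoid shuffling the large fact table.
enriched = orders.join(F.broadcast(stores), "store_id")

# Cache a reused intermediate result instead of recomputing it for each aggregation.
enriched.cache()
daily = enriched.groupBy("order_date").agg(F.sum("order_amount").alias("sales"))
by_store = enriched.groupBy("store_id").agg(F.count(F.lit(1)).alias("orders"))
# Both aggregations reuse the cached DataFrame once actions (e.g. writes) are triggered.
```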

Created scalable data marts in Snowflake using clustering, partitioning, and materialized views, reducing query times for high-volume dashboard queries.

Managed secure data exchange and storage using GCS, while enabling federated querying and BI reporting through BigQuery.

Collaborated with product and analytics teams to design business-critical data models (Fact/Dimension), implementing SCD Type 1/2, star, and snowflake schemas for reporting and ML features.

Contributed to data validation and lineage tracking through Unity Catalog and Databricks notebooks, enhancing observability and auditability.

Participated in code reviews, CI/CD deployments, and data governance practices, fostering a high-quality, production-grade data ecosystem.

Project: Enterprise Data Platform Modernization – Client: PetSmart
Role: Data Engineer, Jan 2021 – Dec 2022, Phoenix, AZ

Led a strategic initiative to migrate legacy on-premise data infrastructure to Google Cloud Platform (GCP), modernizing the data ecosystem using cloud-native services and big data frameworks. The project unified batch and real-time data pipelines to support operational analytics and enterprise reporting.

Migrated complex ETL workflows from on-prem Hadoop and Informatica to GCP using Databricks, PySpark, and Kafka, enabling scalable and cost-efficient cloud processing.

Re-engineered batch pipelines using PySpark within Databricks, optimizing data processing with parameterization, dynamic partitions, and modular job design.

Designed and implemented real-time streaming pipelines with Kafka and Spark Structured Streaming, handling high-throughput transactional and sensor data.
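
A minimal Spark Structured Streaming sketch of the Kafka-to-Delta pattern described above; the broker address, topic, schema, and paths are assumptions.

```python
# Hedged Kafka -> Delta streaming sketch with Spark Structured Streaming.
# Broker, topic, schema, and paths are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_ts", TimestampType()),
])

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")   # hypothetical broker
          .option("subscribe", "transactions")                # hypothetical topic
          .option("startingOffsets", "latest")
          .load())

parsed = (events.selectExpr("CAST(value AS STRING) AS json")
                .select(F.from_json("json", schema).alias("e"))
                .select("e.*"))

query = (parsed.writeStream
         .format("delta")
         .option("checkpointLocation", "/delta/_checkpoints/transactions")  # enables recovery on restart
         .outputMode("append")
         .start("/delta/bronze/transactions"))
```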

Built a Delta Lake-based architecture (Bronze/Silver/Gold layers) supporting ACID transactions, schema evolution, and time-travel for both batch and streaming workloads.

Implemented ingestion from on-prem sources using Kafka Connect and GCP Pub/Sub, transforming and storing data in BigQuery and Google Cloud Storage (GCS).
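
A hedged sketch of Pub/Sub-based ingestion into BigQuery as an illustration of the pattern above; the project, subscription, and table identifiers are assumptions.

```python
# Hedged Pub/Sub -> BigQuery ingestion sketch; identifiers are assumptions.
import json
from google.cloud import pubsub_v1, bigquery

bq = bigquery.Client()
subscriber = pubsub_v1.SubscriberClient()
subscription = subscriber.subscription_path("example-project", "orders-sub")  # hypothetical subscription
table_id = "example-project.staging.orders_raw"                               # hypothetical table

def handle(message: pubsub_v1.subscriber.message.Message) -> None:
    row = json.loads(message.data.decode("utf-8"))
    errors = bq.insert_rows_json(table_id, [row])   # streaming insert
    if not errors:
        message.ack()
    else:
        message.nack()   # redeliver on failure

future = subscriber.subscribe(subscription, callback=handle)
future.result()   # block and keep pulling messages
```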

Tuned Spark performance using techniques such as caching, predicate pushdown, broadcast joins, SMB joins, and checkpointing to ensure optimized data flow and minimal latency.

Orchestrated and monitored complex data workflows using Tidal Workload Automation, integrating real-time and batch jobs with event-based triggers, SLA monitoring, and alerting for operational efficiency.

Developed robust modules for data enrichment, quality checks, and business rule application, applied consistently across ingestion layers.

Integrated Databricks Unity Catalog for data lineage, access control, and auditability, ensuring compliance with organizational governance standards.

Worked with DevOps to containerize Spark jobs and implement CI/CD pipelines for smooth deployment and rollback in the GCP environment.

Conducted knowledge transfer sessions for business and engineering teams to ensure smooth adoption of the modernized platform.

Project: DevOps Production Support Datalake – Client: PetSmart
DevOps Production Support Manager, Jan 2021 – Dec 2022, Phoenix, AZ

Led the DevOps Production Support team at PetSmart, ensuring the seamless operation of data pipelines and system infrastructure. Managed job executions, cluster performance, and data pipeline health in Databricks and GCP environments.

Key Responsibilities:

Monitoring & Alerts: Monitored job executions in Databricks/Tidal, reviewed GCP metrics, and responded to system alerts to ensure smooth operations.

Job Execution & Issue Resolution: Ensured successful execution of scheduled jobs, debugged failures, and re- ran jobs post-failure to maintain data consistency.

Cluster & Data Pipeline Management: Managed Databricks clusters, optimized resource usage, and troubleshot performance issues. Monitored data pipelines from ingestion to transformation and resolved errors.

Incident & User Support: Logged incidents in ServiceNow, coordinated issue resolution, and provided user support for data availability, accuracy, and troubleshooting.

Code Deployment & Version Control: Deployed new code, validated successful execution, and rolled back changes as needed. Maintained version control and documented deployment steps.

Performance & Optimization: Optimized SQL queries and transformations, identifying and addressing performance bottlenecks in Databricks and GCP.

Backup & Recovery: Ensured regular data backups and performed recovery tests to prevent data loss.

Compliance & Auditing: Reviewed access logs for compliance, maintained audit trails, and ensured adherence to data governance policies.

Proactive Maintenance: Identified cost optimization opportunities, scheduled routine maintenance, and ensured system reliability.

Holiday-Specific Monitoring & Reporting: Enhanced monitoring for peak seasons, scaled resources in advance, and generated detailed performance reports during holidays.

Project: Digital Offer Ecosystem – Client: American Express
Big Data Quality Manager, Jun 2014 – Dec 2020, Phoenix, AZ

The Digital Offer Ecosystem is a comprehensive suite of applications that enables the delivery of digital offers to cardholders globally via big data platforms. It supports the entire lifecycle of digital offers, including setup, eligibility configuration, personalized recommendations, and offer redemption. Additionally, the ecosystem features tools that promote small business merchants, aligning with initiatives like Amex's "Shop Small Saturday," which categorizes and promotes small businesses across the United States.

Key Responsibilities:

Led QA teams, ensuring timely deliverables and team growth, while overseeing successful migration of legacy systems to modern platforms.

Managed vendor relationships, ensuring timely software updates and system maintenance.

Optimized team performance through effective resource allocation and capacity planning.

Developed and executed manual, automated, and performance testing strategies, and maintained CI/CD pipelines using Jenkins.

Collaborated across teams, promoting the adoption of testing tools and fostering communication in Agile projects.

Championed data quality improvements through validation, cleansing, and automated data management using Python and Spark.

Led the American Express recommendation engine project, applying business rules to recommend offers and merchants.

Provided SME-level support during testing and deployment phases, managing test activities across functional, manual, and regression tests.

Designed and executed automated test scripts using Python and Spark, and validated large datasets with complex SQL queries.

Troubleshot data issues, conducted root cause analysis, and resolved defects, ensuring data accuracy and integrity.

Built and maintained CI/CD pipelines and ensured test coverage for new features and backend systems.

Tools & Environment: Python, Spark, Hive, HBase, Java, SQL, Jenkins, Jira, MapReduce, SOLR, Elasticsearch, Flume, JMeter, SoapUI.

Project: Provider and Member Domain Product in Healthcare – Client: Aetna
Technical Test Lead, Jan 2007 – Jun 2014, Hartford (CT), Phoenix (AZ)

Key Responsibilities:

Specialized in automating testing for Medicaid products, ensuring alignment with business requirements.

Developed test plans, cases, and data, delivering comprehensive status reports.

Managed defect triage and traceability using HP ALM and Quality Center.

Performed functional, regression, and data validation testing with SQL and Python.

Implemented mapping variables and session parameters for data extraction.

Authored UNIX and SQL validation scripts to automate data verification.

Designed test strategies, schedules, and metrics to reduce defects.

Built a Test Automation Framework to streamline manual testing processes.

Executed load testing with JMeter, identifying performance issues.

Mentored junior staff on quality management best practices.

Facilitated communication across UAT, business, and IT teams.

Integrated Agile practices, improving software development processes and cross-functional collaboration.


