Data Engineer Big

Location: Belmont, CA, 94002

Posted: January 21, 2025

Resume:

YING-WEN (OLIVIA) CHEN

PROFILE

Experienced Data Engineer with 15+ years of expertise in software development, big data, and distributed systems. Strong fundamentals in systems design, data structures, algorithms, and object-oriented programming. Proven track record in building and optimizing data processing and orchestration pipelines using Apache Spark, Kafka, and other big data technologies. Skilled in Python, Scala, and Java, and familiar with modern cloud infrastructure and automation tools. Recently earned the Google Professional Data Engineer certification and looking forward to applying new technologies and scientific computing in a multidisciplinary environment.

SABBATICAL 2022 – PRESENT

• Focused on personal growth and family commitments while dedicating time to learning and experimenting with new technologies, including Google Cloud services (BigQuery, Bigtable, Dataflow) and machine learning frameworks.

• Completed Google Professional Data Engineer Certification in 2024 to enhance skills in big data engineering, cloud technologies, and orchestration.

EXPERIENCE

Staff Data Engineer, Visa Inc., Palo Alto, CA April 2019 – February 2022

• Cluster Performance Optimization: Analyzed complex data sets to identify inefficiencies in Spark and Hive workloads, optimizing data storage and query performance. Implemented partitioning, bucketing, and columnar storage in Hive to reduce disk space usage by 30% across clusters. Fine-tuned Spark configurations for better memory management and resource utilization, resulting in improved overall system performance and faster processing times for large-scale data operations.

• Cross-Functional Collaboration: Worked closely with more than one hundred data analysts and data scientists across the USA, UK, and Singapore, delivering detailed reports and summarizing findings to support business team decisions and drive performance improvements.

• Spark Framework Expertise: Acted as the Spark Subject Matter Expert (SME), planning and executing multiple upgrades of the Apache Spark framework, enhancing processing speed and performance within the data platform. Additionally, created a monitoring tool to help the application team track memory usage, improving resource management and preventing performance bottlenecks.

• High Availability: Contributed to internal initiatives focused on ensuring high availability of the data platform by creating a robust validation mechanism that minimized downtime and reduced maintenance windows by several hours, guaranteeing seamless operations and continuous data accessibility.

• Performance Testing: Tested Spark on Kubernetes and the Hadoop ecosystem, including tuning Kubernetes configurations (e.g., resource allocation, autoscaling, and pod management) to optimize Spark performance and scalability for large datasets.

CONTACTS

Phone: 650-***-****

Email: **********@*****.***

Location: San Mateo,

CERTIFICATES

• Professional Data Engineer from Google in 2024

• Supervised Machine Learning: Regression and Classification

• Advanced Learning Algorithms

SKILLS

• Apache Spark

• Apache Beam

• Kafka

• Hive

• Airflow

• BigQuery

• Bigtable

• Dataflow

• Pub/Sub

• Redis

• Java

• Scala

• Python

• Data Warehouse

• CI/CD

• Git

Staff Data Engineer, Visa Inc., Foster City, CA April 2016 – March 2019

• Legacy System Redesign: Led the end-to-end redesign of a legacy ETL system, migrating it to Apache Spark, Scala, and Hive within the Hadoop ecosystem. The project involved building the solution from scratch, including data modeling, parsing raw data, creating a data lake, automating report generation, and orchestrating workflows with Airflow to streamline ETL processes and improve scalability.

• Proof-of-Concept Development: Designed and executed proof-of-concept (POC) projects to evaluate proposed technical solutions, leveraging technologies such as Presto, Parquet, ORC, Apache HBase, Apache Hive, and Apache Flink to ensure performance, scalability, and customer requirements were met for data processing, storage, and real-time analytics.

• Collaborative Solution Development: Worked closely with software development and testing teams and with business stakeholders to design and deliver data solutions that met customer requirements for functionality, scalability, and performance.

• Mentorship & Leadership: Provided mentorship to less-experienced team members, guiding them through technical challenges and fostering a collaborative learning environment.

• Deployment & Maintenance Collaboration: Partnered with data platform and infrastructure teams to ensure smooth deployment and ongoing maintenance of software applications, driving seamless operational continuity.

Senior Software Engineer, Populous Group, Foster City 2015 – 2016

• Contributed Scala and Python expertise to Spark development

• Performed data modeling for a data warehouse in Hive

Senior Big Data Developer, Uni-President Enterprise Corp 05/2011 – 07/2012

• Regulated the data access flow to comply with the Personal Information Act in Taiwan

• Developed and enhanced search programs with Apache Solr, HDFS, and MapReduce

• Created a BI system for an e-commerce website to map out sales strategy

• Directed the planning and design of a recommendation system on Java/J2EE

EDUCATION

California Lutheran University, Master’s degree, Computer Science, 2013 – 2014
