Senior Data Engineer with 10+ Years Building Scalable Data Platforms

Location:

Dallas, TX

Salary:

130000

Posted:

May 05, 2026

Resume:

George Wu

Senior Data Engineer

***********@*****.*** 929-***-**** Dallas, TX 75219

S U M M A R Y

Data Engineer with 10 years of experience designing and building scalable data platforms across financial services and enterprise software domains. Specialized in developing distributed data pipelines, real-time processing systems, and cloud-based analytics infrastructure that support data-driven decision making. Proven track record of optimizing large-scale data ecosystems and collaborating with cross-functional teams to deliver reliable, high-impact data solutions.

S K I L L S

Programming Languages: Python, Java, Scala, SQL, Bash

Big Data Technologies: Apache Spark, Hadoop, Kafka, Hive, Airflow

Cloud & Infrastructure: AWS (S3, EMR, EC2, Lambda), Docker, Kubernetes, Terraform

Databases & Storage: PostgreSQL, MongoDB, Redshift, MySQL, NoSQL

Data Processing & ETL: Data Pipelines, Data Modeling, ETL/ELT, Workflow Orchestration, Data Integration

Monitoring & Visualization: Tableau, Power BI, CloudWatch, Prometheus, Grafana

E X P E R I E N C E

Capital One McLean, VA

Senior Data Engineer Oct 2022 – Present

– Designed and maintained large-scale data pipelines supporting credit risk and customer analytics platforms used by millions of banking customers, leveraging AWS-based architecture to ensure high availability and scalability.

– Built distributed data processing workflows using Apache Spark and Kafka to handle real-time transaction data streams, improving data latency and enabling faster fraud detection and decision-making processes.

– Developed and optimized relational and NoSQL database models in PostgreSQL and MongoDB to support complex financial data use cases, ensuring efficient query performance and accurate reporting.

– Collaborated closely with data scientists and business stakeholders to deliver curated datasets for machine learning models used in credit scoring and marketing personalization initiatives.

– Implemented automated data quality checks and governance frameworks to ensure compliance with financial data security standards and improve trust in downstream analytics systems.

– Redesigned legacy ETL workflows into modern Airflow-orchestrated pipelines, significantly reducing manual intervention and improving pipeline reliability and maintainability.

– Led performance tuning efforts across distributed systems, optimizing Spark jobs and data storage strategies to reduce processing costs and improve overall system efficiency.

First American Santa Ana, CA

Data Engineer Oct 2019 – Oct 2022

– Developed data integration pipelines for real estate and title insurance platforms, enabling accurate property data aggregation used by internal analysts and external customers.

– Built batch and streaming data pipelines using Spark and Kafka to process large volumes of property transaction data, improving data freshness for downstream analytics systems.

– Designed scalable data models and schema structures in PostgreSQL to support reporting systems and business intelligence dashboards used by underwriting and operations teams.

– Collaborated with cross-functional teams to gather data requirements and deliver optimized datasets for analytics and reporting, ensuring alignment with business objectives.

– Implemented workflow orchestration using Apache Airflow to automate ETL processes, reducing manual effort and improving operational efficiency across data engineering workflows.

– Enhanced data governance practices by implementing validation rules and access controls, ensuring compliance with regulatory requirements in the financial and real estate domains.

– Integrated third-party data sources into internal systems, enabling richer insights and expanding the capabilities of analytics platforms used across the organization.

Quorum Software Houston, TX

Data Engineer II Jan 2018 – Oct 2019

– Developed data processing pipelines for energy trading and asset management platforms, enabling clients to analyze production and operational data more effectively.

– Built ETL workflows using Python and SQL to transform raw energy data into structured datasets used for reporting and forecasting applications.

– Worked with Hadoop and Spark to process large-scale datasets related to oil and gas operations, improving data accessibility and performance for analytics teams.

– Designed database schemas and optimized queries in relational databases to support high-volume transactional and analytical workloads.

– Collaborated with product and engineering teams to enhance data platform capabilities and deliver reliable data solutions for enterprise clients.

– Implemented monitoring and logging solutions to track pipeline performance and quickly identify and resolve data processing issues.

Pulse Secure San Jose, CA

Junior Data Engineer Jun 2017 – Aug 2017

– Supported the development of data processing tools for cybersecurity products that monitored network activity and user behavior across enterprise environments.

– Assisted in building data ingestion pipelines to collect and process log data from distributed systems, enabling security analytics and threat detection.

– Wrote SQL queries and scripts to clean and transform raw data for use in internal dashboards and reporting tools.

– Collaborated with senior engineers to improve data pipeline reliability and troubleshoot issues in data processing workflows.

– Participated in the design of data storage solutions for high-volume log data, ensuring efficient retrieval and analysis capabilities.

– Gained hands-on experience with data engineering tools and best practices in a fast-paced, security-focused technology environment.

E D U C A T I O N

University of California, Berkeley 2014 - 2017

Bachelor of Arts (B.A.), Statistics
