
Senior Data Engineer with Databricks and SQL Proficiency

Location: Reston, VA
Salary: $65/hr
Posted: November 19, 2025


SHUXIN (SHERRY) DONG

Address: ***** ********** **., *******, ** 20171

Phone: 414-***-**** Email: *********@*****.***

VISA STATUS

Authorized to work in the U.S. (Citizen)

PROFESSIONAL SUMMARY

Senior Data Engineer with 7+ years of experience building scalable data pipelines, optimizing performance, and integrating complex data sources across the healthcare, auditing, and gaming industries. Proficient in Databricks, PySpark, Snowflake, and SQL Server, with a strong focus on automation, data modeling, and cloud-based architecture. Effective collaborator in Agile environments, with expertise in data governance, the SDLC, and integrating multi-source data for analytics and reporting.

CORE SKILLS

Databases & Data Warehouses: SQL Server (2008 and later), Snowflake, MySQL, Apache Cassandra

Big Data & Cloud Platforms: Databricks, Spark, Hadoop, Azure (Data Factory, Azure SQL, Storage), AWS (S3, Redshift, RDS)

ETL & Data Integration: SSIS, Matillion, Airflow

Data Modeling & Design: ERwin, Lucidchart, MS Visio, Miro

Programming & Scripting: SQL (T-SQL, Dynamic SQL, Snowflake SQL), NoSQL (CQL), Python, PySpark, JSON, Java

BI & Reporting: Power BI, Tableau, SSRS

Source Control & Collaboration Tools: Azure DevOps, VS Code, Jira, Team Foundation Server (TFS), Git, GitLab, Confluence, Kanban Board

SDLC Methodologies: Agile, Scrum, Waterfall

CERTIFICATION

Databricks Data Engineer Associate

PROFESSIONAL EXPERIENCE

Exact Sciences Corporation, Madison, WI (December 2021 – Present)
Sr. Data Engineer (February 2024 – Present)

Responsibilities

• Automated PwC Auditing workflows, reducing 14+ hours of manual work to 45 minutes of automated processing, by developing PySpark scripts and Databricks Jobs that dynamically create cascading Box folders and generate CSV files in line with changing business requests and naming conventions.
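
A minimal sketch of the folder-and-file automation described above, using the boxsdk library with pandas; the credentials, folder names, and sample data are hypothetical placeholders rather than the production job:

    import io
    import pandas as pd
    from boxsdk import OAuth2, Client

    # Authenticate to Box (hypothetical developer-token credentials).
    auth = OAuth2(client_id="APP_ID", client_secret="APP_SECRET",
                  access_token="DEV_TOKEN")
    client = Client(auth)

    def ensure_subfolder(parent, name):
        # Reuse the subfolder if it already exists, otherwise create it.
        for item in parent.get_items():
            if item.type == "folder" and item.name == name:
                return item
        return parent.create_subfolder(name)

    # Build the cascading folder path required by the naming convention.
    folder = client.folder("0")  # "0" is the Box root folder
    for segment in ("PwC Audit", "FY2025", "Q1"):
        folder = ensure_subfolder(folder, segment)

    # Generate the CSV in memory and upload it to the leaf folder.
    df = pd.DataFrame({"sample_id": [1, 2], "status": ["complete", "pending"]})
    stream = io.BytesIO(df.to_csv(index=False).encode("utf-8"))
    folder.upload_stream(stream, "audit_extract.csv")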

• Designed and optimized scalable ETL pipelines using Databricks, PySpark, and Spark SQL, managing 20+ data pipelines for enterprise data processing. Developed automated data extracts and batch notification pipelines, integrating with AWS S3 and EPIC SFTP for efficient data delivery.
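
A minimal extract-transform-load sketch in the Databricks style described above, reading from and writing to S3; the bucket paths and columns are hypothetical:

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract: read a raw CSV drop from S3 (hypothetical bucket and layout).
    raw = (spark.read
           .option("header", "true")
           .csv("s3://raw-bucket/epic/orders/"))

    # Transform: normalize types and drop rows missing the join key.
    clean = (raw
             .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd"))
             .filter(F.col("patient_id").isNotNull()))

    # Load: write partitioned Parquet back to S3 for downstream consumers.
    (clean.write
     .mode("overwrite")
     .partitionBy("order_date")
     .parquet("s3://curated-bucket/epic/orders/"))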

• Collaborated with the Principal Lead to build centralized Patient and Provider entities by integrating attributes from EPIC Clarity, Nitro/Veeva, Salesforce, Pglims, and other sources. Developed standardized data pipelines that ingest raw data every three hours, enabling consistent, master-aligned datasets across the platform. These unified entities support the Customer Data Platform (CDP) application, empowering teams to join previously siloed systems and build scalable, personalized data products.
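
The master-alignment step described above amounts to an incremental upsert; a minimal Delta Lake sketch of that pattern, with hypothetical table and key names:

    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Latest three-hour batch of provider attributes (hypothetical staging view).
    updates = spark.table("staging.provider_batch")

    # Upsert into the centralized Provider entity, keyed on a master identifier,
    # so every downstream consumer sees one aligned record per provider.
    target = DeltaTable.forName(spark, "cdp.provider")
    (target.alias("t")
     .merge(updates.alias("s"), "t.provider_key = s.provider_key")
     .whenMatchedUpdateAll()
     .whenNotMatchedInsertAll()
     .execute())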

• Migrated existing SSIS packages to Databricks pipelines, modernizing ETL workflows with PySpark and notebook-based transformations for improved scalability and performance.

Data Engineer II (December 2021 – January 2024)

Responsibilities

• Led data migration and architecture design for EPIC Clarity Data and PwC Auditing, optimizing data models and transformation workflows.

• Built and maintained ETL pipelines using Databricks, Matillion, Python, SQL, and AWS (S3, SQS) to support large-scale data ingestion.


• Extracted and processed SAP data from S/4HANA 2020, creating SAP Views, InfoSets, and Qlik Replicate tasks to integrate the data into Snowflake.

• Improved Snowflake performance, reducing query runtimes from 25 hours to 30 minutes through clustering, caching, and optimized query strategies.
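
A sketch of the kind of tuning involved, via snowflake-connector-python; Snowflake prunes micro-partitions through clustering keys rather than conventional indexes, and every name and credential below is a hypothetical placeholder:

    import snowflake.connector

    # Hypothetical credentials; in practice these come from a secrets vault.
    conn = snowflake.connector.connect(
        account="myaccount", user="etl_user", password="...",
        warehouse="ANALYTICS_WH", database="EDW", schema="SAP",
    )
    cur = conn.cursor()

    # Cluster the large fact table on its most common filter columns so
    # Snowflake can prune micro-partitions instead of scanning everything.
    cur.execute("ALTER TABLE SALES_FACT CLUSTER BY (POSTING_DATE, COMPANY_CODE)")

    # Aggregation-first rewrite: summarize the fact table before joining
    # dimensions, shrinking the join input dramatically.
    cur.execute("""
        WITH daily AS (
            SELECT posting_date, company_code, SUM(amount) AS total_amount
            FROM sales_fact
            GROUP BY posting_date, company_code
        )
        SELECT d.posting_date, c.company_name, d.total_amount
        FROM daily d
        JOIN company_dim c ON c.company_code = d.company_code
    """)
    print(cur.fetchmany(10))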

• Managed Agile workflows, version control, and documentation using Jira, GitLab, Confluence, and Alation, ensuring efficient development cycles.

• Maintained and enhanced existing SSIS packages by applying bug fixes, tuning performance, and updating business logic to meet evolving data requirements.

Landmark Health LLC, Ellicott City, MD (September 2019 – November 2021)

(Acquired by UnitedHealth Group – Optum in March 2021)
Data Developer

Responsibilities

• Independently managed inbound CMS file ingestion and outbound file delivery, handling multiple formats (Excel, CSV, Flat Files, XML) and integrating data into the Enterprise Data Warehouse (EDW) based on business needs.

• Led Data Profiling, Data Cleansing, Data Transformation, and Cohort Algorithm implementation, delivering optimized ETL design, development, and unit testing aligned with business requirements.

• Designed, developed and deployed SSIS packages to extract, transform and load data from heterogeneous sources. Implemented advanced SSIS components including ForEach Loop, Script Task, Conditional Split and Lookup to meet complex business logic and data flow requirements.

• Provided production support and client integration, handling Help Desk tickets by prioritizing tasks based on severity assessment and impact analysis. Utilized SQL, database structures, and business knowledge for troubleshooting and resolving critical issues.

• Developed Ad Hoc queries, Stored Procedures, and User-Defined Functions for data validation, transformation, and pipeline development. Built Python Pandas scripts in Jupyter Notebook for dimensional transformations and source data analysis.
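
A minimal pandas sketch of the kind of notebook-based profiling and dimensional transformation described above, with hypothetical file and column names:

    import pandas as pd

    # Raw member extract (hypothetical columns standing in for a CMS feed).
    raw = pd.read_csv("cms_members.csv")

    # Profile the source: null counts and duplicate natural keys.
    print(raw.isna().sum())
    print(raw["member_id"].duplicated().sum())

    # Build a conformed member dimension: dedupe on the natural key,
    # standardize text, and assign a surrogate key.
    dim = (raw.drop_duplicates(subset="member_id")
              .assign(last_name=lambda d: d["last_name"].str.strip().str.upper())
              .reset_index(drop=True))
    dim["member_sk"] = dim.index + 1
    dim.to_csv("dim_member.csv", index=False)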

• Monitored and analyzed query performance using Execution Plans, Logs, and Profilers. Optimized SQL performance by rewriting queries, creating indexes and partitions, modifying locking strategies, and implementing efficient SQL solutions including Views, CTEs, Temp Tables, Table Variables, and Aggregation-First strategies, leading to substantial query performance improvements.
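
One of the patterns named above, staging a pre-aggregated temp table ahead of the reporting join, sketched with pyodbc against SQL Server; the connection string and tables are hypothetical:

    import pyodbc

    # Hypothetical SQL Server connection; the driver name varies by host.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=edw-sql;DATABASE=EDW;Trusted_Connection=yes;"
    )
    cur = conn.cursor()

    # Stage the expensive aggregate once in an indexed temp table instead of
    # re-evaluating it for every row of the reporting join.
    cur.execute("""
        SELECT member_id, MAX(service_date) AS last_service
        INTO #recent_claims
        FROM dbo.claims
        GROUP BY member_id;
    """)
    cur.execute("CREATE CLUSTERED INDEX ix_rc ON #recent_claims (member_id);")

    # The reporting query now joins against the small pre-aggregated set.
    cur.execute("""
        SELECT m.member_id, m.member_name, r.last_service
        FROM dbo.members AS m
        JOIN #recent_claims AS r ON r.member_id = m.member_id;
    """)
    rows = cur.fetchall()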

• Collaborated with the HScale Team to build an Operational Data Store (ODS), cleaning high-volume inbound data from multiple sources using Hadoop/Spark and mapping it with internal tables via MySQL and Java.

• Utilized ERwin for ODS modeling and architecture design, leading key module implementations, Slowly Changing Dimension (SCD) strategies, and the establishment of critical functionality.
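
A minimal Type 2 Slowly Changing Dimension sketch in pandas, illustrating the expire-and-append strategy referenced above; the keys, columns, and values are hypothetical:

    import pandas as pd

    today = pd.Timestamp.today().normalize()

    # Current dimension rows and the latest source snapshot (hypothetical).
    dim = pd.DataFrame({
        "provider_id": [101], "specialty": ["Cardiology"],
        "valid_from": [pd.Timestamp("2020-01-01")],
        "valid_to": [pd.NaT], "is_current": [True],
    })
    incoming = pd.DataFrame({"provider_id": [101], "specialty": ["Oncology"]})

    # Find natural keys whose tracked attribute changed.
    current = dim[dim["is_current"]]
    merged = current.merge(incoming, on="provider_id", suffixes=("_old", ""))
    changed = merged[merged["specialty_old"] != merged["specialty"]]

    # Type 2: expire the old version, then append the new version.
    mask = dim["provider_id"].isin(changed["provider_id"]) & dim["is_current"]
    dim.loc[mask, "valid_to"] = today
    dim.loc[mask, "is_current"] = False

    new_rows = changed[["provider_id", "specialty"]].copy()
    new_rows["valid_from"] = today
    new_rows["valid_to"] = pd.NaT
    new_rows["is_current"] = True
    dim = pd.concat([dim, new_rows], ignore_index=True)
    print(dim)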

• Utilized Azure DevOps for source control, version management, and CI/CD coordination across data engineering projects.

Helm 360, Wayne, PA (May 2019 – September 2019)

Contractor - Data Developer/Data Analyst

Responsibilities

• Contributed to Data Warehouse design using Dimensional Modeling, gathering stakeholder requirements from legacy Data Marts and collaborating on solution architecture.

• Designed ER diagrams with Star/Snowflake schemas in Visio and developed data migration strategies for loading fact and dimension tables.

• Developed, tested, and optimized SSIS packages, leveraging tasks and containers for data cleaning, transformation, and ingestion from heterogeneous databases.

• Managed Azure Cloud Data using Azure SQL, Data Warehouse, Data Factory, PowerShell, and Azure Storage, tailoring solutions to client requirements.

• Built and optimized Power BI solutions, integrating data from SQL Server, Oracle, and Access. Created custom DAX measures, reports, and dashboards to meet business needs.

YOOZOO GAMES CO., LTD., Shanghai, China (June 2017 – August 2017)
User Research Intern


Responsibilities

• Extracted, compiled, and tracked data for 30,000+ game products and 1.1 billion user devices from Hadoop using SQL. Collaborated with the Data Mining team on Data Profiling and Cleansing, ensuring data quality and integrity.

• Led the migration of legacy game data from local servers to Snowflake Cloud, flattening nested datasets at varying granularities for optimized querying.
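
Flattening nested data in Snowflake typically relies on LATERAL FLATTEN; a minimal sketch via the Python connector, with a hypothetical schema:

    import snowflake.connector

    conn = snowflake.connector.connect(account="myaccount", user="loader",
                                       password="...", database="GAMES")
    cur = conn.cursor()

    # Each raw row carries a JSON array of session events; LATERAL FLATTEN
    # explodes the array so each event becomes its own relational row.
    cur.execute("""
        SELECT r.device_id,
               e.value:event_name::string AS event_name,
               e.value:ts::timestamp_ntz  AS event_ts
        FROM raw_sessions r,
             LATERAL FLATTEN(input => r.payload:events) e
    """)
    for row in cur.fetchmany(5):
        print(row)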

• Performed market analysis and generated predictive reports using Tableau dashboards, providing short-term and long-term insights on competing products.

• Assisted in Hadoop and Hive administration, solving OOM issues, monitoring cluster health, and optimizing daily maintenance, reducing processing time by 12.3%.

Ardisam Inc., Shanghai, China (January 2012 – July 2013)
Data Processor

Responsibilities

• Participated in the requirement gathering and solution design with other team members.

• Coordinated resolutions with the Sales Development Team and project managers. Assisted in developing Stored Procedures, User-Defined Functions, Views, Triggers, Cursors, CTEs, Temp Tables, and Table Variables based on requirements from the Sales Development Team.

• Handled database security by creating user logins, roles, and permissions, granting each user access to the database according to their role.

EDUCATION

UNIVERSITY OF WISCONSIN - MILWAUKEE, Bachelor of Science – Computer Science (2018)
UNIVERSITY OF WISCONSIN - MILWAUKEE, Master of Arts – Economics (2014)
SHANGHAI INTERNATIONAL STUDIES UNIVERSITY, Bachelor of Arts – Finance (2013)


