Data Engineer

Philadelphia, PA
October 15, 2020

Juanyan Zhu

University of Pennsylvania School of Social Policy & Practice Philadelphia, PA



· Master of Science, Data Analysis + Social Policy, GPA 3.5 May 2019

· STEM Major: Data Modeling/Warehousing and Database Administration 11.0802

· Related coursework: Quantitative Policy Analysis, Programming Language, Applied Statistic Models, Modern Regression (Statistical Learning), Geo-Spatial Analysis and Modeling, Research and Evaluation Design

Fudan University School of Social Science Shanghai, China

· Bachelor of Science, Statistics, GPA 3.4 June 2017

· Related coursework: Social Statistics, Quantitative Social Research Methodology, Micro Econ Analysis

Data Engineer Experience

District Performance Office, School District of Philadelphia Philadelphia, PA

Associate, Data Engineering February 2019 – Present

· Data Cleaning & Regulation (Python, R)

o Develop package for standardizing and validating data point naming, format and values

· Data warehousing, Data System Design & Execution (Qlik; SQL; Google Analytics)

o Set up automated processes to stream data from source to products such as reports and dashboards

· Build BI Dashboard products (Visualization)

· Internal program development for special needs (Python)

o Develop packages for image manipulation, website traffic monitoring, etc.

· Fulfill external data requests (Qlik; SQL; R)

o Help with research data model design to select better data points in the database

o Pull and query data from the database, then clean and transform it for users

NetEase Beauty Community, NetEase (Hangzhou) Network Co., Ltd. Hangzhou, China

Database Engineer (Intern) May 2018 – August 2018

· Build and maintain beauty product catalogue database through web scraping and database merging.

· Work with NLP team for keyword analysis.

· Use site visit data to build user adoption and adaptation models and predict user growth and retention, informing decisions on service volume and improving user data labeling.

Data Reporting Experience

Fonterra Co-operative Group, Commerce Department Shanghai, China

Business Intelligence Intern February 2017 – July 2017

· Produce monthly market data analysis reports, including trend analysis, performance comparison, and segmentation analysis, with possible explanations.

· Competitive analysis: in-depth data exploration of competitor’s seasonal reports and study of their strategies and executive policies in relation to their performance data.

Development & Reform Commission, Regional Government Shanghai, China

Data Reporting Intern May 2016 – September 2016

· Assist the regional government data reporting group with the annual report

· Draft daily government-public communication posts; create data visualizations for open data initiatives



o SQL (similar varieties) - Advanced

This is the major language I use for interacting with the data warehouse in my current position. I have 2 years of experience in relational databases and I'm confident writing advanced queries.

I use this to build backend data models for our BI products and to build longitudinal metadata tables for data monitoring. This involves data model design and query optimization.

I'm valued on my team for writing queries that are efficient (and accurate), since we deal with complex student data that are large (million-row tables) and intricate in table-key relations.

o R - Advanced

For local data (outside our BI platform), this is the major language I use for rectangular data tables. I've been using R since school. I'm confident in statistical analysis and data frame manipulation.

I also have project experience with statistical learning algorithms in R. This is the environment in which I learned machine learning methodology.

o Access - Have Experience

· BI Product

o Qlik Sense - Advanced

This is my current working BI platform. Aside from the back-end data structure, I build front-end BI dashboard visualizations for sharing with stakeholders.

o Tableau - Have Experience

· Data Warehousing

o Work together with our Information System to design, manage, and update the system. I think this is a fundamental part of being able to understand the data flow and better design other datasets.

· Data Engineering/Program Design

o Python

This is the main language I use for non-rectangular data and other functional programming. I have 4 years of programming experience in Python. I'm confident in the logic design, but there's still a lot to learn. Project examples are listed below.

Data structure package - This is a data cleaning and standardization program designed around internal user preferences. It converts input files into a standardized internal data object, automating formatting, renaming, and structuring.

Student matching - This is a package for assigning the correct internal ID to individuals based on personal information input, handling slight mismatches (typos). This involves a little natural language processing and a lot of tuning and algorithm optimization.

Privacy suppression - An algorithm designed for minimal data suppression to protect sensitive data.

Image library management - Image generation, modification, and management package for BI product icons.

o R

Fulfill internal functional needs and translate Python programs for R users

· Git

o I'm comfortable working with version control and cooperating with the team.

· Reporting

o My current job responsibilities include reporting products: school district Open Data for the public, data requests from educational researchers, School Progress Reports for schools, dashboards for internal offices and stakeholders, and the data reporting system for the Pennsylvania Department of Education.

