Resume

Engineer Data Analyst

Location:

Arlington, VA

Posted:

January 20, 2021


Resume:

YINGYING PAN

**** * **** **, *********, VA, *****

adjkzd@r.postjobfree.com 571-***-****

https://github.com/PAN-0921

EXPERIENCE

ASCENDING LLC Fairfax, VA

Data Engineer Aug. 2020 - Present

ASCENDING provides consulting services in software development, DevSecOps, and data engineering.

Used Python and SQL to perform test-driven development tasks with appropriate data storage and access solutions.

Used Apache Spark and the Cloudera platform to capture and aggregate operational metrics; built interactive visualizations with Amazon QuickSight to support marketing decisions.

Assisted in configuration, development, and testing of Scala files and other scripts with Spark Shell. Troubleshot issues in the execution of Apache Spark jobs on a local machine and on a Cloudera cluster.

Built, tested, and monitored an ETL process with AWS Data Pipeline, AWS EMR, and AWS CloudWatch; prepared documentation for the whole process.

Extracted and analyzed user data from an Amazon Aurora MySQL database using standard SQL, then generated reports.

Built a local ETL environment with a Dockerfile and Docker Compose.

Created Kafka topics for ingested events, then built a Kafka producer and consumer in Scala to process real-time streaming data.

Developed, tested, and deployed a serverless API with API Gateway and Lambda functions. Used the AWS Serverless Application Model (SAM) to build a local environment and deploy to AWS.

Used Impala to create a Kudu table. Extracted information through DataFrame transformations and wrote it to the Kudu table, then used Impala queries to verify that the streaming job had inserted the data correctly.

Used Hive join queries to join multiple tables of a source system and load the results into Elasticsearch tables.

George Washington University Washington, DC

Research Assistant Jun. 2020 – Aug. 2020

Analyzed monthly user behavior data with time series and linear regression models in Python; achieved accuracy above 90%.

Manipulated, processed, and extracted value from large MySQL and Redis NoSQL datasets.

Bank of Ningbo Ningbo, Zhejiang Province, China

Data Analyst May 2018 - May 2019

Used Jupyter and pandas to build a savings rate prediction model with more than 95% accuracy; contributed to designing incentives.

Used an ensemble of decision trees to predict customer growth for services such as savings, personal loans, and mortgages; achieved model accuracy of ~86%.

Assisted data scientists in risk assessment model validation with data management, analysis and interpretation.

Assisted in the execution and monitoring of risk management practices, operational issues, and key risk indicators. Developed reporting dashboards on the outcomes of monitoring activity in relation to identified risks.

EDUCATIONAL BACKGROUND

The George Washington University Washington, D.C., United States

Master of Science in Statistics (Expected in June 2021) Sep. 2019 - Present

Related Courses: Data Warehousing, Intermediate Probability/Stochastic & Applied Multivariate Analysis

University of Skövde Högskolevägen, Sweden

Courses: Management Accounting and Control & Financial Markets and Institutions Jan. 2017 - Jun. 2017

Ningbo University Ningbo, PRC

Bachelor of Economics in Financial Engineering Sep. 2015 - Jun. 2019

Related Courses: Financial Risk Management, Statistical Methods for Forecasting & Theoretical Foundation of the Quantitative Investment

SKILLS

Python, SQL, Scala, Apache Spark, Apache Hive, Apache Impala, AWS Lambda, SAM, Docker, Git, AWS, Sqoop, Spark SQL, Spark Structured Streaming, HDFS, Kafka, Kudu, YARN
