FANGYUAN (SHAY) CHEN
** ******* ******* ***. *** • Jersey City, NJ 07310 • 646-***-**** • ************@*****.*** • https://www.linkedin.com/in/fangyuanchen/
SKILLS
Programming Languages: C++, Python, R, SQL, NoSQL, VBA, JavaScript, HTML, CSS
RDBMS: SQL Server, MySQL, Oracle
AWS Cloud Services: S3, EMR, Redshift, Athena, Lambda, etc.
Big Data Tools: Apache Spark, Apache Cassandra, Snowflake, RapidMiner, Tableau, Google Analytics
WORK EXPERIENCE
The Bridge Corporation New York, NY
Data Engineer Python, Spark, SQL, AWS 05/2019 – Present
• Develop and implement data extraction procedures for multiple sources; integrate and structure large-scale datasets
• Design and develop automated data ingestion pipelines and batch processes to increase efficiency and cut costs
• Perform analytics on the cleaned data, fit different machine learning models, and evaluate their behavior based on the metrics
• Ingest data into AWS Glue, use PySpark to perform complex transformations that standardize the data and stage it in S3 buckets, and develop EMR jobs to analyze the data in S3
• Create AWS Athena databases and tables using AWS Glue crawlers for BI consumption, and write Athena queries to meet client’s data requests
• Collaborate with SDEs and business stakeholders to understand data needs, scope out technical requirements for data products, and deploy high-quality code in mission-critical production environments
ChineseInvestors.com New York, NY
Data Analyst Intern Python, SQL 10/2018 – 12/2018
• Created and automated sales performance reports using Python and SQL, and reported KPIs in Tableau
• Collaborated across multiple business teams, including marketing, sales, and logistics, to optimize marketing policy formulation and improve the accuracy and clarity of users’ performance metrics using Tableau and SQL
• Created a Python API to automate scraping of historical cryptocurrency data, preprocessed the data, applied machine learning techniques to predict cryptocurrency prices, and summarized and visualized the findings
Litec Systems Corporation New York, NY
Quantitative Analyst Intern Python, SQL 02/2018 – 09/2018
• Developed a risk calculation framework to generate risk measure configurations based on market data, trade types, and risk measure types; developed daily jobs to run risk calculations for all positions
• Designed database tables and developed a Python API to manage trades, positions, and portfolios
• Developed a backtesting system to prototype trading strategies and compared their performance against market indices
PROJECTS EXPERIENCE
Data Mining Santander Bank Product Analysis Python, Machine Learning 02/2018 – 05/2018
• Preprocessed over 10 million records to define a subset of data for analysis by removing missing, duplicate, and invalid values; applied tree-based feature selection to filter out redundant attributes and avoid overfitting
• Trained and tested machine learning models (Decision Tree, Naïve Bayes, and deep learning models) on over 5 million customer records to predict new users’ purchases; compared the models by area under the ROC curve to select the best performer
• Performed association rule mining to identify customers’ purchasing patterns
EDUCATION
Bernard M. Baruch College – City University of New York, Zicklin School of Business New York, NY
Master of Science in Information Systems 12/2018
Hubei University of Economics Wuhan, China
Bachelor of Management in Accounting 06/2015
• Certification: ACCA (Association of Chartered Certified Accountants)