SHENGZHONG YIN
*** * ***** **, *** York, NY *****
*********@*****.***
PROFILE
Data science graduate with a strong academic background (CS, Math, and Data Science); experience with big data analysis tools (Pig, Hive, Spark, Hadoop, Matlab, R); fluency in programming (Python, Java, C++, SQL, HTML, PHP); and industry experience.
EDUCATION
Columbia University New York, NY
Master of Science Aug.2015 - Dec.2016
Master in Data Science.
University of Virginia Charlottesville, VA
Bachelor of Arts Aug.2011 - May.2015
Major in Mathematics (with Probability and Statistics concentration).
Major in Computer Science.
WORK EXPERIENCE
CITIC Securities Company Limited Shanghai, China
Analyst Assistant May.2014 - Aug.2014
Got up to speed quickly, contributed to data collection, and wrote a weekly report.
Supported business workflows.
PROJECT EXPERIENCE
Kaggle Competition New York, NY
Predictive modeling competition Apr.2016
Participated in a Kaggle in-class competition.
Built a model based on spoken dialogues to classify texts by speakers.
Achieved 93.6% accuracy (top accuracy approximately 95%).
Data Science Capstone New York, NY
Natural language processing project Sep.2016 – Dec.2016
Collaborated with Unilever.
Processed an Amazon product review dataset (~1G) and generated a summary for each product.
Processed a Unilever product survey response dataset (~200M) and generated a summary for each survey question.
Techniques used include topic modeling (LDA, NMF), keyword selection (word embeddings), and clustering (k-means).
Big Data Analytics Project New York, NY
PokemonGo data analysis with big data tools Sep.2016 – Dec.2016
Used historical Pokemon occurrence data (~1G) to predict future occurrences.
Linked Pokemon occurrences to local 311 service data (~2G) to analyze unusual events.
Techniques used include multiple machine learning algorithms, PySpark, and the graph database tool SystemG.
TECHNICAL SKILLS
Computer Skills
Programming languages: Python, Java, C++
Data processing tools: Matlab, R
Database tools: SQL
Big data tools: Pig, Hive, Spark, Hadoop
Web-design tools: HTML, PHP