Data Project

Arlington County, Virginia, United States
December 03, 2018

Biyi Chen 412-***-**** ****s Eads St. Arlington VA 22202 (GitHub):

SUMMARY

• 3+ years of diversified information technology experience in data analysis, data warehousing, data integration, ETL, and application/production support.

• Expertise in end-to-end data management and data warehouse projects, delivering business intelligence, data quality, data integration, and reporting solutions.

• Proficient in SQL, R, and Python, with professional business visualization skills in Excel and Tableau.

• Specialize in Statistical Modelling, Linear Regression, Logistic Regression, Multivariate Regression, Time Series, A/B Testing, Machine Learning, Decision Trees, Clustering, Classification, NLP, and Data Visualization/Reporting.

• Hands-on programming experience in R (TensorFlow, Keras, caret, dplyr, tidyr, ggplot2, Shiny), SQL/NoSQL, databases (MySQL, SQL Server), SAS, Python (Jupyter, Matplotlib, NumPy, pandas, scikit-learn, SciPy, Scrapy), and Tableau; proficient in Excel, VBA, MS PowerPoint, and MS Access.

TECHNICAL SUMMARY

Software: Visual Studio, SAS, Tableau, SQL Server, Azure, AWS, CRM, SCRUM

Languages: Python (BeautifulSoup, NumPy, IPython, pandas, SciPy, Scrapy, scikit-learn), SQL/NoSQL, Databases (MySQL, SQL Server), HTML5, Angular, PHP, JavaScript, VBA, Excel (PivotTables, Power BI, PivotCharts, HLOOKUP, VLOOKUP, Solver), R (ggplot2, dplyr, glm, CRAN, tidyr, Shiny), Git, Gulp, REST API

Analytical: Machine Learning, Bagging and Boosting Models, Logit Model, Neural Network, SVM, Survival Analysis, Cluster Analysis, Multivariate Analysis, Statistical Modeling/Analysis

WORKING EXPERIENCE

University of Management and Technology

Arlington, VA

Business Intelligence Analyst February 2018-Present

• Developed CS/IT courses, administered the Moodle LMS, and documented vulnerabilities for the University of Management and Technology (UMT).

• Designed and built an ETL process using SSIS packages to migrate 2,000 models into the new database. Evaluated the data with advanced Excel skills and SQL queries to support analysis for reporting projects and the IT/CS courses. Improved SQL query performance by 10%.

• Analyzed and compared 5,000 student datasets covering demography, penetration, usage frequency, engagement, segmentation analysis, course lifecycle, and adoption patterns; delivered custom interactive dashboard reports to top management.

• Assisted managers in establishing a conversion rate model for each course.

• Collaborated with the IT team to improve the courses’ appeal and accessibility by estimating the effectiveness of regional, online, and multi-platform advertising, enabling better targeting of audiences by region and platform.


Data Analyst Internship August 2017-December 2017

• Elevated the global discussion around the legal cannabis industry by providing unbiased, vetted information and educating stakeholders to make informed decisions.

• Developed an ETL process for the data warehouse, enabling flexible querying of a large dataset of over 2,000,000 pharmacy records spanning 10 years. Implemented ETL mappings using SSIS packages.

• Built a web crawler to transform 6,000 unstructured online records (HTML) into structured data. Worked with large data sets and prepared data for analysis, including data quality checks, integration, cleaning, and manipulation. Interacted with the CRM data model to compare cannabis usage, and analyzed votes on legalizing the use of cannabis.

• Used statistical analysis to identify the key factors that influence cannabis products, built predictive models to forecast future subscription rates, segmented customers and market influence, and provided data-driven recommendations to clients.

• Analyzed over 2,000 operators with more than 10 years of data and generated daily KPI reports using R Shiny for top management to review.

• Improved code processing efficiency by 30% by conducting data manipulation and creating VBA macros.

GEORGE WASHINGTON UNIVERSITY

Washington, DC

Research Assistant June 2017-February 2018

• Communicated with the professor and collaborated with multiple departments to identify data needs.

• Wrote SQL queries for dynamic data updates. Kept the database up to date and prioritized needs to meet client specifications.

GitHub REST API Project October 2018

• Used the GitHub REST API to return the list of public repositories accessible to my user account.

Yelp Dataset Analysis Project September 2017-December 2017

• Explored highly rated Yelp restaurants across 10,000 unstructured records. Extracted the business.json and checkin.json files from the Yelp dataset. Converted the business.json dataset into a CSV file and loaded it into a MySQL database to analyze the attribute column using SQL queries.

• Performed exploratory analysis to answer what sort of restaurant an investor or new restaurateur should start.

Cleaning Request Interactive Dashboard September 2017-December 2017

• Built data models and developed a strategy to assess the performance of 3,000 records.

• Used R Shiny to create an interactive dashboard analyzing cleaning requests for public property in Washington, DC, as an individual final project for a master's-level class. Raw dataset sourced from Kaggle open data resources.

• R-Shiny interactive data dashboard individual project:

Regression Model Python Project September 2016-December 2016

• Wrote custom Python functions and used scikit-learn's linear model, logistic regression model, and metrics modules to plot the benchmark wins, calculate accuracy, and evaluate the metrics.

Airline Python Programming with Database Applications Project September 2016-December 2016

• Used the Python datetime, pandas, and warnings packages to manipulate unstructured data and perform analytical functions on it. Used seaborn and matplotlib to visualize the structured data. Imported the myDB package to create the database. Stored data in the database and analyzed it further with SQL queries.

Text Mining Python Project September 2016-December 2016

• Wrote custom Python functions and imported Counter from the collections module to count words. Reported the number of occurrences of each word in the text files.

EDUCATION BACKGROUND


Master of Science in Information Systems Technology (GPA: 3.8/4.0) August 2016-May 2018

Certificate of Business Analytics (GPA: 3.8/4.0)

Certificate of Google Analytics ID: 13413272

DUQUESNE UNIVERSITY

Pittsburgh, PA

Bachelor of Accounting and Minor in Legal Studies August 2012-May 2016

Student Representative of CASTP
