Jieli Chen
ac7x9h@r.postjobfree.com; (***) - *** - 0561; https://github.com/JieliChen268
Address: Torrance, CA 90502, USA; https://www.linkedin.com/in/jielichen/
SUMMARY:
Computer Science graduate student with a strong math and programming background, pursuing a career in data engineering.
EDUCATION
California State University, Dominguez Hills, Carson, CA
M.S. in Computer Science, GPA: 3.9/4.0 Graduation Date: Dec. 2018
Awards & Honors: CSU Foundation Edison Scholarship
SKILLS
Programming Languages: Python, Java, Scala, SQL, JavaScript
Databases: MySQL, PostgreSQL, MongoDB, HDFS, Cassandra, Elasticsearch
Big Data: Spark DataFrame/Dataset API, Spark SQL, Spark Structured Streaming, Airflow,
Hadoop, MapReduce, Kafka, Hive
Cloud Computing: Azure (Azure Function App, Event Hub, Databricks),
AWS (EC2, EMR, Kinesis, Redshift, S3)
Statistics: Hypothesis Testing, Linear Regression, Naïve Bayes, Logistic Regression
Machine Learning Model: Decision Trees, K Nearest Neighbors, Random Forest,
Gradient Boosting
Machine Learning Tools: TensorFlow, scikit-learn, NumPy, SciPy, Pandas
Data Visualization: Jupyter Notebook, matplotlib, seaborn, Tableau
Operating Systems: Linux (Ubuntu, RedHat, CentOS), Windows, macOS
Tools: Docker, Vagrant, Git
WORK EXPERIENCES
Symantec, Business Intelligence & Telemetry Culver City, CA May 2018 - Aug. 2018
Software Engineer Intern
Project 1: Data Pipeline Visualization and Management System Design and Development (Python)
Used the Apache Airflow framework to design a data pipeline workflow management system
Built and configured Airflow clusters to manage ETL jobs and their dependencies
Developed DAG tasks to schedule and trigger ETL jobs on multiple remote servers
Visualized running ETL jobs and reported failures promptly
Reduced the ETL job failure rate by around 30%, delivering data on time for reporting and data analysis
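The dependency scheduling described above can be sketched in plain Python without an Airflow installation; the task names below are hypothetical stand-ins for the actual ETL jobs.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical ETL tasks and their upstream dependencies,
# mirroring how an Airflow DAG orders its operators.
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

def run_pipeline(deps):
    """Execute tasks in dependency order, like Airflow's scheduler."""
    order = list(TopologicalSorter(deps).static_order())
    for task in order:
        print(f"running {task}")  # stand-in for triggering a remote ETL job
    return order
```

In real Airflow the same dependencies would be declared with operators and `>>` chaining; the topological ordering shown here is the underlying idea.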
Project 2: Migrating the BIT Data Pipeline to the Azure Cloud (C#, Python, Scala)
Used .NET Core to implement an Azure Function App that triggers data ingestion into the Azure Event Hubs message bus
Used Spark SQL, the Spark DataFrame/Dataset API, and Structured Streaming to implement streaming Spark applications and ETL jobs
Deployed Spark applications in Azure Databricks clusters
Used the Spark UI and the Catalyst optimizer to tune Spark job performance
YanSet Los Angeles, CA Oct. 2017 - Jan. 2018
Software Engineer Intern – Data Analytics
Developed end-to-end data processing pipelines covering data collection, filtering, analysis, aggregation, loading, and reporting
Applied statistical methods (logistic regression and gradient boosting classification models) to identify the top 10 features driving user retention, and reported findings to the product team to guide features that improve the retention rate
Measured views, session time, clicks, and other user engagement indicators, and proposed solutions that roughly doubled user clicks
PROJECTS
Real-Time Twitter Streaming Analysis System (Scala) Apr. 2018 - May 2018
●Consumed a live tweet stream via the Twitter streaming API
●Used Spark Streaming to extract words and filter those starting with a hashtag
●Mapped the filtered hashtags to key-value pairs and counted them over a 30-minute window
●Wrote the structured results into a Cassandra database
●Deployed the real-time Twitter streaming analysis application on AWS EMR
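The extract/filter/map/count steps of this project can be sketched in plain Python (the original used Spark Streaming in Scala); the sample tweets below are invented for illustration.

```python
from collections import Counter

def hashtag_counts(tweets, window_minutes=30, now=60):
    """Count hashtags over a time window, mimicking the
    Spark Streaming extract -> filter -> map -> reduce steps."""
    counts = Counter()
    for ts, text in tweets:            # ts = minutes since stream start
        if now - ts > window_minutes:  # drop tweets outside the window
            continue
        for word in text.split():
            if word.startswith("#"):   # keep only hashtag words
                counts[word] += 1
    return counts

# (timestamp, text) pairs; only the last two fall inside the 30-minute window
tweets = [(10, "old #spark"), (40, "#spark rules"), (55, "#scala and #spark")]
```

In Spark Streaming the same logic would be a `flatMap` / `filter` / `map` / `reduceByKeyAndWindow` chain over the DStream, with the results written to Cassandra.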
Movie Recommender System (Hadoop - MapReduce) Jul. 2017 - Aug. 2017
●Used Netflix data as input to recommend movies based on the movies users had watched and rated
●Chose an item-based collaborative filtering algorithm because the number of users far exceeds the number of movies
●Built a user-movie rating matrix from the rating history
●Created a co-occurrence matrix to represent the relationships between different movies
●Multiplied the co-occurrence matrix by the user rating matrix to produce a ranked recommendation list per user
●Implemented the matrix multiplication as MapReduce jobs in IntelliJ IDEA
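The co-occurrence-times-rating multiplication at the heart of this recommender can be sketched in plain Python (the original implemented it as Hadoop MapReduce jobs); the tiny rating data below is invented for illustration.

```python
from collections import defaultdict

def co_occurrence(ratings):
    """Count how often two movies are rated by the same user."""
    co = defaultdict(int)
    for user_ratings in ratings.values():
        for a in user_ratings:
            for b in user_ratings:
                co[(a, b)] += 1
    return co

def recommend(ratings, user):
    """Score each movie as the co-occurrence matrix times the
    user's rating vector: sum of co[movie, rated] * rating."""
    co = co_occurrence(ratings)
    movies = {m for r in ratings.values() for m in r}
    scores = defaultdict(float)
    for m in movies:
        for rated, stars in ratings[user].items():
            scores[m] += co[(m, rated)] * stars
    return dict(scores)

ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 2},
}
```

In the MapReduce version, one job builds the co-occurrence matrix from rating pairs and a second job performs the multiplication, with each mapper emitting partial `co * rating` products keyed by (user, movie).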