CHENZHI TIAN
469-***-**** **** Frankford Rd, Dallas, TX 75287 ***********@*****.***
SUMMARY
Master of Science in Computer Science (graduated May 2017), seeking a full-time Software Engineer position
Professional experience with Oracle DB, MySQL DB, Greenplum DB, and Java
In-depth knowledge of software development and system maintenance
Reliable team player and quick learner, able to work well in a fast-paced environment and meet deadlines
EDUCATION
University of Texas at Dallas, Richardson, TX
Master of Science in Computer Science, Sept 2015 - May 2017, GPA: 3.6 / 4.0
Relevant coursework: Design and Analysis of Computer Algorithms, Database Design, Statistical Methods for Data Science, Machine Learning, Cloud Computing, Big Data Management and Analytics, Information Retrieval
New York Institute of Technology, New York, NY
Bachelor of Science in Computer Science, Sept 2010 - May 2014, GPA: 3.3 / 4.0
SKILLS
Most familiar with Java and Python
Solid grounding in data structures, traditional algorithms, and machine learning algorithms
Hands-on experience with SQL, R, Scala, MATLAB, Linux, AIX, shell scripting, KVM, AWS (EC2, S3, EMR)
Working experience with Oracle DB, MySQL, Greenplum DB
In-depth knowledge of big data frameworks and tools such as Hadoop, Spark, MapReduce, Pig, Hive, HBase, MongoDB
PROFESSIONAL EXPERIENCE
Oracle DBA & Greenplum DBA
Emag software Technology Co., Ltd, Nanjing, China, Jun 2014 – Jun 2015
Managed Oracle Real Application Clusters (RAC) and Data Guard for Oracle Database on AIX.
Managed Greenplum Database with more than 50 nodes.
Proficient in deployment, backup/recovery, data migration, database migration, and SQL and Java tuning.
ACADEMIC PROJECTS
Search Engine for Olympics
A search engine website for the Olympics (Java, Python, Tomcat), 2017 Spring
A group project to build a search engine over 100,000 web pages crawled from related websites. I was responsible for incrementally indexing the crawled web pages and building the relevance model.
Used the single-pass in-memory indexing algorithm to index the web pages, and tested multiple data structures (hash table, binary search tree) to improve the response time of the search engine.
Implemented two kinds of relevance models: a vector space model based on tf-idf, and models based on PageRank and HITS.
I also helped my teammates apply clustering to the web pages so that the search engine would perform better.
Subcategory Inference for Chinese Restaurants Using the Yelp Dataset
Subcategory inference based on text mining of user reviews (EMR, S3, Scala), 2016 Fall
Used Apache Spark on Amazon EMR to process the Yelp Dataset (stored on Amazon S3) and find Chinese restaurants that deserve their own subcategory (e.g., Szechuan or Hunan rather than just "Chinese restaurants").
Tried several machine learning algorithms, training on 257,748 review texts, with a Pipeline to set up the workflow. A One-vs-Rest classifier (Logistic Regression) achieved a test accuracy of 65.64% on a dataset of 85,811 review texts.
Handwritten Digits Recognition
An artificial neural network trained on the MNIST dataset (Python with NumPy), 2016 Spring
Trained the model on 60,000 images. After improving feature extraction and tuning parameters, it achieved a test accuracy of 96.06% on a dataset of 10,000 images.
PUBLICATION
Chenzhi Tian, Statement of the Applications of Database System in Information Management, Chinese e-commerce, 2014(01)
CERTIFICATES
Red Hat Certified Engineer
Oracle Certified Professional