Machine Learning Social Media

Location:

Toronto, ON, Canada

Posted:

November 16, 2023


Resume:

Dr. Guichong Li

SECURITY CLEARANCE

SECRET

EXPERIENCE SUMMARY

Main research focuses on Bayes learning: developing multi-layer Bayes learning/Bayesian statistics and high-level Bayesian variable learning;

Latest research results applied to anomalous traffic detection in CAN workflows; high-level feature extraction using Bayesian variables; object detection and action classification;

Effectively designed and implemented machine learning algorithms for churn prediction and Add a Line analysis and machine learning pipelines for telecom business analysis.

Developed advanced machine learning algorithms for text, POS outlet items, category hierarchy classification; multilabel and multitask classification algorithms/text classification/NLP;

Developed advanced machine learning regression/classification algorithms for food component analysis using chemometrics and spectroscopy;

Recent postdoctoral research on uniform and unbiased sampling/crawling of online social networks using advanced Markov Chain Monte Carlo techniques; developed an innovative sampling algorithm (a new coupling technique), implemented with Ruby on Rails, the Twitter API, and DataMapper on Unix/Linux and Amazon EC2; social media analysis using Python, NLTK, SkLearn.

Previous postdoctoral research in DRDC, CORA, Canada, for Complex Dynamic Network Analysis; Simulation of Autonomous Underwater Vehicles; using MatLab and VB, etc.

Two-year research contract at Health Canada for nuclear explosion and pollution monitoring and environmental anomaly detection, using J#, the Weka (Java) software package, and Eclipse.

The main research interest focusses on Machine Learning and Data Mining algorithms and technology; in particular, one-class learning using kernel methods for anomaly detection and its application on big data using MapReduce/Hadoop with Pig/Hive/HBase, and advanced Markov Chain Monte Carlo techniques for fast and unbiased sampling/crawling online social networks such as Twitter and Facebook.

Educational background in both Mathematics and Computer Science; 10 years of professional experience in software development, artificial intelligence algorithm design, and leadership for transaction and database applications using SQL Server, Oracle, C/C++, Java/J#, VB, JDBC, .NET, PowerBuilder, TCP/IP, OpenGL;

Research work on information security using anomaly detection techniques on web servers such as WebSphere/WebLogic with Spring, Swing, EJB, AJAX, SOAP; HTML5, NoSQL;

5 years on big data platforms: Spark/PySpark/Java/Scala, Hadoop, Hive, NoSQL, Ab Initio;

EDUCATION

PhD of Computer Science, University of Ottawa, 2010

Master of Computer Science, University of Regina, 2004

Bachelor of Mathematics, Southwest China Normal University, 1985


Dates

Project

Aug, 2022-

AI developer/Python programmer, IRCC, Altis

Project I: IBM SPSS-to-Python conversion. Implemented an end-to-end Python pipeline with Spark on the AWS cloud to replace the original SPSS stream models. The main tasks included data collection; conversion of SPSS stream nodes (type, filter, select, filler, merge, aggregate, and flag nodes, supernodes, cache, and statistical outputs) to PySpark on the AWS cloud platform; and unit tests and a Spark storage plan for optimization. Also covered techniques for migrating to Google Cloud Platform (GCP) services: BigQuery, Dataflow, Pub/Sub, Bigtable, Data Fusion, Dataproc, Cloud Composer, Cloud SQL, Compute Engine, Cloud Functions, and App Engine; BigQuery ML, AutoML, Vertex AI.

Technology:

Pandas, PySpark (filter, count, columns, drop, withColumnRenamed), Spark, Spark SQL (to_date, col, lit, when, trim, substring, etc.); SPSS (stream, filter node, filler node, type node, merge node, aggregate node, etc.); GCP: BigQuery ML, AutoML, Vertex AI.

Project II: fraud detection for financial applications. Implemented and developed Python tools and a platform to automatically extract handwriting and signatures, verify signatures, and classify handwritten versus printed text in documents and check images, using deep learning models and image preprocessing methods: thresholding, connected component analysis, object detection and segmentation, background noise removal, and template matching; text extraction using keras-ocr, Tesseract, and EasyOCR.

Jan, 2022-July, 2022

Data Scientist/Machine learning engineer, CN rail / SpruceInfotech

Developed and implemented Python scripts for data parsing, data imputation, and data encoding using sklearn and pandas. Developed Python scripts to train and build models and to run tests evaluating the system performance of AI solutions, using sklearn and pandas; developed and implemented Python scripts for AI solutions covering forecast, regression, and classification modeling; developed and implemented Python scripts for log time series analysis;

Developed MLflow workflows for model tracking, training, logging, registration, inference, and Hyperopt parameter sweeps. Univariate/multivariate forecasting and regression.

Analyzed and validated business requirements and reviewed solutions with relevant stakeholders; technical report on MLflow project development and the production solution for the Azure cloud AI solution, anomaly detection, and Sentinel; research report on improving forecast models for rail transportation with Azure DevOps and Databricks.

Technology:

PySpark, pandas, sklearn, MLflow, Azure Databricks, DevOps

Sept 2019- Dec, 2021

Data scientist, Solana networks

Project: Data Management, Labelling and Automation for Machine Learning; Activity Recognition and Hierarchical Labelling; OpenPose and HumanActionClassification platform

Context: image/video datasets were downloaded from publicly available sources or created with annotation tools, and the data items (both image and video content) were parsed and imported into a MongoDB database. A tool supports the creation of training data collections in the database according to different criteria such as ImageNet, COCO, and StanfordDrone. Finally, implemented training set creation, active learning, and experimentation; reported the results of a human pose estimation and activity recognition prototype built on existing open source packages; validation with TIDE and PyBrisque. Deep learning and ML research on the 3-body problem in cosmology.

Technology and tools: Python, PyTorch, TensorFlow (CPUs and GPUs), ImageNet, COCO, StanfordDrone, VoTT, LabelMe annotation, bounding box, segmentation, active learning, OpenPose, YOLOv3, ResNet, HumanActionClassification, TIDE, PyBrisque

Project: Anomaly Detection for In-Vehicle Networks

Context: anomalous traffic (attacks) may appear in the automotive bus system (CAN), which transmits signals between electronic control units (ECUs) as well as wireless interfaces such as GSM and Bluetooth. Attackers try to access the automotive network in order to inject messages, manipulate data, or access confidential information. The task was to develop advanced ML algorithms to detect four attack types (DoS, Fuzzy, Replay, and Impersonation) in real time with a low false positive rate and a high recall rate; implementation and testing of Python scripts/API.

Tools: python, Pandas, scikit-learn, statistical models, Spark/Hadoop, scala/pyspark, Hive, MS sentinel

Project: ML design for threat & intent detection system

Context: intrusion detection systems (IDSs) support network security by reporting network attacks and triggering alerts. The issue is that IDSs have been observed to trigger thousands of alerts per day, with a high false positive rate, which makes it extremely difficult for the analyst to correctly identify the true positives. Supervised ML methods have been successfully applied to network security to reduce the false positives raised by IDSs. The task also aims to detect attack intent. For example, periodic network traffic might indicate potential adversarial behavior from botnets that later launch network attacks, so intent detection can be achieved by developing state-of-the-art periodicity detection techniques. Research on anomaly detection on cloud platforms such as AWS with AWS SageMaker.

Technology: Kubernetes, Elasticsearch, Python, Pandas, scikit-learn, Naïve Bayes, PCA, randomforest, decision tree, clustering; cosine distance, segment distance, FFT, fuzzy naming, multihop ssh, Spark/Hadoop, scala/pyspark, Hive, python design patterns: global object, prebound, sentinel, creational, structural, behavioral patterns; AWS Sagemaker, AWS Glue, AWS Cloudwatch, Random Cut Forest; word and excel.

Project: ML techniques for improvement of ChatBot

Context: investigated cutting-edge ML techniques for sentiment analysis and intent classification to improve the ChatBot. Technique details: entity recognition, stemming, lemmatization, and vectorization for feature normalization and extraction. NLP packages/tools: spaCy, NLTK, scikit-learn, Gensim. Pretrained NLP models: BERT, OpenAI GPT, ELMo, ALBERT.

Jan, 2021- Mar, 2021

AI developer, CRA Canada

Project: AWS architecture for QnABot development and deployment

Context: created and deployed QnABot with AWS Lex, AWS Alexa, Elasticsearch, Kendra, and CloudWatch. Used CloudFormation to provision and manage the stacks and resources specified in template code. Built response bots with intents, slot types, and fulfilment Lambda functions. Used AWS CDK and CloudFormation for infrastructure as code, CodePipeline for continuous delivery, and CodeBuild to create AWS projects. Tasks also included model training and deployment in AWS; deploying anomaly detection models to AWS for the CloudWatch and SageMaker services; AWS EKS service; Azure VM; ML modeling implementation and testing of Python scripts/API; GCP, etc.

Tools: AWS, Boto3, Python, Node.js, Java/JavaScript, AngularJS, PyAthena, S3, SQL-pandas, AWS data lake, CloudWatch, SageMaker, AWS Glue, Snowflake, Random Cut Forest (RCF), RecordIO Protobuf format, SageMaker.deploy; Word and Excel, csv_serializer, json_deserializer, eksctl, kubectl, Apache Flink; Sentinel, Azure VM, workplace, Machine learning VM, GPU Cluster

June 2019- Nov 2019

Project2: Improvement of forecasting models by machine learning algorithm design with R; conducted research and analysis for pricing prediction using forecast models; visualization using a Shiny platform designed for business analysis (3 months)

Client: BigR.io LLC and JM Smucker

Subject: Improvement of forecasting models by machine learning algorithm design with R

Context: the data science team at JM Smucker has put considerable effort into building forecasting models to predict price and unit sales and into a Shiny analysis platform to support sales management. The challenge is how to improve the forecast models when the training sample contains many missing values and much noise. As a result, the forecasting models overestimate price and unit sales and therefore perform poorly in deployment. Model implementation and testing of Python scripts/API.

Tools: python, pandas, scikit-learn, R, Shiny, and R ML package; java, AngularJS, Kubernetes, Elasticsearch.

Tasks and technical details:

Built forecasting models using Linear regression PCR and nonlinear regression models such as randomforest; combine with PCA for machine learning pipeline

Developed a new learning architecture based on Bayesian learning; combine Bayesian learning and PCA;

Developed a new algorithm for improvement of forecasting models by learning Bayesian variables using Bayesian methods;

Research on pricing prediction using forecasting models

Developed methods and experiments to create training sample for complex learning tasks including pricing predictions

Oct 2018- March 2019

Principal Machine Learning scientist/Engineer, BigR.io LLC

Project1: Predictive maintenance machine learning system in Porsche of Volkswagen Group, Stuttgart, Germany (6 months)

Client: BigR.io and Porsche of Volkswagen Group

Subject: Predictive maintenance using machine learning techniques.

Context: in the automobile industry, each vehicle generally goes to the workshop for maintenance every six months. The issue is that some parts might fail before the next six-month service. These unexpected failures can lead to large losses for customers as well as enterprises. In particular, the air spring in Porsche sports cars is vulnerable to bad driving conditions such as cold weather. The industry wants to know whether machine learning techniques can efficiently and effectively predict maintenance issues, so that mechanics can repair flawed parts that might fail before the next maintenance. The same technology could serve military equipment, such as warship maintenance. Model implementation and testing of Python scripts/API.

Tools: python, pandas, scikit-learn

Tasks and technical details:

Develop advanced AI technology for predictive maintenance with Porsche of Volkswagen Group;

Machine learning pipeline: automate data imputation and data encoding; stability selection/recursive feature elimination; creating training data for predictive maintenance using XML parser

Recent research: mixture of Bayes with gaussian and kernel density

Develop mixture of Bayes algorithm, Python, scikit-learn, Bayes Theorem

Mar 2018-Aug 2018

Senior data scientist, FreedomMobile, Toronto

Projects: Rebuild five SAS models for Churn Prediction; design and train Add a Line prediction models, (6 months)

Client: Procom and FreedomMobile

Subject: Rebuild five SAS models for Churn Prediction; design and train new models for Add a Line prediction; in addition, conduct prototype design for forecasting models

Context: the data science team at FreedomMobile had built several SAS models for churn prediction for telecom business analysis; the challenge was that the existing models performed poorly, with Gini scores around or below 0.6. The task was to rebuild the existing SAS models using Python, PySpark, and Zeppelin on Spark with the Hadoop cloud platform, from a training sample generated from the SAS database. Additional tasks were to build new models for Add a Line prediction and sales forecasting analysis.

Tools: Python, PySpark, scikit-learn, MLlib, pandas, Zeppelin, Jupyter; other tools including TensorFlow, Keras, Theano, and scikit-flow for deep learning; Django for web analysis.

Tasks and technical details:

Main research focused on Bayes learning algorithms; published recent work at Data Science and Big Data Analytics (DSBDA) 2018;

Designed and implemented Customer Churn Analysis using machine learning techniques such as LogisticRegression, RidgeClassifier, RBF SVM, Neural Network, TensorDNN, and PCA; machine learning pipeline for data imputation, categorical encoding, and algorithmic learning; various open source ML packages (scikit-learn, TensorFlow, Keras, Theano, Django, R) and SAS, etc.;

Developed effective methods for data imputation and for handling unseen and missing values in categorical encoding;

Designed and implemented Add a Line for telecom business analysis using Python, Pandas, SAS, Zeppelin Notebook, Spark/PySpark/Scala, Kafka, MLlib, Hadoop, Django, MatLab/Simulink, microservices;

Prototype design for forecasting model analysis for item logistics

Apr 2015- Dec 2017

Principal Machine learning data scientist, NPD GROUP, Port Washington, NY

Projects: Walmart Harmony: POS data category hierarchical classification; second proj: Automated Classification

Client: NPD Group

Subject: Walmart Harmony: POS data category hierarchical classification

Context: NPD Group runs a market research business that provides market analysis for retailers and product manufacturers by classifying retailer POS data from the US, Europe, Asia, and worldwide markets into 45 business categories. The company introduced ML techniques for automated classification. Because the categories are hierarchical, hierarchical category classification remains a technical challenge. Previous methods performed poorly and led to high costs from the demand for manual classification. After an initial investment in research and development, NPD Group decided to build a data science team and develop its own ML application for a retailer POS hierarchical category classification system. In Walmart Harmony, NPD Group first built an ML system for hierarchical category classification of Walmart's retailer POS data. The second project, Automated Classification, is the ML application for retailer POS data classification for NPD Group.

Tools: python, pandas, scikit-learn; and other open source such as tensorflow, keras, scikit-flow; NLTK

Tasks and technical details:

Work on machine learning algorithm research on Naive Bayes and randomization propagation;

Build two machine learning classification systems for large-scale POS data automatic classification;

Develop machine learning algorithms for hierarchical category classification, multi-label/multi-task classification;

Deep learning algorithms for text classification on retailer pos data such as Convolution Neural Network (CNN) and Word Embedding;

Published a novel statistical coupling technique: conditional independence coupling of Markov Chains, for sampling graphical networks, instead of traditional MCMC techniques.

Technology:

Lasso multitask; logistic regression; Multinomial/Bernoulli Naive Bayes for text classification/NLP; CNN, word embedding; NLTK and Pandas for NLP; Naive Bayes incremental learning; PCA, clustering techniques for feature extraction; scikit-learn package, cx_Oracle and database applications using Python, Pandas; Hadoop, Spark with PySpark/Scala, Hive; Caffe, TensorFlow, Keras, Django, R, Tableau, Ab Initio;

k-nearest neighbor, decision tree, Bayesian classification, support vector machine, neural networks, genetic algorithms, self-organizing feature maps, etc.

Dec 2014 - Mar 2015

Visiting researcher, Sprott School of Business

Project: big data for supply chain analysis and management, (4 months)

Client: sprott school of business

Subject: big data for supply chain analysis and management

Objective: big data platform for supply chain network analysis

Scope: analyze products and supply chain information on web for Canada product and manufacturers; build big data platform for supply chain management which help research projects in university

Tools: Java/javascript, python, web crawling library

Project periods: starting Dec, 2014 to March, 2015

Exact dates: Dec 1, 2014, to March 10, 2015

Role: visiting researcher

Tasks and technical details:

Big data analytics and supply chain management; web crawling for large supply chain networks; graphic and network analysis; Statistical Markov Chain Monte Carlo; parallel MCMC for web crawling; develop advanced techniques for community detection/security;

Semantic web using RDF, Schema, microdata, ANY23; big data analytics using AWS EC2, S3, SQS;

Technology:

Web crawling methods; statistical Markov Chain Monte Carlo; Python; Weka, R, java/javascript, C++, ruby; AWS EC2, S3, SQS, ANY23, RDF, n-quads;

Informatica BDM/IDL/EIC/IDQ, Cassandra, Hadoop, Neo4j.

Sept 2014 –Nov 2014

Senior Machine Learning Specialist, TellSpec, Toronto

Project: food ingredient detection using machine learning techniques (3 months)

Client: TellSpec

Subject: food ingredient detection using machine learning techniques

Context: TellSpec is a biotechnology company that has developed a handheld device to scan the food surface and obtain spectral data about food ingredients. The data science team at TellSpec applied ML techniques to build models that identify 25 food ingredients such as eggs, sugar, protein, etc. This new technique may help people eat healthily and safely by choosing the right food and avoiding harmful and allergenic ingredients. The ML challenge is how to efficiently and effectively build ML models for multiple targets with an accuracy that meets the strict requirements of practical application; the performance obtained so far was below expectation because training data sampled from practical environments can be subject to variance and noise.

Objectives: design and build robust ML algorithms for food ingredient detection for 25 targets

Scope: prototype design and implementation of ML algorithms, given training samples collected from different bread brands in a real environment.

Tools: Python, pandas, scikit-learn

Project period: Sept 12, 2014 to Nov 30, 2014

Role: senior machine learning specialist

Task and technical details:

Food component analysis: macronutrients (calories, carbohydrates, proteins) and sugars (fructose, maltose, sucrose) using machine learning for chemometrics;

Developed advanced regression/classification machine learning algorithm for precisely predicting/detecting food ingredients;

Linear regression/generalized linear regression; PCA/PLS; Stacked Regression/Stacked PCA/PLS; Lasso L1/L2; feature extraction, using Weka and Python, scikit-learn;

Technology:

Generalized Linear regression, PCA/PLS2/PLS1, Stacked Regression, Lasso L1/L2; Semi-supervised/supervised learning;

Weka, Python, scikit-learn, MatLab: statistics and machine learning toolboxes, neural networks, PCA, algorithms: k-nearest neighbor, decision tree, Bayesian classification, support vector machine, genetic algorithms, self-organizing feature maps

Spark/pyspark/Java/scala; Hive, Hadoop, NoSQL, MongoDB

Jul 2013– Aug 2014

Freelance data analytic consulting with GIRIH, Ottawa

Project: sampling graphical networks by conditional independence coupling of Markov chains (13 months)

Work on Markov Chain Monte Carlo for scalable parallel computation for big data analysis; and MapReduce/Hadoop;

Work on anomaly detection techniques using kernel decomposition and Maximum Likelihood Estimation, using the statistical packages R, Weka, and RapidMiner;

Sentiment analysis using Weka and MatLab

Technology:

Pajek, networkx, Neo4j for social network analysis;

Kernel learning; Maximum Likelihood Estimation; One-class Support Vector Machine;

Markov Chain Monte Carlo, MapReduce/Hadoop;

Java/javascript, Weka, R, Java, C++, SQL/Oracle/Hive, MatLab: events and listener, control system, Fuzzy logic, image processing, neural networks, statistics and machine learning toolboxes, etc

Jun 2012 – Jun 2013

Postdoctoral Researcher, Computer Science of University of Ottawa

Project: sampling social networks using new Markov Chain Monte Carlo techniques (12 months)

Client: Girih company

Contact: Nathalie Japkowicz <********@********.***>

Context: the local social media company Girih launched a research project to develop a new technique for sampling social networks such as Facebook and Twitter; the resulting samples would be used for social network analysis. Previous sampling algorithms such as random walk and Metropolis-Hastings MCMC produce biased samples with much redundancy and duplication.

Objectives: research and develop new network sampling techniques for uniform and unbiased sampling of social networks

Scope: research and develop new network sampling algorithms using MCMC methods, especially coupling techniques, for uniform and unbiased sampling of social networks such as Facebook and other types of social networks.

Role: postdoc researcher

Project period: (12 months) from June, 2012 to June 2013

Task and technical details:

Developed a new algorithm for uniform and unbiased sampling of online social networks to replace traditional methods in applied statistics; the method overcomes drawbacks of the traditional methods such as slow mixing time and biased results.

Supported by NSERC Engage Grant and SME4SME Grant as a sole researcher; developed an innovative and unique algorithm which applies advanced Markov Chain Monte Carlo methods such as coupling techniques for sampling large graph networks;

Extended traditional coupling algorithms such as perfect sampling; conducted experiments by sampling online social networks such as Twitter and small social networks; results show that the algorithm produces unbiased samples extremely efficiently.

Implemented the algorithm using Ruby, the Twitter API, DataMapper, SQLite, MySQL, and PostgreSQL; running environment: Unix/Linux, Amazon EC2; designed a Rails web application with the MVC pattern for demonstration;

Performed social network analysis such as degree distribution, Centrality, Clustering coefficient, community detection. Initial research results have been published in IEEE ICDM International Workshop on Data Mining in Network, 2012.

Obtained a US patent for the initial research result as the original inventor.

Applied to Community detection and social media analysis using Python for NLP such as NLTK, SKLearn; RapidMiner, C++, R; tasks for text classification, sentiment analysis, term extraction.
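
For illustration, the Metropolis-Hastings random walk named among the traditional baselines for this project can be sketched as below. This is not the project's implementation (which used Ruby and the Twitter API) and it does not show the coupling algorithm; the toy graph and the function name mh_random_walk are assumptions made for the example.

```python
import random
from collections import Counter


def mh_random_walk(graph, start, steps, seed=0):
    """Metropolis-Hastings random walk targeting a uniform distribution over nodes.

    graph is an undirected adjacency dict {node: [neighbours]} (hypothetical input).
    """
    rng = random.Random(seed)
    current = start
    samples = []
    for _ in range(steps):
        # Propose a uniformly chosen neighbour of the current node.
        candidate = rng.choice(graph[current])
        # Accept with probability min(1, deg(current) / deg(candidate)),
        # which removes the plain random walk's bias toward high-degree nodes.
        if rng.random() < len(graph[current]) / len(graph[candidate]):
            current = candidate
        # On rejection the walk stays put, so the same node is recorded again.
        samples.append(current)
    return samples


# Toy undirected graph with mixed degrees (illustrative data only).
toy_graph = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0], 4: [0]}
walk = mh_random_walk(toy_graph, start=0, steps=50000, seed=1)
frequencies = {node: count / len(walk) for node, count in Counter(walk).items()}
```

In the long run each node is visited about equally often, but rejected proposals repeat the current node, which is one source of the duplicate samples described in the project context.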

Technology:

Advanced Markov Chain Monte Carlo (MCMC) methods and Coupling technique in applied statistics; Random Walk and Metropolis-Hastings algorithms; various convergence diagnosis methods such as Geweke Diagnostic; uniformly and unbiased sampling; online social networks;

Pajek, networkx, Neo4j, Ruby, Ruby on Rails, MVC pattern, Twitter API, DataMapper, SQLite, MySQL, PostgreSQL;

Unix/Linux, Amazon Elastic Compute Cloud (Amazon EC2), EMR;

Python, NLTK, SKLearn, RapidMiner, C++, R; text classification, sentiment analysis, feature selection, term extraction.

Dec 2010 – Dec 2011

Postdoctoral Researcher, DRDC

Project: simulation of autonomous underwater vehicle, (12 months)

Client: DRDC

Contact: Nguyen, Bao <***.******@****-****.**.**>

Subject: simulation of autonomous underwater vehicle

Context: improve and upgrade tools for simulation of autonomous underwater vehicles. The project added new functionality to a previously developed tool written in VB; the results of the research project help optimize AUV operation for mine detection on the seabed.

Objectives: add new functionality such as different operation patterns of autonomous underwater vehicle and investigate and analyze the performance and accuracy improvement with new operation patterns

Scope: improve and upgrade VB tools for simulation of autonomous underwater vehicle with new operation patterns

Role: visiting postdoc researcher

Tools: VB and MatLab

Project period: (12 months) starting from Dec 2010 to ending at Dec 2011

Level of effort: independent researcher under supervisor Bao Nguyen. In parallel, my other research focused on ML algorithms for anomaly detection using SVM algorithms, and ML for web security and complex network analysis; published a research paper at Canadian AI 2012.

Task and technical details:

Engaged in the development of the software tool for simulation of Autonomous Underwater Vehicles using MatLab and VB; Involved in research on Complex Dynamic Network Analysis; simulation using MatLab;

Utilized various artificial intelligence algorithms such as Genetic Algorithm (GA) in MatLab for simulating and computing the shortest path with the lowest cost in complex networks.

Implemented the Box-Muller algorithm to simulate mine distributions on the seabed as normal distributions; implemented algorithms to dynamically demonstrate the manipulation of Autonomous Underwater Vehicle (AUV).

Performed complex dynamic network analysis using various tools such as SNAP and Pajek; knowledge of the small world effect, degree distribution, degree correlation, centrality, clustering coefficient, community detection.

Developed a new fraud/anomaly detection algorithm using Java/JavaScript (J#/Eclipse) and Weka for machine learning, kernel methods, ensemble learning, and one-class learning; improved the traditional One-Class SVM algorithm implemented in LibSVM in Weka (a Java package for data mining and machine learning algorithms). This research work was published in Canadian Artificial Intelligence (AI) in 2012.

Designed one-class Naive Bayes algorithm for anomaly detection in big data using MapReduce/Hadoop, with Pig/Hive/HBase;

Also used anomaly detection techniques for information security by analyzing web log files and data transfer on WebSphere/WebLogic; EJB, AJAX, SOAP, XML, HTML5, NoSQL.
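
The Box-Muller step listed among the tasks above can be sketched as follows. This is an illustrative Python version, not the original MatLab/VB tool; the function names box_muller and simulate_mines, the mine-field centre, and the spread sigma are all assumptions made for the example.

```python
import math
import random


def box_muller(rng):
    """Return two independent standard normal draws from two uniform draws."""
    u1 = 1.0 - rng.random()  # shift to (0, 1] so that log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def simulate_mines(n, center=(0.0, 0.0), sigma=1.0, seed=0):
    """Place n mines on a 2-D seabed plane, normally distributed around center.

    center and sigma are hypothetical parameters, not values from the project.
    """
    rng = random.Random(seed)
    mines = []
    for _ in range(n):
        z0, z1 = box_muller(rng)
        mines.append((center[0] + sigma * z0, center[1] + sigma * z1))
    return mines


# Example: 1000 mine positions around (100, 50) with a spread of 10 units.
mine_field = simulate_mines(1000, center=(100.0, 50.0), sigma=10.0, seed=42)
```

Each call to box_muller yields two independent normal values, so one call supplies both coordinates of a mine position.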

Technology:

Anomaly detection techniques, one-class learning algorithm, kernel methods, support vector machine algorithm;

Social networks; social network analysis; small world effect; degree distribution, community detection;

Big data; MapReduce/Hadoop, Pig/Hive/HBase, MatLab, VB, social network tools and package: SNAP and Pajek, Java(J#/Eclipse), SVM, Weka; EJB, WebSphere/Weblogic, AJAX, SOAP, XML, HTML5, NoSQL.

Sept 1987-Feb 2001

Instructor, Computer science of Zhengzhou University

Employed in Department of Computer Science, Zhengzhou/HuangHe University;

Teaching courses including RDBMS; C/C++;

Jul 1993-Feb 2001

Software Engineer, INSTITUTE OF COMPUTER RESEARCH AND APPLICATION, ZHENG ZHOU, HENAN, CHINA

Project: 1) Future Transaction Remote platform; 2) Accounting System (84 months)

Context: develop commercial software for financial industry applications

Role: leader, software engineer

Tools: PowerBuilder, VB, C++, Oracle database, MySQL

Project period: starting from July 1993 to ending at Feb 2001

Tasks and technical details:

A part-time position; I was responsible for software development for practical applications;

As a project leader, I was involved in developing the Remote Exchange System for futures trading, mainly using TCP/IP, C/C++, SQL Server, Unix/Linux, 1999 – 2001;

As a sole developer, I developed Accounting System for future trade, mainly using PowerBuilder and SQL server, 1994 - 1998; I designed and implemented and tested
