Location: New York, NY
Posted: November 09, 2023


Jun Kong

**-** ****** ******, ** ***** • ad0zqf@r.postjobfree.com • 929-***-****

EDUCATION

Columbia University New York, United States

Master of Arts in Statistics, Data Science Track 09/2021 – 02/2023

University of Missouri – Columbia Missouri, United States

Bachelor of Science in Mathematics and Statistics, Dean’s List, GPA 3.8/4.0 09/2015 – 05/2019

WORK EXPERIENCE

Columbia University New York, United States

Associate II in Statistics 09/2023 – Present

Applied clustering techniques in Python to graduate students' solutions to detect potential plagiarism. Consolidated and cross-checked over 15,000 solution pairs, identifying 3 plagiarism cases and upholding academic integrity.

Used Tableau to analyze class performance metrics and evaluate student performance, identifying knowledge gaps and weak areas. Reported findings to the professor for targeted instructional improvements, contributing to a 10% increase in the overall class average score.

Served as a teaching assistant, leading discussion sections and conducting 3 weekly virtual office hours via Zoom; addressed over 100 student queries from GR 5204 Statistical Inference and GR 5221 Time Series throughout the semester, improving student satisfaction.

Ranial Systems Inc. New York, United States

Data Scientist 11/2022 – 10/2023

Integrated IoT sensor data from power plants with external weather forecasts from 20+ websites; designed ETL workflows in R, Python, and SQL to clean messy raw data and engineer features, identifying key variables and improving data integrity.

Visualized over 2 million records to reveal seasonal trends and temporal patterns in energy generation. Built interactive Power BI dashboards showing the impact of weather variables on energy output, reducing data analysis time by 30% and enabling quick identification of energy inefficiencies, which contributed to a 15% reduction in operational costs.

Improved solar generation forecasts for six inverters using LSTM (Keras) and XGBoost models. Tuned XGBoost hyperparameters with cross-validation, achieving an MSE of 1.35 and improving predictive accuracy by 25%.

Simulated a federated learning system using Flower; implemented an LSTM (TensorFlow) trained with the FedAvg and FedSGD methods, improving resource monitoring and allocation, enabling more efficient use of energy resources, and reducing downtime during peak demand periods.

Conducted R&D to implement in Python the Solar Smoothing Ramping algorithm described in recent publications, and streamlined data management processes, leading to a 20% increase in energy storage system utilization and substantial cost savings.

Partnered with a five-member multinational team to develop a real-time, AI-based Intelligent Energy Management System (EMS) using Double DQN (DDQN). Debugged and maintained the system on AWS EC2; the AI decision-maker outperformed traditional random strategies, delivering a 17% increase in grid stability and reliability. Met critical business needs by reducing energy costs by 15%, improving operational efficiency, and enhancing the integration of renewable and traditional energy sources in hybrid systems.

BAKH Architecture New York, United States

Data Analyst Intern 07/2022 – 08/2022

Automated and optimized the preparation of daily contracts, budgets, and schedules using VBA in Excel, reducing contract processing time by 35% and saving an average of 7 hours per week. Implemented Trello to track project milestones, reducing project delays by 20%.

Analyzed market trends in Tableau and identified a 10% increase in demand for certain property types. Provided data-driven insights that increased client confidence, leading to a 12% higher conversion rate in closing deals.

Used R to analyze facade perception surveys and extract customer behavior patterns. Collaborated with the product team to integrate findings into 3D virtual tours and floor plans, contributing to more effective marketing strategies and a 15% increase in user satisfaction in the Chinese market.

Revamped the company's Tableau dashboard, increasing user engagement and data accessibility by 15%. Regularly delivered timely, relevant business intelligence reports that shortened decision-making time. Contributed to securing three long-term partnerships with property developers, with a projected revenue increase of $1.5 million over the following year.

Wuhan Planning & Design Institute Wuhan, China

Data Analyst Intern 11/2019 – 09/2021

Scraped web traffic data from Ctrip using Beautiful Soup in Python. Used pandas to handle missing data, apply regular expressions for data extraction, and engineer new features for analysis, yielding over 200,000 cleaned records and increasing data accuracy by 8%.

Contributed to the "Wuhan City Tourism Index Analysis" research project: collected over 100 relevant domestic and international documents, performed statistical analyses including A/B tests to support the research, and independently completed 30% of the comprehensive report.

Collaborated closely with the business product team to extract and analyze customer behavior and transaction data from MySQL databases, producing 5 monthly Excel reports. Designed interactive Tableau dashboards with drill-down functionality and integrated land structures using ArcGIS and CAD for the "Hankou Historical and Cultural Area Implementation Project Repository."

PROJECT EXPERIENCE

Machine Learning – All Lending Club Loan Eligibility Prediction 05/2023 – 07/2023

Preprocessed and analyzed diverse datasets to identify key factors influencing loan approval. Applied feature selection techniques to optimize the dataset, improving model performance and interpretability.

Designed and implemented a random forest model (scikit-learn) that achieved 87% accuracy and a 0.66 recall score, outperforming logistic regression and neural network models. Tuned hyperparameters and applied ensemble methods to produce a robust, reliable loan approval model.

Presented findings to technical stakeholders using Matplotlib and Seaborn visualizations to interpret the results and inform data-driven decisions.

Machine Learning – Credit Card Fraud Transaction Detection 08/2023 – 10/2023

Conducted comprehensive data analysis on a dataset of over 2 million credit card transactions to identify patterns and anomalies. Applied data cleaning, feature engineering, and normalization techniques to prepare data for model training.

Developed a Generative Adversarial Network (GAN) to address class imbalance by generating realistic synthetic credit card transactions; the resulting model adapts dynamically to new transaction patterns, reducing false positives and improving the fraud detection rate by 25%.

Led a team of 4, including data engineers, security experts, and business analysts, focusing on iterative improvements and model retraining strategies.

SKILLS

Technical Skills: R (ggplot2, tidyverse, caret), Python (NumPy, pandas, Matplotlib, scikit-learn, PyTorch, TensorFlow, PySpark), MySQL, Microsoft Access, and SAS

Visualization Skills: Power BI, Tableau, Photoshop, ArcGIS, and CAD
