Data Engineer Scientist

Location:
Jersey City, NJ
Posted:
June 25, 2025

Resume:

CINDY GAO

Jersey City, NJ 929-***-**** *****.*******.***@*****.*** linkedin.com/in/hangao-cindy https://github.com/CindyGao8

EDUCATION

Bachelor of Data Science and Mathematics, New York University, New York, NY May 2025

GPA: 3.6 / 4.00 Visa Status: Green Card Candidate Certification: AWS Certified Data Engineer Associate

SKILLS

● Programming: Python, PyTorch, TensorFlow, Pandas, SQL, Java, C++, NoSQL (MongoDB), MATLAB, Hive, Azure, Scala

● Machine Learning: Supervised & Unsupervised Learning, RNN, Gradient Descent, Regularization (L1/L2), GANs, Statistical Modeling, Data Warehousing, Cloud Computing, Deep Learning, Regression, KNN, BERT, Word Embeddings

● Technical Tools: ETL, AWS, Snowflake, Atlassian (Jira), Azure, Agile, HPC, Kubernetes, Database Management

PROFESSIONAL EXPERIENCE

CBRE Investment Management, Data Scientist Intern, Dallas & New York June 2025 - August 2025

● Automating Excel workflows for the Finance team and applying an LLM (LangChain) model to forecast property valuations

● Built and tested an internal benchmark of deep research models for investment management, based on self-designed prompts

● Developed an AI-powered newsletter pipeline that extracts news from external websites and leverages LLMs to generate summaries, distributed to the team weekly

● Designing AI tools and fine-tuning an LLM to extract key information from lease documents for lease abstraction tasks

Vestwell, Data Engineer Intern, New York May 2024 - August 2024

● Data ETL: Built an ETL pipeline in Python (PySpark) to aggregate 5M+ client statements, integrating with Snowflake and AWS Batch, reducing end-to-end processing time by 30% and enhancing anomaly detection capabilities

● Data Transformation: Identified data redundancy and implemented data quality assurance (unit and integration tests) in Python to transform data and add calculations, saving the investment team 1+ hour of work daily; integrated with CI/CD tools (Jenkins, GitHub, and Docker) to automate deployment and version control

● ML: Developed a Random Forest classifier on 30K data points to identify high-risk transactions, reducing false positives by 20%, with 0.96 R & 4.4% MAPE, resulting in 500K+ in cost savings and reduced review time

● Performed feature engineering, including missing value imputation, scaling, and PCA, and boosted model accuracy by 10%

● Data Integration: Assisted analysts in extracting data with SQL; integrated Snowflake datasets into Sigma for analysis

● Presented insights on fraud risk to senior management in non-technical language, with data visualizations in Sigma

USWOO Realty, Data Scientist Intern, New York February 2022 - April 2024

● Developed an FNN with four hidden layers using TensorFlow to predict rental prices for the summer season, processing 1M+ data points to enhance sales performance in NYC and JC

● Collaborated with brokers to understand demands and built optimized SQL databases with 22 features (rent, location, etc.)

● Trained the model on an 800K+ training set and tuned hyperparameters with Hyperband (Keras), achieving 0.85 R

● Deployed the model with AWS Lambda, integrated with AWS S3, reducing processing time by 30% and enabling model automation

● Optimized the deal sourcing process and boosted annual client sales by $1M+ using Tableau visualizations

Minsheng Securities, Quantitative Analysis Intern, Shanghai, China May 2023 - December 2023

● Deployed an LSTM with PyTorch to forecast stock trends using the past 20 years of data retrieved via API, reaching 0.97 R

● Enhanced accuracy by 5.1% through sentiment analysis using NLTK, with data extracted from Bloomberg News via API

● Communicated complex technical concepts effectively to non-technical stakeholders, improving client satisfaction by 20%

ACADEMIC PROJECT

NYU Project, Stock Price Prediction PyTorch, LSTM, ARIMAX+ANN, Transformer September 2024 - February 2025

● Compared ARIMA+ANN (innovation from paper), LSTM, and Transformer models in PyTorch to predict the stock price of IBM with macroeconomic indicators (S&P 500, etc.), combined with Black-Scholes to improve the model

● Collected data from the Alpha Vantage API, filled in missing values with linear interpolation, and applied differencing for non-stationarity (tested with ADF and KPSS tests)

● Applied MinMaxScaler and sliding window with a 7-day window for short-term prediction, achieving the best model RMSE of 6.69

CBRE Datathon, Forecast Warehouse Completion Rates in Atlanta PyTorch, ResNet, CNN April 2024

● Developed an image classification model with a ResNet architecture, leveraging data augmentation techniques

● Trained the model on a GPU with the Adam optimizer and cross-entropy loss, achieving efficient forward/backward propagation and a significant reduction in training time

● Visualized performance metrics using Matplotlib and tuned hyperparameters with Hyperband, improving accuracy by 12%


