
Data Scientist & Machine Learning Engineer

Location:
Brooklyn, NY
Posted:
July 09, 2020


Resume:

RAMIRO MATA Data Scientist / Machine Learning Engineer

NEW YORK, NY 11237 347-***-**** ***********@******.*****.*** linkedin.com/in/ramiro-mata

I help organizations improve their products by leveraging data & explainable AI. 3 years of experience in Data Science & ML, with Data Engineering responsibilities acquired over time. 2 years in academic research.

WORK

CHAINALYSIS

Data Scientist & Machine Learning Engineer New York, NY September 2017 - Present

• As Team Lead, designed a smart filter and built a data pipeline expected to accelerate the company’s data flywheel for years to come. The project was a top-3 company goal for Q1 & Q2 2020. Responsible for data science and engineering deployment. The Airflow pipeline extracts data from 6 SQL databases, computes a ‘trust’ metric, and delivers refined data to the main product.

• Optimized XGBoost model to predict [classified] with 92% accuracy. Now used internally.

• Created a Multinomial Regression model in scikit-learn that identifies unknown blockchain services with 84% accuracy.

• Automated KYC & AML analytics report-making workflow. Facilitated new product/revenue (report price $30k each).

• Built end-to-end interactive visualization tool to analyze sensor data.

• Developed Data Ingestion & Data Access infrastructure for Research Platform to deliver clean graph-data from raw data with Python interface to load and visualize data across different features.

• Found anonymization vulnerabilities via data analysis of P2P network data. Resulted in new feature that is key to customers.

• Built web scrapers using Beautiful Soup and Selenium to integrate alternative data.

• Built metric that allows us to prioritize which data is most valuable to our customers.

• Designed, built and maintained Neo4j graph database optimized for entity-to-entity financial transaction queries.

ALIPES CAPITAL

Deep Learning Quant Intern Copenhagen, Denmark August 2017 - August 2018

• Implemented Bayesian DL algorithms from literature and adapted them to financial trading (such as RNNs for NLP).

• Built event-driven strategy time-series Deep Learning models with PyTorch and increased benchmark accuracy by 3%.

• Optimized models’ hyperparameters using Bayesian Optimization & Grid Search.

• Created benchmark experiments to evaluate models' performance across different scenarios (e.g., across Signal-to-Noise ratios and Model Complexity) to answer questions such as: Can the model see through noise? Is the model resistant to overfitting?

• Regularly analyzed benchmark results to understand model behavior and compare new models with those in production.

COLUMBIA UNIVERSITY

Visiting Biogeosciences Researcher New York, NY January 2014 - August 2014

• Evaluated integrity of biogeochemical data of PLIOMAX, a $5,000,000 joint project by Columbia and Harvard University that aims to reduce uncertainty in global sea-level forecasts.

NASA ASTROBIOLOGY INSTITUTE

Undergraduate Research Intern State College, PA June 2009 - August 2009

• Used carbon cycle numerical model and biogeochemical constraints inferred from sedimentary data to understand climate evolution during the oxygenation of the atmosphere.

SKILLS

• Data Analysis & Machine Learning

• Bayesian Deep Learning

• Workflow Automation

• Data Wrangling & Airflow ETL Pipelines

• SQL and Graph-DBs

• Python Stack

EDUCATION

TECHNICAL UNIVERSITY OF DENMARK (DTU): M.S. Applied Mathematics & Computer Science - ML Track

TECHNICAL UNIVERSITY OF HAMBURG (TUHH): M.S. Environmental Engineering

BROWN UNIVERSITY: B.S. Biology, Economics

Nordic Probabilistic AI School: Summer Professional Training

TECH STACK

Python, C, Git, SQL, Jira, Docker, Linux command-line, PyTorch, Pandas, Sklearn, Scipy, Numpy, NetworkX, Airflow, TensorFlow

PROJECTS & PREPRINTS

SemiSupervisedLearning BayesianDeepLearning GraphNeuralNetwork


