RAMIRO MATA
Data Scientist / Machine Learning Engineer
New York, NY 11237 | 347-***-**** | ***********@******.*****.*** | linkedin.com/in/ramiro-mata
I help organizations improve their products by leveraging data and explainable AI. 3 years in Data Science & ML, with Data Engineering responsibilities acquired over time; 2 years in academic research.
WORK
CHAINALYSIS
Data Scientist & Machine Learning Engineer New York, NY September 2017 - Present
• As Team Lead, designed a smart filter and built the data pipeline behind it to accelerate the company's data flywheel. The project was one of the company's top 3 goals for Q1 & Q2 2020. Responsible for both data science and engineering deployment. The Airflow pipeline extracts data from 6 SQL databases, computes a 'trust' metric, and delivers refined data to our main product.
• Optimized an XGBoost model that predicts [classified] with 92% accuracy; now used internally.
• Built a multinomial regression model in scikit-learn that identifies unknown blockchain services with 84% accuracy.
• Automated the KYC & AML analytics report workflow, enabling a new product and revenue stream (reports priced at $30k each).
• Built an end-to-end interactive visualization tool for analyzing sensor data.
• Developed the data ingestion and data access infrastructure for the Research Platform, turning raw data into clean graph data, with a Python interface for loading and visualizing data across different features.
• Identified anonymization vulnerabilities through analysis of P2P network data, leading to a new feature that is key to customers.
• Built web scrapers using Beautiful Soup and Selenium to integrate alternative data.
• Built a metric for prioritizing the data that is most valuable to our customers.
• Designed, built, and maintained a Neo4j graph database optimized for entity-to-entity financial transaction queries.
ALIPES CAPITAL
Deep Learning Quant Intern Copenhagen, Denmark August 2017 - August 2018
• Implemented Bayesian DL algorithms from the literature (e.g., RNNs for NLP) and adapted them to financial trading
• Built time-series Deep Learning models for event-driven strategies with PyTorch, increasing benchmark accuracy by 3%
• Optimized models’ hyperparameters using Bayesian Optimization & Grid Search
• Created benchmark experiments to evaluate model performance across different scenarios (e.g., across signal-to-noise ratios and model complexity) to answer questions such as: Can the model see through noise? Is the model resistant to overfitting?
• Regularly analyzed benchmark results to understand model behavior and to compare new models with those in production
COLUMBIA UNIVERSITY
Visiting Biogeosciences Researcher New York, NY January 2014 - August 2014
• Evaluated the integrity of biogeochemical data for PLIOMAX, a $5,000,000 joint project of Columbia and Harvard Universities that aims to reduce uncertainty in global sea-level forecasts.
NASA ASTROBIOLOGY INSTITUTE
Undergraduate Research Intern State College, PA June 2009 - August 2009
• Used a numerical carbon-cycle model and biogeochemical constraints inferred from sedimentary data to study climate evolution during the oxygenation of the atmosphere.
SKILLS
• Data Analysis & Machine Learning
• Bayesian Deep Learning
• Workflow Automation
• Data Wrangling & Airflow ETL Pipelines
• SQL and Graph DBs
• Python Stack
EDUCATION
TECHNICAL UNIVERSITY OF DENMARK (DTU): M.S. Applied Mathematics & Computer Science (ML Track)
TECHNICAL UNIVERSITY OF HAMBURG (TUHH): M.S. Environmental Engineering
BROWN UNIVERSITY: B.S. Biology, Economics
Nordic Probabilistic AI School: Summer Professional Training
TECH STACK
Python, C, Git, SQL, Jira, Docker, Linux command line, PyTorch, Pandas, scikit-learn, SciPy, NumPy, NetworkX, Airflow, TensorFlow
PROJECTS & PREPRINTS
• SemiSupervisedLearning
• BayesianDeepLearning
• GraphNeuralNetwork