
Data Scientist Event Manager

Location: Westborough, MA
Posted: December 12, 2022


Resume:

Alexander O.

Central MA

Mobile: 775-***-**** · adt0hi@r.postjobfree.com

Summary:

Development of smart contracts for the Ethereum blockchain utilizing Truffle and Solidity. Parsing the Ethereum chain with Go to collect recent ERC20 and ERC721 contract addresses for analysis.

Building deep learning models for image generation, processing, and classification; semantic search engines; and natural language processing (NLP) systems for text summarization, document cross-referencing, document enrichment, and text simplification. Working with NLP-oriented libraries such as Gensim and NLTK for language pre-processing. Applying word and document embeddings using NLP models such as Word2Vec, Doc2Vec, BERT, and RoBERTa for sentiment analysis and assessment of document similarity, together with the Keras deep learning library. Developing an information retrieval system based on NLP/machine learning technology to enhance search beyond primitive keyword search. Utilizing Jupyter notebooks for model development.
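
As an illustration of the embedding-based semantic search described above, here is a minimal sketch using the sentence-transformers library (an assumed tool; the resume names BERT/RoBERTa models but no specific package). The model name, documents, and query are illustrative placeholders:

```python
# Minimal semantic-search sketch; all inputs are illustrative placeholders.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Quarterly revenue grew on strong cloud adoption.",
    "The patient presented with acute abdominal pain.",
    "New ERC20 token contracts were deployed to mainnet.",
]
query = "Which document discusses financial performance?"

model = SentenceTransformer("all-MiniLM-L6-v2")            # pretrained BERT-style encoder
doc_vecs = model.encode(documents, convert_to_tensor=True)
query_vec = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity between query and document embeddings.
scores = util.cos_sim(query_vec, doc_vecs)[0]
best = int(scores.argmax())
print(documents[best], float(scores[best]))
```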

Apache Spark with Scala and RStudio (R), Keras API with TensorFlow, PyTorch and Python, statistical ML, development of analytics within the Big Data and Machine Learning domains. Massively parallel data analysis on an NVIDIA GPU AI appliance with 12,500+ CUDA cores; predictive model development and validation.

Data Engineering, Data Lakes, Big Data analytics, EC2 cluster computing, Spark Streaming, Kinesis, and AWS S3/HDFS/Spark data lake architecture and implementation. Set up AWS VPC clusters, internet gateways, and security groups; utilized AWS Glue (crawlers, metadata catalog) for ETL processes. Amazon Redshift, Redshift Spectrum.
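
A minimal sketch of the Glue-driven cataloging step referenced above, using boto3; the crawler name, database name, and region are hypothetical placeholders, not values from the actual engagements:

```python
# Hedged sketch: trigger a Glue crawler and list the catalog tables it maintains.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

# Kick off a crawler that scans S3 and updates the Glue Data Catalog.
glue.start_crawler(Name="data-lake-raw-crawler")

# Later, enumerate the registered tables (e.g., to drive downstream ETL jobs).
for table in glue.get_tables(DatabaseName="data_lake")["TableList"]:
    print(table["Name"], table["StorageDescriptor"]["Location"])
```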

Development of analytical software in the defense, financial, and climate domains. Experience in equities, fixed income, time series analysis, market tick data analysis, and real-time analytics. Terabyte-scale data analysis. Worked with digital signal processing and image processing technologies.

Experience:

Big Deeper Advisors, Inc.

March 2016-present

Chief Data Scientist/Architect/CTO, Greater Boston, MA

Fine-grained text analysis, extractive summarization, and automated cross-referencing across multiple documents. Have built a generic semantic search engine based on large language models. Fine-tuned pretrained language models to improve semantic search for specific industry verticals. Worked with BERT and RoBERTa language models. Developed cutting-edge techniques for classification of imbalanced datasets.

Contract senior data scientist at the Center of Excellence (COE) in Windsor, CT for a wealth/pension fund manager. Utilized the Hadoop tools Hue, Hive, and Impala and Cloudera Data Science Workbench (CDSW) to solve business problems. Developed a PySpark application to enable daily large-scale data migration from Oracle and DB2 data warehouses into HDFS. Researched modern unsupervised-learning ANN algorithms such as XLNet, BERT, and document-to-vector for classification and search functionality (NLP/NLU).
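
A minimal sketch of the kind of PySpark JDBC extract-and-land step such a migration job performs; the connection string, credentials, table, and HDFS path are hypothetical placeholders, and an Oracle JDBC driver is assumed to be on the Spark classpath:

```python
# Illustrative daily Oracle-to-HDFS migration step with PySpark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle_to_hdfs").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")   # placeholder
    .option("dbtable", "FINANCE.TRANSACTIONS")                  # placeholder
    .option("user", "etl_user")
    .option("password", "********")
    .option("fetchsize", "10000")
    .load()
)

# Land the extract as partitioned Parquet on HDFS for downstream Hive/Impala use.
df.write.mode("overwrite").partitionBy("load_date").parquet(
    "hdfs:///data/raw/finance/transactions"
)
```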

Prototyped a semantic search engine to process customer e-mails for further classification.

Provided business analysis for data lake design and requirements from the analytics/data science point of view, based on AWS S3 and AWS Glue. Collaborated with team members (customization of ETL scripts, enrichment of the Glue metadata catalog, etc.) to load the customer's data warehouse and to support Hadoop/Spark-based AWS EMR analytics (financial data, weather data, cyber-data cleanup, predictive analytics). AWS Redshift warehouse, Redshift Spectrum, Kinesis data streams, Firehose, SQL.

Applying deep learning/AI technology to solve business problems in image classification, facial image recognition, facial-spoofing counter-measures, and generation of synthetic visual data using GANs for model training. Keras and PyTorch with Python.

Have worked with data from the financial, cyber-security, and document-retrieval (NLP) domains.

Facial image recognition: anti-spoofing deep learning networks to recognize attempts to present facial images on tablets or phones, as well as physical masks. Python/Keras.

Brain lesion image classification: binary classification of two types of lesions requiring different management approaches. Python/Keras, ResNet50, ResNet101.
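
A minimal sketch of a transfer-learning setup for such a binary classifier, assuming a pretrained ResNet50 backbone in Keras; the input size, head layers, and dataset variables are illustrative assumptions, not details from the actual project:

```python
# Binary lesion classifier on a frozen ResNet50 backbone (illustrative only).
from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models

base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                input_shape=(224, 224, 3))
base.trainable = False                      # train only the new classification head first

model = models.Sequential([
    base,
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # two lesion types -> single sigmoid output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # hypothetical tf.data datasets
```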

Working with financial data, price data, compliance data, and cyber data: Apache Spark/Hadoop HDFS, Scala, Python, RStudio (R), the dplyr R API with Apache Spark, Amazon Web Services (AWS), S3, EC2 cluster computing. Statistical machine learning (ML), time series analysis, Fourier transform analysis, analysis of cyber-security big data. Nvidia GPU cluster, deep learning (DL), convolutional neural networks (CNN), residual networks (ResNet), PyTorch, OpenCV, Keras API for TensorFlow, SVM, multiple linear regression, predictive model development and validation, Word2Vec (NLP) and Doc2Vec (NLP) models using the Gensim Python module, LSTM, GRU.

Built a semantic-search database for the NIH database of biomedical abstracts. Capabilities include semantic search based on question-meaning extraction and meaning-similarity metrics.

Using the NIH corpus of 27.6 million documents to train the LDA (NLP) and Doc2Vec models. Document similarity ranking based on Doc2Vec Euclidean distance measures. Utilized deep learning models based on Keras for text classification, clustering, and sentiment analysis. Tuned hyperparameters for the Doc2Vec and Word2Vec embedding models.
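
A minimal Gensim Doc2Vec sketch in the spirit of the pipeline described above; the tiny corpus and hyperparameters are illustrative, and the ranking shown uses Gensim's built-in cosine-based most_similar rather than the Euclidean measure mentioned above:

```python
# Train a small Doc2Vec model and rank stored abstracts against a query (illustrative).
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess

abstracts = [
    "Tumor suppressor genes regulate cell cycle progression.",
    "Statistical methods for gene expression analysis.",
    "Deep learning improves protein structure prediction.",
]
corpus = [TaggedDocument(simple_preprocess(text), [i])
          for i, text in enumerate(abstracts)]

model = Doc2Vec(vector_size=100, window=5, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Infer a vector for an unseen query and retrieve the closest training documents.
query_vec = model.infer_vector(simple_preprocess("regulation of the cell cycle by genes"))
print(model.dv.most_similar([query_vec], topn=2))
```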

Development of smart contracts (ERC20 digital tokens/coins, a.k.a. cryptocurrency; ICO sale contracts; ERC721 NFTs) for the Ethereum distributed ledger with Solidity. Extensive experience writing unit tests using the Truffle/web3.js framework for the Ethereum blockchain, running two full Geth client nodes in-house. Developing Go applications to interact with the Ethereum Geth client. Working with node.js. Have advised multiple companies on their ICOs. Thorough understanding of public/private key systems. Multiple smart contracts deployed live to Mainnet.
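
As a compact illustration of querying a deployed ERC20 contract through a local Geth node, here is a hedged sketch using web3.py (a Python equivalent chosen for consistency with the other examples; the projects above used web3.js and Go). The addresses and RPC endpoint are placeholders:

```python
# Query an ERC20 token balance via a local Geth JSON-RPC endpoint (web3.py v6 naming).
from web3 import Web3

w3 = Web3(Web3.HTTPProvider("http://127.0.0.1:8545"))   # local Geth node

# Minimal ERC20 ABI fragment: only the balanceOf view function.
ERC20_ABI = [{
    "constant": True,
    "inputs": [{"name": "owner", "type": "address"}],
    "name": "balanceOf",
    "outputs": [{"name": "", "type": "uint256"}],
    "stateMutability": "view",
    "type": "function",
}]

token_address = Web3.to_checksum_address("0x" + "11" * 20)   # placeholder token address
holder = Web3.to_checksum_address("0x" + "22" * 20)          # placeholder holder address

token = w3.eth.contract(address=token_address, abi=ERC20_ABI)
print(token.functions.balanceOf(holder).call())
```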

Provided healthcare-related consulting services and worked on a very fast OMOP Cohort Builder for a Japanese oncology firm based in Cambridge, MA. R, Spark, Scala.

Providing custom development of analytics for the financial, defense, and security domains. Working with embedded GPUs; maintaining C/C++ APIs for DSP/image-processing libraries.

Stealth mode AI startup.

August 2020-October 2020

CTO

Worked on, and led others in developing, NLP systems for business intelligence. Fine-grained text analysis, extractive summarization, and automated cross-referencing across multiple documents. Built a generic semantic search engine based on large language models. Fine-tuned pretrained language models to improve semantic search for specific industry verticals. Worked with BERT and RoBERTa language models. Developed cutting-edge techniques for classification of imbalanced datasets. Researched speech transcription, including spectrogram-based data augmentation and language-model-based error correction.

RAYTHEON CYBER PRODUCTS, LLC

December 2014-March 2016

Principal Data Scientist/Quantitative Analyst (Cyber Security), Billerica, MA

Apache Spark, Scala, Python, limited RStudio (R), Amazon Web Services (AWS), EC2 cluster computing. Statistical ML, time series analysis, Fourier transform analysis, distributed force-directed graph visualization, analysis of cyber-security big data. Nvidia GPU cluster, deep learning, neural networks, DIGITS, Theano, SVM, multiple linear regression.

SINECTO, INC.

CTO/Co-founder, August 2005 – December 2014

Westborough, MA

Development of software products and associated consulting services in the financial and defense industries: price feeds, asset valuation, etc. C/C++ quantitative data analysis. POSIX threads, distributed processing, IPC, digital signal/image processing, asynchronous programming model.

Large Agro-business / End Client (2 months)

Data Scientist/Analyst

Big data climate analysis. Used Python with Hadoop under Ubuntu on Amazon EC2, Amazon S3, and EMR (MapReduce data modeling) to parse, clean up, and analyze NOAA weather data. AWS command-line interface (CLI). Analyzed the behavior of the agricultural maize growth model CERES under volatile temperature conditions. Developed a report on different approaches for predicting corn growth stages.
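
A hedged sketch in the spirit of that EMR job: a Hadoop Streaming mapper/reducer pair in Python that extracts per-station mean maximum temperatures. The CSV column layout shown is an assumption for illustration, not the actual NOAA schema used on the project:

```python
# Hadoop Streaming style mapper/reducer for NOAA-like weather records (illustrative).
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 4 or fields[0] == "STATION":
            continue                              # skip header and malformed rows
        station, date, element, value = fields[:4]
        if element == "TMAX":                     # daily maximum temperature records
            print(f"{station}\t{value}")

def reducer():
    # Hadoop delivers mapper output sorted by key, so stations arrive grouped.
    current, total, count = None, 0.0, 0
    for line in sys.stdin:
        station, value = line.rstrip("\n").split("\t")
        if current is not None and station != current:
            print(f"{current}\t{total / count:.1f}")
            total, count = 0.0, 0
        current = station
        total += float(value)
        count += 1
    if current is not None:
        print(f"{current}\t{total / count:.1f}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```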

CLOUDMARKED.COM

Software Architect/Data Scientist

An online visual knowledge-base repository service integrating web page snapshots, videos, images, MP3s, and document files. Integrated with social network services for content filtering and curation.

Semantic search based on user natural language questions, semantic search for similar documents, semantic search based document cross-referencing, summarizing.

The portal is a SaaS application intended to help users find similar, relevant documents, store them, and refine the content based on other users' feedback. It allows end users to add ad hoc links to web documents to connect related content. JavaScript was used extensively over two years to implement DOM manipulation.

Developed NLP analytics in Python to assess similarity of English medical text documents, web pages, etc...

Used Apache Spark to analyze link data to gain an insight into how different pages are related.

Defined specifications, managed off-shore software developers for the project.

Gruber Gray, LLC (A Private Hedge Fund)/Client

Data Scientist/Developer (2 year project)

Conducted price data research on a multi-terabyte tick database to detect insider trading based on aberrant price movements, protecting the company from taking large, market-neutral but risky positions in equity groups. Utilized C/C++ and APL for prototyping data analysis algorithms. Researched Big Data financial-analysis algorithms and implemented them in software.

Developed code to process and analyze tick-by-tick data for U.S. equities. Provided analytics for real-time assessment of risk.

Profile Group, LLC./Client

Subcontractor to Fixed Income Hedge Fund Convexity Capital Management (a Harvard Management Company offshoot run by the former HMC president)

Responsible for data and algorithm validity (model validation) for CMS Spread Options, CMS Caps/Floors, CMS Swaps, BMA Caps/Floors, BMA Swaps. Implementation of new valuation models. Defined test methodologies, interfaces, data structure mappings, etc… Dealt with business issues like cash flows, business calendars, currencies, etc… Collaborated with Convexity Personnel to establish testing protocols, optimized code for speed.

BusinessEdgeSolutions, Inc./Client

Subcontractor to State Street Global Advisors (SSgA internal hedge fund)

Contributed solutions during weekly Monday status meetings.

Provided consulting services with respect to mandatory (with/without options) and voluntary corporate actions. Analyzed what procedures need to take place to reconcile the fund manager's portfolio view with the custodian's portfolio. Examined examples of Bloomberg's corporate action files to effect legacy code modifications.

Created programs to automatically populate Front Arena database with CFD (contract for difference) instruments for all equities traded on British, Irish exchanges using SSgA pub-sub interfaces to enable fund managers in US and UK to trade, track and price CFDs.

Created programs to automatically populate and keep updated SunGard database with all international calendars.

Created programs to define and load issuers, issues, foreign exchanges, end of day pricing, corporate actions, issue classifications for ICB, GICS, Lehman(Bonds), SSgA classification hierarchies in XML format for loading them into SunGard FrontArena product.

Modified SSgA pre-existing pub-sub Java code to utilize RKS issue key instead of Reuters Instrument Code (RIC) for the SSgA internal hedge fund.

Developed Sybase stored procedures to support pubsub code.

Cascada, Inc. /Client

Subcontractor to Boston Stock Exchange (BSE) and Boston Option Exchange (BOX)

Participated in analysis meetings with Boston Exchange personnel, provided solutions for various trade-through surveillance problems.

Analyzed Cascada data and code to determine validity of BBO (Best Bid/Offer) and NBBO (All exchanges BBO combined) computations. Identified erroneous NBBO computations and collaborated with BOX and Cascada personnel to correct the model.

Provided code and time series data analysis and advice on processing trade-through transactions for the purpose of compliance monitoring of real-time OPRA and other feeds. Helped debug time-related discrepancies between BOX code and Cascada code.

Participated in meetings with BSE/BOX personnel, provided analysis of BOX/BSE requirements. Environment: Linux/C/Python/Streaming Software.

Various Internal and External Concurrent Projects

· Provided software consulting services for a small hedge fund's proprietary trading system. The automated system trades by establishing hedge positions within different industrial sectors. Developed a real-time event manager; the trading modules are able to register certain events (socket descriptors ready to be read or written, timers going off, exceptions occurring, etc.) with the event manager so that various callbacks are invoked when events occur. Defined standards for other developers for creating robust, fast, deadlock-free code. Developed various C++ classes for resource locking by various system threads. Participated in code reviews. Debugged different parts of the system. Optimized multiple C++ modules to improve execution speed. Optimized critical sections of code to achieve real-time performance. Optimized communication interfaces to effectively interleave feeds from different sources. Utilized shared memory APIs to enhance performance. Provided analysis on how best to break up some analytical code into different parts to obtain optimal execution time. Utilized POSIX thread libraries under Red Hat Linux and Unix (SunOS) to implement various components. Developed relevant parallel processing algorithms.

Developed a container for a client, augmenting the C++ STL library, based on trie data structures. This container implements an associative container keyed on an STL string object, allowing insertion and retrieval/search of data within strlen(key) time complexity. The maximum depth of the trie is the maximum key size. The structure is self-sorting, i.e. new insertions occur exactly in the right place. The project was done under Red Hat Linux using g++, with some shell scripting for building the software.

Developed a highly efficient program to parse a customer web log containing duplicate customer IDs and visited web pages, in order to analyze statistical occurrences of three-page sequences that a customer visited. Utilized STL map and STL vector containers to implement this solution.

· Developed a product called Global. Global is a set of parallel-processing middleware libraries enabling multiple heterogeneous UNIX workstations to be used as a parallel computer. Defined specifications for this product, developed it, debugged it, and maintained it until the IP rights were sold. Global also enabled multi-threaded applications running on remote nodes to seamlessly invoke threads on the user workstation to service application input/output requirements. Enabled user workstation resources to be accessed simultaneously by a plurality of remote machines under the protection of mutual exclusion locks (mutexes). Ported this software to IBM Unix (AIX), Sun Microsystems Unix (SunOS), and Red Hat Linux. Utilized POSIX thread libraries for UNIX client stations.

· Contracted by IBM Microelectronics to develop five different optimized analytical/mathematical software libraries for the PowerPC chip. Packages included elementary mathematical functions, linear algebra, image processing, etc. The resulting code performance for elementary functions exceeded that of IBM's and Motorola's own software. IBM Microelectronics purchased some 250 licenses for the complete software. Utilized Windows NT and Linux for this project.

· Developed a parallel heuristic network optimization tool on a farm of processors. This application was developed in C++ for optimizing various networks, such as data networks, gas pipelines, and electric distribution grids. The goal of the software was to find a good solution where finding the perfect solution was theoretically and practically impossible. Developed this project in C++ utilizing third-party middleware (EXPRESS) as the programming base. The program operates by testing how the overall cost of the system changes if one or two links in the grid are removed; different scenarios are explored in parallel and the more optimal one is chosen. If the cost is reduced, the solution is kept and the program proceeds to experiment further. Either a Windows front-end host computer or a UNIX workstation with an X Window server is utilized for user display services.

· Wrote a parallelized option valuation application for computing values of American stock options. Applied a set of 8 processors to the problem of computing the expected value of an American option. Utilized a Cox-Ross-Rubinstein binomial tree approach, whose branches were distributed to different processors and then recombined at the end to obtain the final value (a minimal single-process sketch of the binomial recursion appears after this project list). The program runs on an embedded system connected to either a Windows or a Unix workstation (AIX, SunOS, Linux).

· Developed an analytical vector software library for Texas Instruments processors, including new algorithms specifically tailored to the Texas Instruments processors by utilizing Chebyshev expansions for elementary functions. This package contains over 300 different analytical functions for operating on time series data. The package has been extensively utilized by various military agencies and contractors, such as the Naval Undersea Warfare Center, Bettis Labs, IBM, AT&T, and Rafael, as well as many foreign companies in Japan and Europe. Developed a 300-page comprehensive user manual for this software. Either a Windows or a Unix client station was utilized as a front end to an embedded system running this software.

· Developed an object-oriented (OO) C++ interface to the above analytics processing software, enabling customers migrating from C to C++ to still obtain the high performance offered by this software.

· Developed an Image Processing software package for Texas Instruments processors. Wrote close to 500 different functions to analyze photographic images for quality control systems, missile defense systems, and other security applications. Wrote critical sections of the software in native assembly to obtain maximum performance. Developed a 600-page user manual for this software package. Among other things, the software performs two-dimensional fast Fourier transforms, edge detection, image sharpening, filtering, and other types of analysis. The software is very flexible: it is designed to be re-entrant, interruptible, relocatable, and very fast, and it allows an application programmer to operate on image subsets.

· Created an object-oriented (OO) interface to the image processing software in C++. Developed various C++ classes to create images and to perform various operations from the above software library, allowing our customers to migrate to C++ while still utilizing our robust and fast software. The class libraries, running on Unix or Windows, enable seamless use of the embedded accelerator hardware. Wrote scripts to set up the system and to pre-load the embedded code.

· Developed a code-generation Linux tool, based on the gcc compiler, to automatically convert conventional Windows applications into client/server Linux/Unix services. The tool creates over 3 million lines of code describing a communication interface between a Windows workstation and a Linux/AIX/Solaris application server. Required understanding of the gcc-produced C language parse tree.

· Developed underlying client/server communication protocols based on TCP/IP or UDP to connect Windows user workstations with UNIX application servers. Utilized UNIX signals to simulate communication failures in order to test communication interface recovery procedures. Created a client-server interface for remote invocation of threads on the user workstation, enabling remotely run service applications to access local resources. Implemented locking of local operating system resources using mutexes. Implemented an RPC-like protocol to enable remoting of OS-to-user front-end callbacks. Utilized MS Visual C/C++ Studio to develop, debug, and maintain the code. Used the MS debugger and the gdb debugger under Linux/UNIX simultaneously to debug communication interfaces and correct problems with misaligned message data.

· Provided numerous consulting services in Unix/Windows networking and inter-process communication (IPC). Provided consulting and code development related to deadlock-free resource sharing by multiple threads or processes. These services were provided for clients using different flavors of UNIX (AIX, SunOS) and Linux.

· Designed a proprietary secure protocol to authenticate and log on from a user workstation to an application server. This protocol eliminates the man-in-the-middle security flaw and removes the need for a third-party CA (certificate authority). Managed a contractor to implement the protocol.

· Managed several projects to develop optimized analytical linear algebra packages for PowerPC, Pentium, and Texas Instruments processors. Developed scripts and makefiles to build these projects.

· Optimized code for several linear algebra analytical packages, C_LinPack, C_EisPack, C_Blas123. Adjusted code to compile more efficiently for particular processor architectures.

· Developed C++ Object Oriented (OO) Interfaces to the analytical packages. This code permits a C++ application programmer to operate on matrices naturally with regular mathematical operators. It also controls storage formats for different kinds of matrices. Ported the code to IBM Unix (AIX), Sun Microsystems Unix (SunOS), FreeBSD Unix.
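
Referring back to the American-option project above, here is a hedged, single-process sketch of the Cox-Ross-Rubinstein binomial recursion (the original application distributed subtrees across eight processors; the parameters below are arbitrary illustrative values):

```python
# Cox-Ross-Rubinstein binomial pricing of an American put (illustrative sketch).
import math

def american_put_crr(S0, K, r, sigma, T, steps):
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))        # up factor per step
    d = 1.0 / u                                # down factor per step
    p = (math.exp(r * dt) - d) / (u - d)       # risk-neutral up probability
    disc = math.exp(-r * dt)

    # Payoffs at maturity for every terminal node of the recombining tree.
    values = [max(K - S0 * (u ** j) * (d ** (steps - j)), 0.0)
              for j in range(steps + 1)]

    # Roll back through the tree, allowing early exercise at every node.
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            continuation = disc * (p * values[j + 1] + (1 - p) * values[j])
            exercise = K - S0 * (u ** j) * (d ** (i - j))
            values[j] = max(continuation, exercise)
    return values[0]

print(american_put_crr(S0=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0, steps=500))
```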

Schonfeld Securities LLC./Wavefront Capital LLC.

Project Lead/Senior Quantitative Software Developer, New York, NY, August 2004 – May 2005

Implemented over 550 new analytical algorithms for statistical equity trading application in C/C++/STL under Solaris OS and Fedora Linux.

Optimized a statistical application to handle large market data sets and to run within a reasonable amount of time.

Implemented a "genetic" money-pool allocation across different trading strategies to improve ROI: under-performing strategies had their capital reduced, while successful ones were given more capital.
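
A hedged sketch of the reallocation idea: capital shifts toward strategies with better recent returns. The weighting rule and numbers below are purely illustrative assumptions, not the firm's actual allocation algorithm:

```python
# Performance-weighted capital reallocation across trading strategies (illustrative).
def reallocate(capital, recent_returns, floor=0.05):
    """capital: strategy -> dollars; recent_returns: strategy -> recent return."""
    total = sum(capital.values())
    # Score each strategy by 1 + return, with a floor so losers keep a small stake.
    scores = {s: max(1.0 + recent_returns[s], floor) for s in capital}
    norm = sum(scores.values())
    return {s: total * scores[s] / norm for s in capital}

book = {"stat_arb": 4_000_000, "pairs": 3_000_000, "momentum": 3_000_000}
returns = {"stat_arb": 0.04, "pairs": -0.02, "momentum": 0.10}
print(reallocate(book, returns))
```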

Defined specifications for a new hardware system to implement a new real-time analytical compute engine, specified interactions with other firm applications and investigated storage requirements for the new system.

Implemented a communication algorithm in C/C++ under Sun Solaris to fail over from one real-time market feed to another and to switch back if the primary market data feed recovers; the algorithm also notifies key personnel when a failure or recovery occurs.

Adapted/ported C++ analytical application code from 32-bit to 64-bit addressing mode, removing all 32-bit code dependencies.

Employed statistical machine learning on large historical tick datasets for discovery of trading strategies (model development).

Implemented a collection of buy/sell signals for statistical black box trading system.

Ported C/C++ code from Solaris to Linux.

Investigated and defined parallel-processing hardware specifications for a new analytical back-testing system (an SMP quad-Opteron/Fedora Linux based system with 1.6 terabytes of tick data). Developed C/C++ analytical software to provide real-time analysis of market data to traders and to firm and third-party black boxes.

Education:

Columbia University, Bachelor of Arts, Mathematics, New York, NY (2+ years of graduate-level coursework)

Took about two years' worth of graduate-level (M.Phil. and Ph.D.) courses in mathematics and physics beyond what was required for the BA; substantial coursework in computer science (IBM mainframe assembly programming, data structures, etc.).


