Steven Handerson
Old Lyme, CT *6371
****************@*****.***/gmail.com
Summary
Master's-level computer professional with many years of academic and commercial experience.
Interest- and challenge-driven; prefer working with real-world data (NLP, Text Understanding, Bioinformatics /
Genetics, etc.), work that involves learning new things, or improving existing systems.
Can maintain and debug code and reengineer it for improved speed and reliability, but can also invent new, elegant
solutions to posed problems.
Enjoys collaboration and teamwork, but also likes an empowering environment where there are few artificial
barriers to progress.
Experience Categories
Next-Generation Sequencing: Basecalling, High-Speed Signal Processing
Genomics: Basic PCR design, programmatic sequencing assay design, k-mer applications (assay design, read-depth estimation), some SNP calling
Scaling Up: MPI, OpenMP, multithreading, vectorization; NVIDIA CUDA, Intel Phi; Hadoop and Hadoop-like
Classification / Modeling: Naïve Bayes and Log-Linear Models, Decision Trees, some SVM
Data Mining / Text Mining: Taxonomy Mapping, Session Processing, some Fact Extraction
Language Modeling: N-grams, some HMM
Natural Language Processing: Unification and Definite Clause Grammars, Finite-State (including some large-scale work)
Research/Development Interests
● “Big Data” – addressing issues and challenges of scale; the more data the better, for useful statistics
● Speed / efficiency
● Fast processing using either standard hardware (map/reduce) or specialized hardware (NVIDIA, Phi/MIC)
● Algorithms – research, coding, and applying
● Machine Learning – research, coding, applying
Computer Programming Languages
C (15+ years), C++ (6+), Java (7+), Perl (5+), Lisp (5+, invented “Symbol Macros”), Matlab, R,
Prolog
EDUCATION
Carnegie Mellon University, Pittsburgh, PA
M.S., Computational Linguistics, Department of Philosophy
B.S., Mathematics
PROFESSIONAL EXPERIENCE
Pattern Genomics, Branford, CT July 2013 – present
Consultant
● Extending PCR assay design software for sequencing assays, where the PCR product(s) identify
subtypes, using a hash-based method of my own invention to efficiently find useful differences. Working
within a C++ framework initially developed by the founder using the SeqAn libraries and an
integer-programming approach to choosing the final primer pairs.
454 Life Sciences (a Roche company), Branford, CT July 2011 – June 2013
Bioinformatics Scientist
● Invented and productized “In Silico Normalization”, which uses k-mer calculations to sample reads so as
to reduce the depth of high-depth regions. Uses a lock-free hashtable of my own design, and supports
k-mer sizes up to 32 bases (64 bits), which was found to be useful for larger datasets.
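The 2-bits-per-base encoding that lets a 32-base k-mer fit in 64 bits can be sketched as follows (an illustrative sketch only, not the actual C++ tool; the function names are hypothetical):

```python
# Illustrative sketch: pack a DNA k-mer (k <= 32) into a 64-bit integer
# at 2 bits per base -- the capacity noted in the bullet above.
# Names are hypothetical, not from the real product.

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_kmer(kmer: str) -> int:
    """Encode a k-mer (k <= 32) as an integer, 2 bits per base."""
    assert len(kmer) <= 32
    value = 0
    for base in kmer.upper():
        value = (value << 2) | BASE_CODE[base]
    return value

def unpack_kmer(value: int, k: int) -> str:
    """Decode k bases back out of the packed integer."""
    bases = "ACGT"
    out = []
    for _ in range(k):
        out.append(bases[value & 0b11])
        value >>= 2
    return "".join(reversed(out))
```

Packed k-mers like these make compact hashtable keys, which is what allows counting depth over large read sets efficiently.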
● Sped up gsReadProcessor, the main base-calling application, by a factor of 2, by speeding up a central
algorithm (CAFIE) by a factor of 4. The basic meta-technique was full-system profiling with oprofile,
which found three previously unidentified sources of unnecessary runtime.
● Worked on the “image processing” stage of their new (since cancelled) ISFET-based sequencer,
something which stretched the bounds of current single-computer bandwidth. Used OpenMP,
and techniques inspired by CUDA, on traditional Intel hardware. Also used Intel’s profiler to improve
multithreading.
● Helped convert Matlab code for ISFET “image processing” to C, allowing analyses which were
prohibitively expensive before (i.e., not even attempted; at least 1 day real time). The newest version ran in
1.5 hours using 16 hyperthreaded Xeon cores.
● Helped develop the gsReadProcessor basecaller through two iterations, including one which introduced
non-cyclic flow patterns, allowing much-improved read lengths. Used valgrind, electric fence, and
other “power tools” to find and eliminate both known and lurking unknown bugs.
● Developed the gsReadCluster tool, which provides various clustering options for read processing.
In Silico Normalization (see above) was the best outgrowth of this work.
● Maintained and improved genetic mapping and assembly code, including SNP detection.
Lockheed Martin, Bethesda, MD April 2010 – Oct 2010
Senior Information Systems Specialist
● Developed original C code for the SRA (Short Read Archive) genetic database at the National Institutes
of Health. Developed approximate string matching code to identify important regions of submitted DNA
sequences – recognized the need, researched, coded, debugged, and packaged.
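Approximate string matching of the kind described above is typically built on edit distance; a standard dynamic-programming version can be sketched like this (a textbook illustration, not the actual SRA code):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein edit distance via dynamic programming --
    one standard building block for approximate string matching.
    Illustrative only; the real SRA code is not shown here."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[len(b)]
```

Scanning a sequence for windows within a small edit distance of a target motif is one common way such “important regions” are located.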
Xplusone, Norwalk, CT Dec 2008 – Mar 2009
Applications Developer (Contract)
● Assisted in developing a system, in Java, to analyze log files to support customer information and model
building for intelligent ad serving. Custom code (sorted data, hash joins) worked better (faster, simpler)
than an evaluation Hadoop implementation. Basic Java 5/6, JUnit, some Spring.
Verizon/Idearc SuperPages, Waltham, MA Oct 1999 – Sep 2008
Web Programmer IV
● Developed a web service (XML in/out) server that suggests advertising terms by integrating various
information sources, including spidering customer web pages in real time, and other sources of information
in various term databases. Included a multithreaded spider (hits the single advertiser site with multiple
threads).
● Prototyped data-quality systems that involve back-end processing in Perl, with results stored in a
database and manipulated through JSP and JDB in Tomcat.
● Mined user logs for information used in pay-per-click advertising – specifically, matching user
inputs to advertising categories.
● Developed techniques to suggest changes to the taxonomy, programmatically incorporating data from
web and print yellow pages extracts.
● Assisted in various mapping efforts between, for example, Yellow Pages print categories and taxonomy
categories, using different ad hoc techniques (such as using listings in common via two classifications).
● Prototyped a finite state system to allow categories to define explicit rules as to when they apply to user
inputs – similar to what Yahoo now refers to as their “Query planning” system (developed
independently).
● Adapted yellow and white pages web servers to three other countries.
● Designed and implemented the procedures that allow these servers to operate with near 100% uptime,
and allow prepublication verification.
● Massively simplified the daily ad rotation algorithm (uses modulo of the current time in various ways).
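A modulo-of-time rotation like the one just described can be sketched as follows (purely illustrative; the slot length and names are assumptions, not the production algorithm):

```python
def rotated_ad(ads, now_seconds, slot_seconds=3600):
    """Pick one ad per time slot by taking the current slot number
    modulo the number of ads: every ad gets equal rotation with no
    stored state. Slot length and names are illustrative assumptions."""
    slot = now_seconds // slot_seconds
    return ads[slot % len(ads)]
```

The appeal of this style of rotation is that it is stateless and deterministic: any server computes the same answer from the clock alone.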
● Developed a Naïve Bayes / mutual-information technique to improve yellow pages data quality, enabling
the provider (Acxiom) to find a bug in their processing.
● Developed a mutual-information technique to identify advertisers with too many unrelated advertising
categories.
● Assisted with spider / presentation system to filter objectionable content.
● Developed a technique to "map" external heading categories into a restricted set using common listings.
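The common-listings mapping idea can be sketched as a simple best-overlap match (an illustrative toy, not the actual technique; names are hypothetical):

```python
def map_headings(external, internal):
    """external, internal: dicts mapping heading -> set of listing IDs.
    Map each external heading to the internal heading that shares the
    most listings with it -- a toy version of mapping via listings in
    common. Names and structure are illustrative assumptions."""
    mapping = {}
    for ext, ext_listings in external.items():
        best = max(internal, key=lambda h: len(ext_listings & internal[h]))
        mapping[ext] = best
    return mapping
```

The same overlap counts can also score a mapping's quality, which helps when two classifications must be reconciled ad hoc.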
Vorbroker Consulting, Cincinnati, OH May 1997 – Sep 1999
Consultant (Compaq, GE Aircraft Engines)
● Maintained, as part of a team, thousands of engineering workstations.
● Resolved a wide variety of user problems (second-tier help desk analyst).
Software Engineering Institute, Pittsburgh, PA Oct 1995 – May 1997
Visiting Scientist, Risk Program
● Rewrote (incorrect) noun phrase ATN parser as a series of finite state programs.
● Reengineered term clustering module as a sort merge operation, allowing arbitrary data sizes.
PATENT
6,640,228 Method for detecting incorrectly categorized data
[Basically: build a naive Bayes model and look for outliers.]
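The patent summary's idea – train a naive Bayes model on the categorized data, then flag items whose assigned category the model disagrees with – might be sketched like this (a toy illustration; the patented method's details differ):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (category, token-list) pairs. Returns category
    counts, per-category token counts, and the vocabulary."""
    cat_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for cat, tokens in docs:
        cat_counts[cat] += 1
        token_counts[cat].update(tokens)
        vocab.update(tokens)
    return cat_counts, token_counts, vocab

def log_score(cat, tokens, cat_counts, token_counts, vocab):
    """log P(cat) + sum of log P(token | cat), add-one smoothed."""
    score = math.log(cat_counts[cat] / sum(cat_counts.values()))
    denom = sum(token_counts[cat].values()) + len(vocab)
    for t in tokens:
        score += math.log((token_counts[cat][t] + 1) / denom)
    return score

def flag_outliers(docs):
    """Flag docs whose assigned category is not the model's best guess --
    the 'look for outliers' step, heavily simplified."""
    model = train_nb(docs)
    flagged = []
    for cat, tokens in docs:
        best = max(model[0], key=lambda c: log_score(c, tokens, *model))
        if best != cat:
            flagged.append((cat, tokens))
    return flagged
```

In practice one would threshold on the score gap rather than a hard argmax, but the outline is the same: the model learned from the data as a whole exposes individual items that don't fit their label.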