Steven Handerson
Old Lyme, CT *6371
****************@*****.***/gmail.com
Summary
Master's-level computer professional with many years of academic and commercial experience.
Interest- and challenge-driven; prefer working with real-world data (NLP, Text Understanding, Bioinformatics /
Genetics, etc.), work that involves learning new things, or improving existing systems.
Can maintain and debug code and reengineer it for improved speed and reliability, but can also invent new, elegant
solutions to posed problems.
Enjoys collaboration and teamwork, but also likes an empowering environment where there are few artificial
barriers to progress.
Experience Categories
Next-Generation Sequencing: Basecalling, High-Speed Signal Processing
Genomics: Basic PCR design, programmatic sequencing assay design, k-mer applications (assay design, read-depth estimation), some SNP calling
Scaling Up: MPI, OpenMP, multithreading, vectorization; NVIDIA CUDA, Intel Phi; Hadoop and Hadoop-like
Classification / Modeling: Naïve Bayes and Log-Linear Models, Decision Trees, some SVM
Data Mining / Text Mining: Taxonomy Mapping, Session Processing, some Fact Extraction
Language Modeling: N-grams, some HMM
Natural Language Processing: Unification and Definite Clause Grammars, Finite-State (including some large-scale work)
Research/Development Interests
● “Big Data” – addressing issues and challenges of scale; the more data the better, for useful statistics
● Speed / efficiency
● Fast processing using either standard hardware (map/reduce) or specialized hardware (NVIDIA, Phi/MIC)
● Algorithms – research, coding, and applying
● Machine Learning – research, coding, applying
Computer Programming Languages
C (15+ years), C++ (6+), Java (7+), Perl (5+), Lisp (5+, invented “Symbol Macros”), Matlab, R,
Prolog
EDUCATION
Carnegie Mellon University, Pittsburgh, PA
M.S., Computational Linguistics, Department of Philosophy
B.S., Mathematics
PROFESSIONAL EXPERIENCE
Pattern Genomics, Branford, CT July 2013 – present
Consultant
● Extending PCR assay design software for sequencing assays, where the PCR product(s) identify
subtypes, using a hash-based method of my own invention to efficiently find useful differences. Working
within a C++ framework initially developed by the founder using the SeqAn libraries and an
integer-programming approach to choosing the final primer pairs.
454 Life Sciences (a Roche company), Branford, CT July 2011 – June 2013
Bioinformatics Scientist
● Invented and productized “In Silico Normalization”, which uses k-mer calculations to sample reads so as
to reduce the depth of high-depth regions. Uses a lock-free hashtable of my own design, and supports
k-mer sizes up to 32 bases (64 bits), which was found to be useful for larger datasets.
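The 2-bits-per-base encoding that lets a 32-base k-mer fit in 64 bits can be sketched as follows (an illustrative sketch only, not the actual C++ tool; the function names are hypothetical):

```python
# Illustrative sketch: pack a DNA k-mer (k <= 32) into a 64-bit integer
# at 2 bits per base -- the capacity noted in the bullet above.
# Names are hypothetical, not from the real product.

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

def pack_kmer(kmer: str) -> int:
    """Encode a k-mer (k <= 32) as an integer, 2 bits per base."""
    assert len(kmer) <= 32
    value = 0
    for base in kmer.upper():
        value = (value << 2) | BASE_CODE[base]
    return value

def unpack_kmer(value: int, k: int) -> str:
    """Decode k bases back out of the packed integer."""
    bases = "ACGT"
    out = []
    for _ in range(k):
        out.append(bases[value & 0b11])
        value >>= 2
    return "".join(reversed(out))
```

Packed k-mers like these make compact hashtable keys, which is what allows counting depth over large read sets efficiently.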
● Sped up gsReadProcessor, the main base-calling application, by a factor of 2, by speeding up a central
algorithm (CAFIE) by a factor of 4. The basic meta-technique was full-system profiling with oprofile,
which found three previously unidentified sources of unnecessary runtime.
● Worked on the “image processing” stage of their new (since cancelled) ISFET-based sequencer,
something which stretched the bounds of current single-computer bandwidth. Used OpenMP,
and techniques inspired by CUDA, on traditional Intel hardware. Also used Intel’s profiler to improve
multithreading.
● Helped convert Matlab code for ISFET “image processing” to C, allowing analyses which were
prohibitively expensive before (i.e., not even attempted; at least 1 day real time). The newest version ran in
1.5 hours using 16 hyperthreaded Xeon cores.
● Helped develop the gsReadProcessor basecaller through two iterations, including one which introduced
non-cyclic flow patterns, allowing much-improved read lengths. Used valgrind, electric fence, and
other “power tools” to find and eliminate both known and lurking unknown bugs.
● Developed the gsReadCluster tool, which provides various clustering options for read processing.
In Silico Normalization (see above) was the best outgrowth of this work.
● Maintained and improved genetic mapping and assembly code, including SNP detection.
Lockheed Martin, Bethesda, MD April 2010 – Oct 2010
Senior Information Systems Specialist
● Developed original C code for the SRA (Short Read Archive) genetic database at the National Institutes
of Health. Developed approximate string matching code to identify important regions of submitted DNA
sequences – recognized the need, researched, coded, debugged, and packaged.
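Approximate string matching of the kind described above is typically built on edit distance; a standard dynamic-programming version can be sketched like this (a textbook illustration, not the actual SRA code):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein edit distance via dynamic programming --
    one standard building block for approximate string matching.
    Illustrative only; the real SRA code is not shown here."""
    prev = list(range(len(b) + 1))  # distances for the empty prefix of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution / match
        prev = cur
    return prev[len(b)]
```

Scanning a sequence for windows within a small edit distance of a target motif is one common way such “important regions” are located.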
Xplusone, Norwalk, CT Dec 2008 – Mar 2009
Applications Developer (Contract)
● Assisted in developing a system, in Java, to analyze log files to support customer information and model
building for intelligent ad serving. Custom code (sorted data, hash joins) worked better (faster, simpler)
than an evaluation Hadoop implementation. Basic Java 5/6, JUnit, some Spring.
Verizon/Idearc SuperPages, Waltham, MA Oct 1999 – Sep 2008
Web Programmer IV
● Developed a web service (XML in/out) server that suggests advertising terms by integrating various
information sources, including spidering customer web pages in real time, and other sources of information
in various term databases. Included a multithreaded spider (hits the single advertiser site with multiple
threads).
● Prototyped data-quality systems that involve back-end processing in Perl, with results stored in a
database and manipulated through JSP and JDB in Tomcat.
● Mined user logs for information used in pay-per-click advertising – specifically, matching user
inputs to advertising categories.
● Developed techniques to suggest changes to the taxonomy, programmatically incorporating data from
web and print yellow pages extracts.
● Assisted in various mapping efforts between, for example, Yellow Pages print categories and taxonomy
categories, using different ad hoc techniques (such as using listings in common via two classifications).
● Prototyped a finite state system to allow categories to define explicit rules as to when they apply to user
inputs – similar to what Yahoo now refers to as their “Query planning” system (developed
independently).
● Adapted yellow and white pages web servers to three other countries.
● Designed and implemented the procedures that allow these servers to operate with near 100% uptime,
and allow prepublication verification.
● Massively simplified the daily ad rotation algorithm (uses modulo of the current time in various ways).
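A modulo-of-time rotation like the one just described can be sketched as follows (purely illustrative; the slot length and names are assumptions, not the production algorithm):

```python
def rotated_ad(ads, now_seconds, slot_seconds=3600):
    """Pick one ad per time slot by taking the current slot number
    modulo the number of ads: every ad gets equal rotation with no
    stored state. Slot length and names are illustrative assumptions."""
    slot = now_seconds // slot_seconds
    return ads[slot % len(ads)]
```

The appeal of this style of rotation is that it is stateless and deterministic: any server computes the same answer from the clock alone.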
● Developed a Naïve Bayes / mutual-information technique to improve yellow pages data quality, enabling
the provider (Acxiom) to find a bug in their processing.
● Developed a mutual-information technique to identify advertisers with too many unrelated advertising
categories.
● Assisted with spider / presentation system to filter objectionable content.
● Developed a technique to "map" external heading categories into a restricted set using common listings.
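The common-listings mapping idea can be sketched as a simple best-overlap match (an illustrative toy, not the actual technique; names are hypothetical):

```python
def map_headings(external, internal):
    """external, internal: dicts mapping heading -> set of listing IDs.
    Map each external heading to the internal heading that shares the
    most listings with it -- a toy version of mapping via listings in
    common. Names and structure are illustrative assumptions."""
    mapping = {}
    for ext, ext_listings in external.items():
        best = max(internal, key=lambda h: len(ext_listings & internal[h]))
        mapping[ext] = best
    return mapping
```

The same overlap counts can also score a mapping's quality, which helps when two classifications must be reconciled ad hoc.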
Vorbroker Consulting, Cincinnati, OH May 1997 – Sep 1999
Consultant (Compaq, GE Aircraft Engines)
● Maintained, as part of a team, thousands of engineering workstations.
● Resolved a wide variety of user problems (second-tier help desk analyst).
Software Engineering Institute, Pittsburgh, PA Oct 1995 – May 1997
Visiting Scientist, Risk Program
● Rewrote (incorrect) noun phrase ATN parser as a series of finite state programs.
● Reengineered term clustering module as a sort merge operation, allowing arbitrary data sizes.
PATENT
6,640,228 Method for detecting incorrectly categorized data
[Basically: build a naive Bayes model and look for outliers.]
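The patent summary's idea – train a naive Bayes model on the categorized data, then flag items whose assigned category the model disagrees with – might be sketched like this (a toy illustration; the patented method's details differ):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (category, token-list) pairs. Returns category
    counts, per-category token counts, and the vocabulary."""
    cat_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for cat, tokens in docs:
        cat_counts[cat] += 1
        token_counts[cat].update(tokens)
        vocab.update(tokens)
    return cat_counts, token_counts, vocab

def log_score(cat, tokens, cat_counts, token_counts, vocab):
    """log P(cat) + sum of log P(token | cat), add-one smoothed."""
    score = math.log(cat_counts[cat] / sum(cat_counts.values()))
    denom = sum(token_counts[cat].values()) + len(vocab)
    for t in tokens:
        score += math.log((token_counts[cat][t] + 1) / denom)
    return score

def flag_outliers(docs):
    """Flag docs whose assigned category is not the model's best guess --
    the 'look for outliers' step, heavily simplified."""
    model = train_nb(docs)
    flagged = []
    for cat, tokens in docs:
        best = max(model[0], key=lambda c: log_score(c, tokens, *model))
        if best != cat:
            flagged.append((cat, tokens))
    return flagged
```

In practice one would threshold on the score gap rather than a hard argmax, but the outline is the same: the model learned from the data as a whole exposes individual items that don't fit their label.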