Engineer Data

Los Angeles, CA
October 10, 2018


Jacob Leonard



Early-career data engineer with extensive experience architecting and developing cloud-based data pipelines in corporate and academic settings, spanning oncology, genomics, cryptocurrency, RNA-seq analysis and precision nutrition. Using storage, processing, development, testing, automation and security frameworks across cloud providers, I have built pipelines for sample processing, re-annotation, ETL, customer/company analytics, machine learning, report generation and database creation. My experience focuses on automated, modular processes with custom execution strategies designed to meet the needs and goals of each project while using resources efficiently. Distributed and cloud processing technologies present amazing opportunities for the future of data in numerous fields, and I am always looking for new ways to discover and influence that potential.

EXPERIENCE

Suggestic Inc. Mexico City, MX

Data Engineer February 2018 – Present

• Responsible for the validation and re-annotation of internal company documents on Elasticsearch, using API sources to ingest the restaurant, menu, menu item, and recipe information delivered to end users

• Analyzed the codebase, infrastructure and execution patterns to advise the company on its current technology and to recommend new pipelines that increase data quality, reliability, and scalability

• Developed a Django module to run Elasticsearch API validation pipelines on Kubernetes pods hosted on Google Cloud, using Google Cloud Storage for data staging and Elasticsearch for data warehousing

• Developing infrastructure and security best practices for future biometric data integrations, focusing on cloud technologies and distributed computing architectures for execution and permissions access

Winter Genomics - Johns Hopkins Mexico City, MX

Bioinformatics Pipeline Consultant December 2017 – May 2018

• Worked with the Winter Genomics and Johns Hopkins University teams to consult on and rewrite an RNA-Seq bash scripting pipeline using newer technologies such as Nextflow and Docker

• Pipeline was built to be deployable and extensible in multiple cluster environments while retaining consistent dependencies for versioning and reproducibility

• Lead author of the codebase; guided project development and direction up until cluster testing, with regular meetings with team stakeholders to update requirements for desired outcomes/outputs

Bitso SAPI de CV. Mexico City, MX

Data Scientist April 2017 – February 2018

• Sole member of the Analytics department, responsible for 100% of infrastructure/development/analytics

• Built distributed analytics computation engines in Apache Spark for on-demand data delivery

• Developed and maintained a cloud analytics container system for ETL ingestion using Docker, Luigi, Spark and AWS

• Consulted with internal departments and key partners about user base, new products, features, and services to better leverage data sources to guide business intelligence decisions and improve customer engagement

• Developed automated reports and dashboards for internal departments, reading from AWS S3 into R products

EDUCATION

University of Southern California Los Angeles, CA

Dana and David Dornsife College of Letters, Arts and Sciences December 2015

B.A. in Applied and Computational Mathematics

QuestBridge Scholar: In the top 10% of all low-income, high-achieving students at top schools in the US

INVOLVEMENT & PROFESSIONAL ASSOCIATIONS

Delta Omicron Zeta Leadership Fraternity

• Alumni member of USC’s premier coeducational leadership fraternity since Spring 2012, which runs social, professional and leadership-based enrichment programs designed to connect student leaders from across campus

Quest Scholars Network - USC Alumni

• Alumni member of the QuestBridge College Network, which helps connect low-income, high-achieving students to top universities with 100% need-based financial aid through subsidized applications and university connections

TECHNOLOGIES

Docker, GitHub, Illumina BaseSpace, Python, Java, Groovy, Shell Scripting, BioConductor, Amazon AWS EC2 & S3, Google Compute & Storage, GATK, Samtools, VCFtools, Picard Tools, BedTools, SQL, Elasticsearch, Spark, Kubernetes, Luigi, Nextflow, SGE, R Programming, Python Pandas, Travis CLI, Redis, R Shiny, Celery, Django, and more!