Ravi Bhargava
Los Gatos, CA **030
408-***-****(Cell)
*******@*****.***
BACKGROUND
I have experience in the architecture, design and implementation of Java/Python products at startups and mid-sized companies. I have expertise in the following technical areas: building ETL data pipelines, backend servers, AWS, NoSQL and relational databases, Spark, Kafka, REST.
Big Data: ETL, Hue, Hive, Spark, Oozie, CDH.
Cloud: AWS - Glue, S3, Lambda, EventBridge, Athena, boto3.
Open Source: Elasticsearch, Storm, Kafka, Hibernate, Flask.
NoSQL: Cassandra.
Container: Docker.
Languages: Java, Python.
Database: SQL Server, Oracle, MySQL.
Technologies: REST, encryption/decryption, NLP.
J2EE/J2SE: JSON, XML, XML Schema, JDBC, JMS.
Testing/Build: JUnit, Ant, Git, Maven, YourKit, Mockito.
OS: Windows, Linux, MacOS.
CI/CD: Jenkins, Terraform.
Methodology: Agile, UML.
Personal: Excellent written and verbal communication skills.
EXPERIENCE
Principal Data Engineer, Noah Medical, San Carlos, CA Oct 2022 - present
Noah Medical is a medical robotics company. I am responsible for developing the data products for the company.
·Proposed the data vision for the company.
·Architected and implemented a framework for ingesting log files from Noah products and transforming them into a searchable table using Glue, which forms the data foundation for all current and future products.
·Architected and implemented a system to store metadata of each run, define queries for each question, and save results in a relational database. Tableau was used for visualization.
·Participated in identifying key questions that different teams wanted answered from data.
·Led effort to standardize logging across all products in the company.
Consultant, Federal Reserve Bank, San Francisco, CA Apr 2021 - July 2022
Cyber Weather Map (CWM) is a big data ecosystem with ingestion from various internal data sources. Data in various formats gets ingested, processed and enhanced with correlated analytics. Tableau dashboards built on top of processed data can be accessed by end users.
·Responsible for migrating CWM applications from on-premise to AWS using Glue, Lambda, SNS.
·Responsible for maintaining and extending applications in the CWM family of products.
·Proposed methodology for unit testing Glue components.
·Implemented a process to work locally with AWS Glue and S3 using Docker, LocalStack, and a Docker image for Glue.
·Added new pipelines in on-premise CWM using Oozie.
·Built web application (REST) using Flask to monitor usage on Twitter for user accounts.
·Ported complex queries in SQL Server to Spark SQL.
Consultant, Software Threads, Los Gatos, CA Jan 2020 - Mar 2021
·Designed and implemented an NLP/ML application to query real estate listings from a REST web service using a voice interface in Python and Elasticsearch. Classification is done using Spark ML Pipelines.
·Developed an unsupervised process to generate questions and answers from PDF documents/contents.
·Proposed architecture for a domain specific virtual intelligent assistant.
·Designed and implemented a REST API as a generic module to save the state of an application in Cosmos DB for a PaaS application framework.
·Implemented functionality to make Azure Event Grid a managed module by calling Terraform programmatically.
·Migrated existing data pipelines in Vertica to Spark SQL.
·Architected and built ETL pipelines using Glue/Spark on an AWS-based big data platform.
·Loaded data from Vertica and other internal/external sources to the data lake for consumption by applications.
·Analyzed dashboard queries in Tableau to find dependencies on pipelines.
·Worked closely with business users on data analytics, reporting, and data visualization needs.
Consultant, Experian, San Jose, CA Aug 2019 - Jan 2020
The Oxygen Platform on AWS is a framework to build entity resolution applications. Entity resolution is the task of disambiguating records that correspond to real world entities across and within datasets.
·Proposed an architecture to make the Oxygen framework distributed, modular, and extendible by converting each stage into a separate Spark process with a Kafka backbone. The architecture improves productivity for framework and extensions development as well as maintainability and testing.
·Designed and implemented an Oxygen module to find differences in large JSON files using Merkle trees and Spark.
·Designed and implemented an Oxygen module to generate relationships between entities embedded in JSON files using Spark.
·Proposed a migration strategy to convert COBOL legacy code to Java using third-party products.
Consultant, Walmart-Labs, Sunnyvale, CA March 2019 - July 2019
The Content Quality Platform helps users understand which attributes are important for certain categories, identify where content opportunities exist, and track whether the changes they've made are making an impact on their bottom line.
·Designed and implemented a backend system to support type-ahead using Elasticsearch and caching.
·Designed and implemented a Spark job to back up Cassandra tables, which store the scores associated with items as they come into the pipeline.
·Extended existing system to identify key drivers for more than one product type using Elasticsearch.
·Designed and implemented a system to fetch items from Elasticsearch and compute aggregate scoring from history stored in Cassandra.
Backend Engineer, Peernova, San Jose, CA July 2018 - Feb 2019
Cuneiform is an immutable ledger based on blockchain technology with built-in audit, scale, and other capabilities necessary to help organizations store, secure, and validate data. The product is implemented as a pipeline with ETL functions. (Java 8)
·Designed an Archival System for Cuneiform using Spark and Cassandra. It copies data that satisfies the purge criteria, encrypts it, and writes it to secondary storage.
·Designed and implemented a Replication System for Cuneiform using Spark and Kafka. Cuneiform is fault tolerant and highly available and runs in two data centers in a master-slave configuration, with one data center acting as the primary at a time.
·Designed and implemented a Lineage System for Cuneiform using Spark and Cassandra. Lineage captures event data from different source systems in real time and creates event lineages.
Software Engineer I (Principal Engineer), American Express, Palo Alto, CA Mar 2017 - July 2018
·Responsible for the architecture, design and implementation for converting a batch process to a cloud application. This included mapping DB2 tables to Couchbase. Used N1QL queries to build an asynchronous backend that periodically sends requests to a queuing system to get new keys associated with a card number.
·Responsible for architecture, design and implementation of a system for encrypting/decrypting data using keys from a Key Management System or an in-memory cache.
·Proposed a framework to create workflows composed of asynchronous business processes.
Principal Software Engineer, Thomson-Reuters, Sunnyvale, CA Sept 2015 - Nov 2016
·Responsible for the architecture, design and implementation for the migration of user data from an old schema in Cassandra to the new schema in production for Neon, a microservices-based social networking application for researchers. (Java 8)
·Implemented an Email Notification topology using Storm. The topology checks for events since the last login and sends an email indicating the number of events since the last login.
·Designed and implemented an adapter between Spark and Netflix Dynomite to store/retrieve data including DataFrames.
Principal Software Engineer, Dataguise, Fremont, CA May 2013 - Sept 2015
·Responsible for the architecture, design and implementation of the Discovery Engine. The Discovery Engine is the core of all Dataguise 5.0 products and identifies sensitive data from structured and unstructured data. Patterns to be searched can be specified in XML. It uses a layered approach in which tokens at a lower level can be combined to form higher level tokens. The architecture is modular allowing customization for other domains and integrating with user defined ontologies. (Java)
·Developed a command line utility to mask/encrypt large files by invoking Discovery Engine as a Hadoop job.
·Proposed changes to the architecture of the Dataguise 4.6 Discovery Engine to decouple masking and encryption functions from the Discovery Engine.
Senior Software Engineer, Ancestry.com, San Francisco, CA Apr 2012 - Jan 2013
·Responsible for the architecture, design and implementation of a monitoring application for the DNA Backend server. It allows only authorized personnel to log in and administer the application using JMX. This also included instrumenting the code to write state to the database.
·Managed operations of the DNA Pipeline. Monitored and fixed problems on the production instance. This is a critical function for the DNA program of the company.
Consultant, San Jose, CA June 2010 - Feb 2012
Apollo Group:
·Proposed architecture and design of a horizontally scalable messaging system using ActiveMQ, Riak, and Amazon EC2. (Java)
eBay:
·Responsible for the architecture, design and implementation of a web service for accessing competitor pricing, supply and demand data daily. It included using an in-house framework for adding the data to a fault tolerant in-memory database. The product is currently live in production. (Java)
·Proposed changes to the architecture of the Classification Engine, a machine learning application, to use dependency injection. This made the system unit testable and easier to extend. The recommendations were incorporated into the product. (Java)
Previously employed at the following companies: Zuora, Commerce One, General Magic, Spincode, Sun Microsystems, Teknowledge. I started working in 1985.
EDUCATION
M. S. Computer and Information Science, University of Massachusetts, Amherst, MA.
M. S. Chemical Engineering, State University of New York at Buffalo, Amherst, NY.
B.Tech Chemical Engineering, Indian Institute of Technology, Kanpur, India.
NATIONALITY
US Citizen.