Engineer Data

Location:

Fremont, CA

Posted:

April 06, 2020

Resume:

Raxit Solanki

***********@*****.*** +1-714-***-**** San Francisco

github.com/raxit65535 linkedin.com/in/raxit65535

EXPERIENCE

DATA ENGINEER MINTED LLC, SAN FRANCISCO FEB 2020 – APRIL 2020

● Developed new feature requests for an internal reporting tool (web app) using React JS and GraphQL.

● Designed a CI/CD pipeline for the same app to expedite feature development, using Jenkins, AWS CloudFormation, CodePipeline, and AWS Lambda.

● Built an end-to-end streaming data pipeline with the help of product designers, using Java, Python, Apache Kafka, and Apache Beam.

● For example: designed an ML model (TensorFlow, k-nearest neighbours) and deployed it alongside a Beam pipeline to generate a similarity index of similar products on the e-commerce site; also built a Beam streaming pipeline that captures onpageLoad events from each page and aggregates the latency customers experience on every page load.

● A small part of the role involved writing analytics queries against the production Snowflake database for reporting.

● Laid off due to the economic impact of COVID-19 on the business.

DATA ENGINEER GLOBE LIFE INC., LOS ANGELES SEP 2019 – JAN 2020

● Built data pipelines for the daily data warehousing workload using AWS Glue (Spark jobs, Scala) and AWS Batch (compute job queue). Designed CI/CD for efficient deployment and testing of ETL jobs, using AWS CloudFormation, CodePipeline, and AWS Lambda.

● Developed the “User trip” data pipeline, which sessionizes user activity from event logs (~GBs per day) collected by the web team in an S3 bucket, to help the marketing team design efficient advertising strategies (GA360, SA360, in-house ad optimization framework). Also engineered infrastructure as code for the data pipelines using shell (awscli) and Python scripts (AWS ECR/Docker, Glue, Batch, SQS, S3, IAM policies).

● Helped create and modify data models for new business requirements, and gathered marketing insights and reports using AWS Athena queries on top of the Glue catalog (S3 buckets, Parquet data format).

DATA ENGINEER FELLOW INSIGHT DATA SCIENCE, SAN FRANCISCO JUN 2019 – AUG 2019

● Designed a high-performance, cost-efficient data pipeline on an unbounded data stream using AWS infrastructure.

● Simulated a data stream using ~1 TB of taxi ride data and designed a greedy matching algorithm to pair riders and drivers in real time using Spark Streaming. Also used Kafka as the ingestion layer for this pipeline to ensure exactly-once processing semantics.

● Deployed the data pipeline on a Kubernetes cluster to resolve crashes that occurred while running the Spark job.

● Developed a custom dashboard to track matching statistics using Java, Spring Boot, WebSocket, and Chart.js.

● Project details can be found at: github.com/raxit65535/umatch

SOFTWARE ENGINEER UJOBI INC, FULLERTON JAN 2017 – JUNE 2017

● Engineered a web portal that builds a network of talented students and lets them gain expertise by working on real-world projects assigned by individuals or organizations (Java, Spring, MongoDB, AWS EC2).

● Integrated the Rocket.Chat API to add high-quality chat to the web portal and increase user engagement.

● Designed a CI/CD pipeline on AWS to automate production code deployment. Also resolved production issues with MongoDB thread management and JVM out-of-memory errors.

SOFTWARE ENGINEER eCLINICALWORKS INDIA PVT LTD, INDIA JULY 2014 – NOV 2016

● Refined and built new feature requests for the support ticket management software to improve ease of use for technical support staff (Java/JSP).

● Resolved performance, thread management, and memory leak issues in the legacy codebase using tools such as JConsole, flame graphs, and heap dump analysis in Eclipse.

● Developed a data pipeline using Java MapReduce to aggregate and process health insurance claim data (~5 TB) and created Hive-based HDFS partitions for the data warehouse.

● Used Hive queries to analyze the data and exported the results to MySQL using Python scripts to create custom UI dashboards.

● Helped design a partitioning and bucketing scheme for the Hive data warehouse to get faster OLAP query responses.

● Built utility scripts in Python to automatically extract static information from Hadoop error logs and display the relevant details.

EDUCATION

M.S. IN COMPUTER SCIENCE, CALIFORNIA STATE UNIVERSITY, FULLERTON, CA MAY 2019

B.TECH IN COMPUTER SCIENCE, GANPAT UNIVERSITY, INDIA MAY 2014

SKILLS

Languages: Java, Python, Scala, JavaScript

Web Technologies: JavaEE, ReactJS, Express, NodeJS, HTML/CSS, Hibernate, Struts, Spring, Maven, Gradle

Database: Postgres, SQL, Snowflake, Hive, Athena-S3, GraphQL, Prisma

Big Data: Hadoop-MapReduce, Apache Spark/Spark Streaming, Kafka, Apache Beam, Apache Flink

AWS: Lambda, Glue, EMR, CloudFormation, CodePipeline, IAM, S3, Python-Boto3, aws-cli, SNS, SQS, EC2, VPC/Subnet, Elastic Load Balancer, SageMaker

ML: Scikit-Learn, TensorFlow, Pandas

PROJECTS

● Git Monitor

Data pipeline over the Git-Archive dataset (3 TB) to find the connected components in a graph of GitHub users using Apache Flink Gelly.

● App Data Analytics

Automated data pipeline to analyze App Store data (~100 GB/day) using AWS Glue, Spark (Python), Athena, and S3 (Parquet).

● Book Review

Multiple MapReduce-based pipelines to answer geographic and business questions on a large dataset of books, with a final dashboard for each answer designed in Tableau.

● Sentiment Analysis

Data pipeline to tokenize the Amazon product review dataset, calculate the polarity (negative or positive) of each word, and aggregate the final sentiment value in a Tableau dashboard.

All other projects can be found on my GitHub account.

GitHub: https://github.com/raxit65535

StackOverflow: https://stackoverflow.com/users/7045032/raxit-solanki

ACHIEVEMENTS

● Student Scholarship for “High Impact Practices” at California State University Fullerton Pollak Library

● Best paper presentation award for “Prediction of Crop cultivation” at IEEE CCWC 2019, Las Vegas
