Engineer

Location:

Gurgaon, Haryana, India

Posted:

June 05, 2022

Resume:

CONTACT

Bengaluru, India

E-mail: **.************@*****.***

Phone: +91-916*******

INTERESTS

Data Engineering, ETL, Cloud Computing, Big Data, Information Retrieval and Extraction, Machine Learning, Web Mining, Social Network Analysis, Text Analytics (NLP).

WORK EXPERIENCE

Apple Inc, August 2020 - Present

Cloud Data Platform Engineer

Leading the design and implementation of the Central Data Quality and Validation Framework.

Intuit Inc, September 2016 - August 2020

Senior Data Engineer

Built an Agent Risk Profile Dashboard using Spring Boot, DynamoDB, React and MSAAS on a Kubernetes cluster.

Built a Risk Control and Analytics Platform using Spark, Kafka, DynamoDB and Drools (rules engine) for both real-time streaming data and batch processing.

Requirements gathering, design, and implementation of ETL pipelines for MINT (personalised financial platform) and workforce/recruitment analytics.

Migration of multiple ETL pipelines from private servers to AWS.

Real-time analytics on streaming data using AWS Kinesis and Lambda.

Generic Data Ingestion Engine: Continuously developing an extensible framework in Python that ingests data from multiple sources into the data warehouse (Vertica/S3).

Data Parity Checker for AWS Migration: Developed tools and scripts to ensure that data movement to the cloud and the parallel pipelines in the cloud are reliable and stable.

AWS EMR MapReduce jobs implementing data filters to query application-level encrypted fields for data security reasons.

Hands-Free Operations: Developed a framework for running and monitoring multiple ETL pipelines, alerting when required by continuously monitoring logs from multiple sources.

Data Insights: Worked in a team to produce valuable data insights for TurboTax. Communicated with various business groups to gather requirements and establish end-to-end ETL pipeline solutions.

Teaching Assistant, August 2015 - December 2015

Cloud Computing, IIIT Hyderabad

Xerox Research Center India, May 2015 - July 2015

Research Intern

Exploratory Text Analytics: Intelligent Navigated Browser. Used word embedding techniques such as Word2Vec to improve topic modeling for information retrieval applications.

Teaching Assistant, January 2015 - May 2015

Information Retrieval and Extraction, IIIT Hyderabad

Prateek Mehta

Google Summer of Code '15 (Drupal), May 2015 - August 2015

Student Intern

Developed a module for Drupal 8 to embed URLs using WYSIWYG editors.

Datapub (Startup), March 2014 - August 2014

Research Intern

Web Analytics: Built a web application for a bloggers' community that helps them understand what to write, when to write, how popular (viral) their blogs are, and how they are known to the outside world. Piwik was used for tracking and analysis.

CA Technologies, October 2012 - August 2013

Research Intern

Big Data Analysis on Cloud: Built a semi-structured log analytics tool using Big Data technologies such as Hadoop, Solr Cloud, ZooKeeper and Mahout.

EDUCATION

B.Tech + MS (By Research) in Computer Science and Technology, 2011 - 2016

International Institute of Information Technology, Hyderabad

Worked at the Search and Information Extraction Lab as a Research Assistant under Prof. Vasudeva Varma.

CGPA: 7.5

ACADEMIC PROJECTS

Improved Topic Models For Contemporary Documents

Community-based document aggregation scheme for improved topic models of social media documents. Used network-topology and user-attribute features to establish user communities within a social network, whose documents can be pooled to build efficient topic models shared among them.

Social LDA (Latent Dirichlet allocation)

Implemented a pipeline and extended 'Scalable Topic Modeling in Social Networks' on Twitter data.

Temporal Analysis Of Wikipedia Entity's Attribute

Outlier detection using temporal analysis of attributes of Wikipedia entities.

Twitter-based Chatbot

Rank Twitter conversations to build a Q&A system using indexed tweets. Used Lucene for indexing and searching.

Twitter News Extraction and Classification

Multi-label classification of news-related tweets using Labeled LDA and SVM.

Cricket Report Analysis

Mapped sentences of a cricket report to the ball number in ball-by-ball commentary using a language model and statistical heuristics.

Distributed Searchable Key-Value Store

Built a tool that provides efficient storage and fast retrieval (even on secondary keys) while maintaining strong consistency, availability and fault tolerance. Used the concepts of hyperspace hashing and value-dependent chaining.

Cloud Orchestration Layer

Implemented a cloud orchestration layer using Libvirt, Ceph, etc.

ACHIEVEMENTS

CG (Business Unit) Engineering Excellence Award for two consecutive quarters at Intuit.

Certificate of Achievement for Wildlife Animal Detection and databases at a tech exhibition in an inter-college university fest.

Certificate of Merit for English from CBSE in class X.

Elected Member of the Student Senate, Cultural Council and Entrepreneurship Cell.

Worked as Manager (Finance) at a Junior Achievement company in high school.

SKILLS

Operating Systems: Linux-based OS (Fedora, Ubuntu), Microsoft Windows.

Programming Languages: Python (SciPy, NumPy, scikit-learn, Gensim), R, C, C++ (OpenGL), PHP, Java.

Database Management: Vertica, RDBMS (primarily MySQL) and NoSQL databases such as HyperDex, Solr 4, MongoDB.

Big Data Technologies: Spark, Hadoop, Mahout, Kafka, ZooKeeper.

Other Tools: Amazon Web Services (AWS), Ceph, Gephi, Twitter API, NLTK, Piwik, Splunk, NiFi.
