CONTACT Bengaluru
India
E-mail: **.************@*****.***
Phone: +91-916*******
INTERESTS Data Engineering, ETL, Cloud Computing, Big Data, Information Retrieval and Extraction, Machine Learning, Web Mining, Social Network Analysis, Text Analytics (NLP).
WORK EXPERIENCE
Apple Inc, August 2020 - Present
Cloud Data Platform Engineer
Leading the design and implementation of the Central Data Quality and Validation Framework.
Intuit Inc, September 2016 - August 2020
Senior Data Engineer
Built an Agent Risk Profile Dashboard using Spring Boot, DynamoDB, React and MSAAS on a Kubernetes cluster.
Built a Risk Control and Analytics Platform using Spark, Kafka, DynamoDB and Drools (rules engine) for both real-time streaming data and batch processing.
Gathered requirements for, designed, and implemented ETL pipelines for MINT (Personalised Financial Platform) and workforce/recruitment analytics.
Migrated multiple ETL pipelines from private servers to AWS.
Built real-time analytics on streaming data using AWS Kinesis and Lambda.
Generic Data Ingestion Engine: Continuously developed an extensible Python framework that ingests data from multiple sources into the data warehouse (Vertica/S3).
Data Parity Checker for AWS Migration: Developed tools and scripts to ensure that data movement to the cloud and the parallel pipelines in the cloud are reliable and stable. Wrote AWS EMR MapReduce jobs that implement data filters to query application-level encrypted fields for data security reasons.
Hands-Free Operations: Developed a framework for running and monitoring multiple ETL pipelines, which raises alerts when required by continuously monitoring logs from multiple sources.
Data Insights: Worked in a team to produce data insights for TurboTax. Communicated with various business groups to gather requirements and establish end-to-end ETL pipeline solutions.
Teaching Assistant, August 2015 - December 2015
Cloud Computing, IIIT Hyderabad
Xerox Research Center India, May 2015 - July 2015
Research Intern
Exploratory Text Analytics: Intelligent Navigated Browser. Used word embedding techniques such as Word2Vec to improve topic modeling for information retrieval applications.
Teaching Assistant, January 2015 - May 2015
Information Retrieval and Extraction, IIIT Hyderabad
Prateek Mehta 1
Google Summer of Code '15 - Drupal, May 2015 - August 2015
Student Intern
Developed a module for Drupal 8 to embed URLs using WYSIWYG editors.
Datapub (Startup), March 2014 - August 2014
Research Intern
Web Analytics: Built a web application for a bloggers' community that helps them understand what to write, when to write, how popular (viral) their blogs are, and how they are known to the outside world. Piwik was used for tracking and analysis.
CA Technologies, October 2012 - August 2013
Research Intern
Big Data Analysis on Cloud: Built a semi-structured log analytics tool using Big Data technologies such as Hadoop, Solr Cloud, ZooKeeper and Mahout.
EDUCATION B.Tech + MS (By Research) in Computer Science and Technology, 2011 - 2016
International Institute of Information Technology, Hyderabad
Worked at the Search and Information Extraction Lab as a Research Assistant under Prof. Vasudeva Varma.
CGPA: 7.5
ACADEMIC PROJECTS
Improved Topic Models For Contemporary Documents
Community-based document aggregation scheme that improves topic models for social media documents. Used features based on network topology and user attributes to identify user communities within a social network; the documents of each community are pooled to build topic models that its members can share.
Social LDA (Latent Dirichlet allocation)
Implemented a pipeline and extended 'Scalable Topic Modeling in Social Networks' on Twitter data.
Temporal Analysis Of Wikipedia Entity's Attribute
Outlier detection using temporal analysis of attributes of Wikipedia entities.
Twitter-based Chatbot
Ranked Twitter conversations to build a Q&A system using indexed tweets. Used Lucene for indexing and searching.
Twitter News Extraction and Classification
Multi-label classification of news-related tweets using Labeled LDA and SVM.
Cricket Report Analysis
Mapped sentences of a cricket report to the ball number in ball-by-ball commentary using a language model and statistical heuristics.
Distributed Searchable Key-Value Store
Built a tool that provides efficient storage and fast retrieval (even on secondary keys) while maintaining strong consistency, availability and fault tolerance. Used concepts of hyperspace hashing and value-dependent chaining.
Cloud Orchestration Layer
Implemented a cloud orchestration layer using Libvirt, Ceph, etc.
ACHIEVEMENTS CG (Business Unit) Engineering Excellence Award for two consecutive quarters at Intuit.
Certificate of Achievement for Wildlife Animal Detection and databases at the tech exhibition of an inter-college university fest.
Certificate of Merit for English from CBSE in class X.
Elected Member of the Student Senate, Cultural Council and Entrepreneurship Cell.
Worked as Manager (Finance) at a Junior Achievement company in high school.
SKILLS Operating Systems: Linux-based OS (Fedora, Ubuntu), Microsoft Windows.
Programming Languages: Python (SciPy, NumPy, scikit-learn, Gensim), R, C, C++ (OpenGL), PHP, Java.
Database Management: Vertica, RDBMS (primarily MySQL) and NoSQL databases such as HyperDex, Solr 4, MongoDB.
Big Data Technologies: Spark, Hadoop, Mahout, Kafka, ZooKeeper.
Other Tools: Amazon Web Services (AWS), Ceph, Gephi, Twitter API, NLTK, Piwik, Splunk, NiFi.