
Data Engineer

Austin, Texas, United States
February 25, 2019



Rutuja Shah

Phone: 571-***-**** Email:

SUMMARY

A strong team player with the ability to work independently, ready to accept challenges and learn new technologies.

Excellent written, verbal, presentation and interpersonal communication skills.

Experienced across Computer Science fields including Machine Learning, Artificial Intelligence, Big Data/Hadoop development, Python and Java/J2EE technologies.

Implemented Machine Learning, Computer Vision, Deep Learning and Neural Network algorithms, including Convolutional Neural Networks (CNN), using TensorFlow; designed prediction models using data mining techniques with Python, NLTK, the Twitter REST API, Twitter Streaming API, Gmail API and Facebook API, and libraries such as NumPy, Tweepy, SciPy, Matplotlib, Pandas and Scikit-learn.

Developed and implemented Big Data solutions on Hadoop using Pig, Hive, Spark, Flume, Sqoop, Kafka, Scala, HBase, Cassandra.

Developed core modules in large cross-platform applications using JAVA, J2EE, Hibernate, Spring MVC, JSP, Servlets, JDBC, Amazon Web Services-EC2, S3 Buckets, JavaScript, XML, Angular JS4, CSS3, Bootstrap and HTML5, Oracle, MS SQL Server, MySQL RDBMS databases.

Detailed understanding of the Software Development Life Cycle (SDLC) and methodologies including Waterfall and Agile.

EDUCATION

Master of Science, Computer Science

University at Albany (State University of New York) Major in Computer Science (GPA: 3.7/4)

AUG’17 - DEC’18

Bachelor of Engineering, Computer Engineering

Nirma University, India (CGPA: 7.5/10)

AUG’13 - MAY’17


TECHNICAL SKILLS

Languages: C, C++, Core Java, J2EE (Spring), Python, Scala (basic).

Technologies: Hadoop 2.x (Cloudera CDH), Sqoop, Flume, Kafka, Spark, PySpark, AngularJS 2, TypeScript, HTML5, CSS3, JavaScript, JSTL, JSP, Bootstrap, AJAX, XML, jQuery, JSON, Maven.

Frameworks: Hibernate, Spring

Data Modelling: Spark, Hadoop, Supervised Learning (Regression, Decision Trees, SVM), Unsupervised Learning (K-Means Clustering, PCA), Deep Learning (Convolutional Neural Networks, TensorFlow), MapReduce, Scikit-learn.

Databases: AWS-RDS, AWS-Redshift, NoSQL - MongoDB (basic), Cassandra, HBase, Hive, Pig, MySQL, Oracle, MS SQL Server, SQLite (sqlite3).

Cloud Technologies: Amazon Web Services - EC2, S3 Buckets.

SDLC: Agile Software Development- Scrum, Waterfall Model.

Version Control: GitHub, SVN, GitLab.

Operating Systems: Linux, Windows

Tools: Eclipse, NetBeans, STS, PyCharm, Anaconda, PostgreSQL, MySQL Workbench, WebStorm, Postman, Visual Studio.




WORK EXPERIENCE

Machine Learning Engineer, Center For Technology In Government (UAlbany) NOV’17 - DEC’18

Project: STREET & BUILDING COMPUTER VISION – Project to identify and track components of properties visible from the street.

Environment: Python, Hadoop 2.x, Apache Spark, Hive, Sqoop, Amazon Web Services EC2 and S3 bucket, Boto3, Redshift, GraphX, SQL, TensorFlow

Extracted data from AWS EC2 and S3 using the boto3 library in Python.

Utilized the pandas and NumPy Python libraries for data manipulation and processing, and stored the processed data in .dat and .pkl formats for validation and further analysis.
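A minimal sketch of this kind of pandas/NumPy cleanup-and-pickle step (the column names and data here are hypothetical, since the original pipeline's schema isn't specified):

```python
import numpy as np
import pandas as pd

# Hypothetical property records; the real schema is not given in the resume.
raw = pd.DataFrame({
    "property_id": [101, 102, 103, 103],
    "lat": [42.65, 42.66, np.nan, 42.67],
    "lon": [-73.75, -73.76, -73.77, -73.77],
})

# Basic cleaning: drop duplicate rows and rows missing coordinates.
clean = raw.drop_duplicates().dropna(subset=["lat", "lon"]).reset_index(drop=True)

# Persist the processed frame as a .pkl file for later validation steps.
clean.to_pickle("properties.pkl")

# Reload to confirm the round trip preserves the data.
restored = pd.read_pickle("properties.pkl")
print(len(restored))  # rows that survived cleaning
```

Pickle round-trips preserve dtypes exactly, which makes .pkl convenient for intermediate validation artifacts, at the cost of being Python-specific.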

Collected data from AWS Redshift, performed dataset merge and update operations with pandas DataFrames, and stored the processed data into RDBMS schemas.

Composed Spark jobs to perform data quality checks with PySpark, and created a table in Hive to hold the generated data for further analysis.
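The kinds of data quality checks mentioned (nulls, duplicates, out-of-range values) can be sketched as follows; pandas is used here as a lightweight stand-in for PySpark DataFrames, and the column names are purely illustrative:

```python
import pandas as pd

def quality_report(df, required, ranges):
    """Return a dict of data-quality issues: null counts for required
    columns, duplicate-row count, and out-of-range value counts."""
    report = {"nulls": {c: int(df[c].isna().sum()) for c in required},
              "duplicates": int(df.duplicated().sum()),
              "out_of_range": {}}
    for col, (lo, hi) in ranges.items():
        vals = df[col].dropna()
        report["out_of_range"][col] = int(((vals < lo) | (vals > hi)).sum())
    return report

# Hypothetical detection results to check.
frames = pd.DataFrame({"frame_id": [1, 2, 2, 4],
                       "confidence": [0.9, 1.7, 1.7, None]})
print(quality_report(frames, required=["frame_id", "confidence"],
                     ranges={"confidence": (0.0, 1.0)}))
```

In PySpark the same checks translate to `isNull` filters, `dropDuplicates` comparisons, and range predicates before the results are written to the Hive table.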

Manipulated Spark RDDs with the Spark SQL library and used GraphX for data visualization.

Researched and evaluated different object-detection techniques: the Single Shot Detector (SSD) framework for bounding boxes and the MobileNet-COCO neural network model.

Prepared datasets for model evaluation, labeled image data, transformed data formats and trained the model on a GPU server.
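One part of dataset preparation, the train/test partition of labeled records, can be sketched in pure Python (the filenames and labels are hypothetical; a deterministic seed keeps the split reproducible):

```python
import random

def split_dataset(samples, test_fraction=0.2, seed=42):
    """Shuffle labeled samples deterministically and split them into
    train and test partitions."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labeled image records: (filename, class label).
labeled = [(f"img_{i:03d}.jpg", i % 2) for i in range(10)]
train, test = split_dataset(labeled)
print(len(train), len(test))  # 8 2
```

Seeding the shuffle matters for model evaluation: it guarantees the same images land in the test set across runs, so accuracy numbers stay comparable.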

Evaluated the accuracy of trained models on test datasets while also capturing the frames-per-second speed of each model.

Project: CHATBOT – Project to evaluate how intelligent chatbots can be used to access Open Data Sets. The first phase was to develop an intelligent chatbot.

Environment: Python, sqlite3, TensorFlow, Neural Networks

Extracted Reddit data using the Reddit API in Python 3.

Transformed the large raw dataset into an RDBMS database using sqlite3.

Loaded the transformed data (completing the ETL pipeline) to obtain training and testing datasets for the prediction model.
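A minimal sketch of the sqlite3 transform-and-load step; the parent/reply/score schema is a plausible shape for Reddit comment data, not the project's actual schema:

```python
import sqlite3

# In-memory database as a stand-in for the project's sqlite3 store.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE comments
                (parent TEXT, reply TEXT, score INTEGER)""")

# Hypothetical extracted comment pairs with vote scores.
rows = [("what is open data?", "data anyone can use", 42),
        ("hello", "hi there", 7),
        ("spam spam", "spam", -3)]
conn.executemany("INSERT INTO comments VALUES (?, ?, ?)", rows)
conn.commit()

# Load step: keep only upvoted pairs as training material.
training_pairs = conn.execute(
    "SELECT parent, reply FROM comments WHERE score > 0 ORDER BY score DESC"
).fetchall()
print(training_pairs)
```

Filtering on score at load time is a common way to bias a chatbot's training set toward replies the community judged useful.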

Trained the model using the TensorFlow seq2seq translation library and designed a Bidirectional Recurrent Neural Network (BRNN) as the prediction model.

Frontend Engineer, Oizom Instruments Pvt. Ltd. JAN’17 – MAY’17

Project: Aims at providing real-time air-quality data by monitoring the environment.

Environment: AngularJS4, SCSS, HTML, Bootstrap, TypeScript, Express.js

Collected data from sensors via GSM networks and stored it in the IBM Cloudant database.

Built IoT services to connect the application with the database using real-time APIs.

Designed a user-friendly interface using AngularJS4, SCSS, HTML, Bootstrap and TypeScript; APIs were developed using Express.js.

ACADEMIC PROJECTS

STAMPING (Role: Software Developer) JAN’18 – MAY’18

Project - A Spring MVC based web application: a media library to share photos with friends, allowing users to upload posts, add comments and like posts with a better user experience.

Environment: J2EE (Spring), MySQL, Hibernate, Apache Tomcat, HTML5, CSS3, Bootstrap, JAX-RS, Amazon Web Services-EC2, S3 bucket.

Designed user interface (UI) using HTML5, CSS3, Bootstrap, JSP, JSTL, Angular 4.

Implemented the Singleton design pattern; configured and integrated business logic with Spring MVC and the Spring web layer, managing actions as beans and setting their dependencies in a Spring context file.

Designed secure login with the Facebook API, which was also used to integrate friends into the application.

Used Maven to build the application, deployed it on Apache Tomcat during development, and stored data in an RDBMS using MySQL.

Used AWS-RDS and Amazon S3 buckets to store data on AWS; the application is hosted on AWS-EC2.

NLP WITH HADOOP (Role: Data Scientist) JAN’18 – MAY’18

Project - A Big Data project to extract Gmail data and build a prediction model using Hadoop technologies.

Environment: Java, Hadoop 2.x, Apache Spark, Yarn, Pig, Hive, Sqoop, Scala, Akka, Cloudera CDH 5.12.

Extracted Gmail email data into plain text file utilizing Gmail API in Java.

Utilized Object Oriented Programming to create objects that can handle and easily manipulate the extracted data in Java.

Used a Sqoop script to load the extracted data from the RDBMS into HDFS, and performed further analysis through Hadoop.

Implemented a MapReduce job utilizing Hadoop and Yarn to extract relevant chemical data from the generated text file and uploaded it to HDFS as key-value pairs.
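The map and reduce phases of a job like this can be sketched in plain Python (the actual job was in Java on Hadoop; the chemical-name list and input lines here are purely illustrative):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (token, 1) key-value pairs for tokens of interest in one line.
    The chemical-name set is illustrative, not from the original project."""
    chemicals = {"benzene", "toluene", "acetone"}
    for word in line.lower().split():
        token = word.strip(".,;:")
        if token in chemicals:
            yield token, 1

def reduce_pairs(pairs):
    """Group key-value pairs by key and sum the values, mimicking the
    Hadoop shuffle-and-reduce phase."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {key: sum(v for _, v in group)
            for key, group in groupby(ordered, key=itemgetter(0))}

lines = ["Benzene and toluene detected.",
         "Traces of benzene remain."]
pairs = [kv for line in lines for kv in mapper(line)]
print(reduce_pairs(pairs))  # {'benzene': 2, 'toluene': 1}
```

The sort before `groupby` plays the role of Hadoop's shuffle: it brings all values for a key together so the reducer can aggregate them in one pass.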

Composed Spark jobs to perform data quality checks and created a table in Pig to hold the generated data for further analysis.

Mapped each query to ordered conversion data on a plot to better understand clusters in the data.


Project - The project solves the problem of hidden or obscured objects on the street by detecting them through 3D modeling using Machine Learning algorithms.


A kernel is built at each patch of the 3D deformable vehicle model and associated with constraints in 3D space.

Used the Single Shot multi-box Detector (SSD) and multi-scale YOLO, with multiple kernels representing several parts of each object, to form 3D deformable models of fast-moving vehicles.

Performed object detection with the YOLO method to detect people and vehicles; these 3D models help track vehicles and improve detection performance.

The application is built on GPU with CUDA/cuDNN.
