Data Engineer

Location:

Chandler, AZ

Posted:

November 18, 2020


Resume:

Yashaswini Vishwanath

Atlanta, GA +1-470-***-**** **********@*****.***

https://www.linkedin.com/in/hvyashbhat/

PROFESSIONAL SUMMARY

Machine Learning professional with almost 6 years of experience in quantitative research, dataset acquisition, data cleaning to isolate target data, feature engineering with statistical techniques, exploratory data analysis, building machine learning algorithms for predictive models, and designing visualizations that support business profitability.

Strong skills in data extraction, data cleaning, data loading, statistical data analysis, exploratory data analysis, data wrangling and predictive modeling using R and Python, and in data visualization using Power BI.

Profound knowledge in Machine Learning Algorithms like Linear, Non-linear and Logistic Regression, SVR, Reinforcement Learning, Natural Language Processing, Fuzzy Logic, Random forests, Ensemble Methods, Decision tree, Gradient-Boosting, K-NN, SVM, Naïve Bayes, Clustering (K-means), Deep Learning.

Proficient in Machine Learning algorithms and Predictive Modeling including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Neural Networks, Random Forest, Ensemble Models, SVM, KNN and K-means clustering.

Solid experience in Deep Learning techniques with Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), LSTM, Hierarchical RNN and different feature-extractor architectures such as EfficientNet, ResNet, AlexNet, VGG and Darknet.

Experience with object detection algorithms such as YOLO v4 for road-sign and traffic-sign localization and detection, implemented in TensorFlow and trained via transfer learning on the LISA dataset.

Excellent proficiency in model validation and optimization with Model selection, Parameter tuning and K-fold cross validation.

Strong knowledge of Docker containers, used in projects for ease of replication across various cloud platforms.

Multi-GPU training of deep learning models on NVIDIA machines with CUDA.

Strong skills in statistical methodologies such as Hypothesis Testing and Principal Component Analysis (PCA).

Proficient with Python 3.x, including NumPy, Scikit-learn, Pandas, Matplotlib and Seaborn.

Proficient at data visualization tools such as Tableau, Python Matplotlib, GGPlot and Seaborn.

Experienced in writing SQL queries for various RDBMS such as Microsoft SQL Server, MySQL, PostgreSQL, Teradata and Oracle, and in using NoSQL databases such as MongoDB, MongoDB GridFS, HBase and Cassandra to handle unstructured data.

Solid understanding of RDBMS concepts, including normalization, and skilled in creating database objects such as tables, views, stored procedures, triggers, row-level audit tables, cursors, indexes and user-defined datatypes.

Skilled in investigating and analyzing large databases across Microsoft Azure, MongoDB, Oracle, SQL Server and DB2.

Rapidly evaluated Deep Learning frameworks, tools, techniques and approaches for implementing and building data ingestion pipelines for neural networks, including CNNs and RNNs such as LSTMs, using TensorFlow and Keras.

Experienced with data science and machine learning libraries such as Scikit-learn, OpenCV, NumPy, SciPy, Matplotlib, Tesseract, Pandas, Seaborn, Bokeh, TensorFlow, Theano and Keras.

Comfortable with the Deep Learning frameworks TensorFlow, Theano, Keras, PyTorch and Chainer.

Experienced with Big Data ecosystems such as Apache Hadoop, MapReduce, Spark, HDFS architecture, HBase and Hive.

Increased customer visibility by delivering real-time insights to salespeople and sales managers using Tableau, QlikView, Power BI, Matplotlib, ggplot2 and Bokeh, boosting revenue by 10%.

Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.

Strong understanding of Amazon Web Services (AWS) processes and concepts, including S3, EC2 and Amazon RDS. Experience developing logical data architecture in adherence to enterprise architecture.

Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales and OLAP reporting.

Experience with the Software Development Life Cycle (SDLC), Git, Agile methodology and the Scrum process. Strong business sense and ability to communicate data insights to both technical and nontechnical clients.

Works effectively in fast-paced, multitasking environments, both independently and as part of a collaborative team. Motivated, enthusiastic learner.

SKILLS

Databases

SQL Server, MS Access, AWS RDS, MySQL 5.5/5.6, Microsoft SQL, PostgreSQL.

NoSQL Databases

MongoDB 3.x, Hadoop HBase 0.98.

Programming Languages

Python 2.x/3.x, R 3.x, SQL, Swift, Scala, C/C++, MATLAB, Java, JavaScript.

Cloud Technologies

AWS, Microsoft Azure, GCP, Docker.

Querying Languages

SQL, NoSQL, PostgreSQL, MySQL, Microsoft SQL.

Markup languages

XML, HTML5, DHTML, XSLT, XPath, XQuery and UML.

Other web skills

CSS, JSON, Python ORM, Flask and Django.

Deployment Tools

Anaconda Enterprise v5, R-Studio, Azure Machine Learning Studio, AWS Lambda.

Deep Learning Frameworks

TensorFlow, PyTorch, Chainer, Keras.

Testing and Version control Tools

JIRA, HP Quality Center ALM, TestRail, Git.

Scalable Data Tools

Hadoop, Hive 1.x/2.x, Apache Spark 2.x, MapReduce.

Operating Systems

Linux, Unix, Ubuntu, Debian, CentOS, Windows, macOS.

Reporting & Visualization

Tableau, Matplotlib, Seaborn, ggplot, Microsoft VoTT, Microsoft Power BI, QlikView.

EXPERIENCE

Computer Vision Engineer - Vouched, USA May 2020 - August 2020

●Researched and developed OCR technology for scene text detection and recognition for document verification. Performed multi-GPU training on NVIDIA GPUs, running the training process inside a Docker container and using tmux sessions to keep long runs persistent.

●Researched network construction, NLP and model scaling; used EfficientNet as a feature extractor for text recognition. CNNs were used to localize textual patterns (lines, words, characters), and an LSTM to learn the recognition part of the network.

●Built a synthetic dataset customized to ID documents with embedded real-world noise. Achieved a 6% increase in end-to-end word-wise accuracy with an 80:20 blend of real and synthetic training data.

●Performed exploratory analysis, image processing and data augmentation to make the large dataset trainable, using Python libraries such as NumPy, Scikit-learn and OpenCV. Visualized the data with libraries such as Seaborn, Matplotlib and ggplot.

●Built and trained models in Chainer, whose underlying CuPy library makes GPU training easier. Multi-GPU training used Chainer's ParallelUpdater, which performs data-parallel computation.

●Used MongoDB as document storage for the large model. Explored GridFS for larger model sizes.

Graduate Research Assistant - Georgia State University, USA Jan 2019 - July 2020

●Graphology: Graphology Tool - Built computer vision software that performs handwriting analysis on images of handwritten text to reveal personality traits. The pattern detection algorithms are written in Python and rely mainly on OpenCV for finer-grained analysis of textual image patterns.

●Evaluated at 90% accuracy using user evaluation as the ground truth.

●Used NumPy and Pandas for initial exploration, OpenCV for image processing, and Matplotlib, Seaborn and ggplot for data visualization. PyTesseract was used for initial OCR.

●Hosted the graphology tool at: Graphology Site, and further developed an Android app. The tech stack is Bootstrap 4, Python Flask for REST services and HTML rendering, and ReactJS for the front-end GUI. The site is hosted on Azure with continuous integration/deployment enabled. The web application is containerized with Docker for ease of deployment across cloud platforms.

●Vehicular communication: Road and traffic signs detection - Worked on the scene perception module to detect US road signs and automobiles’ brake lights. Implemented with YOLO v4 (darkflow version), trained on the LISA dataset. Microsoft VoTT was used for annotations.

●Image/video analysis of scene frames was done with OpenCV as a preprocessing step. Segmentation was explored with U-Net. Achieved 96% accuracy on the validation set (US roads, collected by driving in downtown and suburban areas).

●Stored models in Amazon S3 buckets and used AWS Lambda to trigger the relevant OCR core services.

Computer Vision Engineer - Admatic Solutions, India Aug 2016 - Dec 2018

●Applied scene text detection and recognition to build a whiteboard character recognition system based on an LSTM.

●Image and video processing of large datasets to make them trainable for image classification tasks.

●Automatic License Plate Recognition (ALPR) using YOLO v2 transfer learning, implemented in TensorFlow.

●ALPR tiny-YOLO implementation on Android and iOS mobile devices, controlling model size using TensorFlow Lite.

Software Engineer - Larsen and Toubro Infotech, India June 2015 - July 2016

●Developed RESTful web services for a data warehousing application to display real-time events on the flight operations dashboard, using Java, Spring Boot and Spring MVC.

●Implemented EhCache to update flight details dynamically for every real-time event, reducing the cost of querying DynamoDB each time.

●Authored unit and integration tests with JUnit and Mockito, used Postman during web service development, and wrote test suites for web service methods.

COURSE PROJECTS

Smart Whiteboard - Advanced Machine Learning

●Character recognition system built with a Hierarchical RNN for solving simple mathematical equations. The CROHME and HAMEX datasets were used for training on AWS EC2 instances.

●A dataset of equations collected from users via an Android app was used as the validation set, with an accuracy of ~95%.

●The Android app captured on-screen strokes and saved them as uint8 JPEG images.

Full-stack Web Application - Databases and Web

●University Application Portal - Built three web applications for university application portals using REST Web services, React, HTML5, JavaScript, Python Flask and PostgreSQL.

●Clinic appointments tracking - Web application for registering and tracking new/existing patients and managing appointments for a clinic. Tech: OpenAPI, MongoDB, REST Web services with Python Flask.

CERTIFICATIONS

●Machine Learning course and Deep Learning Specialization on Coursera by Andrew Ng

●Data Science - edWisor.com

●Deep Learning on Udacity by Sebastian Thrun

EDUCATION

M.S. in Computer Science - Georgia State University Jan 2019 - August 2020

Coursework: Databases and Web, Advanced Operating Systems, Parallel Algorithms, Computer Vision, Machine Learning, Deep Learning.

B.S. in Computer Science - Don Bosco Institute of Technology Sept 2011 - June 2015

Coursework: Data Structures and Algorithms, Databases and Management Systems, Web Programming.
