Data Python

Location: San Leandro, CA

Posted: April 05, 2021

Resume:

Venu K Tangirala

********@*****.*** linkedin.com/in/venuktan/ 510-***-****

Summary

** ***** *********** ********** **** 9 years in Data Science and Big Data.

Extensive experience in data analytics and machine learning of various data sets.

Extensive experience in feature selection and data modeling.

Extensive experience in training Deep Learning models.

Extensive experience in Machine Learning with Python and R.

Extensive experience in Convolutional Neural Networks (CNN).

Extensive experience in Data Visualization with Python and R.

Expertise in Deep Learning with TensorFlow, Keras and Theano.

Extensive experience in using big data tools like Spark, Hadoop, Cassandra and Elasticsearch.

In-depth understanding of HDFS and the MapReduce framework.

Worked with Apache Spark in AWS, local and Databricks deployments.

Extensive experience with Cassandra and Elasticsearch data modelling.

Experience in deploying scalable Hadoop clusters in the cloud on AWS (S3, EMR, EC2).

Experience setting up time series analysis on Cassandra.

Comfortable working with scalable data sets on a variety of platforms.

Experience in designing and developing applications spanning the full software development life cycle (SDLC), from writing functional specifications through design, documentation, unit testing and support.

SKILLS

Programming

Python, R, Java, Scala, C++, C

Datastores

MySQL, Hive, Cassandra, DynamoDB, Elasticsearch

ML Big Data Tools

TensorFlow, PyTorch, Apache Spark, Deep Learning, Turi/GraphLab, Theano, Keras, Hadoop, MapReduce, Cassandra, Kafka, Hive, ZooKeeper, EMR, Caffe, Natural language processing (NLP), Ray, Modin, Kubeflow

Misc

S3, VMware, IntelliJ, Maven, SBT, Machine learning, Recommendation Engine, GraphX, AWS, Unix, Git, EC2, Distributed computing, Object-oriented programming (OOP)

EXPERIENCE

Adaptive Insights-Workday, CA Staff Data Scientist

Python, Deep Learning, TensorFlow, Keras, Docker, Kubeflow, Kubernetes, Airflow, Java April 2018-Present

Adaptive Insights has a financial planning and sales planning product.

Apply machine learning to time series financial and sales data.

Build ETL pipelines with Airflow and Apache Spark.

Build LSTM time series models based on historical data and make predictions for the future.

Write Dockerfiles to productionize the code.

Compose Kubernetes YAML manifests for QA and production deployments.

Build Temporal Convolutional Networks (TCN) for time series prediction.

Time series based anomaly detection for financial data.

Time series data visualization with Streamlit in Python.

Cyngn, CA Perception Lead, AI Scientist

Python, TensorFlow, Deep Learning, Spark, ROS Feb 2017-March 2018

Cyngn is in the autonomous driving space.

Built models for recognizing objects in images; worked with VGG-16 based models like YOLO, Single Shot Detector, SqueezeDet and MultiNet.

Worked with image segmentation models like SegNet, Mask R-CNN and MultiNet.

Lane detection with classical computer vision techniques.

Deployed these models for edge processing on the Nvidia TX2.

Leeo, CA Data Scientist

Python, Machine Learning, Spark, Hadoop, R, TensorFlow, AWS, OpenTSDB July 2014-Feb 2017

Leeo is in the IoT space, building devices for home automation.

Built an IoT device that listens for the sound of smoke, CO and water alarms.

Used a Random Forest classifier as the audio predictor.

Used FFT and audio spectrograms as features for the machine learning model.

Built an image similarity and classification system with deep learning in TensorFlow and Keras.

Used Convolutional Neural Networks (CNN) for model building on dual Titan X GPUs.

Used a pre-built VGG-16 CNN model in Keras for image classification.

Built a temperature and humidity calibration system.

Used Elasticsearch with Spark for searching and indexing logs.

Used a linear regression model for data calibration.

Used OpenTSDB for time series data logging and visualization.

nFlate, CA Co-Founder, Data Science Architect

TensorFlow, Keras, Spark, Python, Machine Learning, Turi/GraphLab, AWS, DynamoDB Dec 2013-Mar 2018

Built an image retrieval service based on similar images for searching products

Applied ResNet transfer learning from an ImageNet model to a custom dataset.

Used Approximate Nearest Neighbors (ANN) to find the items most similar to a given item.

Responsible for building recommendation algorithms such as "Frequently bought together", "Similar items" and "People who bought this also bought", based on collaborative filtering.

Built these recommendations on Apache Spark with Alternating Least Squares (ALS).

Extracted the dominant colors in an image with k-means clustering.

Ran the dominant color extractor code on Spark for scalability.

Used Elasticsearch for searching products by color.

Made sure these jobs ran as daily batches in the AWS cloud on EMR.

Statistical simulation of data with R.

Cloudwick, CA Data Engineer

Spark, Hadoop, Mahout, Python, Java, Amazon Elastic MapReduce, R Apr 2013-July 2014

Used Mahout for k-means clustering of scattered photon data points.

Implemented NameNode High Availability (HA).

Wrote MapReduce code to clean and transform data loaded in HDFS.

Imported data from open data sources on S3 and from private clusters.

Built an automated data pipeline to import and pre-process data.

Architected Cassandra tables based on primary key and clustering key.

Used the Hive data warehouse tool to analyze the unified historic data in HDFS to identify issues and behavioral patterns.

Created Hive tables as internal or external tables, as requirements dictated, with static and dynamic partitions for efficiency.

Used the RegEx, JSON and Avro SerDes packaged with Hive for serialization and de-serialization to parse the contents of streamed log data.

Northwestern Polytechnic University, CA Doctoral Student

A Recommendation System based on Ratings and Textual Reviews July 2010-April 2015

Built a Collaborative Filtering Based Recommendation System with reviews.

Used textual features to feed to the CF algorithm for better prediction.

Used Latent Semantic Indexing to obtain lower-dimensional latent features of the textual reviews.

Used Python NLTK for text cleaning.

Used ALS for Matrix Factorization.

Ran this algorithm on a 10-node cluster to test for scalability.

Lawrence Berkeley National Lab, Berkeley, CA Research Assistant

Parallel Computing (C++, Python, Boost Python, Linux, Inter Process Communication, MPI) Sep 2010-Apr 2012

Analyzed massive data in real time, by parallelizing processes, from the Linac Coherent Light Source (LCLS) at the SLAC National Accelerator Laboratory, home of the world’s longest linear accelerator.

Simulated and analyzed data from beam lines to understand various applications.

IBM, Boulder, CO

Shared ID Boarding Tool, SIBT (Java, UNIX, DB2, Servlets, JSP, RAD, CMVC) Aug 2009-Sep 2010

Geometrics Inc., San Jose, CA

Fourier Transform (Java, JDBC, SQL, AWT) Dec 2007-Dec 2008

Database Management (ACT! 6.0, Microsoft BCM, SQL 2005)

EDUCATION

Doctorate in Computer Engineering (2015)

Northwestern Polytechnic University, CA

M.S. in Computer Science (2008)

Northwestern Polytechnic University, CA

B.Tech. Electronics & Communications (2007)

Jawaharlal Nehru Technological University, Hyderabad

Cloudera Certified Hadoop Developer (2013)

Cloudera

Cloudera Certified Hadoop Administrator (2013)

Cloudera

Datastax Certified Cassandra Developer (2013)

Datastax

Elastic Search Developer (2016)

Elastic

PATENTS

Tangirala, Venu. 2018. Calibrating an environmental monitoring device. U.S. Patent Application 10026304, filed January 2015.

Tangirala, Venu. 2019. Prediction model training using detected anomalies. U.S. Patent Application 16/601,309, filed September 2019. Patent Pending

BLOG POSTS AND PUBLICATIONS

Venu Tangirala; Vectorized intersection over union (iou) in numpy and tensorflow; 03/02/2018; https://venuktan.wordpress.com/2018/03/02/vectorized-intersection-over-union-iou-in-numpy-and-tensor-flow/

Venu Tangirala; Setting up mahout and running recommender job; 12/28/2012; https://venuktan.wordpress.com/2012/12/28/setting-up-mahout-and-running-recommender-job/

Venu Tangirala; Running jobs on emr with data on s3; 12/27/2012; https://venuktan.wordpress.com/2012/12/27/running-jobs-on-emr-with-data-on-s3/

Venu Tangirala; Wordcount mapreduce from command line; 11/19/2012; https://venuktan.wordpress.com/2012/11/19/wordcount-mapreduce-from-command-line/

Venu Tangirala; Wordcount map reduce on Hadoop eclipse plugin; 11/19/2012; https://venuktan.wordpress.com/2012/11/19/wordcount-mapreduce-hadoop-eclipse-plugin/
