Hadoop Developer Data Scientist

Location:

Philadelphia, PA

Posted:

June 03, 2022

Resume:

GIRIJA YERRAMSETTY

801-***-****

*********@*****.***

https://www.linkedin.com/in/girija-vasagiri-a1040168/ Data scientist

PROFESSIONAL SUMMARY

Data scientist seeking an opportunity to apply a passion for using statistics and machine learning to solve complex problems and deliver valuable insights from data. Enthusiastic about explaining the value of data-driven analytics to non-technical users and focused on spotting and solving business problems using analytics and machine learning. Efficient, versatile, and a quick learner of new technologies.

Overall 8 years of experience in the IT industry, including 5 years 7 months in Data Science and 2 years 5 months in Big Data Hadoop technology.

Currently working as Technical Lead at Wipro, Bangalore, on projects related to Big Data consumer cloud analytics.

Experience in data analytics and machine learning algorithms, with good knowledge of and hands-on experience in R, Python, SAS Enterprise Miner, and the Dataiku platform.

Responsible for implementing data mining and statistical machine learning solutions to various business problems.

Analyze large volumes of data to generate insights and actionable recommendations to drive business growth.

Proficient in machine learning algorithms such as Decision Trees, Random Forest, Gradient Boosting, Support Vector Machines, K-Means Clustering, Naïve Bayes, and Artificial Neural Networks.

Good knowledge and understanding of statistical modeling, time series analysis, optimization, data mining, machine learning techniques and algorithms, and cluster analysis.

Worked on projects involving text analytics and predictive and inferential analytics, using different algorithms to solve problems and extract insights from data.

Technically accomplished in Big Data ecosystems, with experience in ingestion, storage, querying, processing, and analysis of Big Data.

In-depth knowledge of and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, HiveQL, Pig, Hive, Sqoop, Oozie, and Flume.

Worked on importing and exporting data between databases such as Oracle and MySQL and HDFS and Hive using Sqoop.

Experience in building Pig scripts to extract, transform and load data onto HDFS for processing.

Experience in writing HiveQL queries to store processed data into Hive tables for analysis.

Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.

Knowledge of recent technologies and languages such as Kafka, Spark, and Scala.

Knowledge of Software Development Life Cycle (SDLC).

Proficient in Core Java and object-oriented programming concepts.

Built complete ETL pipelines and machine learning and deep learning models in Dataiku DSS.

Experience interacting with clients to collect system requirements, specifications, and timelines; actively involved in stakeholder management.

Experience in preparing descriptive analytics using the Tableau visualization tool.

Experienced in providing knowledge transfer (KT) sessions to the team.

TECHNICAL SKILLS

Programming Languages

Java, C, R, Python, PySpark, Scala

Frameworks

Dataiku, Rasa, LUIS, Keras, Azure, TensorFlow

Web Service

Web Service (RESTful and SOAP)

Web Technologies

JavaScript, HTML5, JSON, XML, MFlow

Databases

Oracle, SQL, NoSQL, MySQL, Spark SQL

Methodologies

UML

IDE (Integrated Development Environment)

PyCharm, Jupyter Notebook, Dataiku DSS

Visualization Tools

Tableau

Operating Systems

Microsoft Windows, Linux and Unix

Bigdata Ecosystems

Hadoop, MapReduce, HDFS, ZooKeeper, Hive, Spark, Pig, Sqoop, Flume, Impala, Oozie, Kafka, Storm

Machine Learning

Text Mining (NLP, NER, Sentiment), Decision Trees, SVM, SVD, Naïve Bayes, KNN, K-Means, Random Forest

Deep Learning

ANN, RNN, LSTM time series, Deep FM

Statistical Learning

Linear Regression, Logistic Regression, Time Series

Domain Experience

Pharma, Government (Finance, Agriculture), Telecom, Trading, Retail, IT hardware and Chatbot

EDUCATIONAL QUALIFICATION

Completed the Certificate Program in Big Data Analytics & Optimization at International School of Engineering, Hyderabad. It is the only program in India certified for quality, pedagogy, and assessment by the LTI of Carnegie Mellon University.

Master of Technology (M.Tech.) in Computer Science from JNTU, Anantapur.

Bachelor of Technology (B.Tech.) in Computer Science from JNTU, Hyderabad.

WORK EXPERIENCE

Currently working as Technical Lead at Wipro, Bangalore, from 22nd June 2020 to date.

Worked as Application Development Sr. Analyst at Accenture, Hyderabad, from 29th June 2018 to 10th January 2020.

Worked as Data Scientist at OTSI, Hyderabad, from 29th May 2017 to 27th June 2018.

Worked as Software Engineer at Adicent Information Technology Pvt. Ltd., Hyderabad, from 19th August 2013 to 25th May 2017.

PROJECTS AND RESPONSIBILITIES

TECH LEAD /DATA SCIENTIST HPI, USA 08/2021 – PRESENT

HPI Marketing Data Lake

The main goal is to create and deploy a fully integrated, sharable data lake focused on Search, Media, Partners, and Consumer Activity. Data is gathered automatically from partner sources and HP systems according to the business needs requested by the analytics teams. The project also aims to understand how price changes for HP products in one period affect future periods, by looking at what has happened before and how it affected demand, using the Dataiku workspace for data analytics (DSS). It includes migration of the ETLs from COE_ETL to the Marketing Data Lake.

Responsibilities:

Ingesting data from sources such as orca DB, GCW, dataroma, etc.

Understanding the data and data preprocessing steps.

Handling huge amounts of data and improving the partitioning.

Writing PySpark, Spark SQL, and Python code; applying visual recipes such as Prepare, Join, Split, Filter, and Sync; and sharing the ETL dataset to an S3 bucket to implement the forecasting model.

Preparing time series data for the forecasting model and applying time series forecasting concepts such as stationarity, horizon, trend, and seasonality.

Evaluating time series forecasting model performance and accuracy.

Applying feature engineering and parameter optimization to improve accuracy.

Deploying the DSS dataset to the MMM_market data lake server.

Creating supporting documentation using templates to document code, installation plans, test plans, and test cases.

Creating visualizations of all products to compare trends and provide insights from the data.

Environment and Technologies: Dataiku, PySpark, Python, Spark SQL, General formulas

TECH LEAD / DATA SCIENTIST HUAWEI, RUSSIA/CHINA 06/2020 - 07/2021

Huawei Ads & Video Recommendation

The main goal of the Ads & Video Recommendation system is to predict and improve click-through rate (CTR). The project takes an integrated approach, combining feature engineering with machine learning models such as LR, FFM, FTRL, and GEM. Apps on Huawei phones (Hi board, Browser, etc.) show ads and videos to the user, and the system decides which ads and videos to show.

Responsibilities:

Understanding the data and data preprocessing steps

Writing Hive queries and PySpark and Python code.

Analytics using machine learning algorithms such as LR, FFM, FTRL, and Random Forest, along with optimization techniques.

Training and testing the model to compute AUC, group AUC, and COPC values during exploratory analysis.

Analyzing group AUC to find the top slots_id and task_id values that give a balanced COPC.

Applying feature engineering and parameter optimization to improve accuracy.

Deploying the code to the Hadoop server.

Creating supporting documentation using templates to document code, installation plans, test plans, and test cases.

Creating visualizations of each ad to compare trends.

Involved in providing maintenance and support for the application after its rollout.

Environment and Technologies: Hadoop, Spark, Python, PySpark, Scala, Spark SQL

SR. ANALYST / SR. APPLICATION DESIGNER MYWIZARD-N-DIM 11/2018 - 11/2019

Mywizard Analytics

Assisted in defining requirements and designing applications to meet business process and application requirements. Starting from data analytics is the best way to understand the health, quality, and efficiency of current processes, identify pain points, spot improvement opportunities, and prioritize further evaluations. If data is not available, a quick collection must be set up to obtain a detailed snapshot of contract processes. The inputs are the historical ticket record for the last 6 months or more, including classification against project dimensions, resolution time and effort, and problem and resolution descriptions, together with a detailed classification of effort spent by support type, rework type, line of business, team, location, activity type, etc.

Responsibilities:

Coordinating with clients to cover the functionality, technical issues, and scope of work.

Setting up environments for the development box.

Understanding the data and data preprocessing steps.

Writing database queries, procedures, and web scraping scripts to extract the data.

Text analytics using a machine learning technique (K-Means).

Descriptive analytics using Tableau to draw insights from the data.

Implementing the Automation Blueprint.

Predictive modeling to analyze the current and proposed service management process by simulating multiple as-is/what-if scenarios to forecast various decision-making parameters.

Providing recommendations to the client.

Environment and Technologies: Python, Tableau

SR. ANALYST/DEVELOPER MY TECH HELP, ACCENTURE IND 06/2018-10/2018

My Tech Help: building chatbots and assistants with an open-source conversational AI framework. The main aim of this project is to allow developers to extend chatbots and voice assistants beyond answering simple questions. Using state-of-the-art machine learning, the bots can hold contextual conversations with users.

Responsibilities:

Involved in creating technical design documents out of requirement documents.

Understanding the RASA stack.

Installing RASA on the machine.

Building chatbots.

Understanding LUIS.

Creating utterances and intents, and training and testing the utterances.

Implemented the chatbot communication with the user using Python.

Applied machine learning and deep learning algorithms such as ANN, SVM, and LSTM RNN.

Environment and Technologies: Rasa, LUIS, Python

DATA SCIENTIST CM DASHBOARD, TN 06/2017-06/2018

Datalytics - Agriculture

Our understanding is that the Government wants to keep a maintenance track of assets. If these assets are not checked at the correct intervals, they may fall out of use. In such cases there should be an alert at regular intervals to get the assets checked. Regular checking will help save a lot of money and increase asset life. In agriculture today, the main avenues for increasing crop productivity are optimizing farming practices through precision agriculture and accelerating crop improvements through effective fertilizer recommendations based on the health of the soil.

Responsibilities:

Involved in creating technical design documents out of requirement documents.

Importing the data dump provided by the government and loading it into an Oracle database at our end.

Identify the key variables, perform the required data cleaning, data manipulations and data preparation.

Understanding the data and preparing the necessary inputs to the data.

Collected documents and applied SVD to the matrix to reduce dimensionality.

Calculated cosine similarity between farmers to build a farmer similarity matrix. Applied the KNN machine learning algorithm.

Evaluating advanced machine learning techniques such as correlation analysis and fertilizer recommendation.

Environment and Technologies: R, Python, SAS Enterprise Miner

DATA SCIENTIST LIFELINE HEALTH SERVICES, USA 04/2016-05/2017

Text Classification Problem

Lifeline Health Services is a top-ranked healthcare provider in the USA with stellar credentials, providing high-quality care with a focus on end-to-end healthcare services. The services range from basic medical diagnostics to critical emergency services. The provider follows a ticketing system for all telephone calls received across all departments. Calls to the provider can be for New Appointment, Cancellation, Lab Queries, Medical Refills, Insurance Related, General Doctor Advice, etc. Each ticket contains a summary and a description of the call, written by various staff members with no standard text guidelines. The challenge is to classify each ticket into the appropriate category based on the text in the summary and description of the call.

Responsibilities:

Involved in creating technical design documents out of requirement documents.

Identify the key variables, perform the required data cleaning, data manipulations and data preparation.

Understanding the data and preparing the necessary inputs to the data.

Document collection, preprocessing, and tokenization.

Removing stop words, stemming words, and indexing.

Applied a confusion matrix to find the accuracy threshold value.

Training and evaluating models using SVM, Decision Trees, and Naïve Bayes.

Environment and Technologies: R, Python

HADOOP DEVELOPER CRIME DEPARTMENT, USA 08/2013-11/2014

Crime incident Analysis

This project covers crime incidents that happened in the city of San Francisco over the last 3 years. The application employs Hadoop ecosystem technologies to analyze semi-structured data and find the relative frequencies of different types of crime incidents. Given a time and location, it must predict the category of crime that occurred. It also reports crime occurrence frequency as a function of day of the week and as a function of hour of the day.

Responsibilities:

Involved in creating technical design documents out of requirement documents.

Identify the key variables, perform the required data cleaning, data manipulations and data preparation.

Understanding the data and preparing the necessary inputs to the data.

Stored the relational data in an RDBMS and used Sqoop to import it into Hadoop. Computed relative frequencies of different types of crime incidents.

Writing MapReduce jobs in Java for data cleaning and preprocessing, and loading data from the Linux local file system to HDFS.

Defining job flows, managing and reviewing log files, and loading and transforming large sets of structured and semi-structured data.

Created components such as Hive UDFs for functionality missing in Hive for analytics. Developed Hive queries for the analysis of data.

Set performance tuning parameters in Hive queries to increase overall performance. Migrated the data from Hive to MySQL using Sqoop.

Environment and Technologies: Hadoop, HDFS, HIVE, SQOOP, MYSQL

Participation/Achievements

Completed the IBM Watson Application Developer certification.

Attended a two-day workshop on Network Security conducted by IIT, Delhi.

Published a paper, “Inpainting with Extreme Firmness”, at an international conference conducted in 2014 by WARSE.

Holding life membership in ISTE (Indian Society for Technical Education).

Currently working on a journal paper on "Constructing a System for Sentiment Analysis for Emotional Analysis of Music in Social media using Machine Learning Techniques".

Completed the Dataiku Core Designer certification and a machine learning certification.

Received multiple appreciations from clients for project deliverables.
