GIRIJA YERRAMSETTY
*********@*****.***
https://www.linkedin.com/in/girija-vasagiri-a1040168/
Data Scientist
PROFESSIONAL SUMMARY
Data Scientist seeking an opportunity to apply a passion for statistics and machine learning to solve complex problems and deliver valuable insights from data. Enthusiastic about explaining the value of data-driven analytics to non-technical users, and focused on spotting and solving business problems using analytics and machine learning. Efficient, versatile, and a quick learner of new technologies.
Overall 8 years of experience in the IT industry, including 5 years 7 months in Data Science and 2 years 5 months in Big Data Hadoop technologies.
Currently working as a Technical Lead at Wipro, Bangalore, on projects related to Big Data consumer cloud analytics.
Experience in data analytics and machine learning algorithms, with good knowledge and hands-on experience in R, Python, SAS Enterprise Miner, and the Dataiku platform.
Responsible for implementing data mining and statistical machine learning solutions to various business problems.
Analyze large volumes of data to generate insights and actionable recommendations to drive business growth.
Proficient in machine learning algorithms such as Decision Trees, Random Forest, Gradient Boosting, Support Vector Machines, K-Means Clustering, Naïve Bayes, and Artificial Neural Networks.
Good knowledge and understanding of statistical modeling, time series analysis, optimization, data mining, cluster analysis, and machine learning techniques and algorithms.
Worked on text analytics, predictive, and inferential modeling projects, using different algorithms to solve problems and draw insights from data.
Technically accomplished in Big Data ecosystems, with experience in the ingestion, storage, querying, processing, and analysis of big data.
In-depth knowledge of and hands-on experience with Apache Hadoop components such as HDFS, MapReduce, Hive/HiveQL, Pig, Sqoop, Oozie, and Flume.
Worked on importing and exporting data between databases such as Oracle and MySQL and HDFS/Hive using Sqoop.
Experience in building Pig scripts to extract, transform and load data onto HDFS for processing.
Experience in writing HiveQL queries to store processed data into Hive tables for analysis.
Excellent understanding and knowledge of NoSQL databases such as HBase and MongoDB.
Working knowledge of newer technologies such as Kafka, Spark, and Scala.
Knowledge of Software Development Life Cycle (SDLC).
Proficient in Core Java and object-oriented programming concepts.
Built complete ETL pipelines and machine and deep learning models in Dataiku DSS.
Experience interacting with clients to gather system requirements, specifications, and timelines; actively involved in stakeholder management.
Experience preparing descriptive analytics and visualizations using Tableau.
Experienced in providing knowledge transfer (KT) sessions to the team.
TECHNICAL SKILLS
Programming Languages: Java, C, R, Python, PySpark, Scala
Frameworks: Dataiku, RASA, LUIS, Keras, Azure, TensorFlow
Web Services: RESTful and SOAP
Web Technologies: JavaScript, HTML5, JSON, XML, MLflow
Databases: Oracle, SQL, NoSQL, MySQL, Spark SQL
Methodologies: UML
IDEs (Integrated Development Environments): PyCharm, Jupyter Notebook, Dataiku DSS
Visualization Tools: Tableau
Operating Systems: Microsoft Windows, Linux, and Unix
Big Data Ecosystems: Hadoop, MapReduce, HDFS, Zookeeper, Hive, Spark, Pig, Sqoop, Flume, Impala, Oozie, Kafka, Storm
Machine Learning: Text Mining (NLP, NER, Sentiment), Decision Trees, SVM, SVD, Naïve Bayes, KNN, K-Means, Random Forest
Deep Learning: ANN, RNN, LSTM time series, DeepFM
Statistical Learning: Linear Regression, Logistic Regression, Time Series
Domain Experience: Pharma, Government (Finance, Agriculture), Telecom, Trading, Retail, IT Hardware, and Chatbots
EDUCATIONAL QUALIFICATION
Completed a Certificate Program in Big Data Analytics & Optimization at the International School of Engineering, Hyderabad; the only program in India certified for quality, pedagogy, and assessment by the LTI of Carnegie Mellon University.
Master of Technology (MTech.) in Computer Science from JNTU, Anantapur.
Bachelor of Technology (B.Tech.) in Computer Science from JNTU, Hyderabad.
WORK EXPERIENCE
Currently working as a Technical Lead at Wipro, Bangalore, from 22 June 2020 to date.
Worked as Application Development Sr. Analyst at Accenture, Hyderabad, from 29 June 2018 to 10 January 2020.
Worked as Data Scientist at OTSI, Hyderabad, from 29 May 2017 to 27 June 2018.
Worked as Software Engineer at Adicent Information Technology Pvt. Ltd., Hyderabad, from 19 August 2013 to 25 May 2017.
PROJECTS AND RESPONSIBILITIES
TECH LEAD / DATA SCIENTIST HPI, USA 08/2021 – PRESENT
HPI Marketing Data Lake
The main goal is to create and deploy a fully integrated, sharable data lake; the work focuses on Search, Media, Partners, and Consumer Activity. Data is gathered automatically from partner sources and HP systems according to the business needs raised by the analytics teams. The project studies how HP product price changes in one period impact future periods, by looking at what has happened before and how that affects demand, using the Dataiku (DSS) workspace for data analytics. It also covers migration of the ETLs from COE_ETL to the Marketing Data Lake.
Responsibilities:
Ingesting data from sources such as orca DB, GCW, dataroma, etc.
Understanding the data and data preprocessing steps.
Handling huge volumes of data and tuning the level of partitioning.
Writing PySpark, Spark SQL, and Python code; applying visual recipes such as Prepare, Join, Split, Filter, and Sync; and sharing the ETL dataset to an S3 bucket to feed the forecasting model.
Preparing time series data for the forecasting model, applying forecasting concepts such as stationarity, horizon, trend, and seasonality (see the sketch after this project entry).
Evaluating time series forecasting model performance and accuracy.
Applying feature engineering to improve accuracy with optimized parameters.
Deploying the DSS dataset to the MMM_market data lake server.
Creating supporting documentation from templates to document code, installation plans, and test plans and cases. Producing visualizations of all products to compare trends and provide insights from the data.
Environment and Technologies: Dataiku, PySpark, Python, Spark SQL, general formulas
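A minimal sketch, outside Dataiku, of the forecasting steps described above: check stationarity, hold out a forecast horizon, fit a seasonal model, and score accuracy. The file and column names (demand.csv, ship_date, units) are hypothetical placeholders, and Holt-Winters stands in for whichever model the flow actually used.

import pandas as pd
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical weekly demand series (the seasonal component below needs
# at least two full years of weekly history).
df = pd.read_csv("demand.csv", parse_dates=["ship_date"])
series = df.set_index("ship_date")["units"].resample("W").sum()

# Stationarity check before modeling (ADF: p < 0.05 suggests stationary).
print("ADF p-value:", adfuller(series.dropna())[1])

# Hold out the last 12 weeks as the forecast horizon.
train, test = series[:-12], series[-12:]
fit = ExponentialSmoothing(train, trend="add", seasonal="add",
                           seasonal_periods=52).fit()
forecast = fit.forecast(12)

# Accuracy on the holdout: mean absolute percentage error.
mape = ((forecast - test).abs() / test).mean() * 100
print(f"MAPE over the 12-week horizon: {mape:.1f}%")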
TECH LEAD / DATA SCIENTIST HUAWEI, RUSSIA/CHINA 06/2020 – 07/2021
Huawei Ads & Video Recommendation
The main goal of the Ads & Video Recommendation system is to predict and improve CTR performance. The project proposes an integrated approach that combines feature engineering with machine learning models such as LR, FFM, FTRL, and GEM. Apps on Huawei phones (HiBoard, Browser, etc.) show ads and videos to the user, and the system decides which ads and videos to show.
Responsibilities:
Understanding the data and data preprocessing steps
Writing Hive queries and PySpark and Python code.
Running analytics using machine learning algorithms such as LR, FFM, FTRL, and Random Forest, with optimization techniques.
Training and testing the model to compute AUC, group AUC, and COPC values during exploratory analysis (see the evaluation sketch after this project entry).
Analyzing group AUC to find the top slot_id and task_id values that give a balanced COPC.
Applying Feature Engineering to give better accuracy values with optimization parameters.
Deploying the code to the Hadoop server.
Creation of supporting documentation using templates to document code, installation plans, test plans and cases.
Creating visualizations of each ad to compare trends.
Involved in providing maintenance and support for the application after its rollout.
Environment and Technologies: Hadoop, Spark, Python, PySpark, Scala, Spark SQL
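A sketch of the evaluation step above, computed with pandas and scikit-learn; the predictions file and its columns (slot_id, label, pred) are hypothetical placeholders rather than the project's real schema.

import pandas as pd
from sklearn.metrics import roc_auc_score

scores = pd.read_csv("ctr_predictions.csv")  # label in {0,1}, pred in [0,1]

# Overall ranking quality.
overall_auc = roc_auc_score(scores["label"], scores["pred"])

# Group AUC: impression-weighted average of per-slot AUCs, so one large
# slot cannot hide poor ranking quality in the others.
def group_auc(df, key="slot_id"):
    total, weight = 0.0, 0
    for _, g in df.groupby(key):
        if g["label"].nunique() < 2:   # AUC is undefined for a single class
            continue
        total += roc_auc_score(g["label"], g["pred"]) * len(g)
        weight += len(g)
    return total / weight

# COPC (clicks over predicted clicks): a calibration check; ~1.0 is balanced.
copc = scores["label"].sum() / scores["pred"].sum()
print(overall_auc, group_auc(scores), copc)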
SR. ANALYST / SR. APPLICATION DESIGNER MYWIZARD-N-DIM 11/2018 – 11/2019
Mywizard Analytics
Assisted in defining requirements and designing applications to meet business process and application requirements. Starting from data analytics is the best way to understand the current health, quality, and efficiency of processes, identify painful areas, spot improvement opportunities, and prioritize further evaluations. Where data is not available, a quick collection must be set up to get a detailed snapshot of contract processes: ticket history for the last 6 months or more, including classification against project dimensions, resolution time and effort, and problem and resolution descriptions, plus a detailed classification of effort spent against support type, rework type, line of business, team, location, activity type, etc.
Responsibilities:
Coordinating with clients to cover the functionality, technical issues, and scope of work.
Setup environments for development box.
Understanding the data and data preprocessing steps.
Writing database queries and procedures, and web scraping to extract the data.
Text analytics using machine learning (K-Means clustering); see the sketch after this project entry.
Descriptive analytics using Tableau to bring out insights from the data.
Implementing Automation Blueprint.
Predictive modeling to analyze the current and proposed service management processes by simulating multiple as-is/what-if scenarios to forecast various decision-making parameters.
Providing recommendations to the client.
Environment and Technologies: Python, Tableau
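A minimal sketch of the K-Means text-analytics step using scikit-learn; the three sample tickets are invented placeholders for the real ticket records.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

tickets = [
    "password reset request for VPN account",
    "application crashes when exporting the report",
    "access denied when opening the shared drive",
]

# TF-IDF with English stop-word removal, then K-Means over the vectors.
vectors = TfidfVectorizer(stop_words="english").fit_transform(tickets)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # one cluster id per ticket, used to group similar issues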
SR. ANALYST/DEVELOPER MY TECH HELP, ACCENTURE IND 06/2018-10/2018
My Tech Help: building chatbots and assistants with the open-source conversational AI framework RASA. The main aim of this project is to allow developers to expand chatbots and voice assistants beyond answering simple questions. Using state-of-the-art machine learning, the bots can hold contextual conversations with users.
Responsibilities:
Involved in creating technical design documents out of requirement documents.
Understanding the RASA stack.
Installing RASA on the machine.
Building the initial chatbots.
Understanding LUIS.
Creating utterances and intents, and training and testing the utterances.
Implemented the chatbot communication with the user using Python (see the sketch after this project entry).
Applied machine and deep learning algorithms such as ANN, SVM, and LSTM/RNN.
Environment and Technologies: RASA, LUIS, PYTHON
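A minimal sketch of relaying a user message to a locally running RASA server over its REST channel; the default URL, port, and availability of the REST webhook are assumptions here.

import requests

# Default REST channel of a local `rasa run` server (an assumption).
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"

def ask_bot(sender, message):
    """Send one user message and return the bot's text replies."""
    response = requests.post(RASA_URL, json={"sender": sender, "message": message})
    response.raise_for_status()
    return [reply.get("text", "") for reply in response.json()]

for line in ask_bot("demo-user", "hello"):
    print("bot:", line)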
DATA SCIENTIST CM DASHBOARD, TN 06/2017-06/2018
Datalytics -Agriculture
Our understanding is that the Government wants to keep track of asset maintenance. If assets are not checked at the correct intervals, there is a possibility that they are no longer in use; in such cases there should be an alert at regular intervals to get the assets checked. Regular checking will save a lot of money and extend asset life. In agriculture today, the main avenues for increasing crop productivity are optimizing farming practices through precision agriculture and accelerating crop improvements through effective fertilizer recommendations based on soil health.
Responsibilities:
Involved in creating technical design documents out of requirement documents.
Importing the data dump provided by the government and deploying it to an Oracle database at our end.
Identify the key variables, perform the required data cleaning, data manipulations and data preparation.
Understanding the data and preparing the necessary inputs to the data.
Collecting documents and applying SVD to the matrix to reduce dimensionality.
Calculating cosine similarity between farmers, building a farmer-similarity matrix, and applying a KNN-based machine learning algorithm (see the sketch after this project entry).
Evaluating advanced machine learning techniques such as correlation analysis for fertilizer recommendation.
Environment and Technologies: R, Python, SAS Enterprise Miner
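A minimal sketch of the similarity pipeline above: SVD for dimensionality reduction, cosine similarity between farmer profiles, then nearest neighbours for the KNN-style recommendation. The random matrix is a placeholder for the real farmer/soil features.

import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

rng = np.random.default_rng(0)
farmers = rng.random((500, 40))        # placeholder: 500 farmers x 40 features

# Reduce dimensionality with SVD, then build the farmer-similarity matrix.
reduced = TruncatedSVD(n_components=10, random_state=0).fit_transform(farmers)
similarity = cosine_similarity(reduced)  # 500 x 500

# Five nearest neighbours of farmer 0 (index 0 itself is skipped); their
# fertilizer choices would drive the KNN-style recommendation.
neighbours = np.argsort(similarity[0])[::-1][1:6]
print(neighbours)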
DATA SCIENTIST LIFELINE HEALTH SERVICES, USA 04/2016-05/2017
Text Classification Problem
Lifeline Health Services is a top-ranked healthcare provider in the USA with stellar credentials, providing high-quality care with a focus on end-to-end healthcare services. These services range from basic medical diagnostics to critical emergency services. The provider follows a ticketing system for all telephonic calls received across all departments. Calls to the provider can concern New Appointments, Cancellations, Lab Queries, Medical Refills, Insurance, General Doctor Advice, etc. The tickets contain a summary and a description of the call written by various staff members with no standard text guidelines. The challenge is to classify each ticket into the appropriate category based on the text in its summary and description.
Responsibilities:
Involved in creating technical design documents out of requirement documents.
Identify the key variables, perform the required data cleaning, data manipulations and data preparation.
Understanding the data and preparing the necessary inputs to the data.
Document collection, preprocessing, and tokenization.
Removing stop words, stemming words, and indexing.
Applying a confusion matrix to measure accuracy and choose a threshold value.
Training and evaluating models using SVM, Decision Trees, and Naïve Bayes (see the sketch after this project entry).
Environment and Technologies: R, Python
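A minimal sketch of the classification pipeline above, with TF-IDF standing in for the tokenization, stop-word removal, stemming, and indexing steps, and Naïve Bayes as one of the models tried. The tickets and categories are invented placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix

texts = ["need to cancel my appointment", "refill for blood pressure meds",
         "question about my lab results", "book a new appointment next week"] * 25
labels = ["Cancellation", "Medical Refills", "Lab Queries", "New Appointment"] * 25

# Tokenize with stop-word removal and vectorize.
X = TfidfVectorizer(stop_words="english").fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2,
                                                    random_state=0)

model = MultinomialNB().fit(X_train, y_train)
print(confusion_matrix(y_test, model.predict(X_test)))  # per-category errors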
HADOOP DEVELOPER CRIME DEPARTMENT, USA 08/2013-11/2014
Crime incident Analysis
This project analyzes crime incidents that occurred in the city of San Francisco over the last 3 years. The application employs Hadoop ecosystem technologies to analyze semi-structured data and find the relative frequencies of different types of crime incidents. Given a time and location, it must predict the category of crime that occurred, as well as crime occurrence frequency as a function of the day of the week and of the hour of the day.
Responsibilities:
Involved in creating technical design documents out of requirement documents.
Identify the key variables, perform the required data cleaning, data manipulations and data preparation.
Understanding the data and preparing the necessary inputs to the data.
Storing the relational data in an RDBMS and using Sqoop to import it into Hadoop to compute relative frequencies of different types of crime incidents.
Writing MapReduce jobs in Java for data cleaning and preprocessing, and loading data from the Linux local file system into HDFS.
Defining job flows, managing and reviewing log files, and loading and transforming large sets of structured and semi-structured data.
Creating components such as Hive UDFs to cover functionality missing from Hive, and developing Hive queries for data analysis.
Setting Hive performance-tuning parameters to increase overall query performance, and migrating data from Hive to MySQL using Sqoop (see the sketch after this project entry).
Environment and Technologies: Hadoop, HDFS, Hive, Sqoop, MySQL
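The frequency analysis above ran as HiveQL over data imported into HDFS with Sqoop; the same aggregations are sketched here in pandas for illustration, with a hypothetical incidents.csv and column names.

import pandas as pd

incidents = pd.read_csv("incidents.csv", parse_dates=["occurred_at"])

# Relative frequencies of the different crime categories.
category_freq = incidents["category"].value_counts(normalize=True)

# Occurrence frequency by day of the week and by hour of the day.
by_day = incidents.groupby(incidents["occurred_at"].dt.day_name()).size()
by_hour = incidents.groupby(incidents["occurred_at"].dt.hour).size()
print(category_freq.head(), by_day, by_hour, sep="\n")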
PARTICIPATION AND ACHIEVEMENTS
Completed the IBM Watson Application Developer certification.
Attended a two-day workshop on Network Security conducted by IIT Delhi.
Published a paper, "Inpainting with Extreme Firmness," at an international conference conducted by WARSE in 2014.
Holding life membership in ISTE (Indian Society for Technical Education).
Currently working on a journal paper, "Constructing a System for Sentiment Analysis for Emotional Analysis of Music in Social Media using Machine Learning Techniques."
Completed the Dataiku Core Designer and Machine Learning certifications.
Received multiple appreciations from clients for project deliverables.