Yogitha Yadhav Chinni
https://www.linkedin.com/in/yogithayadhavchinni-537b129a
**** ********* ***,********* ***********@**.***.*** TN, 37932 +1-925-***-****
CAREER OBJECTIVE
As a passionate data science thinker, I believe an organization can optimize its resources by precisely tapping the immense potential of its data. With my technical and analytical abilities, I find it rewarding to analyze large volumes of data in ways that benefit the organization.
SUMMARY
Experience as a data analyst retrieving, analyzing, and reporting data to management.
Analyzed and compared current data against historical data using ETL.
Worked on Twitter sentiment analysis projects with MySQL and current versions of Python packages such as tweepy, NumPy, and SciPy.
Hands-on experience in text mining using R.
Designed an Android application with an embedded email client and an SQLite database.
Experience retrieving JSON data from Twitter and using Hive to analyze the number of tweets.
Used an SVM classifier to analyze text data.
Have good working knowledge of HTML5 and CSS3.
Contributed efficiently to software development life cycle processes, including analysis and design.
WORK EXPERIENCE
Jewelry Television (JTV), Knoxville, TN May 2017
Data Scientist Intern (Enterprise Data Warehouse team)
Developed dashboards using R and machine learning algorithms, then determined the appropriate design of the resulting reports.
Integrated R into MicroStrategy to expose metrics computed by more sophisticated and detailed models than those natively available in the tool.
Performed quality assurance on all deliverables and presented them to management and executives.
RKS Engineers and Consultants, India Jul 2014 - Jul 2015
Data Analyst
Performed a comparative analysis of LED and fluorescent bulbs by collecting data on energy consumption and manufacturing cost.
Used R to analyze the datasets and concluded that LED bulbs save more energy than fluorescent bulbs.
APEPDCL, Visakhapatnam, India Apr 2013 - Jun 2013
Industrial Trainee
Analyzed the substation power distribution process and the protection methods applied to different equipment.
Calculated load distribution under economic dispatch for the entire city, considering sector-wise priorities, and gained experience with the control methods used to operate substation equipment effectively.
TECHNICAL SKILLS
Programming Languages & Tools : Hadoop, Python, SQL, R, PSPICE, MATLAB, C, C++, MicroStrategy
Web Technologies : HTML5, CSS3, JavaScript
Database : MySQL, SQLite
Operating Systems : Linux, macOS, MS Windows
IDEs : Android Studio, Visual Studio, Eclipse, Jupyter Notebook
MS Office : MS Word, MS Excel
EDUCATION
Florida Institute of Technology, Melbourne, FL Dec 2016
Master of Science in Computer Engineering, GPA 3.6/4.0
Coursework: Computer Networking, Applied Discrete Mathematics, High Performance Computing, Data Mining (Audit), Machine Learning (Audit).
JNTUK University, India May 2014
Bachelor of Technology in Electrical Engineering, GPA 3.2/4.0
Coursework: Computer Networking, Object-Oriented Programming, C++, VLSI Design, Microcontrollers, Management Science, Database Design, Probability and Statistics.
ACADEMIC PROJECTS
Logistic Regression Analysis of Healthcare Data Jan 2017
Selected the dataset features by column; the four main column types were numeric columns, date columns, simple text labels such as gender, and multi-part text labels such as applicant city and applicant state.
Divided the dataset into training and test sets using a random number generator.
Encoded the unique values in the CSV file as categorical variables for prediction. Called model.fit on the training data to identify patterns, then used the fitted model to predict outcomes on the test data.
Evaluated the model accuracy using an ROC curve.
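A minimal sketch of that workflow in scikit-learn; the file name, the column names (age, gender, applicant_city, applicant_state, outcome), and the 70/30 split are illustrative assumptions, not the project's actual values.

# Sketch of the logistic regression workflow; names below are placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

df = pd.read_csv("healthcare.csv")                        # hypothetical input file
X = pd.get_dummies(df[["age", "gender", "applicant_city", "applicant_state"]])
y = df["outcome"]                                         # hypothetical binary target

# Random train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                               # fit on the training data
scores = model.predict_proba(X_test)[:, 1]                # class-1 probabilities on the test data

# Evaluate with the ROC curve and its area under the curve
fpr, tpr, _ = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))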
Text Classification Using an SVM Classifier Dec 2016
Loaded the [“comp.graphics”, “rec.autos”] categories from the 20 Newsgroups dataset; the returned scikit-learn “bunch” is accessed as a Python object through its attributes.
Converted the text content into numerical feature vectors by assigning an integer id to each token; CountVectorizer from sklearn.feature_extraction was employed for text preprocessing, tokenizing, and filtering.
Predicted the outcome with an SVM classifier, using SGDClassifier from sklearn.linear_model as the learner in the pipeline.
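A condensed sketch of that pipeline; the TF-IDF step and the hyperparameters follow the standard scikit-learn text tutorial and are assumptions about the original setup.

# Sketch of the 20 Newsgroups SVM pipeline (a linear SVM trained with SGD).
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

categories = ["comp.graphics", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# CountVectorizer tokenizes and assigns integer ids, TF-IDF reweights the counts,
# and SGDClassifier with hinge loss is a linear SVM.
clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("svm", SGDClassifier(loss="hinge", random_state=42)),
])
clf.fit(train.data, train.target)
print("test accuracy:", clf.score(test.data, test.target))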
JSON Data Analysis Using Hive Nov 2016
Extracted tweets containing the hashtag #Hive from Twitter in JSON format.
Implemented the JSON SerDe interface for Hive in order to process the JSON data.
Created a schema for a single user and applied it to every user; the hashtags are stored as strings, while user fields of different data types (location, account verification, followers, name, etc.) are consolidated into a struct.
After creating the schema, declared row format serde “com.cloudera.hive.serde.JSONSerde”, since the JSONSerde class can identify the details about the user.
Ran a ‘SELECT COUNT(*) FROM tweet’ query in order to find the number of tweets.
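A rough sketch of the table definition and count query, submitted here through the pyhive client for illustration; the column list, HiveServer2 endpoint, and HDFS location are assumptions, and the original work may have run the same HiveQL from the Hive shell instead.

# Illustrative HiveQL for the tweet table and count query, submitted via pyhive.
# Columns and location are assumptions; the SerDe class name follows the resume text.
from pyhive import hive

conn = hive.Connection(host="localhost", port=10000)      # assumed HiveServer2 endpoint
cur = conn.cursor()

cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS tweet (
        id BIGINT,
        text STRING,
        entities STRUCT<hashtags: ARRAY<STRING>>,
        `user` STRUCT<name: STRING, location: STRING,
                      verified: BOOLEAN, followers_count: INT>
    )
    ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerde'
    LOCATION '/user/hive/warehouse/tweet'
""")

cur.execute("SELECT COUNT(*) FROM tweet")                  # number of tweets collected
print(cur.fetchone()[0])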
Storing Twitter Data in MySQL Sep 2016
Streamed Twitter data about iPhone 7 reviews via tweepy, a Python module.
Used the json.loads command to parse each streamed string before loading the relevant data into the database.
Imported the MySQLdb and json modules to interact with the local database and the streamed tweet data; a StreamListener object is employed to track tweets in real time.
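A condensed sketch of that streaming pipeline, assuming the tweepy 3.x StreamListener interface and a pre-created MySQL table; the credentials, database name, tracked keyword, and table schema are placeholders.

# Sketch: stream iPhone 7 tweets into MySQL with tweepy 3.x and MySQLdb.
# Credentials and schema are placeholders; the tweets table is assumed to exist.
import json
import MySQLdb
import tweepy

db = MySQLdb.connect(host="localhost", user="root", passwd="secret", db="twitterdb")

class TweetListener(tweepy.StreamListener):
    def on_data(self, data):
        tweet = json.loads(data)                           # raw JSON string -> dict
        if "text" in tweet:                                # skip delete/limit notices
            cur = db.cursor()
            cur.execute(
                "INSERT INTO tweets (username, created_at, text) VALUES (%s, %s, %s)",
                (tweet["user"]["screen_name"], tweet["created_at"], tweet["text"]),
            )
            db.commit()
        return True

    def on_error(self, status_code):
        return False                                       # stop on API errors such as rate limits

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
stream = tweepy.Stream(auth, TweetListener())
stream.filter(track=["iPhone 7"])                          # follow review chatter in real time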
Employee Database Application Jul 2016
Created an app for a manager to quickly view employees’ project details and to interact with them via an embedded email client.
Designed a rich layout using the Layout Editor and used an SQLite database to store and fetch data.
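The app itself was written against Android's SQLite APIs; purely to illustrate the kind of schema and queries involved, here is a sketch using Python's sqlite3 module with hypothetical table and column names.

# Illustrative employee/project schema and lookup, sketched with Python's sqlite3.
# The real app used Android's SQLite support; names and columns are hypothetical.
import sqlite3

conn = sqlite3.connect("employees.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT NOT NULL,
        project TEXT,
        status TEXT
    )
""")
conn.execute(
    "INSERT INTO employees (name, email, project, status) VALUES (?, ?, ?, ?)",
    ("A. Example", "a.example@company.test", "Dashboard revamp", "In progress"),
)
conn.commit()

# Fetch the details the manager's list view would display
for name, email, project, status in conn.execute(
    "SELECT name, email, project, status FROM employees"
):
    print(f"{name} <{email}>: {project} ({status})")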
Text Mining Using R Aug 2016
Analyzed the frequency of the hashtag #BigData over a two-day period of tweets, and generated a consumer key and consumer secret by creating a Twitter application.
Used the OAuthFactory function to authorize the application, and supplied arguments such as the search term, the number of tweets to return, and the output file.
Used the head function to inspect the returned #BigData tweets as a list, used the length function to confirm that the requested number of tweets were pulled, and then converted the list to a comma-separated value file.
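The project above was done in R with the twitteR package; as a rough Python analogue of the same search, count, and export workflow (a swapped-in tool, not the original code), using placeholder credentials:

# Rough Python analogue of the R text-mining workflow: pull recent #BigData tweets,
# check how many came back, and export them to CSV. Credentials are placeholders.
import csv
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth)

# tweepy 3.x search endpoint; in tweepy 4.x the same call is api.search_tweets
tweets = [t for t in tweepy.Cursor(api.search, q="#BigData").items(200)]
print("tweets returned:", len(tweets))                     # analogue of R's length()
print(tweets[:3])                                          # analogue of R's head()

with open("bigdata_tweets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["created_at", "user", "text"])
    for t in tweets:
        writer.writerow([t.created_at, t.user.screen_name, t.text])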
CERTIFICATIONS
Certified in Big Data Foundations by IBM CN-BD0101EN.
Certified in Hadoop by IBM CN-BD0111EN.