Prathyusha D adijx9@r.postjobfree.com

PH: 336-***-****

SUMMARY:

Highly efficient Data Scientist with 5 years of professional experience in data analysis, machine learning with large structured and unstructured data sets, data validation, predictive modeling, and data visualization.

Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross-validation, and data visualization.

Used Python to scrape, clean, and analyze large datasets, structuring raw data and performing exploratory data analysis.

Experience in using various Python packages, including NLP libraries, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, TensorFlow, and Keras.

Used the Random Forest algorithm to help identify loyal customers and predict the likelihood of customers buying a recommended product.

Experience in analysis, generating data visualizations using Python, and creating dashboards using tools like Tableau.

Experience in using statistical procedures and machine learning algorithms such as ANOVA, clustering, regression, and time series analysis to analyze data for further model building.

Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2), and Excel.

Hands-on experience implementing Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Principal Component Analysis, Time Series Analysis, Predictive Analysis, and Gradient Boosted Trees, with evaluation using AUC, accuracy, precision, and recall for classification.

To diagnose overfitting and underfitting, set up arrays to store train and test accuracies over splits produced with scikit-learn's train_test_split, as in the sketch below.
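
For illustration, a minimal sketch of that diagnostic, assuming a KNN classifier on synthetic data (both the classifier choice and the dataset are illustrative stand-ins, not taken from a specific project here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative synthetic dataset; the real projects used domain data.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Arrays to store train and test accuracies across model complexities.
neighbors = np.arange(1, 16)
train_acc = np.empty(len(neighbors))
test_acc = np.empty(len(neighbors))

for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    train_acc[i] = knn.score(X_train, y_train)  # high train, low test -> overfitting
    test_acc[i] = knn.score(X_test, y_test)     # both low -> underfitting
```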

Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, ANOVA, crosstabs, t-tests, and correlation techniques; a minimal hypothesis-test sketch follows.
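
A two-sample t-test of the kind this workflow relies on might look like the following sketch in SciPy (the data are simulated; the group names and effect sizes are assumptions for demonstration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Simulated metric for an A/B test: control vs. variant groups.
control = rng.normal(loc=10.0, scale=2.0, size=200)
variant = rng.normal(loc=10.5, scale=2.0, size=200)

# Two-sample t-test for a difference in means.
t_stat, p_value = stats.ttest_ind(control, variant)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # reject H0 if p < 0.05
```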

Performed performance analysis on patient datasets using logistic regression, K-nearest neighbors, and classification trees, comparing testing and training accuracy.

Extensive knowledge of dimensionality reduction (PCA, LDA), hyper-parameter tuning, model regularization, and grid search techniques to optimize the cost function and model performance; a minimal grid-search sketch follows.
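
For illustration, a minimal grid search over regularization hyper-parameters might look like the sketch below (the logistic regression model, parameter grid, and synthetic data are assumptions, not taken from a specific project here):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=12, random_state=0)

# Search over regularization strength and penalty type.
param_grid = {"C": [0.01, 0.1, 1, 10], "penalty": ["l1", "l2"]}
search = GridSearchCV(
    LogisticRegression(solver="liblinear", max_iter=1000),
    param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```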

Proficient with all phases of the Software Development Life Cycle, including analysis, design, development, deployment, testing, and maintenance of enterprise applications.

Employed various Tableau functionalities such as extracts, cross-database joins, parameters, filters, contexts, data source filters, actions, functions, trends, hierarchies, sets, groups, calculations, data blending, and maps.

Experience in data visualization, reporting, and analysis: pie charts, bar charts, cross maps, scatter plots, geographic maps, page trails, density charts, and more, making use of actions and other local and global filters according to end-user requirements.

Highly skilled in using visualization tools like Tableau for creating dashboards.

Excellent knowledge of and expertise in PL/SQL queries and Oracle databases.

Used custom SQL queries to pull data into Tableau Desktop.

Experience in various reporting objects like Hierarchies, Sets, Groups, and Calculated fields.

Experience in building web pages and consuming web services (HTML, HTML5, CSS).

Experience in handling network security tools like Wireshark, AxCrypt, Encrypt, FileVerifier++, Nmap, and Nessus, used for vulnerability assessment of web applications.

Good understanding of and experience with different operating systems: Windows 2000/7/10/NT/XP, Linux, and UNIX.

Experience in analyzing business requirements and preparing business requirement documents, test cases, and integration test case scenarios.

Experience in post-implementation support and training of end users.

Adaptability and an ability to learn quickly and apply new knowledge.

Academic Profile:

Master's in Information Sciences (Data Science), graduated with merit, 3.92 GPA.

Bachelor of Technology in Computer Science and Engineering, First Class with Distinction.

Technical Skills:

Programming Languages: C, C++, Core Java, CSS, HTML, Python.

Databases: Oracle, PL/SQL, SQL Server.

Big Data Technologies: Spark, Hadoop, Hive, HDFS, MapReduce.

Application Services: Apache Tomcat.

IDEs and Tools: MS Office, SQL*Plus, TOAD, EditPlus, GitHub, SQL Developer, Hadoop, Scala, Hive, Redshift.

Data Visualization: Tableau, Matplotlib, Seaborn, ggplot2, Excel, scikit-image.

Packages: ggplot2, NLP, Reshape2, pandas, NumPy, Seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup.

Operating System: Windows 7/10/2000/NT/XP, MS-DOS, Linux, UNIX.

ML Skills: Regression Analysis, Bayesian Methods, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, K-Means Clustering, KNN, Ensemble Methods, Natural Language Processing (NLP), Linear Regression, Logistic Regression.

Work and Research Experience:

Project Title: Machine Learning: Predicting Car MPG Using a Linear Regression Model.

Southern Arkansas University

Jan 2020 – May 2020

Duties and Responsibilities:

Prepared the environment and loaded the data using Pandas.

Investigated the data by viewing the first few rows for data analysis.

Applied different cleaning techniques to eliminate bad data points.

Analyzed and formatted data using machine learning algorithms with Python scikit-learn.

Created relationships, actions, hierarchies, calculated fields, sorting, groupings, and live connections in Tableau.

Created the feature matrix, excluding a few columns, and created the label vector.

To reshape the data, dropped unused columns, standardized existing columns, and created new ones.

Created customized reports using charts such as pie, bar, line, tree maps, and funnel charts to better visualize the data, making reports user-friendly for analysis purposes.

Experience in Python, Jupyter, and the scientific computing stack (NumPy, pandas, and Matplotlib).

Plotted the relationship between each feature and the mpg label on scatter plots using Matplotlib.

Normalized the features using the StandardScaler class of the sklearn.preprocessing package.

Split the data into training and testing sets using the model_selection module of sklearn.

Trained a regression model on the training subset using the SGDRegressor class of the sklearn.linear_model package, setting the number of iterations of the learner to 500.

Trained a model using one feature at a time, and then trained a model using all features together.

For each of the models trained above, applied the model to the test subset and computed the r2_score, mean_squared_error, and mean_absolute_error for its predictions.

Set the penalty of the SGDRegressor to l1 instead of l2 and recomputed the r2_score, mean_squared_error, and mean_absolute_error.

Trained a model using all features for 500 iterations with l2 regularization and an initial learning rate (eta0) of 10.0, and computed the same evaluation metrics: r2_score, mean_squared_error, and mean_absolute_error. The overall pipeline is sketched below.
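
Taken together, the steps above roughly correspond to the following sketch (the file name and column names are assumptions for illustration; the project's exact data layout is not shown in this resume):

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Hypothetical file and columns standing in for the auto-mpg data.
df = pd.read_csv("auto_mpg.csv").dropna()
X = df.drop(columns=["mpg", "car_name"])  # feature matrix
y = df["mpg"]                             # label vector

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Normalize features with StandardScaler, as described above.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# 500 iterations, l2 penalty, initial learning rate eta0 as in the write-up;
# swapping penalty="l1" reproduces the earlier variant.
model = SGDRegressor(max_iter=500, penalty="l2", eta0=10.0, random_state=42)
model.fit(X_train_s, y_train)

pred = model.predict(X_test_s)
print(r2_score(y_test, pred),
      mean_squared_error(y_test, pred),
      mean_absolute_error(y_test, pred))
```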

Environment: Anaconda, Jupyter Notebook, Notepad++, Pandas, Scikit-learn, Tableau.

Project Title: Predicting No-Shows at Clinics Using Logistic Regression.

Southern Arkansas University

Jan 2020 – May 2020

Duties and Responsibilities:

Prepared the environment and loaded the data using Pandas.

Implemented Data Exploration to analyze patterns and to select features using Python SciPy.

Previewed the data by viewing the first few rows using the dataset's head() function.

Generated various capacity-planning reports (graphical) using Python packages like NumPy and Matplotlib.

Printed the label distribution using the collections.Counter class.

Created the feature matrix and the label vector.

Converted the categorical features to multiple binary features with the help of pandas.get_dummies method.

Split the data into a training set (80% of the data) and a test set (20%) using the train_test_split function of sklearn's model_selection module.

Trained a logistic regression model using the training set and applied the trained model to the test set.

Displayed the confusion matrix of the prediction and the actual labels of the test set.

Printed the accuracy, precision, recall, F1 score, PR AUC, and ROC AUC for the trained model, and plotted the Precision-Recall and ROC curves for the trained model based on the test set.

Re-trained the model on all features using 10-fold cross-validation on the full data set (the original set, before splitting into train and test).

Printed the average accuracy, average precision, average recall, and average F1 score across the 10 folds to predict no-shows at clinics. The overall workflow is sketched below.
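
A minimal sketch of the workflow described above (the file and column names are hypothetical stand-ins for the clinic dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, accuracy_score, precision_score,
                             recall_score, f1_score, average_precision_score,
                             roc_auc_score)

# Hypothetical file and column names standing in for the clinic data.
df = pd.read_csv("no_shows.csv")
y = (df["no_show"] == "Yes").astype(int)          # label vector
X = pd.get_dummies(df.drop(columns=["no_show"]))  # categorical -> binary features

# 80/20 split as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]

print(confusion_matrix(y_test, pred))
print(accuracy_score(y_test, pred), precision_score(y_test, pred),
      recall_score(y_test, pred), f1_score(y_test, pred),
      average_precision_score(y_test, proba),  # PR AUC
      roc_auc_score(y_test, proba))            # ROC AUC

# 10-fold cross-validation on the full data set, as in the final step.
cv = cross_validate(LogisticRegression(max_iter=1000), X, y, cv=10,
                    scoring=["accuracy", "precision", "recall", "f1"])
print({k: v.mean() for k, v in cv.items() if k.startswith("test_")})
```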

Environment: Anaconda, Jupyter Notebook, Notepad++, Pandas, Scikit-learn, sklearn.metrics, Logistic Regression, confusion matrix, Tableau.

Project Title: Data Mining: Shortest Paths from a Source to All Vertices in a Given Input Graph Using Dijkstra's Algorithm in Python.

Aug 2018 – Dec 2018

Project Description:

This project is an implementation of a shortest-paths algorithm, discovering previously unknown relations among the given data. If a destination node is given, the algorithm stops once it reaches the destination; otherwise it continues until shortest paths from the source to all vertices are found. For a graph that represents a street network, the weight of a street segment can be derived from the segment's length and its speed limit. The algorithm keeps track of the vertices whose minimum distance from the source has been finalized, starting from the empty set, and computes the shortest paths using the adjacency matrix. In practice, shortest paths are computed this way in IP routing and telephone networks, and Google Maps uses Dijkstra's algorithm to find the shortest path between nodes in the road graph.
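
A compact Python formulation of the algorithm described above, using a priority queue over an adjacency list (the original implementation used an adjacency matrix; this sketch is behaviorally equivalent):

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source to all vertices.

    graph: dict mapping vertex -> list of (neighbor, weight) pairs.
    """
    dist = {v: float("inf") for v in graph}
    dist[source] = 0
    heap = [(0, source)]          # (distance, vertex) priority queue
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue              # stale entry; a shorter path was already found
        for v, w in graph[u]:
            if d + w < dist[v]:   # relax the edge (u, v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Example: a small weighted graph.
g = {"A": [("B", 1), ("C", 4)], "B": [("C", 2), ("D", 5)],
     "C": [("D", 1)], "D": []}
print(dijkstra(g, "A"))  # {'A': 0, 'B': 1, 'C': 3, 'D': 4}
```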

Environment: Anaconda, Spyder, Python, Notepad++.

Herbalife, NC

Aug 2015 – Jun 2017

Role: Jr. Data Scientist

Responsibilities:

Designed algorithms to identify and extract incident alerts from a daily pool of incidents.

Reduced redundancy among incoming incidents by proposing rules to recognize patterns.

Performed exploratory data analysis in Python, including calculation of descriptive statistics, detection of outliers, assumption testing, and factor analysis; a minimal sketch follows.
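
For illustration, an exploratory pass of that kind might look like the following sketch (the DataFrame and column name are hypothetical):

```python
import pandas as pd

# Hypothetical incident-metrics frame; the column name is illustrative.
df = pd.DataFrame({"duration_min": [5, 7, 6, 8, 90, 6, 7, 5, 6, 8]})

print(df.describe())  # descriptive statistics: mean, std, quartiles

# IQR rule for outlier detection.
q1, q3 = df["duration_min"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["duration_min"] < q1 - 1.5 * iqr) |
              (df["duration_min"] > q3 + 1.5 * iqr)]
print(outliers)  # flags the 90-minute incident as an outlier
```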

Utilized Spark, Python, and R with a broad variety of machine learning methods, including classification, regression, and dimensionality reduction, based on domain knowledge and customer business objectives.

Innovated and leveraged machine learning, data mining and statistical techniques to create new, scalable solutions for business problems.

Worked with machine learning algorithms such as regression (linear, logistic, etc.), clustering and classification, SVMs, and decision trees.

Built analytic models using a variety of techniques such as logistic regression and pattern recognition technologies.

Performed exploratory data analysis using Python, generating various graphs and charts for analyzing the data using Python libraries.

Worked on data that was a combination of unstructured and structured data from multiple sources, and automated the cleaning using Python scripts.

Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.

Performed data profiling to assess data quality using SQL against a complex internal database.

Extracted data from the database using Excel/Access, SQL procedures and created Python and R data frames for statistical analysis, validation, visualization, and documentation.

Developed and designed SQL procedures and Linux shell scripts for data export/import and for converting data.

Designed data profiles for processing, including running SQL and using Python for data acquisition and data integrity checks, consisting of dataset comparisons and dataset schema checks.

Worked on extensive Business Intelligence, Analytics tasks focusing on consumer and customer space.

Performed exploratory data analysis and data visualization using R and Tableau.

Developed interactive dashboards and created various ad hoc reports for users in Tableau by connecting various data sources.

Developed and automated Google Analytics dashboards for web performance statistics.

Responsible for integration and setup of Web Properties, Google Tag Manager, Google Analytics, Apple App Analytics and other tools.

Analyzed data from primary and secondary sources using statistical techniques to provide daily reports.

Used Amazon Elastic Beanstalk with Amazon EC2 to deploy project into AWS.

Experience in administrative tasks such as installing Hadoop and its ecosystem components, such as Hive and Pig, in distributed mode.

Enhanced data collection procedures to include information relevant for building analytic systems.

Coordinated with data scientists and senior technical staff to identify client needs and document assumptions.

Involved in the execution of multiple business plans and projects, ensuring business needs were met.

Performed estimation and requirements analysis of project timelines.

Environment: Jupyter Notebook, Notepad++, Pandas, Scikit-learn, sklearn.metrics, Python, Tableau, SQL developer, Hadoop.

AMAZON

Aug 2014 – Jun 2015

Role: Associate Analyst

Responsibilities:

Contributed software engineering expertise in the development of products through the software lifecycle, from requirements definition through successful deployment.

Built cluster analysis models using Python SciPy to classify customers into different target groups; a minimal sketch follows.
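
A minimal sketch of such segmentation with SciPy's k-means (the features are synthetic; the real models used customer data):

```python
import numpy as np
from scipy.cluster.vq import kmeans2, whiten

rng = np.random.default_rng(1)
# Synthetic customer features, e.g., spend and visit frequency.
features = np.vstack([rng.normal(0, 1, (100, 2)),
                      rng.normal(5, 1, (100, 2))])

# Normalize per-feature variance, then cluster into target groups.
w = whiten(features)
centroids, labels = kmeans2(w, 3, minit="++")
print(np.bincount(labels))  # number of customers per segment
```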

Performed risk analysis of the requirements to identify project-critical success factors and prioritize functional requirements. Analyzed trends in the data, detected outliers, and performed exploratory analysis.

Managed the entire data science project life cycle and was actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.

This project focused on customer segmentation based on machine learning and statistical modeling, including building predictive models and generating data products to support customer segmentation.

Designed and developed analytics, machine learning models, and visualizations that drive performance and provide insights.

Provided different test case scenarios and created transaction reconciliations for sales data using advanced Excel features like VLOOKUP, HLOOKUP, and pivot tables.

Organized the project, analyzed the data, and developed detailed/canned reports, donut charts, dashboards, and scorecards in Tableau.

Performed data visualization, producing tables, graphs, and listings using various procedures and tools such as Tableau.

Wrote complex SQL queries using joins, subqueries, and inline views to retrieve data from the database.

Analyzed and prepared data, identifying patterns in the dataset by applying historical models.

Created SQL Scripts and views for reports.

Involved in data analysis and reducing data discrepancies between the source and target schemas.

Maintained the test log and wrote test evaluations and summary reports.

Collaborated with senior data scientists to understand the data.

Worked extensively with advanced analysis using LOD expressions, scatter plots, box plots, background images, maps, trend lines, log axes, groups, hierarchies, and sets to create detail-level summary reports and dashboards using KPIs.

Used SEO best practices to elevate organization's Web presence.

Environment: Oracle, Notepad++, Pandas, Scikit-learn, Seaborn, Matplotlib, Tableau, SQL Developer, Excel (Pivot Tables, VLOOKUP, HLOOKUP), Python.

GENPACT

Jun 2013 – Jul 2014

Role: Process Associate

Responsibilities:

Performed website planning, design, and management.

Worked with clients to gather project requirements.

Worked closely with the creative team to develop and produce creative assets for all global e-commerce sites, email campaigns, and online marketing for Dockers.

Worked on websites and related issues.

Checked for errors, debugged websites, and made websites responsive for clients.

Worked on web hosting, servers, SSL, and SEO.

Resolved web issues such as 503 Service Unavailable, 500 Internal Server Error, and 404 Page Not Found errors, application server errors, and website configuration problems.

Worked on coding related technical resolutions.

Worked on Production tickets raised by Client and quickly resolved them within identified categories.

Made recommendations for website improvements based on findings from site statistics.

Coordinated and followed up on necessary approvals for appropriate resolutions.

Integrated and tested the different modules of the application.

Wrote technical specification documents for each use case.

Helped intra-domain clients resolve database-related issues through remote connections such as SQL Server, Virtual Private Server, and control panel.

Created SQL queries and Stored Procedures for CRUD (Create, Read, Update and Delete) operations on database.

Environment: Java JDK, HTML, CSS & JavaScript, SQL Developer, SSL, SEO, Apache server, Windows, Toad, Excel.

Academic Project:

Project Title: Congestion control via modified Depth-Breadth Algorithm.

Project Description:

Presented a method for controlling congestion by modifying the DB (depth-breadth) routing algorithm.

Implemented AntNet, an agent-based routing algorithm influenced by the individual behavior of ants in ant colony optimization.

Introduced a modified, load-aware MDB routing algorithm that improves DB routing in order to support load balancing.

Environment: Java, TCP/IP, DNS, Ethernet, Eclipse, VLAN, IPv4, IPv6.

Extra-Curricular Activities:

Presented a paper on “Wireless Transmission of Electricity” at “ESPARTO” conducted by Hyderabad Institute of Technology and Management.

Presented a paper on “Blue Eye Technology” at “PROFUSION” conducted by Prof. Rama Reddy Engineering College.

Worked as a Graduate Assistant (GA) during the Master's in Information Sciences; researched and worked on a rover using basic programming for undergraduate student studies.

Worked as a lab assistant for undergraduate C++ programming courses.

Performed research on live datasets, such as AAPL stock price prediction and DC housing property price prediction based on seasonal trends.

Worked on many capstone projects, such as Boston house price prediction and Netflix (NFLX) stock price prediction, and performed exploratory analysis.


