Data Analyst

Location:
Andhra Pradesh, India
Posted:
March 11, 2020

Resume:

MINGING

adb89x@r.postjobfree.com

Phone: 732-***-****

SUMMARY

* ***** ** ********** ** Machine Learning, Data Mining, Data Architecture, Data Modeling, Data Analysis, and NLP with large sets of structured and unstructured data, as well as Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages like R and Python, including Big Data technologies like Hadoop, Hive, HDFS, MapReduce, and NoSQL databases.

Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization.

Very good experience and knowledge of provisioning virtual clusters in the AWS cloud, including services like EC2, S3, and EMR.

Excellent understanding of Hadoop cluster architecture, including MapReduce (MRv1), YARN (MRv2), HDFS, Pig, Hive, Impala, HBase, Spark, Sqoop, Flume, Oozie, and Zookeeper.

Experience in Deep Learning frameworks like TensorFlow, Theano, CNTK, and Keras.

Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.

Experience on Cloud Databases and Data warehouses (SQL Azure and Confidential Redshift/RDS).

Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.

Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.

Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.

Excellent working experience and knowledge in Hadoop eco-system like HDFS, MapReduce, Hive, Pig, MongoDB, Cassandra, HBase.

Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.

Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool.

Expertise in performing data parsing, data manipulation, and data preparation using methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
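
For illustration, a minimal pandas sketch of a few of these preparation steps; the file and column names here are hypothetical placeholders, not from any project above:

    import pandas as pd

    df = pd.read_csv("orders.csv")                    # hypothetical input file
    print(df.describe())                              # descriptive statistics
    df["region"] = df["region"].str.strip()           # simple cleaning/remap
    df = df.drop_duplicates().reset_index(drop=True)  # subset and reindex

    # reshape: wide -> long with melt, then aggregate
    long_df = df.melt(id_vars=["region"], value_vars=["jan", "feb"],
                      var_name="month", value_name="sales")
    summary = long_df.groupby(["region", "month"])["sales"].sum().reset_index()
    print(summary)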

Experienced in mining, loading, and analyzing unstructured data (XML, JSON, flat file formats) in Hadoop.

Experienced in using various packages in R and Python like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, and Rpy2.

Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.

Analyzed data and performed data preparation by applying historical models to the dataset in Azure ML.

Excellent hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.

Experienced in the Teradata RDBMS using FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant, and BTEQ utilities.

Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions, and experience working in Agile/Scrum software environments.

Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.

Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau.

EDUCATION

BS Biomedical Engineering, Shanghai University.

SKILLS

Languages

Java 8, Python, R

Packages

ggplot2, caret, dplyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, Rpy2.

Web Technologies

HTML, CSS, Javascript, JQuery, Bootstrap, AngularJS

Machine Learning

Decision Tree, SVM, KNN, K-Means, EM, Apriori, PageRank, AdaBoost, Deep-Learning

Data Modelling Tools

Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies

Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Sqoop, Spark

Databases

SQL, Hive, Impala, Pig, Spark SQL, Oracle, Microsoft SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Reporting Tools

MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools

Informatica PowerCenter, SSIS.

Version Control Tools

SVN, GitHub

Project Execution Methodologies

Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools

Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Project Management

PMP, Lean Manufacturing, Six Sigma, Agile Methodology, Scrum Master

Operating System

Windows, Linux, Unix, macOS, Red Hat

Analysis Tools

Python (Pandas, Numpy, scikit-learn, matplotlib), R, SAS, Tableau, Advanced MS Excel, A/B testing

Cloud

Azure, Amazon Web Services (AWS)

Application & System

Linux

EXPERIENCE

Jafra Cosmetics International Dec 2018 – Present

Data Scientist/Data Analyst

JAFRA manufactures high-end cosmetics, skin care products, and fragrances. It markets them internationally through some 570,000 independent beauty consultants spanning nearly 20 countries. Products include skin cleansers and lotions, mineral makeup, vitamin tablets, home spa sets, and nail polishes, as well as a variety of products for men, teens, and babies. The company, established in 1956 and once owned by Gillette, has been part of German direct sales company Vorwerk & Co. since 2004.

Predictive Analytics for Customer Satisfaction

This predictive analytics project aims to transform Jafra's customer-marketer relationship, boosting sales while simultaneously increasing shopper satisfaction. Hyper-personalized marketing is also being experimented with, to help serve customers the right message at the right time on the right channel. Using predictive models, Jafra also wants to create accurate inventory forecasts and manage resources to match customer behaviors and needs.

Text Recognition and Sentiment Analysis

The goal of this project is to identify the ‘happy customer experience’ with Jafra. NLP is used to teach the systems to understand the emotion of text. Customer feedback, answers to queries, likes and dislikes, choices and preferences for the coming festival seasons, holidaying trends, better product ideas, and expectations about products and services amount to a huge volume of unstructured data. Customers’ emotional responses, analyses, and findings are marked as positive, negative, or neutral outcomes. NLP helps convert the unstructured text data into a standardized form, which makes search results swifter and more precise.

Customer Purchase Propensity Modelling

Built machine learning based regression models using the scikit-learn Python framework to estimate customer propensity to purchase based on attributes such as the verticals customers operate in, revenue, historic purchases, and frequency and recency behaviors. These predictions helped estimate propensities with higher accuracy, improving the overall productivity of sales teams by accurately targeting prospective clients.
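
A minimal sketch of a propensity-style regression in scikit-learn; the synthetic features standing in for revenue, purchase history, and recency are assumptions for illustration only:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((800, 3))   # placeholder: revenue, past purchases, recency
    y = 0.5 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2] + rng.normal(0, 0.05, 800)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_tr, y_tr)       # fit propensity model
    print("R^2 on held-out data:", model.score(X_te, y_te))   # higher = better fit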

Cross Sell and Upsell Opportunity Analysis

Implemented market basket algorithms on transactional data, which helped identify coupons frequently used or purchased together. Discovering frequent coupon sets helped unearth cross-sell and upselling opportunities and led to better pricing, bundling, and promotion strategies for sales and marketing teams.
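
As a sketch of the market basket approach, the Apriori algorithm as implemented in the mlxtend library; the toy transactions are invented for illustration:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [["coupon_a", "coupon_b"],          # toy transaction data
                    ["coupon_a", "coupon_c"],
                    ["coupon_a", "coupon_b", "coupon_c"]]

    te = TransactionEncoder()                          # one-hot encode baskets
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)
    itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
    rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
    print(rules[["antecedents", "consequents", "support", "lift"]])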

Responsibilities

Designs and perform analyses to highlight, address and resolve operational concerns using statistical predictive indicators and visualization reports.

Built ways for cross-sell of different products based on the existing demand in a particular market with SQL joins to create a report that helped view all the customers as a single entity and generated graphs and visualizations with Excel.

Design an A/B experiment for testing the business performance of the new recommendation system.
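
One common way to evaluate such an experiment is a two-proportion z-test, sketched below with statsmodels; the conversion counts are placeholders:

    from statsmodels.stats.proportion import proportions_ztest

    conversions = [420, 480]    # placeholder successes: control, treatment
    samples = [10000, 10000]    # users exposed to each variant

    stat, p_value = proportions_ztest(conversions, samples)
    print(f"z = {stat:.2f}, p = {p_value:.4f}")
    # reject the no-difference null hypothesis at the 5% level if p < 0.05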

Create machine learning models with Python and scikit-learn that assisted the trading team in their trading strategies.

Optimize parameters using grid search and cross-validation, and develop deep learning algorithms using Keras and feed-forward networks.
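
A minimal Keras feed-forward network sketch; the input width, layer sizes, and synthetic data are illustrative assumptions, not the production model:

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(1000, 20).astype("float32")   # placeholder features
    y = np.random.randint(0, 2, size=(1000,))        # placeholder binary labels

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)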

Use Python libraries including NumPy, Pandas, SciPy, scikit-learn, Matplotlib, Keras, and TensorFlow.

Build data visualizations such as heat maps and time series plots using Python libraries such as Matplotlib and Seaborn.
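
For example, a heat map and a time series plot could be produced along these lines; the data are synthetic placeholders:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    dates = pd.date_range("2019-01-01", periods=90, freq="D")
    df = pd.DataFrame(np.random.rand(90, 3), index=dates,
                      columns=["orders", "returns", "revenue"])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    sns.heatmap(df.corr(), annot=True, ax=ax1)        # correlation heat map
    df["orders"].plot(ax=ax2, title="Daily orders")   # time series plot
    plt.tight_layout()
    plt.show()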

Maintain and enhance the existing algorithmic framework to cope with ever-changing market dynamics and business requirements, and back-test several in-house algorithms using data-driven and statistics-driven time series approaches.

Support MapReduce Programs running on the cluster.

Evaluate business requirements and prepare detailed specifications that follow project guidelines for developing programs.

Configure the Hadoop cluster with NameNode and slave nodes, and format HDFS.

Use Oozie workflow engine to run multiple Hive and Pig jobs.

Participate in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.

Perform data enrichment jobs to handle missing values, normalize data, and select features using HiveQL.

Develop multiple MapReduce jobs in Java for data cleaning and pre-processing.

Analyze the partitioned and bucketed data and compute various metrics for reporting.

Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.

Work on loading the data from MySQL to HBase where necessary using Sqoop.

Develop Hive queries for analysis across different banners.

Extract data from Twitter using Java and the Twitter API; parse JSON-formatted Twitter data and upload it to a database.

Launch Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configure launched instances for specific applications.

Provide system subject matter expertise and utilize Excel, Teradata SQL, Tableau, Alteryx, or other programs for database management and reporting.

Worked on migrating the on-premise database structure to the Confidential Redshift data warehouse.

Utilize NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
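
A minimal rule-based sentiment sketch with NLTK's VADER analyzer, one possible tool for this kind of work; the sample texts are invented:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    reviews = ["Love this lotion, my skin feels great!",
               "The package arrived late and damaged."]
    for text in reviews:
        scores = sia.polarity_scores(text)       # neg/neu/pos/compound scores
        if scores["compound"] > 0.05:
            label = "positive"
        elif scores["compound"] < -0.05:
            label = "negative"
        else:
            label = "neutral"
        print(label, scores["compound"], text)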

Develop Hive queries for analysis, and export the result set from Hive to MySQL using Sqoop after processing the data.

Analyze the data by performing Hive queries and running Pig scripts to study customer behavior.

Create HBase tables to store various formats of data coming from different portfolios.

Work on improving performance of existing Pig and Hive Queries.

Analyze cross-platform data from energy and restaurant companies, and work on the creation and automation of daily reports with SQL and dashboards with Tableau, including charts, calculated fields, and statistical functions.

Mined data from SAP and analyzed order completion status to ensure each order was delivered within 48 hours.

Analyze daily and monthly order trends and use time series models to forecast demand for all SKUs.
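
A minimal time series forecasting sketch with statsmodels' ARIMA; the synthetic daily order series and the (p, d, q) order are assumptions for illustration:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    dates = pd.date_range("2019-01-01", periods=120, freq="D")
    orders = pd.Series(100 + 0.5 * np.arange(120)          # upward trend
                       + np.random.normal(0, 5, 120),      # noise
                       index=dates)

    fit = ARIMA(orders, order=(1, 1, 1)).fit()   # illustrative (p, d, q)
    forecast = fit.forecast(steps=14)            # 14-day-ahead forecast
    print(forecast.round(1))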

Independently develop Python scripts to clean data faster and generate better reports and charts.

Extract useful columns and rows from large-scale datasets and clean data using the Python pandas library.

Combine different datasets, grouped products by station and analyzed the overall allocation.

Develop a user-friendly GUI to help internal non-technical users operate on the datasets.

Build customer journey analytic maps and utilize NLP to enhance the customer experience and reduce customer friction points.

Develop line-balance automation Python scripts, improving output from 150 orders/h to 260 orders/h.

Create custom SQL queries for data analysis and data validation, such as checking for duplicates, null values, etc.

Implemented deep learning algorithms such as Artificial Neural Networks (ANN) and Recurrent Neural Networks (RNN), tuned hyperparameters, and improved models with the Python package TensorFlow.

Swapped low-run SKUs with high-run SKUs based on the trend, reducing the QC reject rate from 1% to 0.1%.

Generate monthly/quarterly KPI dashboards using Tableau: heat maps, box plots, scatter plots, pie charts, bar charts, etc.

Acquired datasets from SAP and MS SQL Server using customized queries and performed EDA with visual methods to summarize the main characteristics.

Performed data preprocessing, including numerical and categorical attribute transformation, feature engineering, and data scaling.

Applied the Python pandas, numpy, and sklearn libraries and supervised learning models to predict customer satisfaction scores, and gleaned insights into the relationship between customer satisfaction and features to attain more actionable operational improvements.

Built pipelines to standardize the data preprocessing steps.

Used k-fold cross-validation to estimate and compare performance across logistic regression, random forest, and SVM models, and applied grid search to find the optimal hyperparameters.
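
A sketch of that comparison with scikit-learn, using synthetic data in place of the customer dataset:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    models = {"logistic regression": LogisticRegression(max_iter=1000),
              "random forest": RandomForestClassifier(random_state=0),
              "SVM": SVC()}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)   # 5-fold CV accuracy
        print(f"{name}: {scores.mean():.3f}")

    # grid search over illustrative hyperparameter values
    grid = GridSearchCV(RandomForestClassifier(random_state=0),
                        {"n_estimators": [100, 300], "max_depth": [None, 10]},
                        cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)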

Reduced order processing time and overall delivery time by 50% and improved customer satisfaction from 60% to 80% by implementing this model.

Analyzed 200,000+ rows of sales order data using Tableau connected to MS SQL Server, and obtained order-detail allocation by joining different tables.

Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.

Generated monthly/quarterly KPIs using Tableau dashboards: geographical maps, heat maps, scatter plots, bar charts.

Primary activities include designing the technology roadmap for Tableau; product installation and implementation; development of insightful dashboards; and serving as the SME point of contact for Tableau and delivering solutions adhering to BI industry best practices.

Proficient in integrating data from different sources into the SQL environment and reporting on it in the Tableau environment.

Responsible for the design, development and production support of interactive data visualizations used across the project.

Administer users, user groups, and scheduled instances for reports in Tableau, and document the upgrade plan.

Created Tableau dashboards and stories as needed using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc., using the Show Me functionality in Tableau Desktop and Tableau Server.

Create interactive data visualizations in Tableau, using relational and aggregate data sources.

Excellent knowledge in RDBMS concepts and constructs along with Database Objects creation such as Tables, User Defined Data Types, Indexes, Stored Procedures, Views, User Defined Functions, Cursors and Triggers etc.

Cisco Systems – San Jose, CA April 2013 – Dec 2017

Data Analyst/Data Scientist

Responsibilities

Create and enhance the Technical Specification Document (TSD) and Customer Requirement Document (CRD) through constant interaction with the Manager and Tech Lead.

Created data model with required fact and dimensions.

Used Pandas, NumPy, and scikit-learn in Python to develop various machine learning models such as random forest and step-wise regression.

Worked with the NLTK library in Python for sentiment analysis on customer product reviews and other third-party websites using web scraping.

Worked on the MindMeld platform, which was used to create the Cisco Web Assistant and attempts to strike a balance between more advanced platforms like TensorFlow and conversational AI platforms accessible to non-technical developers, like Amazon’s Lex and Google’s Dialogflow.

Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.

Created and Maintained Teradata Databases, Users, Tables, Views, Macros, Triggers and Stored Procedures.

Imported and exported data from an Access database and built SQL queries for data manipulation, reporting in Access and VLOOKUP in Excel.

Created and updated Crystal Reports to client specifications using SQL.

Developed a database in MySQL to manipulate data with monthly updates and created reports in Crystal Reports.

Extensively used Visio to create Use Case Diagrams and Activity Diagrams.

Effectively used the data blending feature in Tableau to connect different databases like Oracle and MS SQL Server.

Designed business intelligence dashboards using Tableau Desktop and published them on Tableau Server, allowing executive management to view current and past sales performance trends at various geographic locations.

Conducted analysis of various tools available at the client to recommend the best possible option for different projects, for example Informatica Data Explorer, Informatica Data Quality, PowerCenter, MicroStrategy, etc.

Prepared BI Interactive Dashboards using calculations, parameters in Tableau.

Expertise in connecting to Oracle and SQL databases and troubleshooting.

Used trend lines, statistics, and log axes, along with groups, hierarchies, and sets, to create detail-level summary reports and dashboards using KPIs.

Provide data-driven models and analyze data to drive the business and make key business decisions.

Successfully setup global Build-to-Stock (BTS) warehousing and logistic process to meet demand growth.

Built a performance management system to integrate with the supplier scorecards.

Successfully transferred 19 router and 14 wireless access point products from new product to mass production.

Performed daily analysis of on time delivery (OTD), investigating root cause and applying countermeasures.

Prepared manufacturing and reliability test plans to make products fully meet Cisco requirement guidelines.

Managed several vendors and continuously drove the end-to-end yield to 90% during the development stage.

Minimized the single-source material percentage and maintained an 88% multiple-source rate in the BOM risk report.

Led internal and external meetings, analyzed defect data, developed corrective actions for customer complaints.

Led numerous cost reduction activities from concept phase to implementation, tracked the results, and saved an average of $1M per year.

Creation of Jobs and scheduling of flows through Management Console.

Used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.

Creation of ER diagrams and database design for the project.

Hongkong and Shanghai Banking Corporation - Shanghai, CN Jan 2011 – Apr 2013

Data Analyst

HSBC, officially known as The Hongkong and Shanghai Banking Corporation Limited, is a wholly owned subsidiary of HSBC Holdings and the largest bank in Hong Kong, and operates branches and offices throughout the Asia-Pacific region and in other countries around the world. It is also one of the three commercial banks licensed by the Hong Kong Monetary Authority to issue banknotes for the Hong Kong dollar.

Responsibilities

Visualize and report business data monthly to clients and manager to analyze and improve the business model.

Led and conducted JAD sessions for requirements gathering, analysis and design of the system.

Created context and workflow models, information and business rule models, and Use Cases during analysis using Rational tools.

Documented requirements and transformed them into functional and technical requirement specifications.

Developed animated visual stories, interactive kiosks, and navigation-capable decks that allow presenters and users to use the tool in ways never imagined.

Strong conceptual skills and the ability to turn statistical information into opportunities for visual storytelling and infographics.

Developed visualizations using Tableau for better understanding of data, performed data cleaning, normalization, data transformation.

Actively interacted with different business groups to perform Gap analysis to identify the deficiencies in the system by comparing the actual objectives with the system objectives desired.

Conducted JAD sessions and baselined the user requirement specifications.

Utilized RUP to configure and develop process, standards, and procedures.

Worked with designers and developers for interpreting requirements, tracking the requirement status & timeline.

Gathered demand data from the product team, analyzed the data to identify trends and variances, and prepared and presented demand analysis reports for HSBC to build a better understanding of the target market.

Designed customer research studies to improve the repurchase rate, and identified KPIs for customer acquisition, retention, and loyalty.

Proactively managed and executed projects, focusing on building and launching the Maximus Lighting brand and delivering analysis of product development and customer preference for smart security lights.

Acted as a consultant looking for ways to increase Maximus brand activity, analyzed a consumer panel of 20,000 subscribers, and defined A/B tests.

Worked with the product manager and IT department on completion of the Quotation Database design, leading to 80% greater efficiency in product quote processing.

Prepared and combined data; handled customer data and interacted with customers to fix data issues.

Created new formats for reporting and presenting sales and purchases that shortened the daily update procedure by 30%.

Compiled and distributed successful statistical information in order status spreadsheets.


