Data Analyst

Location:
Andhra Pradesh, India
Posted:
March 11, 2020

Resume:

MINGING

adb89x@r.postjobfree.com

Phone: 732-***-****

SUMMARY

* ***** ** ********** ** Machine Learning, Data Mining, Data Architecture, Data Modeling, Data Analysis, and NLP with large sets of structured and unstructured data, as well as Data Acquisition, Data Validation, Predictive Modeling, Data Visualization, Web Crawling, and Web Scraping. Adept in statistical programming languages like R and Python, including Big Data technologies like Hadoop, Hive, HDFS, MapReduce, and NoSQL databases.

Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, neural networks, SVM, clustering), dimensionality reduction using Principal Component Analysis and Factor Analysis, testing and validation using ROC plots and K-fold cross validation, and data visualization.

Very good experience and knowledge of provisioning virtual clusters in the AWS cloud, including services like EC2, S3, and EMR.

Excellent understanding of Hadoop cluster architecture, including MapReduce (MRv1), YARN (MRv2), HDFS, Pig, Hive, Impala, HBase, Spark, Sqoop, Flume, Oozie, and Zookeeper.

Experience in Deep Learning frameworks like TensorFlow, Theano, CNTK, and Keras.

Deep understanding of statistical modeling, multivariate analysis, model testing, problem analysis, model comparison, and validation.

Experience on Cloud Databases and Data warehouses (SQL Azure and Confidential Redshift/RDS).

Excellent knowledge of Machine Learning, Mathematical Modeling, and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases. Deep understanding of and exposure to the Big Data ecosystem.

Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.

Experienced in Data Modeling techniques employing Data warehousing concepts like star/snowflake schema and Extended Star.

Excellent working experience and knowledge in Hadoop eco-system like HDFS, MapReduce, Hive, Pig, MongoDB, Cassandra, HBase.

Expert in creating PL/SQL Schema objects like Packages, Procedures, Functions, Subprograms, Triggers, Views, Materialized Views, Indexes, Constraints, Sequences, Exception Handling, Dynamic SQL/Cursors, Native Compilation, Collection Types, Record Type, Object Type using SQL Developer.

Excellent knowledge and experience in OLTP/OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data modeling using Erwin tool.

Expertise in performing data parsing, data manipulation, and data preparation using methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
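
For illustration, a minimal pandas sketch of a few of these preparation steps; the file and column names here are hypothetical placeholders, not from any project above:

    import pandas as pd

    df = pd.read_csv("orders.csv")                    # hypothetical input file
    print(df.describe())                              # descriptive statistics
    df["region"] = df["region"].str.strip()           # simple cleaning/remap
    df = df.drop_duplicates().reset_index(drop=True)  # subset and reindex

    # reshape: wide -> long with melt, then aggregate
    long_df = df.melt(id_vars=["region"], value_vars=["jan", "feb"],
                      var_name="month", value_name="sales")
    summary = long_df.groupby(["region", "month"])["sales"].sum().reset_index()
    print(summary)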

Experienced in mining, loading, and analyzing unstructured data (XML, JSON, flat file formats) in Hadoop.

Experienced in using various packages in R and Python like ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, and Rpy2.

Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau.

Analyzed data and performed data preparation by applying historical models to the dataset in Azure ML.

Excellent hands-on experience with big data tools like Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.

Experienced in the Teradata RDBMS using FastLoad, FastExport, MultiLoad, TPump, Teradata SQL Assistant, and BTEQ utilities.

Expertise in Excel macros, pivot tables, VLOOKUPs, and other advanced functions, and experience working in Agile/Scrum software environments.

Hands-on experience implementing LDA and Naive Bayes, and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis.

Extensive experience in Data Visualization including producing tables, graphs, listings using various procedures and tools such as Tableau.

EDUCATION

BS Biomedical Engineering, Shanghai University.

SKILLS

Languages

Java 8, Python, R

Packages

ggplot2, caret, dplyr, RWeka, gmodels, RCurl, C50, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, Rpy2.

Web Technologies

HTML, CSS, Javascript, JQuery, Bootstrap, AngularJS

Machine Learning

Decision Tree, SVM, KNN, K-Means, EM, Apriori, PageRank, AdaBoost, Deep-Learning

Data Modelling Tools

Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner

Big Data Technologies

Hadoop, Hive, HDFS, MapReduce, Pig, Kafka, Sqoop, Spark

Databases

SQL, Hive, Impala, Pig, Spark SQL, Oracle, Microsoft SQL Server, MySQL, MS Access, HDFS, HBase, Teradata, Netezza, MongoDB, Cassandra.

Reporting Tools

MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

ETL Tools

Informatica PowerCenter, SSIS.

Version Control Tools

SVN, GitHub

Project Execution Methodologies

Ralph Kimball and Bill Inmon data warehousing methodology, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD).

BI Tools

Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Project Management

PMP, Lean Manufacturing, Six Sigma, Agile Methodology, Scrum Master

Operating System

Windows, Linux, Unix, macOS, Red Hat

Analysis Tools

Python (Pandas, Numpy, scikit-learn, matplotlib), R, SAS, Tableau, Advanced MS Excel, A/B testing

Cloud

Azure, Amazon Web Services (AWS)

Application & System

Linux

EXPERIENCE

Jafra Cosmetics International Dec 2018 – Present

Data Scientist/Data Analyst

JAFRA manufactures high-end cosmetics, skin care products, and fragrances. It markets them internationally through some 570,000 independent beauty consultants spanning nearly 20 countries. Products include skin cleansers and lotions, mineral makeup, vitamin tablets, home spa sets, and nail polishes, as well as a variety of products for men, teens, and babies. The company, established in 1956 and once owned by Gillette, has been part of German direct sales company Vorwerk & Co. since 2004.

Predictive Analytics for Customer Satisfaction

This predictive analytics project aims to transform Jafra's customer-marketer relationship, boosting sales while simultaneously increasing shopper satisfaction. Hyper-personalized marketing is also being experimented with, to help serve customers the right message at the right time on the right channel. Using predictive models, Jafra also wants to create accurate inventory forecasts and manage resources to match customer behaviors and needs.

Text Recognition and Sentiment Analysis

The goal of this project is to identify the ‘happy customer experience’ with Jafra. NLP is used to teach the systems to understand the emotion of text. Customer feedback, answers to queries, likes and dislikes, choices and preferences for the coming festival seasons, holidaying trends, better product ideas, and expectations about products and services amount to a huge volume of unstructured data. Customers’ emotional responses, analyses, and findings are marked as positive, negative, or neutral outcomes. NLP helps convert the unstructured text data into a standardized form, which makes search results swifter and more precise.

Customer Purchase Propensity Modelling

Built machine learning based regression models using the scikit-learn Python framework to estimate customer propensity to purchase based on attributes such as the verticals customers operate in, revenue, historic purchases, and frequency and recency behaviors. These predictions helped estimate propensities with higher accuracy, improving the overall productivity of sales teams by accurately targeting prospective clients.
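
A minimal sketch of a propensity-style regression in scikit-learn; the synthetic features standing in for revenue, purchase history, and recency are assumptions for illustration only:

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.random((800, 3))   # placeholder: revenue, past purchases, recency
    y = 0.5 * X[:, 0] + 0.3 * X[:, 1] - 0.2 * X[:, 2] + rng.normal(0, 0.05, 800)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = GradientBoostingRegressor().fit(X_tr, y_tr)       # fit propensity model
    print("R^2 on held-out data:", model.score(X_te, y_te))   # higher = better fit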

Cross Sell and Upsell Opportunity Analysis

Implemented market basket algorithms on transactional data, which helped identify coupons frequently used or purchased together. Discovering frequent coupon sets helped unearth cross-sell and upselling opportunities and led to better pricing, bundling, and promotion strategies for sales and marketing teams.
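
As a sketch of the market basket approach, the Apriori algorithm as implemented in the mlxtend library; the toy transactions are invented for illustration:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [["coupon_a", "coupon_b"],          # toy transaction data
                    ["coupon_a", "coupon_c"],
                    ["coupon_a", "coupon_b", "coupon_c"]]

    te = TransactionEncoder()                          # one-hot encode baskets
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions),
                          columns=te.columns_)
    itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
    rules = association_rules(itemsets, metric="lift", min_threshold=1.0)
    print(rules[["antecedents", "consequents", "support", "lift"]])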

Responsibilities

Designs and perform analyses to highlight, address and resolve operational concerns using statistical predictive indicators and visualization reports.

Built ways for cross-sell of different products based on the existing demand in a particular market with SQL joins to create a report that helped view all the customers as a single entity and generated graphs and visualizations with Excel.

Design an A/B experiment for testing the business performance of the new recommendation system.
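
One common way to evaluate such an experiment is a two-proportion z-test, sketched below with statsmodels; the conversion counts are placeholders:

    from statsmodels.stats.proportion import proportions_ztest

    conversions = [420, 480]    # placeholder successes: control, treatment
    samples = [10000, 10000]    # users exposed to each variant

    stat, p_value = proportions_ztest(conversions, samples)
    print(f"z = {stat:.2f}, p = {p_value:.4f}")
    # reject the no-difference null hypothesis at the 5% level if p < 0.05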

Create machine learning models with Python and scikit-learn that assisted the trading team in their trading strategies.

Optimize parameters using grid search and cross-validation, and develop deep learning algorithms using Keras and feed-forward networks.
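
A minimal Keras feed-forward network sketch; the input width, layer sizes, and synthetic data are illustrative assumptions, not the production model:

    import numpy as np
    from tensorflow import keras

    X = np.random.rand(1000, 20).astype("float32")   # placeholder features
    y = np.random.randint(0, 2, size=(1000,))        # placeholder binary labels

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # binary output
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)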

Use Python libraries including NumPy, Pandas, SciPy, scikit-learn, Matplotlib, Keras, and TensorFlow.

Build data visualizations such as heat maps and time series plots using Python libraries such as Matplotlib and Seaborn.
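
For example, a heat map and a time series plot could be produced along these lines; the data are synthetic placeholders:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns

    dates = pd.date_range("2019-01-01", periods=90, freq="D")
    df = pd.DataFrame(np.random.rand(90, 3), index=dates,
                      columns=["orders", "returns", "revenue"])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    sns.heatmap(df.corr(), annot=True, ax=ax1)        # correlation heat map
    df["orders"].plot(ax=ax2, title="Daily orders")   # time series plot
    plt.tight_layout()
    plt.show()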

Maintain and enhance the existing algorithmic framework to cope with ever-changing market dynamics and business requirements, and back-test several in-house algorithms using data-driven and statistics-driven time series approaches.

Support MapReduce Programs running on the cluster.

Evaluate business requirements and prepare detailed specifications that follow project guidelines for developing programs.

Configure the Hadoop cluster with NameNode and slave nodes, and format HDFS.

Use Oozie workflow engine to run multiple Hive and Pig jobs.

Participate in Data Acquisition with Data Engineer team to extract historical and real-time data by using Hadoop MapReduce and HDFS.

Perform data enrichment jobs to handle missing values, normalize data, and select features using HiveQL.

Develop multiple MapReduce jobs in Java for data cleaning and pre-processing.

Analyze the partitioned and bucketed data and compute various metrics for reporting.

Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.

Work on loading the data from MySQL to HBase where necessary using Sqoop.

Develop Hive queries for analysis across different banners.

Extract data from Twitter using Java and the Twitter API; parse JSON-formatted Twitter data and upload it to a database.

Launch Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configure launched instances for specific applications.

Provide system subject matter expertise and utilize Excel, Teradata SQL, Tableau, Alteryx, or other programs for database management and reporting.

Worked on migrating the on-premise database structure to the Confidential Redshift data warehouse.

Utilize NLP applications such as topic models and sentiment analysis to identify trends and patterns within massive data sets.
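
A minimal rule-based sentiment sketch with NLTK's VADER analyzer, one possible tool for this kind of work; the sample texts are invented:

    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
    sia = SentimentIntensityAnalyzer()

    reviews = ["Love this lotion, my skin feels great!",
               "The package arrived late and damaged."]
    for text in reviews:
        scores = sia.polarity_scores(text)       # neg/neu/pos/compound scores
        if scores["compound"] > 0.05:
            label = "positive"
        elif scores["compound"] < -0.05:
            label = "negative"
        else:
            label = "neutral"
        print(label, scores["compound"], text)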

Develop Hive queries for analysis, and export the result set from Hive to MySQL using Sqoop after processing the data.

Analyze the data by performing Hive queries and running Pig scripts to study customer behavior.

Create HBase tables to store various formats of data coming from different portfolios.

Work on improving performance of existing Pig and Hive Queries.

Analyze cross-platform data from energy and restaurant companies, and work on the creation and automation of daily reports with SQL and dashboards with Tableau, including charts, calculated fields, and statistical functions.

Mined data from SAP and analyzed order completion status to ensure each order was delivered within 48 hours.

Analyze daily and monthly order trends and use time series models to forecast demand for all SKUs.
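
A minimal time series forecasting sketch with statsmodels' ARIMA; the synthetic daily order series and the (p, d, q) order are assumptions for illustration:

    import numpy as np
    import pandas as pd
    from statsmodels.tsa.arima.model import ARIMA

    dates = pd.date_range("2019-01-01", periods=120, freq="D")
    orders = pd.Series(100 + 0.5 * np.arange(120)          # upward trend
                       + np.random.normal(0, 5, 120),      # noise
                       index=dates)

    fit = ARIMA(orders, order=(1, 1, 1)).fit()   # illustrative (p, d, q)
    forecast = fit.forecast(steps=14)            # 14-day-ahead forecast
    print(forecast.round(1))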

Independently develop Python scripts to clean data faster and generate better reports and charts.

Extract useful columns and rows from large-scale datasets and clean data using the Python pandas library.

Combine different datasets, grouped products by station and analyzed the overall allocation.

Develop a user-friendly GUI to help internal non-technical users operate on the datasets.

Build customer journey analytic maps and utilize NLP to enhance the customer experience and reduce customer friction points.

Develop line-balance automation Python scripts, improving output from 150 orders/h to 260 orders/h.

Create custom SQL queries for data analysis and data validation, such as checking for duplicates, null values, etc.

Implemented deep learning algorithms such as Artificial Neural Networks (ANN) and Recurrent Neural Networks (RNN), tuned hyperparameters, and improved models with the Python package TensorFlow.

Swapped low-run SKUs with high-run SKUs based on the trend, reducing the QC reject rate from 1% to 0.1%.

Generate monthly/quarterly KPI dashboards using Tableau: heat maps, box plots, scatter plots, pie charts, bar charts, etc.

Acquired datasets from SAP and MS SQL Server using customized queries and performed EDA with visual methods to summarize the main characteristics.

Performed data preprocessing, including numerical and categorical attribute transformation, feature engineering, and data scaling.

Applied the Python pandas, numpy, and sklearn libraries and supervised learning models to predict customer satisfaction scores, and gleaned insights into the relationship between customer satisfaction and features to attain more actionable operational improvements.

Built pipelines to standardize the data preprocessing steps.

Used k-fold cross-validation to estimate and compare performance across logistic regression, random forest, and SVM models, and applied grid search to find the optimal hyperparameters.
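
A sketch of that comparison with scikit-learn, using synthetic data in place of the customer dataset:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, cross_val_score
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)

    models = {"logistic regression": LogisticRegression(max_iter=1000),
              "random forest": RandomForestClassifier(random_state=0),
              "SVM": SVC()}
    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=5)   # 5-fold CV accuracy
        print(f"{name}: {scores.mean():.3f}")

    # grid search over illustrative hyperparameter values
    grid = GridSearchCV(RandomForestClassifier(random_state=0),
                        {"n_estimators": [100, 300], "max_depth": [None, 10]},
                        cv=5)
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)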

Reduced order processing time and overall delivery time by 50% and improved customer satisfaction from 60% to 80% by implementing this model.

Analyzed 200,000+ rows of sales order data using Tableau connected to MS SQL Server, and obtained order-detail allocation by joining different tables.

Created action filters, parameters and calculated sets for preparing dashboards and worksheets in Tableau.

Generated monthly/quarterly KPIs using Tableau dashboards: geographical maps, heat maps, scatter plots, bar charts.

Primary activities include designing the technology roadmap for Tableau; product installation and implementation; development of insightful dashboards; and serving as the SME point of contact for Tableau and delivering solutions adhering to BI industry best practices.

Proficient in integrating data from different sources into the SQL environment and reporting on it in the Tableau environment.

Responsible for the design, development and production support of interactive data visualizations used across the project.

Administer users, user groups, and scheduled instances for reports in Tableau, and document the upgrade plan.

Created Tableau dashboards and stories as needed using stacked bars, bar graphs, scatter plots, geographical maps, Gantt charts, etc., using the Show Me functionality in Tableau Desktop and Tableau Server.

Create interactive data visualizations in Tableau, using relational and aggregate data sources.

Excellent knowledge in RDBMS concepts and constructs along with Database Objects creation such as Tables, User Defined Data Types, Indexes, Stored Procedures, Views, User Defined Functions, Cursors and Triggers etc.

Cisco Systems – San Jose, CA April 2013 – Dec 2017

Data Analyst/Data Scientist

Responsibilities

Create and enhance the Technical Specification Document (TSD) and Customer Requirement Document (CRD) through constant interaction with the Manager and Tech Lead.

Created data model with required fact and dimensions.

Used Pandas, NumPy, and scikit-learn in Python to develop various machine learning models such as random forest and step-wise regression.

Worked with the NLTK library in Python for sentiment analysis on customer product reviews and other third-party websites using web scraping.

Worked on the MindMeld platform, which was used to create the Cisco Web Assistant and attempts to strike a balance between more advanced platforms like TensorFlow and conversational AI platforms accessible to non-technical developers, like Amazon’s Lex and Google’s Dialogflow.

Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.

Created and Maintained Teradata Databases, Users, Tables, Views, Macros, Triggers and Stored Procedures.

Imported and exported data from an Access database and built SQL queries for data manipulation, reporting in Access and VLOOKUP in Excel.

Created and updated Crystal Reports to client specifications using SQL.

Developed a database in MySQL to manipulate data with monthly updates and created reports in Crystal Reports.

Extensively used Visio to create Use Case Diagrams and Activity Diagrams.

Effectively used the data blending feature in Tableau to connect different databases like Oracle and MS SQL Server.

Designed business intelligence dashboards using Tableau Desktop and published them on Tableau Server, allowing executive management to view current and past sales performance trends at various geographic locations.

Conducted analysis of various tools available at the client to recommend the best possible option for different projects, for example Informatica Data Explorer, Informatica Data Quality, PowerCenter, MicroStrategy, etc.

Prepared BI Interactive Dashboards using calculations, parameters in Tableau.

Expertise in connecting to Oracle and SQL databases and troubleshooting.

Used trend lines, statistics, and log axes, along with groups, hierarchies, and sets, to create detail-level summary reports and dashboards using KPIs.

Provide data-driven models and analyze data to drive the business and make key business decisions.

Successfully setup global Build-to-Stock (BTS) warehousing and logistic process to meet demand growth.

Built a performance management system to integrate with the supplier scorecards.

Successfully transferred 19 router and 14 wireless access point products from new product to mass production.

Performed daily analysis of on time delivery (OTD), investigating root cause and applying countermeasures.

Prepared manufacturing and reliability test plans to make products fully meet Cisco requirement guidelines.

Managed several vendors and continuously drove the end-to-end yield to 90% during the development stage.

Minimized the single-source material percentage and maintained an 88% multiple-source rate in the BOM risk report.

Led internal and external meetings, analyzed defect data, developed corrective actions for customer complaints.

Led numerous cost reduction activities from concept phase to implementation, tracked the results, and saved an average of $1M per year.

Creation of Jobs and scheduling of flows through Management Console.

Used Excel sheets, flat files, and CSV files to generate ad-hoc Tableau reports.

Creation of ER diagrams and database design for the project.

Hongkong and Shanghai Banking Corporation - Shanghai, CN Jan 2011 – Apr 2013

Data Analyst

HSBC, officially known as The Hongkong and Shanghai Banking Corporation Limited, is a wholly owned subsidiary of HSBC Holdings and the largest bank in Hong Kong, and operates branches and offices throughout the Asia-Pacific region and in other countries around the world. It is also one of the three commercial banks licensed by the Hong Kong Monetary Authority to issue banknotes for the Hong Kong dollar.

Responsibilities

Visualize and report business data monthly to clients and manager to analyze and improve the business model.

Led and conducted JAD sessions for requirements gathering, analysis and design of the system.

Created context and workflow models, information and business rule models, and Use Cases during analysis using Rational tools.

Documented requirements and transformed them into functional and technical requirement specifications.

Developed animated visual stories, interactive kiosks, and navigation-capable decks that allow presenters and users to use the tool in ways never imagined.

Strong conceptual skills and the ability to turn statistical information into opportunities for visual storytelling and infographics.

Developed visualizations using Tableau for better understanding of data, performed data cleaning, normalization, data transformation.

Actively interacted with different business groups to perform Gap analysis to identify the deficiencies in the system by comparing the actual objectives with the system objectives desired.

Conducted JAD sessions and baselined the user requirement specifications.

Utilized RUP to configure and develop process, standards, and procedures.

Worked with designers and developers for interpreting requirements, tracking the requirement status & timeline.

Gathered demand data from the product team, analyzed the data to identify trends and variances, and prepared and presented demand analysis reports for HSBC to build a better understanding of the target market.

Designed customer research studies to improve the repurchase rate, and identified KPIs for customer acquisition, retention, and loyalty.

Proactively managed and executed projects, focusing on building and launching the Maximus Lighting brand and delivering analysis of product development and customer preference for smart security lights.

Acted as a consultant looking for ways to increase Maximus brand activity, analyzed a consumer panel of 20,000 subscribers, and defined A/B tests.

Worked with the product manager and IT department on completion of the Quotation Database design, leading to 80% greater efficiency in product quote processing.

Prepared and combined data; handled customer data and interacted with customers to fix data issues.

Created new formats for reporting and presenting sales and purchases that shortened the daily update procedure by 30%.

Compiled and distributed successful statistical information in order status spreadsheets.


