
Data scientist

Location:
Virginia
Posted:
July 06, 2018

Contact this candidate

Resume:

Arjun

ac552b@r.postjobfree.com

Professional Summary:

Around 6 years of experience developing statistical machine learning, text analytics, and data mining solutions across various business functions, providing BI, insights, and reporting frameworks to optimize business outcomes through data analysis.

Expertise in transforming business requirements into models, algorithms, and data mining and reporting solutions that scale across massive volumes of structured and unstructured data.

Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor Analysis/PCA, and Ensembles.

Experience designing visualizations using Tableau, Power BI, and Storyline on web and desktop platforms, and publishing and presenting dashboards.

Experience on advanced SAS programming techniques, such as PROC APPEND, PROC DATASETS, and PROC TRANSPOSE.

Hands-on experience implementing LDA and Naive Bayes; skilled in Decision Trees, Random Forests, Linear and Logistic Regression, SVM, Clustering, and neural networks, with good knowledge of Recommender Systems.

Highly skilled in using visualization tools like Tableau, Matplotlib for creating dashboards.

Extensive working experience with Python, including scikit-learn, SciPy, pandas, and NumPy, for developing machine learning models and manipulating and handling data.

Good domain knowledge of B2B SaaS, payment processing, entertainment analytics, healthcare, and retail.

Well experienced in Normalization & De-Normalization techniques for optimum performance in relational and dimensional database environments.

Developed data variation analysis and data pair association analysis in the Bioinformatics field.

Regularly used JIRA and other internal issue trackers for project development.

Integration Architect & Data Scientist experience in Analytics, Big Data, SOA, ETL and Cloud technologies.

Worked and extracted data from various database sources like Oracle, SQL Server and Teradata.

Experience in foundational machine learning models and concepts (Regression, Boosting, GBM, NNs, HMMs, CRFs, MRFs, deep learning).

Skilled in System Analysis, Dimensional Data Modeling, Database Design and implementing RDBMS specific features.

Facilitated and helped translate complex quantitative methods into simplified solutions for users.

Experience with proofs of concept and gap analysis; gathered the necessary data for analysis from different sources and prepared it for exploration using data munging.

Analyzed data using R, Perl, and Hadoop, and queried data from structured and unstructured databases.

Expertise in the complete software development life cycle, including Design, Development, Testing, and Implementation in the Hadoop ecosystem, the Documentum SP2 suite of products, and Java technologies.

Experience developing data models and processing data through Big Data frameworks like Hive, HDFS, and Spark to access streaming data and implement data pipelines that serve real-time suggestions and recommendations.

Technical Skills:

Databases

Oracle(10g, 11g), PostgreSQL, SQL Server, MongoDB 3.2, ER Studio

Web Programming

HTML, CSS, XML, JavaScript

Programming/Statistical Packages

R, Python, UNIX, C, C++, C#, ASP .NET

BI Tools

Tableau Desktop, QlikView, Excel

Machine Learning

Regression, Classification Models, Clustering, Boosting and Bagging Methods, Decision Trees and Random Forests, Recommendation Systems, Association Rules

Data Visualization

Tableau (9.4, 10), D3.js, ggplot2, R-Shiny, matplotlib, bokeh, seaborn, plotly

Big Data Framework

HDFS, MapReduce, Spark, Hive, Pig

Technologies/ Tools

Azure Machine Learning, IBM Watson Analytics, TensorFlow

Professional Experience:

Client: StoryFit, Austin, TX March 2018 – Present

Role: Data Scientist

Responsibilities:

Built the movie and book metadata pipeline by implementing 12 web extractors to extract more than 150 important variables related to movie and books.

Wrote scripts to automate the cleaning of both structured metadata and unstructured script data and processed it into the data warehouse and AWS S3 for advanced prediction applications.

Gathered data from multiple web and central data marts, writing SQL to join and aggregate various data views for analysis.

Developed a trend and spike detection algorithm in Python to automatically detect spikes/declines in sales using time series functions and SciPy.
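
As an illustrative sketch only (the production pipeline and its thresholds are not shown in this resume), a simple z-score spike detector over a sales series can be written with NumPy and SciPy:

```python
import numpy as np
from scipy.stats import zscore

def detect_spikes(sales, z_thresh=2.0):
    """Return indices where sales deviate more than z_thresh
    standard deviations from the series mean."""
    z = zscore(np.asarray(sales, dtype=float))
    return np.where(np.abs(z) > z_thresh)[0]

# Example: a flat weekly series with one spike at index 5
weekly_sales = [10, 11, 10, 9, 10, 40, 10, 11, 10, 9]
print(detect_spikes(weekly_sales))  # -> [5]
```

A real detector on sales-rank data would typically deseasonalize the series and use rolling statistics rather than a single global mean.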

Monitored daily and weekly sales rank changes for various books on POS systems and communicated trends to be focused on.

Built RNNs and CNNs from historic movie scripts and metadata to train models and predict movie performance and revenues based on genre, cast, and investment variables.

Developed TF-IDF features from unstructured movie scripts and reviews to analyze sentiment and emotion, structure and style, themes, character overviews, and plot features.

Ran AWS EC2 instances to compute ML and NN models, mining text and data from over 4,000 movies spanning more than 30 years.

Used DataRobot to automate combinations of highly optimized machine learning models, extracted the generated Python code, and integrated it into the production system.

Acted as an Analytics liaison and reported sales intelligence for 7 customers under the keyword product; analyzed year-over-year and pre/post sales, and developed anecdotal approaches to isolate and investigate keyword impact.

Extracted features and built three time-series models for seasonal release dates, genre-based performance, and pricing segmentation to better target future book releases, and wrote statistical tests to validate hypotheses.
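
One such hypothesis test could take the form of a two-sample t-test on release-season cohorts (the numbers below are synthetic, for illustration only):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic weekly sales for summer vs. winter release cohorts
summer = rng.normal(loc=120, scale=15, size=50)
winter = rng.normal(loc=100, scale=15, size=50)

# Welch's t-test: does release season shift mean sales?
t_stat, p_value = stats.ttest_ind(summer, winter, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
# A small p-value supports the seasonal-release hypothesis
```

Welch's variant (`equal_var=False`) avoids assuming both cohorts share the same variance.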

Conceptualized a dashboard for automated reporting of trends in Amazon Best Seller Rank and New York Times Best Seller Rank by genre and release date, combining knowledge from Goodreads lists, characters, shelves, and story settings.

Client: ServiceNow, Santa Clara, CA Aug 2017 – March 2018

Role: Data Scientist

Responsibilities:

Responsible for working with various teams on a project to develop an analytics-based solution to specifically target roaming subscribers.

Deployed GUI pages by using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, AJAX.

Configured the project on WebSphere 6.1 application servers

Developed a Machine Learning test-bed with 24 different model learning and feature learning algorithms.

Through thorough systematic search, demonstrated performance surpassing state-of-the-art (deep learning) approaches.

Achieved predictions up to 10 times more accurate than existing state-of-the-art algorithms.

Developed on-disk, very large (100 GB+), highly complex machine learning models.

Used SAX and DOM parsers to parse the raw XML documents

Used RAD as Development IDE for web applications.

Developed and planned required analytic projects in response to business needs.

In conjunction with data owners and department managers, contributed to the development of data models and protocols for mining production databases.

Developed new analytical methods and tools as required.

Contributed to data mining architectures, modelling standards, reporting, and data analysis methodologies.

Conducted research and made recommendations on data mining products, services, protocols, and standards in support of procurement and development efforts.

Worked with application developers to extract data relevant for analysis.

Collaborated with unit managers, end users, development staff, and other stakeholders to integrate data mining results with existing systems.

Provided and applied quality assurance best practices for data mining/analysis services.

Client: VISA, Austin, TX Sep 2016 – July 2017

Role: Data Scientist

Responsibilities:

Involved in defining source-to-target data mappings, business rules, and data definitions.

Performing Data Validation / Data Reconciliation between disparate source and target systems for various projects.

Identifying the Customer and account attributes required for MDM implementation from disparate sources and preparing detailed documentation.

Prepared DQ analysis reports and scorecards on all validated data elements and presented them to business teams and stakeholders.

Used T-SQL queries to pull the data from disparate systems and Data warehouse in different environments.

Extracting data from different databases as per the business requirements using SQL Server Management Studio.

Interacting with the ETL, BI teams to understand / support on various ongoing projects.

Extensively using MS Excel for data validation.

Extensively used open source tools - RStudio (R) and Spyder (Python) - for statistical analysis and building machine learning algorithms.

Generating weekly, monthly reports for various business users according to the business requirements. Manipulating/mining data from database tables (Redshift, Oracle, and Data Warehouse).

Interface with other technology teams to load (ETL), extract and transform data from a wide variety of data sources

Utilized a broad variety of statistical packages, including SAS, R, MLlib, Python, and others.

Provided input and recommendations on technical issues to Business & Data Analysts, BI Engineers, and Data Scientists.

Client: Aeronube Technology Pvt. Ltd, India March 2015 – April 2016

Role: Data Analyst

Responsibilities:

Partnered with analysts and live producers to identify strategic business questions, key metrics, and actionable insights.

Conducted feasibility study and performed GAP and Impact analysis for the proposed state.

Conducted JAD sessions to allow different stakeholders to communicate their perspective with each other, resolve any issues and come to an agreement quickly.

Managed direct relationships with third-party vendors and offshore development teams.

Collaborated with Project Architect/Designer to detail technical requirements and refine the solution design.

Prepared Business Requirement Document (BRD) and Functional Specification Document (FSD).

Created executive summary with the risks and mitigation plan for the projects.

Collaborated with the QA team to ensure adequate testing on software, maintained quality procedures, and ensured that appropriate documentation was in place.

Queried the database using complex SQL queries.

Utilized data analytics to evaluate workload performance data and recommend changes.

Conducted peer review meetings periodically to keep track of the project's milestones.

Led defect triage sessions with the client during the UAT phase, served as the point of contact for defect management for the project team, and led the change management process for the PMO.

Reviewed business requirements documented by other business analysts to scope level of effort in testing functionality and identifying possible inter-dependencies.

Client: Religare Health Insurance Company Limited, India Jan 2013 – Feb 2015

Role: Data Analyst

Responsibilities:

Developed scalable machine learning solutions within a distributed computation framework (e.g. Hadoop, Spark, Storm etc.).

Utilized NLP techniques such as topic models and sentiment analysis to identify trends and patterns within massive data sets.

Created automated anomaly detection systems and continuously tracked their performance.

Strong command of data architecture and data modelling techniques.

Knowledge in ML & Statistical libraries (e.g. Scikit-learn, Pandas).

Built predictive models to forecast risks for product launches and operations, and to help predict workflow and capacity requirements for TRMS operations.

Experienced with visualization technologies such as Tableau.

Drew inferences and conclusions, created dashboards and visualizations of processed data, and identified trends and anomalies.

Generated TLFs and summary reports, ensuring on-time, quality delivery.

Solved analytical problems and effectively communicated methodologies and results.

Worked closely with internal stakeholders such as business teams, product managers, engineering teams, and partner teams.

Performed data mining using state-of-the-art methods.

Extended the company's data with third-party sources of information when needed.

Enhanced data collection procedures to include information relevant for building analytic systems.

Processed, cleansed, and verified the integrity of data used for analysis.


