Data Analyst Scientist

Location:
Rogers, AR
Posted:
June 23, 2023

Rahul M

Data Analyst/Data Engineer/Data Scientist

+1-214-***-****

adxvm8@r.postjobfree.com

Professional Summary

Over 8 years of experience in Data Science and Analytics, including Artificial Intelligence/Deep Learning/Machine Learning, Data Mining, Data Extraction, Data Modelling, Data Wrangling, and Statistical Analysis.

Adept in statistical programming languages such as R, Python, and SAS, as well as Apache Spark and Big Data technologies including Hadoop, Hive, Sqoop, and Pig.

Skilled in writing Teradata SQL and PL/SQL scripts to test and validate backend databases. Expert in designing OLTP/OLAP systems and databases, including Star Schema and Snowflake Schema designs for multidimensional, dimensional, and relational models.

Experienced in using Python, RStudio, SQL, and SAS to perform statistical analysis and implement machine learning algorithms using a variety of packages.

Leveraged big data tools and supporting technologies to extract meaningful insights from large data sets. Good knowledge of Distributed Computing, Hadoop architecture, and its ecosystem components such as HDFS, MapReduce, Hive, Impala, Spark (PySpark), and Kafka.

Implemented Bagging and Boosting to enhance model performance.

Strong skills in statistical methodologies such as A/B testing, experimental design, hypothesis testing, and ANOVA.

Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and Scikit-learn).

Proficient in managing the entire data science project life cycle and actively involved in all phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.

Proficient in Machine Learning algorithms and Predictive Modeling, including Regression Models, Decision Trees, Random Forests, Sentiment Analysis, Naive Bayes Classifier, SVM, and Ensemble Models.

Participated in creating data intake pipelines for the Azure HDInsight Spark cluster using Spark SQL, Azure Data Factory, and Azure Databricks.

Used version control tools such as Git 2.x and build tools such as Apache Maven and Ant.

Extracted data from Azure Data Lake, loaded it into the HDInsight cluster (Intelligence + Analytics), and applied PySpark transformations and actions to the data.

Good knowledge of Spark MLlib algorithms and utilities such as regression, classification, clustering, collaborative filtering, and dimensionality reduction.

Experience in AWS services such as EC2, Lambda, Step Functions, and IAM, as well as Google Cloud Platform (GCP).

Sound knowledge of Netezza SQL.

Involved in the design and development of multiple Power BI dashboards and reports, and managed data privacy and security in Power BI.

Proficient in data visualization tools such as Tableau, Python Matplotlib, and R Shiny, creating visually powerful, actionable, interactive reports and dashboards.

Experienced as a lead managing the entire data science project life cycle, actively involved in all phases including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling (decision trees, regression models, clustering), and dimensionality reduction using Principal Component Analysis.

Developed a customer churn prediction model using machine learning algorithms in R Shiny, resulting in a 20% reduction in customer attrition.

Created a sentiment analysis application using R Shiny to analyze customer feedback and generate actionable insights for product improvement.

Worked with complex applications such as R, SAS, MATLAB, and SPSS to develop neural networks and cluster analyses.

Strong C# and SQL programming skills, with experience working with functions, packages, and triggers.

Extensive exposure to the CRISP-DM analytics project life cycle (Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment).

Capable of generating new insights and driving business decisions based on data, with a strong commitment to making a positive impact.

Experienced in Text mining, implementing concepts of Natural Language Processing, Text classification, Sentiment Analysis, Topic modeling, Segmentation methodologies.

Good experience across multiple domains, addressing critical problems such as fraud detection, customer retention, and demand forecasting.

Experienced in Dimensional Data Modeling using ER/Studio, Erwin, and Sybase PowerDesigner, including Star Schema/Snowflake modeling, FACT and Dimension tables, and conceptual, logical, and physical data models.

Procedural knowledge in cleansing and analyzing data using HiveQL, Pig Latin, and custom MapReduce programs in Java.

Experience in all stages of the SDLC (Agile, Waterfall): writing technical design documents, development, testing, and implementation of enterprise-level data marts and data warehouses.

Excellent communication skills; successful in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.

TECHNICAL SKILLS

Programming Languages: Python, Java, SAS Base, SAS Enterprise Miner, Bash scripting, Regular Expressions, and SQL (Oracle & SQL Server)

Packages and Tools: Pandas, NumPy, SciPy, Scikit-learn, NLTK, spaCy, Matplotlib, Seaborn, BeautifulSoup, Logging, PySpark, Keras, and TensorFlow

Querying Languages: Spark SQL, MySQL, Microsoft SQL

Machine Learning: Scikit-learn, Keras, TensorFlow, NumPy, Pandas, Matplotlib, ggplot2, Scrapy, Seaborn, Statsmodels, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, KNN, Linear Regression, Lasso, Ridge, SVM, Regression Tree, K-means, PCA

Data Visualization: Tableau, Google Analytics, Advanced Microsoft Excel, and Power BI

Big Data Tools: Spark/PySpark, Hive, Impala, Hue, MapReduce, HDFS, Sqoop, Flume, and Oozie

Text Mining: Text Pre-Processing, Information Retrieval, Classification, Topic Modeling, Text Clustering, Sentiment Analysis, and Word2vec

Cloud Technologies: AWS, Azure

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio/Outlook), Crystal Reports XI, SSRS

BI and Big Data Tools: Spark, Hadoop MapReduce, Hive, Sqoop

Project Management: JIRA, Azure DevOps Boards

Python Libraries: Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly, NLTK, Gensim (Word2Vec, GloVe), Keras, TensorFlow, PyTorch

RDBMS: Microsoft SQL Server, MySQL, Oracle, PostgreSQL

Version Control Tools: SVN, GitHub

PROFESSIONAL EXPERIENCE

CLIENT: Molina Healthcare June 2022 – Present

Sr. Data Scientist

Roles and Responsibilities:

Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.

Supervised the complete development of a machine learning system, including data collection, model creation, feature selection, system implementation, and evaluation.

Cleaned, transformed, and reorganized data using NumPy, Pandas, Seaborn, SciPy, Matplotlib, and scikit-learn.

Managed nodes on the Hadoop cluster and monitored Hadoop cluster job performance using Cloudera Manager.

Performed sentiment analysis of customer feedback emails to detect tone and emotion using Long Short-Term Memory (LSTM) and Recurrent Neural Networks (RNNs), deploying TensorFlow to implement the NLP workflow.

Manipulated text data using pre-processing techniques like Tokenizing, Stemming, and Lemmatization.

Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from Redshift.

Used Python 3.x (NumPy, SciPy, Pandas, scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes. Implemented financial time series techniques in Python, including ARIMA, GARCH, exponential smoothing, and Markov chains.

Developed a design methodology for sampling and analyzing survey results on product pricing and availability. Conducted market sizing, competitive analysis, and positioning to determine product feasibility.

Wrote efficient MapReduce jobs for processing and analyzing large data sets, improving performance and reducing processing time by 30%.

Implemented various classification algorithms in Spark MLlib, such as Logistic Regression, Decision Trees, Random Forests, and Gradient Boosting, to solve complex classification problems with high accuracy.

Implemented Spark MLlib's collaborative filtering techniques, such as Alternating Least Squares (ALS), to build customized user experiences, which led to a 20% increase in user engagement.

Created custom functions and packages within the R console to automate repetitive tasks and improve the efficiency of data analysis workflows.

Applied various machine learning algorithms in the R console, such as decision trees, random forests, support vector machines (SVM), and neural networks, for predictive modeling and pattern identification.

Hands-on expertise in the creation and implementation of cloud solutions that are built upon the Microsoft Power Platform. This involves utilizing a range of powerful tools including Power Apps, Power Automate, Power BI, Power Apps Portal, and the Common Data Service (CDS).

Leveraged skills in developing interactive data visualization applications using R Shiny.

Employed Cosmos DB as a data storage solution, implementing strategies such as replica sets, sharding, and efficient document design to enhance the scalability and feature set of the service.

Conducted troubleshooting and analysis of JCL abends, and re-executed batch jobs using scheduling packages such as CONTROL-M, CA7, Tivoli, and ZEKE/ZEBB.

Extracted transaction data for all 11 major territories (1 million+ records) using PySpark and analyzed the data to forecast higher-revenue areas (scikit-learn/MLlib) at a 95% accuracy rate.

Created measures, calculated columns, and relationships, and performed time series analysis using DAX in Power BI.

Utilized data science tools such as Python, R, and SQL to extract, transform, and analyze large datasets and generate actionable insights.

Developed interactive data visualization dashboards using R Shiny to facilitate data exploration and analysis for internal stakeholders.

Utilized R and SQL to extract, clean, and transform data from various sources for use in Shiny applications.

Used Cascading Style Sheets (CSS) to maintain design consistency across all web forms.

Automated workflows that were previously initiated manually, using Python scripts and Linux bash scripting.

Developed interactive data visualizations using D3.js and utilized CSS to enhance the styling and layout of the visualizations for optimal user experience.

Involved in designing the database migration from DB2 to Azure Cosmos DB for transferring data.

Developed and deployed a scalable data warehouse on GCP using BigQuery, enabling the analysis of billions of records in seconds.

Utilized CSS selectors and techniques for web scraping, extracting structured data from various websites for analysis and modeling purposes.

Utilized Spark, Scala, Hadoop, HQL, VQL, Oozie, PySpark, Data Lake, TensorFlow, HBase, Cassandra, Redshift, MongoDB, Kafka, Kinesis, Spark Streaming, Edward, CUDA, MLlib, AWS, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.

Designed, developed, and deployed a suite of Python modeling APIs specifically tailored for customer analytics. These APIs seamlessly integrate multiple machine learning techniques, enabling accurate prediction of user behavior and supporting diverse marketing segmentation programs.

Expertise in designing Star Schema and Snowflake Schema for Data Warehouse solutions, using tools such as Erwin Data Modeler, PowerDesigner, and Embarcadero ER/Studio.

Developed, trained, and deployed a wide range of machine learning and deep learning algorithms using popular frameworks such as Keras, TensorFlow, and PyTorch within the AWS SageMaker environment.

Theoretical and practical knowledge of: (i) Supervised Learning: linear regression, logistic regression, boosted decision trees, Support Vector Machines (SVM), neural networks, and Natural Language Processing (NLP) techniques; (ii) Unsupervised Learning: clustering methods, dimensionality reduction techniques, and recommender systems; (iii) Probability & Statistics: probability theory and statistical analysis, including experiment analysis, confidence intervals, and A/B testing; (iv) Algorithms and Data Structures: enabling efficient data processing and manipulation.

Experience in effectively handling large datasets using R packages such as tidyr, tidyverse, dplyr, reshape, lubridate, and caret for efficient data manipulation, cleansing, and transformation.

Expertise in visualizing data using the lattice and ggplot2 packages, creating visually appealing, informative graphs and plots that effectively communicate insights derived from the data.

Experienced in Amazon Web Services (AWS) and Microsoft Azure, including AWS EC2, S3, and RDS, as well as Azure HDInsight, Machine Learning Studio, and Azure Data Lake. Strong experience provisioning virtual clusters on AWS using services such as EC2, S3, and EMR.

Implemented the installation and configuration of a multi-node cluster on the cloud using AWS.

Excellent database administration (DBA) skills, including managing user authorizations, creating databases, tables, and indexes, and performing backup operations.

Implemented MongoDB indexes and employed optimization techniques, including explain plans and performance tuning, to enhance query performance and reduce response time.

Environment: Python, MATLAB, Spark, Azure, Azure ML Studio, AWS SageMaker, NoSQL, MySQL, MongoDB, Cassandra, PostgreSQL, Hive, Hadoop, SQL, Oracle, Apache Kafka, Spark machine learning, Spark API, MapReduce, HDFS, NLP scripts, Tableau, Spark (PySpark, Spark SQL, MLlib), AWS (S3/EC2), ODS, OLTP, Power BI, Oracle 10g, OLAP, DB2, Metadata, Teradata, MS Excel, Mainframes, MS Visio, Rational Rose, Unix/Linux.

CLIENT: PEPCO (Offshore) April 2021 – Jan 2022

Data Scientist/Data Engineer

Responsibilities:

Collaborated with database engineers to implement the ETL process; wrote and optimized SQL queries to extract and merge data from the SQL Server database.

Conducted analysis to assess customer behaviors and discover customer value; applied customer segmentation using clustering algorithms.

Performed data integrity checks, data cleansing, exploratory analysis, and feature engineering using Python and data visualization packages such as Matplotlib and Seaborn.

Responsible for retrieving data from the database using SQL/Hive queries and performing analysis enhancements.

Actively participated as a member of the Regulatory and Legal Compliance (RLC) team, successfully completing user stories (tasks) within critical deadlines in an Agile environment.

Utilized Collaborative Filtering, Alternating Least Squares, Similarity Matrix, and KNN to improve recommendation engine performance in apps based on user preferences, location, and age.

Evaluated multiple machine learning models in Python (SVMs, Logistic Regression, Neural Networks, XGBoost, Gradient Boosting) to classify delinquent customers with a Gini index of 62.

Used Python to implement different machine learning algorithms, including Generalized Linear Model, Random Forest and Gradient Boosting.

Built regression models with Ridge and Lasso Regularization to estimate the loan Annual Percentage Rates (APRs) for customers with low credit scores.

Implemented machine learning algorithms in R Shiny applications to provide predictive analytics solutions.

Presented key insights on customer spending behavior and provided recommendations on the best performing model to the leadership using Python and Tableau visualizations.

Conducted analysis on Confidential unstructured data from various sources such as e-Prescription and surveys collected through channels like call survey, IVR, Web, Chat, MyCigna Mobile, Claims, Home delivery pharmacy, and Benefits. Utilized supervised and unsupervised algorithms, as well as text mining packages like NLTK, Pandas, Scikit, tm, qdap, and caret, in both R and Python.

Designed and built interactive dashboards using R Shiny and Tableau to present data-driven insights to senior management.

Developed predictive models incorporating clustering, SVM, Bayes, and Elastic Net techniques to automatically classify comments based on business-defined attributes. Evaluated the performance of these models to ensure their effectiveness.

Utilized a range of machine learning algorithms, including decision trees, K-Means, Random Forests, and regression, in the R programming language. The necessary packages were installed to support these algorithms.

Integrated the ML model with the front end using the Flask framework; stored input data and intermediate results in AWS S3 buckets; deployed the model with AWS Lambda and Step Functions.

Transformed data with the DataFrame and Dataset APIs in Spark and Scala; recorded results in JSON daily on a real-time basis and sent them to business owners across the organization through AWS SNS.

Utilized a comprehensive set of technologies and tools including Spark, Scala, Hadoop, HBase, Kafka, Spark Streaming, MLlib, R, and a diverse range of machine learning methods encompassing classifications, regressions, and dimensionality reduction.

Worked in an Agile environment using JIRA end-user stories.

Deeply passionate about exploring and visualizing data to uncover meaningful insights within the context of patient care. My goal is to leverage these insights and provide data-driven recommendations that can positively impact clinicians' workflow and enhance patient care outcomes.

With a strong clinical background and extensive knowledge of medical terminology, diagnoses, treatments, biomarkers, as well as hospital, pharmaceutical, and provider workflows, I possess the expertise to bridge the gap between healthcare and data analytics.

I am proficient in various aspects of data management and analysis, including data extraction, curation, normalization, feature engineering, cohort selection, and predictive analytics, utilizing tools such as SQL, PostgreSQL, R Studio, and Python, specifically tailored for healthcare data.

Additionally, I am adept at generating customized reports using Google Analytics to track and analyze website visitor interactions. This includes creating and managing custom reports that provide valuable insights into visitor behavior and interactions to support diverse teams and their specific needs.

Performed data visualization and designed dashboards with Tableau, providing complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.

Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau and Power BI.

Data science/text mining/data wrangling with Python libraries: Matplotlib, SciPy, NumPy, Pandas, Seaborn, Scikit-learn, NLTK, TensorFlow, and ggplot.

Neural networks, Gradient Boosted Decision Trees, Random Forests, Naive Bayes, Bayesian Networks, K-Nearest Neighbors (KNN), hyperparameter tuning, regularization and optimization, supervised/unsupervised learning, model and multiclass evaluation, the NLTK toolkit (natural language processing), and Spark MLlib.

Environment: Python (Scikit-Learn/SciPy/NumPy/Pandas), Linux, Tableau, Hadoop, MapReduce, Hive, ETL DataStage, Oracle, Windows 10/XP, JIRA, Spark 1.6, Scala, AWS (EMR, S3, SNS), Git, Python (NLTK, PySpark), Flask, R.

CLIENT: ADP (Offshore) Feb 2020 – March 2021

Data Analyst/BI Engineer

Roles and Responsibilities:

Implemented Machine Learning, Computer Vision, Deep Learning and Neural Networks algorithms using TensorFlow, Keras and designed Prediction Model using Data Mining Techniques with help of Python, and Libraries like NumPy, SciPy, Matplotlib, Pandas, Scikit-learn.

Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python to develop various machine learning algorithms.

Worked with text feature engineering techniques such as n-grams, TF-IDF, and word2vec.

Applied Support Vector Machines (SVM) and kernels such as polynomial and RBF kernels to machine learning problems.

Worked on imbalanced datasets and used appropriate metrics when evaluating models on them.

Worked with deep neural networks, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).

Developed low-latency applications and interpretable models using machine learning algorithms.

Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed gap analysis.

Good knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.

Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).

Implemented Classification using supervised algorithms like Logistic Regression, SVM, Decision trees, KNN, Naive Bayes.

Responsible for the design and development of advanced R/Python programs to transform and harmonize data sets in preparation for modeling.

Worked with Data Architects and IT Architects to understand data movement and storage, using ER Studio 9.7.

Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.

Gathered, retrieved, and organized data, using it to reach meaningful conclusions.

Developed a system for collecting data and turning its findings into reports that benefited the company.

Developed a multi-class, multi-label, two-stage classification model to identify depression-related tweets and classify depression-indicative symptoms. Used the model to estimate the severity of depression in a patient using Python, scikit-learn, Weka, and Meka.

Aggregated data using technologies such as Kafka, Hadoop Distributed File System (HDFS), Hive, Scala, and Spark Streaming within the AWS environment, enabling efficient processing and analysis of large volumes of data.

Applied feature engineering techniques and NLP methods, including Word2Vec, Bag of Words (BOW), average Word2Vec, and weighted Word2Vec, for text analysis and representation.

Conducted sentiment analysis on customer email feedback using Long Short-Term Memory (LSTM) cells in Recurrent Neural Networks (RNNs), capturing the emotional tone and attitudes expressed in the text for further analysis.

Conceptualized and created a knowledge graph database of news events extracted from tweets using Java, Virtuoso, Stanford CoreNLP, Apache Jena, RDF.

Developed and maintained stored procedures, implemented changes to database design including tables and views and Documented Source to Target mappings as per the business rules.

Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.

Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.

Environment: Python 3.6.4, R Studio, MLlib, Regression, SQL Server, Hive, Hadoop Cluster, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-learn, ggplot2, Shiny, TensorFlow, Teradata, PCA, t-SNE, Cluster analysis, SQL, Scala, NLP, Spark, Kafka, MongoDB, logistic regression, Hadoop, PySpark, CNNs, RNNs, Oracle 12c, Netezza, MySQL Server, SSRS, T-SQL, random forest, OLAP, Azure, HDFS, ODS, NLTK, SVM, JSON, XML, Cassandra, MapReduce, AWS, Linux.

CLIENT: COROMANDEL INTERNATIONAL (Offshore) Nov 2018 – Jan 2020

Data Analyst/Business Analyst

Responsibilities:

Worked with subject matter experts (SMEs) and various stakeholders to ascertain the necessary specifications for identifying entities and attributes, facilitating the development of conceptual, logical, and physical data models.

Imported and Exported data files between SAS and other formats such as Excel, utilizing procedures like Proc Import and Proc Export. This involved importing and exporting various types of delimited text-based data files, including .TXT (tab delimited) and .CSV (comma delimited) files, to SAS datasets for further analysis.

Extensively used multiple Python data science packages, including Pandas, NumPy, Matplotlib, Seaborn, SciPy, Scikit-learn, and NLTK.

Utilized SQL, SQL PLUS, Oracle PL/SQL stored procedures, triggers, and SQL queries to manipulate data and load it into data warehouses and data marts.

Leveraged Python to clean data and Tableau to perform Exploratory Data Analysis (EDA), removing outliers, detecting underlying patterns, and finding correlations between variables.

Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.

Developed Python scripts to find vulnerabilities with SQL queries by conducting SQL injection, permission checks, and performance analysis.

Designed MySQL relational databases, defined source fields and data dictionaries in collaboration with Data Compliance and Data Governance Team.

Scraped HTML data from webpages using REST APIs, parsed them in JSON format using Python scripts (Beautiful Soup and Selenium), and loaded them into the MySQL DB as structured data.

Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn.

Wrote and executed numerous MySQL database queries from Python, using tools such as the Python-MySQL connector and the MySQLdb package for seamless integration between Python and MySQL.

Adopting a Test-Driven Development approach, developed merge jobs in Python for efficient extraction and loading of data into MySQL databases; also worked with SQL Server Integration Services (SSIS) and SQLite for database-related tasks.

Implemented machine learning models (logistic regression, XGBoost) with Python scikit-learn.

Performed second and third normal form (2NF/3NF) normalization for OLTP data models.

Developed a technical brief based on the business brief. This contains detailed steps and stages of developing and delivering the project including timelines.

Collaborated with Data Scientists to create Data Marts with Star and Snowflake schema for ML projects.

Separately calculated KPIs for Target and Mass campaigns across pre-, promo-, and post-periods with respect to their transactions, spend, and visits.

Also measured the KPIs month over month (MoM), quarter over quarter (QoQ), and year over year (YoY) across the pre-, promo-, and post-periods.

Extensively used SAS procedures like IMPORT, EXPORT, SORT, FREQ, MEANS, FORMAT, APPEND, UNIVARIATE, DATASETS and REPORT.

Worked extensively with data governance team to maintain data models, Metadata and dictionaries.

Analyzed websites regularly to ensure site traffic and conversion funnels performed well.

Collaborated with sales and marketing teams to optimize processes that communicate insights effectively.

Conducted safety checks to make sure the team felt safe during retrospectives.

Extracted features from the given data set and used them to train and evaluate different classifiers available in the WEKA tool, using these features to differentiate spam messages from legitimate messages.

Created numerous SQL queries to modify data based on data requirements and added enhancements to existing procedures.

Developed Use Case Diagrams to identify the users involved. Created Activity diagrams and Sequence diagrams to depict the process flows.

Responsible for maintaining and analyzing large datasets used to analyze risk by domain experts.

Developed Hive queries that compared new incoming data against historic data. Built tables in Hive to store volumes of data.

Used big data tools Spark (Spark SQL, MLlib) to conduct real-time analysis of credit card fraud on AWS.

Performed data audits, QA of SAS code/projects, and sense checks of results.

Environment: Spark, Hadoop, AWS, SAS Enterprise Guide, SAS/MACROS, SAS/ACCESS, SAS/STAT, SAS/SQL, Oracle, MS Office, Python (scikit-learn, Pandas, NumPy), Machine Learning (logistic regression, XGBoost), Gradient Descent algorithm, Bayesian optimization, Tableau, SQL (MySQL), Erwin, JSON, HTML

CLIENT: UNIFY TECHNOLOGIES Aug 2015 – Oct 2018


