Anubhav Rastogi US Citizen *****************@*****.***
Education: B.Tech in Computer Science & Engineering, Amity University, 2014
PROFESSIONAL SUMMARY:
8+ years of experience in the IT industry as a Data Scientist, with additional experience in machine learning and deep learning. Strong grounding in the Software Development Life Cycle (SDLC), including Agile and Waterfall.
Strong experience with data structures and algorithms; hands-on experience with AWS (EC2, S3).
Experience in data engineering and building ETL pipelines over batch and streaming data using PySpark and Spark SQL.
Experience implementing regression, classification, clustering, decision trees, SVM, random forests, Naive Bayes, PCA, KNN, and K-means, along with feature engineering, parameter tuning, neural networks (CNN, RNN), and time series analysis (ARIMA).
Experience developing Tableau dashboards to visualize staffing KPIs, optimize patient flow, and improve patient outcomes.
Expert in data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
Performed linear regression, logistic regression, decision trees (CART), KNN, K-means, and random forest modeling in Python and R.
Worked closely with account managers, business unit managers, and database engineers to review technical specifications and build machine learning solutions quickly.
Experience building AI models with frameworks such as Keras and TensorFlow.
Analyzed different types of data in R to derive insights about relationships between locations, take statistical measurements, and qualitatively assess the data.
Experience designing and developing data pipelines for data ingestion and transformation using Python and PySpark.
Experience designing deep-dive visualizations in Tableau for target audiences including Sales, Marketing, Product Management, Business Development, and the Manufacturing Process Engineering group.
Provided Tableau training to analysts in Sales and Marketing; one of two company-wide Tableau Online site administrators.
Experience working with relational databases (RDBMS) such as Snowflake, MySQL, PostgreSQL, and SQLite, and with the NoSQL database MongoDB.
Strong experience with Python and R libraries including scikit-learn, NumPy, pandas, NLTK, and Matplotlib.
Knowledge of Hadoop (Pig and Hive) for basic analysis, extraction, and summarization of data within the infrastructure.
TECHNICAL SKILLS:
Tools: Python, MySQL, R, PySpark, JMP, SAS, MS Excel, PowerPoint, Word
Data Visualization: Power BI, Tableau, UX and UI Design, Canva, Figma, Adobe XD
Project Management: Agile and Scrum methodologies, wireframing, mockups, prototyping, Jira
Data Science/Analytics: Data cleaning, data visualization, descriptive analysis, data mining, machine learning, regression models, hypothesis testing, A/B testing, RDBMS, time series analysis, Adobe Analytics, Google Analytics, deep learning, data warehousing, ETL development, DataLoad production
Soft Skills: Team player, leadership, self-learner, multitasker, articulate and proficient written/oral communication
WORK EXPERIENCE:
Sr. Data Scientist
Core & Main - San Jose, CA: April 2021 - Present
Responsibilities:
Involved in the design, development, and support phases of the Software Development Life Cycle (SDLC) within an Agile framework.
Developed a framework for implementing predictive models to be used internally by developers and research associates.
Use R and Python for exploratory data analysis and A/B testing to compare and measure the effectiveness of creative campaigns.
Working closely with machine learning engineers and other data engineers to deploy validated models into production.
Imported and exported data from Snowflake, Oracle, and DB2 into HDFS and Hive using Sqoop for analysis, visualization, and report generation.
Designed ETL processes to pull from different data sources and load data into destination tables, performing transformations using SSIS.
Developed a machine learning model using NLP algorithms to identify the tone of customers interacting with the AI bot and to analyze customer feedback text to measure satisfaction rates.
Created data marts in SQL database that were used for analysis, reporting, and Tableau visualizations.
Mentored interns and one entry-level data scientist.
Create data visualizations and ad hoc reports that adhere to data-quality best practices to provide business insight.
Created cross tables, graphical tables, charts (pie chart, tree map, heat map, bar chart, line chart, KPI chart, combination chart), calculated columns, hierarchies, bookmarks, and complex reports with multiple filters.
Implemented an ROI analysis program using ML algorithms: analyzed customer usage behavior patterns and trends, developed a propensity-to-convert-to-paid prediction model, and identified the specific tenant groups the trial program should target for optimal results.
Designing and implementing a predictive model to handle production-level volumes of complex supply chain transaction requests.
Identified and tracked key performance indicators (KPIs) for the business, developing reporting dashboards to monitor KPI performance.
Use Alteryx for extract, transform, and load (ETL) workflows from SQL databases and other data sources for exploratory data analysis, model building, and visualization.
Work closely with account managers, business unit managers, and database engineers to review technical specifications and build machine learning solutions quickly.
Applied machine learning to build a secondary credit model that calculates the ROI of each loan, trained on historical Lending Club data.
Developed ETL scripts for large data sets, including ETL performance optimization and automation.
Use statistical methods to build models in Python, R, and H2O.
Built clustering algorithms that profile market segments, and a machine learning model that predicts fraudulent transactions.
Implemented and followed Agile development methodology within the cross-functional team, acting as a liaison between the business user group and the technical team.
Wrote, executed, and performance-tuned SQL queries for data analysis and profiling, including complex queries using joins and subqueries.
Built executive dashboards in Tableau that measure key performance indicators.
Built a Python script that transcribes audio to text for text and sentiment analysis using OpenAI Whisper and NLP packages including NLTK and spaCy.
Work cross-functionally with the team to provide technical assistance in preparing shell scripts for task automation.
Installed and configured Apache Airflow to work with S3 buckets and the Snowflake data warehouse, and created DAGs to run in Airflow.
Leverage data analytics on trade datasets across supply chain and procurement to drive product development.
Develop smart dashboards and reports to track KPIs, identify buy/sell trends, and measure marketing performance.
Fetched data from Snowflake data warehouse and performed ETL on the platform.
Environment: GitHub, GitLab, JIRA, Python, Confluence, SQL, R, PySpark, MS Excel, Apache Spark, BigML, ggplot2, Tableau, Jupyter, Matplotlib, NLTK, scikit-learn, TensorFlow, Weka, pandas, NumPy, Apache Hadoop.
Data Scientist
ABM Industry - Austin, TX: January 2019 - April 2021
Responsibilities:
Developed and deployed a Python application to extract, structure, and analyze over 1.5 million records collected from a RESTful API service.
Coded Snowflake data loaders using Python. Reorganized large volumes of data.
Created and deployed several Python-based web scrapers using Selenium, Beautiful Soup, and Requests. Employed waits, try/except blocks, and user-agent headers to extract meaningful insights from open-source data.
Skilled in manipulating, extracting, aggregating, normalizing, and analyzing large datasets using SAS, R, and SQL.
Helped companies with social media analysis; performed text mining, NLP, and sentiment analysis, and presented results using link graphs in SAS E-Miner.
Communicated with SMEs and domain experts to identify the KPIs and metrics related to enterprise-level application log data for data collection and preparation.
Developed SQL queries to retrieve data from the database to enhance data handling for analysis.
Designed and tested complex ETL mappings and packages with SSIS based on business user requirements, loading data from the source data server to target tables to enhance data flow for analysis.
Helped retrieve data for downstream users. Used logical functions in Microsoft Excel to analyze and stage data.
Used Python and R to generate regression models to provide statistical forecasting.
Used data blending techniques to merge data from multiple sources to create visualizations and publish them to the reporting server for easy accessibility by all stakeholders.
Developed Tableau visualizations and dashboards using Tableau Desktop.
Coached business units to find optimal KPI thresholds for their operations and performance.
Built visualizations by translating business requirements into forecasting and trend development dashboards for data modelers and the data science team.
Crafted and executed intricate ETL procedures using SSIS, extracting data from multiple diverse sources.
Built time series forecasting models to forecast infrastructure KPIs (memory utilization and CPU utilization) using Kalman filters and ARIMA algorithms for capacity planning and management.
Created production-grade code to convert machine learning models into a RESTful API service and ML pipelines consumed at web scale using AWS cloud; developed RESTful APIs for machine learning services using Flask and OpenAPI.
Implemented analytical models into production. Applied machine learning algorithms to build new product features.
Performed ad hoc statistical analysis in R and Python across diverse clinical datasets to interpret biological information.
Environment: HBase, HDFS, Hive, Pig, Microsoft Office, SQL Server Management Studio, MS Access.
Data Scientist
City of Hollywood - Tampa, FL: August 2017 - January 2019
Responsibilities:
Managed many analytical projects in parallel, including predictive models, optimization models, unstructured data analysis, and data graphs.
Developed cross-validation pipelines for testing the accuracy of predictions.
Enhanced statistical models (linear mixed models) for predicting the best products for commercialization using linear regression, KNN, and K-means clustering algorithms.
Participated in all phases of data mining, including data collection, data cleaning, model development, validation, and visualization, and performed gap analysis.
Created ETL data pipelines to clean and process unstructured data with C# and SQL.
Boosted the performance of regression models by applying polynomial transformation and feature selection.
Generated reports on predictive analytics using Matplotlib and Seaborn, including visualizations of model performance and prediction results.
Identified key performance indicators (KPIs) among the given attributes.
Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
Restricted data access for users using row-level security and user filters.
Developed Tableau workbooks from multiple data sources using Data Blending.
Designed and performed data import/export, data conversions, and data cleansing.
Involved in data merging and feature extraction using complex SQL queries.
Improved sales by 5% by performing marketplace segmentation and profiling using k-means clustering, identifying high-value and low-value customer segments to target with email promotional offers.
Implemented, tuned, and tested the model on AWS EC2 with the best algorithm and parameters.
Wrote Unit, Functional, and Integration test cases for Cloud Computing applications on AWS.
Environment: GitHub, Gitlab, Google Sheets, Python, Confluence, SQL, Tableau, Jupyter, Matplotlib, Scikit-learn, Pandas, NumPy, JIRA, Salesforce, Data Studio.
Data Analyst
FIS Global - Atlanta, GA: September 2015 - July 2017
Responsibilities:
Involved in all steps and scope of the project's reference-data approach to MDM; created a data dictionary and source-to-target mapping in the MDM data model.
Designed the business requirement collection approach based on the project scope and SDLC methodology.
Installed, configured, and maintained data pipelines.
Worked with departmental staff to define, gather, analyze, and write business requirements.
Translated business requirements into data analytics and reporting solutions to deliver actionable information to management.
Designed and built Predictive Analytic models using Machine Learning and Data Science.
Designed ETL processes with SSIS to pull from different data sources, transform, and load data into destination tables.
Built a machine learning model that predicts fraudulent transactions.
Developed reports for management decision-making and created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, and Azure SQL Data Warehouse, including write-back.
Worked with relational database systems (RDBMS) such as Oracle and non-relational systems like HBase.
Developed automated regression scripts in Python to validate ETL processes across multiple databases, including AWS Redshift, Oracle, MongoDB, and SQL Server (T-SQL).
Used Oozie to automate data loading into the Hadoop Distributed File System.
Environment: Python, PySpark, Spark SQL, Plotly, Dash, Flask, Postman, Microsoft Azure, Autosys, Docker.
Data Analyst
John Deere - Milpitas, CA: November 2014 - September 2015
Responsibilities:
Used ORC and Parquet file formats on HDInsight, Azure Blobs, and Azure Tables to store raw data.
Involved in writing T-SQL and working on SSIS, SSAS, data cleansing, data scrubbing, and data migration.
Performed a PoC for a big data solution using Hadoop for data loading and data querying.
Worked with data governance and data quality teams to design various models and processes.
Extracted files from Hadoop and dropped them into S3 on a daily and hourly basis.
Conducted statistical analysis and generated transaction reports on a monthly and quarterly basis for all the transaction data using R.
Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks.
Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
Environment: Random Forests, Python (SciPy, NumPy, pandas, statsmodels, Plotly), MySQL, Excel, Google Cloud Platform, Tableau, SVM.