
Data Python

Location: San Francisco, CA
Posted: October 15, 2020


Resume:

VIKAS MARUPADIGA

571-***-**** adg0hd@r.postjobfree.com

Professional Summary:

Data science professional with 7+ years of experience, broad and in-depth knowledge of statistics and Python programming, and a strong math background.

Identified areas of business improvement by uncovering insights from large volumes of data using various machine learning techniques.

Effectively used analytical languages such as Python and R to identify trends and relationships in data, draw meaningful inferences, and translate analytical conclusions into risk management and marketing strategies that drive value.

Experienced in facilitating the entire lifecycle of a data science project: data extraction, data pre-processing, feature engineering, dimensionality reduction, algorithm implementation, backtesting, and validation.

Expertise in using Python libraries and R packages.

Proficient with a wide variety of data science tools: the Python and SQL languages, the Tableau and Power BI visualization platforms, and libraries such as scikit-learn, NumPy, SciPy, and Pandas.

Expert knowledge in creating dashboards end to end (scripting, data modeling with star and snowflake schemas, front-end charts) using Power BI data visualization tools.

Good understanding of Tableau Desktop architecture for designing and developing dashboards.

Proficient in machine learning techniques (time series forecasting, Decision Trees, Linear/Logistic Regression, Random Forest, SVM, Bayesian methods, XGBoost, K-Nearest Neighbors), AWS SageMaker and Comprehend, Natural Language Processing, and statistical modeling for forecasting/predictive analytics: segmentation methodologies, regression-based models, hypothesis testing, factor analysis/PCA, and ensembles.

Extensive experience in SQL and PL/SQL stored procedures, triggers, and query optimization in Oracle, MS SQL Server, MySQL, and DB2 databases; performed tuning and database normalization.

Experience in Extraction, Transformation and Loading (ETL) of data from various sources into data warehouses, as well as data processing tasks such as collecting, aggregating, and moving data using Apache Flume, Kafka, Power BI, and Microsoft SSIS.

Experience with emerging technologies such as Big Data, Hadoop, NoSQL and AWS.

Experience in RDBMS using Oracle 11g/10g/9i, IBM DB2, and MySQL database servers.

Highly proficient in Agile, Iterative, Waterfall, and Scaled Agile Framework (SAFe) software development life cycles.

Experienced in developing data models and processing data through big data frameworks like HDFS, Hive and Spark to access streaming data and implementing data pipelines to process real-time data and recommendations.

Experience developing SQL procedures on complex datasets for data cleaning and report automation.

Good understanding of column-oriented NoSQL databases such as HBase.

Proficient in SQL Server and T-SQL (DDL and DML) for constructing tables and applying normalization techniques to database tables.

Familiarity with AI and deep learning platforms/methodologies such as TensorFlow, RNNs, and LSTMs.

Hands-on experience with various version control systems (GitHub, Bitbucket, and SVN), developing on Linux, Mac, and Windows operating systems.

Technical Summary:

Machine Learning: Classification, Regression, Feature Engineering, Clustering, Naive Bayes, Decision Tree, Support Vector Machine, KNN

Statistical Analysis: Time Series Analysis, Regression Models, Confidence Intervals, Principal Component Analysis and Dimensionality Reduction, Cluster Analysis

Programming Languages: Python (Pandas, NumPy, SciPy, scikit-learn, Seaborn, Keras, Plotly, Matplotlib, TensorFlow, NLTK), R (dplyr, ggplot2, shiny, caret, plotly, tidyquant, lubridate, xgboost, randomForest), SQL, Java, C, C++

Databases: Oracle 10g/11g, PostgreSQL, DB2, SQL Server, MySQL

IDEs: Jupyter Notebook, Eclipse, STS (Spring Tool Suite)

Visualization and Other Tools: OBIEE, Oracle Analytics Cloud, BICS, Power BI, Tableau, AWS, Azure, Docker

Operating Systems: Windows, UNIX, Linux, macOS

Professional Experience:

Wells Fargo, San Francisco, CA July 2018 - Present

Machine Learning Engineer

Roles and Responsibilities:

Held requirement discussions with clients and coordinated with the Operations team to understand the manual workarounds the team performed.

Reviewed estimates for work items, helped the team with impact analysis, and participated in estimation justification meetings with clients.

Identified the data sources to pull from and defined the data consolidation process.

Involved in data pre-processing techniques such as data cleaning, visualizing the data, and identifying outliers to make the data ready for machine learning.

Performed feature engineering and extraction, built autoencoder pipelines for both numeric and categorical data, handled imbalanced datasets effectively, and ran hypothesis and sampling tests; one common tactic for the imbalanced-data part is sketched below.
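
A minimal sketch of handling class imbalance with scikit-learn, on synthetic stand-in data; class weighting is an assumption here (resampling is another common choice):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic 95/5 imbalanced data standing in for the real dataset.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=42)

# A stratified split preserves the minority-class proportion in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# class_weight="balanced" reweights the loss inversely to class frequency.
clf = LogisticRegression(class_weight="balanced", max_iter=1000)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))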

Evaluated machine learning algorithms and boosted model performance.

Implemented ensemble techniques: bagging (Random Forest) and boosting (AdaBoost, Gradient Boosting, XGBoost).

Built multiple classification models, such as Logistic Regression, Support Vector Machine (SVM), Random Forest, AdaBoost, and Gradient Boosting, using Python and scikit-learn, and evaluated their performance on customer discount optimization, as in the sketch below.
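
A minimal sketch of how such a model comparison might be set up in scikit-learn; the synthetic data stands in for the actual customer discount features, and the AUC metric is an assumption:

from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Placeholder data; the real features came from customer discount records.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
# Score each candidate with 5-fold cross-validation and compare mean AUC.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")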

Analyzed data using data visualization tools and reported key features using statistical tools and supervised machine learning techniques such as Naïve Bayes, Decision Trees, Random Forest, and Gradient Boosting for predictive analysis with Python scikit-learn to achieve project objectives.

Environment: Python 3.3, scikit-learn, Matplotlib/Seaborn data visualization, machine learning algorithms, Power BI reporting and dashboard development, OBIEE business intelligence tool

American Express, Phoenix – AZ Dec 2016-May 2018

Data Scientist

Roles and Responsibilities:

The project used streaming Wi-Fi network data to understand the occupancy of buildings on the campus, determine usage, prepare usage alternatives, and reduce spending on leasing and buying real estate to accommodate the growing need for space.

Efficiently translated raw data into numbers and generated visualizations and dashboards with R Shiny, fed by streaming data through Kafka and AWS Lambda, for business validation against existing databases.

Prepared presentations and prototypes for product owners providing various insights of the data related to days and times of high occupancy, areas of high occupancy and users with least usage of space assigned.

Prepared data for modeling from additional sources and documented the development of a predictive model using Poisson regression to analyze building occupancy and support better functioning of the campus; a sketch of such a model follows.
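
A minimal sketch of fitting a Poisson occupancy model with statsmodels; the hour/weekday predictors and the synthetic counts are illustrative assumptions, not the project's actual feature set:

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic hourly occupancy data standing in for the Wi-Fi-derived counts.
rng = np.random.default_rng(0)
df = pd.DataFrame({"hour": rng.integers(0, 24, 1000),
                   "is_weekday": rng.integers(0, 2, 1000)})
rate = np.exp(0.5 + 0.08 * (12 - abs(df["hour"] - 12)) + 0.9 * df["is_weekday"])
df["occupancy"] = rng.poisson(rate)

# Poisson GLM: log of the expected occupancy is linear in the predictors.
X = sm.add_constant(df[["hour", "is_weekday"]])
model = sm.GLM(df["occupancy"], X, family=sm.families.Poisson()).fit()
print(model.summary())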

Researched and explored data from various network APIs provided by the third party, documented the required data transformations, and communicated the data requirements to the data engineers to ensure the data pipeline was built using Apache Kafka and AWS Lambda, EC2, and S3; the producer side of such a pipeline is sketched below.
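
A rough sketch of the Kafka producer side using the kafka-python client; the broker address, topic name, and event fields are all hypothetical:

import json
from kafka import KafkaProducer

# Hypothetical broker; events are JSON-serialized before being sent.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))

# Hypothetical Wi-Fi association event pushed to a hypothetical topic.
event = {"building": "B1", "ap_id": "ap-042", "clients": 17}
producer.send("wifi-occupancy", value=event)
producer.flush()  # block until the message is actually delivered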

Prepared logical schema of the database to ensure the historical data is stored in Snowflake data warehouse and communicated with the database administrators to ensure the physical tables are generated.

Used NoSQL technologies as a framework supporting the processing and storage of extremely large data sets in a distributed computing environment, scaling across large numbers of machines while retaining greater control over data availability.

Developed BI reports providing predictive analytics and dashboards on analyst performance, client activity, future workload, and anomalies, based on data collected from discrete applications. Used Apache Spark to develop ML models, APIs, and various applications during projects.

Developed ETLs for data sources used in production reporting for marketing and operations teams.

Promoted enterprise-wide business intelligence by enabling report access in the SAS BI Portal and on Hadoop.

Built tools for detecting botnets and ad fraud hidden in terabytes of real-time ad market data.

Environment: Anaconda, Python, NoSQL, AWS Lambda, EC2, S3, Kafka, Hadoop cluster, Power BI.

Cigna, Louisiana, LA Oct 2014-Dec 2016

Data Scientist

Roles and Responsibilities:

Worked on projects end to end, from gathering requirements to developing the entire application. Hands-on with the Anaconda Python environment: developed, activated, and programmed in Anaconda environments.

Performed exploratory data analysis (EDA) to find and understand interactions between different fields in the dataset, reduce dimensionality, detect outliers, summarize main characteristics, and extract important variables graphically.

Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.

Extracted data and actionable insights from a variety of client sources and systems, found probabilistic and deterministic matches across second- and third-party data sources, and completed exploratory data analysis.

Worked on writing and reading data in CSV and Excel file formats.

Proficient in object-oriented programming, design patterns, algorithms, and data structures.

Wrote Python routines to log into websites and fetch data for selected options.

Implemented Python scripts to update content in databases and manipulate files.

Used Python machine learning libraries (pandas, NumPy, Matplotlib, scikit-learn, SciPy) to load, summarize, and visualize datasets, evaluate algorithms, and make predictions, along the lines of the sketch below.
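
A compact sketch of that load/summarize/visualize/evaluate loop; the file name and the "label" target column are placeholders:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("claims.csv")        # placeholder file name
print(df.describe())                  # summarize the dataset
df.hist(figsize=(10, 8))              # quick visual overview of each column
plt.show()

X = df.drop(columns=["label"])        # "label" is a placeholder target column
y = df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
model = DecisionTreeClassifier().fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))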

Worked with Python modules such as urllib, urllib2, and requests for web crawling (see the sketch below). Experienced with ML techniques: clustering, regression, classification, and graphical models.
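
A sketch of such a login-and-fetch routine using a requests session; every URL and form field here is hypothetical:

import requests

# All URLs and form fields below are placeholders, not a real client site.
with requests.Session() as session:
    # The session object keeps the login cookies for subsequent requests.
    session.post("https://example.com/login",
                 data={"username": "user", "password": "secret"})
    resp = session.get("https://example.com/reports?option=daily")
    resp.raise_for_status()
    print(resp.text[:200])  # first 200 characters of the fetched page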

Carried out regression, K-means clustering, and decision tree analyses, along with data visualization reports for management, using R.

Implemented classification algorithms such as Logistic Regression, KNN, and Random Forests to predict customer churn.

Implemented algorithms such as Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE) to reduce dimensionality and normalize large datasets, as sketched below.
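
A minimal sketch of a scale-then-PCA-then-t-SNE flow in scikit-learn, using a built-in dataset as a stand-in for the project data:

from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)        # stand-in for the project data
X = StandardScaler().fit_transform(X)      # normalize features first

X_pca = PCA(n_components=50).fit_transform(X)  # linear reduction to 50 dims
X_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X_pca)
print(X_2d.shape)  # (n_samples, 2) embedding, ready for plotting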

Performed data visualization and designed dashboards using Tableau; generated reports, including charts, summaries, and graphs, to interpret the findings for the team and stakeholders.

A keen eye for code re-usability and maintainability.

Involved in debugging and troubleshooting issues; fixed many bugs in two of the main applications.

Developed, tested, and debugged software tools used by clients and internal customers.

Coded test programs and evaluated existing engineering processes.

Designed data warehouses on platforms such as AWS Redshift, Azure SQL Data Warehouse and other high-performance platforms.

Knowledge of extracting and synthesizing data from Azure (Data Lake Storage (ADLS), Blob Storage, SQL DW, SQL Server) and from legacy systems (Oracle and its companion data lake storage).

Leveraged cloud and GPU computing technologies such as Azure, AWS, and GCP for automated machine learning and analytics pipelines.

Developed the full life cycle of data lake and data warehouse solutions with big data technologies such as Spark and Hadoop.

Environment: Anaconda, Python, R, Hive, Spark, Kafka, MapReduce, Apache Airflow, AWS, Nginx, Tomcat, NumPy, Pandas, Matplotlib, scikit-learn, SciPy, PCA, k-means clustering, Decision Trees, KNN, Random Forest, t-SNE, Tableau

Virtusa, Hyderabad, India Jun 2012-May 2014

Data Analyst/ Reporting Analyst

Roles and Responsibilities:

Led the development of a prescriptive data model to optimize operations and increase packaging efficiency.

Analyzed requirements, then developed and debugged applications using BI tools.

Interacted with users and business analysts to assess requirements and performed impact analysis.

Used agile software development with Scrum methodology.

Worked in coordination with clients for feasibility study of the project in Spotfire and with DB Teams for designing the database as per project requirement.

Produced databases, tools, queries, and reports for summarizing and root-causing failure data.

Designed, developed, tested, and maintained Spotfire reports based on user requirements.

Designed and maintained Tableau visualization dashboards on a daily basis to facilitate field testing.

Strong experience in creating sophisticated visualizations, calculated columns and custom expressions.

Developed map charts, cross tables, bar charts, tree maps, and complex reports involving property controls and custom expressions.

Extensively used Spotfire and Tableau to create data visualizations enabling prescriptive operations and increasing data visibility.

Created Information Links for many databases to populate visualizations from underlying databases such as SQL Server and Oracle.

Built scalable pipelines, model deployments, and reusable protocols using Pipeline Pilot to support daily operations.

Prepared reports, extracts, and integrations of existing data as needed to support analysis, visualizations, and statistical models.

Extensively worked on designing and creating dashboards; automated dashboards to help end users identify KPIs and facilitate strategic planning in the organization.

Used data blending to read data from different data sources while designing dashboards.

Created ad-hoc SQL queries for customer and executive management reports.

Extensively worked on Tableau dashboard optimization and performance tuning.

Extensively worked with functionality such as context filters, global filters, drill-down analysis, parameters, background images, maps, and trend lines, and created customized views and conditional formatting for end users.

Assisted report/visualization developers and statistical modelers in filtering, tagging, joining, parsing, and normalizing data sets for use in reports and analytical models.

Extracted, transformed, and loaded data from a variety of data sources into staging datasets and ultimately into the visualization tool.

Worked on data blending and data preparation using Pipeline Pilot for Tableau consumption, and published data sources to Tableau Server.

Developed custom SQL connections and calculated fields within the environment to facilitate automated production of data visualizations.

Provided production support to various tools and created tickets in Jira.

Environment: TIBCO Spotfire 7.x, Tableau, Pipeline Pilot, Alteryx, Oracle, SQL Server, MS SQL, AWS, Python, R-Script, SharePoint, APIs, Insomnia, Postman, Jira, MS Office Suite (Word, Excel, PowerPoint)

EDUCATION:

Bachelor’s in Computer Science,

Jawaharlal Nehru Technological University, Hyderabad, India May 2013

Master’s in Data Analytics Engineering,

George Mason University, Fairfax, Virginia May 2016


