Data Python

Location:

Jersey City, NJ

Posted:

November 16, 2020

Vijay Kumar Reddy

********************@*****.***/ +1-732-***-****

SUMMARY:

• Self-motivated Data Scientist/Python Developer with experience in Statistical Modeling, Data Mining, Machine Learning, and Data Visualization, and with domain knowledge in the Marketing, Healthcare, and Banking industries.

• Strong background in Machine Learning, Predictive Modeling, and Data Mining, with a broad understanding of supervised and unsupervised learning techniques and algorithms (e.g., Regression, K-NN, SVM, Naïve Bayes, Decision Trees, Clustering).

• Extensive experience in Python 3.6 with the packages pandas, NumPy, datetime, Matplotlib, seaborn, scikit-learn, SciPy, and statsmodels for data cleaning, data manipulation, data mining, machine learning, data validation, and data visualization.

• Experienced in writing SQL queries in Oracle and SQL Server for ETL (Extract, Transform, and Load) from large datasets.

• Worked on SAS to perform Data Cleaning and Data Analysis.

• Developed and deployed predictive models in marketing operations, such as propensity-to-buy, customer lifetime value, and uplift models, to adjust audience segmentation. Designed, implemented, and analyzed A/B tests for different marketing media.

• Automated parts of the API integration for data consumption.

• Experienced with Recommender System Design by implementing Collaborative Filtering, Matrix Factorization, Clustering Methods, and Market Basket Analysis.

• Proficient in data and visualization tools such as SQL Server, SSIS, Tableau 10.5, Power BI 2.30, Python Matplotlib/Seaborn, and R ggplot2/Shiny to generate charts such as box plots, scatter charts, pie charts, and histograms, and to create visually impactful, actionable interactive reports and dashboards.

• Familiar with the Hadoop ecosystem, including the Apache Spark framework, HDFS, MapReduce, Sqoop, Kafka, Oozie, Apache Pig, HiveQL, SparkSQL, and PySpark.

• Experienced in non-relational databases such as HBase and Cassandra.

• Involved in entire data science project life cycle, including Data Acquisition, Data Cleansing, Data Manipulation, Feature Engineering, Modeling, Evaluation, Optimization, Testing and Deployment.

• Employed the confusion matrix, recall, precision, and area under the precision-recall curve to check model performance on imbalanced datasets.

• Expert in digital marketing analytics using Google Analytics, observing user behavior to assess the effectiveness of marketing campaigns, SEO, and SEM and to support decision making.

• Excellent understanding of Software Development Life Cycle (SDLC) methodologies such as Agile and Waterfall.

• Experienced with collaborative tools such as Jira and with GitHub for version control.

• Strong business acumen and analytical skills to transform business resources and tasks into regularized data and analytical models, designing algorithms to support clients in various environments during the solution development phase and digital road map creation.

TECHNICAL SKILLS:

Machine Learning: KNN, K-means, CART, Random Forest, Naïve Bayes, Cluster Analysis, Text Mining, Bagging, Gradient Descent, AdaBoost, Neural Network, XGBoost, LDA, Decision Trees

Documentation Tools: MS Excel (Pivot Tables, VLOOKUP, HLOOKUP, INDEX), MS Word, MS PowerPoint, Outlook, MS Office 2010, MS Project

Statistics: Linear and Logistic Regression, PCA, ARMA, GARCH, VaR, Ridge, Lasso, Elastic-net, Hypothesis Test, ANOVA

Languages: Python 2.7/3.3, R 3.3.4, SAS 9.0, Excel Macro, Spark 2.0, Pig, MapReduce, Matlab 1.7, Java

R Packages: ggplot2, glm, e1071, caret, tsa, fnn, tm, wordcloud, quantmod, splines, KernSmooth, MASS, LibSVM, sqldf, shiny

Databases: MySQL 5.7, Oracle 11, SQL Server 2012, Cassandra 3.0, HBase 1.0+, Microsoft Access

Python Packages: seaborn, matplotlib, pandas, numpy, scipy, bokeh, plotly, NLTK, scrapy, statsmodels, sqlite3

Hadoop Ecosystem: Hadoop 2.X, Spark 2.0, Hive 2.1, HBase 1.0, MapReduce, Sqoop

Analytical Tools: Excel, IBM SPSS 20.0, E-views, Google Analytics

Visualization: Tableau 9.3, Power BI, Pivot Table, D3.js, R ggplot2, Python plotly

IDE: Anaconda 1.6, Jupyter Notebook 4.2, Spyder 3.0, RStudio 1.0

Scripting Language: Unix Shell, SQL, Markdown

PROFESSIONAL EXPERIENCE:

VANGUARD, MALVERN

AWS Data Engineer / Data Analyst Jan 2020 - Present

Project Title: Advice AI/ML

The Vanguard Group is an American investment management company based in Malvern, Pennsylvania, that manages approximately $3.4 trillion in assets. It is the largest provider of mutual funds and now the second-largest provider of exchange-traded funds (ETFs) in the world after BlackRock, with about $451 billion in ETF assets under management, as of March 2015. It offers mutual funds and other financial products and services to retail and institutional investors in the United States and abroad.

Key Responsibilities:

• Experience with SQL, Presto, Hive, and Spark.

• Experience with AWS EMR.

• Worked with AWS and its services, including IAM, S3, EC2, Lambda, Glue, and CloudFormation, according to business requirements.

• Designed SSIS packages to extract, transform, and load (ETL) data across different platforms, validate the data, and archive the data from the database.

• Identified patterns, data quality issues, and leveraged insights by communicating with BI team.

• Explored and Visualized the data to check the pattern, distribution, and correlation using Python Matplotlib and Seaborn on Jupyter Notebook.

• Experience with software engineering tools, such as Eclipse, Git, and others.

• Intermediate knowledge of automated testing, CI/CD, and the Atlassian tool suite (Jira, Bitbucket, Bamboo).

• Applied experience with big data tools, Spark, Python, and AWS to understand cost implications and help steer teams toward cost-optimized solutions.

• Translates intermediate-level user-defined capabilities into system/product requirements. Works with more experienced technology specialists to understand business rationale and client expectations.

• Implements agreed upon solutions to documented business needs.

• Built Python API to consume the data from different sources.

• Designs and builds data pipelines, and assists with upgrades, enhancements, or evaluations of any vendor products.

• Provides subject matter expertise in assigned projects through all phases of the development lifecycle. Defines field of use, standards, and best practices around assigned products and technologies.

• Supports assigned products/services by responding independently to client inquiries and resolving issues of intermediate complexity.

• Participates in special projects and performs other duties as assigned.

• Wrote test cases and mocked code that uses DynamoDB, Kinesis, and CloudWatch.

Environment:

Hadoop 2.x, HDFS, AWS, AWS EMR, S3, CloudFormation, IAM, EC2, Lambda, Oracle SQL, Hive, Impala, PySpark, Scrum, Agile Methodology, Kafka, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib), Jupyter, Excel, Jira, Bitbucket, GitHub and Linux.

Deutsche Bank – NY July 2019 to Dec 2019

Data Scientist

Project Title: Anti Financial Crime Model Development

Deutsche Bank's Research team offers true global and industry knowledge that generates consistent outperformance through an in-depth, comprehensive multi-asset-class research product. Key research disciplines include Equity, Fixed Income, Economics, Foreign Exchange and Emerging Markets. Over the past few years Deutsche Bank has consistently ranked in the Top 3 in Institutional Investor's European surveys.

Goal: The purpose of the AML Financial Crime Investigations team is to ensure that AML coverage assessment, model review, and validation are completed annually in accordance with the Anti-Money Laundering/Sanctions model governance framework within the Americas. The role reports to the Model Development VP in New York and is part of the wider Americas AML and Anti Financial Crime team. As part of the team, my main focus is supporting and performing key responsibilities in the model development and implementation process.

Key Responsibilities:

• Worked with street-leading analysts to facilitate their requests for new and innovative data analytics.

• Helped design and construct the data processing and analytics infrastructure necessary to perform advanced analytical research within a global enterprise.

• Created original insights from data by executing applied empirical research in a timely manner.

• Created and documented content for existing research reports and helped write stand-alone research for Deutsche Bank's clients.

• Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.

• Worked on different SWIFT message types across multiple categories and their tags.

• Developed PySpark parsing code for the different tags in SWIFT messages to decode the data and extract the required fields.

• Wrote and debugged code extensively, developing machine learning solutions in Python and R.

• Migrated ETL operations into the Hadoop system using PySpark for joins, filtering, and transformations. Also worked with Hive and Impala for faster querying in the Hadoop ecosystem.

• Developed models for various risk indicators to detect unusual activity for AML (Anti-Money Laundering) using stratified methods and backpropagation.

• Tuned the data and tested against survey feedback to review the performance of the model.

• Maintained Scrum reports and led the team to complete production deliverables within deadlines.

• Worked on ETL testing, including sanity checks, unit testing, and end-to-end testing.

Environment:

Hadoop 2.x, HDFS, Cloudera Data Science Workbench, Oracle SQL, Hive, Impala, PySpark, Scrum, Agile Methodology, Kafka, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib), Jupyter, Excel, GitHub and Linux
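
Illustrative sketch of the SWIFT tag parsing mentioned in the responsibilities above. This is not the production code: it uses plain Python instead of PySpark, and the sample message, the function names parse_tags and parse_32a, and the selected fields are hypothetical. It assumes an MT-style text block in which every field starts on its own line with a ":tag:" prefix.

```python
import re
from decimal import Decimal

# Assumption: each field begins at the start of a line as ":<tag>:",
# where the tag is two digits plus an optional letter (e.g. 20, 32A, 50K).
TAG_PATTERN = re.compile(r"^:(\d{2}[A-Z]?):", re.MULTILINE)

# Hypothetical sample text block, invented for illustration only.
SAMPLE_BLOCK = """:20:REF12345
:23B:CRED
:32A:201116USD1000,50
:50K:/12345678
JOHN DOE
:59:/87654321
JANE ROE
"""


def parse_tags(block_text):
    """Split a text block into {tag: [values]}; multi-line values are kept."""
    fields = {}
    matches = list(TAG_PATTERN.finditer(block_text))
    for i, match in enumerate(matches):
        # A field's value runs until the next tag or the end of the block.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(block_text)
        value = block_text[match.end():end].strip()
        fields.setdefault(match.group(1), []).append(value)
    return fields


def parse_32a(value):
    """Decode a 32A value: YYMMDD date, 3-letter currency, comma-decimal amount."""
    match = re.fullmatch(r"(\d{6})([A-Z]{3})([\d,]+)", value)
    if match is None:
        raise ValueError(f"unexpected 32A format: {value!r}")
    value_date, currency, amount = match.groups()
    return {
        "value_date": value_date,
        "currency": currency,
        "amount": Decimal(amount.replace(",", ".")),
    }


if __name__ == "__main__":
    parsed = parse_tags(SAMPLE_BLOCK)
    print(parsed["20"], parse_32a(parsed["32A"][0]))
```

A function like parse_tags could in principle be applied row by row to a column of raw messages; how that was done in the actual PySpark jobs is not stated here.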

Marlabs, Piscataway – NJ March 2019 to June 2019

Data Scientist

Project Title: Customer Fraud Detection

Goals: The project was to build predictive models for detecting fraudulent customers by applying machine learning methods, using classification techniques to predict whether a new insurance application is fake or genuine. Our new model considered more features and involved more models, and therefore increased prediction accuracy by more than 5%.

Responsibilities:

• Applied models and regression, comparing various initial models, creating pipelines for data processing and presenting reports to other teams within the company.

• Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions. Strong knowledge of auto insurance.

• Used PCA and feature engineering techniques for dimensionality reduction while retaining the variance of the most important features.

• Worked on fraud detection analysis on payments transactions using the history of transactions with supervised learning methods.

• Collected data in Hadoop 2.x and retrieved the data required for building models using Hive.

• Developed Spark Python modules for machine learning & predictive analytics in Hadoop using MLlib and PySpark.

• Created transformation pipelines for preprocessing large amounts of data with methods such as imputing, scaling, and feature selection.

• Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn in Python for developing various machine learning models and utilized algorithms such as Decision Trees, Logistic regression, SVM and Random Forest.

• Used ensemble methods, including different bagging and boosting methods, to increase model accuracy.

• Used cross-validation to test the models with different batches of data to optimize the models and prevent overfitting.

• Built a Tableau 10.5 dashboard using complex calculated fields, real-time table calculations, filters, and parameters. Generated context filters and used performance actions while handling large volumes of data.

Environment:

Hadoop 2.x, HDFS, Kafka, Storm, Hive, Python 3.x (NumPy, Pandas, Scikit-learn, Matplotlib), Jupyter, GitHub, Power BI, TABLEAU 10.5 and Linux

Saint Peter’s University, Jersey City, NJ Jan 2018 to Feb 2019

Data Scientist (Academic)

Project Name: Propensity of Non-Notification Claim

The project built a Risk Selection model to identify bad-risk policies, which could help in mitigating risk. It used data analytics as part of the underwriting guide to highlight areas of focus and appetite when understanding risk dynamics in the consideration of individual risk acceptance by geography, line of business, attachment point, and industry sector.

Responsibility:

• Worked with Financial Institutions Portfolio Underwriting Team and Claim Management Team on underwriting strategies, design and claim process for new and renewal policy’s risk assessment and involved in End to End project implementation from Scoping and Data Acquisition to Model building and Deployment.

• Responsible for (ETL) Data Extraction, Transformation and Loading into Pandas DataFrame by establishing pyodbc connection with ODBC driver 13 for SQL Server from Enterprise Information (EI) and Claim management Systems (CMS).

• Performed Data Cleaning, Outlier Detection and Treatment, Feature Engineering using Python 3.6 and other Python libraries like Pandas, NumPy along with user defined functions.

• Performed detailed portfolio analysis as part of Exploratory Data Analysis (EDA) and generated statistical reports in Python using Pandas, Matplotlib, Seaborn packages.

• Performed Chi-Squared, Recursive Feature Elimination and F-Test to filter and narrow down the interaction variables by selecting only the significant variables which constitute 5% or greater of the total data.

• Used policy inception years 2011-2018 to create training and test sets, then applied the model to out-of-time policies to evaluate performance.

• Handled class imbalance using the class weights method and techniques such as SMOTE for oversampling and NearMiss for undersampling from the imbalanced-learn (imblearn) package.

• Applied and evaluated several ML and deep learning models, including Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, XGBoost, and deep neural networks, in Python using Scikit-Learn, Keras, and TensorFlow to build a classification model.

• Employed Regularization to penalize the statistically insignificant features in the Logistic Regression model.

• Applied Grid Search to optimize parameters and K-fold Cross validation to select models.

• Employed a user defined detailed classification report function to evaluate the performance of the model which includes Confusion Matrix, Precision, Recall, Accuracy, F1-Score, Support and AUC for every model.

• Visualized and created reports in Tableau and Python using Matplotlib, Seaborn.

• Followed Agile methodology throughout the project lifecycle and used Git and GitHub to version-control the Jupyter Notebooks.

• Involved in the implementation and deployment of the model in the form of Excel dashboards. The model is refreshed quarterly if performance drops below thresholds defined by the business.
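
Illustrative sketch of a user-defined classification report like the one described in the responsibilities above. It is not the project's actual function: the name detailed_report, the 0.5 threshold, and the sample labels and scores are hypothetical, and "AUC" is assumed here to mean ROC AUC, computed in plain Python by pairwise comparison of positive and negative scores.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        return float("nan")  # undefined when only one class is present
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def detailed_report(y_true, y_score, threshold=0.5):
    """Confusion matrix, precision, recall, accuracy, F1, support, and AUC."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "confusion_matrix": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / len(y_true),
        "f1": f1,
        "support": {0: tn + fp, 1: tp + fn},
        "auc": roc_auc(y_true, y_score),
    }


if __name__ == "__main__":
    # Hypothetical labels and predicted probabilities, for illustration only.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1, 0.7, 0.3]
    for name, value in detailed_report(labels, scores).items():
        print(name, value)
```

Calling the same function for every candidate model gives directly comparable metrics across models.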

Environment:

Python 3/2.7, SQL Server, Anaconda 3.X, Jupyter Notebook 5.7, Logistic Regression, Metrics, Pandas 0.24, Numpy 1.14, Matplotlib 3, Scikit-Learn 0.19, Tableau, Seaborn 0.9, Git, MS Excel

eCloud Labs, Hyderabad, India Jan 2016 to Aug 2017

Marketing Analyst / Junior Data Scientist

Project title: Marketing Research

Pitney Bowes is a global technology company best known for its postage meters and other mailing equipment and services and, with recent expansions, for global e-commerce, software, and other technologies. The project migrated to big data technologies to store client data received from various sources, analyze client information, and forward it to BI teams for reporting. The challenges were improving customer satisfaction and increasing sales.

Responsibilities:

• Analyzes marketing audience performance, identifying business impact of marketing approaches and audience segments

• Develops and deploys advanced functionalities using Marketing Automation to support a Data Driven Marketing Enterprise, such as developing complex flow charts for response handling, scoring, and nurture

• Responsible for enabling the consumption of the data our team provides to key team members in the Sales & Marketing organizations

• Works with the agencies designing the reports and manages the cadence for each report

• Ensure my reports & analysis always highlight the key takeaways or decisions the team(s) can make with the information.

• Optimized online marketing performance in terms of revenue, ROI and other KPIs by working with website developers and account managers.

• Upsold current clients on web marketing services by at least 20% during yearly renewal periods.

• Optimized website copy, title tag and meta descriptions through keyword research for SEO value.

• Improved marketing strategies and best practices for verticals outside the company's core hotel and restaurant verticals

• Trained peers on Google Analytics platform - where to find data and how to use it for optimization.

• Provided monthly, quarterly and other ad hoc performance reports with strategic recommendations for clients using ROI analysis, detailed

• Created targeted Google Ads and monitored web traffic with Google Analytics

• Instituted online and in-store customer tracking tools to measure sales conversion rates

• Segmented customer audience and designed Constant Contact emails to provide relevant bi-monthly.

• Produced various kinds of reports using Tableau and Power BI based on client requirements.

Environment:

Hadoop, MapReduce, Yarn, Hive, HBase, Spark, Python, Oozie, Sqoop, Linux, Maven, and Git.

EDUCATION:

• Master’s in Data Science with Concentration in Business Analyst, Saint Peter’s University - Feb 2019.

• Diploma in Data Science

• Bachelor’s in Computer Science Engineering - May 2017.

CERTIFICATIONS:

• Bloomberg

• Data Science

• SQL Silver Belt

• Business Intelligence Specialist

• Google Analytics
