Sukrit Singh Arneja
Data Scientist Data Analyst 443-***-**** **************@*****.***
MD 21227
Professional Summary:
Highly qualified individual with 3 years of experience in Data Extraction, Data Modeling, Machine Learning, Data Mining, and Data Visualization.
Domain knowledge and experience in Healthcare, Manufacturing and Retail industries.
Proficient in managing the entire data science project life cycle and actively involved in all of its phases, including data acquisition, data cleaning, data engineering, feature scaling, feature engineering, statistical modeling and data visualization.
Experienced with various supervised and unsupervised machine learning algorithms such as Linear Regression, Logistic Regression, SVM, Decision Tree, KNN, Naive Bayes, Random Forest, Boosting, K-means Clustering, Hierarchical Clustering, PCA, Feature Selection and Collaborative/Content Filtering.
Hands-on experience in using packages in Python like matplotlib, scikit-learn, numpy, scipy, pandas and seaborn.
Excellent understanding of developing and maintaining SQL databases for a wide variety of business uses.
Proficient in writing complex SQL, including stored procedures, triggers, joins, and subqueries.
Expertise in designing and creating analytical reports and dashboards in tools like Tableau and Qlik to help users identify critical KPIs and facilitate strategic planning in the organization.
Working experience in version control tools such as GitHub to coordinate work with multiple team members.
Knowledge of SDLC methodologies such as Waterfall, Agile and Scrum.
Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making.
Strong business sense and the ability to communicate data insights to both technical and nontechnical clients.
Good team player and quick learner; highly self-motivated person with good communication and interpersonal skills.
Technical Skills:
Programming Languages
Python (NumPy, SciPy, scikit-learn, Pandas, Matplotlib, etc.), R, XML, HTML, SQL, PL/SQL
Machine Learning
Linear Regression, Logistic Regression, Decision Tree, Random Forest, KNN, SVM, K-Means, etc.
Databases
Oracle, MS SQL Server 2012/2014/2016, MS Access, MySQL
Cloud Platform
AWS (Redshift, S3, EC2, etc.)
Operating systems
Windows 7 and later, Windows Server 2008/2012, UNIX, Linux, iOS
Analytical Tools
Jupyter Notebook, Excel, Minitab, Mathcad, Weka
Visualization Tools
Tableau, Qlik Sense, QlikView
RPA
Automation Anywhere
Project Management Tools
Smartsheet, MS Office, MS Project
Professional Experience:
Visa, Littleton, CO Dec 2019 – Jan 2021
Data Scientist
Responsibilities:
Used a Medicare claims dataset, with a provider exclusions list supplying the fraud labels, to build and assess machine learning models: SVM, decision tree and logistic regression.
Tackled a highly imbalanced fraud dataset using undersampling with ensemble methods, oversampling with SMOTE and cost-sensitive algorithms using scikit-learn.
Improved fraud prediction performance by using random forest and gradient boosting for feature selection using scikit-learn in Python.
Used data analytics to assess processes, determine requirements and deliver data-driven recommendations and reports to executives and stakeholders.
Used metrics like F-Score, AUC/ROC, Confusion Matrix, MAE, RMSE to evaluate the performance of each model.
Collaborated with data engineers and the operations team to implement the ETL process; wrote and optimized SQL queries for data extraction to fit the analytical requirements.
Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
Conducted exploratory data analysis to gather insights and identify correlations between CMS 5-star ratings for nursing homes and COVID-19 cases and deaths.
Worked on data cleaning and ensured data quality, consistency and integrity using pandas and NumPy.
Participated in feature engineering such as generating feature interactions, feature normalization and label encoding with scikit-learn preprocessing.
Researched the application of various algorithms and statistical models such as decision trees, regression models, neural networks, SVM and clustering to identify data volume, using the scikit-learn package in Python.
Performed advanced SQL queries in response to business needs and gained proficiency in advanced aggregation functions, windowing functions, datetime wrangling, and working with multiple layers of subqueries.
Gained experience presenting insights and trends from the data to a non-technical audience.
Environment: Jupyter Notebook, Python (scikit-learn/SciPy/NumPy/pandas), Oracle SQL, machine learning (SVM/logistic regression/decision tree), Weka, GitHub
Arch Insurance, Jersey City, NJ Sep 2018 – Dec 2019
Data Analyst Intern
Responsibilities:
Provided insights, suggested recommendations and influenced the direction of the business by working with and communicating results to cross-functional groups to solve open-ended business problems.
Identified and quantified root-cause data quality issues within the organization and assisted in the development of plans toward resolution.
Built machine learning algorithms such as Linear & Logistic Regression, Decision Trees and Naïve Bayes, along with ensemble models, in Python. Evaluated and validated the models with test data and chose the best option based on the appropriate metric.
Improved existing processes by eliminating variation & non-value-added work, resulting in substantial cost savings.
Collected & analyzed data for improvement tracking & built Excel models for forecasting future business plans.
Participated in the complete Software Development Life Cycle (SDLC) process by analyzing business requirements and understanding the functional workflow of information from source systems to destination systems.
Built reports and dashboards in Tableau and Qlik to help the organization understand progress and trends of key strategic initiatives and projects.
Used drill downs, filter actions and highlight actions for developing dashboards in Tableau.
Used joins, correlated and non-correlated sub-queries for complex business queries in SQL involving multiple tables from different databases.
Worked with different data formats such as JSON and XML and applied machine learning algorithms in Python.
Participated in weekly review meetings with the Technical Program Manager to discuss project status, report design and/or requirements.
Environment: Python, JSON, XML, ETL, Oracle, Tableau, scikit-learn, NumPy, pandas, Qlik, PL/SQL, SDLC, Agile, machine learning (naïve Bayes/KNN/regression/random forest/SVM/ensemble)
Wipro, India May 2017 – Jan 2018
Data Analyst
Responsibilities:
Investigated and analyzed complex customer data sets for multi-billion-dollar businesses using advanced querying, visualization and analytics tools, resulting in an increase in customer satisfaction and revenues.
Worked on large datasets containing data about web applications usage and online customer surveys.
Wrote SQL queries to store, sort, and retrieve data. These ranged from basic commands like read, write, create, and update to complex commands that fetch and manage data across multiple tables.
Improved sales and logistics data quality by data cleaning using NumPy and pandas in Python.
Designed a data model from scratch to assist the team in collecting data for Tableau visuals.
Collected and analyzed data for improvement tracking and built models for forecasting future business plans.
Identified patterns and trends in data sets and reported the results back to the relevant members of the business.
Worked on data transformation and accessed raw marketing data in varied formats with different methods for analyzing and processing.
Worked with BI team in data investigation, responsible for interpreting variables and data visualization.
Tested the models for problems like goodness of fit, over-fitting, multicollinearity, residual normality, etc.
Communicated and coordinated with other departments to collect business requirements.
Utilized various techniques like histograms, bar plots, pie charts, scatter plots and box plots to determine the condition of the data.
Performed quality checks on data, identifying outliers, assessing normality, and standardizing the data.
Developed data dictionaries for project coordination and future access.
Environment: SQL, Matplotlib, NumPy, pandas, Tableau, MS Word, MS Excel, Smartsheet, Python, UNIX/Linux, Oracle, CSV, SQL Server, MS Access
Academic Experience:
Data Science Project – Analysis on 311 Service Requests in Baltimore
Using RStudio, predicted the time taken to resolve streetlight-out requests in Baltimore based on neighborhood and found frequencies of requests with regard to population, race, neighborhood and service request type.
Visualized crime rates with respect to empty buildings in a neighborhood and to time of day and day of the week. Analysis involved Data Wrangling and Exploratory Data Analysis (EDA) of 311 service requests.
Data Mining Project – Fact Extraction and Verification
Goal of the project – Train machine learning systems to determine the accuracy of factual assertions online through text mining. Data preprocessing involved text cleaning, tokenization and forming a dictionary and corpus for topic modeling.
Built a topic model in Python, based on the Latent Dirichlet Allocation (LDA) algorithm, to extract a set of topics from the datasets. Extracted features were then classified using a Doc2Vec approach.
Systems and Information Integration – Semantic Interoperability Project
Integrated information from multiple repositories by designing a global layer with metadata information and canonical representation of databases on top of a local layer with participating databases.
Built a system to decompose a global query into a set of subqueries, one per database, and execute them using dynamic SQL.
Advanced Database Oracle PL/SQL Project – Ez-Pass Toll Management System
Designed an Ez-Pass toll management database system using sample data and implemented features such as allowing users to log in, deducting the toll for a trip or generating a video toll bill, displaying trips and payments, and generating monthly statements.
Education:
Master’s in Information Systems, Specializing in Data Science Graduation: Dec 2019
University of Maryland Baltimore County GPA: 3.9/4.0
Bachelor of Science, Mechanical Engineering Graduation: May 2017
Colorado State University, Fort Collins, CO GPA: 3.4/4.0