Data Scientist

Fort Lee, New Jersey, 07024, United States
January 14, 2018

Abhishek Gupta

* ***** ** **perience in Data Extraction, Data Pipeline, Data Engineering, Machine Learning, and Statistical Analysis.

Experienced in Big Data Modelling, Database Optimization, and Query Plans using query languages such as SQL, Hive, and Pig.

Experienced with common data science toolkits, such as R, Python, AWS, ETL, SQL/NoSQL databases, and Tableau.

EDUCATION:


Master's in Information Technology GPA: 3.93

Relevant Courses: Intro to Statistical Modelling, Intro to Data Science & ML, Data Analysis and Decision Making, Data Mining.

VIT UNIVERSITY June 2013

Bachelor's in Information Technology GPA: 3.5


Cloud and Analytics Intern at ASCENSIA DIABETES CARE (Tarrytown, NY) June 2017 – Dec 2017

Dug into large datasets (15 million records) using Redshift (AWS) to test hypotheses with statistical techniques and develop optimal datasets to support insight analytics and interpretation. Also detected underperforming patients in advance.

Automated analyses of investment strategies and optimized the Sharpe ratio across parameter sets using SQL and Python.

Used cross-validation and model selection methods on training data; calculated validation errors, test errors, and F1 scores to measure model performance; and achieved 82.5% accuracy.

Implemented in-sample and out-of-sample backtests on strategies; examined and mitigated model overfitting using statistics.

Developed and maintained data pipelines to extract, transform, and load (ETL) business data, ensuring data quality and optimizing the performance of the automated daily process. Techniques included API collection, web scraping, and database management.

Research Assistant at RUTGERS UNIVERSITY (Newark, NJ) Jan 2017 – May 2017

Performed analysis on Newark city data in collaboration with the Newark Police Department, applying regression analysis, k-means clustering, and SVM to predict the occurrence of future crimes within a 95% CI and accuracy, using R and its libraries.

Performed data analytics on classroom data to derive and forecast classroom utilization, and built Tableau dashboards showing classroom trends and patterns from munged data pulled from MS Access and Oracle databases.

Utilized SQL in SAS to sanitize and compile the raw data and to automate statistical models, increasing accuracy and scalability.

Application Developer at IBM (Pune, IN) June 2013 – June 2016

Extracted and manipulated 20 million rows of viewing data as a sparse matrix using SQL and Python, reducing memory usage by 90%.

Designed and created an application in Python to perform ETL, model runs, and report generation, and deployed it in a live environment.

Applied large-scale linear and nonlinear constrained optimization algorithms for managing the right portfolio for client markets.

Conducted end-to-end analysis, including brainstorming, research, data gathering, processing, analysis, ongoing deliverables, and presentations, to generate business recommendations; presented visualization work in D3.js and Tableau.

Leveraged 300+ sources of data to build machine learning models, including regularized regression and classification, GB-tree, SVM, GLM, and ensemble models, to optimize client KPIs and prove product value; presented the results to senior management.

Modified the business logic in a SQL Server stored procedure to update the asset cost evaluation correctly. The change was reviewed by the business analyst and deployed, and later produced savings of 1.3 million annually.

TECHNICAL SKILLS:

Languages: SQL, SAS, Python, R, C#, Scala and Hadoop

Database Software: SQL Server 2014/2008, MongoDB, and Cassandra.

DB Utilities / ETL: SSIS (Microsoft SQL Server Integration Services)

Other Tools: Anaconda, MS Excel, and Microsoft Visual Studio.

ACHIEVEMENTS:

IBM Manager’s Choice Award for excellence – 2013, 2015.

