
Aakash Parwani

Marlton, New Jersey *****

Phone: 551-***-**** » Email: ad1pz3@r.postjobfree.com » GitHub

PROFESSIONAL PROFILE

Intellectually curious data professional with demonstrated experience mining insights from large sets of structured, semi-structured, and unstructured data and delivering analytics for insights-driven companies. Focused on building teams and relationships that empower technical innovation, deepen connections with customers, and reduce friction in business processes.

Leverages a wide range of statistical and machine learning methodologies to translate project requirements into solutions that drive the bottom line. Recognized for outstanding contributions to code quality, code reviews, data life-cycle development, and software development, and for accuracy and a strong work ethic.

AREAS OF EXPERTISE

»Successful track record in creating roadmaps and managing a robust pipeline of data-driven deliverables.

»Leadership and interpersonal skills that inspire teams to be effective.

»Extensive experience with the SDLC and with data analysis, visualization, and management technologies such as Python, R, AWS, Tableau, Power BI, and SQL.

»Highly accurate and experienced Data Scientist adept at collecting, analyzing, and interpreting large datasets, developing new forecasting models, and performing data management tasks.

»Skilled in providing data analytics support, including creating key metrics, trend analysis, data modeling, data wrangling, and pattern recognition.

»Deep knowledge of and ability to write stored procedures, temporary tables, views, indexes, and triggers when required.

»Good understanding of database and data warehousing concepts (OLTP and OLAP).

PROFESSIONAL EXPERIENCE

Capgemini America, Inc. Jul 2021 – Present

Role: Lead Data Engineer

»Directs project planning, documentation, and execution, including responsibilities, budgets, timelines, and staffing, to complete projects on time and within budget.

»Own data acquisition into Snowflake and data delivery to downstream systems via a variety of tools.

»Design, develop, and test data pipelines and data transformations for a variety of data sources.

»Work closely with architecture teams and cross-functional IT teams to ensure solution alignment.

»Prepare data for predictive and prescriptive modeling.

»Performed data analysis and visualization using Pandas, NumPy, Matplotlib, and Seaborn.

»Developed and maintained data marts for financial forecasting, risk analysis, and portfolio optimization on Snowflake's cloud data platform, using features such as virtual warehouses, clustering, materialized views, and data sharing (see the Snowflake sketch after the tools list below).

»Migrate accounting applications from legacy platforms such as PeopleSoft into the AWS cloud environment.

»Define the rules and workflows that govern creating, modifying, storing, and deleting data in the Master Data Solution, and audit data for compliance with the Master Data Governance program.

»Used Apache Spark and other distributed data processing technologies to handle and examine massive financial data, such as transactions, market prices, and customer behavior, using Python, Scala, and SQL (a Spark sketch follows this list).

»Deployed and managed machine learning models using Docker containers and cloud-native platforms such as Kubernetes, AWS, and Azure to ensure the scalability, reliability, and security of the solutions.
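
The following is a minimal, illustrative PySpark sketch of the distributed transaction aggregation described above; the S3 paths and column names (account_id, amount, txn_ts) are hypothetical placeholders, not the actual client schema.

    # Hypothetical schema: transactions with account_id, amount, and txn_ts columns.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("txn-aggregation").getOrCreate()

    txns = spark.read.parquet("s3://example-bucket/transactions/")  # placeholder path

    # Roll raw transactions up to daily totals per account.
    daily_totals = (
        txns.withColumn("txn_date", F.to_date("txn_ts"))
            .groupBy("account_id", "txn_date")
            .agg(F.sum("amount").alias("total_amount"),
                 F.count("*").alias("txn_count"))
    )

    daily_totals.write.mode("overwrite").parquet("s3://example-bucket/daily_totals/")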

Tools: Python; AWS services (S3, Lambda, Step Functions, Glue, CloudWatch, SNS); Snowflake, Power BI, PyTorch, scikit-learn, Snowpark, pandas, NumPy; financial forecasting.
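
As one illustration of the Snowflake work above, this hedged sketch uses Snowpark for Python to create a materialized view for a recurring forecasting query; the connection parameters, table, and column names are all hypothetical placeholders.

    # All identifiers below are placeholders, not actual client objects.
    from snowflake.snowpark import Session

    connection_parameters = {
        "account": "<account>", "user": "<user>", "password": "<password>",
        "warehouse": "ANALYTICS_WH", "database": "FINANCE", "schema": "MARTS",
    }
    session = Session.builder.configs(connection_parameters).create()

    # Materialized view that pre-aggregates monthly portfolio returns.
    session.sql("""
        CREATE OR REPLACE MATERIALIZED VIEW monthly_portfolio_returns AS
        SELECT portfolio_id,
               DATE_TRUNC('month', trade_date) AS month,
               SUM(return_amount) AS total_return
        FROM portfolio_trades
        GROUP BY portfolio_id, DATE_TRUNC('month', trade_date)
    """).collect()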

Cygnus Professionals Oct 2019 – Jun 2021

Role: Data Engineer

Client: Prudential Financial, Newark, NJ

Prudential Financial provides financial products and services, including life insurance, mutual funds, and pensions, as well as administration and asset management.

»Interacted with client groups to determine user requirements, business rules, and goals. Utilized Agile methodology to configure and develop processes, standards, and procedures, and created a Business Requirements Document (BRD) and technical documents.

»Provided input into the collection of new data sources and the refinement of existing ones to improve analysis and model development.

»Performed a POC on customer data by building credit risk prediction models using Logistic Regression and Linear Discriminant Analysis (LDA) algorithms (see the sketch after this list).

»Used Python to find outliers and generated a numerical variable analysis report. Achieved an 87% accuracy score using Logistic Regression.

»Set up full CI/CD pipelines so that each commit goes through the standard software lifecycle and is thoroughly tested before it can reach production.

»Enabled the data science team to interact with data residing in Amazon Redshift Spectrum and develop their own insights using Looker.

»Developed a framework for ETL operations supporting incremental and historical data loads and transformations using SSIS and technologies such as Python, Glue, and Lambda.

»Performed data analysis on an international insurance data set and designed dashboards using visualization tools such as Tableau and Looker to help leaders make informed decisions.

»Designed YAML and JSON configuration files to make the data integration process flexible across business units (see the configuration sketch after the tools list below).

»Created Autosys jobs (Unix bash scripts) and scheduled them for various tasks.

»Applied Apache Spark and other distributed data processing technologies to process and analyze large-scale financial data, such as transactions, market prices, and customer behavior, using Python, Scala, and SQL.
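
A minimal sketch of the credit-risk POC approach referenced above, using scikit-learn on synthetic data; the features, coefficients, and sample sizes are invented for illustration and do not reflect the actual client models.

    # Synthetic stand-in for the customer data used in the POC.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)
    n = 5000
    X = np.column_stack([
        rng.normal(650, 80, n),     # credit score
        rng.normal(0.35, 0.15, n),  # debt-to-income ratio
        rng.integers(0, 10, n),     # past delinquencies
    ])
    # Default label loosely driven by the features (purely synthetic).
    logit = -0.01 * (X[:, 0] - 650) + 4 * (X[:, 1] - 0.35) + 0.3 * X[:, 2]
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("accuracy:", accuracy_score(y_test, model.predict(X_test)))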

Tools: Python, Denodo, SQL, Looker, Hive, Informatica, Power BI, Microsoft Excel, CloudWatch, SSRS, SSIS, Amazon EC2, Amazon RDS, Elastic Load Balancing, Amazon S3, AWS Glue, Amazon Redshift, eSpatial, Linear Regression, Logistic Regression, Git, Unix, CA Workload Automation, YAML, JSON, Bitbucket CI/CD.
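
The sketch below illustrates the config-driven integration pattern referenced above: a small loader that accepts either a YAML or a JSON pipeline definition per business unit. The file name and keys are hypothetical.

    # Loads a per-business-unit pipeline config in either YAML or JSON.
    import json

    import yaml  # PyYAML

    def load_config(path):
        """Pick the parser by file extension."""
        with open(path) as f:
            if path.endswith((".yaml", ".yml")):
                return yaml.safe_load(f)
            return json.load(f)

    config = load_config("business_unit_a.yaml")  # hypothetical file
    for source in config.get("sources", []):
        print(f"extract {source['table']} -> {config['target_schema']}")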

Essen Medical Associates Sep 2017 – Oct 2019

Role: Data Analyst/Machine Learning Specialist

Client: Essen Medical Associates, Bronx, NY

Essen Health is an integrated healthcare delivery organization that provides high-quality, compassionate, and accessible medical care to many of the most underserved residents of New York State.

»Assisted in healthcare data analysis, star-schema data modeling, and design specific to the data warehousing and business intelligence environment.

»Implemented best practices for managing the Looker BI application to support provider performance, accounting, and data science.

»Set up continuous integration with Bitbucket and educated developers on committing their work and making use of the CI/CD pipelines in place.

»Responsible for maintaining quality reference data in PostgreSQL and Microsoft SQL Server by performing operations such as cleaning and transformation and ensuring integrity in the relational environment.

»Performed data analysis and designed dashboards for multiple departments using visualization tools such as Power BI, Plotly, SSRS, and Excel to help leaders make informed decisions.

»Managed the team that calculated and analyzed claims data and created dashboards for provider incentive and benefit analysis using Python, SQL, Power BI, and Microsoft Excel.

»Used Python, Matplotlib, Plotly, and Power BI to analyze insurance data, understand patterns in patient encounters, perform frequency analysis, and calculate the yield for each line of business.

»Facilitated data and information exchanges across systems.

»Performed geospatial analysis and generated dashboard reports using Looker to identify patients living close to facility centers and provide quicker, better services (see the distance sketch after this list).

»Used Python modules such as SciPy, scikit-learn, pandas, and NumPy for data cleaning, analysis, and predictive modeling (a cleaning sketch follows the tools list below).

»Applied machine learning and statistical methods such as Linear and Logistic Regression, Multivariable Clustering, and Time Series Analysis to insurance data to predict revenue by line of business, understand patterns in fraudulent insurance claims, and classify future claims as fraudulent or legitimate.
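
An illustrative sketch of the proximity analysis referenced above, using the haversine formula in plain Python; the coordinates, patient IDs, and five-mile radius are invented for illustration.

    from math import asin, cos, radians, sin, sqrt

    def haversine_miles(lat1, lon1, lat2, lon2):
        """Great-circle distance between two points, in miles."""
        lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
        a = (sin((lat2 - lat1) / 2) ** 2
             + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
        return 3956 * 2 * asin(sqrt(a))

    facility = (40.8448, -73.8648)  # approximate Bronx coordinates, for illustration
    patients = [("P001", 40.85, -73.87), ("P002", 40.65, -74.10)]
    nearby = [pid for pid, lat, lon in patients
              if haversine_miles(lat, lon, *facility) <= 5.0]
    print(nearby)  # patients within five miles of the facility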

Tools: SaaS data lake, Python, data modeling, Microsoft SQL Server, Looker, PostgreSQL, XML, Microsoft Excel, Microsoft Word, Microsoft PowerPoint, SSRS, SSIS, Power BI, Amazon EC2, Amazon RDS, Elastic Load Balancing, Amazon S3, AWS Glue, Amazon Redshift, eSpatial, Linear Regression, Logistic Regression, Time Series Analysis, Multivariable Clustering, Predictive Modeling, JSON, YAML, Bitbucket CI/CD.
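
A minimal pandas sketch of the cleaning step referenced above, assuming a hypothetical claims extract with patient_age and claim_amount columns; it fills missing values and flags outliers with the 1.5 * IQR rule.

    import pandas as pd

    df = pd.read_csv("claims.csv")  # hypothetical extract

    # Fill missing ages with the median; drop rows missing the claim amount.
    df["patient_age"] = df["patient_age"].fillna(df["patient_age"].median())
    df = df.dropna(subset=["claim_amount"])

    # Flag outliers outside 1.5 * IQR of the claim amount.
    q1, q3 = df["claim_amount"].quantile([0.25, 0.75])
    iqr = q3 - q1
    df["is_outlier"] = ~df["claim_amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)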

ISC Software Pvt. Ltd. Jul 2011 – Jun 2015

Role: Machine Learning / Data Engineer

Client: CoreCard Software Inc., Atlanta, GA

CoreCard Software is a provider of card management systems and boutique processing services. CoreCard offers an array of account management solutions to support the complex requirements of the evolving global financial services industry.

»Developed APIs to allow college students to perform transactions in the prepaid domain.

»Provided program management and processing services supporting a diverse array of prepaid card solutions.

»Developed and implemented custom stored procedures for data validation: summarizing metadata for data warehouse tables, aggregating transactional data, identifying approved and declined transactions, and identifying POS terminal information.

»Performed linear regression, logistic regression, and time series analysis to understand patterns in customers' card usage, which helped the company bring a new Payback feature to market.

»Used Python to design an API that calculates a customer's loyalty points and returns the score in XML format for display in the front-end application (see the XML sketch after this list).

»Gained an understanding of the business architecture and designed a business prepaid card solution to help companies streamline employee expense management.

»Interacted with the client for system study, requirements gathering, and analysis.

»Studied the application architecture before undertaking requirement and impact analysis.

»Identified missing, outlier, and invalid data and applied appropriate data management techniques.

»Implemented partitioning on a large dataset and index functions in SQL Server, resulting in improved performance.

»Wrote simple and advanced SQL queries and scripts to create standard reports for senior managers.

»Handled database archiving, replication, and partitioning, and resolved deadlock issues in the production environment.

»Handled performance tuning in the production environment after examining its most expensive queries.

»Performed clustering and regression (logistic and linear) statistical modeling on transaction data and developed a solution to filter out fraudulent transactions (a clustering sketch follows the tools list below).
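
A toy sketch of the loyalty-score API response referenced above, built with Python's standard xml.etree module; the scoring rule and field names are invented placeholders, not the client's actual logic.

    import xml.etree.ElementTree as ET

    def loyalty_points(total_spend, months_active):
        """Toy scoring rule: one point per dollar plus a tenure bonus."""
        return int(total_spend) + 10 * months_active

    def loyalty_xml(customer_id, total_spend, months_active):
        """Build the XML payload the front end would render."""
        root = ET.Element("loyalty")
        ET.SubElement(root, "customerId").text = customer_id
        ET.SubElement(root, "points").text = str(loyalty_points(total_spend, months_active))
        return ET.tostring(root, encoding="unicode")

    print(loyalty_xml("C1001", 2450.75, 18))
    # <loyalty><customerId>C1001</customerId><points>2630</points></loyalty>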

Tools: Python, Microsoft SQL Server 2008, C++, data warehousing, Visual Basic, XML, DBBIDE, ASP.NET, WCF, Microsoft Excel, Microsoft Word, Microsoft PowerPoint, Linear Regression, Logistic Regression, Time Series Analysis, Multivariable Clustering, Predictive Modeling, Applied Mathematics.
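
As an illustration of the clustering-based fraud filtering above, the sketch below clusters synthetic transactions with scikit-learn's KMeans and flags points far from their centroid for manual review; the features and the 99th-percentile threshold are invented.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Synthetic features: amount, hour of day, merchant risk score.
    X = np.column_stack([
        rng.lognormal(3, 1, 2000),
        rng.integers(0, 24, 2000),
        rng.random(2000),
    ])

    X_scaled = StandardScaler().fit_transform(X)
    kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_scaled)

    # Transactions far from their cluster centroid look anomalous.
    dists = np.linalg.norm(X_scaled - kmeans.cluster_centers_[kmeans.labels_], axis=1)
    flagged = np.where(dists > np.percentile(dists, 99))[0]
    print(f"{len(flagged)} transactions flagged for manual review")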

TECHNICAL CERTIFICATIONS

IBM Badge, Machine Learning with Python

Introduction to R for Data Science

Introduction to Python for Data Science

Big Data Foundations

Hadoop Foundations

SQL Fundamentals

EDUCATION

Master of Science, Data Science with Concentration in Business Analytics, 2017

Saint Peter’s University, Jersey City, New Jersey

Bachelor of Engineering, Computer Science & Engineering, 2011

Radharaman Institute of Technology & Science, Bhopal, India

HONORS & AWARDS

Awarded the February 2015 Star Performer Award by the CoreCard Software team for delivering optimized, error-free software development and data solutions.


