ZALAK SHAH 510-***-****
*************@*****.***
LinkedIn: https://www.linkedin.com/in/zalakmlengineer/ Location: Fremont, CA
Summary:
• Data Engineer/Data Scientist/ML Engineer - Experienced in the end-to-end business intelligence process from initial requirements gathering to database design to front-end reporting.
• Develop, optimize, and maintain complex SQL queries and stored procedures to support various application and reporting needs.
• Built machine learning models including linear regression, logistic regression, and others.
• Created interactive Tableau dashboards and shared them with stakeholders.
• Performed exploratory analysis using Python, covering statistical analysis, data distributions, and histograms; carried out mathematical computations on datasets using NumPy and data manipulation with Pandas.
• Applied machine learning and natural language processing with scikit-learn; created data visualizations using Matplotlib.
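As a small, self-contained illustration of the Pandas/NumPy/scikit-learn workflow described above (the dataset and all numbers are hypothetical, invented for this sketch):

```python
# Hypothetical sketch: exploratory statistics plus a simple linear regression.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented example dataset: hours studied vs. exam score.
df = pd.DataFrame({
    "hours": [1.0, 2.0, 3.0, 4.0, 5.0],
    "score": [52.0, 55.0, 61.0, 64.0, 70.0],
})

# Basic exploratory computation with NumPy/Pandas.
mean_score = np.mean(df["score"])

# Fit a linear regression (score ~ hours) with scikit-learn.
model = LinearRegression()
model.fit(df[["hours"]], df["score"])
predicted = model.predict(pd.DataFrame({"hours": [6.0]}))
```

The same pattern extends to logistic regression and the other model types listed above by swapping in the corresponding scikit-learn estimator.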
Education:
• Master of Science in Management Science and Quantitative Methods, 2022. Courses completed: Finance, Python, SQL, Building Business Leaders.
• Bachelor of Technology in Information Technology, 2009
Certifications:
• Microsoft Certified DP-900 Azure Data Fundamentals (May 2022)
• https://www.credly.com/badges/59e4229d-139b-415f-8e5f-557b94506f5a/public_url
• Microsoft Certified: Azure AI Fundamentals
• https://www.credly.com/badges/75b6af7b-795b-452c-ad9ed68f1f7eff03?source=linked_in_profile
• Microsoft Certified 70-461 Querying Data using SQL Server 2012
• Microsoft Certified 70-773 Analyzing big data with Microsoft R
• Microsoft Certified 70-768 Developing SQL Data Models
• Various certifications related to Power BI, R, and predictive analytics on LinkedIn.
Professional Experience:
Kaiser Permanente, San Francisco Bay Area May 2023 to Present
Senior Data Scientist/ML Engineer
Skills: SQL, NLP, Statistics
• Project: Predicting disease based on diagnosis (text)
• Managed small ETL projects, collaborating with data modelers, data architects, and junior data analysts on project goals
• Wrote complex T-SQL queries to gather and clean data stored in a SQL Server database
• Designed data management systems and maintained large ETL processes
• Discussed requirements with business users and communicated with domain experts to continually improve machine learning models
• Used Python's pandas, NLTK, and scikit-learn libraries for preprocessing steps such as lemmatization, stemming, and removal of punctuation marks.
• Developed baseline models such as Naïve Bayes, logistic regression, linear regression, random forest, and AdaBoost classifiers using scikit-learn to compare against the LLM models.
• Created data visualizations using Matplotlib to understand the monthly data and the training file.
• Implemented word-embedding techniques such as TF-IDF and bag of words.
• Worked extensively on n-grams and text preprocessing: cleaned text by removing stop words and punctuation marks, and applied stemming and lemmatization.
• Plotted various charts, including graphical views of n-grams to inspect the top N words.
• Made extensive use of the Pandas library to manipulate data; familiar with a wide range of DataFrame operations.
• Implemented BookGPT, which takes any book as input and answers questions about it; developed semantic search using Python and the Universal Sentence Encoder embedding technique.
Walmart, San Francisco Bay Area Apr 2022 to May 2022
Data Scientist
Skills: OpenAI, Prompt engineering
• Project: Physical Security In Stores
• Used web scraping techniques in Python to gather information from various online sources
• Cleaned and analyzed data using Data Wrangler
• Implemented an OpenAI and prompt-engineering solution to extract useful entities from the data
Meta, San Francisco Bay Area Oct 2022 to Apr 2023
Data Scientist / Senior Data Engineer
Environment: Tableau, Python, Hive, HQL
• Project: Customer Relationship Management Analytics. Led the whole project from the beginning, communicating with business users about which fields the application should include for their project.
• Managed small data engineering projects.
• Implemented the data warehouse design, ensuring it covered all requirements from business users as well as the types of analysis most likely to help them attract more customers and grow the business.
• Created and developed more than 50 metrics using Python, SQL, and Tableau; some related to descriptive analytics and others to predictive and prescriptive analytics.
• Worked on a finance-domain project, writing effective SQL queries using advanced window functions
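A metric of the window-function kind mentioned above can be sketched in pandas; the table, column names, and figures here are hypothetical, invented for illustration:

```python
# Hypothetical data: order amounts per customer per month.
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "a", "b", "a", "b"],
    "month":    [1, 2, 1, 3, 2],
    "amount":   [100.0, 50.0, 80.0, 70.0, 20.0],
})

# pandas equivalent of the SQL window function
#   SUM(amount) OVER (PARTITION BY customer ORDER BY month)
orders = orders.sort_values(["customer", "month"])
orders["running_total"] = orders.groupby("customer")["amount"].cumsum()
```

The running total per customer is a typical descriptive metric; ranks and moving averages follow the same groupby pattern.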
Alept Consulting Dec 2017 to Nov 2019
Data Scientist
Environment: SQL, Azure Synapse, statistics, machine learning, Power BI, Python
• Created a modern cloud data warehouse solution for an online retail store ("Sales store") using Microsoft Azure Synapse Analytics and implemented useful visualizations in Power BI.
• Tools used: Azure Synapse Analytics and Power BI.
• Created the sales data warehouse from on-premises data and live data.
• Created an Azure Synapse workspace.
• Created a dedicated SQL pool (Azure SQL Data Warehouse).
• Loaded CSV files into Azure Blob Storage.
• Used the Copy tool/pipelines to transfer all data from Blob Storage to the dedicated SQL pool.
• Created an Event Hubs namespace and a Stream Analytics job with the event hub as input and the dedicated SQL pool as output.
• Executed the Stream Analytics job to transfer data from source to destination.
• Prepared Power BI visualizations, connected to Synapse, such as total sales by country, total sales by city, profit ratio, sales by category, top 5 cities by sales, and the trend of total sales and profit.
• Implemented time series analysis in Python to track sales over time, and computed correlation and linear regression between different variables.
• Used Apache Airflow to create data engineering pipelines.
Cygnet Infotech Pvt. Ltd. Dec 2015 to Nov 2017
SQL Developer/Database Designer
• Environment: MSBI (SSIS, SSRS, SSAS), Power BI, Tableau, ETL
• Product Name: Icarus (Real Time Transportation Management System) https://www.crunchbase.com/organization/cloud-amber-limited
• Cloud Amber, based in Bletchley Park, Milton Keynes, has a product suite which is focused on strategic highway control, local transport network management, public transport information and fleet management systems. It currently services over 40 local authorities and transport operators in the UK, with an increasing geographic spread through partner sales in Australia and current partnership opportunities in Malaysia.
• Tools used: Business Intelligence Development Studio 2012 and SQL Server 2012.
• Trained 20+ professionals across Europe and the UK on using Report Builder 3.0 effectively to answer analytical questions, and provided a data dictionary so users could create ad hoc reports.
• Implemented various visualizations and charts in Tableau, such as the total number of flights for each origin state, the airline with the most flights departing each state, the distance travelled by a specific airline, and the flight with the most miles across all quarters.
• Proactively engaged in assisting other team members with additional Microsoft Business Intelligence projects as needed. Involved in the end-to-end business intelligence process from initial requirements gathering to warehouse architecture to front-end reporting using Microsoft Business Intelligence (MSBI) tools such as SSAS, SSRS and SSIS.
• Created more than 50 reports of various types (Timekeeping Analysis, Compliance Analysis, Compliance Summary, Headway Measurement, Journey Matching, Operator Journey, Time Match Analysis, and fleet management reports), including parameterized, drill-down, sub-, and linked reports, using SSRS 2012 and customized MDX.
• Created a multi-dimensional (facts and dimensions) data warehouse model and made schema changes in related databases on the BI server (SQL Server Management Studio 2012). Implemented a factless fact table for one of the requirements.
• Extracted data from source (OLTP) systems and loaded it into the data warehouse (OLAP) using ETL (SSIS).
• Deployed and processed the cube with newly added dimensions and measures.
• Created Power BI dashboard visualizations, connected to the multidimensional model, showing how many buses were late the previous day and by how much, categorized per UK/European government transportation rules.
• Created a modern cloud data warehouse solution for Home Depot's vendors using Power BI.
• Designed various reports such as executive summary, account, KPI, and top 10 SKU (dollars/units) reports.
• Implemented a rotating 52-fiscal-week line chart to show time series data, along with various other visualizations.
Radixweb
SQL Developer May 2014 to Dec 2015
Product Name - FastTrack-360, https://www.fasttrack.com.au/products/fasttrack360-the-platform/
• FastTrack-360 is a web-based job portal application built to meet the business needs of performance-centric organizations.
• As a result, business-critical recruitment, pay, and bill functions can be coordinated and managed from one intelligent software solution. Implemented full-text search.
• FastTrack360 is an end-to-end recruitment management system that helps find the best candidates to fill a job, capture the time they work, pay them, and bill clients.
• Created the database design from the SRS and created database objects such as stored procedures, views, functions, and tables using SQL Server 2012; prepared scripts by comparing the production database with the development database.
• Made performance-related changes to database objects such as stored procedures and views.
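The kind of database objects described above (tables, views) can be illustrated with a tiny sketch; Python's built-in sqlite3 is used here purely as a stand-in, since the actual work used SQL Server 2012, and all table and column names are hypothetical:

```python
# Hypothetical sketch: a table plus a view that encapsulates a billing
# calculation as a reusable database object (sqlite3 as a stand-in for
# SQL Server 2012; schema and data invented for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE placements (candidate TEXT, hours REAL, rate REAL);
    INSERT INTO placements VALUES ('a', 10.0, 20.0), ('b', 8.0, 25.0);
    -- The view hides the billing formula from callers.
    CREATE VIEW billing AS
        SELECT candidate, hours * rate AS amount FROM placements;
""")
rows = conn.execute(
    "SELECT candidate, amount FROM billing ORDER BY candidate"
).fetchall()
```

Performance-related changes of the sort mentioned above would then be made inside such objects without altering the queries that consume them.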