
Data Scientist Machine Learning

Location:
Chicago, IL
Posted:
February 06, 2024


Zaman Ali

Data Scientist/Machine Learning Engineer.

Email: ad3fco@r.postjobfree.com

Phone: (779) 901-0323

LinkedIn Link: https://www.linkedin.com/in/zaman-ali-6107a6251/

PROFESSIONAL SUMMARY:

Qualified Data Scientist/Data Analyst with over 7 years of experience in Data Science and Analytics, including Machine Learning, Data Mining, and Statistical Analysis.

Involved in the entire data science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization with large sets of structured and unstructured data.

Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means.

Implemented Bagging and Boosting to enhance the model performance.

Proficient in Power BI, Tableau, Qlik and R-Shiny data visualization tools to analyze and obtain insights into large datasets and to create visually powerful and actionable interactive reports and dashboards.

Evaluating model performance metrics and refining models for better accuracy.

Access to real-time time series data allows for quick decision-making on the production floor. For instance, data from kiosks or sensors can trigger immediate adjustments in production parameters or workflow to optimize efficiency or address issues promptly.

Experienced with KBC, chatbots, adaptive supervised learning (deterministic classification), unsupervised learning methods for information extraction, ANNs and deep neural networks for NLP and chatbots, probabilistic models for NLG and inference, and decision science. Integrated the OpenAI API into data science pipelines to automate text-related tasks such as summarizing lengthy documents, creating chatbots, and generating creative content.

Worked with packages like ggplot2 and shiny in R to understand data and develop applications.

Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, and ANOVA.

Knowledge of Information Extraction, NLP algorithms coupled with Deep Learning.

Extensively worked on Python 3.5/2.7 (NumPy, Pandas, Matplotlib, NLTK, and Scikit-learn).

Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, Caret, dplyr), and Excel.

Used the version control tools like Git 2.X

Passionate about gleaning insightful information from massive data assets and developing a culture of sound, data-driven decision making

Ability to maintain a fun, casual, professional and productive team atmosphere

Experienced in the full software development life cycle (SDLC), including Agile and Scrum methodologies.

Skilled in Advanced Regression Modeling, Correlation, Multivariate Analysis, Model Building, Business Intelligence tools and application of Statistical Concepts.

Proficient in Predictive Modeling, Data Mining Methods, Factor Analysis, ANOVA, hypothesis testing, normal distribution, and other advanced statistical and econometric techniques.

Developed predictive models using Decision Tree, Random Forest, Naïve Bayes, Logistic Regression, Cluster Analysis, and Neural Networks.

Experienced in Machine Learning and Statistical Analysis with Python Scikit-Learn.

Experienced in Python to manipulate data for data loading and extraction and worked with python libraries like Matplotlib, Numpy, Scipy and Pandas for data analysis.

Worked with complex applications such as R, SAS, MATLAB, and SPSS to develop neural networks and cluster analyses.

Strong SQL programming skills, with experience in working with functions, packages and triggers.

Experienced in Visual Basic for Applications (VBA) for developing applications.

Worked with NoSQL databases including HBase, Cassandra, and MongoDB.

Experienced in Big Data with Hadoop, HDFS, MapReduce, and Spark.

Experienced in Data Integration Validation and Data Quality controls for ETL process and Data Warehousing using MS Visual Studio SSIS, SSAS, and SSRS.

EDUCATIONAL DETAILS:

Bachelor's in Electrical Engineering, University of Lahore, 2015

PROFESSIONAL EXPERIENCE:

Client: CONA Services

Data Scientist / Machine Learning Engineer Dec 2022 – Present

Responsibilities:

Gathering, retrieving and organizing data from multiple sources and mapping it to reach meaningful data to use.

Developed a pipeline for collecting data from multiple sources and generating their general stats into reports that give overall understanding of data.

Expertise in training and fine-tuning NLP models beyond conversational AI, like sentiment analysis, text summarization, and language translation.

Experience in working on different Databases/Data warehouses like Teradata, Oracle, AWS Redshift, and Snowflake.

Developing algorithms that utilize time series data to predict equipment failures or anomalies.

Implementing real-time monitoring systems to detect deviations from normal operating conditions and trigger alerts for maintenance or corrective action.

Proficient in utilizing statistical techniques like regression, clustering, and classification to derive insights from data.

Identifying relevant features within the time series data that correlate with manufacturing processes or quality parameters.

Creating new features or transforming existing ones to improve model performance.

Selecting the most informative features for predictive modeling and analysis.

Experience in identifying inefficiencies, optimizing workflows, and ensuring that equipment operates within optimal parameters.

Analyzing historical time series data alongside real-time streams enables manufacturers to identify trends, patterns, and areas for improvement, and facilitates continuous optimization and refinement of manufacturing operations.

Competent in using tools like Matplotlib, Tableau, or Power BI to create compelling visual representations of complex data that make it easier to understand.

NLP engineer with a profound interest in research and development for cutting edge machine learning techniques.

Designing and implementing time series analysis models (such as ARIMA, LSTM, or Prophet) to extract insights or make predictions.

Testing and validating models using historical data to ensure accuracy and reliability.

Conducted A/B tests and hypothesis testing to evaluate the effectiveness of data-driven solutions.
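
As a hedged illustration of this A/B-testing workflow (synthetic data, invented group names, not the actual experiments), a two-sample Welch t-test in SciPy looks like:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic per-user metrics for a control and a treatment group
# (purely illustrative, not real experiment data)
control = rng.normal(loc=10.0, scale=2.0, size=500)
treatment = rng.normal(loc=11.0, scale=2.0, size=500)

# Welch's t-test: does the treatment mean differ from the control mean?
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False)
significant = p_value < 0.05  # reject the null hypothesis at the 5% level
```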

Capable of building predictive models to forecast trends and outcomes, enhancing decision-making processes.

Developed machine learning models, such as clustering algorithms (e.g., K-means) and collaborative filtering, to personalize marketing strategies and product recommendations.

Developed predictive models in R using machine learning algorithms like Random Forest, Gradient Boosting, or XGBoost for real-world business problems.

Able to create custom Python geoprocessing tools and scripts inside ArcGIS capable of automating tedious work and improving the effectiveness of data processing in a variety of fields, including market analysis, emergency response, and natural resource management.

Initially the data was stored in Snowflake. Later the data was moved to Azure Data Lake.

Developed a seasonality analysis pipeline that can impute missing values, remove outliers, and extract seasonal patterns from historical data using multiple methods, such as Darts seasonality checks, the Ljung-Box test, spectral analysis, seasonal decomposition, and seasonal indices.
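
One step of such a pipeline, the seasonal-index calculation, can be sketched in plain NumPy (the data below is synthetic and the actual pipeline relied on dedicated libraries):

```python
import numpy as np

rng = np.random.default_rng(7)
# Three years of synthetic monthly sales with a repeating seasonal pattern
season = np.tile([1.2, 1.0, 0.8, 0.9, 1.1, 1.3, 1.5, 1.4, 1.0, 0.9, 0.8, 1.1], 3)
sales = 100 * season + rng.normal(scale=2.0, size=36)

# Seasonal index: mean of each calendar month divided by the overall mean;
# values above 1 mark high-season months, below 1 low-season months
monthly_means = sales.reshape(3, 12).mean(axis=0)
seasonal_index = monthly_means / sales.mean()
```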

Developed multiple time series forecasting models on retail store data.

Developed anomaly detection algorithms in R to identify unusual patterns or outliers in financial transactions, network traffic, or user behavior.

Developed conventional time series models such as ARMA, ARIMA, and Auto-ARIMA, as well as deep learning based time series models such as neural forecasting models and N-BEATS.
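
As a minimal sketch of the ARMA family (not the production models), the AR(1) special case can be fitted by least squares on lagged pairs:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1) series: x_t = phi * x_{t-1} + noise
phi_true = 0.8
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal(scale=0.5)

# Estimate phi by least squares on (x_{t-1}, x_t) pairs -- the AR(1)
# special case of fitting an ARMA model
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

# One-step-ahead forecast from the last observed value
forecast = phi_hat * x[-1]
```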

Created reports for our forecasts that explain the forecasted results to non-technical audiences.

Worked on Chat Bot Product Management using NLP/NLU and designed roadmap for launch/future phases.

Justified models' forecasted results with hypothesis testing, ensuring statistical significance for business application.

Experience in generative AI powered by LLMs to automatically create marketing content, such as social media posts, blog articles, and product descriptions.

Extracted text from promotion images using multiple OCR engines such as Tesseract, EasyOCR, Keras-OCR, and a cognitive-AI-based OCR, and created a tool for completing missing extractions based on historical data.

Developed a system for analyzing the effect of discounts on sales and finding the best discount for optimal profit.

Developed machine learning algorithms with Spark MLlib standalone and Python.

Utilized various techniques like Histogram, bar plot, Pie-Chart, Scatter plot, Box plots to determine the condition of the data.

Performed data pre-processing tasks like merging, sorting, finding outliers, missing value imputation, data normalization, making it ready for statistical analysis.
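
A minimal sketch of this pre-processing flow on a tiny synthetic frame (the `sales` column and its values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Small synthetic frame with one missing value and one outlier
df = pd.DataFrame({"sales": [10.0, 12.0, np.nan, 11.0, 9.0, 250.0, 10.5, 11.5]})

# 1. Missing value imputation with the column median
df["sales"] = df["sales"].fillna(df["sales"].median())

# 2. Outlier removal using the 1.5 * IQR rule
q1, q3 = df["sales"].quantile([0.25, 0.75])
iqr = q3 - q1
mask = df["sales"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[mask]

# 3. Min-max normalization to [0, 1]
df["sales_norm"] = (df["sales"] - df["sales"].min()) / (df["sales"].max() - df["sales"].min())
```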

Environment: Python 3.9, R Studio, MLlib, A/B testing, SQL Server, Hive, Hadoop cluster, ETL, NumPy, Pandas, Matplotlib, Plotly, Azure ecosystem, Databricks, PySpark, Power BI, Scikit-Learn, ggplot2, Shiny, TensorFlow, Teradata, Flask.

Client: Numerator

Data Scientist / Machine Learning Engineer Aug 2021 – Dec 2022

Responsibilities:

Gathering, retrieving and organizing data and using it to reach meaningful conclusions.

Developed a system for collecting data and turning its findings into reports that improved the company's operations.

By analyzing time series data from sensors and machinery, manufacturers can predict equipment failures before they occur. Anomalies or deviations in data patterns may indicate potential issues, enabling proactive maintenance to prevent costly downtime.

Ensuring compatibility and consistency in data formats for seamless integration.

Developing protocols to gather, timestamp, and synchronize data streams from different sources.

Experience in leveraging parallel computing techniques to expedite data analysis and reduce processing times.

Skilled in optimizing algorithms for specific NLP tasks, enhancing accuracy and efficiency.

Proven track record of applying GIS expertise to diverse domain classifications, ranging from environmental conservation, urban development, and public health to infrastructure management and market research, demonstrating versatility in problem-solving and data analysis across industries.

Experienced in fact dimensional modeling (Star schema, Snowflake schema), transactional modeling and SCD (Slowly changing dimension).

Integrated LLMs into chatbot systems to enhance customer support. These chatbots can understand and respond to customer inquiries in a more human-like and contextually relevant manner, resulting in improved customer satisfaction and reduced response times.

Involved in building Data Models and Dimensional Modeling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications.

Migrated an on-premises enterprise data warehouse to a cloud-based Snowflake data warehousing solution and enhanced the data architecture to use Snowflake as a single data platform for all analytical purposes.

Setting up the analytics system to provide insights.

Initially the data was stored in MongoDB. Later the data was moved to Elasticsearch.

Used Kibana to visualize the data collected from Twitter using Twitter REST APIs.

Developed a multi-class, multi-label, 2-stage classification model to identify depression-related tweets and classify depression-indicative symptoms. Utilized the created model to calculate the severity of depression in a patient using Python, Scikit-learn, Weka, and Meka.

Conceptualized and created a knowledge graph database of news events extracted from tweets using Java, Virtuoso, Stanford CoreNLP, Apache Jena, and RDF.

Producing and maintaining internal and client-based reports.

Proficiency in handling large datasets using tools like Hadoop, Spark, or SQL databases, enabling efficient data processing.

Creating stories with data that a non-technical team could also understand.

Worked on Descriptive, Diagnostic, Predictive and Prescriptive analytics.

Implemented character recognition using Support Vector Machines for performance optimization.

Monitored data quality and maintained data integrity to ensure the effective functioning of the department.

Implemented normalization techniques and built the tables as per the requirements given by the business users.

Built machine learning models that automatically score user assignments based on a few manually scored assignments.

Utilized various techniques like Histogram, bar plot, Pie-Chart, Scatter plot, Box plots to determine the condition of the data.

Researching and developing Predictive Analytic solutions and creating solutions for business needs.

Worked on data processing on very large datasets that handle missing values, creating dummy variables and various noises in data.

Mining large data sets using sophisticated analytical techniques to generate insights and inform business decisions.

Building and testing hypothesis, ensuring statistical significance and building statistical models for business application.

Developed machine learning algorithms with Spark MLlib standalone and Python.

Design and develop analytics, machine learning models, and visualizations that drive performance and provide insights, from prototyping to production deployment and product recommendation and allocation planning.

Performed data pre-processing tasks like merging, sorting, finding outliers, missing value imputation, data normalization, making it ready for statistical analysis.

Implemented various machine learning models such as regression, classification, Tree based and Ensemble models.

Segmented the data using clustering techniques such as K-Means, and further processed it using Support Vector Regression.

Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring.
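
A hedged sketch of this kind of referral-likelihood modeling on synthetic data (not the actual client data or feature set):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for user features
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Predicted probability that each user belongs to the positive class
# (here, "refers")
rf_proba = rf.predict_proba(X_test)[:, 1]
lr_proba = lr.predict_proba(X_test)[:, 1]
```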

Accomplished multiple tasks from collecting data to organizing and interpreting statistical information.

Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau, and Power BI.

Environment: Python 3.6.4, R Studio, MLlib, Regression, A/B testing, SQL Server, Hive, Hadoop cluster, ETL, Tableau, NumPy, Pandas, Matplotlib, Power BI, Scikit-Learn, ggplot2, Shiny, TensorFlow, Teradata.

Client: Alteryx, CA Apr 2020 – Jul 2021

Data Scientist

Responsibilities:

Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.

Performed data analysis by using Hive to retrieve the data from Hadoop cluster, SQL to retrieve data from Redshift.

Worked on Shiny and R application showcasing machine learning for improving the forecast of business.

Proficiency with various data visualization tools like Tableau, Matplotlib/Seaborn in Python, and ggplot2/Rshiny in R to create interactive, dynamic reports, and dashboards.

Explored and analyzed the customer specific features by using Spark SQL.

Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.

Performed data imputation using Scikit-learn package in Python.

Work experience with Cherwell Service Management tool for tickets.

Participated in feature engineering such as feature intersection generation, feature normalization, and label encoding with Scikit-learn preprocessing.

Used Python 3.x (NumPy, SciPy, Pandas, Scikit-learn, Seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.

Used F-score, AUC/ROC, confusion matrix, MAE, and RMSE to evaluate the performance of different models.
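
These evaluation metrics can be computed with scikit-learn as in this toy example (labels and scores are made up):

```python
import numpy as np
from sklearn.metrics import (confusion_matrix, f1_score, roc_auc_score,
                             mean_absolute_error, mean_squared_error)

# Toy labels, hard predictions, and predicted scores (illustrative only)
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, cols: predicted
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

# Error metrics comparing predicted scores against the labels
mae = mean_absolute_error(y_true, y_score)
rmse = np.sqrt(mean_squared_error(y_true, y_score))
```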

Designed and implemented recommender systems that utilized collaborative filtering techniques to recommend courses for different customers, and deployed them to an AWS EMR cluster.

Utilized natural language processing (NLP) techniques to optimize customer satisfaction.

Designed rich data visualizations to model data into human-readable form with Tableau and Matplotlib.

Environment: AWS Redshift, EC2, EMR, Hadoop Framework, S3, HDFS, Spark (PySpark, MLlib, Spark SQL), Python 3.x (Scikit-Learn/SciPy/NumPy/Pandas/NLTK/Matplotlib/Seaborn), Tableau Desktop (9.x/10.x), Tableau Server (9.x/10.x), Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, Collaborative Filtering, Ensemble), NLP, Teradata, Git 2.x, Agile/SCRUM.

Client: Resurface Labs, CO Apr 2019 - Mar 2020

Data Scientist

Responsibilities:

Involved in gathering, analyzing and translating business requirements into analytic approaches.

Worked with machine learning algorithms like neural network models, regressions (linear, logistic, etc.), SVMs, and decision trees for classification of groups and analysis of the most significant variables.

Converted raw data to processed data by merging, finding outliers, errors, trends, missing values and distributions in the data.

Implementing analytics algorithms in Python, R programming languages.

Performed K-means clustering, Regression, and Decision Trees in R.

Worked on Naïve Bayes algorithms for Agent Fraud Detection using R.

Performed data analysis, visualization, feature extraction, feature selection, feature engineering using Python.

Generated detailed report after validating the graphs using Python and adjusting the variables to fit the model.

Worked on Clustering and factor analysis for classification of data using machine learning algorithms.

Used Power Map and Power View to present data effectively to both technical and non-technical users.

Created SQL tables with referential integrity and developed advanced queries using stored procedures and functions using SQL server management studio.

Worked with risk analysis, root cause analysis, cluster analysis, correlation and optimization and K-means algorithm for clustering data into groups.

Environment: Python, Jupyter, MATLAB, SSRS, SSIS, SSAS, MongoDB, HBase, HDFS, Hive, Pig, SAS, Power Query, Power Pivot, Power Map, Power View, SQL Server, MS Access.

Client: Teradata, CA Jan 2018 – Mar 2019

Data Analyst/Data Scientist

Responsibilities:

Gathered, analyzed, documented, and translated application requirements into data models, and supported standardization of documentation and the adoption of standards and practices related to data and applications.

Participated in Data Acquisition with the Data Engineer team to extract historical and real-time data using Sqoop, Pig, Flume, Hive, MapReduce, and HDFS.

Wrote user defined functions (UDFs) in Hive to manipulate strings, dates and other data.

Performed data cleaning, feature scaling, and feature engineering using the Pandas and NumPy packages in Python.

Applied clustering algorithms, i.e., hierarchical and K-means, using Scikit-learn and SciPy.
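
A small sketch of both clustering approaches on synthetic, well-separated blobs (illustrative only):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Two well-separated synthetic 2-D blobs of 50 points each
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(5, 0.5, size=(50, 2))])

# K-means with scikit-learn
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Agglomerative (hierarchical) clustering with SciPy: build the Ward
# linkage tree, then cut it into two flat clusters
Z = linkage(X, method="ward")
hc_labels = fcluster(Z, t=2, criterion="maxclust")
```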

Performed complex pattern recognition on automotive time series data and forecast demand using ARMA and ARIMA models and exponential smoothing for multivariate time series data.
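
Simple exponential smoothing, the most basic member of the smoothing family named above, reduces to a short recurrence; this sketch uses made-up demand numbers:

```python
def simple_exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level = alpha*x + (1-alpha)*level.

    Returns the final smoothed level, which serves as the
    one-step-ahead forecast.
    """
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

# Illustrative demand series (not real automotive data)
demand = [100, 102, 101, 105, 107, 106, 110]
forecast = simple_exponential_smoothing(demand, alpha=0.3)
```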

Delivered and communicated research results, recommendations, opportunities to the managerial and executive teams, and implemented the techniques for priority projects.

Designed, developed and maintained daily and monthly summary, trending and benchmark reports repository in Tableau Desktop.

Generated complex calculated fields and parameters, toggled and global filters, dynamic sets, groups, actions, custom color palettes, statistical analysis to meet business requirements.

Environment: Machine Learning (KNN, Clustering, Regressions, Random Forest, SVM, Ensemble), Linux, Python 2.x (Scikit-Learn/SciPy/NumPy/Pandas), R, Tableau (Desktop 8.x/Server 8.x), Hadoop, MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Flume, Oracle 11g, SQL Server 2012.

Client: Algorithmia, WA Jan 2016 – Oct 2017

Data Analyst

Responsibilities:

Successfully completed a Junior Data Analyst internship at Confidential.

Built an Expense Tracker and Zonal Desk.

Identifying inconsistencies, correcting them or escalating the problems to next level.

Assisted in development of interface testing and implementation plans.

Analyzing data for data quality and validation issues.

Analyzing the websites regularly to ensure site traffic and conversion funnels are performing well.

Collaborating with Sales and marketing teams to optimize processes that communicate insights effectively.

Creating and maintaining automated reports using SQL.

Understood the overall Hadoop architecture and drove all the related meetings.

Conducted safety checks to make sure the team felt safe during retrospectives.

Aided in data profiling by examining the source data

Extracting features from the given data set and using them to train and evaluate different classifiers available in the WEKA tool; using these features, we differentiate spam messages from legitimate messages.

Performed data mappings to map the source data to the destination data

Developed Use Case Diagrams to identify the users involved. Created Activity diagrams and Sequence diagrams to depict the process flows.

Environment: Python, MATLAB, Oracle, HTML5, Tableau, MS Excel, Server Services, Informatica PowerCenter, SQL, Microsoft Test Manager, Adobe Connect, MS Office Suite, LDAP, Hive, Spark, Pig, Oozie.


