Zahra Asiaee, M.Sc.
Mobile: +1-416-***-****
E-mail: **.******@*****.***
LinkedIn: https://www.linkedin.com/in/asiaee/
Toronto, ON
PROFILE:
Data enthusiast with a strong mathematical background and 10+ years of experience using predictive modeling, data processing, and data mining algorithms to solve challenging business problems. Experienced in designing and developing real-time and batch BI/ETL solutions and advanced data services, with in-depth knowledge of data modeling, warehousing, analytics, and BI tools. Characterized as an ambitious, dedicated, and resourceful self-starter. Skilled in analytical thinking, analyzing systems, finding solutions, interacting effectively with cross-functional teams, vendors, and customers, and meeting customer expectations.
HIGHLIGHTS OF STRENGTHS:
Experience building large-scale data pipelines that ingest and transform billions of rows of data per day/week/month.
Develop Python code with Airflow/PySpark to programmatically author, schedule, and monitor workflows.
Passion for producing clean, maintainable, and testable code as part of real-time data pipelines.
Provide visualization and analysis of more than 20 billion rows of data merged from different data sources.
Thoroughly familiar with the AWS, GCP, and Azure platforms.
Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse.
Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
Proficient in using PySpark SQL, NoSQL, SQL Server Studio, and PostgreSQL to query and manipulate data.
Excellent understanding of machine learning techniques and algorithms such as RNN, LSTM, Seq2Seq, Decision Trees, XGBoost, Random Forest, Logistic Regression, Ordinary Least Squares Regression, Clustering, Time Series, Naive Bayes, Support Vector Machines, Association Rules, Ensemble Methods, Principal Component Analysis, and Singular Value Decomposition.
Hands-on experience with: Python (pandas, NumPy, Matplotlib, PySpark, Spark SQL, TensorFlow, Keras, SciPy, scikit-learn, PyTorch), R (caret, ggplot2, C50, dplyr, stringr, Shiny), Tableau, Power BI, Elasticsearch/Kibana, SAS, SPSS.
Experience in NLP for information retrieval, sentiment analysis, text summarization, spam filtering, and generative and retrieval-based agents (chatbots).
Developed a question-answering (Q-A) system in Python using TF-IDF; extracted linguistic features such as part-of-speech tags, dependency labels, and named entities; and combined word embeddings with machine learning and deep learning algorithms such as the Naive Bayes classifier, RNN, LSTM, and Seq2Seq with an attention mechanism.
Experience teaching and applying mathematics and advanced statistical models at the university level.
Ability to prioritize effectively to accomplish goals while maintaining a professional manner and staying calm under pressure.
Excellent at problem solving, analyzing systems, and finding solutions to complex problems.
Zahra Asiaee Page 1 of 6
EXPERIENCE
TD Bank
Canada, Toronto
Business Intelligence Developer 04/2021 - Present
Member of the TD Automonitoring team, supporting and meeting the requirements set by industry standards and regulatory bodies.
● Develop and automate Hadoop/PySpark pipelines to fulfill the requirements set by the Investment Industry Regulatory Organization of Canada (IIROC).
● Analyze and profile the extracted data and recognize patterns in it.
● Present results to management of various internal TD customers/organizations, release approved results, and present them in dashboards to C-level executives.
● Develop logic to recognize exemptions with NLP/ML.
Apple Inc. 10/2020 - 03/2021
USA, California (CA)
Data Engineer
As a member of the data engineering team, I was responsible for designing and building scripts and tools that perform automated migrations, transformations, aggregations, data analysis, and other augmentations on large datasets in a Spark-based AWS environment (Databricks, Glue, S3, Redshift, Athena, SageMaker).
● Designed and developed data pipelines with Airflow/PySpark to migrate data sources to Databricks and PostgreSQL.
● Developed scripts with Python for the testing pipeline.
● Used Git/Gitflow version control on GitHub.
● Detected anomalies and categorized data with ML algorithms in SageMaker.
● Extracted, compiled, tracked, and analyzed data to generate reports and dashboards with Python (Plotly/Dash), R (Shiny), and BI tools (Redash).
● Provided insights and analysis of AWS Cost and Usage Reports (Redash).
● Developed a chatbot on Amazon Lex, connected to Slack, to handle technical questions.
● Worked in a high-performing Agile (Scrum) team.
Vividata 06/2019 - 09/2020
Canada, Toronto
Director of Data Science/Engineering
Vividata is the leader in Canadian cross-media and consumer research, providing essential consumer intelligence to a wide range of marketers and media agencies in Canada and around the world. I was responsible for managing a team of data engineers and delivering the following projects.
Passive Digital Measurement: Transformation of panelists' digital activities across apps, web, shopping, and streaming video (approx. TB of data per week) into new behavioral insights. The following activities resulted in $200k of budget savings for the company:
● Developed Python code for merging, joining, and concatenating data from different sources, and loaded the data into a warehouse on Azure.
● Built ML/AI models on the Azure platform to develop new data products.
● Delivered more than 200 dashboards with Power BI.
Survey of the Canadian Consumer (SCC): a comprehensive study that surveys 35,000+ Canadians 365 days a year against 60,000+ variables, reported 4 times per year.
● Analyzed data and developed behavioral, consumption, and segmentation models using statistical approaches combined with machine learning and deep learning algorithms in Python, R, PySpark, and Spark SQL, providing comprehensive, neutral, strategic insights.
https://members.vividata.ca/shop/
● Merged sports data with SCC data using ML techniques and provided various reports with R (caret, ggplot2, C50, dplyr, stringr, Shiny).
Covid19_timeseries_analysis: The project scope was data collection and analysis of the impacts of COVID-19 on different aspects of consumer and end-user behavior, with biweekly reports to clients in areas such as shopping, online shopping, and educational sites (PySpark, Spark SQL, Power BI). A sample report can be downloaded at: The Online Shopping Habits of Canadians During ... - Vividata
Client Projects: Applied dimensionality reduction techniques and unsupervised/supervised machine learning in PySpark and R to identify shopping behaviors and perform market segmentation in the following domains:
● Clustering Geofencing Dataset: The project scope was clustering geofencing data for predefined polygons throughout the GTA. The goal was to help advertising companies expose people to relevant ads (PySpark, Esri/ArcGIS).
● Clustering for a Media Agency: The agency targeted Canadian consumers based on their music streaming habits. As part of its go-to-market strategy, the company introduced podcast-listener targeting, which allows advertisers to reach users who listen to podcasts. Five market segments were identified by applying PCA and machine learning algorithms in PySpark and Spark SQL.
MobileLive
Canada, Toronto 2017 - 2019
Data Scientist/Engineer
Responsible for solution development and delivery of internal projects (e.g., chatbot) and external projects (e.g., churn prediction, recommendation systems) serving midsize to large clients, providing the following services:
● Analyzed project scope and supported the sales department by providing high-level solutions and time and budget estimates.
● Balanced offering quick solutions with writing maintainable code.
● Developed and delivered several projects supporting clients with analysis and prediction on their data to understand market share trends and customer retention, using various machine learning and deep learning techniques.
● Created customized reports in MicroStrategy and Power BI for data visualization
● Worked in a high-performing DevOps team.
● Churn prediction project (logistics management industry): Used SQL queries to extract information from the client's SQL Server, applied a range of ML algorithms in R and Python to predict customer churn (90% accuracy), and reported results with Power BI.
● Recommendation system project (telecom industry): Extracted data from the Hadoop ecosystem with Hive and Pig queries, designed a recommendation system in PySpark to predict which clients were willing to add another line, and reported results with MicroStrategy.
● Chatbot project, carried out in the following steps:
● Created a training dataset.
■ Scraped Wikipedia to create the dataset (with Beautiful Soup).
● Investigated a novel end-to-end method for automatically generating question-answer (Q-A) systems that produce meaningful, coherent dialogue.
■ Applied deep learning algorithms (RNN, LSTM, CNN, Seq2Seq models) in Python with Keras, spaCy, PyTorch, and TensorFlow for generative agents.
■ Used several embedding methods (Word2Vec, CBOW, SGNS, and GloVe).
● Built a retrieval-based agent (sentence matching, sentiment analysis, POS tagging, sentence classification) with ML and deep learning algorithms.
■ Achieved 88% accuracy for named entity recognition and sentence classification using GloVe word embeddings with a BiLSTM.
■ Achieved 99% accuracy matching questions to answers with TF-IDF.
■ Achieved 98% accuracy with a Naive Bayes classifier.
■ Performed multi-class text classification with a CNN.
■ Computed document similarity with Doc2Vec.
■ Deployed a chatbot with Dialogflow, Node.js, and Heroku:
o Taught the chatbot any kind of dialogue.
o Extracted data from conversations and stored it in a database.
o Called and used third-party APIs.
Shams University, Gonbad-e Qabus (http://nbshams-gonbad.ac.ir) 2011 - 2015
Data Scientist/Engineer
As a member of a data science/engineering consulting team, I supported mid- to large-size industries and research institutes with analytical advisory services and project delivery.
o Provided full-cycle solutions (i.e., converted business problems into analytical solutions).
o Developed data pipelines from multi-source databases and built real-time dashboard solutions for metric analysis.
o Ensured accurate and consistent statistical analysis by meticulously going through the data and validating results.
o Provided status and progress reports to clients and internal stakeholders.
Example Projects:
● Joint industry-university project, sentiment analysis of e-commerce reviews: Sentiment analysis of comments left by customers on products and services using Hive, Pig, and Spark. As part of the Big Data team, we were responsible for programming the Hadoop applications used for the project's analysis.
o Loaded and transformed large sets of structured, semi-structured, and unstructured data.
o Imported and exported data between HDFS and Hive.
o Developed Hive queries for analysis.
● Client project (medical industry): Diagnosis of diabetes patients and classification as benign or malignant; increased prediction accuracy by applying different machine learning techniques (neural networks, support vector machines, bagging, boosting, gradient boosted trees) in R.
● Client project (bank): Credit card fraud detection on skewed data; recognized outliers and developed a final model to determine fraud trends by analyzing account and transaction patterns, based on testing different algorithms (LSTM/RNN/logistic regression/SVMs/decision trees in Python).
University Lecturer 08/2011 - 05/2014
Planned and delivered courses for undergraduate and graduate students.
o Prepared course materials, case studies, and PowerPoint presentations to communicate complex topics in statistics, probability, and mathematics to students.
o Implemented data modeling techniques in research projects of the Economics, Commerce, and MBA departments.
o Supervised postgraduate research students in implementing quantitative methods and interpreting their results.
o Lectured university courses in the following areas:
Applied Mathematics
Basic and Advanced Analysis with R
Data Mining Techniques
Statistics & Probability (1), (2)
EDUCATION
Ryerson University Toronto, Canada
Certificate in Data Analytics, Big Data, and Predictive Analytics 2016
1. Data Analytics: Basic Methods (A+)
2. Introduction to Machine Learning and Big Data (A+)
3. Data Analytics: Advanced Methods (A+)
4. Big Data Analytics Tools (Hive, Pig, Spark) (A-)
5. Data Organization for Data Analysts (SQL, MySQL) (B+)
6. Capstone Course (B)
Damghan University Damghan, Iran
Master of Science in Applied Mathematics (Data Analysis) 2009-2011
Dissertation: Frames in finite dimensional spaces
Ferdowsi University Mashhad, Iran
B.Sc. in Applied Mathematics 2002-2006
CERTIFICATION
University of Michigan, “Coursera”
Applied Data Science with Python (6 courses)
DataCamp Completed 2020-2021
1. Introduction to PySpark
2. Cleaning Data with PySpark
3. Big Data Fundamentals with PySpark
4. Introduction to Data Visualization with Plotly in Python
5. Intermediate SQL
6. Joining data in SQL
7. Introduction to SQL
Udemy
1. How to Create a Chatbot with Deep Learning and NLP, completed 2018
2. Power BI, completed 2018
DeepLearning.AI, Coursera
1. Sequence Models (Grade: A+) 2018
2. Introduction to Recommender Systems: Non-Personalized and Content-Based 2018