Ahsan Saeed
Phone: +92-33-666*-****
E-mail: *****.*****@*****.***
LinkedIn: https://www.linkedin.com/in/ahsan400/
Location: Islamabad, Pakistan
Summary
Currently working as a Data Scientist at Teradata Corporation, Global Delivery Center. Previously worked for IBM Saudi Arabia, where I planned, designed, and implemented software solutions. Also working remotely part-time as a researcher at Brown University, analyzing how deep learning models can be trained when data is scarce. Going further back, worked for Huawei Technologies as a Systems Analyst/Analyst Programmer.
Machine Learning
Worked in
• Data Science and ML (including Deep Learning) based Analytical solution development for the past 5 years
• Statistical Inference and hypothesis testing to gather key insights from the data
• ML branches of Reinforcement Learning (Q-learning & MDPs), Supervised Learning (such as SVMs, Logistic/Linear Regression, and Ensemble Methods), and Unsupervised Learning (such as K-means, and soft-clustering methods such as Expectation Maximization)
• Histogram Analytics to optimize manufacturing events of sequential and parallel nature
• Worked for 6 months with Weka, Hadoop, and Spark
• Worked for 5 years in Python, R, and SQL to build analytic datasets (ADS), perform EDA, and build models
• Spark SQL, HiveQL, Pig, NumPy, and pandas for data preprocessing/manipulation and dimensionality reduction
• R (ggplot2) and Python (Matplotlib, Seaborn) for data visualization
• Bayesian filters such as the Kalman filter, and sequential Monte Carlo methods such as particle filters
• Tools such as Jupyter & Zeppelin notebooks, Docker & Vagrant, GitHub, and R Shiny apps
Professional Software Engineering Experience Summary
• Strong record of delivering software solutions as a solution architect
• Over 14 years of experience with software systems, from high-level architecture down to low-level, algorithmic implementation detail
• Worked in C/C++ for 7 years as a Software Developer and Support Engineer
• Worked in Oracle 9i, 10g, and 11g, SQL Server, and PostgreSQL for around 10 years, primarily writing queries and stored procedures
• Worked extensively on various Unix platforms (scripting), such as Linux (Ubuntu, Red Hat), AIX, Sun, and HP
• Worked extensively with customer relationship management software products, especially Siebel/Oracle CRM, and gained first-hand experience of software acceptance from the customer’s perspective
• Volunteered to take on the added role of System Administrator besides working as a Functional Consultant for the CRM system at Saudi Telecom
• Worked in a telecom operations setting to automate routine tasks using Python and/or Linux/Unix scripts, scheduled as cron jobs
Education
• SAS Certified Statistical Business Analyst (2018)
• Master's in Computer Science, Georgia Institute of Technology (GPA: 3.80/4.0), Concentration: Machine Learning/Data Science, 2014–2017
• Bachelor of Science in Computer Science, with Mathematics (Linear Algebra, Calculus, Numerical Analysis) and Statistics, 1998–2003
May 2017 – Jan 2019 Brown University
Position: Researcher
Worked as a part-time researcher in deep learning. We used TensorFlow for our experiments and explored the possibility of introducing other architectural biases (beyond convolutions) into neural networks (specifically CNNs) to help them learn richer representational problems that require more relational-type reasoning.
We understand that many relational-reasoning problems take significantly longer than other problems for CNNs to learn; we are trying to understand this gap and improve CNNs on these problems (a sketch of one such bias follows).
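A minimal sketch of the kind of architectural bias explored, assuming a relation-network-style pairwise module (in the spirit of Santoro et al.'s Relation Networks) attached to a small CNN in TensorFlow; layer sizes and names are illustrative, not the actual research code:

```python
import tensorflow as tf

def relational_head(fmap, units=64):
    """Score every ordered pair of spatial cells with a shared MLP and sum
    the results: a relation-network-style inductive bias on CNN features."""
    h, w, c = fmap.shape[1], fmap.shape[2], fmap.shape[3]
    n = h * w
    cells = tf.reshape(fmap, [-1, n, c])          # (B, N, C)
    left = tf.repeat(cells, repeats=n, axis=1)    # each cell repeated N times
    right = tf.tile(cells, [1, n, 1])             # the cell sequence tiled N times
    pairs = tf.concat([left, right], axis=-1)     # all ordered pairs: (B, N*N, 2C)
    g = tf.keras.layers.Dense(units, activation="relu")(pairs)
    return tf.reduce_sum(g, axis=1)               # permutation-invariant aggregate

inputs = tf.keras.Input(shape=(32, 32, 3))
x = tf.keras.layers.Conv2D(24, 3, strides=2, activation="relu")(inputs)
x = tf.keras.layers.Conv2D(24, 3, strides=2, activation="relu")(x)
x = relational_head(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```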
2 Nov 2017 – Present Teradata
Position: Data Scientist
Working as a Data Scientist, currently using R and SAS to build predictive models. A brief description of my recent work follows:
Suzuki Japan
• Working to detect anomalous points (or combinations of points) in a vehicle’s wheel-RPM data. Our focus has been on multivariate anomaly-detection techniques such as the Mahalanobis distance to classify points that lie far from the overall data distribution (a sketch of this step appears below). Another question we are answering is which other factors contribute to the occurrence of such far-away points.
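A minimal sketch of the Mahalanobis-distance step, assuming a chi-square cutoff and synthetic wheel-RPM data; the function name and threshold are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_outliers(X, alpha=0.01):
    """Flag rows whose squared Mahalanobis distance from the sample mean
    exceeds the chi-square cutoff (df = number of features)."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.pinv(np.cov(X, rowvar=False))   # pinv guards singularity
    d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared distances
    return d2, d2 > chi2.ppf(1 - alpha, df=X.shape[1])

rng = np.random.default_rng(0)          # synthetic 4-channel RPM data
X = rng.normal(100, 5, size=(1000, 4))
d2, is_anomaly = mahalanobis_outliers(X)
print(is_anomaly.sum(), "points flagged")
```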
National Australia Bank
• Working on multi-channel attribution using data-driven models such as Markov chains & Shapley values. The aim is to attribute a share to each marketing channel (TV, radio, Facebook, etc.) in order to distribute spend fairly. The models work on transformed population exposure to the marketing ads (a simplified removal-effect sketch appears below).
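A simplified, path-based removal-effect calculation that approximates Markov-chain attribution; the journey data and normalization are illustrative, not the production model:

```python
def removal_attribution(journeys):
    """Simplified removal-effect approximation of Markov-chain attribution:
    a channel's effect is the drop in conversion rate when journeys that
    touch it are treated as non-converting."""
    base = sum(j[-1] == "conv" for j in journeys) / len(journeys)
    channels = {s for j in journeys for s in j if s not in ("conv", "null")}
    effects = {}
    for ch in channels:
        without = sum(j[-1] == "conv" and ch not in j for j in journeys)
        effects[ch] = base - without / len(journeys)
    total = sum(effects.values())
    return {ch: e / total for ch, e in effects.items()}  # attribution shares

journeys = [["TV", "FB", "conv"], ["Radio", "null"],
            ["FB", "conv"], ["TV", "null"]]
print(removal_attribution(journeys))  # e.g. FB gets the largest share
```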
Apple Inc.
• Apple SP&O – this project computes Adstock for Apple Inc. and calculates its half-lives. We are building an analytic framework that helps us assemble an ADS (analytic dataset) using views and run analytics on top of it. Technically, we calculate the half-lives and optimize the half-life alphas using Monte Carlo simulation (a sketch of the half-life arithmetic follows this list).
• AppleCare Signal Detection – intended to highlight the subset combinations of device-owner/device attributes that identify scenarios resulting in a disproportionately high volume of customer-support cases for Apple ID related support. The project demonstrates signal detection on the “minimum viable” dataset that shows the most value for the least effort (to fit within the time available). The following analytics are delivered as part of the solution:
o Exploratory analysis to understand the data
o Pareto analysis to determine the most important features; CHAID was also tried as an alternative
o Binomial z-test to compare multiple AppleCare complaint groups (sketched after this list)
o SPC/XmR control chart to monitor issues over time; the Weibull distribution was used as an alternative
o Kolmogorov–Smirnov (KS) statistic to measure the distance between distributions for statistical testing
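One way the binomial z-test step might look, using statsmodels' two-proportion z-test; all counts are hypothetical:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts: Apple ID support cases vs. device owners for two
# attribute groups (all numbers are made up for illustration)
cases = [480, 310]       # support cases in group A vs. group B
owners = [20000, 21000]  # device owners in each group

z, p = proportions_ztest(count=cases, nobs=owners, alternative="two-sided")
print(f"z = {z:.2f}, p = {p:.4f}")  # a small p suggests the case rates differ
```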
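A minimal sketch of the Adstock transform and its half-life arithmetic for the SP&O bullet above; the decay parameter here is fixed for illustration, whereas the project searches for it via Monte Carlo simulation:

```python
import numpy as np

def adstock(spend, alpha):
    """Geometric adstock: carried-over effect a_t = spend_t + alpha * a_{t-1}."""
    out, carry = np.zeros(len(spend)), 0.0
    for t, s in enumerate(spend):
        carry = s + alpha * carry
        out[t] = carry
    return out

def half_life(alpha):
    """Number of periods until the carry-over decays to half: alpha**n = 0.5."""
    return np.log(0.5) / np.log(alpha)

spend = np.array([100.0, 0, 0, 0, 0, 0])  # a single burst of ad spend
print(adstock(spend, alpha=0.7))           # [100., 70., 49., 34.3, ...]
print(round(half_life(0.7), 2))            # ~1.94 periods
```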
Projects
• Automobile Sensor Failure – a front-end app developed to collect and transform vehicle-sensor datasets. Once a dataset is ready, it is used to build a deep learning feed-forward network model that predicts anomalies in a vehicle’s measurements. The dealer can run this analysis well in advance and be ready for the customer.
• Customer Churn Prediction – a prediction-accelerator front-end app that helps a user run initial data preprocessing steps on an arbitrary dataset, extract important features, remove multicollinearity, and build a baseline model.
• Stocks Prediction – a front-end app developed to predict stock-market actions: BUY, SELL, or DO NOTHING. The app lets the user load an organization’s closing prices and builds a state space for Q-learning, which yields an optimal policy that, when followed, maximizes revenue. The state space is built from indicators such as MACD, RSI, and Bollinger Bands; the app also handles scaling of these indicators (a sketch of the state-space discretization follows this list).
• Plant Disease Prediction – a deep transfer-learning app that takes a pre-trained model (Inception V3) and trains it on plant leaves spanning over 38 categories. The classifier separates healthy from unhealthy plants and further identifies the type of disease for unhealthy leaves. The model can be scaled up to include more categories. We are accommodating seasonality, weather patterns, and location in our predictions (a transfer-learning sketch also follows this list).
• Histogram Analytics – sequential histogram operations, such as parallel combinations & serial combinations, were used to estimate optimum likelihood for optimizing supply & demand. Feature preparation used statistical techniques for finding important features, such as pairwise correlation, chi-square tests, and CART models.
• Working with telecom customers to build churn-prediction models using SVM, Random Forest, and GBM
• Working with the R libraries h2o and sparklyr against Hive datasets to scale the solution using Big Data technologies
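A minimal sketch of how the Stocks Prediction app's state space might be discretized from indicators; the bucket counts and quantile scheme are illustrative assumptions:

```python
import numpy as np

def discretize(series, bins=4):
    """Map a continuous indicator to an integer bucket via its quantiles."""
    edges = np.quantile(series, np.linspace(0, 1, bins + 1)[1:-1])
    return np.digitize(series, edges)        # bucket ids in 0..bins-1

def build_states(rsi, macd, bbp, bins=4):
    """Combine discretized RSI, MACD, and Bollinger %B into one state index."""
    return (discretize(rsi, bins) * bins * bins
            + discretize(macd, bins) * bins
            + discretize(bbp, bins))         # state ids in 0..bins**3 - 1

rng = np.random.default_rng(1)               # stand-in indicator values
rsi, macd, bbp = rng.random(250), rng.normal(size=250), rng.random(250)
states = build_states(rsi, macd, bbp)
Q = np.zeros((4 ** 3, 3))                    # actions: BUY, SELL, DO NOTHING
```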
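A minimal transfer-learning sketch for the Plant Disease Prediction app, assuming a frozen ImageNet-pretrained Inception V3 with a new 38-class head in Keras; the head and hyperparameters are illustrative:

```python
import tensorflow as tf

# ImageNet-pretrained Inception V3 without its classification head
base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet",
    input_shape=(299, 299, 3), pooling="avg")
base.trainable = False  # freeze pretrained features; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(38, activation="softmax"),  # 38 leaf categories
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # data pipeline assumed
```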
Fall 2014 – Spring 2017 Georgia Institute of Technology
Machine Learning & Data Science Projects
• Built an AI agent in Python to solve Raven’s Progressive Matrices using image/pattern recognition
• Explored random search using algorithms such as randomized hill climbing, genetic algorithms, and simulated annealing
• Analyzed decision trees, neural networks, KNN, kernel-based methods such as SVMs, and ensemble methods such as Boosting & Bagging on the Letter Recognition & Chess datasets
• Used K-means, Expectation Maximization, and dimensionality-reduction algorithms such as PCA, ICA, Randomized Projections, and Info-Gain to preprocess a dataset
• Applied TD methods to a random-walk experiment to replicate the results Sutton published in his 1988 paper, in order to learn the best-performing range of λ values (see the TD(λ) sketch after this list)
• Applied rule-based and machine-learning-based strategies to the stock of IBM and generated market orders, then evaluated performance against a market simulator; my strategy gained 60+%. Part of our exploratory analysis focused on tweaking rewards for the Q-learning agent to obtain an optimal policy
• Developed a Netflix movie-revenue predictor – from data preprocessing to training logistic & linear regression models in R; also used non-linear transformations, including interaction terms, to fit a linear classifier in R
• Worked on predicting future locations of a “hex-boson” particle using particle & Kalman filters (a minimal Kalman-filter sketch also follows this list)
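A minimal TD(λ) sketch on Sutton's 1988 random-walk task, using accumulating eligibility traces; the step size, episode count, and seed are illustrative:

```python
import numpy as np

def td_lambda(episodes, lam, alpha=0.05, n=5):
    """TD(lambda) with accumulating traces on Sutton's (1988) random walk:
    states 0..n-1 are non-terminal; exiting right pays 1, exiting left pays 0."""
    V, rng = np.full(n, 0.5), np.random.default_rng(0)
    for _ in range(episodes):
        s, e = n // 2, np.zeros(n)               # start in the middle state
        while True:
            s2 = s + (1 if rng.random() < 0.5 else -1)
            terminal = s2 in (-1, n)
            target = (1.0 if s2 == n else 0.0) if terminal else V[s2]
            e[s] += 1.0                          # accumulate eligibility
            V += alpha * (target - V[s]) * e     # update every traced state
            e *= lam                             # decay traces (gamma = 1)
            if terminal:
                break
            s = s2
    return V

print(td_lambda(1000, lam=0.3))  # true values are [1/6, 2/6, 3/6, 4/6, 5/6]
```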
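A minimal one-dimensional constant-velocity Kalman filter, in the spirit of the particle-prediction project; the noise covariances and measurements are assumed values:

```python
import numpy as np

# One-dimensional constant-velocity model: state = [position, velocity]
F = np.array([[1.0, 1.0], [0.0, 1.0]])  # state transition (dt = 1)
H = np.array([[1.0, 0.0]])              # we observe position only
Q = 0.01 * np.eye(2)                    # process-noise covariance (assumed)
R = np.array([[1.0]])                   # measurement-noise covariance (assumed)

x, P = np.zeros((2, 1)), np.eye(2)      # initial state estimate and covariance
for z in [1.1, 2.0, 2.9, 4.2, 5.1]:     # noisy position measurements (made up)
    x, P = F @ x, F @ P @ F.T + Q       # predict
    y = np.array([[z]]) - H @ x         # innovation
    S = H @ P @ H.T + R                 # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
    x, P = x + K @ y, (np.eye(2) - K @ H) @ P  # update
print("position, velocity estimate:", x.ravel())
```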
Big Data for Health Informatics
• Performed Extract, Transform, Load (ETL) on electronic health records using Pig, built a logistic regression classifier in Python, and trained multiple classifiers in parallel with Hadoop
• Implemented unsupervised phenotyping with k-means clustering, Gaussian Mixture Models, and Non-negative Matrix Factorization, and evaluated clustering quality using purity, Rand index (RI), and ROC
• Implemented Jaccard coefficient to compute similarities among objects and Random walk with restart to compute patient similarity in Spark GraphX using Scala
• Decomposed text notes using Term Frequency–Inverse Document Frequency (TF-IDF) to rank the importance of words
• Used Latent Dirichlet Allocation (LDA) to uncover hidden structure and find topics in hospital notes, then used these topics as the feature set for prediction; we trained an SVM as the predictor (a toy LDA-to-SVM pipeline is sketched after this list)
• Used the MIMIC-III dataset for patient records. The dataset contains records for more than 40,000 hospital patients. Our job was to analyze the notes taken manually by doctors and other staff during a patient’s stay at, or before their admittance to, the ICU and predict mortality. There were approximately 3 million hospital notes.
• Used Spark and Hadoop platforms on AWS (EC2 & S3) instances to get required computing power for analyses. (The paper can be shared if required.)
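A toy sketch of the LDA-topics-as-features pipeline feeding an SVM, using scikit-learn with stand-in notes and labels; the real work ran on roughly 3 million MIMIC-III notes:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Toy stand-ins for hospital notes and mortality labels
notes = ["pt stable post op", "severe sepsis icu transfer",
         "routine follow up visit", "cardiac arrest resuscitated in icu"]
labels = [0, 1, 0, 1]

# LDA topic proportions become the feature set; an SVM makes the prediction
clf = make_pipeline(
    CountVectorizer(),
    LatentDirichletAllocation(n_components=2, random_state=0),
    LinearSVC(),
)
clf.fit(notes, labels)
print(clf.predict(["icu sepsis noted"]))
```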
Oct 2012 – Sep 2017 IBM Saudi Arabia
Position: Senior Telecom Consultant/Solution Architect
Outsourced to Saudi Telecom on KAFD-ITCC, a smart city under development; I was tasked with accepting, moving to production, and maintaining a telecom billing solution
Responsibilities
• Customer representative for Telecom Billing and CRM solution acceptance
• Project main-drop and incremental release migrations to production
• Product presenter; solution design & programming technical consultant; CRM module owner
• Working closely with the Marketing and Sales departments on CRM requirements gathering and solution analysis/design
• Planning and execution of product administration and system scalability, load-balancing, and fault-tolerance activities
• Worked heavily in the CRM system for end-to-end telecom solution use-case testing
2006 – Oct 2012 Huawei Pakistan
Position: Systems Analyst
Worked in C++ software development and support consultancy. Travelled to more than 5 countries in the Middle East and Africa to provide software consultancy, technical support, and product deployment services.
Responsibilities
• Software Requirement Analysis and Design
• Travelled on-site to customer premises in different countries to handle deployment activities
• Senior Product Support Consultant
• C language Client/Server, Socket programming
• C language Inter-process communication and multi-threaded design and programming for Call Detail Record processing
• C language RPC client/server for image compression, and proxy server with caching
• Worked extensively on writing PL/SQL queries and procedures