Data Analyst

Location:

Posted:

January 22, 2021

Resume:

ZAINAB MURTAZA Springfield, Illinois, 627**-***-*** 2754 Email GitHub LinkedIn

SUMMARY: Graduated in December 2020 with a Master's degree in Computer Science from the University of Illinois. Extensive hands-on experience with data curation, wrangling, and visualization, along with statistical analysis and machine and deep learning frameworks. Sound understanding of cloud platforms such as AWS and Azure.

EDUCATION:

University of Illinois – Master of Science; Computer Science – CGPA 4.0/4.0 Graduated December 2020

Coursework: Artificial Neural Networks, Machine Learning, Deep Learning, Data Mining, Data Visualization, NoSQL Databases.

SZABIST – Bachelor of Science; Computer Science Graduated September 2017

Coursework: Artificial Intelligence, Compiler Construction, Discrete Mathematical & Computational Structures, Numerical & Symbolic Computation, Physics & Electronics, Statistics & Probability, Web Engineering.

SKILLS:

Programming Languages: R (mlr, randomForest, caret, glmnet, e1071), Python (tensorflow, theano, pandas, scikit-learn, nltk, matplotlib), Scala, SQL, PostgreSQL, Java.

NoSQL Databases: MongoDB, CouchDB, Cassandra, HBase, Neo4j, Redis, Riak, Bigtable.

Frameworks/Technologies: Hadoop, Apache Spark, Apache Kafka, Snowflake, Athena, Redshift, Mahout, Flume, Matlab, Tableau, Hive, SAS, AWS, Azure, Kubernetes, Docker, Kibana.

EXPERIENCE:

Research Interviewer at the Survey Research Office – Illinois Department of Public Health, Illinois, USA Aug 2020-Dec 2020

Appraised Behavioral Risk Factor Surveillance System (BRFSS) surveys for the Illinois Department of Public Health,

Administered survey data from WinCATI software into QualtricsXM to produce insights for BRFSS research,

Interacted with survey respondents via WinCATI software to collect data for future BRFSS reports.

Student Assistant at the Computer Science Department – University of Illinois, Illinois, USA Oct 2019-May 2020

Provided wide-ranging technical and administrative support for faculty and students at the Computer Science Department,

Assessed prospective student data, troubleshot degree audit reports and registrations for upcoming semesters, and processed undergraduate/graduate degree closure petitions,

Wrangled confidential student data from the past decade in the Banner ERP system to predict future prospective student demographics.

PROJECTS:

Urban Audio 8K Data; Neural Network Classification – Python (Tensorflow) Nov 2020-Dec 2020

Implemented feedforward, convolutional, and recurrent neural networks for multi-class classification of raw audio files,

Used 10-fold cross-validation to train and evaluate the three networks and compare their accuracies.

CIFAR-10 Image Classification – Python (Tensorflow) Oct 2020-Nov 2020

Used DenseNet121 architecture, Transfer Learning & Data Augmentation to optimize classification metrics of Intel Image Dataset,

Improved the model with learning rate scheduling, stochastic gradient descent with momentum, & batch normalization.

Uber Trip Dataset (NYC); K-Means Cluster Analysis – Apache Spark Mar 2020-Apr 2020

Built a real-time example for analysis and monitoring of routed GPS data via K-Means clustering in Apache Spark. Used the dataset in LibSVM format to handle features, training, prediction, and evaluation for clustering.

Titanic Dataset; Random Forest Classifier – Apache Spark Feb 2020-Mar 2020

Performed data imputation, indexed categorical features by encoding values & used this to form feature vectors for training & testing,

Calculated area under ROC and area under Precision-Recall curves in a Binary Classification Evaluator class for model evaluation,

Computed quantiles to perform unsupervised anomaly and outlier detection.

Instacart Online Shopping Dataset; Frequent Pattern Mining – PySpark, Apache Spark Feb 2020-Mar 2020

Conducted market basket analysis and association rule mining using Spark’s FPGrowth algorithm,

Performed co-occurrence analysis to demonstrate the model's ability to indicate consumer purchase probability,

Computed & visualized antecedent, consequent & confidence values to gauge the probability of a customer's inclination to purchase.

Google Stock Data; MapReduce Framework – Riak Feb 2020-Mar 2020

Used MapReduce and link walking to test Riak’s querying capabilities with Google’s stock data.

US Yellow Pages Data; MapReduce, B-Trees and Geo-Spatial Indexing – MongoDB Feb 2020-Feb 2020

Used JavaScript functions to populate collections; used B-Tree (range) and geospatial indexing to increase query performance,

Used query-by-code functions, references, directives, ad-hoc queries such as Perl-compatible regular expressions & range operators.

YTK Twitter Dataset; Compressing Data via Algorithms – HBase, Hive, Spark Jan 2020-Feb 2020

Retrieved data from HBase using HiveQL and evaluated the effectiveness of compression techniques in HBase; used a Ruby hash to change the compression configuration to the Snappy compression algorithm.

Ames Housing Dataset; Ensemble Methods – R (glmnet, caret) Jan 2020-Feb 2020

Used Lasso, Ridge, and Elastic Net linear regression models, along with a random forest model and gradient boosted trees, to predict housing sale prices on training data & performed 10-fold cross-validation to tune the lambda parameter.

OneHotEncoderEstimator & MinMaxScaler – Apache Spark Jan 2020-Feb 2020

Transformed categorical data into binary sparse vectors via OneHotEncoderEstimator,

Normalized and scaled numerical data via MinMaxScaler; used z-score techniques, center, decimal, & unitSum.

Decision Tree Classification & Pipelining – Apache Spark Jan 2020-Feb 2020

Evaluated the performance of a trained model with MulticlassClassificationEvaluator,

Used a Spark pipeline to simplify the training process; computed three evaluation metrics: weighted precision, recall, & F1 measure.

Facebook Comments Dataset; Sentiment Analysis – Apache Cassandra Feb 2020-Feb 2020

Used the Cassandra Query Language shell (cqlsh) to perform text analysis with filtering and secondary indexes,

Explicitly assigned TTLs to obsolete data so that it is marked with tombstones after expiration.
