
Data Assistant

Location: Newark, NJ

Posted: April 03, 2020


Resume:

Hassan Ali Khan

Greater New York Area +1-917-***-**** *****************@*****.*** GitHub: hassanalikhan1 LinkedIn: hassanalikhan1

EDUCATION

Master of Science - Data Science Sep 18 – Dec 19

New Jersey Institute of Technology (NJIT) Newark, New Jersey

- Machine Learning, Deep Learning, Applied Statistics, Big Data, Data Mining, Data Analytics, R programming

Bachelor of Science - Computer Science Sep 14 – May 18

Lahore University of Management Sciences (LUMS) Lahore, Pakistan

- Advanced Programming, Software Engineering, Data Structures and Algorithms

EXPERIENCE

Research Assistant Jan 19 – Dec 19

New Jersey Institute of Technology Newark, New Jersey

- Hadoop Developer: Supported daily analysis of around 20 GB of website data collected from external sources. Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW. Tested raw data and executed performance scripts.

- Pneumonia from X-ray Images: Built a deep learning CNN model in Keras to automatically identify from chest X-ray images whether a patient has pneumonia. Achieved 90% accuracy.

- Time Series Analysis: Built different time series models (ARIMA, LSTMs) to forecast multiple indicators.

Undergraduate Research Assistant May 17 – May 18

Lahore University of Management Sciences Lahore, Pakistan

- YouTube Kids Explicit and Forged Content Detection: Detected explicit and/or inappropriate content on YouTube Kids using a deep learning model in Keras and TensorFlow that takes in video, audio and movement features. Used a pre-trained Convolutional Neural Network to extract audio and image features. Achieved 95% accuracy with the classifier. [Report][Code]

- Human Microbiome Analysis: Conducted an in-depth analysis of simulated microbiome data from thousands of patients, containing RNA read counts mapped to 1,000 microbial genomes. Performed dimensionality reduction using the chi-squared test. Used the sklearn, NumPy, matplotlib and pandas Python libraries for data exploration, data modelling, data cleaning and data visualization. Employed seaborn to create graphs and visualizations. [Code]

PROJECTS

• PLAsTiCC Astronomy on Kaggle: Time series classification using LSTM networks on an astronomy dataset.

• Customer Segmentation: Used the RFM (Recency, Frequency and Monetary) model to select attributes for clustering customers into different groups.

• Fraud Detection: Built Tableau dashboards to identify potentially fraudulent or incorrect transactions.

• Recommender System: Developed a movie recommender system using collaborative filtering on public datasets.

SKILLS

Programming Languages: C, C++, Python, MATLAB, Haskell, Java, Socket Programming, SQL, R

Data: Data Analytics, Data Wrangling, Statistical Inference, Data Modeling (Decision Tree, K-Means Clustering, SVM, Gaussian Naïve Bayes, Neural Network), Machine Learning (Supervised and Unsupervised Learning), Deep Learning, Data Visualization, Data Reporting, Data Engineering, Natural Language Processing, Data Mapping, A/B Testing, Hadoop, Spark

Technical: scikit-learn (sklearn), Pandas, NumPy, Keras, TensorFlow, PyTorch, SQL, MapReduce, GitHub, Tableau, Jupyter


