
Python Data

Location:
Brighton, MA
Posted:
July 21, 2020


Resume:

SOUMYA ZACHARIA

Boston, MA ********.*@************.*** 732-***-**** linkedin.com/in/soumya-zacharia/

EXPERIENCE

Data Analytics Co-op: Brigham & Women’s Hospital, Harvard Medical School, Boston Jan 2019 - Jun 2019

• Developed a scalable and reproducible pipeline using Python to generate summary plots, perform association tests and detect data anomalies, reducing the execution time of genome analysis by 30%

• Designed Tableau dashboards and performed statistical tests (hypothesis tests, regression) to analyze the overlap between COPD and asthma, using clinical study data from 3000 patients

• Created a reference catalog mapping variants to gene information from datasets of over 30M rows using RStudio, reducing decision-making time for end users by 20%

• Prepared and transformed clinical data in Python for regression analysis, examining the relationship between gene information and phenotypes with 93% accuracy

• Improved the efficiency of the file search system by using natural language processing and image processing techniques to automatically retrieve information from research manuscripts, reducing data entry time from 20 mins to less than 4 mins

Marketing Analyst: Places for less, Cambridge May 2018 - Jul 2018

• Analyzed student housing data by generating Tableau dashboards to identify the most preferred housing locations and price ranges

• Laid out marketing strategies based on website leads and housing data, leading to a 10% increase in responses on the website

Application Analyst: Accenture Solutions, Chennai, India Nov 2016 - Dec 2017

• Utilized optimized SQL queries for inserting and retrieving data for testing APIs, reducing task completion time by 40%

• Served as the team leader for testing a portal application and CRM software, assuming sole responsibility among 50 colleagues

• Created operational excellence dashboards using Excel to efficiently prioritize tasks and monitor the progress of the team, improving the delivery of backlog by 25%

• Comprehensive knowledge of the Agile SDLC; collaborated with multiple teams to lay out test strategies

PUBLICATION

Craig P. Hersh, Soumya Zacharia, Lystra P. Hayden (2019): Immunoglobulin E as a biomarker for the overlap of Atopic Asthma and COPD

SKILLS

Programming Languages: Python (NumPy, Pandas, SciPy, Scikit-learn, Keras, TensorFlow, nltk, PySpark, Django), R, C++, Bash scripting

Databases: MySQL, SQL Server, MongoDB

Tools: Tableau, RStudio, MS Excel, SPSS, Minitab, Git, Jupyter Notebook, Databricks, Apache Spark, familiar with Apache Hadoop

Machine Learning Techniques: Regression, Decision Trees, Clustering, Statistical & Predictive Modelling, Time Series Analysis, Natural Language Processing, Deep Learning

EDUCATION

Master of Science in Data Analytics Engineering GPA: 3.92 Northeastern University, Boston May 2020

Coursework: Probability and Statistics, Database Management & Database Design, Data Science with Python, Data Mining, Visualisation Engineering, Artificial Intelligence & Applications, Operations Research, Statistical Methods in Engineering

Bachelor of Engineering in Electronics & Communication GPA: 8.34 University of Kerala, India May 2016

Coursework: C++ Programming, Computer Organization & Architecture, Digital Signal Processing (MATLAB), Microcontrollers

PROJECTS

Chatbot- Python (Keras, nltk) Nov 2019-Dec 2019

• Developed an interactive chatbot using an LSTM sequence-to-sequence encoder-decoder model trained on the Amazon QA dataset

• Performed NLP steps (tokenization, vectorization, embedding) on the data before training the model

Netflix Data Analysis - Tableau, Python Sep 2018 – Oct 2018

• Visualised and analyzed Netflix subscriber trends, country-wise library contents and user interests over past years using filters, calculated fields and parameters in Tableau

• Created an interactive dashboard to visualise the top-rated movies and shows based on genre, category and rating descriptions

Home Credit Default Risk Prediction – Python May 2018- Aug 2018

• Performed EDA and statistical tests on the loan application dataset to analyze the loan repayment probability

• Predicted the probability that a client repays the loan by applying ML algorithms (XGBoost, Random Forest, LightGBM) to the dataset, obtaining results with 91% accuracy

Baller Index- MySQL Jan 2018 – Apr 2018

• Developed a normalized stock market database of sports players in which the share price of each player is determined by an algorithm

• Created stored procedures and triggers to view and update player prices and customer portfolios (buy/sell shares), and to perform backups


