Data Customer Service

New York City, New York, United States
February 11, 2019



**** ******** **. *** **, Bronx, NY 10463 347-***-****


Stony Brook University, Stony Brook, NY

Master of Science in Applied Health Informatics, Data Analytics Specialization 5/2017 - 8/2018

Bachelor of Science Health Science Concentration in Environmental Health and Safety 5/2017

(double major) Sustainability Studies


HIPAA Policy and Procedure Compliance

Verbal/Written Communication, bilingual (Spanish, English)

Big Data/Data Analysis, Data Science

Team Building and Leadership

Hadoop (HDFS/MapReduce), Hive, Kafka, Pig, Impala and Avro, Oozie, Sqoop, Flume, ZooKeeper, Hue, HBase, Spark/Scala, Storm

AWS (EC2, RedShift, EMR, IAM, etc.), MongoDB, Cassandra

SQL, Tableau, Python, R, Java, Linux, Unix Shell Scripting, Machine Learning

Project Management

MS Word, Excel, and PowerPoint


Building a Sustainable Healthcare System Project 7/2017 - 8/2017

Designed a project with four other group members to analyze sustainable healthcare systems such as the Cleveland Clinic and how they can contribute to achieving the Triple Aim's goals of reducing costs, improving population health, and improving patient experience.

Python Analytics Project

Utilized Python (Pandas, NumPy, Matplotlib) to manipulate and analyze various datasets such as movie ratings, stock market data, and U.S. diabetic cases by computing descriptive statistics (average, max, min, etc.), performing joins, creating and merging dataframes/pivot tables, and plotting the data as histograms.

Developed user-friendly games in Python such as Pong or Snake.

Performed tweet visualization and real-time sentiment analysis of tweets in Python.

Parsed data (e.g., personal information) from a CSV file into an HTML list using Python.

Health Data Analytics Project 1/2018 - 5/2018

Designed a project with three other group members to analyze a diabetic dataset: used SQL to clean and extract the desired data, R to perform linear and logistic regressions on the extracted dataset, and Tableau to visualize the results.

MovieLens Analysis 5/2018 - 9/2018

Developed MapReduce jobs to identify the ten most-viewed and the highest-rated movies, and to rank each movie genre by average rating for each profession and age group.

Sentiment Analysis 5/2018 - 9/2018

Collected Twitter tweets, used MapReduce to pre-process the raw JSON data and extract relevant information, wrote a Hive UDF to classify the data into positive/negative opinions, and used Hive queries to generate a report of both good and bad opinions/evaluations/attitudes in the collected tweets.

Set Top Box Analysis 5/2018 - 9/2018

Used Spark to analyze unstructured data detailing users' activities (browsing for videos, tuning to a channel, viewing duration, purchasing videos through Video On Demand) to identify the devices and channels with the maximum duration, count junk records, get the total number of devices used in certain amounts, etc.

Aadhar Card Analysis 5/2018 - 9/2018

Utilized SparkSQL to create dataframes and perform dataset operations to prepare and process Aadhar data across various demographic parameters in India: found the top 3 states and private agencies generating the most Aadhar cards, counted the unique pin-codes in the data and the Aadhar registrations rejected in certain states, and wrote commands to examine certain correlations.



Big Data Hadoop Certification Training 7/2018 - 12/2018

Installed Hadoop and Spark (Master and Slave) on a single/multi-node virtual machine (CentOS/Ubuntu).

Used MapReduce to aggregate data, perform joins of multiple input files, and solve graph problems (e.g., breadth-first search).

Generated reports by using Sqoop jobs to import/export data such as banking and financial transaction data from MySQL to HDFS, processing it through MapReduce, and analyzing it through Hive queries.

Performed various data analyses using Pig, such as analyzing weather data to find the hot, cold, and hottest days recorded in the dataset.

Created a schema and dataframe of hospital services using SparkSQL and used Scala to perform SQL queries such as finding the number of doctors, patients that visit each doctor, minimum number of visits to a doctor, average, and maximum visits, etc.

Sqooped data in Avro format and analyzed it in Impala via Hue.

Used R to perform a linear regression on a company's advertising expenditure on a product to understand how it has helped drive sales revenue and to plan future advertising spending.

Created and executed various Unix shell scripts on Linux involving loops and iterations, functions, calculations, etc.

AWS Solutions Architect Certification Training 9/2018 - 12/2018

Worked with and learned to better utilize Amazon Web Services (EC2, EBS, ELB, EMR, VPC, S3, CloudFront, IAM, RedShift, RDS, Route 53, CloudWatch, SNS).


Certified Hadoop and Spark Developer Training 5/2018 - 12/2018

Designed a MapReduce job that calculated the average CPU utilization percentage of each node per day.

Developed a MapReduce job that got the details of employees (males and females, respectively) with the highest salary for the year from each department.

Configured a single-node, single-broker Kafka cluster to publish messages (as a Producer) to a topic and receive them with a Consumer subscribed to that topic.

Performed a join of two different files (employee name data/employee department data) using MapReduce.

Utilized Hive and UDFs to process and analyze rollercoaster data to create queries concerning various questions such as ride length and speed in relation to participant excitement, intensity, and nausea levels.

Created RDDs in Spark/Scala to analyze various datasets such as financial data from the World Bank and the most populous countries, those with the highest urban population and GDP growth, as well as countries whose internet usage has grown the most in the past decade.

Analyzed retail purchase data to calculate sales breakdown by product category and store across all stores using Spark/Scala.

Created dataframes and ran various SparkSQL queries on Titanic data to analyze the average age of passengers who died and of those who survived, the embarkation locations and their counts, and the number of survivors by gender between the ages of 20 and 50.

Analyzed Olympic data containing information about the number of athletes who participated, number of medals won by each country, sports played, etc., and generated reports and outputs on them using SparkSQL.

Used Spark Streaming (Flume) to collect and count trending hashtags from Twitter in real time.

Created a Spark Streaming (Flume) job that parsed weblogs into a structured format.

Used GraphX to generate the popularity (or rank) of users through the PageRank algorithm.

Clustered sample data into 6 clusters by using Spark MLlib.

St. Barnabas Hospital, Bronx, NY 1/2018 - 7/2018

Informatics and Analytics Intern

Created a Project Plan for the CMIO to implement a Patient Survey to be used as a part of another major project.

Utilized Python and Tableau to analyze and visualize the total number of discharges per PDI/PQI for every month of the previous year.

Performed data review, organization, normalization, and clean-up.

Documented St. Barnabas Tableau workbooks, views, data sources, descriptions, available filters, and associated images by using Mail Merge to improve information accessibility.

Visualized the PDI/PQI discharge prevalence within the hospital by using Tableau.

Created various dashboards in Tableau for provider use such as the monthly medication prescriptions made by providers as well as a dashboard showing the number of Continuous Observation (CO) Orders for Emergency and Inpatients.

Used Tableau and Excel to track Meaningful Use VDT daily and for every other 90-day period to ensure providers from every department, except dentistry, have successfully passed and met their 5% patient target.

Assisted the EHR MU Coordinator in reporting and validating the 2017 Meaningful Use VDT for every 90-day period by using AllScripts.

Used SQL to prepare various reports, such as a list for the Director of Analytics of patients who were in the Emergency Room on a specific date for every month throughout the year, or a list of patients who satisfied certain criteria to aid a Pediatrician's quality improvement project.

Utilized Microsoft Access to create a user-friendly search form for Human Resources to find employee demographic, deduction, and earnings information.

Stony Brook Southampton Hospital, Southampton, NY 12/2017 - Present

Graduate Informatics Intern/Volunteer

Created and updated a log and graph in Excel that tracked cases of stroke and tPA in patients for the Director of Performance Improvement.

Created a graph and chart of the number of medication errors in Southampton Hospital and analyzed the most common errors that occurred throughout the year.

Created a PowerPoint presentation for hospital MDs giving an overview of Meaningful Use Stages 1-3 and MACRA/MIPS and keeping them updated on the current requirements.

Created a database in Microsoft Access for the ongoing collection, aggregation, and reporting of information about patients transferred from Southampton Hospital to other hospitals.

Brookhaven National Laboratory, Brookhaven, NY 6/2014 – 8/2014

U.S. Department of Energy SULI Research Intern

Contributed to establishing an eco-city indicator system to measure cities' performance on sustainable development.

Collected and researched data for the eco-city indicator systems for more than 20 cities.

Created poster presentations with the help of colleagues and presented research findings and results of the sustainable city indicator system.


Educational Opportunity Program Student Association, Secretary and Activities Chair 9/2016 - 5/2017

Composed, typed, and distributed meeting agendas.

Maintained event scheduling, and planned and booked rooms for events.

Coordinated events with other student groups.

Student African American Brotherhood (SAAB), Member 9/2012 - 5/2017

Participated in group activities to cultivate diversity, awareness, mentoring, and self-discovery.

Worked with a team of 4-10 Stony Brook students to educate 20+ younger students about the college experience.

Recruitment membership committee member; worked in a team of 4-5 people to recruit students to become a part of SAAB to help them grow personally, academically, and professionally.

Educational Opportunity Program, High School Initiative 8/2012 - 5/2017

Educated high school students on an array of support programs aimed at economically disadvantaged students who have the potential to succeed in college but whose high school academic preparation has not fully prepared them to pursue a college education successfully.

Motivated high school students to attend college by discussing college life and the expectations of being a college student.
