Data Analyst

Location: Tampa, FL

Salary: 90000

Posted: November 05, 2020


Resume:

Harika Gillela

Tampa, FL ***************@*****.*** 813-***-**** LinkedIn

Summary

Data enthusiast with knowledge of Big Data, Machine Learning, Deep Learning, NLP, Statistical Modeling, Streaming Analytics, and Data Visualization.

Worked extensively with transactional (OLTP) and analytical (OLAP) database systems, dimensional modeling, data migration, and ETL processes for data warehouses.

Cloudera-certified Hadoop and Spark Developer

AWS-certified Cloud Practitioner

Technical Skills

Big Data Ecosystem: Cloudera Stack, Hadoop, Apache Spark, MapReduce, Kafka, HBase, Hive, Pig, Impala, Sqoop, Oozie, Flume, Elasticsearch

Programming Languages: Python, Scala, R, SQL, Java

Databases: Oracle, MySQL, MSSQL Server, NoSQL (Cassandra, MongoDB)

Cloud: AWS (EC2, S3, Glue, Redshift, RDS, Lambda functions), GCP (BigQuery, GFS)

Data Visualization: Tableau, Power BI, Python

Tools: Jira, Docker, Git, SAS EM, Azure ML

Education

Master’s in Business Analytics and Information Systems (GPA 3.91/4) Expected Graduation: December ’20

University of South Florida, Tampa, FL

Bachelor of Technology, Computer Science Engineering (GPA 3.8/4) May ’13

Jawaharlal Nehru Institute of Technology, Hyderabad, India

Professional Experience

Accenture Solutions Pvt Ltd, India August 2013 – July 2019

Application Developer Senior Analyst January 2018 – July 2019

Developed Spark programs using PySpark to analyze, transform, and load large data sets efficiently into an HDFS cluster.

Worked extensively on data ingestion and data migration projects involving optimization techniques such as broadcast joins, cache/persist, and executor tuning.

Migrated tables from relational databases to HDFS using Sqoop import and applied compression techniques such as Gzip and Snappy.

Tuned Spark jobs by adding resources and executors to overcome out-of-memory issues on fast-moving, high-volume data.

Developed data quality checks in Spark (Scala) to identify data failures, resulting in error-free delivery to clients.

Worked with Hive: created tables, distributed data through partitioning and bucketing, and wrote and optimized Hive queries.

Used the Oozie scheduler to automate the pipeline workflow and orchestrate the MapReduce jobs that extract data in a timely manner.

Implemented complex functions and numerous joins to handle data transformations as per business rules.

Built supervised machine learning models (regression and classification) using Spark MLlib.

Analyzed and resolved long-running issues in daily data ingestion jobs and other complex Hive queries.

Designed, created, and deployed ETL/ELT jobs and pipelines using Talend Big Data edition, according to client requirements.

Developed Talend jobs to automate loading data from cloud servers and S3 buckets into the Hadoop file system (HDFS).

Application Data Analyst August 2013 – December 2017

Extracted data from SAP HANA DB, performed data transformations in Python and R, and created dashboards in Tableau.

Performed exploratory data analysis and feature engineering to identify the KPIs and relevant features that support the company’s performance and planning.

Built and published customized interactive reports and dashboards using Tableau Server for cost-saving insights.

Worked directly with clients to resolve ad hoc queries through data extraction and modeling using SQL.

Performed data validation, data quality analysis, troubleshooting, performance tuning, and regression testing to ensure data integrity and accuracy of reporting.

Analyzed and interpreted data reports, drawing conclusions and making recommendations that answer specific business needs.

Student Employment

Research Assistant, USF – Tampa, FL April 2020 – August 2020

Developed a model to forecast the completion date of a 5G fiber layout initiative for a leading mobile carrier.

Extracted data from an Oracle database and performed exploratory data analysis and feature engineering using Python.

Implemented a Random Forest algorithm with hyperparameter tuning after testing various machine learning algorithms for performance.

Automated model training using scikit-learn data pipelines and deployed the model on an AWS server.

Graduate Student Assistant, Center for Urban Transportation Research (CUTR), USF November 2019 – March 2020

Collected, cleaned, and analyzed field traffic data for Hillsborough County to identify the causes of crashes.

Preprocessed, cleaned, and categorized data using R and Python, and provided data-driven, actionable insights using Tableau.

Academic Projects

Loan Default Prediction March 2020 – May 2020

Analyzed a large Lending Club data set and built a loan default prediction model.

-Technical Skills: Databricks, PySpark, Spark MLlib, Spark SQL.

Revenue Management in Retail DB

Developed and deployed a utility on YARN to calculate the revenue for each day and each product.

This utility reads retail data from HDFS, processes it using Spark DataFrames, and saves the results back to HDFS and Hive tables.

-Technical Skills: Cloudera, PySpark, Spark SQL, Hive, HDFS, Linux.

Emotion Recognition from EEG Data: Predict patients’ emotions based on brain activity February 2020 – April 2020

Captured different brain activity spikes using sensors and cleaned data by employing statistics.

Implemented various Machine Learning models and created an RNN model to understand the implicit patterns and unfold the patient’s mental state.

-Technical Skills: LSTM Deep Learning, Random Forests, Neural Networks, Ensembles

Adult Obesity Prediction February 2020 – April 2020

Implemented a regression model in R to identify the factors that affect people’s health, specifically obesity, across the counties of the United States. Performed exploratory data analysis and visualized data in Tableau.

-Technical Skills: Regression Models, Multi-level & Panel Models, Endogeneity

Analysis of Drug Reviews September 2020 – November 2020

Analyzed drug reviews: processed review text and the useful counts of drugs to create a model that predicts the most relevant drug based on the patient’s condition.

-Technical Skills: Python, NLP, TensorFlow, Deep Learning

Advanced Database Management Systems: Event Management System October 2019 – November 2019

Designed a database that stores real-time data of musical events. The generated data is cleaned and then imported using SQL.

-Technical Skills: Stored procedures, Triggers, CTEs, Scalar UDFs, MSSQL Server, Microsoft SSMS
