Harshitha Sanikommu
404-***-**** | Boston, MA | ******************@*****.*** | linkedin.com/in/harshitha-sanikommu | https://github.com/SanikommuHarshitha
EDUCATION
Northeastern University, Boston, MA    Expected Aug 2020
Master of Science in Information Systems    GPA: 3.5/4.0
VR Siddhartha College, Vijayawada, India    May 2016
Bachelor of Technology in Information Technology    GPA: 8.45/10.0
Relevant Courses: Data Science, Big Data, Python, Relational Databases (DB), Machine Learning, Data Warehousing and Business Intelligence, Artificial Intelligence (AI), Natural Language Processing (NLP), Data Structures and Algorithms, R Programming, Cloud Computing, Data Mining
TECHNICAL SKILLS
Programming Languages: Python, R Programming, SAS, Java, Scala, JavaScript
Machine Learning Packages: NumPy, Pandas, Matplotlib, Scikit-learn, Keras, TensorFlow, NLTK, Selenium
Databases: MySQL, Oracle, PL/SQL, PostgreSQL, T-SQL, NoSQL, MongoDB, Cassandra
AWS Cloud Services: S3, Redshift, EC2, Lambda, Kubernetes, SageMaker, EKS
ETL Pipelines: Apache Beam, Apache Airflow, Metaflow (Netflix), Kafka, Dask, DAG
Data Visualization Tools (BI): Power BI, Tableau, ER Studio, Talend, Alteryx, Looker, PivotTables
Big Data: PySpark, Hadoop, MapReduce, Hive, HBase, Pig, Apache Spark, Oozie
Other Tools/Technologies: Jupyter Notebook, Docker, GitHub, Slack, Trello, Turnilo, PyCharm, A/B testing
PROFESSIONAL EXPERIENCE
Data Engineer Co-op July 2019 – June 2020
Shah Family Foundation, Boston, USA
• Built AWS data pipelines and the company’s MySQL server from the ground up, improving query performance by 92%
• Engineered a $30M-budget project that increased profit by 10% by improving the efficiency of Boston Public Schools
• Cut the time spent transferring and wrangling raw data by 92% with a custom-built, automated ETL application that prepares unruly data for machine learning models
• Used Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python for data storage and analysis
Data Scientist July 2016 – June 2018
EdgeVerve, Bangalore, India
• Conducted statistical data analysis using logistic regression, KNN, decision tree, and random forest classifiers, increasing accuracy (P-scores) by 30%
• Used PySpark to deliver statistical and analytical insights 40% faster, supporting effective strategic positioning
• Grew recurring business among land financial specialists by 25%
ACADEMIC PROJECTS
Dockerized Sentiment Analysis Model Using a Metaflow Data Pipeline    Feb 2020
• Designed ETL pipelines using Airflow and Apache Beam to scrape, preprocess, and label company earnings call data and store the retrieved data on S3
• Predicted sentence-level sentiment using Python, Metaflow (Netflix), NLP, Docker, and a Flask app
• Built a BERT model on GPU using NVIDIA CUDA, raising accuracy to 93%
• Incorporated the Amazon Comprehend API to label data for sentiment analysis
• Developed a Flask app that runs the TensorFlow model and created Docker containers for it
Quora Question Pairs (Python, TensorFlow, LSTM, Word2Vec, Log Loss)    March 2019
• Built a Long Short-Term Memory (LSTM) model that identifies duplicate questions on Quora
• Used Keras and TensorFlow to build the model, optimized for log loss, and increased accuracy by 73%
• Ranked 37th on Kaggle’s public leaderboard
Customer Relationship Management Using Hadoop    March 2019
• Led a team of 4 to classify customers as valued or not valued using MapReduce jobs
• Triggered and monitored workflows in Oozie using Linux commands and developed HiveQL queries