
Data Analyst Science

Location:
Bloomington, IN
Posted:
July 04, 2024

Resume:

PAVAN SAI GANESH VEMULAPALLI

Open to Relocation • 302-***-**** • ***********@*****.*** • LinkedIn • GitHub

PROFESSIONAL SUMMARY

Data Analyst with over 3 years of professional experience managing and analyzing large datasets. Proven track record of developing complex queries, delivering insightful data reports, and collaborating with cross-functional teams. Strong background in data science and analytics, with a focus on improving data integrity and supporting clinical research initiatives.

EDUCATION

Indiana University, Bloomington, IN Aug 2022 – May 2024

Master of Science in Data Science, GPA: 4.0/4.0

TECHNICAL SKILLS

Programming: Python, SQL, SAS, R, PySpark

Databases: MySQL, PostgreSQL, Neo4j, MongoDB, Spark SQL, Google BigQuery

Libraries: Pandas, NumPy, Scikit-learn, Seaborn, Matplotlib, dplyr, Polars, Statsmodels, NLTK

Tools: Power BI, Tableau, Excel, Looker, VS Code

Cloud: AWS, Microsoft Azure, GCP

Certifications: AWS Certified Data Engineer - Associate, AWS Certified Solutions Architect - Associate, AWS Certified Cloud Practitioner

PROFESSIONAL EXPERIENCE

Research Assistant, Indiana University, IN Jan 2023 – Dec 2023

● Managed a 3-member cross-functional team in an NLP research project, leveraging journal publisher APIs and OCR techniques to process and analyze 30,000 management research articles, including converting image-based PDFs.

● Implemented customized Python preprocessing code for diverse APIs, reducing processing time by 40%.

● Trained an LSG-BERT-Base-Uncased-4096 model to refine contextual embeddings and devised domain-specific document vectors. Applied cosine similarity to classify journal papers, improving classification accuracy by 30%.

Senior Data Analyst, Effiya Technologies, Noida, India Apr 2022 – Jul 2022

● Led a team of four business analysts to fine-tune AML scenario thresholds using SAS across six of the bank's branches.

● Spearheaded ETL processes with SAS Data Integration Studio, ensuring seamless transition and data integrity.

● Examined the scenario logic of 25 Anti-Money Laundering scenarios using SAS to understand the relevant activities. Fine-tuned 3–4 parameter thresholds per AML scenario across 33 customer segments using SAS statistical analysis.

● Performed statistical analysis on bank customer transactions using SAS, SQL, and MS Excel, reducing false-positive alerts by 30%. Developed Power BI dashboards to monitor alerts across all AML scenarios.

Data Analyst, Effiya Technologies, Noida, India Jul 2020 – Mar 2022

● Orchestrated the implementation of SAS Viya to run machine learning models and clustered 150 features derived from customer transaction and demographic data. Segmented customers according to behavioral patterns identified by the models.

● Validated customer segments through SAS Viya visualizations, Tableau dashboards, and statistical analysis, analyzing absolute transaction values. Targeted efforts aligned with segment characteristics led to a 25% revenue increase.

● Engineered and trained a Python-based deep learning model, using transfer learning on the Mozilla DeepSpeech architecture, to transcribe Hinglish speech into English text. Achieved 67% accuracy in indoor Hinglish environments.

● Designed a Python model leveraging NLP and web scraping (BeautifulSoup, Scrapy) to fetch relevant documents and suggest related internet content based on user input, reducing manual search time by 70%.

PROJECTS

Social Media Content Analysis

● Built an ELT pipeline to analyze and visualize social media content, using the Reddit API and Apache Airflow for extraction, S3 for storage, and PostgreSQL for temporary storage and metadata management.

● Transformed and cataloged data using AWS Glue and Athena, optimizing for SQL-based analytics and data management.

● Designed a data warehousing solution with AWS Redshift and visualized insights using Tableau and Amazon QuickSight.

Aspect-Based Hotel Recommendation System using Sentiment Analysis

● Built a user-preference-based hotel recommendation system by implementing an ETL pipeline to process and analyze 515k hotel reviews, leveraging BERT for contextual word embeddings and spaCy for aspect word extraction.

● Applied K-means clustering for aspect generation, grouping hotels into categories like amenities, food, location, and price.

● Improved the system's accuracy in categorizing reviews by 20% and assigned confidence scores to over 1,200 hotels.

● Deployed a user-friendly JavaScript application for personalized hotel recommendations based on user preferences.

PUBLICATION

● P. S. Vemulapalli, A. K. Rachuri, H. Patel, and K. P. Upla, “Multi-object Detection in Night Time,” Asian Journal for Convergence in Technology, vol. 5, no. 3, pp. 1–7, Mar. 2020.