Yamini Anusha Ravuri
Data Scientist/Data Analyst
Email: ******.**********@*****.***
LinkedIn: www.linkedin.com/in/yamini-ravuri
Contact: +1-361-***-****
Summary:
Data Scientist and Analyst with over 3 years of experience in predictive modeling, machine learning, and large-scale data processing. Expertise in deploying advanced algorithms such as XGBoost, Random Forest, and Generative AI to deliver actionable insights and optimize workflows. Proficient in Python, Spark, Tableau, and AWS services, with a proven record of improving efficiency by 35% and reducing analysis time. Skilled in NLP, real-time feedback systems, and data visualization, delivering measurable improvements in accuracy, user engagement, and operational efficiency.
Technical Skills:
Programming Languages & Python Libraries:
Python, R, MATLAB, C++, Java, NumPy, SciPy, scikit-learn, PySpark, Pandas, Matplotlib, NLTK, BeautifulSoup and PyUnit
Big Data, Data Visualization & BI:
Hadoop, MapReduce, Spark, Kafka, Hive, NoSQL databases, MongoDB, Alteryx, Tableau, Power BI, Matplotlib and Seaborn
Development Environments & Software Deployment:
Anaconda, Eclipse, Jupyter Notebook, JupyterLab, Google Colab, Git, Jira, Docker containers and Kubernetes
Machine Learning:
Linear Regression, NLP, KNN, Clustering Analysis and Recommendation Systems, Sentiment Analysis, PyTorch, Random Forests and Decision Trees, TensorFlow, Reinforcement Learning, Computer Vision, Generative AI and Deep Learning
Cloud Computing Services:
AWS SageMaker, S3, EC2, Lambda, EMR, Databricks, GCP, AWS RDS and Redshift, Azure
Work Experience:
Edward Jones, Dallas, TX Nov 2024 – Present
Data Analyst/Data Scientist
Responsibilities:
• Designed and implemented advanced Factor and Cluster Analysis frameworks using Python (SciPy), enhancing customer segmentation for 15+ product lines based on purchasing behavior.
• Led A/B testing for a new recommendation engine, driving a 20% monthly increase in product upsell opportunities.
• Optimized large-scale data processing and model training using Databricks and Spark, shortening analysis time by 50% through automation of feature selection and missing value handling.
• Developed NLP solutions with Generative AI and Large Language Models (LLMs), achieving a 45% improvement in customer sentiment analysis accuracy.
• Enhanced text generation capabilities using LLMs, implementing real-time feedback loops that boosted user engagement by over 50%.
• Accelerated large dataset processing by 30+ hours weekly using machine learning algorithms like XGBoost and Random Forest.
Dynamic Healthcare Systems, CA Feb 2024 – Oct 2024
Data Analyst
Responsibilities:
• Analyzed and transformed large datasets with pandas and PySpark, executing data pre-processing techniques that enhanced the quality of insights generated for over 10 key research projects each month.
• Implemented robust machine learning frameworks using Python’s top-tier libraries like Scikit-learn which improved model accuracy rates by at least 15%, driving more reliable forecasting outcomes within research projects.
• Streamlined the integration of unstructured and structured datasets from diverse sources, automating the data cleaning process through Python scripts; achieved a 50% reduction in manual data preparation time.
• Visualized complex datasets using Matplotlib, Bokeh, and Plotly, transforming raw data into actionable intelligence for risk assessments and improving prediction accuracy by 15% within six months.
• Created data pipelines within Databricks, processing 1+ million daily transactions with a 99.99% data accuracy rate.
Sixbase Technologies, Hyderabad, India Jul 2022 - Aug 2023
Python Developer
Responsibilities:
• Built backend services and APIs using frameworks such as Django, Flask, and FastAPI, and integrated third-party APIs and services to enhance application functionality.
• Optimized database schemas for relational (PostgreSQL, MySQL) and non-relational databases (MongoDB), and wrote complex SQL queries and stored procedures for efficient data retrieval and manipulation.
• Developed scripts and tools for data extraction, transformation, and loading (ETL) using libraries such as Pandas, NumPy, and PySpark to process and analyze large datasets; debugged and resolved issues in existing applications, ensuring optimal performance and reliability.
• Deployed applications to production environments using Docker, Kubernetes, and CI/CD pipelines, achieving 99.9% uptime and reducing deployment time from 4 hours to under 30 minutes.
• Hosted applications on cloud platforms such as AWS, Azure, and Google Cloud Platform (GCP), utilizing services such as AWS Lambda, S3, and EC2 to build scalable and cost-effective solutions.
Indian Institute of Science, Hyderabad, India Apr 2021 - Jun 2022
Algorithms / Python Developer
Responsibilities:
• Architected robust database connections using SQL integration, enabling faster data retrieval across three major web applications.
• Implemented scalable algorithms to enhance data processing capabilities across applications, collaborating with cross-functional teams to resolve the three major computation bottlenecks impacting performance.
• Optimized algorithms for performance, focusing on time and space complexity, which led to a 40% reduction in server costs and improved application responsiveness for end users.
• Engineered advanced algorithms for sorting, searching, and pattern matching, improving processing speed by 30%.
• Standardized algorithm performance metrics across the organization, creating a unified framework that enabled a 40% faster identification of underperforming algorithms, directly improving business outcomes.
Education:
• Bachelor's in Information Technology, JNTU, Apr 2021
• Master of Science in Computer Science, Texas A&M University – Kingsville, Dec 2024