
Data Scientist Machine Learning

Location:
Jersey City, NJ, 07306
Salary:
80000
Posted:
May 20, 2025


Resume:

Rasika Gulhane

***************@*****.*** +1-551-***-**** USA LinkedIn GitHub

Profile Summary

Data Scientist with 6+ years of experience in machine learning, deep learning, and statistical modeling, leveraging Python, R, SQL, and Big Data technologies (Hadoop, Spark, Hive) to develop scalable, data-driven solutions. Expertise in NLP, Computer Vision, Time Series Analysis, and deploying AI models using TensorFlow, PyTorch, Scikit-Learn, and XGBoost. Skilled in ETL, MLOps, and cloud platforms (AWS, Azure, GCP) with hands-on experience in Docker, Kubernetes, Airflow, and CI/CD pipelines. Strong background in A/B testing, hypothesis testing, feature engineering, and data visualization (Power BI, Tableau, Matplotlib, Seaborn) to drive business insights and process optimization. Experienced in Agile/Scrum environments, collaborating cross-functionally to build impactful AI solutions that enhance efficiency and profitability.

Skills

Languages and Databases: Python, Go, SQL, R, C/C++, MATLAB, Scala, MySQL, PostgreSQL, Oracle, Firebase, MongoDB, Redshift, XML, Unix, Shell, Bash, Neo4j, AstraDB, Pinecone, ChromaDB

Libraries/Packages: PyTorch, TensorFlow, Keras, NumPy, SciPy, Pandas, Regex, Scikit-Learn, XGBoost, OpenCV, NLTK, spaCy, Matplotlib, ggplot2, Seaborn, ResNet-50, LangChain, Llama, GPT, Claude

Methodologies and IDEs: SDLC, Agile, Scrum, Waterfall, Visual Studio Code, PyCharm, Colab

Tools: Power BI, Tableau, Microsoft Excel, Hadoop, Hive, MapReduce, Alteryx, Spark, Airflow, Kafka, Snowflake, MLflow, Docker, Kubernetes, Google Analytics, Git, GitHub, Jira, Jenkins, ETL, BigQuery, Databricks

ML Algorithms: Regression, Supervised Learning, Unsupervised Learning, Random Forest, Linear Regression, Decision Trees, Deep Learning, Clustering, Classification, Time Series, TensorFlow, Keras, NLP, GANs, OpenAI, LLMs, RNNs, CNNs

Other Skills: Data Cleaning, Data Wrangling, Data Warehousing, Data Visualization, Communication Skills, Presentation Skills, Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), A/B Testing, FastAPI, RESTful APIs

Professional Experience

Data Scientist, PwC 02/2024 – Present Remote, USA

Developed a personalized recommendation system using collaborative filtering, content-based filtering, autoencoders, LLMs, and NLP, resulting in a 15% sales increase and improved customer satisfaction.

Applied K-means clustering and machine learning techniques to segment customers, enhancing product relevance through personalized recommendations based on behavior patterns and preferences.
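A minimal sketch of this kind of K-means segmentation, assuming scaled behavioral features; the feature names and data below are hypothetical placeholders, not the production pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical behavioral features per customer (recency, frequency, monetary value).
rng = np.random.default_rng(42)
customers = pd.DataFrame({
    "recency_days": rng.integers(1, 365, size=500),
    "orders_per_year": rng.integers(1, 40, size=500),
    "avg_order_value": rng.normal(60, 20, size=500).round(2),
})

# Scale features so no single dimension dominates the distance metric.
X = StandardScaler().fit_transform(customers)

# k would normally be chosen via the elbow method or silhouette score.
customers["segment"] = KMeans(n_clusters=5, random_state=42, n_init=10).fit_predict(X)

# Per-segment profiles then drive which recommendations each group receives.
print(customers.groupby("segment").mean().round(2))
```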

Conducted end-to-end data preprocessing and ETL operations using Python, SQL, and Hadoop, including data cleaning, transformation, and database structuring, ensuring high-quality datasets for accurate model training and performance.

Created an autoencoder-based recommendation model with 92% accuracy using Python (Pandas, NumPy, Scikit-learn) and distributed processing with Hive and MapReduce.
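An illustrative sketch of an autoencoder recommender of this type, assuming a dense user-item interaction matrix; the matrix dimensions, data, and layer sizes are placeholders rather than the model described above.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Placeholder user-item interaction matrix (implicit feedback scores in [0, 1]).
n_users, n_items = 1000, 500
ratings = np.random.rand(n_users, n_items).astype("float32")

# Encoder compresses each user's interaction vector; decoder reconstructs it.
inputs = layers.Input(shape=(n_items,))
encoded = layers.Dense(64, activation="relu")(inputs)
decoded = layers.Dense(n_items, activation="sigmoid")(encoded)

autoencoder = Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(ratings, ratings, epochs=5, batch_size=64, validation_split=0.1, verbose=0)

# Reconstructed scores for items a user has not yet interacted with
# serve as recommendation candidates.
scores = autoencoder.predict(ratings[:1], verbose=0)
top_items = np.argsort(-scores[0])[:10]
print(top_items)
```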

Deployed ML pipelines and CI/CD workflows on AWS Cloud, leveraging SageMaker, Lambda, Redshift, Kafka, CloudWatch, and Prometheus for scalable operations and performance monitoring.

Collaborated in an Agile environment, integrated APIs, and continuously enhanced the system's accuracy and scalability, using Tableau for data visualization and reporting to align stakeholders on actionable insights.

Data Science Intern, Shoptaki 07/2023 – 09/2023 New York, USA

Implemented an AI-driven real estate recommendation system using BERT and Cosine Similarity, boosting accuracy by 15% and user engagement by 20%. Deployed on AWS SageMaker with REST APIs, reducing response times by 35%.
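A hedged sketch of embedding-plus-cosine-similarity matching of this kind, using a sentence-transformers BERT-family encoder as a stand-in; the model name, listing texts, and query are assumptions for illustration.

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Any BERT-family sentence encoder works here; this model name is an example.
model = SentenceTransformer("all-MiniLM-L6-v2")

listings = [
    "2BR condo near downtown, renovated kitchen, parking included",
    "Studio apartment, walkable to transit, pet friendly",
    "Suburban 4BR house with large backyard and garage",
]
query = "two bedroom apartment close to the city center with parking"

listing_emb = model.encode(listings)
query_emb = model.encode([query])

# Rank listings by semantic similarity to the user's query.
scores = cosine_similarity(query_emb, listing_emb)[0]
for score, text in sorted(zip(scores, listings), reverse=True):
    print(f"{score:.3f}  {text}")
```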

Developed a Risk Analysis ML model using Gaussian Naïve Bayes and Random Forest, reducing investment risk by 25% through data-driven insights.

Built an automated ETL pipeline using Apache Airflow, streamlining real-time data ingestion from multiple sources, optimizing processing time by 30%, and ensuring continuous delivery of actionable property insights.

Data Analyst, Flipkart Internet 05/2021 – 06/2022 Bangalore, India

Optimized ETL pipelines to process over 1M daily records from Flipkart’s FDP, Google BigQuery, and Adobe Omniture using SQL, Apache Spark, and Airflow, achieving a 60% efficiency gain and 40% latency reduction.

Developed real-time data pipelines using Kafka and Spark Streaming, enabling low-latency data ingestion and enhancing data availability for business intelligence and analytics teams.

Built and managed a cloud-based data warehouse on Google BigQuery and AWS Redshift, optimizing data storage, retrieval, and analytics for demand forecasting, pricing optimization, and customer insights.

Automated business intelligence dashboards with Python, SQL, Google Sheets, and App Script, improving reporting accuracy by 50% and reducing manual efforts by 80%.

Enhanced machine learning models, including a Random Forest-based customer churn prediction system, improving retention by 18%, and conducted A/B testing on recommendation strategies, increasing conversion rates by 12%.
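A small sketch of the significance check behind such an A/B test, assuming a two-proportion z-test on conversion counts; the visitor and conversion numbers are made up.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical conversion counts for the new recommendation strategy vs. control.
conversions = [620, 540]
visitors = [5000, 5000]

stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")

# A p-value below the chosen significance level (e.g. 0.05) supports rolling out the variant.
if p_value < 0.05:
    print("Difference in conversion rate is statistically significant.")
```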

Implemented association rule mining to analyze buying patterns, identifying high-impact product pairings and driving a 15% increase in upselling opportunities.
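An illustrative sketch of association-rule mining with the apriori algorithm via mlxtend; the transactions and thresholds are hypothetical examples.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Toy purchase baskets standing in for real transaction logs.
transactions = [
    ["phone", "case", "screen guard"],
    ["phone", "case"],
    ["laptop", "mouse"],
    ["phone", "screen guard"],
]

te = TransactionEncoder()
basket = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)

frequent = apriori(basket, min_support=0.3, use_colnames=True)
rules = association_rules(frequent, metric="lift", min_threshold=1.0)

# High-lift rules point to product pairings worth bundling or upselling together.
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```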

Maintained Tableau dashboards tracking 23+ KPIs, enabling data-driven decision-making for pricing, inventory forecasting, and sales analytics, contributing to 100% achievement of revenue targets during peak sales events.

Data Engineer, Disha IT and Systech 10/2018 – 05/2021 Pune, India

Designed a high-performance Kafka ETL pipeline streaming 20 KB/sec of real-time household power consumption data, ensuring 99.9% data integrity, and automated ingestion with Apache Spark and Airflow, reducing ETL processing time by 40%.

Optimized scalable data storage using MongoDB and SQL, enhancing query efficiency by 30%, reducing latency, and integrating data warehousing solutions like Snowflake and Redshift to support large-scale analytical workloads.

Deployed an ML-powered anomaly detection model leveraging Python (Pandas, NumPy, SciPy) and AWS CodePipeline for CI/CD automation, improving failure prediction accuracy by 20% and reducing operational downtime by 15%.

Built containerized FastAPI-based machine learning services, orchestrating deployments with Docker, Kubernetes, and AWS Elastic Container Registry (ECR), ensuring scalable and fault-tolerant cloud infrastructure on AWS EC2.

Led data governance, quality monitoring, and real-time analytics, automating data validation with Apache Airflow, and visualizing business intelligence insights via Tableau and Power BI, driving data-driven decision-making across teams.

Managed Agile workflows using JIRA, streamlining CI/CD pipelines, GitHub-based version control, and cross-functional collaboration, reducing issue resolution time by 25% and enhancing team efficiency in DevOps-driven data operations.

Associate Engineer, Wipro Ltd 06/2017 – 10/2018 Pune, India

Optimized SQL queries for Telstra's network tracking system, boosting transaction efficiency by 20% and data processing speed. Automated data validation and integrity checks, enhancing workflow efficiency by 17%.

Designed and implemented performance monitoring dashboards using Power BI/Tableau, providing real-time network usage analytics and reducing latency issues by 25%.

Managed Agile project workflows using JIRA and Confluence, streamlining network performance tracking and accelerating incident resolution time by 30% through proactive monitoring and issue tracking.

Optimized ETL pipelines and SQL performance tuning, ensuring seamless data integration, indexing, and query optimization, leading to improved database efficiency and enhanced data accessibility.

Collaborated with DevOps teams to integrate CI/CD pipelines, automating deployment processes and improving the reliability of network tracking applications.

Education

Master of Science in Data Science, Pace University 09/2022 – 05/2024 NY, USA

Bachelor of Technology in Computer Science, Amravati University 06/2013 – 08/2017 Pune, India

Projects

Medical Prescription Reading-Out June 2024

Developed an end-to-end LLM-based Text-to-Speech system for automated handwritten prescription interpretation.

Integrated OCR (AWS Textract, Google Vision) and NLP (spaCy drug NER, Gensim LDA) to enhance medicine identification and contextual insights.

Implemented GPT-4o with LangChain's Sequential Chain for contextual prescription understanding and Google TTS for structured audio output.
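A simplified sketch of the OCR-to-audio portion of such a flow, assuming AWS Textract for OCR and gTTS as a stand-in for Google TTS; the file names are placeholders and the NER/LLM steps are omitted.

```python
import boto3
from gtts import gTTS

# Textract client; requires valid AWS credentials in the environment.
textract = boto3.client("textract", region_name="us-east-1")

# Hypothetical scanned prescription image.
with open("prescription.jpg", "rb") as f:
    response = textract.detect_document_text(Document={"Bytes": f.read()})

# Collect the recognized text lines from the OCR response.
lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
text = " ".join(lines)

# Convert the extracted text into an audio read-out.
gTTS(text=text, lang="en").save("prescription_readout.mp3")
```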

AI/ML RAG-Based Sentiment Analysis Chatbot for Movie Rating and Reviews June 2024

Developed a Retrieval-Augmented Generation (RAG)-based chatbot to analyze movie reviews and provide sentiment summaries using ChromaDB and FAISS for efficient data retrieval.
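A minimal sketch of the retrieval step in such a RAG setup, using ChromaDB's default embedding function; the review texts and collection name are hypothetical, and the LLM call that summarizes the retrieved reviews is omitted.

```python
import chromadb

# In-memory Chroma client; a persistent client would be used in production.
client = chromadb.Client()
reviews = client.create_collection(name="movie_reviews")

reviews.add(
    ids=["r1", "r2", "r3"],
    documents=[
        "Gripping plot and excellent performances.",
        "Too long, but the soundtrack is fantastic.",
        "Weak script; I wouldn't watch it again.",
    ],
    metadatas=[{"movie": "Example Film"}] * 3,
)

# Retrieve the reviews most relevant to the user's question,
# then pass them to the LLM as context for a sentiment summary.
results = reviews.query(query_texts=["Is this movie worth watching?"], n_results=2)
print(results["documents"][0])
```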

Fine-tuned GPT-4 models to generate personalized, context-aware sentiment summaries of ratings and reviews with 89.8% accuracy.

Engineered a scalable pipeline to process large datasets from TMDB, ensuring real-time query handling with minimal latency.

Designed the chatbot’s UI using Streamlit, enabling intuitive user interaction and seamless query submissions.

Deployed the solution with AWS Bedrock, ensuring accessibility and high availability for end users.

Sensor-Based Wafer Fault Detection (ETL Data Pipeline)

Designed and implemented an ETL pipeline to extract, transform, and load large volumes of sensor data, ensuring seamless integration with the wafer fault detection model for real-time processing.

Developed a real-time wafer fault detection model using sensor-generated data, leveraging K-Means clustering to create 3 distinct clusters for improved fault classification.
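A sketch of the cluster-then-classify approach used in this project, assuming K-means groups the sensor profiles and one classifier is fit per cluster; the data and XGBoost parameters below are synthetic and purely illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from xgboost import XGBClassifier

# Synthetic stand-ins for per-wafer sensor readings and fault labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = rng.integers(0, 2, size=600)

# Group wafers into 3 clusters based on their sensor profiles.
clusters = KMeans(n_clusters=3, random_state=0, n_init=10).fit_predict(X)

# Train one fault classifier per cluster.
models = {}
for c in range(3):
    mask = clusters == c
    model = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    model.fit(X[mask], y[mask])
    models[c] = model

# At inference time, a new wafer is assigned to its nearest cluster
# and scored by that cluster's model.
```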

Optimized model performance by applying hyperparameter tuning to XGBoost and a TensorFlow neural network for each cluster, enhancing fault detection accuracy and improving the AUC score.

Awards/Honors

Gen AI Research and Document Reading Project (Capgemini Honorable Mention) June 2023


