Machine Learning Data Scientist

Location:

Cary, NC, 27513

Posted:

February 18, 2025

Resume:

DHRUV CHAUBEY

551-***-**** ****************@*****.*** linkedin.com/in/dhruv-chaubey/ github.com/dhruvchaubey

SUMMARY

Data Scientist with expertise in machine learning, statistical modeling, and data-driven decision-making. Proficient in Python, SQL, and cloud platforms (AWS, Azure), with experience in exploratory data analysis, feature selection, data visualization, and predictive modeling. Adept at building scalable, production-ready AI/ML systems, developing analytical solutions, and optimizing models for business strategies. Strong background in big data technologies, unstructured data processing, and deploying models into production environments.

EDUCATION

New Jersey Institute of Technology, New Jersey MS

Coursework: Big Data, Machine Learning, Deep Learning, Data Analysis with R, NLP, Data Engineering, Applied Statistics

Uttar Pradesh Technical University, India BS

PROFESSIONAL EXPERIENCE

Rebecca Everlene Trust Company Sep 2024 – Present

Data Analyst

• Extracted, cleaned, and analyzed large structured and unstructured datasets and deployed predictive models using linear regression, decision trees, and neural networks, enhancing decision-making efficiency.

• Automated high-volume data pipelines in SQL and Python, increasing data processing efficiency by 25%.

• Built interactive data visualizations in Tableau and collaborated with cross-functional teams to improve data integrity, model scalability, and production system reliability.

NeurotechR3, Inc. Feb 2024 – May 2024

Data Scientist

• Programmed and deployed a recommendation system model using TensorFlow, enhancing content personalization and increasing user engagement by 10%.

• Applied time-series forecasting to predict demand patterns, optimizing resource allocation and marketing spend, improving prediction accuracy by 5%.

• Integrated AWS SageMaker, Glue, and S3 for optimized ML deployment and CI/CD, reducing model latency by 20%, and implemented A/B testing frameworks to refine ad performance.

IBM Apr 2018 – May 2022

Software Engineer

• Developed scalable ML pipelines using PySpark and Airflow, reducing model training time by 35% for high-frequency financial risk assessment models.

• Applied clustering and anomaly-detection algorithms (DBSCAN, Isolation Forest) and autoencoders to detect anomalies in real-time financial transactions, integrating streaming analytics with Kafka and Spark, which reduced fraudulent activities by 12% through early detection and automated risk flagging.

• Engineered distributed machine learning systems, data pipelines, and production-ready analytics models using Python, Spark, and Kafka, and integrated them with cloud-based ML solutions.

• Collaborated with cross-functional teams, ensuring seamless AI integration into business workflows and enhancing model interpretability for stakeholders.

PROJECTS

Text Summarization & Query Generation Python, TensorFlow, OpenAI API, NLP Techniques

• Developed a text summarization system for a 10+GB dataset using OpenAI’s GPT models, reducing document review time by 40% while preserving critical information.

• Integrated Retrieval-Augmented Generation (RAG), leveraging vector databases and semantic search to retrieve relevant external knowledge and enrich model inputs; optimized retrieval pipelines and context-aware embeddings, enhancing response relevance and accuracy by 25%.

Demand Forecasting for Supply Chain Optimization Time Series, PyTorch, AI, SQL, Deep Learning, AWS

• Built a forecasting model for inventory optimization using LSTM and XGBoost, integrating AWS services (Lambda, DynamoDB, SageMaker) for real-time anomaly detection, reducing false positives by 15% and improving inventory planning accuracy by 30%.

SKILLS

Programming: Python, R, SQL, PySpark, TensorFlow, PyTorch, NumPy, Pandas, FastAPI

Machine Learning & AI: Predictive Modeling, Generative Models (GANs, VAEs), Neural Networks, NLP, Retrieval-Augmented Generation (RAG), Feature Engineering, Model Optimization, Unstructured Data Processing

Data Analysis: Exploratory Data Analysis (EDA), Data Science, Statistical Modeling, Business Insights, Tableau, Matplotlib, ggplot, Feature Selection, Linear Regression

Tools & Infrastructure: AWS, Azure, Tableau, Kafka, GraphQL, REST APIs, Data Pipelines, Data Warehousing, ETL, CI/CD, Model Deployment
