Ishani Bandyopadhyay
Data Scientist / ML Engineer
PROFESSIONAL SUMMARY
Experienced Machine Learning Engineer with 6+ years of expertise in designing, developing, and deploying machine learning models across industries. Proficient in building and optimizing large-scale ML systems, with a strong background in data science, software engineering, and statistical analysis. Skilled in applying modern machine learning frameworks and tools to solve complex business problems, and adept at working in cross-functional teams to deliver scalable, production-ready models that drive business outcomes. Proven record of deploying, consuming, and fine-tuning NLP models and LLMs such as Azure OpenAI, Llama 2/3, and Hugging Face models. Strong proficiency in Python, TensorFlow, PyTorch, and scikit-learn, with hands-on experience deploying AI solutions on cloud platforms such as Azure and AWS. Exceptional communication and leadership skills with a history of successful collaboration across cross-functional teams.
TECHNICAL SKILLS
Languages: Python, R, SQL, JavaScript
ML/AI: TensorFlow, Keras, Scikit-learn, Prophet, PySpark, Pandas, OpenCV
Databases: MySQL, SQL Server, PostgreSQL, MongoDB
Reporting Tools: Tableau, Power BI, Wavefront
Predictive and Machine Learning: Regression (Linear, Logistic, Bayesian, Polynomial, Ridge, Lasso), Classification (Logistic, Two/Multiclass, Boosted Decision Tree, Random Forest, Decision Tree, Naïve Bayes, Support Vector Machines, KNN, Neural Network), Clustering (K-means, Hierarchical), Anomaly Detection
Cloud: Google Cloud Platform, Azure, AWS
Cloud Resources: Azure Databricks, AWS Glue, GCP BigQuery, Cloud Composer, Dataflow
Frameworks: Flask, Django, Falcon
Tools: Jupyter, Git, Jira, Docker
Operating System: Linux, Windows, MacOS
EDUCATION
University of Texas at Dallas, USA 2017 – 2019
MS Information Technology and Management
Dhirubhai Ambani University, India 2012 – 2016
B.Tech in Information and Communication Technology
PROFESSIONAL EXPERIENCE
PayPal, Austin, TX Oct 2024 – May 2025
Data Scientist
Responsibilities:
• Identified business problems and management objectives through data analysis, proposing creative solutions and strategies to existing business challenges.
• Analyzed, manipulated, and processed large volumes of data using statistical software to discover trends, patterns, and insights via Jupyter, scikit-learn, and Tableau.
• Automated the data collection pipeline by identifying valuable data sources using ETL tools such as Apache NiFi and Apache Beam.
• Applied feature selection techniques such as ANOVA (analysis of variance) and decision trees in PySpark to hyper-tune model parameters and predict outcomes.
• Developed and maintained Tableau dashboards used by the Transaction Monitoring and FIU departments to report essential Anti-Money Laundering (AML) transactional metrics, improving workstream efficiency by 17%.
• Migrated PostgreSQL/SQL queries to Hive, increasing processing speed by 40% and reducing deployment time.
• Reviewed 120 AML data quality rules for consistency with the Business Requirement Documents (BRD) and Functional Requirement Documents (FRD), and made updates based on upstream rule change requirements.
• Handled large datasets during ingestion using partitions, Spark in-memory capabilities, broadcasts in Spark, and effective, efficient joins and transformations.
• Automated data governance for asset management data using Python scripts, reducing manual effort by 60%.
• Optimized SQL queries for AML transaction monitoring rules by automating data cleansing, extraction, and analysis processes, improving performance and data accuracy by 50%.
Environment: SQL, Python, JavaScript, TensorFlow, PyTorch, MongoDB, MySQL, Azure (Databricks)
Lucid Motors, Newark, CA Aug 2023 – Aug 2024
Machine Learning Engineer
Responsibilities:
• Developed chatbots that use Generative AI (GenAI) and Large Language Models (LLMs) to deliver personalized, context-aware responses, enhancing user engagement and satisfaction.
• Applied NLP techniques using transformer-based models for natural language understanding and generation, enhancing the chatbot’s ability to process and generate contextually accurate responses based on user input and domain-specific language.
• Developed and implemented custom web scraping pipelines using Python libraries to gather domain-specific data from websites and APIs, ensuring the chatbot model is continuously fed with high-quality, relevant data to enhance its understanding and response capabilities.
• Utilized Vector Databases to store and retrieve embeddings efficiently, facilitating quick and accurate similarity searches for improved user query handling.
• Fine-tuned LLMs using domain-specific datasets to optimize the chatbot’s performance, ensuring high accuracy and relevance in conversations.
• Integrated Hugging Face Transformers to leverage pre-trained LLMs, streamlining fine-tuning and deployment processes for domain-specific applications and enhancing response quality and model adaptability.
• Deployed the chatbot on AWS using Bedrock for scalable model training and inference, ensuring reliable and efficient performance under varying load conditions.
• Developed and deployed scalable RESTful APIs with FastAPI, enabling real-time communication between the chatbot and external systems for data retrieval and model inference.
• Designed and managed scalable infrastructure solutions using Amazon EKS for Kubernetes-based container orchestration, ensuring high availability and efficient resource utilization, and Git for CI/CD pipeline management.
• Coordinated cross-functional efforts to integrate domain-specific knowledge into Generative AI models, ensuring the system met both technical and business requirements.
Environment: Python, Tableau, Power BI, Machine Learning (Keras, PyTorch), Generative AI, Deep Learning, Natural Language Processing, Cognitive Search, Data Analysis (Pandas, NumPy), Vertex AI, Agile Methodologies, SCRUM Process, GCP, GitLab, Databricks, PySpark, BigQuery, Dataflow
The Walt Disney Company, Newark, WA, USA Jan 2022 – June 2023
Data Scientist
Responsibilities:
• Explored and analyzed customer-specific features, performing data imputation with the scikit-learn package in Python and implementing feature generation, feature normalization, and label encoding with Python (NumPy, SciPy, pandas) and R to develop a variety of models and algorithms for analytic purposes.
• Experimented with ensemble methods, using different bagging and boosting techniques to increase the accuracy of the training model, and deployed the model on AWS.
• Designed and implemented a recommendation system that leveraged Google Analytics data and machine learning models and utilized Collaborative filtering techniques to recommend courses for different customers.
• Designed rich data visualizations to model data into a human-readable form with Tableau and Matplotlib.
• Built supervised machine learning models that use real-time data analysis and customer transaction history to identify whether a user is legitimate and to prevent fraudulent transactions.
• Participated in all phases of data acquisition, data collection, data cleaning, model development, model validation, and visualization to deliver solutions.
• Performed data cleaning, including transforming variables and handling missing values, ensuring data quality, consistency, and integrity using Pandas and NumPy.
Environment: Python, R, Linux, Spark, Tableau Desktop, Microsoft Excel, MATLAB, Spark SQL
GEICO, Richardson, TX, USA Sep 2019 – Dec 2021
Machine Learning Engineer
Responsibilities:
• Collaborated closely with data engineering teams to streamline data ingestion workflows, reducing data processing time by 25%.
• Developed ML models in AWS SageMaker for fraud detection and risk scoring, reducing fraudulent claims by 12%.
• Led the implementation of containerized ML models using Docker and Kubernetes, ensuring scalability and efficient resource utilization across the organization.
• Worked with cross-functional teams to optimize and validate ML models, contributing to a 20% increase in overall model performance.
• Developed and maintained CI/CD pipelines for ML model deployment using Jenkins, reducing deployment time by 30%.
• Implemented edge computing solutions for ML inference, reducing latency by 35% and improving user experience in real-time applications.
• Conducted regular model reviews and A/B testing, leading to the successful deployment of models with a 95% accuracy rate.
• Ensured security and compliance in data handling processes, adhering to industry standards, thereby reducing potential vulnerabilities by 40%.
• Optimized resource allocation and cloud services, cutting infrastructure costs by 25% while maintaining performance.
• Documented best practices and processes for ML model deployment, contributing to a shared knowledge base that improved team efficiency by 15%.
• Optimized ELT workloads against the Hadoop file system by implementing Hive SQL for transformations and applying performance-tuning methodology to SQL, ETL mappings, and Hive tables.
• Utilized version control systems like Git and collaboration platforms like Jira to facilitate seamless collaboration with cross-functional teams and track project progress efficiently.
Environment: Hadoop, Jenkins, Docker, scikit-learn, Python, Tableau, Hive SQL, SQL, Microsoft Excel, Business Analysis, Feature Engineering, Git, Jira