Elie Niring
Contact: 331-***-****; Email: ***********@*****.***
DATA SCIENTIST | ML ENGINEER | GEN AI SCIENTIST
SUMMARY:
• Innovative Generative AI researcher and machine learning developer with over 12 years of experience applying deep learning, artificial intelligence, and statistical techniques to data science challenges, improving organizational insight, profitability, and market presence.
• Adept at crafting algorithms and deploying innovative solutions to complex business issues promptly and effectively; experienced in knowledge management systems and language ontologies.
• Proficient in executing solutions utilizing popular Generative AI and NLP frameworks and libraries in Python (Langchain, LlamaIndex, Hugging Face, NLTK, spaCy) or vector databases (Pinecone, FAISS). Familiar with the implementation of Neural Networks, Support Vector Machines (SVM), and Random Forest techniques.
• Keeps abreast of the latest advancements in data science, operations research, and Natural Language Processing to ensure the use of cutting-edge techniques, algorithms, and technologies.
• Possesses expertise in remote sensing; skilled in identifying and creating suitable algorithms to uncover patterns and validate findings through experimental and iterative methods.
• Strong interpersonal and analytical abilities, capable of multitasking and adapting in high-pressure environments; a creative problem solver with logical thinking skills and keen attention to detail.
TECHNICAL SKILLS:
IDEs: Jupyter, Google Colab, PyCharm, R Studio
Programming: Python, R, SQL, MATLAB
Python Libraries: TensorFlow, PyTorch, NLTK, NumPy, Pandas, OpenCV, Pillow (PIL), Scikit-Learn, SciPy, Matplotlib, Seaborn, Hugging Face
NLP: Sentiment Analysis, Sentiment Classification, Sequence-to-Sequence Models, Transformers, BERT, GPT-3.5
Analytical Methods: Exploratory Data Analysis, Statistical Analysis, Regression Analysis, Time Series Analysis, Survival Analysis, Sentiment Analysis, Principal Component Analysis, Decision Trees, Random Forest
Data Visualization: Matplotlib, Seaborn, Plotly, Folium
Computer Vision: Convolutional Neural Networks (CNN), Hourglass CNN, R-CNNs, YOLO, Generative Adversarial Networks (GAN)
Regression Models: Linear Regression, Logistic Regression, Gradient Boosting Regression, L1 (Lasso), L2 (Ridge)
Tree Algorithms: Decision Tree, Bagging, Random Forest, AdaBoost, Gradient Boosting, XGBoost; Hyperparameter Tuning: Random Search, Grid Search
Cloud Data Systems: AWS, GCP, Azure
PROFESSIONAL EXPERIENCE:
Baxter - Deerfield IL, Sep 2023 - Present
Senior Data Scientist Architect – ML-Ops
In this project, I leveraged Elasticsearch for RAG implementation, enhancing few-shot prompting for the Text2SQL engine. By deploying the application with FastAPI and Docker, I achieved reduced latency through caching and automated scaling via Kubernetes. Utilizing advanced Large Language Models and transformer architectures, I analyzed consumer sentiment from platforms like Yammer and Cultura, implementing classification algorithms to identify emerging trends. Additionally, I designed a CI/CD pipeline to streamline data processing and model deployment, contributing to a solution that personalizes Medicare plan recommendations during the annual enrollment period.
• Utilized Elasticsearch for RAG implementation, establishing it as a source for few-shot prompting to the Text2SQL engine.
• Decreased latency in the analytics engine application by applying caching techniques.
• Deployed the application using FastAPI and Docker containers, incorporating automated scaling via the Kubernetes framework.
• Achieved a 30% reduction in LLM inference costs by utilizing RouteLLM.
• Leveraged advanced Large Language Models (LLMs) and transformer-based architectures to analyze patterns and trends in consumer comments and posts from platforms like Yammer and Cultura, aiming to capture real-time insights into sentiment and emerging trends.
• Implemented classification algorithms to categorize consumer comments and posts into predefined topics, enabling the identification of trending content and a deeper understanding of consumer sentiment on specific subjects.
• Developed techniques to automatically generate new topics from Yammer and Cultura posts and comments, enriching the topic classification system with more relevant and dynamic categories.
• Seamlessly deployed and managed both processes through Azure Pipelines, ensuring automated, scalable, and efficient delivery.
• Played a pivotal role in a collaborative project, overseeing crucial stages such as developing an unsupervised outlier identification algorithm and implementing the CI/CD pipeline.
• Executed unsupervised outlier detection using a unique approach that involved five distinct outlier detection methods to label the dataset, aggregating findings to create an outlier_percent column for streamlined filtering.
• Developed robust and scalable data science solutions in Python, focusing on data preprocessing, feature engineering, and machine learning model development.
• Wrote efficient and maintainable code for data analysis, model training, and deployment, adhering to best practices in software development.
• Implemented and optimized machine learning algorithms in Python, utilizing popular libraries such as TensorFlow, PyTorch, Scikit-Learn, and Pandas.
• Leveraged Azure Cloud, Databricks, Jenkins, Docker, and Kubernetes to design, implement, and manage a robust CI/CD pipeline for automated data ingestion, processing, model training, and deployment.
• Developed and deployed scalable machine learning models using Azure Machine Learning, optimizing performance and cost-effectiveness for tasks such as classification, regression, and clustering.
• Hosted on Databricks, the pipeline seamlessly extracted raw data from Snowflake, conducted ETL operations, identified outliers, and uploaded processed data to Azure Blob Storage.
• Designed and deployed scalable machine learning solutions in Azure, utilizing services like Azure Machine Learning, Azure Databricks, and Azure Synapse Analytics.
• Developed and managed data pipelines using Azure Data Factory and Azure Data Lake to support data science workflows.
• Implemented and optimized cloud-based infrastructure for data storage, processing, and model deployment.
• Ensured the security and compliance of data science applications by configuring and managing Azure Identity and Access Management (IAM) roles, Key Vault, and encryption standards.
• Monitored and optimized the performance of machine learning models in production using Azure Monitor and Application Insights.
• Leveraged transformer-based architectures like GPT, BERT, and T5 to build and optimize models capable of understanding and generating human-like text.
• Designed and implemented NLP pipelines that preprocess, tokenize, and clean large text corpora for model training and inference.
• Created and managed complex SQL queries for extracting, transforming, and loading (ETL) large datasets from relational databases, data warehouses, and cloud platforms.
• Optimized SQL queries and database structures to enhance data retrieval and analytics processes.
• Conducted data validation, cleansing, and preparation using SQL to ensure data quality and integrity for machine learning models.
• Orchestrated Jenkins tasks for data preparation, merging dataset files into a unified CSV, and storing it in the workspace's datastore with versioning.
• Initiated model training tasks in Azure, executing code on a Databricks cluster and saving the output as a new model in AzureML.
• Successfully deployed models using an Embedded Architecture approach, integrating models with the app within a Docker image for efficient deployment.
• Implemented creational design patterns in the CI/CD pipeline for reusability and behavioral patterns in algorithms and integrations to enhance efficiency.
• Collaborated within a team structure led by a Data Science Manager, working alongside three Data Scientists to achieve project objectives.
• Utilized tools such as Snowflake, Jenkins, Azure Cloud, Docker, Databricks, PySpark, and Twistlock to streamline various aspects of the project.
• Adopted a canary deployment process, starting with limited access and gradually expanding to ensure smooth and controlled model deployment.
• Contributed significantly to the overarching project goal of developing a workflow for personalizing Medicare plan recommendations for members seeking new plans during annual enrollment.
• Leveraged LangChain and Azure OpenAI to build a full-fledged, scalable Generative AI application.
• Built a natural-language data analytics and Text2SQL engine using an agentic workflow in LangChain.
• Integrated Retrieval-Augmented Generation (RAG) with the natural-language data analytics and Text2SQL engine to answer organization-specific queries.
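The unsupervised outlier detection described above (five detectors voted into an outlier_percent column) can be sketched as follows. This is an illustrative sketch, not the production code: the resume states only that five distinct methods were aggregated, so the specific detectors chosen here (Isolation Forest, LOF, robust covariance, One-Class SVM, and a z-score rule) are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.covariance import EllipticEnvelope
from sklearn.svm import OneClassSVM

def flag_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Label each row with the percentage of detectors that flag it."""
    X = df.select_dtypes(include=np.number).to_numpy()
    detectors = {
        "iforest": IsolationForest(random_state=0),
        "lof": LocalOutlierFactor(),
        "robust_cov": EllipticEnvelope(random_state=0),
        "ocsvm": OneClassSVM(nu=0.05),
    }
    votes = pd.DataFrame(index=df.index)
    for name, det in detectors.items():
        # fit_predict returns -1 for outliers, +1 for inliers
        votes[name] = det.fit_predict(X) == -1
    # Fifth method (assumption): a simple z-score rule
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    votes["zscore"] = (z > 3).any(axis=1)
    out = df.copy()
    out["outlier_percent"] = votes.mean(axis=1) * 100
    return out
```

Rows can then be filtered on a single column, e.g. `df[df["outlier_percent"] >= 60]` keeps points flagged by a majority of detectors.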
Regions Financial Bank - New York, Mar 2021 - Aug 2023
Senior AI Scientist
As a Senior AI Scientist at Regions Financial Bank, I spearheaded the development of innovative AI solutions that enhanced customer engagement and operational efficiency. By deploying predictive algorithms using Keras and TensorFlow, I facilitated a 9% increase in repeat customers through an optimized recommendation system. I led cross-functional collaborations to ensure seamless integration of machine learning models, while continuously monitoring performance for sustained accuracy. My role also involved mentoring junior team members and conducting training sessions, fostering a culture of knowledge sharing and skill development in AI technologies.
• Created a script for deploying updated Docker images to EC2 instances, ensuring timely and efficient updates.
• Led discussions with Regional Bank representatives to review project progress, explore concepts, and resolve any blockers or issues.
• Utilized Keras and TensorFlow to develop predictive algorithms and address analytical challenges.
• Designed an NLP-based filter using embeddings and BERT in TensorFlow and Keras for advanced text analysis.
• Analyzed large datasets to identify patterns and insights, informing the development of data-driven solutions.
• Developed and maintained machine learning models and algorithms, ensuring high accuracy and reliability.
• Monitored model performance and implemented continuous improvement processes for AI applications.
• Provided technical guidance and mentorship to junior data scientists and AI specialists, fostering skill development and knowledge sharing.
• Conducted research on emerging AI technologies and methodologies to drive innovation within the organization.
• Conducted A/B testing that resulted in a 9% increase in repeat customers among users of the recommendation system.
• Investigated new AI techniques and methodologies, incorporating innovative solutions into existing projects.
• Developed and executed strategies for data collection, storage, and analysis, ensuring data quality and availability for machine learning applications.
• Oversaw the deployment of machine learning models into production environments, ensuring smooth integration with existing systems.
• Continuously monitored and tuned model performance, applying techniques such as hyperparameter tuning and feature selection to enhance predictive accuracy.
• Ensured that AI solutions adhered to ethical standards and regulatory requirements, promoting fairness, accountability, and transparency.
• Worked closely with business stakeholders to understand their needs, translating requirements into effective AI solutions.
• Maintained thorough documentation of algorithms, processes, and code to facilitate knowledge sharing and support reproducibility in AI projects.
• Conducted training sessions and workshops for staff to improve understanding and capabilities in AI technologies and tools.
• Rapidly prototyped new AI solutions and algorithms to validate concepts and demonstrate their feasibility to stakeholders.
• Defined and tracked key performance indicators (KPIs) for AI initiatives, using data to measure success and guide decision-making.
• Collaborated with data engineers, software developers, and other teams to ensure cohesive development and deployment of AI solutions.
• Analyzed customer feedback and behavior data to enhance AI models, tailoring solutions to meet specific user needs.
• Assisted in budgeting for AI projects, estimating resource needs and justifying expenditures based on projected returns.
• Conducted risk assessments related to AI implementation, identifying potential challenges and developing mitigation strategies.
• Participated in AI conferences, seminars, and workshops to stay updated on industry trends and contribute to the AI community.
• Developed a hybrid recommendation system by integrating collaborative filtering, content-based, and demographic techniques.
• Executed A/B testing to identify the most effective recommendation system, addressing the "cold start" challenge with a demographic-based approach.
• Employed Pandas and NumPy for data preprocessing, cleaning, and feature engineering, utilizing Python to manage missing values in the dataset.
• Applied text preprocessing techniques, such as stemming and lemmatization, to optimize the corpus for analysis.
• Implemented cloud-based data warehousing and analytics solutions on GCP, leveraging tools like Cloud Storage, Cloud SQL, and Looker for insights.
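The hybrid approach above, including the demographic fallback for the "cold start" challenge, can be sketched as follows. This is a minimal illustration under assumptions: a simple item-item collaborative filter for known users, and segment-level popularity for new users; the actual system combined more techniques than shown.

```python
import numpy as np
import pandas as pd

def recommend(user_id, ratings: pd.DataFrame, demographics: pd.DataFrame, k=3):
    """ratings: rows = users, columns = items (NaN = unrated);
    demographics: user_id -> 'segment' label."""
    if user_id in ratings.index and ratings.loc[user_id].notna().any():
        # Warm user: item-item cosine similarity (collaborative filtering)
        R = ratings.fillna(0).to_numpy()
        norms = np.linalg.norm(R, axis=0)
        sim = (R.T @ R) / np.outer(norms, norms).clip(min=1e-9)
        user_vec = ratings.loc[user_id].fillna(0).to_numpy()
        scores = sim @ user_vec
        seen = ratings.loc[user_id].notna().to_numpy()
        scores[seen] = -np.inf  # never re-recommend rated items
        order = np.argsort(scores)[::-1][:k]
        return list(ratings.columns[order])
    # Cold start: most popular items within the user's demographic segment
    segment = demographics.loc[user_id, "segment"]
    peers = demographics.index[demographics["segment"] == segment]
    peers = [p for p in peers if p in ratings.index]
    seg_pop = ratings.loc[peers].mean()
    return list(seg_pop.sort_values(ascending=False).index[:k])
```

A/B testing two such variants amounts to routing user cohorts to different `recommend` implementations and comparing downstream conversion.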
Publix - Lakeland FL, Jan 2019 - Feb 2021
ML-Ops Engineer
As an ML-Ops Engineer at Publix, I developed a customized in-session product recommendation engine to elevate the user experience. I automated text summarization and clustering tasks with Python and contributed to Next-Best Offer predictions, designing Microassortments for future store layouts. My role involved conducting anomaly detection and root cause analysis, merging consumer profiles for deeper insights, and implementing end-to-end solutions for both batch and real-time algorithms, ensuring efficient deployment and monitoring using tools like TensorFlow and Docker.
• Developed a tailored in-session product recommendation engine to enhance user experience.
• Created Python scripts to automate text summarization and clustering tasks.
• Contributed to Next-Best Offer predictions and designed Microassortments for next-generation stores.
• Conducted anomaly detection and root cause analysis to identify and resolve issues.
• Prepared data for integration with machine learning models, ensuring quality and relevance.
• Merged consumer profiles using probabilistic record linkage to create comprehensive customer insights.
• Led visual search initiatives to identify similar and complementary products effectively.
• Designed, built, maintained, and optimized a comprehensive suite of algorithms and their underlying systems.
• Analyzed large datasets, applied machine learning techniques, and developed predictive and statistical models, utilizing best-in-class methodologies for enhancements.
• Implemented end-to-end solutions for both batch and real-time algorithms, including monitoring, logging, automated testing, performance testing, and A/B testing.
• Collaborated closely with data scientists and analysts to roll out new product features on the e-commerce website, in-store portals, and mobile applications.
• Established scalable and efficient automated processes for data analysis, model development, validation, and implementation.
• Deployed solutions using TensorFlow, Keras, Docker, and Elastic Kubernetes Service for seamless integration.
• Executed strategies for monitoring model drift and implemented retraining procedures.
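Model drift monitoring of the kind mentioned above is often implemented with the Population Stability Index (PSI) over feature or score distributions; the resume does not name the metric, so PSI here is an assumption, and the thresholds in the docstring are the common rule of thumb rather than a project-specific setting.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time (expected) and production (actual) sample.
    Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift,
    > 0.25 significant drift (trigger retraining)."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Clip to avoid log(0) for empty bins
    e_pct = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_pct = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A scheduled job can compute PSI per feature against the training snapshot and kick off the retraining pipeline when any value crosses the chosen threshold.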
Dodge - Auburn Hills MI, Aug 2017 - Dec 2018
Machine Learning Engineer
Improved operational efficiency and product quality through data-driven solutions. Developed and deployed predictive models to optimize maintenance schedules, streamline supply chain logistics, and automate quality control processes. Leveraged advanced machine learning techniques to enhance vehicle performance and safety. Utilized data analytics to gain valuable insights into customer preferences and market trends, informing product development and marketing strategies.
• Developed machine learning models to detect anomalies in vehicle sensor data, identifying potential mechanical issues and reducing maintenance costs.
• Implemented predictive maintenance models to predict equipment failures and schedule preventive maintenance, optimizing vehicle uptime and reducing downtime costs.
• Utilized machine learning to optimize supply chain logistics, improving inventory management, reducing lead times, and minimizing transportation costs.
• Developed computer vision models to automate quality control inspections, identifying defects and inconsistencies in vehicle assembly.
• Analyzed customer reviews and social media data to identify trends, preferences, and potential issues, informing product development and marketing strategies.
• Utilized Pandas to meticulously clean and transform data, ensuring data quality and accuracy for subsequent analysis and modeling.
• Extracted key features from vehicle sensor data, historical maintenance records, and other relevant sources, identifying variables that contribute to predictive accuracy.
• Implemented advanced machine learning algorithms, including those from Scikit-learn and TensorFlow, to build predictive models for various automotive applications.
• Fine-tuned model performance through hyperparameter tuning using Scikit-learn and AWS SageMaker, maximizing accuracy and efficiency.
• Seamlessly integrated machine learning models into existing systems, enabling real-time decision-making and automated processes.
• Improved model transparency through SHAP and other interpretability techniques, enabling better understanding of model decisions and identifying potential biases.
• Monitored key performance metrics, such as accuracy, precision, recall, and F1-score, to assess model performance and identify areas for improvement.
• Established a feedback loop to continuously retrain and update models, ensuring their effectiveness over time.
• Created comprehensive documentation and collaborated with cross-functional teams to facilitate knowledge transfer and effective implementation of machine learning solutions.
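The interpretability work above mentions SHAP "and other interpretability techniques"; since SHAP requires the third-party `shap` package, the sketch below uses one of those other techniques, scikit-learn's permutation importance, to illustrate the same idea: measure how much held-out accuracy drops when each feature is shuffled. The synthetic dataset is purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Illustrative stand-in for vehicle sensor features
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# Accuracy drop when a feature is shuffled = that feature's importance
result = permutation_importance(model, X_te, y_te, n_repeats=10,
                                random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
```

Large importances point to the features driving predictions; near-zero or negative values flag features the model effectively ignores, which also helps surface potential biases.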
New York Life Insurance - New York City, NY, Nov 2015 - Jul 2017
Data Scientist
As a Data Scientist at New York Life Insurance, I developed customized product recommendations using advanced machine learning algorithms, focusing on Collaborative Filtering to enhance customer engagement and attract new clients. I led the design and deployment of various machine learning models, including logistic regression and neural networks, while innovating optimization algorithms for diverse applications. By conducting in-depth research on statistical techniques and leveraging tools like R and Tableau for data visualization, I gained valuable insights into customer behavior, ensuring high data integrity through meticulous cleaning and analysis.
• Developed tailored product recommendations by implementing sophisticated machine learning algorithms, focusing on Collaborative Filtering to meet the unique needs of current customers and attract new ones.
• Led the design and deployment of a variety of machine learning algorithms, utilizing techniques such as logistic regression, random forest, KNN, SVM, neural networks, linear regression, lasso regression, and k-means for comprehensive modeling.
• Innovated optimization algorithms specifically designed for data-driven models, broadening their use across multiple machine learning approaches, including supervised, unsupervised, and reinforcement learning.
• Conducted thorough research on statistical machine learning techniques, covering forecasting, supervised learning, classification, and Bayesian methods, to integrate cutting-edge methods into the modeling framework.
• Enhanced the technical complexity of solutions by incorporating machine learning and advanced technologies, leading to improved overall model performance.
• Performed exploratory data analysis and created impactful data visualizations using R and Tableau to deepen insights into underlying data patterns.
• Collaborated effectively with data engineers to implement the ETL process, playing a key role in optimizing SQL queries for efficient data extraction and merging from Oracle databases.
• Utilized a diverse skill set in R, Python, and Spark to develop a range of models and algorithms, addressing various analytical needs within the project.
• Maintained data integrity through thorough checks, effective data cleaning, exploratory analysis, and feature engineering, using both R and Python to uphold high data quality standards.
Cloudera - Santa Clara CA, Jul 2012 - Oct 2015
Data Scientist
As a Data Scientist with Cloudera, I spearheaded the development of a vectorization function to capture and embed facial features, significantly enhancing the representation of key facial attributes. I designed a custom algorithm for efficient storage and comparison of these features, optimizing the verification workflow, and deepened image analysis while implementing robust data cleaning and augmentation techniques. Finally, I ensured seamless deployment of the models through Flask and Pickle, making them easily accessible for practical use.
• Developed a vectorization function to capture and embed facial features, improving the representation of essential facial attributes.
• Created a custom algorithm for the efficient storage and comparison of vectorized features, optimizing the verification workflow.
• Implemented image analysis techniques using traditional machine learning algorithms like Support Vector Machines (SVMs) and Random Forests.
• Performed thorough data cleaning on both image and tabular datasets to ensure high data quality and accuracy.
• Employed image augmentation methods to introduce rotational, motion, and scale invariance for more resilient model training.
• Established statistical evaluation methods to assess and validate the performance of the models created.
• Managed deployment through Flask and Pickle, ensuring smooth integration and accessibility of the models.
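The augmentation for rotational, motion, and scale invariance mentioned above can be sketched with NumPy alone. This is a minimal, dependency-free illustration; the actual pipeline likely used an image library (e.g. OpenCV or Pillow) with richer transforms such as arbitrary-angle rotation and motion blur.

```python
import numpy as np

def augment(image: np.ndarray) -> list:
    """Generate simple variants of an HxW grayscale image to encourage
    rotational, translational (motion-like), and scale invariance."""
    variants = [image]
    # Rotational invariance: 90/180/270-degree rotations
    for k in (1, 2, 3):
        variants.append(np.rot90(image, k))
    # Motion-like invariance: a small horizontal shift
    variants.append(np.roll(image, shift=2, axis=1))
    # Scale invariance: 2x nearest-neighbour upsample, centre-cropped back
    up = image.repeat(2, axis=0).repeat(2, axis=1)
    h, w = image.shape
    top, left = h // 2, w // 2
    variants.append(up[top:top + h, left:left + w])
    return variants
```

Training on the expanded set rather than the originals alone makes the downstream verifier less sensitive to how a face happens to be framed.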
EDUCATION:
Master of Science in Information Technology
Carnegie Mellon University – 2012