Data Scientist Machine Learning

Location: Chicago, IL
Salary: 200000
Posted: April 18, 2024


Resume:

ALI KONE

Data Scientist | ML Operations | ML Engineer

Contact: 425-***-**** Email: ad43z2@r.postjobfree.com

KEY SKILLS

Data Analysis and Visualization, Statistical Modeling and Analysis, Data Mining and Extraction, Risk Modeling and Forecasting, Python, Data Insights, Machine Learning, Deep Learning, Data Preprocessing, Business Intelligence, Project Management, Stakeholder Management, Team Management

SUMMARY

Analytically minded Data Scientist with 16 years of comprehensive experience employing advanced mathematical, statistical, and machine learning techniques to distil meaningful business insights, driving innovation in productivity, efficiency, and revenue growth across diverse industries.

Expertise in Cloud Services: AWS, Google Cloud, and Azure

Proficient in querying and working with large datasets from various big data stores including Amazon AWS, Cassandra, Redshift, Aurora, GCP BigQuery

Specialized in ensemble algorithm techniques (Bagging, Boosting, Stacking), Natural Language Processing (NLP) methods (BERT, ELMo, word2vec), sentiment analysis, Named Entity Recognition, and Topic Modeling

Accomplished in Time Series Analysis using ARIMA, SARIMA, LSTM, RNN, and Prophet

Adept at managing the entire data science project life cycle, from data extraction and cleaning to statistical modeling and data visualization with large datasets of structured and unstructured data

Demonstrated excellence in using Python and R packages such as Pandas, NumPy, SciPy, Matplotlib, Seaborn, TensorFlow, Scikit-Learn, ggplot2, and Shiny.

Skilled in statistical analysis programming languages, including R and Python, with expertise in Big Data technologies (Spark, Hadoop, Hive, HDFS, MapReduce, Databricks).

Applied Naïve Bayes, Regression, and Classification techniques, as well as Neural Networks, Deep Neural Networks, Decision Trees, and Random Forests.

Conducted exploratory data analysis (EDA) to uncover patterns in business data, effectively communicating findings using visualization tools such as Matplotlib, Seaborn, and Plotly.

Hands-on experience with PySpark for live data stream processing and batch processing techniques.

Proven record of leading teams to productionize statistical and machine learning models, creating APIs, and developing data pipelines for business leaders and product managers.

Exceptional communication and interpersonal skills, adept at gathering requirements, defining business processes, and identifying risks through interviews, workshops, and analysis tools.

Proficient in creating visualizations, interactive dashboards, reports, and data stories using Tableau and Power BI.

Extensive hands-on experience with Llama 2 and OpenAI models, including Davinci and GPT-3, 3.5, and 4

A quick learner with an intuitive approach, possessing a blend of analytical prowess, leadership, and effective communication to deliver impactful data-driven solutions

TECHNICAL SKILLS

Analytic Development - Python, R, Spark, SQL

Python Packages - NumPy, Pandas, Scikit-learn, TensorFlow, Keras, PyTorch, FastAI, SciPy, Matplotlib, Seaborn, Numba

Machine Learning - Classification and Regression Trees (CART), Support Vector Machine, Random Forest, Gradient Boosting Machine (GBM), TensorFlow, PCA, Regression, Naïve Bayes

Natural Language Processing - Text analysis, classification, chatbots.

Deep Learning - Machine Perception, Data Mining, Machine Learning, Neural Networks, TensorFlow, Keras, PyTorch, Transfer Learning

Programming Tools and Skills - Jupyter, RStudio, Github, Git, APIs, C++, Eclipse, Java, Linux, C#, Docker, Node.js, React.js, Spring, XML, Kubernetes, Back-End, Databases, Bootstrap, Django, Flask, CSS, Express.js, Front-End, HTML, MS Azure, AWS, GCP, Azure Databricks, AWS Sagemaker

Data Modeling - Bayesian Analysis, Statistical Inference, Predictive Modeling, Stochastic Modeling, Linear Modeling, Behavioural Modeling, Probabilistic Modeling, Time-Series analysis

Artificial Intelligence - Natural Language Processing and Understanding, Machine Intelligence, Machine Learning algorithms.

Generative AI – OpenAI (Ada-002, Davinci, GPT-3, 3.5, 3.5 Turbo, 4), AWS Bedrock, Anthropic Claude, RAG, Prompt Engineering.

Analysis Methods - Forecasting, Multivariate Analysis, Sampling Methods, Clustering, Predictive, Statistical, Sentiment, Exploratory, and Bayesian Analysis, Regression Analysis, Linear Models

PROFESSIONAL EXPERIENCE

Lead AI Engineer

Numerator, Chicago, Illinois Mar 2022 - Present

Numerator is a data and tech company reinventing the market research industry with first-party, consumer-sourced data. Numerator is a primary source of real-time path and purchase data for industry leaders such as Nike, Unilever, Samsung, Procter & Gamble, and MillerCoors. In this role I was tasked with developing, architecting, and implementing a solution to apply generative AI techniques to our Knowledge Management System. After developing a successful RAG proof of concept, I led a team tasked with implementing a Retrieval-Augmented Generation system using a combination of technologies including LangChain, OpenAI's embeddings, and GPT-3.5 Turbo for LLM chat completions. The system was implemented with Pinecone as the vector database and deployed on Docker and Kubernetes as an API microservice.
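The core retrieval-and-generation flow described above can be sketched roughly as follows. This is a minimal illustration, not the production code: it assumes the openai (v1+) and pinecone (v3+) Python clients, and the index name, API key, chunk sizes, and prompts are hypothetical placeholders.

```python
# Minimal RAG sketch: chunk -> embed -> upsert -> retrieve -> generate.
# Assumes openai>=1.0 and the pinecone v3 client; names and parameters
# below are illustrative placeholders, not production values.
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI()                      # reads OPENAI_API_KEY from the environment
pc = Pinecone(api_key="YOUR_PINECONE_KEY")    # placeholder key
index = pc.Index("kms-docs")                  # hypothetical index name

EMBED_MODEL = "text-embedding-ada-002"
CHAT_MODEL = "gpt-3.5-turbo"

def chunk(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Naive fixed-size chunking with overlap (LangChain's text splitters do this more carefully)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def embed(texts: list[str]) -> list[list[float]]:
    """Embed a batch of chunks with the Ada-002 embedding model."""
    resp = openai_client.embeddings.create(model=EMBED_MODEL, input=texts)
    return [d.embedding for d in resp.data]

def ingest(doc_id: str, text: str) -> None:
    """Split a document, embed the chunks, and upsert them with provenance metadata."""
    chunks = chunk(text)
    vectors = embed(chunks)
    index.upsert(vectors=[
        (f"{doc_id}-{i}", vec, {"doc_id": doc_id, "chunk": i, "text": c})
        for i, (c, vec) in enumerate(zip(chunks, vectors))
    ])

def answer(question: str, top_k: int = 4) -> str:
    """Retrieve the most similar chunks and ask the chat model to answer from them."""
    q_vec = embed([question])[0]
    hits = index.query(vector=q_vec, top_k=top_k, include_metadata=True)
    # Attribute access matches the pinecone v3 client; adjust for other versions.
    context = "\n\n".join(m.metadata["text"] for m in hits.matches)
    messages = [
        {"role": "system",
         "content": "Answer using only the provided context. Say so if the context is insufficient."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]
    resp = openai_client.chat.completions.create(model=CHAT_MODEL, messages=messages)
    return resp.choices[0].message.content
```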

Participated in and led Agile ceremonies, including daily standups and weekly scrum meetings. As technical lead, I was responsible for gathering requirements and clarifying KPIs.

Developed a custom document ingestion algorithm to convert, split, and prepare documents for embedding and upsert to the vector database.

Utilized LangChain and OpenAI's Ada-002 to split PDF, HTML, and text documents into chunks and embed them.

Created a custom chunk provenance scheme to incorporate into the vector DB metadata.

Wrote a context retrieval class using Python and Pinecone.

Implemented prompt engineering to generate system and user prompts for the LLM.

Created LLM completion functions incorporating the retrieved context.

Evaluated LLM responses using BLEU score, perplexity, and diversity metrics, as sketched after this role's bullets.

Evaluated RAG effectiveness by testing relevance and veracity.

Deployed the complete system through a CI/CD pipeline built with Jenkins and Docker.

Built a microservice using Flask and Gunicorn.

Presented and shared findings and results with stakeholders and potential users.
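A hedged sketch of the response-evaluation step mentioned above, assuming NLTK for BLEU; the distinct-n ratio is a common stand-in for diversity, and all inputs here are illustrative rather than the actual evaluation harness.

```python
# Corpus-level BLEU against reference answers (via NLTK) plus a distinct-n
# diversity ratio. Reference/candidate pairs are illustrative placeholders.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def bleu_score(references: list[str], candidates: list[str]) -> float:
    """Corpus BLEU with smoothing; one reference per candidate."""
    refs = [[r.split()] for r in references]      # each hypothesis gets a list of references
    hyps = [c.split() for c in candidates]
    return corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)

def distinct_n(candidates: list[str], n: int = 2) -> float:
    """Share of unique n-grams across all generated responses (higher = more diverse)."""
    ngrams = [
        tuple(tokens[i:i + n])
        for tokens in (c.split() for c in candidates)
        for i in range(len(tokens) - n + 1)
    ]
    return len(set(ngrams)) / max(len(ngrams), 1)
```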

Senior Data Scientist

Moss Adams LLP, Seattle, WA Feb 2020 - Feb 2022

Moss Adams is a professional services firm that provides integrated solutions to help clients build, manage, and safeguard their wealth. My role involved developing an automated system that leverages machine learning and AI techniques to identify potential financial irregularities or anomalies. After multiple meetings with stakeholders and SMEs, we established that simple outlier detection would not be sufficient. I developed several algorithms, including isolation forest, XGBoost, and finally deep-learning-based autoencoders, to flag anomalous transactions in client records. The solution was deployed as a batch process, containerized, and executed using Apache Airflow as the orchestrator and EC2 instances as compute.
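The anomaly-flagging approaches described above can be outlined roughly as follows. This is an illustrative sketch rather than the deployed solution: it assumes scikit-learn and TensorFlow/Keras, and the feature names, contamination rate, and error threshold are placeholders.

```python
# Transaction anomaly flagging: an Isolation Forest and a small autoencoder
# scored by reconstruction error. Features and thresholds are illustrative.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

def flag_with_isolation_forest(df: pd.DataFrame, features: list[str]) -> pd.Series:
    """Return a boolean flag per transaction (True = anomalous)."""
    X = StandardScaler().fit_transform(df[features])
    iso = IsolationForest(n_estimators=200, contamination=0.01, random_state=42)
    return pd.Series(iso.fit_predict(X) == -1, index=df.index, name="iforest_flag")

def flag_with_autoencoder(df: pd.DataFrame, features: list[str],
                          quantile: float = 0.99) -> pd.Series:
    """Train a small dense autoencoder and flag rows whose reconstruction error
    falls above the chosen quantile."""
    X = StandardScaler().fit_transform(df[features]).astype("float32")
    n = X.shape[1]
    model = keras.Sequential([
        keras.layers.Input(shape=(n,)),
        keras.layers.Dense(max(n // 2, 2), activation="relu"),
        keras.layers.Dense(max(n // 4, 2), activation="relu"),   # bottleneck
        keras.layers.Dense(max(n // 2, 2), activation="relu"),
        keras.layers.Dense(n, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(X, X, epochs=20, batch_size=256, verbose=0)
    errors = np.mean((X - model.predict(X, verbose=0)) ** 2, axis=1)
    return pd.Series(errors > np.quantile(errors, quantile),
                     index=df.index, name="autoencoder_flag")
```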

Collaborated with key stakeholders to understand audit objectives and requirements.

Developed a strategic plan for the integration of advanced analytics, statistical modelling, and machine learning into the audit testing strategy

Utilized Python and R along with Pandas, NumPy, and SciPy for comprehensive data analysis

Extracted and normalized data from multiple systems to construct datasets suitable for in-depth analysis

Applied statistical modelling techniques such as Regression Trees and Kernel PCA to identify patterns and trends

Developed models that contribute to risk assessment and control gap identification

Leveraged TensorFlow and Scikit-Learn to implement machine learning algorithms for risk assessment

Developed and deployed models that enhance the overall effectiveness of the audit process

Utilized Tableau to create automated dashboards and reports to present data-driven insights

Ensured visualization techniques are effective in communicating complex findings to stakeholders

Worked with Hadoop and HDFS to handle large volumes of data and enable scalability

Implemented solutions that align with big data technologies for efficient data processing

Used AWS, Azure, and Google Cloud for cloud-based data storage, processing, and analysis

Leveraged the capabilities of cloud platforms to enhance the scalability and accessibility of data

Extracted data from SQL Server and Cassandra databases using SQL queries for analysis

Ensured seamless integration with various database systems for a comprehensive data approach

Employed Git for version control to manage codebase efficiently

Collaborated with cross-functional teams using collaboration tools such as Jira for effective project management

Designed and implemented continuous monitoring frameworks to identify anomalies and trends

Conducted thorough testing and validation of machine learning models to ensure accuracy

Documented data analysis procedures, machine learning models, and visualization techniques

Provided comprehensive documentation for reproducibility and knowledge transfer

Sr. Data Scientist/ ML Engineer

BMO Financial Group, Chicago, IL Aug 2018 – Jan 2020

BMO is a prominent North American bank with a singular mission: to boldly foster growth in both business and life. As a Senior Data Scientist spearheading machine learning model development for wholesale credit risk, I oversaw the entire lifecycle of building robust models. This included establishing data pipelines, defining success metrics, and ensuring our models were reliable and effective in assessing credit risk for our wholesale lending operations.
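As a rough illustration of the kind of credit-risk modeling and validation workflow this role involved (not BMO's actual model), the sketch below fits a gradient-boosting classifier and scores it with stratified cross-validated AUC; the feature and label names are hypothetical.

```python
# Illustrative credit-default classifier with stratified k-fold validation.
# Feature and label names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

FEATURES = ["leverage_ratio", "interest_coverage", "utilization", "days_past_due"]
LABEL = "defaulted_within_12m"

def evaluate_credit_model(df: pd.DataFrame) -> float:
    """Return mean ROC AUC across 5 stratified folds."""
    X, y = df[FEATURES], df[LABEL]
    model = Pipeline([
        ("scale", StandardScaler()),
        ("gbm", GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                           max_depth=3, random_state=0)),
    ])
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    return scores.mean()
```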

Utilized TensorFlow, Scikit-Learn, and PyTorch to develop and implement machine-learning models for assessing wholesale credit risk

Employed various algorithms, including deep learning techniques, to enhance the predictive capabilities of the models

Designed and managed data processing pipelines using Apache Spark and Apache Airflow

Ensured seamless integration and transformation of data for use in machine learning models

Leveraged Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn for statistical analysis and visualization of credit risk data

Created insightful visualizations to aid in understanding patterns and trends in the data

Deployed machine learning models using Docker and Kubernetes for scalability and containerization

Implemented monitoring solutions, such as ModelDB, to track model performance and detect anomalies

Utilized Git for version control, ensuring the integrity and traceability of model code and associated artifacts

Collaborated efficiently with cross-functional teams using tools like Jira and Confluence

Documented and shared key insights, findings, and progress with stakeholders

Performed feature engineering to enhance the quality of input features for machine learning models

Applied preprocessing techniques to clean and normalize data for optimal model performance

Conducted rigorous testing and validation of machine learning models to ensure accuracy and reliability.

Implemented cross-validation strategies and other testing methodologies

Worked in alignment with regulatory requirements, ensuring that machine learning models adhered to industry standards and compliance mandates

Stayed abreast of the latest developments in machine learning and credit risk modeling

Continuously explored opportunities to enhance models and methodologies

Generated comprehensive reports and documentation for model validation and regulatory purposes

Sr. NLP Engineer

Merck & Co. Inc., Rahway, NJ Dec 2015 - July 2018

Merck & Co., Inc. is a leading American multinational pharmaceutical corporation, headquartered in Rahway, New Jersey. In my role as a dedicated Senior NLP Engineer at Merck, I spearheaded the development of cutting-edge automated pipelines for literature search and biological sequence analysis. Together, formulated scalable methods for automated literature analysis, contributing significantly to advancements in this domain. Additionally, contributed to the creation of predictive DNA and protein language models, showcasing proficiency in various sequence-based prediction methods.

Extracted and validated data from the production SQL database for third-party data integration

Managed extensive datasets, exceeding 10 million text data observations, employing various cleaning techniques

Integrated seamlessly with the AWS platform, optimizing models and conducting hyperparameter tuning.

Utilized cloud computing resources for efficient cross-validation of statistical data science models

Developed diverse machine learning models (logistic regression, random forest, gradient boost decision tree, neural networks) using Python libraries: Pandas, NumPy, Seaborn, Matplotlib, and Scikit-learn

Built and analysed datasets using Python and R, applying linear regression in Python and SAS to understand relationships in the data

Conducted exploratory data analysis (EDA) using techniques such as bag-of-words, K-means, and DBSCAN

Utilized Git for version control on GitHub, fostering collaboration with team members

Explored various embedders (Google's Universal Sentence Encoder, Doc2Vec, TF-IDF, BERT, ELMo) to identify optimal solutions

Developed predictive models for Key Performance Indicators (KPIs), creating ready-to-use templates based on specifications

Prepared insightful reports and presentations using Tableau, MS Office, and ggplot2 to communicate data trends and analyses

Worked with data warehouse architecture, crafting SQL queries for extracting insights from the available data

Data Scientist

PwC, Albany, New York Sep 2012 - Nov 2015

PwC is a renowned multinational professional services network, ranking as the second-largest in the world. Alongside Deloitte, EY, and KPMG, PwC is considered one of the 'Big Four' accounting firms. During my tenure at PwC, I played an integral role in leveraging data-driven insights to refine and optimize business strategies across the firm. My responsibilities spanned various facets of data science and analytics, contributing to critical initiatives focused on enhancing customer satisfaction, fine-tuning marketing efforts, and driving overall business growth. Through my work, I helped harness the power of data to drive informed decision-making and strategic improvements within the organization.

Conducted sentiment analysis on customer feedback data, identifying key drivers of satisfaction and implementing targeted improvement initiatives

Developed and deployed cutting-edge customer segmentation algorithms, leading to a substantial 20% reduction in marketing expenditures through optimized budget allocation

Designed and implemented robust data pipelines and databases using SQL, Python, and Hadoop technologies, ensuring data integrity and reliability for analysis purposes

Executed A/B testing experiments, optimizing conversion rates and achieving a notable 15% improvement in campaign ROI (a minimal sketch of this kind of test follows this role's bullets).

Collaborated on cross-functional teams to develop and deploy recommendation systems, enhancing personalized customer experiences and increasing upsell opportunities

Led a cross-functional team in designing and executing customer segmentation analysis, resulting in targeted marketing campaigns and a 25% increase in customer engagement

Defined project objectives, gathered data requirements, and developed analytical solutions for market research and customer lifetime value analysis

Stayed abreast of the latest advancements in data science and machine learning technologies to continuously improve analytical capabilities

Applied natural language processing techniques to analyse customer feedback for sentiment analysis and product improvement

Executed market segmentation analysis to identify distinct customer groups and tailored marketing strategies accordingly

Delivered comprehensive reports and presentations to senior executives, highlighting key findings, and providing actionable recommendations based on data analysis.

Utilized A/B testing methodologies to assess the impact of marketing campaigns on customer behaviour, offering data-driven recommendations for optimizing future initiatives

Collaborated with cross-functional teams to define project objectives, gather data requirements, and develop analytical solutions

Built customer lifetime value estimation models to predict future revenue potential, informing customer acquisition and retention efforts
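As referenced in the A/B testing bullets above, a two-proportion z-test is one standard way to compare conversion rates between a control and a variant; the sketch below uses statsmodels and made-up counts purely for illustration.

```python
# Two-proportion z-test for an A/B conversion experiment.
# The counts below are made-up examples, not actual campaign results.
from statsmodels.stats.proportion import proportions_ztest

def ab_test(conversions_a: int, visitors_a: int,
            conversions_b: int, visitors_b: int, alpha: float = 0.05):
    """Return (relative lift, p-value, significant?) for variant B vs. control A."""
    rate_a = conversions_a / visitors_a
    rate_b = conversions_b / visitors_b
    lift = (rate_b - rate_a) / rate_a
    stat, p_value = proportions_ztest(count=[conversions_b, conversions_a],
                                      nobs=[visitors_b, visitors_a])
    return lift, p_value, p_value < alpha

# Example with made-up numbers: control 2.0% vs. variant 2.4% conversion.
lift, p, significant = ab_test(conversions_a=400, visitors_a=20_000,
                               conversions_b=480, visitors_b=20_000)
print(f"lift={lift:.1%}, p={p:.4f}, significant={significant}")
```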

Data Analyst

Mu Sigma, Chicago, Illinois Apr 2008 - Aug 2012

Transformed database management systems to adapt to evolving company requirements, fostering agility and innovation

Enhanced customer satisfaction through strategic implementation of SQL-driven database tools, streamlining service delivery

Utilized advanced SQL queries and MS Excel reporting to provide actionable business insights, facilitating informed decision-making

Orchestrated end-to-end project schedules, collaborating with stakeholders to ensure successful product releases

Drove continuous improvement initiatives in product development and process optimization, bolstering operational efficiency and standardization.

Led a multidisciplinary team in executing a comprehensive data cleansing and ETL process, ensuring data integrity for a new database system

Implemented rigorous quality control measures to maintain data consistency and integrity, conducting thorough audits of generated data samples.

Facilitated seamless transition from legacy services to new systems through proactive inter-departmental coordination and regular progress meetings

Oversaw end-user training programs to enable proficient operation of software tools, empowering employees with necessary skills

Ensured uninterrupted service availability by proactively maintaining the company's online Oracle database infrastructure.

ACADEMIC DETAILS & CERTIFICATIONS

CFA Institute

CFA candidate

MS in Financial Engineering

WorldQuant University, New Orleans, LA

Master of Finance

Rotman School of Management, University of Toronto

Deep Learning Certificate

deeplearning.ai/Coursera

Certificates in Data Science & Quantitative analysis with R & Python

DataCamp


