Machine Learning Engineer

Company:
Prodigy Labs
Location:
Halifax, NS, Canada
Posted:
May 17, 2024
Description:

We are looking for a Machine Learning Engineer to join our team and help build systems that accelerate the development and deployment of machine learning models, especially large language models (LLMs). You will partner closely with machine learning researchers and internal users to understand requirements, and apply your own domain expertise to build high-performance, reusable APIs.

The ideal candidate has strong ML fundamentals and can apply them in real production settings. In particular, this role has a core focus on optimizing inference and fine-tuning for LLMs. Candidates should also be comfortable with infrastructure and large-scale system design, as well as with diagnosing both model performance problems and system failures.

Responsibilities

Architect and enable distributed compute, aligning workloads to small, mid-range, and high-end GPUs.

Leverage appropriate storage hardware and data formats to improve read/re-read efficiency.

Identify and remediate latency contributors, especially I/O bottlenecks, inefficient data shuffling, and under- or over-utilized compute.

Scale models with distributed training, using data and model parallelism techniques. Parallelize inference processing to improve prediction latency.

Provide subject matter expertise in graph and vector databases for a variety of use cases, including knowledge graphs and retrieval-augmented generation (RAG).

Implement LLM observability and monitoring solutions.

Required Education and Experience

Degree in Computer Science or Engineering

Prior experience with:

Docker, Kubernetes, and containerization.

Distributed systems.

Databricks ML

Machine Learning Engineering

Cloud (Azure preferred)

Expert-level Python and SQL

Preference will be given to candidates who, in addition to the required experience, have:

Experience or expertise with LLM fine-tuning, LLMOps, model evaluation, and prompt engineering

Experience with (or knowledge of) MosaicML and the Ray framework.

Experience with LangChain or LlamaIndex

Experience with any vector database.

Job Specifications:

Authorities, Impact, Risk

Influence the data, AI and cloud journey for the bank. Influence the Sustainability roadmap for the bank.

Impact

Revenue generation through new business for alternative data

Innovation

6 years of AI, Big Data and cloud expertise

3-4 years of Alternative data experience

Risk

Mitigate reputational risk through AI-driven data quality, ensuring that the highest-quality data and services are offered to clients

Mandatory Skillsets:

2+ years of experience building machine learning training pipelines or inference services in a production setting.

Experience with LLM deployment, fine-tuning, training, prompt engineering, etc.

Experience with LLM inference latency optimization techniques, e.g. kernel fusion, quantization, dynamic batching, etc.

Experience with CUDA, model compilers, and other model-specific optimizations.

Preferred

Experience working with a cloud technology stack (e.g. Azure or AWS).

Experience building, deploying, and monitoring complex microservice architectures.

Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g. Terraform).

Experience with LLMs and MLOps

Experience with distributed notebook environments like Databricks

Experience building AI-driven data quality frameworks and other data governance tools and capabilities

Experience building metadata-driven AI and statistical models for repeatable insight generation

Experience building front-to-back data pipelines comprising data ingestion, enrichment, data quality, analytics, and reporting

Experience with Agile development methodology

Experience with company KPIs and backtesting of alternative data factors against those KPIs.

Experience with NLP techniques and transfer learning with pretrained models like BERT

Experience using Hugging Face model artifacts