Hi,
Job Title: ML/AI Architect
Location: Gaithersburg, MD (3 days per week)
Job Description: This position requires a deep understanding of cloud-native ML/AI Ops methodologies and technologies, AWS infrastructure, state-of-the-art (SOTA) foundation models and AWS GenAI services, and the unique demands of regulated industries, making it a cornerstone of our success in delivering impactful solutions to the pharmaceutical industry.
workloads and processes.
ML/AI Cloud Operations and Engineering:
Architect and implement scalable AWS ML/AI cloud infrastructure in a multi-tenant SaaS environment.
Establish governance frameworks for ML/AI infrastructure management and ensure compliance with industry-standard processes.
Ensure principled and methodical validation pathways and a Well-Architected Framework for Embryonic Research (WAFER), similar to and building on the AWS Well-Architected Framework (WAF), for all early-stage product and operational GenAI PoCs across the organization.
Oversee ML/AI-related Kubernetes (k8s) cluster management and provide guidance on alternative ML/AI workflow orchestration options, such as Argo versus Kubeflow, and on ML/AI data pipeline creation, management, and governance with tools like Airflow.
Employ AWS CDK (TypeScript), Projen, and Argo CD to automate infrastructure deployment and management.
Help set the strategy and manage the tactical balance between framework and platform experimentation and democratization on one side and standardization, centralized management, and governance on the other.
Conduct cost-benefit analyses and formal processes for the selection and use of foundation models, evaluating their architectures, performance, and costs.
Work with multiple teams to ensure that the platform meets organizational needs and scales effectively.
Essential Skills/Experience:
HS Diploma and 5 years of experience in Engineering/IT solutions, OR BA/BS.
Minimum of 5 years in cloud infrastructure design and management roles.
Deep understanding of the Data Science Lifecycle (DSLC) and the ability to shepherd data science projects from inception to production within the platform architecture.
Expert in TypeScript, AWS CDK, Projen, Argo CD, and other cloud infrastructure CI/CD tools.
Extensive experience in managing Kubernetes clusters for ML workflows.
Solid understanding of foundation models and their applications in ML/AI solutions.
Strong background in AWS DevOps practices and cloud architecture.
Deep knowledge of AWS services (Bedrock, SageMaker, EC2, S3, RDS, Lambda, etc.) and hands-on design and implementation of cloud systems, including microservices architecture, API design, and database management (SQL/NoSQL).
Experience with monitoring and optimizing cloud infrastructure for scalability and cost-efficiency.
Ability to collaborate effectively with engineering, design, product, science and security teams.
Strong written and verbal communication skills for reporting and documentation.
Demonstrated ability to manage large-scale, complex projects across an organization.
Proven experience in conducting performance and cost analyses of AWS infrastructure and ML/AI models.
Full-Time