Job Description
About the role:
We are seeking an AWS ML Cloud Engineer to design, deploy, and optimize cloud-native machine-learning systems that power our next-generation predictive-automation platform. You will blend deep ML expertise with hands-on AWS engineering, turning data into low-latency, high-impact insights. The ideal candidate has a strong command of statistics, coding, and DevOps, and thrives on shipping secure, cost-efficient solutions at scale.
Objectives of this role:
Design and productionize cloud ML pipelines (SageMaker, Step Functions, EKS) that advance our predictive-automation roadmap
Integrate foundation models via Bedrock and Anthropic LLM APIs to unlock generative-AI capabilities
Optimize and extend existing ML libraries / frameworks for multi-region, multi-tenant workloads
Partner cross-functionally with data scientists, data engineers, architects, and security teams to deliver end-to-end value
Detect and mitigate data-distribution drift to preserve model accuracy in real-world traffic
Stay current on AWS, MLOps, and generative-AI innovations; drive continuous improvement
Responsibilities:
Transform data-science prototypes into secure, highly available AWS services; choose and tune the appropriate algorithms, container images, and instance types
Run automated ML tests/experiments; document metrics, cost, and latency outcomes
Train, retrain, and monitor models with SageMaker Pipelines, Model Registry, and CloudWatch alarms
Build and maintain optimized data pipelines (Glue, Kinesis, Athena, Iceberg) feeding online/offline inference
Collaborate with product managers to refine ML objectives and success criteria; present results to executive stakeholders
Extend or contribute to internal ML libraries, SDKs, and infrastructure-as-code modules (CDK / Terraform)
Skills and qualifications:
Primary technical skills:
AWS SDK, SageMaker, Lambda, Step Functions
Machine-learning theory and practice (supervised / deep learning)
DevOps & CI/CD (Docker, GitHub Actions, Terraform/CDK)
Cloud security (IAM, KMS, VPC, GuardDuty)
Networking fundamentals
Java, Spring Boot, JavaScript/TypeScript & API design (REST, GraphQL)
Linux administration and scripting
Bedrock & Anthropic LLM integration
Secondary / tool skills:
Advanced debugging and profiling
Hybrid-cloud management strategies
Large-scale data migration
Impeccable analytical and problem-solving ability; strong grasp of probability, statistics, and algorithms
Familiarity with modern ML frameworks (PyTorch, TensorFlow, Keras)
Solid understanding of data structures, modeling, and software architecture
Excellent time-management, organizational, and documentation skills
Growth mindset and passion for continuous learning
Preferred qualifications:
10+ years of software-development experience
3+ years in an ML-engineering or cloud-ML role (AWS focus)
Proficient in Python (core), with working knowledge of Java or R
Outstanding communication and collaboration skills; able to explain complex topics to non-technical peers
Proven record of shipping production ML systems or contributing to OSS ML projects
Bachelor’s (or higher) in Computer Science, Data Engineering, Mathematics, or a related field
AWS Certified Machine Learning – Specialty and/or AWS Certified Solutions Architect – Associate certification is a strong plus
Full-time
Hybrid remote