What you will be doing:
Join our dynamic and collaborative technology team as a Site Reliability Engineer! You'll be at the heart of our operations, playing a pivotal role in ensuring the reliability, scalability, and performance of the critical services our customers depend on.
As part of the CloudOps team within our Platform tribe, you'll collaborate with fellow SREs and other engineering teams to support the entire Technology organization and the wider company. The Platform tribe is dedicated to building and maintaining the foundational systems, tooling, and services that empower our developers to bring exceptional products to life and keep them running smoothly and securely in production. We're focused on standardizing key areas like cloud infrastructure, deployment pipelines, and observability, allowing product teams to concentrate on their core applications.
The CloudOps team is crucial in architecting, building, and operating the systems that underpin our production environment, producing reliable and cost-effective infrastructure at scale.
As a Site Reliability Engineer you will:
Design and Build: Architect, implement, and maintain highly available, cost-effective, and reliable foundational tools, architecture, and infrastructure on the cloud and Kubernetes.
Ensure Reliability: Participate in an on-call rotation to respond to and resolve production incidents swiftly. Lead thorough post-incident reviews to identify root causes and implement preventative measures.
Automate Infrastructure: Manage and automate our cloud infrastructure using Terraform and Helm, adhering to GitOps best practices.
Collaborate Effectively: Partner closely with development and data engineering teams to ensure seamless customer experiences of our services and provide robust operational support.
Our Tech Stack:
Cloud-Based Infrastructure: Fully cloud-based with a Kubernetes-focused tech stack. Compute workloads run in Kubernetes clusters across multiple regions on AWS and GCP.
CloudOps uses Golang and Python as its backend languages and TypeScript to build and maintain our Cloudflare Workers and related edge services. Basic knowledge of Bash and shell scripting will be useful.
Our products are built using Kotlin and Python at the backend, with TypeScript and React forming the frontend. All workloads are containerized.
Our products make substantial use of relational database technologies, notably Postgres and Yugabyte.
We use an event-sourced model powered by Kafka for our communication bus and gRPC for our intra-service communication protocol.
We use modern observability solutions from Grafana Cloud, build with GitLab tooling, and deploy our code using ArgoCD.
We have a strong emphasis on engineering excellence and strive to ship the best possible code and the best possible solutions to our customers.
About you:
Deep expertise in cloud services (AWS and/or GCP), particularly IAM.
Significant experience managing and troubleshooting services within Kubernetes environments, and an understanding of Kubernetes as an ecosystem.
Strong proficiency in observability platforms, including monitoring, alerting, and production operations, particularly Prometheus and Grafana.
Hands-on experience codifying infrastructure with Terraform and Helm charts.
Excellent incident response and troubleshooting abilities.
Proficiency in scripting and automation using Python and shell scripting.
Experience working with containerized workloads.
Knowledge of networking and basic HTTP/TLS.
Experience collaborating with software engineers to support production cloud-native applications.
Nice to have:
Familiarity with ArgoCD and GitLab CI.
Education:
BSc/BA degree in computer science, engineering, or a related discipline, or equivalent years of experience in the required skills.
What’s in it for you?
Equity, as we want you to own a part of what we are building.
Private medical insurance, giving you peace of mind while you excel in your career.
Unlimited Time Off Policy: work-life balance and a focus on well-being are critical to keeping us performing at our best.
We embrace a hybrid approach that requires employees to be in the office two days a week. We strongly believe that this approach fosters collaboration and helps build meaningful relationships.
You will also get a new starter budget to kit out your home office.
Opportunity to work on innovative projects with smart-minded people keen to share their knowledge and continuously improve.
Annual learning budget (prorated based on start date) to drive your performance and career development.
About us:
Our mission is to empower every business to eliminate financial crime.
By harnessing AI, a unified platform, and an extensive partner ecosystem, we help customers turn compliance into a catalyst for growth, operational resilience, and enduring regulatory trust.
More than 3,000 enterprises across 75 countries rely on our end-to-end platform and the world’s most comprehensive financial crime risk intelligence. With full-stack agentic automation, we help organizations automate up to 95% of KYC, AML, and sanctions reviews, cut onboarding times by 50%, reduce false positives by 70%, and handle 7x more work with the same staff.
ComplyAdvantage is headquartered in London and has global hubs in New York, Lisbon, Singapore, and Cluj-Napoca. It is backed by Balderton Capital, Index Ventures, Ontario Teachers’ Pension Plan, Goldman Sachs, and Andreessen Horowitz. Learn more about compliance re-engineered for the age of AI at complyadvantage.com.