BrightAI is a high-growth Physical-AI company transforming how businesses interact with the physical world through intelligent automation. We are building a cutting-edge AI platform that processes visual, spatial, and temporal data across billions of real-world events, from edge devices and mobile sensors to large-scale cloud systems.
We are looking for a Senior DevOps Engineer to join us in a fast-paced setting and build next-generation clouds and applications that leverage the latest technologies in AWS, Kafka, Cloud Native, Containers, Terraform, CI/CD, AI and ML. Our tech stack is a loosely coupled, event-based architecture built on top of Kafka, with many microservices and IoT devices, that processes tens of millions of events a day. As a member of the team, you will play an integral role in shaping the architecture, processes and features needed to maintain and release a high-scale cloud platform in collaboration with other teams.
Responsibilities
Building the tools, processes and support that let teams self-manage in an owner-operator model
Establishing and supporting infrastructure as code best practices and patterns for teams
Managing and supporting engineering teams' software releases
Working with cloud engineers to define repeatable processes and patterns for supporting services in a production environment
Staying on top of latest cloud trends and topics
Prototyping unproven concepts to inform final implementations
Automating everything possible. Leveraging CI/CD to automate repeatable tasks and quality checks.
Managing multiple AWS accounts and providing unified operational support
Building many common dashboards, monitoring and alerts used by the business
Skills and Expertise
Passion for continuous learning and for understanding DevOps, site reliability, high availability and release engineering best practices
BS/MS in Computer Science or equivalent practical experience
4+ years of experience in software languages like TypeScript, JavaScript, Python, Go, etc.
Deep experience with AWS cloud infrastructure and services, including EKS, EC2, VPC, Route53, S3, RDS (Postgres), MSK, Lambda, ECS, DocumentDB
In-depth knowledge of Kubernetes, including experience with deploying, managing, and scaling clusters
Experience with configuration management and infrastructure as code: Terraform, Kustomize, Helm, CloudFormation, Ansible, etc.
Extensive networking experience
Experience with Linux and shell scripting
Experience with Datadog, SumoLogic, OpsGenie, Prometheus, Grafana, or similar for reporting and alerts
Experience with high availability deployments across multiple regions and AZs
Experience with setting up and maintaining CI/CD pipeline tooling
Experience working within an Agile environment
Experience with managing multiple AWS accounts
Familiarity with leveraging reserved instances
Experience with AWS Cost Explorer and managing usage
Bonus
Previous startup experience
Experience growing large teams and the ability to describe the issues that arise with growth
Experience operating and monitoring large Kafka clusters
Experience with cloud development using TypeScript, JavaScript, Python, Go, etc.