Senior Java SRE
Experience Required: 14+ years
Assignment Duration: 12+ Months
Engagement Type: Contract
Work Location: Santa Monica, CA - Onsite (Hybrid/Initial Remote options depending on end-client)
Consultants must complete a Java-based coding round.
Key Responsibilities:
• Architect globally distributed, multi-region GCP platforms with 99.99%+ availability targets.
• Define and operationalize SLIs, SLOs, error budgets, and reliability governance models.
• Lead incident command, RCA, and long-term reliability remediation for large-scale systems.
• Engineer and tune Java-based microservices (JVM internals, GC strategies, memory profiling).
• Design and operate GKE (Google Kubernetes Engine) at scale, including multi-cluster and fleet management.
• Implement GCP-native architectures using GKE, Compute Engine, Cloud Load Balancing, Cloud Spanner, Bigtable, Cloud SQL, Pub/Sub, Cloud Storage, IAM, and VPC Service Controls.
• Build secure and repeatable infrastructure using Terraform and policy-as-code.
• Design advanced service mesh and traffic management using Istio / Anthos Service Mesh.
• Implement advanced Kubernetes storage for stateful workloads using Portworx.
• Support event-driven architectures using Kafka, Kafka Streams, ksqlDB, and Spark Streaming.
• Integrate GCP-native streaming solutions such as Pub/Sub.
• Optimize systems for low-latency, high-throughput workloads.
• Implement advanced observability using Prometheus, Datadog, Splunk, and Kiali.
• Leverage eBPF for kernel-level tracing, networking diagnostics, and performance tuning.
• Manage advanced ingress, load balancing, and traffic shaping using Nginx Controller and Seesaw.
• Architect high-scale CI/CD pipelines using GitLab CI/CD, Jenkins, and GCP-native tooling.
• Build internal developer platforms (PaaS) to standardize deployments and reduce toil.
• Automate operations using Python, Go, Bash, and custom reliability tooling.
• Provide 24/7 production support across U.S. time zones.
• Participate in on-call rotations, weekend releases, and incident war rooms.
• Continuously improve monitoring, alerting, and incident response maturity.
Required Technical Expertise:
• Java (Advanced JVM internals, GC, performance tuning)
• GCP Cloud (Professional-level depth)
• GKE/Kubernetes (CKA/CKS depth)
• Docker, Terraform
• CI/CD: GitLab CI/CD, Jenkins
• Streaming: Kafka, Kafka Streams, ksqlDB, Spark
• Service Mesh: Istio, Anthos Service Mesh
• Monitoring & Logging: Prometheus, Datadog, Splunk, Kiali
• OS & Scripting: Linux/Unix, Bash
• Programming: Python or Go
• Virtualization: VMware
• Networking & Performance: eBPF, Nginx Controller, Seesaw
• Multi-cluster Kubernetes governance
• Internal platform engineering (PaaS)
• High-traffic SaaS or consumer-scale platforms
• Real-time streaming & event-driven architectures
• Deep observability and kernel-level tracing
• GKE fleet & Anthos multi-cluster architectures
• JVM performance engineering at hyperscale
• Service mesh traffic shaping & zero-downtime releases
Certifications Required:
• Google Professional Cloud Architect or Professional Cloud DevOps Engineer
• Certified Kubernetes Administrator (CKA) or Certified Kubernetes Security Specialist (CKS)