Senior Platform Engineer

Company:
ICONSTAFF
Location:
Cambridge, MA
Pay:
120,000 USD - 200,000 USD per year
Posted:
June 14, 2026
Description:

Senior Platform/Infrastructure Engineer

Location: Fully remote (HQ Cambridge, MA)

Hours: 9 a.m.–5 p.m. EST, with 2-day on-site visits every 6 weeks

You’ll be responsible for designing, scaling, and maintaining the infrastructure and internal developer platforms that power a real-time learning AI at a seed-stage startup. The role blends infrastructure ownership with platform engineering to enable AI/product teams to ship quickly and reliably.

Key Responsibilities - Infrastructure

Maintain production health: performance, reliability, cost efficiency, and security.

Manage GCP Kubernetes clusters (GKE), networking, storage, and compute resources.

Handle scaling, resource allocation, and high availability for growing customer demand.

Refine observability: logs, traces, metrics, dashboards, and alerts.

Perform security hardening and cost optimization.

Key Responsibilities - Platform Engineering

Build internal tooling and abstractions for developer productivity.

Design CI/CD pipelines using GitHub Workflows and ArgoCD.

Provide self-service environments, internal portals, and deployment systems.

Collaboration & Communication

Work closely with AI and full-stack teams to optimize system architecture.

Explain technical concepts and trade-offs clearly to engineers and non-engineers.

Troubleshoot issues across multiple systems (Python, JavaScript, SQL).

Requirements

5+ years in production cloud environments at scale.

Strong familiarity with GCP (primary), plus some AWS experience.

Experience with Kubernetes (GKE), node pools, and memory-intensive jobs.

Working knowledge of CI/CD systems (GitHub Workflows + ArgoCD).

Exposure to observability tools (Datadog), databases (Cloud SQL, ClickHouse, Bigtable), and cloud services.

Skills & Qualities

Strong analytical and problem-solving ability.

Clear, collaborative communication.

Curiosity and ownership mentality.

Fluent in reading and debugging code across Python, JavaScript, and SQL.

Technical Stack

Cloud: GCP (primary), AWS (secondary)

Kubernetes: GKE, multiple node pools

CI/CD: GitHub Workflows + ArgoCD

Data: Cloud SQL, ClickHouse, Bigtable, GCS, Dataflow

Networking: Cloudflare Workers, Durable Objects, WebSocket communication

Monitoring: Datadog

Environments: Production, Staging, Integration, Development

Full-time

Fully remote