
Kafka Site Reliability Engineer (SRE)

Company:
Tata Consultancy Services - TCS
Location:
Sunnyvale, CA, 94085
Posted:
May 07, 2025

Description:

At the top of the resume, please include the candidate's current location. If the candidate is not local, clearly state whether they are willing to relocate and work onsite five days a week. Applications missing this information will be rejected.

Role Description & Required/Preferred Skills:

• Experience in monitoring and troubleshooting large-scale data platforms, data streaming pipelines, and complex backend services

• Able to effectively communicate incidents, coordinate incident responses, document/manage runbooks, and set up processes to support other software engineering teams and data analysts

• Experience in managing large datasets

• Experience in creating Kubernetes configurations, managing services in Kubernetes, and building Docker images

• Demonstrate a good understanding of Kafka jobs and the ability to troubleshoot them

• Kafka platform design, installation, operation, and best practices for brokers, ZooKeeper, Kafka Connect/connectors, security settings, JMX for Kafka monitoring, and performance tuning

• Setting up new Kafka clusters and onboarding Kafka APIs

• Topic management (creating, deleting, enabling, disabling, monitoring)

• Upgrade, installation, patching, and deployment for Kafka

• Experience in developing/maintaining automation tools in programming/scripting languages like Python

• Demonstrate a good understanding of big data ecosystems such as HDFS, Kafka, SQL, etc.

• Experience in using IT automation tools such as ArgoCD and job orchestration systems such as Airflow and Jenkins

• Experience in cloud-based environments (AWS/GCP, etc.) is a big plus

• Experience in monitoring tools such as Splunk, Prometheus/Grafana is a big plus

• Experience with full-stack web development (React, Django, etc.) is a big plus

• Support large-scale data pipelines and backend services by monitoring and troubleshooting/recovering incidents

• Deploy and support Kafka clusters

• Build and operate data management infrastructure services

• Automate build, deployment, and monitoring

• Participate in on-call and release turns

• Take on-call turns, including weekend coverage

Skills: Kubernetes configurations, managing services in Kubernetes, building Docker images, brokers, ZooKeeper, Kafka Connect/connectors, security settings, JMX for Kafka, HDFS, Kafka, SQL, etc.

Contract
