We have openings for ELK Stack Engineers with 5 to 8 years of experience for one of our clients in Chennai (WFO).
The customer currently uses the ELK stack. The goal is to standardize and modernize logs, metrics, and traces using OpenTelemetry while improving visibility, reliability, and operational intelligence.
Observability Architecture & Modernization
· Assess the existing ELK-based observability setup and define a modern observability architecture
· Design and implement standardized logging, metrics, and distributed tracing using OpenTelemetry
· Define observability best practices for cloud-native and Azure-based applications
· Ensure consistent telemetry collection across microservices, APIs, and infrastructure
Logging, Metrics & Tracing
· Instrument applications using OpenTelemetry SDKs (Spring Boot, .NET, Python, JavaScript – as applicable)
· Support Kubernetes and container-based workloads (if applicable)
· Configure and optimize log pipelines, trace exporters, and metric collectors
· Integrate OpenTelemetry with ELK / OpenSearch / Azure Monitor / other backends
· Define SLIs, SLOs, and alerting strategies
· Integrate GitHub and Jira metrics into the observability platform as DORA metrics
Operational Excellence
· Improve observability performance, cost efficiency, and data retention strategies
· Create dashboards, runbooks, and documentation
AI-based Anomaly Detection & Triage (Good to Have)
· Design or integrate AI/ML-based anomaly detection for logs, metrics, and traces
· Experience working on AIOps capabilities for automated incident triage and insights
Required Technical Skills
Core Observability
· Strong hands-on experience with ELK Stack (Elasticsearch, Logstash, Kibana)
· Deep understanding of logs, metrics, traces, and distributed systems
· Practical experience with OpenTelemetry (Collectors, SDKs, exporters, receivers)
Cloud & Platforms
· Strong experience integrating Microsoft Azure services with the observability platform
· Experience integrating Kubernetes / AKS workloads with the observability platform
· Knowledge of Azure monitoring tools (Azure Monitor, Log Analytics, Application Insights)
· Experience with Kubernetes / AKS is a strong plus.
Soft Skills
· Strong architecture and problem-solving skills
· Clear communication and documentation skills
· Hands-on mindset with an architect-level view
Good to Have / Preferred Skills
· Experience with AIOps / anomaly detection platforms
· Exposure to any of the following tools: Prometheus, Grafana, Jaeger, OpenSearch, Datadog, Dynatrace, New Relic
· Experience with incident management, SRE practices, and reliability engineering