As the Observability Engineer – NOC, you will build and maintain tools, applications and services supporting the BlackLine SaaS application and internal teams. A successful candidate must possess solid critical thinking skills, have experience supporting large server farms and 24x7 high-availability, mission-critical, traffic-intensive web infrastructures, and be familiar with container technologies. Large SaaS experience is very desirable.
Responsibilities
Ensure 99.99%+ availability of the services and infrastructure that span multiple global datacentres in private and public clouds.
Troubleshoot BlackLine container platforms and supporting automation in a highly available, high-traffic environment.
Monitor and maintain health, performance, and security of all infrastructure components.
Build systems and perform necessary tasks to deliver against committed project timelines, with a desire to automate everything.
Solve real-life problems in a bleeding-edge, high-performance, and high-traffic environment. Maintain documentation and operational knowledge base.
Triage first-level events and incidents.
Adhere to the change management and other established processes and procedures.
Respond to and troubleshoot incidents (Incident Management). Conduct root cause analyses.
Evaluate and analyse systems, performance, issues and metrics in order to provide recommendations for continuous improvements.
Comply with SLAs as defined.
Participate in a scheduled 24/7 on-call rotation for second-tier support escalations.
Be willing to work 3 days from the office.
Qualifications
3-6 years of industry experience.
3+ years supporting Unix and/or Linux (Ubuntu, CentOS, Red Hat) and/or Windows.
3+ years supporting a critical, revenue-generating SaaS/hosting environment.
2+ years working with development and continuous integration tooling (Jenkins, Bitbucket, GitHub).
2+ years working with tools like New Relic, Jira, Foglight.
1+ years of experience using container platforms and tooling (Kubernetes, Docker, Rancher, Helm, Anthos, Istio, GKE, AKS, etc.).
Experience in hybrid cloud and/or multi-cloud environments (GCP as primary, Azure, AWS, VMware).
Understanding of software development processes and methodologies.
Experience with scripting and/or systems programming languages (Bash, PowerShell, Python, Golang, C#).
Hands-on problem-solving skills, technical leadership and mentoring qualities.
Strong written and oral communication skills.
Ability to participate in an on-call rotation.
A minimum of two years of experience in a 24x7 operations organization, deploying and operating complex cloud infrastructure at scale.
Hybrid schedule with 3 days in the office is mandatory.
Employee Referral Bonus Amount: $1,000
Regular