DevOps - Site Reliability Engineer (SRE)

Company:

Resource Informatics Group Inc

Location:

Atlanta, GA

Posted:

May 08, 2025

Description:

Role: Site Reliability Engineer

Location: Atlanta, GA

Duration: 12 months

Rate: Market rate, all-inclusive

Job Description:

This Software Engineer will be part of the Site Reliability Engineering (SRE) team.

The SRE team is an innovative team devoted to providing automated solutions and services for Cox Automotive to measure, evaluate and plan for visible, reliable application delivery and maintenance.

As a member of the SRE team, you will work with development teams to help create the automated pipelines and solutions required for continuous delivery in an Agile DevOps environment.

The tools and use cases are diverse, and our challenge is to increase development velocity by optimizing various parts of the pipeline while improving application stability.

This is an opportunity to create automation, monitoring, and pipelines that improve deployment and response times across the board.

We are looking for engineers who are passionate about infrastructure as code and continuous deployment to build scalable and highly reliable applications.

If you love figuring out how all the pieces fit together, and if automating and building tools to monitor and manage your applications sounds interesting to you, we want to talk to you.

What you will do:

Automate anything and everything! (Infrastructure build-out, testing, deployment, monitoring, etc.)

Design and assist in the authoring of software tools that reliably manage application delivery

Design and assist in the setup and maintenance of application monitoring and alerting

Engage with Development/Capability Teams to ensure best practices are implemented

Improve the predictability and reliability of software releases, workflows, and operating software

Reduce application deployment windows by leading the company toward a Continuous Deployment environment

Reduce mean time to recovery (MTTR) by helping to troubleshoot, monitor, alert, and automate recovery

The skills we require:

Python, Ruby, Go, or another systems programming language (moderate skill required)

Experience with configuration management systems (Octopus, Chef, Puppet)

Experience rolling out redundant, mission-critical applications in a highly available production environment

Experience with version control systems (Git or SVN)

Experience with cloud computing platforms (AWS, Kubernetes, Heroku, etc.)

Experience with continuous integration tools (Jenkins, CircleCI, etc.) and artifact repositories (Artifactory or Nexus)

Excellent written communication, problem-solving, and process management skills

Desire to work in a fast-paced, evolving, growing, dynamic environment

The skills we prefer:

Linux system engineering expertise

VMware or VirtualBox experience

Experience supporting Ruby or Java applications

Experience supporting database server infrastructure (MySQL, Postgres, etc.)

Networking knowledge

Experience with HashiCorp tools (Vagrant, Terraform, Packer, etc.) and Linux containers (Docker, rkt)

Experience with Java build tools such as Ant, Maven, Gant, or Gradle

Experience with agile development, continuous integration and automated testing

Experience with dashboarding and monitoring

Full-time