Software Engineer: III (Senior) - NA

Company:
Mindlance
Location:
Clinton Township, OH, 43224
Posted:
March 15, 2026
Description:

Principal Site Reliability Engineer

Remote - can work anywhere in US

1 Year Contract

About this role:

As a Principal Site Reliability Engineer, you are expected to have extensive experience in software and systems engineering to automate operations, reduce toil, and lead continuous improvement across the service lifecycle. The role also involves collaborating with the engineering, go-to-market, and technical marketing teams, in person and remotely, to align their requirements and use cases with the Demo Platform's functional capabilities and services.

This is a remote position and can be located anywhere in the United States or Canada. Successful applicants must reside where the client is registered to do business.

What you will do:

Design, develop, and implement robust, scalable, and secure IT infrastructure solutions, aligned with business objectives and industry best practices.

Implement automation and DevOps processes to improve the cloud life cycle, including infrastructure and application uptime, availability, right-sizing, and time-to-market.

Collaborate with teammates and project stakeholders to meet timelines, goals, and SLAs.

Design and implement a Model-as-a-Service platform utilizing AI products and GPU-enabled Client hardware systems.

Perform architectural planning, deployment, and management of OpenShift Container Platform environments.

Architect and optimize virtualization solutions using KVM/QEMU, including advanced capabilities offered by OpenShift Virtualization (KubeVirt).

Design and implement advanced network architectures, particularly Software-Defined Networking (SDN) and Open Virtual Network (OVN), ensuring high performance and reliability.

Develop comprehensive storage strategies, including the design and administration of physical storage solutions and distributed storage systems like Ceph / OpenShift Data Foundation (ODF).

Oversee the administration and automation of bare-metal infrastructure, ensuring optimal performance and resource utilization.

Drive automation initiatives using Ansible and Advanced Cluster Manager for Kubernetes (ACM) for infrastructure provisioning, configuration management, and operational tasks.

Establish and optimize CI/CD pipelines for infrastructure and platform deployments, promoting agile and efficient delivery.

Provide technical leadership, mentorship, and guidance to engineering teams on architectural patterns and best practices.

Evaluate new technologies and trends, recommending solutions that enhance our IT landscape and provide competitive advantages.

Collaborate cross-functionally with development, operations, and business teams to gather requirements and translate them into architectural designs.

Create and maintain detailed architectural documentation, including design specifications, diagrams, and operational guides.

Contribute to performance testing and tuning, quality assurance (QA), and ticket and incident management.

What you will bring:

8+ years of progressive experience in IT architecture, with a significant focus on infrastructure design and implementation.

5+ years of experience with public cloud, virtualization, and Linux technologies, specifically KVM/QEMU, and a strong understanding of OpenShift Virtualization (KubeVirt).

5+ years of experience with OpenShift Container Platform or Kubernetes, including cluster operations, networking, storage integration, and security.

3+ years of experience with automation frameworks and tools like Ansible or Terraform.

Hands-on experience with bare-metal administration, including hardware provisioning, firmware management, and operating system deployment.

Solid understanding and practical experience with CI/CD methodologies and tools for automated deployments.

Strong problem-solving abilities, analytical skills, and a strategic mindset.

Excellent communication, presentation, and interpersonal skills, capable of articulating complex technical concepts to diverse audiences.

Nice to have:

Experience with AI/ML technologies and recent developments, including OpenShift AI, inference systems, and technologies like vLLM.

Proven experience with enterprise-grade storage solutions and Software-Defined Storage technologies, especially Ceph and ODF.

Advanced knowledge of Software-Defined Networking (SDN) principles and practical experience with Open Virtual Network (OVN).

Extensive experience with Red Hat Enterprise Linux (RHEL) administration and design.

Experience with AppDev automation and pipelines, including technologies like Jenkins, Tekton, ArgoCD, etc.

Experience with networking technologies, including VLANs, routing protocols, IPAM solutions, and load balancers.

Experience in designing and delivering implementations using various public and private cloud infrastructure technologies and providers.

Experience in Python development for automation, scripting, and tool development.

Experience in Go development for building high-performance applications or infrastructure components.

Relevant certifications (e.g., Red Hat Certified Architect, Kubernetes certifications, industry cloud certifications).

SPOTLIGHT CALL: 3/9/2026

Aleks Definition

- Specializing in AI and supporting optimization

- Using AI tools and implementing them

- Deployed properly and can perform properly

- Working with Client

- Client based resources

- OpenShift AI product

- Goal is to develop and support demos, workshops, and proofs of concept that show how RH AI works on IBM hardware

- Focus on AI is very important - LLMs

- Position is senior, so they don't expect ramp-up time; this person needs to be able to hit the ground running

Q+A

- What would the first month focus be?

• Expected to start and hit the ground running. Infrastructure already exists, and this position supports a demo platform.

- Expected to work a specific time zone?

• Normal hours for wherever they are located, no extended hours (they are flexible)

- Any other skills?

• Python and Ansible will be very beneficial

- What does the interview process look like?

• 2 people/interviews

• Technical interview first and then managerial interview

- Is the team in early stage of automation or is everything done?

• Expecting that this person will be involved in implementation of new services

• Some will already be in place; some are going to be in place in a few weeks

• Demo platform is used to support marketing, workshops, demos, etc.

• This person needs to automate

- Involves RH AI and based on Client hardware. What if there is someone on different hardware?

• That is fine.

- What was missing in previous candidates?

• Did not have enough expertise in running AI tasks. Did not have practical experience with that.

• Some had experience using AI tools but not enough in running, deploying, etc.

- How large is the team?

• Infrastructure architects

• First level of support people

• Infrastructure developers

• This person will be part of infrastructure management team

- Does this resource need to be USC or GC?

• No security limitations

- Ideal start date:

As quickly as possible
