
Sr. HPC Architect - Hybrid

Company:
Caris Life Sciences
Location:
Irving, TX, 75063
Posted:
May 22, 2025

Description:

At Caris, we understand that cancer is an ugly word—a word no one wants to hear, but one that connects us all. That’s why we’re not just transforming cancer care—we’re changing lives.

We introduced precision medicine to the world and built an industry around the idea that every patient deserves answers as unique as their DNA. Backed by cutting-edge molecular science and AI, we ask ourselves every day: “What would I do if this patient were my mom?” That question drives everything we do.

But our mission doesn’t stop with cancer. We're pushing the frontiers of medicine and leading a revolution in healthcare—driven by innovation, compassion, and purpose.

Join us in our mission to improve the human condition across multiple diseases. If you're passionate about meaningful work and want to be part of something bigger than yourself, Caris is where your impact begins.

Position Summary

The Senior HPC Architect designs and optimizes high-performance computing (HPC) systems, applying expertise in parallel programming, performance analysis, and hardware architecture to build scalable, efficient solutions for demanding computational workloads. The role often collaborates with software developers and hardware engineers to achieve optimal performance across complex scientific or data-intensive applications.

Job Responsibilities

System Design and Implementation:

Architecting and designing high-performance computing clusters, selecting appropriate hardware components like CPUs, GPUs, storage systems, and networking infrastructure.

Installing and configuring operating systems (typically Linux) on cluster nodes.

Setting up and managing distributed file systems (like Lustre, Ceph, GPFS) for large data storage and access.

Implementing job scheduling systems (e.g., LSF, Slurm, PBS) to manage workload distribution across the cluster.

Performance Optimization:

Monitoring system performance metrics (CPU utilization, memory usage, network bandwidth) to identify bottlenecks and optimize resource allocation.

Benchmarking applications and performing performance analysis to identify areas for improvement.

Tuning application code for parallel processing to leverage the power of the HPC cluster.

User Support:

Providing technical support to researchers and users on how to access and utilize the HPC system

Training users on best practices for submitting jobs and optimizing their applications for the HPC environment

Troubleshooting user issues related to application execution, data management, and system access

System Administration:

Managing system updates, patching, and security configurations to maintain a stable and secure HPC environment

Implementing backup and disaster recovery procedures for critical data and system configurations

Monitoring system health and proactively addressing potential issues through alerts and logging systems

Required Qualifications

Minimum of five years’ experience in Linux systems administration.

Bachelor's degree in computer science, engineering, math, or a scientific discipline with 2+ years of systems engineering experience; or 6 years’ experience in HPC architecture.

Hands-on HPC architecture design experience, including storage, file systems, InfiniBand, security, authentication, and compute architecture.

Experience using Git to manage shared software configuration code bases

Hands-on experience with cloud-based services (e.g. Azure, AWS, GCP).

Good understanding of storage administration and optimization, such as performing upgrades and defining RAID configurations.

Deep understanding of parallel computing concepts and programming paradigms (MPI, OpenMP, CUDA).

Expertise in performance analysis tools and techniques to identify and address performance bottlenecks.

Knowledge of HPC hardware architectures, including processors, memory subsystems, network fabrics, and interconnects

Familiarity with HPC software stack components like compilers, runtime systems, job schedulers, and scientific libraries

Strong programming skills in languages commonly used in HPC (C, C++, Fortran)

Strong skills with scripting languages (e.g., Bash, ksh, Perl, Python) for automation.

Experience with system administration and cluster management tools (e.g., LSF, Slurm, PBS)

Experience with distributed file systems (Lustre, Ceph, GPFS)

Excellent communication and problem-solving abilities to effectively collaborate with cross-functional teams

Preferred Qualifications

Experience in life sciences, healthcare and/or research institutions highly preferred

Experience building and installing scientific software and other 3rd party software applications on HPC systems

Experience with HPC schedulers and resource managers

Experience executing scientific software on HPC systems

Experience writing user documentation

Strong technical and analytical skills

Strong verbal and written communication skills

Always maintains the highest level of professionalism when interacting with internal and external customers

Demonstrates a high-energy, positive attitude and commitment to quality customer service

Contributes to a positive team environment within the center by demonstrating a strong work ethic, effectively communicating with others, and proactively anticipating center and user needs

Experience coordinating and running support teams

Related industry certifications preferred.

Physical Demands

Ability to lift, move and install HPC data center hardware and supplies.

Standing for extended periods while performing data center related tasks.

Training

All job-specific, safety, and compliance training is assigned based on the job functions associated with this employee.

Other

This position requires periodic travel and some evenings, weekends, and/or holidays.

Job may require after-hours response to emergency issues.

Periodically scheduled on-call duty may require after-hours response for technical emergencies not explicitly related to assigned job responsibilities.

Conditions of Employment: The individual must successfully complete the pre-employment process, which includes a criminal background check, drug screening, credit check (applicable for certain positions), and reference verification.

This job description reflects management’s assignment of essential functions. Nothing in this job description restricts management’s right to assign or reassign duties and responsibilities to this job at any time.

Caris Life Sciences is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, religion, color, national origin, gender, gender identity, sexual orientation, age, status as a protected veteran, among other things, or status as a qualified individual with disability.

JR103261
