
High Performance Computing (HPC) administrator - Remote

Company:
MSys Inc
Location:
Raleigh, NC
Posted:
January 12, 2021

Description:

Title: High Performance Computing (HPC) administrator - Remote

Location: Remote

Length: Long term

Restriction: W2 or C2C

Webcam interview. Long-term project. Remote. Due to security requirements, only US citizens are eligible.

Seeking a high performance computing (HPC) administrator to support our machine learning (ML) platforms and ecosystems. This candidate will work with a product team of data scientists and machine learning engineers to support ML products and services for our Manufacturing and Engineering customers. The role includes running and maintaining the current environment, including incident and change management as well as the capacity and release management associated with developing and executing the ML solutions. A successful candidate will have a solid understanding of operating systems and databases as well as some exposure to HPC environments utilizing graphics processing units (GPUs). Basic programming skills covering UNIX shell scripting and Python are also required.

Deliverables:

Assist in day-to-day operations, including incident management, working with the team to resolve associated infrastructure and system issues.

Develop and maintain automation and orchestration software and scripting to assist with Machine Learning code release management.

Prioritize and efficiently manage deployment and configuration tasks

Support of the containerized ecosystem

Workflow processes for:

Data ingest (Python)

API provisioning service

Scheduling (HTCondor)

Watchdog jobs that monitor for data and perform data ingest

Support of hardware-specific configurations, tuning, and GPU support

Skills

Experience with enterprise-standard database systems (e.g. Oracle, Microsoft SQL Server)

Experience working both independently and in a team-oriented, collaborative environment is essential

Functional understanding of networking concepts: routing, switching, firewalls, load balancers, proxy services, and protocols (TCP, UDP, HTTP, TLS, SIP, SMTP, SNMP, LDAP)

Customer service focused with excellent communication (written and verbal) and interpersonal skills. Must be able to effectively work with customers, coworkers, vendors and management

Experience in Python, PL/SQL

Experience in Unix shell scripting, PowerShell, and Bash

Working knowledge of containers and/or orchestration platforms (e.g. Docker, Singularity, Kubernetes, Rancher)

Prior application development experience using tools such as Jenkins, Gradle/Ant, SVN/Git, Artifactory, Automation

Understanding of computer hardware and architecture, specifically high performance computing (HPC) utilizing GPUs

Familiarity with agile methodology, ideally Scaled Agile Framework (SAFe)
