We are seeking an experienced Lead DevOps Engineer to join our team. You will oversee all aspects of our DevOps operations and work closely with our development and operations teams to streamline processes and ensure efficient, reliable software delivery.
In this role, you will manage and maintain our infrastructure, deployment processes, and configuration management tools. You will also monitor system performance, identify areas for improvement, and implement solutions that enhance system reliability and scalability.
The ideal candidate for this position is a problem solver with a strong background in software development and system administration. You should have a deep understanding of DevOps principles and practices, as well as experience working with cloud platforms and containerization technologies.
This is an on-site position based in our office in Fairfax, Virginia.
Responsibilities
Oversee the design, implementation, and maintenance of our infrastructure, deployment processes, and configuration management tools
Collaborate with development and other teams to identify and implement automation and process improvements to streamline software delivery
Monitor system performance, identify areas for improvement, and implement solutions to maximize system reliability and scalability
Maintain and enhance our CI/CD pipelines to enable efficient and reliable software delivery
Manage cloud environments and infrastructure, ensuring optimal performance and cost efficiency
Stay up to date with industry trends and best practices in DevOps, and drive the adoption of new technologies and tools within the organization
Requirements
Bachelor's degree in Computer Science, Engineering, or a related field
7+ years of experience in software development or system administration
Strong experience with Amazon Web Services (AWS) or similar cloud platforms
Comfort writing scripts that monitor system health and that parse or harvest application logs to find errors
Familiarity with monitoring and logging tools such as Prometheus, Grafana, Splunk, or the ELK stack
Expertise in managing IAM users and permissions
Experience crafting strategies for multi-zone availability and deployment to maximize system uptime
Proficiency with configuration management tools such as Ansible, Puppet, or Chef
Experience with containerization technologies such as Docker and Kubernetes
Knowledge of scripting languages such as Python, Ruby, or PowerShell
Strong problem-solving and troubleshooting skills
Excellent communication and leadership abilities
Ability to work effectively in a fast-paced and collaborative environment
Experience with Agile methodologies and DevOps principles
Relevant certifications such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA) are a plus