Key Responsibilities
• Incident Management: Act as the escalation point for high-priority incidents, troubleshoot complex platform issues, and manage incident resolution to minimize downtime.
• Platform Monitoring & Optimization: Monitor platform performance and proactively identify and resolve potential issues to ensure optimal uptime and reliability.
• Root Cause Analysis: Conduct thorough root cause analysis for recurring issues, implementing permanent fixes, and improving platform stability.
• Collaboration: Work closely with product, engineering, and DevOps teams to identify, diagnose, and resolve platform-related issues in a timely and efficient manner.
• Automation & Efficiency: Drive automation initiatives to improve operational efficiency, streamline support processes, and reduce manual interventions.
• CI/CD: Maintain classic and YAML-based Azure DevOps Release Pipelines, ensuring robust CI/CD processes are in place.
• IAC Review: Manage reviews and deployments of Terraform (custom provider environment) and SaltStack states to ensure consistency and efficiency in infrastructure management
• Change Control Management: Oversee and enforce the Change Control process, ensuring all platform updates, infrastructure changes, and deployments follow established approval workflows, risk assessments, and rollback strategies to maintain system stability and compliance.
• Documentation & Reporting: Maintain detailed documentation of platform issues, troubleshooting steps, and resolutions. Provide regular reports on platform health, incident trends, and support KPIs to senior management.
• Production Deployment: Co-ordinate the planning and execution of application/infrastructure releases and configuration changes.
• Continuous Improvement: Stay up to date with emerging technologies, tools, and best practices to continuously improve platform support processes and capabilities.
• Security: Oversee security and compliance adherence in application deployment and infrastructure management.
Required Skills & Qualifications
• 10+ years of experience in platform support, system administration, or a related field, with at least 2 years in a leadership or team lead role.
• Experience with SOC-2 compliance and secure deployment practices.
• Deep knowledge of Windows/Linux/Unix systems, and networking.
• Experience with automation tools and scripting (PowerShell, Bash, SaltStack, Terraform).
• Knowledge of CI/CD pipelines and deployment processes (GitHub Actions, Azure DevOps).
• Strong troubleshooting skills with experience in debugging complex platform issues.
• Thorough understanding of the processes and procedures required to support a 24x7, "follow-thesun" support model.
• Familiarity with monitoring tools (Prometheus, Grafana, Azure Monitor & Performance Insights).
• Strong communication skills with the ability to convey technical issues to non-technical stakeholders. • Excellent leadership, problem-solving, and conflict resolution abilities.
• Willingness to work outside of normal business hours, including weekends, as part of a support role.
Preferred Skills
• Bachelor’s degree in computer science, Engineering, or a related field.
• Strong background in database deployments and monitoring (MS SQL Server, schema migrations, backups, performance tuning).
• Strong background in .NET application support (.NET 6 and 8, ASP.NET).
• Experience troubleshooting NATs/Jetstream messaging.
• Strong experience with IAC tools (Terraform).