Lead Platform Engineer (Observability)

Company:
Integrated Talent Strategies
Location:
Cleveland, OH, 44113
Posted:
May 09, 2024

Description:

Integrated Talent Strategies (ITS) is seeking a Lead Platform Engineer to work remotely in the United States. This is a direct hire position that offers a variety of great benefits, including health insurance, 401(k), and paid time off.

JOB DESCRIPTION

Traceable, event-based Observability enables exploratory investigation when issues occur, whether their causes are known or unknown. It helps teams troubleshoot issues without first having to predict what problems may happen or how, especially in complex, multi-layer distributed applications built from microservices.

Observability also helps teams improve their understanding of how customers use digital products. Product teams use that awareness to influence future development. The Lead Observability Engineer contributes to the organization's overall strategic vision for Observability capabilities, processes, patterns, and tooling.

The Lead Observability Engineer will lead efforts, working closely with the Product and Infrastructure teams, to ensure that all telemetry from applications, business events, appliances, and infrastructure is accurately received, tagged, and reported. This role involves leading efforts to maintain the observability platforms and to ensure they are optimized and operating within SLAs and SLOs. The incumbent will serve as an SME in Observability practices for the enterprise and will provide services and solutions that enable the business to achieve and sustain higher SLAs by improving software quality, reducing problem-determination time and downtime, and enhancing the end-user experience.

RESPONSIBILITIES

Essential Functions

Strategy & Planning

Socialize the Observability capabilities, processes, and technology with the various application groups.

Work with various product and business groups to help determine SLIs, SLOs and SLAs for products, applications, and services offered to the customer. Establish strategies, processes, and tooling to adhere to the SLAs.

Lead efforts to provide self-service capabilities for analyzing and visualizing Observability data, giving end-to-end visibility into product and application performance (including dashboards, alerting, and automated incident-response capabilities).

Provide a strategic roadmap for Observability maturity, including recommendations on tooling and capabilities to support growing enterprise needs and new products.

Create, support, and sustain methods and procedures to measure outcomes of Observability practices.

Enable developers to use tools to identify symptoms and diagnose application issues by providing them with the requisite access levels and training.

Develop and document Observability standards, procedures, and best practices for using the tools, and provide education in their use.

Clearly communicate performance-related recommendations and tradeoffs to IT and business stakeholders.

Partner with the QA team, assisting in creating and refining effective performance test objectives, test plans, and scenarios that help the organization achieve quality requirements for applications.

Acquisition & Deployment

Work with the business to provide guidance on developing KPIs in support of strategic business initiatives.

Establish measurements for KPIs and related business transactions of interest, and develop the executive dashboards required to observe application behavior, user behavior, and user interaction for business-critical functions.

Work with development and architecture teams to manage Observability data collection, analysis, and visualization for critical applications throughout the application lifecycle.

Drive continuous improvement of Observability capabilities, provide technical guidance to development teams, and aid in triaging production problems.

Independently use Observability tools to detect, isolate, and resolve issues affecting the user experience and user interaction with the applications.

Contribute to the solution delivery lifecycle, including prototyping, capacity modeling, performance-driven design, profiling, performance testing, availability management, and troubleshooting.

Guide the operations and support teams in building and refining application-behavior data capture and reporting for production systems, along with the corresponding processes.

Assist in major application and/or security incident troubleshooting.

Operational Management

Design and provide cross-team training opportunities.

Improve the knowledge and skills of the Enterprise DevOps team so that its members become more competent and able to accept greater responsibilities.

Install and configure software products. Ensure compatibility between target product, operating system, and other resident software. Apply maintenance according to best practices.

Lead capacity planning and performance management activities.

Contribute to the development of service level goals and objectives.

Develop and prepare metrics that measure services rendered.

Identify opportunities to improve service levels and/or minimize support efforts.

Perform standard configuration, management, and maintenance tasks in support of web resources.

Mentor and/or provide guidance to all members of the team.

Incidental Functions

Participate in disaster planning/mitigation/recovery.

Conduct product proofs of concept.

Research and recommend new technologies, including tools, components, and frameworks.

Assist with other projects as may be required to contribute to the efficiency and effectiveness of the group and other business/technical entities.

Assist with and participate in Change Management preparations and implementations, providing technical subject matter expertise.

Attend and periodically lead team meetings.

Participate in hiring activities, fulfill affirmative action obligations, and ensure compliance with the equal employment opportunity policy.

Provide periodic 24/7 on-call support for specific functions.

QUALIFICATIONS

Formal Education & Certification

Bachelor's degree (or foreign equivalent) in a Computer Science, Computer Engineering, or Information Technology field of study (e.g., Information Technology, Electronics and Instrumentation Engineering, Computer Systems Management, Mathematics), or equivalent experience.

Knowledge & Experience

10+ years of IT experience.

5+ years of hands-on development experience with object-oriented programming languages (e.g., Java, C++).

3+ years of experience in Observability or application monitoring and log aggregation.

Experience deploying Docker containers to Kubernetes in a DevOps environment is preferred.

Skilled in architecting, installing, configuring, and using monitoring and log aggregation tools.

Demonstrated knowledge of and experience implementing the OpenTelemetry framework.

A solid understanding of many different types of application infrastructure in both UNIX and Windows environments.

Experience working with multiple monitoring tools.

Experience setting alerting thresholds for application performance metrics.

Experience with various software development methodologies, such as Waterfall, Agile, Scrum, and Kanban.

Experience working with development groups and application architects.

Proven track record of operating in multi-stakeholder environments and successfully managing delivery across multiple locations.

About Integrated Talent Strategies (ITS)

ITS is an international recruiting and staffing firm specializing in Engineering, Technical, and Professional positions. Founded in 1984 as a subsidiary of an engineering firm, ITS remains a strategic partner for the job seeker.

Our clients include some of the largest and most respected architectural, engineering, and manufacturing companies in business today. Our ability to offer a wide range of services, and the flexibility to adjust to the changing needs of our clients, has allowed us to maintain a solid reputation for 40 years.
