The USDS TikTok Product Engineering SRE team works with engineering and product teams to build, maintain and run large-scale, globally distributed, observable, fault-tolerant systems.
SREs on this team will take ownership of production services and be responsible for observability and automation across complex, large-scale service mesh architectures.
To enhance collaboration and cross-functional partnerships, among other goals, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department.
We regularly review our hybrid work model, and the specific requirements may change at any time.
Responsibilities:
* Provide technical leadership and mentorship to a team of Site Reliability Engineers focused on building observable, fault-tolerant systems
* Drive architectural decisions for large-scale, globally distributed service mesh architectures
* Establish and maintain production ownership models, incident response protocols, and service level objectives
* Develop strategic roadmaps for observability and automation initiatives that enhance system reliability
* Balance technical contributions with people management responsibilities, including career development, performance evaluations, and team growth
* Foster a culture of reliability, continuous improvement, and knowledge sharing within your team and across the organization
* Lead security initiatives to safeguard critical assets, partnering with security and compliance teams to implement robust protocols that ensure data protection and regulatory compliance across all services