Atit Shah
Technical Program Leader: Cross-Org Strategy, AI/Infra Programs, Business Impact at Scale, People Management
Charlotte, NC *********@*****.***
OBJECTIVE
Strategic and execution-focused Technical Program Leader with 17+ years of experience delivering enterprise-scale AI, infrastructure, and platform programs. Seeking a Senior Leader, Technical Program Management/Scrum Master role to lead cross-functional TPM teams, drive org-wide alignment, and scale technical programs aligned to business outcomes for corporate and customer success.
PROFESSIONAL EXPERIENCE
Leader - Technical Program Management Meta (Facebook) March 2024 – Present
Core Infrastructure Org (via TEKsystems)
AI Programs:
Architected and owned the Document Summary Program using Llama models (Meta.AI) for ~35M internal docs, with 15K docs added daily. Led a team of 20 (DS, DE, and SWEs).
Reliability Chatbot - Drove and launched a Meta-wide internal chatbot feature (built with the Meta AI and LLM Modelling teams using RAG) that serves 250+ teams, handling Reliability questions, Q&A, and review scheduling for the orgs without human involvement. Directly managed a team of 3 DS, 5 DE, 5 MLEs, and 10 SWEs.
Reliability:
Led Meta-wide Reliability Initiative (Proactive, Reactive, and Strategic programs) supporting 38 Organizations, 250+ teams involving MRDR (Multi-Region Disaster Recovery), Incident Management - SEV analysis.
Held teams accountable for delivering OKRs and KRs, mapping SLOs and SLIs to business needs, in support of Meta’s Core Infrastructure services for ~10B+ Customers, ~5B+ MAU, and ~200M+ ALM.
Led strategic Reliability Initiatives, including Main SEV Reviews, Organizational Reliability Reviews, Infra-wide SEV Reviews, Service Reliability Reviews, Threat Reviews, and Org-specific check-ins across 38 Meta Infra organizations.
Unified Change Safety:
Led a team of 4 TPMs on Meta’s Change Safety Initiative, partnering with 17 Infrastructure Service Reliability teams supporting 38 organizations with Code and Config Change Management. Goal: zero SEV 0 incidents in production by H2 2026.
Led the Unified Change Safety (UCS) program with 20 orgs infra-wide, migrating ~2k call sites/APIs and ~3k Services behind a safety flag (covering Canary, Tumbleweed, bespoke, and emergency lands), resulting in a 60% reduction in production SEVs.
Measurement:
Drove the program measuring SEV impact in terms of Engagement Loss, Revenue Loss, and the Cost of fixing the SEV.
Instrumented the SEV Prevention program (measured 13% of SEVs prevented across Meta for H1 2025 against a 15% goal), partnering closely with cross-functional teams, including Monetization, Dev Infra, and Family of Apps Tech Directors and DS/DE team members.
Directly managed 7 TPMs; served as acting SDM for a 10+ person SDE team for 3 months during leadership attrition.
SEV Analysis:
Architected and drove the Meta-wide SEV Analysis initiative (250+ teams), creating an automated tool that gives all teams a single platform to perform SEV analysis, build dashboards, and make informed decisions based on Root Cause Analysis.
Key contributor in implementing the SEV–SLI linkage (20% of SEVs linked). Every SEV filed is mapped to an SLI, and revenue loss is identified against dependent services and products, providing insight into the impact of the SEV and informing improvement measures.
Drove Meta AI – SEV Manager field extraction and SEV summarization (via LLM), reducing SEV analysis time by 60%.
Technical Product & Program Manager III Amazon Nov 2023 – March 2024
Music Org (via TEKsystems)
Prime Music:
Launched Amazon Prime Music Free Trial subscription to 950K Prime Students in the German and Austrian marketplaces in 3 months, projected to add 101K users incrementally each year.
Defined and managed the operating rhythms for the Engineering org (planning, program management, partnership syncs, etc.). Led software development, validation, and release management efforts across the Amazon Music platform, including program roadmap planning, dependency tracking, weekly/monthly reviews, status reports, and playbook creation.
Partnered with 7 Product Managers on the 2024 OP2 roadmap with the Amazon Music Unlimited Promotions, Cost Savings, LLM, Payment, and Subscriptions programs.
Directly managed a team of 2 Junior PMs, 5 SWE, and 3 QA.
AI Programs: Amazon Music Subscription Cancel Save feature (0->1)
Led the customer category Ads engagement Personalization program via AI/ML and LLM, improving revenue by ~$50M.
Drove initial requirements/design for the Music personalization program with 20+ AI teams.
Led the LLM/GenAI Chatbot Cancel Save program for Amazon Music (the automated Customer Service Chatbot feature), yielding ~$3M/year in cost savings.
Sr. Technical Program Manager T-Mobile USA June 2019 – Oct 2023
Sr. Scrum Master (via Verveba Telecom)
Device Certification and Life Cycle:
Owned Device Life Cycle and Certification and its end-to-end automation investments, reducing the Go/No-Go decision time from 14 days (336 hours) to 48 hours, serving 300M+ subscribers.
Led the $100M SPZD (Specialized Automation Device Labs) Program by setting up the T-Mobile Production cell site Infrastructure across 25 US Markets in the Lab, collaborating with Service Providers (Nokia, Ericsson), 10 OEMs, and 25 IoT manufacturers.
Ideated and developed certification optimization strategies for current 300M+ T-Mobile, MPCS, Sprint, and Assurance subscribers by automating the program with ~120 members (7 cross-functional teams, 5 vendors, and 10 Field testing teams).
Managed Senior PMs, PMs, Scrum Masters, and the Managed Services team (45 team members).
Starlink - T-Mobile Integration:
Led the initial onboarding and T-Mobile integration with the Starlink network for device Messaging services with 10 OEMs.
Led the teams on both sides (65+ members) and ideated the execution strategy for Messaging services over the Starlink network.
Managed the Program tracker and the executive dashboard for the Leadership.
Data Platform:
Led the Customer Data Privacy Platform Program, capturing 1.5B calls/day and 8B activities/day; architected the parsing strategy, log bucketing, and storage across cold, warm, and hot tiers in line with applicable regulations and compliance requirements.
Product and Software DLC expert - Drove preventive (proactive) and reactive (corrective) actions for enterprise customer product issues, leading to a 70% drop in live site incidents/issues.
Drove the PowerBI chapter at T-Mobile (300 attendees), set office hours, and hosted monthly hour-long sessions on PowerBI learnings and Q&A.
Device Inventory System (full revamp):
Initiated and led T-Mobile’s AIQ (Automation Insights and Quality) Device Inventory system, which tracks the device life cycle within the org and deprecates unused resources (local hardware/on-premises servers/VMs), scaling to $5M+/yr in cost savings.
Shepherded the definition of success metrics (KPIs and LPIs) for the team, ensuring program investments are driven by data and feedback.
Managed Services:
Partnered with vendor teams (MSP) to meet and surpass SLAs, covering resource management and work breakdown structure (~$75M+). Led and coordinated design/implementation efforts between 15 internal XFN teams, 50 external consultants, and 5 vendor companies in creating optimal solutions.
Annual PM/TPM conferences:
Primary organizer of the Annual PM/TPM Conference at T-Mobile in 2022 and 2023. Transitioned the event from virtual to in-person format, resulting in a 65% increase in engagement.
EDUCATION
Business Analytics and Insights – Executive Education – The Wharton School, University of Pennsylvania
Project Management Training – Exams PM
Certified Scrum Master (CSM) – Scrum Alliance
Master of Science (MS) (in progress) – University of St. Thomas
Master of Business Administration (MBA) – Silicon Valley University
Bachelor of Engineering (BE) – S.P. University