The Technical Infrastructure SRE team is responsible for managing the company's entire infrastructure and its applications. Our mission is to ensure that all production systems can support our fast-growing, worldwide user base while keeping every system stable, efficient, and cost-effective. We manage deployments, system capacity, traffic scheduling, fault tolerance, disaster recovery, emergency response, automation, operations platform development, and more.
Our team is highly diverse, with members in Singapore and China, and we are now expanding to Ireland. We look forward to welcoming new talent to join us and help TikTok grow.
- Reliability: Ensure the stability of the company's core infrastructure (high availability and reliability), focus on system performance and capacity, and establish O&M (Operations & Maintenance) standards and standard operating procedures (SOPs).
- Reliability: Troubleshoot and locate technical issues; collaborate with technical teams to develop and implement strategies for capacity planning, performance testing, anomaly analysis, and fault diagnosis and resolution.
- Efficiency: Research and evaluate large-scale system architectures and technologies, and apply new tools and technologies to improve existing systems and processes in support of business growth.
- Efficiency: Design and implement O&M platforms to achieve efficient, automated, and intelligent system maintenance.
- Cost: Develop delivery standards for large-scale production systems, covering everything from budgeting and resource delivery to online capacity assessment, to help the company optimize IT costs.
- Compliance: Design and establish new IDCs (Internet data centers), and design and implement data protection plans that meet compliance requirements.