Karanveer Anand

Karanveer Anand is a technical program manager with expertise in software infrastructure and reliability. He leverages deep technical understanding to drive complex projects, mitigating risks and ensuring system stability and scalability.
Authored Publications
    Cloud application development faces the inherent challenge of balancing rapid innovation with high availability. This blog post details how Google Workspace's Site Reliability Engineering team addresses this conflict by implementing vertical partitioning of serving stacks. By isolating application servers and storage into distinct partitions, the "blast radius" of code changes and updates is significantly reduced, minimizing the risk of global outages. This approach, which complements canary deployments, enhances service availability, provides flexibility for experimentation, and facilitates data localization. While challenges such as data model complexities and inter-service partition misalignment exist, the benefits of improved reliability and controlled deployments make partitioning a crucial strategy for maintaining robust cloud applications.
    Incorporating a Midmortem is essential to project success. It helps the organization eliminate potential risks and implement the changes needed to reach project milestones and objectives.
    Site Reliability Engineering (SRE) teams face unique project management challenges due to their dual responsibilities of supporting production environments and executing infrastructure projects. This paper explores the common issue of project delays caused by unexpected production incidents that divert SRE resources. Through a case study of a regionalization project, the author highlights the difficulties of adhering to timelines when engineers are frequently reassigned to address operational crises. To mitigate these challenges, the paper advocates for enhanced planning strategies, specifically reserving a percentage of engineering time for production work. Based on historical data, the author's team implemented a 25% buffer, significantly improving project delivery while maintaining focus on critical production incidents. Furthermore, the paper outlines best practices for Technical Program Managers (TPMs) in SRE, including proactive staffing, cross-service collaboration, early engagement, management of external dependencies, and consistent performance evaluation. By adopting these strategies, SRE teams can effectively balance project execution and production support, ensuring timely delivery and operational stability.
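
The planning buffer described in the last abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: only the 25% figure comes from the abstract, while the function names, the team size, and the schedule length are hypothetical values chosen for illustration.

```python
def project_capacity(engineers: int, weeks: int, production_buffer: float = 0.25) -> float:
    """Engineer-weeks left for project work after reserving the production buffer.

    production_buffer is the fraction of time held back for production work;
    0.25 matches the 25% buffer mentioned in the abstract.
    """
    if not 0.0 <= production_buffer < 1.0:
        raise ValueError("production_buffer must be in [0, 1)")
    return engineers * weeks * (1.0 - production_buffer)


def weeks_needed(effort_engineer_weeks: float, engineers: int,
                 production_buffer: float = 0.25) -> float:
    """Calendar weeks required to deliver a given effort with the buffer applied."""
    if engineers <= 0:
        raise ValueError("engineers must be positive")
    if not 0.0 <= production_buffer < 1.0:
        raise ValueError("production_buffer must be in [0, 1)")
    return effort_engineer_weeks / (engineers * (1.0 - production_buffer))


if __name__ == "__main__":
    # Hypothetical team: 4 engineers planning a 12-week quarter.
    print(project_capacity(4, 12))   # engineer-weeks available for the project
    print(weeks_needed(36, 4))       # weeks to deliver 36 engineer-weeks of work
```

Planning against the reduced capacity, rather than the nominal 48 engineer-weeks in this hypothetical case, is what keeps the timeline realistic when engineers are pulled into incidents.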