A warehouse-scale computer (WSC) is a vast collection of tightly networked computers that provides modern internet services, and it is becoming increasingly popular as the most cost-effective way to serve users at global scale. However, it is extremely difficult to accurately measure the holistic performance of a WSC. Existing load-testing benchmarks are tailored to a dedicated-machine model and do not address shared infrastructure environments. Evaluating the performance of a live, shared production WSC environment presents many challenges: holistic performance metrics are lacking, evaluation costs are high, and the evaluation itself may disrupt services. WSC providers and customers need a cost-effective methodology to accurately evaluate the holistic performance of their platforms and hosted services. To address these challenges, we propose WSMeter, a cost-effective framework and methodology to accurately evaluate the holistic performance of a WSC in a live production environment. We define a new performance metric that accurately reflects the holistic performance of a WSC running a wide variety of unevenly distributed jobs. We propose a model that statistically accounts for the performance variances amplified by co-located jobs, so that holistic performance can be evaluated at minimal cost. To validate our approach, we analyze two real-world use cases and show that WSMeter accurately discerns 7% and 1% performance improvements using only 0.9% and 6.6% of the machines in the WSC, respectively. We also present a cloud customer case study in which WSMeter helped quantify the performance benefits of a service software optimization at minimal cost.