A novel denoising scheme is proposed to fully exploit the spatio-temporal correlations of the video signal for efficient enhancement. Unlike conventional pixel domain approaches that directly connect motion compensated reference pixels and spatially neighboring pixels to build statistical models for noise filtering, this work first removes spatial correlations by applying transformations to both pixel blocks and performs estimation in the frequency domain. It is premised on the realization that the precise nature of temporal dependencies, which is entirely masked in the pixel domain by the statistics of the dominant low frequency components, emerges after signal decomposition and varies considerably across the spectrum. We derive an optimal non-linear estimator that accounts for both motion compensated reference and the noisy observations to resemble the original video signal per transform coefficient. It departs from other transform domain approaches that employ linear filters over a sizable reference set to reduce the uncertainty due to the random noise term. Instead it jointly exploits this precise statistical property appeared in the transform domain and the noise probability model in an estimation-theoretic framework that works on a compact support region. Experimental results provide evidence for substantial denoising performance improvement.