Medical images such as 3D computerized tomography (CT) scans have a typical resolution of 512×512×512 voxels, about three orders of magnitude more data than a typical ImageNet image. Training CNN models directly on such high-resolution images is infeasible, because the feature maps of a single image do not fit in the memory of a single GPU/TPU. Existing image analysis approaches alleviate this problem by dividing input images (e.g. taking 2D slices of 3D scans) or down-sampling them, which complicates implementation and leads to sub-optimal performance due to information loss. In this paper, we implement spatial partitioning, which internally distributes the inputs and outputs of convolution operations across GPUs/TPUs. Our implementation is based on the Mesh-TensorFlow framework and is transparent to end users. To the best of our knowledge, this is the first work to train networks end-to-end on 512×512×512-resolution CT scans without significant computational overhead.
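
The idea of spatial partitioning can be illustrated with a minimal NumPy sketch. It is not the Mesh-TensorFlow implementation described in this paper: the function names `conv2d_same` and `conv2d_spatially_partitioned` are hypothetical, the example is 2D rather than 3D for brevity, and the "devices" are simulated by a loop over tiles. It assumes an odd-sized kernel, zero padding, and a row count divisible by the number of shards. Each tile needs a few extra boundary rows (a halo) from its neighbors so that its local convolution matches the corresponding rows of the unpartitioned result.

```python
import numpy as np


def conv2d_same(x, k):
    """Naive 'same' 2D cross-correlation with zero padding (odd kernel sizes)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros(x.shape, dtype=np.float64)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out


def conv2d_spatially_partitioned(x, k, num_shards):
    """Same result as conv2d_same, computed tile by tile along axis 0.

    Each tile stands in for the slice of the input held by one device.
    A tile reads `halo` extra rows on each side, which on real hardware
    would be exchanged with neighboring devices.
    """
    kh, kw = k.shape
    halo, pw = kh // 2, kw // 2
    rows, cols = x.shape
    assert rows % num_shards == 0, "sketch assumes evenly divisible rows"
    tile = rows // num_shards

    # Pad once globally so the first and last tiles see zeros as their halo.
    xp = np.pad(x, ((halo, halo), (pw, pw)))

    outputs = []
    for s in range(num_shards):
        start = s * tile
        # Local tile plus halo rows, expressed in padded coordinates.
        local = xp[start:start + tile + 2 * halo, :]
        out = np.zeros((tile, cols), dtype=np.float64)
        for i in range(kh):
            for j in range(kw):
                out += k[i, j] * local[i:i + tile, j:j + cols]
        outputs.append(out)

    # Concatenating the per-tile outputs reassembles the full feature map.
    return np.concatenate(outputs, axis=0)


rng = np.random.default_rng(0)
image = rng.standard_normal((16, 8))
kernel = rng.standard_normal((3, 3))
full = conv2d_same(image, kernel)
sharded = conv2d_spatially_partitioned(image, kernel, num_shards=4)
print(np.allclose(full, sharded))
```

In this sketch each tile holds only its own rows plus a halo of `kernel_height // 2` rows per side, so per-tile memory shrinks roughly in proportion to the number of shards while the concatenated output is unchanged.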