Abstract: Masked autoencoders (MAEs) have established themselves as a powerful pretraining method for computer vision tasks. While vanilla MAEs put equal emphasis on reconstructing the individual ...