No, that’s not right. They say (for the AT&T face data):
Each individual has 10 images in the dataset, taken under a variety of lighting conditions, with different expressions and facial details (i.e., with or without glasses). For our training set, we randomly selected 8 images of each person. The remaining 2 were used in the test set.
This means that they scrambled the 2 images per person that the program hadn’t seen, then asked it to identify which person each one came from, based on its previous training on the 8 (also scrambled) images of each person.
As far as I can tell from reading the paper, the program never saw any unscrambled images.
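For concreteness, here’s a rough sketch of that protocol as I read it. The specifics are my guesses, not the paper’s method: block-pixelation as the scramble, the block size, and a plain logistic-regression classifier. (The Olivetti faces bundled with scikit-learn are the AT&T face data.)

```python
# Sketch of the setup described above: train AND test on scrambled
# images only. The model classifies obfuscated faces among known
# identities; it never reconstructs (descrambles) anything.
# Assumptions (mine, not the paper's): block-pixelation, block=8,
# logistic regression as the classifier.
import numpy as np
from sklearn.datasets import fetch_olivetti_faces
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def pixelate(img, block=8):
    """Replace each block x block tile with its mean value."""
    h, w = img.shape
    out = img.copy()
    for y in range(0, h, block):
        for x in range(0, w, block):
            out[y:y+block, x:x+block] = img[y:y+block, x:x+block].mean()
    return out

faces = fetch_olivetti_faces()  # 40 people x 10 images, 64x64 (AT&T data)
X = np.array([pixelate(im).ravel() for im in faces.images])
y = faces.target

# 8 training / 2 test images per person, as in the quoted setup
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy on held-out scrambled images:", clf.score(X_te, y_te))
```

The point the sketch makes: every image the classifier touches has already been pixelated, so a high score measures recognition of scrambled faces, not recovery of the originals.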
Still a long way from ‘descrambling pixelated/blurred redactions’, though.