I’ve been thinking about this a bit, and it seems like a good project for machine transcription. Google Voice already does an OK job of transcribing voicemail messages, and podcasts should lend themselves to it even better, since they are generally recorded under better conditions.
At the very least, a rough machine transcription would reduce the initial workload, since it could then be edited by hand instead of being typed from scratch.
At its best, I could imagine a system that recognizes each speaker and attributes every part of the transcription to the right one.