Apple introduces new AI-narrated audiobooks which are somehow different from just using Siri

Originally published at: Apple introduces new AI-narrated audiobooks which are somehow different from just using Siri | Boing Boing


Typo: “without, alas, lavishing poor Piesing’s prose with similar prose.”

I think the final “prose” should be “praise”.

xo gabe


Google, of course, is working on something similar. I’m in the middle of having a bunch of files prepared for them that our sales dept wants to put in the program.


“So here’s the new Stephen King…” “ok, process it through the Rathbone and Price vocal databases…”


I wonder what Piesing will think when he hears his book being read with a mostly flat intonation, incorrectly pronouncing made-up words or foreign language words? I’m guessing he will want his voice-over actors back.

I expect this AI narration will work less optimally than they are hoping, at least at first.

The inflection, acting, and varied reading pace needed to voice a book well will be difficult to code. It will also be extremely difficult to know when the voice needs to change as the narration switches between characters.

At the very least, I’m guessing Apple will likely have to manually code triggers into the text of each book. That won’t be easy or fast or cheaply scalable.


Even though it’s not perfect, I often prefer text to speech to real narrators. A great narrator can make an okay book much better. But a mediocre human reading can also make a great book really annoying.

I also find that text to speech lets me put my own interpretation into the story rather than having to accept the narrator’s.

The problem with Apple introducing an AI narrator is that Apple no doubt wants to find a way to charge for the service, and others may follow. Right now I can get my Kindle to do accessibility text to speech, although Kindle makes this awkward to do, but it’s not a feature I want to have to pay extra for on top of already having to pay to access the book.


At least in our case, it is going to be used for books that no one is ever going to pay a human narrator for, and it will likely be better than what the ebook speak aloud software can do now. Language switching can be managed, if the texts are tagged correctly. The tagging has to be provided in the publisher’s files.

This is also true of other book readers on Apple’s platform. I worked on one of them.

I did some of the accessibility work. A lot of it was telling iOS “don’t try to read the text from this bunch of views, when you want to read that text ask me and I’ll provide it” so we could make the VoiceOver description of a book make audible sense as opposed to having it be identical to the visual version (so you could get a whole author name not something truncated to fit under the image of the book cover), and to get a reading order that makes sense (all info about book X not X and Y interleaved because they happen to be adjacent to each other).

Some of it was somewhat messy code that let us turn to the next page and resume narration when VoiceOver got to the end of a page.

The part that I was proudest of though is when we got a bug report about poor pronunciation of French (or maybe some other language) that appeared inside a majority English language book. It had languished for quite some time before it drifted over to me. I had just recently read an NSHipster article about NLLanguageRecognizer so I was able to take the text and annotate the language used in each run of words (VoiceOver takes NSAttributedString not NSString, and one of the attributes you can set is the language). I could tell the pronunciation was different, but as “not even remotely a French (or whatever language was involved) speaker” I couldn’t tell if it was better or not. I recorded the reading of the page, got in touch with the original reporter (they turned out to be internal) and verified it was actually better.
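
For anyone curious what "annotate the language used in each run of words" looks like in practice, here is a rough sketch. It is a toy in Python, not the actual iOS code: the stopword lists and the `guess_language` and `tag_language_runs` helpers are made up for illustration and stand in for what NLLanguageRecognizer does with a trained model. The output, a list of (text, language) runs, plays the role of the language attribute set on ranges of an NSAttributedString.

```python
import re

# Hypothetical, deliberately tiny stopword lists. A real recognizer uses a
# trained statistical model, not a lookup like this.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "she", "he", "said", "was", "it"},
    "fr": {"le", "la", "les", "et", "est", "je", "ne", "pas", "que", "vous"},
}


def guess_language(text, default="en"):
    """Guess the language of a short piece of text by counting stopword hits."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in vocab for w in words) for lang, vocab in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # With no evidence at all, fall back to the book's main language.
    return best if scores[best] > 0 else default


def tag_language_runs(text, default="en"):
    """Split text into sentence-sized pieces and merge neighbours that share a language."""
    pieces = re.findall(r"[^.!?]+[.!?]*\s*", text)
    runs = []
    for piece in pieces:
        lang = guess_language(piece, default)
        if runs and runs[-1][1] == lang:
            # Same language as the previous run: extend it instead of starting a new one.
            runs[-1] = (runs[-1][0] + piece, lang)
        else:
            runs.append((piece, lang))
    return runs


if __name__ == "__main__":
    for run_text, lang in tag_language_runs("She opened the door. Je ne sais pas. It was cold."):
        print(lang, repr(run_text))
```

Each run would then be handed to the speech side with its language tag, so the French sentence gets a French voice instead of being mangled by the English one.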

It was pretty easy to fix, but if I hadn’t stumbled across that article and happened to be thinking about NSAttributedString’s bewilderingly large number of largely underdocumented and poorly organized attributes I likely would have just taken that bug report and done what so many other people did: concluded it was way way way too hard for the fairly small value of improvement and shoved it off to a vast pile of “I acknowledge this is a bug, and agree in theory it should be fixed, but it now sits in a huge pile of other bugs likely not to actually get enough attention to actually ever change state”. Don’t get me wrong, I’m sure it actually is very hard to algorithmically identify the language used in random short phrases, and decent voice synthesis in any of 20+ languages is most definitely non-trivial, but since someone else did all that hard work and the API to communicate that stuff was actually documented (albeit not where one would expect that documentation) it was in actual fact pretty trivial.

So I guess I’m proudest of serendipity in this case, but whatever, I’ll take it.

…oh, and the thing I originally came to say is oh man if you listen to a book with VoiceOver and compare it to a human reader the VO version is lifeless drek. Just not nice at all. If I were blind I would 100% prefer it to no voice at all, but if a book were available with a human narrator I would absolutely prefer that. So any “magic AI book reading voice” had really better be an order of magnitude or 3 better than listening to VoiceOver of a book’s text, otherwise it really won’t be surprising when virtually everyone pans it as awful compared to “real” narrators. (I won’t place a bet about whether or not that is possible – ML has been doing some very surprising things in the last decade or so, just that if it really is just VoiceOver or even twice as good as VO it’ll be crap compared to real people reading it!)


replace the labor of voice-over actors with robots who are willing

That word ‘willing’ is not the word that should have been used here.


It’s forced labor.

Yeah, right! Free the robots!

(Of course, it’s not labour, at all. That’s the beauty of it to the shareholders who no longer have to pay real humans.)


Piesing: “Human readers are expensive, and AI is inevitable! Bring on the AI audiobook narrators!”
Publishers: “Cool, cool, you’re fine with AI art for the cover, right?”
Piesing: “Hell yeah! Fucking artists, wanting to be paid for making dumb drawings.”
Publishers: “Also, great news, we’ve got AI writing books now.”
Piesing: “Waaaaiiit wait wait wait wait wait wait.”


I did not realize how true this was until I tried listening to Kurt Vonnegut’s Cat’s Cradle, narrated by Tony Roberts. I managed about five minutes.

That feels as though you are overlooking the more important factor - what’s the impact on the bottom line for the next couple of quarters? The shareholders always win.

I’m using Google Assistant to read web page content, sometimes including the HCR topic here on the bbs. It is better than many a voice synthesizer I know. The Guardian, e.g., offers synthetic voice content which is just terrible.

But a human narrator can be so much more. Way back, I listened to the Harry Potter series narrated by Stephen Fry. More than once I completely forgot that this was read by just one person, and would be baffled each time I realised after quite a while - or even after finishing a chapter, or the book.
The moment a synthetic voice plus some clever statistical language models can do this, I’ll be equally baffled (but by the programmers, in fact).

Oh, just FTR: I tried out ChatGPT. And I guess quite some jobs I did in my life will be nearly obsolete in the future. It is impressive.

Right now, as the article says, you can get Siri to do exactly the same, as has been the case since Siri was first introduced, as an aid for the hearing compromised.
Also, as was pointed out, there are tens, even hundreds of thousands of books out there that will never get a human narrator, because of time and cost limitations.
While there may be limitations to a narration, especially an AI, often it may be down to poor proof-reading of the transfer from print to ebook - often no proof-reading at all, leading to terrible grammatical errors and bad paragraphs and line breaks.
I worked in print and publishing, I actually created books from a manuscript and a box of photos, and spent a lot of time proofreading type galleys and finished pages (this was years before desktop publishing existed, I was pasting photoset type onto paper layout sheets to go under a process camera!). The errors I see in ebooks have me grinding my teeth in frustration, wishing there was a simple way of opening a page on my iPad and editing the errors directly.
