Originally published at: Google's Pixel 8 phones make an AI-enhanced splash | Boing Boing
…
It says something about smartphones that I assumed the AI would be applied to the camera rather than the phone functionality.
I’m naive, but I’m beginning to feel like “AI” is just another marketing buzzword in a case like this. Did we really need “AI” to recognize the word “ummm” and not transcribe it? Or did they just improve the code?
I like my Pixel 5, as it’s my preferred size, but Google will kill support for it shortly, which seems ridiculous after only 3 years. Also, the 5 was the last one with the free photo storage deal.
So I’ve gone ahead and ordered a Fairphone 5 which should be arriving any day now. Potentially up to 10 years of updates, and user-serviceable.
I have really enjoyed my 5, but things are gradually going wrong with it, and it’s never been great at, y’know, answering phone calls.
Without AI, you can program it to do a pretty good job of recognizing when someone says “umm” and leaving it out.
Implementing AI would help it do a better job of understanding when YOU say “umm”.
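The non-AI version might look roughly like this (a minimal sketch of my own, assuming a hard-coded filler list and text input; a real transcription pipeline works on audio, not finished text):

```python
import re

# The deterministic, rule-based approach: a hard-coded filler list.
# Works fine until someone actually WANTS "umm" in their text.
FILLERS = {"um", "umm", "uh", "uhh", "er", "ehh", "hmm"}

def strip_fillers(transcript: str) -> str:
    """Drop standalone filler tokens from a transcript."""
    kept = [w for w in transcript.split()
            if re.sub(r"\W", "", w).lower() not in FILLERS]
    return " ".join(kept)

print(strip_fillers("So, umm, I think, uh, we should go"))
# -> "So, I think, we should go"
```

You can see the limitation right in the word list: it only catches exactly what you thought to enumerate.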
That said, I’ve no idea which approach is actually implemented here, but it’s possible the claim is valid. Even if the AI isn’t running on the phone itself, it’s plausible that an AI model was used during development to build the filter.
I mean, it is a marketing buzzword, but without more info, I couldn’t say how hollow it is.
I realize there are going to be dozens of different flavors of “AI”, but can you tell me, the naive consumer, what is actually happening in general terms?
I’m guessing it works like the text and image generators, where it samples x number of possibilities and then creates a composite of the average? So when I say “umm”, it’s going to review what other words I said, compare that against my samples or external samples, and then decide whether I wanted “umm” transcribed or not?
They’re actually running LLM-based AI on the phone itself.
Yes, technically that’s a form of programming, but it’s not the deterministic, rudimentary IF…THEN kind of programming.
It’s not going to be programmed precisely on “umm.” It’s going to be given a whole bunch of training data and told “can you categorize which of these utterances have semantic meaning, and which are filler sounds?”
It’s no good to hard code it to the letters “u-m-m.” You want it to distinguish all of the uhh’s and ehh’s, and so on. And, further, you might want it to actually transcribe “umm” in cases where the user intends to write it.
So it’s more about determining intent from context (including predictive text), tone of voice, speed, and so on.
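As a toy illustration of the difference (everything here is made up for the example — real systems learn from acoustic features, timing, and enormous corpora, not five strings):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Instead of matching the letters u-m-m, train a classifier on labeled
# context windows so it can judge intent from the surrounding words.
examples = [
    ("so umm I was thinking we could go", "filler"),
    ("uh let me check that for you",      "filler"),
    ("ehh it could go either way",        "filler"),
    ("he just said umm and trailed off",  "meaningful"),  # user wants "umm" kept
    ("please transcribe the word umm",    "meaningful"),
]
texts, labels = zip(*examples)

vec = CountVectorizer(ngram_range=(1, 2))  # unigrams + bigrams as context
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

print(clf.predict(vec.transform(["umm maybe we should leave"])))
```

The point isn’t this particular model — it’s that the rule (“is this umm filler or intended?”) falls out of the training data rather than being written by hand.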
Sadly for some of us, not to be released in North America.
Also, in general: an AI researcher was recently talking about how various software that used to be called “AI” is now just… software. A lot of the current “AI” will go the same way once the novelty wears off and it’s no longer useful for marketing.
New retirement plan - brush up on my coding skills and change my (middle?) name to “Quantum AI Cloud” and get hired as a consultant so that the marketing depts can easily make outrageous claims with minimal fine print.
To expand on SamSam’s excellent answer… when a lot of people use speech recognition (or at least when my wife does), they go back and edit the transcription to correct mistakes (and remove those pause words). An AI on the phone can use that to compare what it transcribed with what the user intended, and better reconcile the two in the future.
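A toy sketch of that feedback loop (my own hypothetical example, nothing to do with how Google actually does it — just diffing the raw transcript against the user’s edit to count which words they habitually delete):

```python
import difflib
from collections import Counter

deleted = Counter()  # words the user keeps removing from transcripts

def record_correction(transcribed: str, edited: str) -> None:
    """Diff the raw transcript against the user's edit and tally deletions."""
    raw = transcribed.split()
    sm = difflib.SequenceMatcher(None, raw, edited.split())
    for op, i1, i2, _, _ in sm.get_opcodes():
        if op in ("delete", "replace"):
            deleted.update(raw[i1:i2])

record_correction("so umm I will umm call you", "so I will call you")
print(deleted.most_common())  # -> [('umm', 2)]
```

Feed enough of those tallies back into the transcription step and the phone gradually learns which “umm”s this particular user never wants to see.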