Death to the "chatbot passed a Turing Test" story

Of course it does.

Otherwise, where would all the calculators go?

I find it funny that Kevin Warwick pops up in all these stupid publicity stunts. In reality he doesn’t (or didn’t, when he worked at Reading) have much to do with them beyond being the front man, and he is actually an excellent control engineer.


This is also why chess is hard. If you want a computer to actually analyze every possible game of chess, you’re never going to get there. We know, though, that raw computing power could solve the problem in theory; it just can’t do it in the real world.
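To put rough numbers on why (these are the standard Shannon-style estimates, not figures from this thread): with roughly 35 legal moves per position and games around 80 half-moves long, the game tree holds on the order of 10^123 games, against only about 10^80 atoms in the observable universe. A quick sketch of the arithmetic in Python:

```python
# Back-of-the-envelope estimate of the chess game tree, assuming
# Shannon's classic figures: ~35 legal moves per position and games
# of ~80 plies. These are standard estimates, not from this thread.

branching_factor = 35    # average legal moves per position (estimate)
game_length_plies = 80   # half-moves in a typical game (estimate)

game_tree = branching_factor ** game_length_plies
atoms_in_universe = 10 ** 80  # common order-of-magnitude figure

print(f"game tree     ~ 10^{len(str(game_tree)) - 1}")        # ~ 10^123
print(f"games per atom ~ 10^{len(str(game_tree)) - 1 - 80}")  # ~ 10^43
```

Even if every atom in the universe checked one game per nanosecond for the entire age of the universe, you’d cover only around 10^107 games, still more than fifteen orders of magnitude short, which is why real chess engines prune rather than enumerate.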

Just as dedicated people put their minds to better ways of working on chess problems and built machines that play chess very, very well without our regarding them as intelligent. We can’t really say whether the same kind of thing is possible with conversations, so the two paths you lay out certainly aren’t the only possible approaches.

Bear in mind that humans manage to have conversations with substantially less than all the computing power in the universe. Can an even smaller amount of computing power run “have a conversation” without having to run “be consciously aware of what is going on” or “be capable of doing anything other than have a conversation”? We know the problem of having a conversation is solvable, and we have an upper bound on what it takes to solve it. I don’t see how we can reliably guess what the lower bound is and assume there is no “cheat” that would accomplish the task but that we would still scoff at calling an intelligence.

I think it is actually the fact that every system can be gamed that makes you so confident that humans can detect computers.

I wonder why this got more attention than the work IBM did with Watson to understand natural language questions, which is far more impressive.


The problem is that the Turing test compares putative AIs to humans to determine intelligence. I see a very problematic assumption there.

It’s not actually clear whether you can “have a human conversation” without being consciously aware.

This is essentially the philosophical zombie problem. The question is: can machines (or beings) without consciousness act exactly like beings with consciousness? If they can, it means that consciousness has no causal effect on behavior. Some, like Daniel Dennett, argue that such a proposition is incoherent, and that consciousness is required to act like a human.

A proper human conversation is probably one of the most human things we can do. Could we do it without consciousness?

I personally don’t know. I’m pretty much a physicalist, but I am sure that consciousness exists, and that I use it when I have a conversation.

There’s some truth to this idea, and a lot of it as far as the Turing tests in these contests go. You need only look at how people make these programs better to see that it’s by figuring out what features make the talk convincing, much more like chess programs than anything resembling how we think.

I think it’s all missing the point, though. Things like chess games are complex, but the moves are restricted. A true conversation is much more open-ended, because it’s not just a question of giving good responses; it can be used to explore the ways your partner understands the world.

Some random ideas:

  • Have you ever heard the fable of the frogs who wanted a king? I’m curious what you think the author was trying to say.
  • If you could spend an evening with anyone, living or dead, who would you pick?
  • If there were a way to live forever, do you think it would be something you would want to do?
  • Do you think there’s any meaning in the universe?

I wouldn’t expect anyone to have answers for all that, but I should be able to ask them to at least speculate, elaborate, or explain what they think. To some extent that’s how we conclude other people do think, right?

So if a program can give human-like discussions on such topics, I’d be inclined to say it must have at least some real aspects of human-like thinking. How much that should be the goal is a good question, but if it is, I would then say Turing’s test for evaluating it is not a silly place to start.

It’s just that these restricted conversations are something else entirely. They’re specifically set up to preclude investigating any deeper understanding, so they can be treated like complex chess puzzles. And, surprise, they end up saying nothing about anything other than those puzzles.


I don’t really know either. But Turing’s test isn’t to have a human conversation, it’s to make humans feel as if they are having a human conversation. The test is to deceive, so it’s a question of whether a machine can overcome our ability to sense that something is off. Or it’s a test that plays into our propensity to anthropomorphize, so that we start giving the machine the benefit of the doubt, at which point it can slip up quite significantly. Our “other conscious being” detector certainly has significant faults, and there’s plenty of evidence it turns up lots of false positives. I think chasing false positives is what people trying to beat the test are going to do.

But Turing’s original formulation of the Turing test wasn’t about having a real conversation; it was modeled after a party game. The point was to fool someone based on answers to questions passed under a doorway, not to carry on a great conversation for hours at a time. The fact that the Turing Test itself has slid towards having a convincing, meaningful conversation shows that it’s already fallen into “that doesn’t really count” mode. I’m pretty sure Turing’s original formulation of the test has been passed.

A thought that popped into my mind, reading these, is that Deep Blue had a database of every game Kasparov ever played. It wasn’t just a machine to play chess; it was a machine to beat Kasparov. That’s not to say it wasn’t good at chess (I’m sure it would beat me), but it’s another way that systems can be gamed. If someone wanted to make a computer to fool you in particular, it would have to be able to answer these questions, but to fool me it would need something a little different. Still, to pass the test it only has to fool one jury, not be generally able to carry on any conversation with anyone.

I wonder. Right now most seem unbelievably poor at even recognizing what they’re talking about, which you hardly need an in-depth conversation to notice.

I guess this comes down to what you say next, about different juries having different standards:

When nobody is interested in probing whether their partner has memory, let alone understanding, no conversation is going to test for those things, and so the heuristics we have now might be enough. Was the idea really to fool one non-expert jury, though? I sort of assumed it was to be able to fool juries on a more or less consistent basis.

As for me, it’s not like I would use the same questions every time; for instance, I would ask about completely different stories. And if your program can listen to any random story I tell it about any random subject, give a plausible answer about the meaning or motivations, and support that with evidence from the story, I’m ok with saying it must have some form of understanding in it.

Now it’s easy to say that’s just because nobody has made such a program yet, and when we do I’d want something else, but I’m skeptical. Chess was picked not because it was reflective of how people think, just because it was hard, and nobody considered the possibility of tuning up something to be good at chess and incapable of anything else. Go turns out to be harder still, but I would hope by now everyone understands that a Go program would end up the same.

But being able to interpret real or hypothetical situations, so far as coming up with meanings and motivations, and support those answers? What exactly would anyone expect from an intelligent program that doesn’t actually fall under that umbrella? Maybe someone could wave such a thing off as merely a world-interpreting-program, but as far as I can tell that’s essentially what human understanding is, too. I’m genuinely at a loss to imagine how I would draw a distinction between the two.

I guess maybe the test is silly, but I still don’t think the choice of task is.

Back in the day, I had more fun playing with a ‘non-intelligent’ language learner called Niall than with things like Eliza. Basically, it worked out sentence structure from what you typed and tried to write its own.
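That description sounds like the same family as a simple Markov-chain babbler: learn which words tend to follow which from what the user types, then generate new sentences from those statistics. Niall’s actual internals are my guess here; this is just a minimal Python sketch of the general technique:

```python
import random
from collections import defaultdict

class Babbler:
    """A minimal Markov-chain sentence learner, sketching the general
    idea behind 'non-intelligent' language learners like Niall.
    (Niall's actual algorithm is an assumption here.)"""

    def __init__(self):
        self.follows = defaultdict(list)  # word -> words seen after it
        self.starters = []                # words that began a sentence

    def learn(self, sentence):
        words = sentence.split()
        if not words:
            return
        self.starters.append(words[0])
        for current, nxt in zip(words, words[1:]):
            self.follows[current].append(nxt)

    def babble(self, max_words=20):
        if not self.starters:
            return ""
        word = random.choice(self.starters)
        out = [word]
        for _ in range(max_words - 1):
            choices = self.follows.get(word)
            if not choices:
                break  # hit a word never seen mid-sentence
            word = random.choice(choices)
            out.append(word)
        return " ".join(out)

bot = Babbler()
bot.learn("the cat sat on the mat")
bot.learn("the dog sat on the log")
print(bot.babble())  # e.g. "the cat sat on the log"
```

Feed it a handful of sentences and it happily recombines fragments into novel (and usually nonsensical) ones, which matches the ‘non-intelligent but fun’ feel of those programs.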

And now I see there’s an Android version…

The point of having a test is that at some point you call it quits and the test is passed. If the exercise is to keep following the computer around for the rest of its life looking for evidence that it might be a computer, then even a computer significantly more intelligent than we are would slip up eventually.

There are plenty of things that computers could do that would make me think they were intelligent. But if we formulate any specific test, and then people make computers to try to beat that particular test, my level of skepticism that any real intelligence had been produced would be very high. The passing computer is a machine designed to win at a particular game.

I guess my question is, can we be sure that our brains are the absolute minimal, most efficient design for the purpose of having a decent conversation? To me, there seems to be a lot going on in here that is quite extraneous, so I can believe that a shortcut could be accomplished.

I wouldn’t doubt your answer on that. But I do think there is an alternate question: would I trust that some form of intelligence and comprehension is necessary to have a decent conversation? I would say not at all for the cases being used, but I’m inclined to think the answer is yes for more in-depth or conceptual topics.

I guess I’m arguing that it isn’t really a particular test to tune up to any more, if you can make it depend on displaying a broad enough understanding. I think that should be possible within the framework of conversations if juries knew to try for it.
