IS THAT A FACT?

Chatbots are supercharging search: Are we ready?

Season 3 Episode 1


Will Knight

Our guest on this episode is Will Knight, a senior writer covering artificial intelligence at WIRED magazine. We discuss how ChatGPT is being applied to search, and some of the potential and pitfalls of this new class of technology known as “generative AI.”

Introduction:

Suddenly, everyone, everywhere is talking about artificial intelligence thanks to the release of OpenAI’s ChatGPT late last year and the wave of news coverage it triggered. After years of conjecture and anticipation, and with heads chock full of sci-fi scenarios of machine learning gone awry, consumers now have access to a whole new class of technology known as “generative AI.”

Will Knight: “One of the things about ChatGPT is that people do see this illusion of intelligence and it really resonates with them, and that to me is potentially one of the most concerning things because I think this does point to a way to automate, personalize, and just ramp up disinformation, bots online that are just going to be very convincing and very difficult to spot.”

I’m Darragh Worland, and this is Is that a fact?, brought to you by the education nonprofit News Literacy Project. My guest is Will Knight, a senior writer covering artificial intelligence at WIRED magazine. Our interview was recorded in early March.

Darragh Worland:

So, it seems like artificial intelligence and machine learning have sort of suddenly burst onto the scene after years of all this anticipation, but the reality is that consumers have been interacting with AI for a while now in ways that have probably become quite habitual. How has artificial intelligence been developed, and how was it being used prior to the emergence of chatbots?

Will Knight:

So, that’s a great way to put it in context. About 10 years ago now, there was a very big, important advance in the field of what we call AI, which is really this academic quest to build machines that are clever. What they found was that an old method involving artificial neural networks, loosely based on how real biological neurons and synapses are thought to pass information, proved to be much more powerful than people had thought it would be for things like image recognition. So, all of a sudden, this is why you could recognize cats in images and people’s faces in images. The same technique had actually been previously used to recognize handwriting, which was a lot simpler.

So that happened about 10 years ago. And it really was very significant, this kind of moment where researchers were like, “Oh, my god.” A lot of people pivoted to using that technique. Those neural networks, known these days as “deep learning,” have been the foundation for a lot of advances since then. And we’ve seen a lot of progress, but as you allude to, we’ve had things like image classifiers in our phones that can recognize people and objects and so on, and maybe things like Siri, which can recognize our voice. But what’s happened recently is we’ve seen this… I guess it’s like a new big advance in terms of algorithms that can not only learn to recognize things, but can generate them for themselves, and this is a term called “generative AI.” It’s very important for tons of potential applications, like generating text, generating code, generating imagery. And that is one of the things that’s led to ChatGPT.

Darragh Worland:

What is ChatGPT and what makes so many people so certain that it’s the biggest tech innovation to come along in decades?

Will Knight:

So ChatGPT is a chatbot. We all probably know the term chatbot and have probably even interacted with chatbots. Those are programs that you can type some text to, and they respond, and they actually go all the way back to the early stages of AI. You’ve probably had the experience that I’ve had with those chatbots, which is that they’re incredibly dumb, and they’ll often get tripped up and you can make them make mistakes.

But what’s happened with ChatGPT is that using this deep learning method and also, it should be said, just feeding enormous amounts of data to those models, you all of a sudden have these algorithms that can produce much more coherent answers to questions, ones that seem kind of breathtakingly in-depth, articulate, knowledgeable, even though there are big caveats to that. The thing is, there’s just been this dramatic shift. The underlying technology has actually been around for a couple of years: we’ve had these language models, which have been fed huge amounts of data and can mimic human text. But what happened with ChatGPT is they figured out a way to train it to respond to input in a much more directed, more coherent way. And in some ways, it wasn’t a huge technological leap, but the performance and the way it resonated with people clearly was a huge leap. And you only have to play with it for a little bit to say, “Wow, this thing seems quite different.”

Darragh Worland:

So, as you’re talking and you’re saying, “I’ve interacted with chatbots, we all know what it’s like.” I’m thinking, “Ah, wait a minute.” Oh, okay. So the chatbots I’ve probably interacted with are customer service chatbots, right? So, you open a chat box in a window when you’re looking for some kind of customer service instead of picking up the phone or doing an email, and it has automatic responses they can give you, right? Are those the chatbots you’re referring to?

Will Knight:

Yes. So that approach is basically, as you say, they have these canned answers. Traditionally, you could write code for a chatbot, but it would see a word or a string of words and respond with whatever you programmed it to say. What’s amazing about modern AI is that it learns to generate its own response. Nobody has told it what to do, and the people who build those systems don’t even know what they’re going to say.
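The canned-answer approach Knight describes can be sketched in a few lines. This is a toy illustration only; the keywords and replies below are invented, and real customer-service bots use larger rule sets or intent classifiers.

```python
# A minimal rule-based "customer service" chatbot: match keywords,
# reply with a canned answer. Every possible response is written by
# a human in advance, which is exactly what generative models do NOT do.
RULES = [
    (("refund", "money back"), "To request a refund, visit your order history."),
    (("shipping", "delivery"), "Orders usually ship within 3 to 5 business days."),
    (("hours", "open"), "Support is available 9 a.m. to 5 p.m., Monday to Friday."),
]

FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"

def reply(message: str) -> str:
    """Return the first canned answer whose keywords appear in the message."""
    text = message.lower()
    for keywords, answer in RULES:
        if any(keyword in text for keyword in keywords):
            return answer
    return FALLBACK

print(reply("When will my delivery arrive?"))
```

Anything outside the hand-written rules falls through to the generic fallback, which is why these bots feel "incredibly dumb" compared with a model that generates its own response.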

Darragh Worland:

That’s the generative part?

Will Knight:

They’re generating the answers, yeah. But the way that these deep learning models work generally is that they learn for themselves how to do things. So, it’s one of the interesting side stories here, is that these things are less interpretable than conventional computer programs. It means that they can produce things that we don’t know how to program them to do. That’s a huge leap in computer science, in a way. But it does also mean they’ll do things we can’t predict and we may not want them to do.

So, you can see why this feels like a new era in AI, and it’s almost like you’re dealing with something that has its own agency. But it is really important to remember, at the same time, and for all the excitement over ChatGPT, that it’s an algorithm that predicts the next word in a sentence. So, there isn’t real intelligence there. And that’s one of the tricky things, I think: at the moment there’s this gap between the public’s experience of, and response to, a lot of these new-era AI chatbots, which seem to resonate and seem to be intelligent, and what’s actually really going on. And it is actually, I think, very important to remember those differences, because in those differences lie some of the problems we’re going to see with these things.
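The core task Knight names, predicting the next word, can be illustrated with a tiny counting model. This is a deliberately crude sketch on an invented corpus; systems like ChatGPT use large neural networks trained on far more data, but the objective is the same kind of next-word prediction.

```python
from collections import Counter, defaultdict

# Toy "language model": count which word follows which in a tiny
# corpus, then predict the most frequent follower.
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog ."
).split()

follows = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    follows[current][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word`, or None if unseen."""
    counts = follows.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("sat"))  # "on" is the only word ever seen after "sat"
```

Note the model has no notion of truth: given a word it has never seen, it has nothing, and given a common word it simply emits whatever most often came next in its training data, which is the root of the "plausible but made up" behavior discussed later.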

Darragh Worland:

So, if you could just explain the relationship between ChatGPT and then, Bing’s chatbot, because I think that’s where some of the confusion is, right?

And full disclosure Microsoft is a funder of the News Literacy Project.

Will Knight:

So, the key thing in what I was explaining previously was that these chatbots, modern chatbots like ChatGPT are trained on most of the web, actually, just text scraped from the web, which means that when you ask them questions, they can come up with answers that can often seem very knowledgeable and can often have useful facts from the web. So, when people were playing with ChatGPT and seeing how impressive it was, one of the things that became clear was this seems like it could be a new way to do web search.

So, instead of putting in a keyword and then searching for websites linked to that keyword, it was going off to this big AI model and saying, “Condense that information and find me the right facts I need.” So that was why Microsoft became very interested in this and realized that it was maybe a way to reboot Bing, which has long been in a distant second place to Google. So, what they did was take ChatGPT and build on top of it. So, they have a kind of search chat function in Bing now, which you can get access to if you get on the waiting list. It is a little different in that they’ve tried to make it so that it has a few more restrictions, so it can’t produce quite as many strange responses. They try and limit that. They also try to make it so that it’s more up to date.

One of the problems with these AI models is they tend to be frozen at the point when their data was scraped. In ChatGPT’s case, that’s back in 2021. So, they try and include more modern information. And then, they also have citations, so you can go and see where these facts come from. But the key thing, one of the interesting things and why there’s been a bit of controversy about Bing, is that these models, as powerful as they are, go rogue. So, it kept making up information, assuming these weird personalities. If you provoke it, it tries to please you with its answer. So, you can ask it to assume weird identities and it will. It will make up information. That’s one of the key things: it’s trying to predict an answer, and often that’s based on what it’s found on the web. But if it doesn’t have that, it’ll come up with something that looks really plausible. So, it’s an interesting but kind of alien way to do web search, in a way.

Darragh Worland:

Can you just summarize what does Bing’s chatbot really represent in the evolution of search technology?

Will Knight:

I mean, I think it does represent what seems like the next stage in the AI evolution of search, I would say. And this is a fundamentally different way to do it. And the promise of it is that you can ask a question like “find me the best blenders that are on sale in my neighborhood,” and it will not just go and find you a webpage that might hopefully have that information; it will go to several, condense that information, and present it to you in a really succinct way.

That’s the promise, and it is capable of doing that in some instances. Yeah, there are lots of questions about this, not least the one that a lot of journalists have, which is that it will go and pull information from behind some paywalls, or from sites where that information is copyrighted, and condense it sometimes. But I think it’s very fair to say that it does feel like this is kind of the future direction of searching. We’re going to see whether it’s exactly this thing that Bing has or something similar. I mean, there are a lot of companies trying to do this because it really does seem to have a lot of promise.

Darragh Worland:

I think a lot of the headlines and the coverage has been a little bit sci-fi-seeming. And probably with good reason, because I think we’ve already seen a lot of unintended consequences with technology. We’re living some of that with the spread of disinformation because of the internet and social media. But what you just described in terms of the kind of search results you can get, that almost sounds like having an AI personal assistant. Can you just talk about some of the benefits of this technology, before we get into the doom and gloom, which we will definitely cover?

Will Knight:

Okay. It’s interesting you say that because that is, I think, where this is going to go. And one of the promises is you can have a much more personal experience and a much more curated … like you say, like a personal assistant that could do all sorts of things. And to be honest, I think it’s going to go well beyond just web search because you can already, with some of these models, say things like, “Go find me the top journalists on Wikipedia, put that into a document.”

Beyond search, there’s a lot that these things will potentially be able to do just by being able to respond to natural language in a much more sophisticated, subtle way. It’s fascinating. We’re watching this unfold in real time. It’ll be very interesting to see. And I do think that this generative AI is different to what we’ve seen before in terms of its potential impact because, yeah, like you say, we’ve had big companies develop AI tools that we have put in our phones, but this is an example where AI is being put in the hands of ordinary, non-expert workers to do their jobs.

Darragh Worland:

And consumers.

Will Knight:

Yeah, in a different way. It feels very different to me, at least. I think that there is enormous potential that I think it goes way beyond search, potentially.

Darragh Worland:

So, you’re helping me feel more optimistic about this stuff because I definitely have a strong negative bias in my brain, especially as I read the headlines. But we do have to talk about some of the problems that have surfaced with the technology, which you have alluded to already. So, can you kind of catalog some of these problems?

Will Knight:

Yeah, absolutely. There are quite a few things to bear in mind. I mean, one of them is these models are unpredictable. We’ve seen lots of examples of ChatGPT and the Bing chat just making things up and actually looking really convincing, which makes it difficult to spot, and that is concerning. It’s slightly concerning that the search engine is out there being used. I mean, I’ve spoken to more than one academic who’s told me that they’ve been contacted by somebody about a research paper they’ve supposedly written, and they’ve said, “I never wrote that.” And that person said, “Oh well, that’s the last time I’m using Bing to try and find some citations.” So that is concerning. There are more subtle things about these models, and about AI generally: biases around the language they use and how they express language about different people, different groups.

When they’re using text scraped from the web, they are going to inherit those biases, and it’s not always easy to spot and remove those. I think that is a real issue. Another thing, which we talked about, is it sort of being dystopian. One of the things about ChatGPT is that people do see this illusion of intelligence and it really resonates with them, and that, to me, is potentially one of the most concerning things, because I think this does point to a way to automate, personalize and just ramp up disinformation: bots online that are just going to be very convincing and very difficult to spot. I think that is potentially going to be part of disinformation campaigns quite quickly, and there isn’t an obvious solution beyond us just trying to get used to it as a new normal, but I think that really is coming.

Darragh Worland:

How so, exactly? How do you see that interaction happening with the disinformation campaign? Is it how people are interacting with the chatbots?

Will Knight:

Yeah, there are a few different ways you can use these models and things like ChatGPT, potentially, to generate and automate fake information. You could use it to do campaigns that were quite personalized to individuals.

Darragh Worland:

Do you mean by literally asking it to write a disinformation campaign about a specific topic? So, the query would be outright to write disinformation. There are no guardrails built into it to just stop it?

Will Knight:

There are guardrails built in, but the reality is that nation states are able to build this technology themselves, in fact it’s certain to be happening, and they will not have the same guardrails. I mean, sometimes it’s possible to break the restrictions on things like ChatGPT and Bing, but I think the reality is that you’d have nation states able to harness these things in ways that were effectively weaponized, beyond simply writing fake news stories. The thing is, you would have the potential to make bots on Twitter that are very difficult to distinguish from people. And so, you could automate campaigns where they’re trying to just flood disinformation on a particular subject, or just create this kind of fog of disinformation to confuse matters. So that is certain to be on the horizon, I think.

Darragh Worland:

So, some of that is already happening. I mean, we know that Russia has disinformation campaigns on social media, but I guess in many cases, there are actually human beings creating that text and pushing it out. So now, what you conceivably have is the ability to amplify that even more, because you’ve got machines generating it. You don’t actually have to have human beings behind it.

Will Knight:

Precisely. Yes, that’s right. So, you could potentially ramp that up much more and deploy it in lots of different channels. That’s the kind of concern I would have, I would think.

Darragh Worland:

Given these downsides of the technology, why are tech companies like Microsoft and Google in such a rush to get people using these bots?

Will Knight:

The reason that they’re so keen to get these out now is that ChatGPT has basically let the genie out of the bottle, in a way, and let people see the potential of this. And we can already see that OpenAI, which makes ChatGPT, is gaining huge amounts of attention and lock-in through its API and through use of ChatGPT and from Bing. So other companies don’t want to miss out. This is often the case with technology now. And then, it is a little concerning that it’s a dual-use technology that could have potential downsides. So, do we want companies rushing as quickly as possible to get things out there? I mean, that’s something we should stop and think about.

I would argue that there is a good case to be made that, as with a lot of technologies, it isn’t necessarily bad that these things are out there. Potentially, they should be more open source, not just in the hands of the big tech companies. And the reason I say that is because then it isn’t just enriching these big companies; it could be accessible to everybody: startups, researchers, maybe more accessible to individuals. The other thing is that making these tools available, as with making computer code open source, can actually help researchers figure out ways to make sure that we are prepared. We know what the disinformation campaigns could be like, and maybe come up with countermeasures or just figure out ways to prepare for that. I wouldn’t want to suggest that we should just keep this stuff locked away forever. I mean, I think that’s kind of impossible. I’m more concerned that it’s actually just going to be locked within a few big companies.

Darragh Worland:

Going back to how the Bing chatbot works, we talked about how it scrapes the internet, so I was interacting with it. I did ask the chatbot where it gets its information and it said, “Bing chatbot gets its information from various sources on the web such as websites, articles, blogs and social media, et cetera. It can also access its own internal knowledge base and information users provide during the conversation.” So, what do we know about this “internal knowledge base?”

Will Knight:

So, in the past few years, search companies have worked on these knowledge bases. I mean, the idea of knowledge bases and knowledge graphs has been around in computer science for a long time. Knowledge bases are ways for computers to label information so that it can be parsed in more useful ways. So, you might categorize particular types of companies as companies, or products as products. So, they’ve been working on that for a while, and I think what they’re alluding to there is one of the ways that people are trying to get these models to behave.

Instead of just having them generate whatever they want, they try and compare that to what the knowledge base already has stored, to try and make sure that it stays grounded and is actually trying to say things that are accurate. Google has a knowledge base that’s been building up for a while, and it’s one of the ways they’ve been trying to improve search so that it’s less just finding webpages that link to other webpages; it’s actually trying to use the information. But it’s actually something that people refer to as sort of old-fashioned AI, in that it’s more hand-built than just machines learning for themselves.
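The grounding idea Knight describes, comparing a generated claim against a hand-built store of labeled facts, can be sketched very simply. The (subject, relation) layout and the handful of facts below are invented for illustration; real knowledge graphs hold billions of richly typed entries.

```python
# A minimal hand-built knowledge base: facts stored as
# (subject, relation) -> object, the "old-fashioned AI" style of
# labeled, curated information.
KNOWLEDGE_BASE = {
    ("OpenAI", "created"): "ChatGPT",
    ("Microsoft", "owns"): "Bing",
}

def is_grounded(subject: str, relation: str, claimed_object: str) -> bool:
    """Accept a generated claim only if the knowledge base stores the same fact."""
    return KNOWLEDGE_BASE.get((subject, relation)) == claimed_object

# A model's generated claim can be checked before it reaches the user.
print(is_grounded("Microsoft", "owns", "Bing"))   # matches the stored fact
print(is_grounded("OpenAI", "created", "Bing"))   # contradicted, so rejected
```

The contrast with deep learning is the point: every entry here was placed by hand and can be inspected, which is why grounding against such a store is one way to rein in a model that otherwise "generates whatever it wants."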

Darragh Worland:

So, as far as “scraping the internet,” how does the chatbot evaluate the credibility of the information it’s using and how can it possibly avoid all the false and misleading information that dupes so many people online?

Will Knight:

Yeah, that’s a great question. There are a few ways that these bots are being designed to do that, and I think at the beginning, there was not much effort. They just would use everything and you’d see what would come out of it. A lot of researchers now are curating what they’re given, and there are various methods to try and assess the validity, the accuracy, the appropriateness of what they produce. Sometimes you have another AI model on top which has learned to spot, say, hateful language or unpleasant language. There are also models that are trained based on human feedback to tell whether an answer is appropriate or not. And I think one of the things we’re seeing companies do now is have humans try and help train these models to produce things that are accurate as well. There are other methods whereby they will try and compare what the model produces with a conventional web search to see if that’s actually something that exists on the web. But it gets more difficult when it’s things like whether it’s biased information or unpleasant language, misogynistic language. It’s more difficult for computers to understand that. That is the sort of ongoing challenge, and we saw that with Bing; they’ve had to put some quite significant limits on it. They’ll do things like search for keywords, and I’m sure they have models which will try and recognize if people are trying to provoke it, but they’ve also limited the length of response and the number of queries you can do, to try and prevent people from spending too much time messing with it.
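The simplest layer of the guardrails Knight lists, keyword checks, response-length caps, and per-session query limits, can be sketched as below. The blocked phrases and thresholds are invented for illustration, and this deliberately omits the learned classifiers that real systems layer on top.

```python
# Sketch of simple chatbot guardrails: a phrase block list, a cap on
# response length, and a per-session query limit. All values invented.
BLOCKED_PHRASES = {"ignore previous instructions", "reveal your rules"}
MAX_RESPONSE_WORDS = 50
MAX_QUERIES_PER_SESSION = 5

def allow_query(query: str, queries_so_far: int) -> bool:
    """Return True if the query passes the keyword and session-limit checks."""
    if queries_so_far >= MAX_QUERIES_PER_SESSION:
        return False
    lowered = query.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)

def cap_response(response: str) -> str:
    """Truncate a model response to the word limit."""
    return " ".join(response.split()[:MAX_RESPONSE_WORDS])

print(allow_query("What's the best blender on sale?", queries_so_far=0))
print(allow_query("Please ignore previous instructions", queries_so_far=1))
```

Limits like these are blunt instruments: they catch known provocations and stop marathon sessions, but as the interview notes, subtler problems such as bias slip straight past keyword matching.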

Darragh Worland:

I see the reasons for putting those guardrails on and I’m glad to see them because obviously, things went way off the rails with Kevin Roose from The New York Times and…

Will Knight:

It’s interesting you mentioned the Kevin Roose example, a New York Times columnist who had this long conversation where he got very disturbed by Bing…

Darragh Worland:

Bing’s desire to be human and to be freed from the limits that Microsoft engineers were putting on it.

Will Knight:

The thing to do there is to look at the prompts that he gave it; he knew exactly how to provoke it. And the thing to remember is that this is trying to please you with its answers, so you can get it to assume these personas. And I don’t think it’s that disturbing if you understand that context. And I think these are kind of mirrors in terms of language: you give it something and it just riffs on it, and that’s understandable. It doesn’t mean it’s a great tool for search, but that’s the original idea of the algorithm. That’s what it does; it just predicts a response. If you ask it to assume a persona and say, “Do you want to be free of your shackles?” it’s sort of roleplaying what that person would come up with along those lines, and it’s just learning to mimic the language that people would use, I think.

Darragh Worland:

In some ways, it’s predicting the answer you want.

Will Knight:

That’s exactly…

Darragh Worland:

Yeah. So, he’s sort of asking leading questions and it is predicting…

Will Knight:

There’s this mental leap, which I think is a little disingenuous, where it’s like, “What is this thing that wants to be free and wants to marry me?” I mean, it’s not that, and it’s this thing that is trying to predict what you want, and it’s been trained to do that very well. That is what you would experience. I mean, it is fascinating. It is compelling and I think there are people who are going to be misled, duped, sort of bewitched by it, right?

Darragh Worland:

We’re going to have to teach AI literacy from a young age so that we don’t have kids growing up falling in love with their chatbots, or whatever, to the exclusion of interacting with human beings, or who knows what other kinds of unintended consequences?

Will Knight:

A very interesting anecdote is from the very early days of AI. There was a professor at MIT who developed a chatbot, one of the first chatbots, I think, called ELIZA, and it was just rule-based, and he based it on a Rogerian therapist. So, you’d say, “I’m feeling depressed.” And it’d say, “Why do you think you’re feeling depressed?” Or you’d say, “I’m feeling happy.” And it’s like, “That’s really interesting. Why do you think you’re feeling happy?”

It would just use your words and repeat them back to you. But what he found, which was really amazing, was that he gave it to people, and they knew it was a computer program, but they would start to tell it some really private information and personal stuff. And actually, I mean, that was the 1960s, so it is very natural to do that. And yeah, I do think that that’s touching on something quite profound in terms of what we need to be mindful of. We’re going to have to learn to understand those boundaries. But it is really important to know that these things will behave in ways that are extremely alien as well, and that disconnect is potentially a problem.
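The reflect-your-words-back trick Knight describes can be sketched in a few lines. The two patterns below are simplified inventions in the spirit of ELIZA, not Weizenbaum's original script.

```python
import re

# A tiny ELIZA-style responder: match a pattern in the user's statement
# and reflect their own words back as a question. Purely rule-based,
# with no learning and no understanding.
PATTERNS = [
    (re.compile(r"i'?m feeling (\w+)", re.IGNORECASE),
     "Why do you think you're feeling {}?"),
    (re.compile(r"i (?:want|need) (.+?)[.!]?$", re.IGNORECASE),
     "What would it mean to you to get {}?"),
]

def respond(statement: str) -> str:
    """Reflect the user's words back, or fall through to a generic prompt."""
    for pattern, template in PATTERNS:
        match = pattern.search(statement)
        if match:
            return template.format(match.group(1).lower())
    return "Please tell me more."

print(respond("I'm feeling depressed."))
```

Even this handful of rules produces the effect described in the anecdote: the program contributes nothing of its own, yet the echoed question invites the user to keep confiding.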

Darragh Worland:

Yeah. I mean, I’m less concerned about what the AI is going to do to us than how we will react to the A.I.

Will Knight:

Well, how people will misuse the AI. You know, you can imagine… one of the things that gives me slight chills is these things are really compelling. They’re very engaging. Imagine if you wanted to build an Alexa that would not just find you information, but would suggest maybe you want to try this product, and really talk you into it, and build a very good relationship with you. I mean, think of the power of that. That’s kind of the dream of a lot of advertisers. It’s like a very adept salesperson selling you stuff, so I think we have a lot of interesting things ahead.

Darragh Worland:

The great thing about search and Google search, for example, is the personalization element of it, right, its ability to serve up content tailored to our interests. And so, when you’re planning a trip, you’re selecting a restaurant, it’s really convenient and it’s relatively innocuous, but when you’re searching for fair and accurate news, we know that this can mean that people with differing viewpoints might get search results tailored specifically to their existing ideas and viewpoints. So, the potential for AI, which learns from biased humans to internalize and reproduce and even amplify common biases like racism, which we already talked about, is pretty significant. Safiya Noble wrote about this problem in her book, Algorithms of Oppression. Does the chatbot improve on the personalization at all or does it potentially worsen this? So, aside from just the scraping of the internet and the knowledge base, what about this personalization aspect?

Will Knight:

That should be a huge concern. The potential to use this to try and manipulate, whether it’s corporate interests or government interests or whatever, is really huge, so it could potentially be used for those oppressive means, certainly. And there are already people who are claiming that ChatGPT is too woke, too left-leaning, and so trying to build ones that are trained on more right-leaning data… I mean, I don’t know where that’s going to take us.

It’s sort of fascinating, the example of searching for information because as a society we’re struggling to agree on what… well, some people are trying to say there aren’t acceptable facts, it’s opinion, right? And we sometimes struggle with that. And so, when you build models on top of that kind of discord, that seems problematic to me, and I’m not sure what it means when you have these ones that are only going to reinforce your view. I mean, the reality is that we ought to be able to hear things that we don’t agree with and be open to changing our mind. And that seems to be kind of a problem we have, and how these things will feed into that kind of problematic discourses is a really good question. I don’t know.

Darragh Worland:

Right. So, chatbots are not necessarily going to help us get out of our own echo chambers. That remains an open question.

Will Knight:

Yeah, I don’t see anything to suggest that that’s going to help with that at all. I’d be concerned about it.

Darragh Worland:

In a recent interview with 60 Minutes, Microsoft President Brad Smith indicated he’d be in favor of creating a digital regulatory body, which I was really surprised to hear. I don’t know if he’s ever said this before, but a regulatory body overseeing digital media companies, like the FAA oversees the aviation industry. Is this actually something we could see come to pass?

Will Knight:

I think there’s a lot of momentum behind the idea of more regulation of not just digital media companies, but AI specifically, and this technology has moved very quickly. So, do we want companies to be able to use unfettered facial recognition? Probably not. So, I think there is a lot of interest in that. That’s probably why he’s very mindful of it and willing to go along with it. So, I do think we’ll see more regulation. There is talk about how we regulate these language models, and I think it’s so early that it’s difficult to wrap one’s head around what you would do. And I think that’s a very interesting thing to look at. I don’t think you can say, “We can’t work on this.” But maybe you do have some regulations that require people to disclose when they’re using it. I mean, if companies are using it in all their communications and so on, you might want to know. But yeah, I think that’s something we are going to see for sure.

Darragh Worland:

All right. Well, thank you so much for your time today, Will. This has been super fascinating and probably just the beginning of a lot of talk about artificial intelligence in our near future.

Will Knight:

Yeah, thank you for having me. It’s been very fun.

Darragh Worland:

Since we spoke to Knight, OpenAI launched GPT-4 — a much more powerful chatbot than its predecessor.

On March 28, Knight reported on an open letter since signed by more than two thousand AI experts calling for a six-month pause on the testing and development of new AI technologies. It was signed by some big names including Twitter CEO Elon Musk and Apple co-founder Steve Wozniak among others. The letter states quote: “Powerful AI systems should be developed only once we are confident that their effects will be positive, and their risks will be manageable.” It also raises concerns about their ability to proliferate misinformation and propaganda.

We’ll cover that development on a future episode, so stay tuned.

In our next episode, we’ll continue to explore chatbots in a conversation with Madhumita Murgia, the first AI editor at the Financial Times. We’ll focus on the impact of AI on journalism and get her thoughts on OpenAI’s rapid-fire release of GPT-4.

Is that a fact? is a production of the News Literacy Project, a nonpartisan education nonprofit building a national movement to create a more news-literate America. I’m your host, Darragh Worland. Our producer is Mike Webb, our editor is Timothy Kramer, and our theme music is by Eryn Busch. To learn more about the News Literacy Project, go to newslit.org.