IS THAT A FACT?

Will chatbots change how journalism is practiced?

Season 3 Episode 2


Will chatbots change how journalism is practiced?

Madhumita Murgia

Our guest on this episode is Madhumita Murgia, the first artificial intelligence editor at the Financial Times, based in London. We talked about how generative AI is changing journalism. Our interview was recorded in late March.


Introduction

Darragh Worland: If you’ve spent any time using generative AI like ChatGPT or Google’s Bard, you’ve probably seen how it can help us do extraordinary things, but the technology could also lead us astray. When it comes to journalism, such tools can deepen reporting. But just how much should journalists trust chatbots, and how often are they using them?

Madhu Murgia: 

I’ve seen how it can be a really great assistive tool. Something like an intern who can help to draw out information from complex or long documents and help to summarize themes and ideas, which then gives you a jumping off point to go away and do the original reporting that we expect of journalists.

I’m Darragh Worland, and this is Is That a Fact?, brought to you by the education nonprofit the News Literacy Project. My guest for this episode, who you just heard, is Madhu Murgia, the first artificial intelligence editor at the Financial Times, based in London. We talked about how generative AI is changing journalism. Our interview was recorded in late March.

Darragh Worland:

Why did the Financial Times feel it was important to create an AI editor position at the paper?

Madhu Murgia:

I’ve been covering AI since about 2016, and the conversation about having an AI editor actually started before the introduction of ChatGPT, which has captured everyone’s imagination. But the goal really was to have somebody who thinks more strategically and is a bit more thoughtful about how to draw together all these different strands in AI development and how it’s being introduced in various industries, and also to have a bit more of a global perspective. And at the same time that we were having that discussion, this generative AI trend burst into the picture. And so I think the timing has been right for that, because there’s so much news coming at us. And so the role is really to interpret all of that, to look at how it impacts different industries, and also across the world.

Darragh Worland:

And do you think other news organizations are going to be following suit in creating positions like yours?

Madhu Murgia:

Yes. I already see that happening. I think MIT Tech Review has somebody who focuses on this. Bloomberg does as well, and The New York Times. So it’s happening, and it will certainly happen more. I think the reason is that this technology, which has so far been pretty much in development, has really moved into the consumer product space, and it’s now infused into products touching billions of people, like Microsoft Office or Gmail. And so it means that people want to read about it and learn more, not only because it touches their daily lives, but also because it’s going to disrupt all their different industries and different types of jobs. So, I think that’s when it really becomes this mainstream conversation.

Darragh Worland:

And it’s touching journalism and journalist jobs as well. So, how is generative AI affecting the practice of journalism and how might that evolve over time?

Madhu Murgia:

Yeah, I think this feels like the thing that’s top of mind for most media organizations and most journalists I talk to. I think part of it is because of the nature of the technology, which is how closely it mimics the things that we do and that we know are really hard to do, which is parsing information and writing things. So, it really affects us in this very elemental way. I think it’s definitely going to impact how we do our jobs.

I’ve been playing around and experimenting with it purely from trying to understand how the different versions and different companies products compare. I’ve seen how it can be a really great assistive tool. Something like an intern who can help to draw out information from complex or long documents and help to summarize themes and ideas, which then gives you a jumping off point to go away and do the original reporting that we expect of journalists.
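To make that assistive workflow concrete, here is a minimal sketch of asking a chat model to pull themes out of a long document, assuming the OpenAI Python library (pre-1.0 interface); the model name, prompt wording and helper function are illustrative, not a description of the FT’s process.

```python
# Minimal sketch of the "assistive intern" workflow described above: paste in a
# long document and ask for themes to chase down with original reporting.
# Assumes the OpenAI Python library (pre-1.0 interface) and a valid API key;
# the model name and prompt wording are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder


def summarize_themes(document: str) -> str:
    """Ask the model for the main themes and open questions in a document."""
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are a research assistant. Summarize documents "
                        "into short bullet-point themes for a reporter."},
            {"role": "user",
             "content": "List the main themes and open questions in the "
                        "following document:\n\n" + document},
        ],
        temperature=0.2,  # keep the output focused rather than creative
    )
    return response["choices"][0]["message"]["content"]

# Anything it returns is a jumping-off point: the verification and original
# reporting still have to be done by the journalist.
```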

Darragh Worland:

So, CNET was recently called out for not disclosing that it had been publishing AI-generated stories that were edited by humans, using the byline “CNET Money Staff.” Several factual errors came to light, and even plagiarized work was identified in these stories. The editor-in-chief has temporarily paused their use of AI, but he does plan to pick back up when he feels more confident in the tool and the organization’s editorial processes. What are your thoughts about this?

Madhu Murgia:

I think the reason that this evoked such strong reactions is because of what people expect from a trusted news organization. Because there’s this trust that we have between the journalist and the reader, which is, “Look, I’m going to go away. I’m going to find out what’s really true. I’m going to confirm and check the facts and I’m going to report what’s happening. That’s my promise to you as a journalist.” That means that we’ve gone away and done the work. And for example, at the FT, for any stories that we break, we have a two-source rule, which means you have to have two people independently saying the same fact in order to verify and publish.

So, I think these kinds of actions contravene that trust and that promise of original reporting. I certainly wouldn’t expect any news organization to use generative technologies like ChatGPT as part of its published work without disclosing it, just as we would credit any human assistance that we had.

And also it’s part of the public education around AI. Because we know, and we’ve been reporting this, that there are errors that crop up in these technologies, that they aren’t a hundred percent accurate. Accuracy is also what journalism is supposed to deliver: we’re supposed to confirm and be accurate in what we say. So, by disclosing that a generative AI was used, it means people come in with the understanding that this is not factually perfect. There might be use cases where that isn’t completely necessary, where it’s something more creative or fun or a depiction of something else. But when it comes to news, we have to show that we’ve done the work of verifying the facts, and disclosing is a part of that process, I think.

Darragh Worland:

You would hope that in the editorial oversight of the use of this generative AI in the reporting or the assembling of the story, you would be catching those factual errors and that plagiarism. You wouldn’t want to go to publication with a story that’s got factual errors in it, any more than you would with a human putting together the story, right?

Madhu Murgia:

Yeah. You definitely couldn’t publish something that was just generated straight out of one of these systems, because even the people producing them say that they make errors. You definitely need oversight in any use case where this is being put out to the public. Because you want to always be as accurate as possible and not pollute the public conversation with misinformation and made-up things, which the people who make these systems call hallucinations. You would definitely need editorial oversight of anything that you used. But then I also think that that’s the reason you disclose it, so that people understand, “Hey, read this with your skeptical hat on, or remember that this thing makes errors.”

Darragh Worland:

You used the word “hallucination,” and I’ve heard this word come up in relation to AI before. It’s a strange term to hear in relation to machine learning. Can you explain what it means, and why the word “hallucination”? Because it sounds like something we would apply to a human being and not to what’s ultimately a machine.

Madhu Murgia:

With these language-generating systems, it is really hard, even if you understand or report on how they work, not to humanize them, because of the way they communicate in natural language. There’s a lot of interesting commentary from linguists like Emily Bender who say we should avoid using words that mimic human thought or reasoning or remind us of how humans think, because it’s not the same thing. It’s not a thinking, reasoning being.

So, I think the word “hallucination” essentially means made-up facts. It makes things up. We can call it misinformation; we can call it false or made up, which is the easiest way to say it. But really what it means is that because these systems are just working by predicting the most likely next word in a sentence, sometimes when you ask it a question and it doesn’t know the answer, it just fills that in with anything, even if it doesn’t know. Because that’s how it works: it works by generating the most likely next word in a sentence.

And when it comes to really specific facts, these systems are trained on data that’s cut off at a certain point, and after that each one lives in its own box and isn’t connected to real-time information from the internet. So, if you ask it a really specific factual question that it doesn’t know the answer to, it will just predict what it thinks should be the next word, and the next word. And that’s where these so-called “hallucinations” come from.
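As a rough illustration of the next-word prediction Murgia describes, here is a toy sketch with an invented probability table; real models score an enormous vocabulary with a neural network, but the selection loop is the same basic idea, which is why a model with no real answer can still produce a fluent-sounding one.

```python
# Toy illustration of next-word prediction. The probability table is invented;
# a real model scores a huge vocabulary with a neural network, but the loop
# (pick a likely next word, append it, repeat) is the same basic idea.
import random

# Hypothetical probabilities of the next word given the previous word.
NEXT_WORD_PROBS = {
    "the":     {"report": 0.5, "editor": 0.3, "chatbot": 0.2},
    "report":  {"said": 0.6, "claims": 0.4},
    "editor":  {"said": 0.5, "asked": 0.5},
    "chatbot": {"said": 0.8, "wrote": 0.2},
    "said":    {"the": 0.7, "that": 0.3},
    "claims":  {"the": 1.0},
    "asked":   {"the": 1.0},
    "wrote":   {"the": 1.0},
    "that":    {"the": 1.0},
}


def generate(start: str, length: int = 8) -> str:
    """Keep appending a plausible next word, whether or not a true answer exists."""
    words = [start]
    for _ in range(length):
        probs = NEXT_WORD_PROBS.get(words[-1])
        if probs is None:
            break
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights=weights)[0])
    return " ".join(words)


print(generate("the"))  # e.g. "the report said the editor asked the chatbot said"
```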

Darragh Worland:

That sounds like a major limitation. A human being who did that too many times would be considered a compulsive liar. Isn’t that problematic if it hallucinates?

Madhu Murgia:

It’s problematic because the outcome is that it can make mistakes. But I think if we go in knowing that it can make mistakes, then we build that into how we use the tool, right? Which is why we can’t use it to just write stories that we then publish, because we already know that it can get things wrong. But because it’s a powerful tool and it’s been trained on basically all realms of digital material on the internet, it’s usually pretty accurate. And so that means that it can be used as a good summarizing tool or a shortcut tool or an ideating tool, to help you brainstorm ideas, or what rabbit holes you should be running down, or what the big themes are.

And those use cases don’t require a hundred percent accuracy, but it can still be a really helpful way to use it. So, I think the most important thing is knowing that, which is what I call public education. If we are teaching kids, you can’t use it to just submit an essay, because there can be wrong things in it, but maybe you can use it to come up with ideas or scenarios and then go away and check them. Then it can be a more useful way of approaching it.

Darragh Worland:

So, does the Financial Times use any AI in the production, reporting and editing process so far? And if so, what kind of editorial process do you have in place to avoid publishing any hallucinations?

Madhu Murgia:

So, we haven’t, to my knowledge, used any AI for anything that’s been published by us so far. But like people everywhere and journalists everywhere, we’ve definitely been trying it out and playing with it and seeing how it works and what it’s good for and what it isn’t good for, and really trying to kick the tires with it a bit. I would say it’s still really early for us to have any policies specifically around it. But I would expect, from our broad ethical training as journalists, and I can only speak for us, that any journalist would know that using a tool like this would require an internal discussion with your editor around, “Should we even use it at all, and what disclosures should there be?” But we are still very early with that. I don’t think we’re at a point where we have planned to use it in any way, and we haven’t got any set policies.

But others do. I’ve seen that Buzzfeed has said that they’re going to work with OpenAI. We recently reported that a publisher of at least 100 newspapers, national and regional here in the UK, including the Daily Mirror and the Daily Express, is exploring how to use AI to assist in news writing, helping reporters to compile knowledge around topics. They’ve got a group who are working through this stuff. It’s definitely on our minds here, and on others’ as well. And the policies, I think, will follow.

Darragh Worland:

So, GPT-4 has now been released within months of its predecessor. It just seems like the pace of this generative AI and its development has sped up so much. How much does it improve on GPT-3 in terms of its ability to stop the spread of mis- and disinformation and detect bias? And also just in general, how much does it improve on its predecessor in its ability to do anything?

Madhu Murgia:

The question you ask, about whether it’s better at mitigating bias or at avoiding discriminatory outputs, is an open research question. This is something that’s being actively studied. The reason, as you mentioned, is because it’s all happening so fast. It’s not like we’ve had months and years of researchers plugging away and playing with this stuff. It’s already out there, everyone’s using it. We’ve all got our hands on it at the same time: researchers and people who think about safety, as well as journalists, part of whose job is accountability. So, we’re all figuring this out together.

In terms of how it’s better, I have anecdotal evidence, and I’m not an expert. Just to give an example: with GPT-4, you can give it so much more text, and you can see why that would make a big difference. If I can put in 3,000 words of text versus 200, you can imagine how the responses from the two systems would compare; one would be way more nuanced and informative than the other.

So, you certainly see from my comparative playing that GPT-4 feels like the most advanced technology out there. It has more nuance. If you look at the legal sector, someone I spoke to there said that when they compared the previous generation and this one, it was just able to much better analyze a contract and spot areas where there might be legal risk within that contract, which wouldn’t have been possible before.

Or similarly, I spoke to someone at Duolingo, which is the language-learning app. And they said that with GPT-4 they now have a role-play tool. So, you can pretend that you’re talking to a barista at a Parisian cafe, for example, and have a conversation, and they have all these different scenarios. And he said that would’ve never been possible with the previous version, because this requires the AI to inhabit a character. It needs to know context and setting. So, it’s more than just the conversation.

Darragh Worland:

So, what is the potential for AI to revolutionize content moderation and detection of misinformation on social media?

Madhu Murgia:

What do you mean on social media?

Darragh Worland:

So, obviously people can create deepfakes and potentially spread them on social media. We know that dystopian future; we can imagine that. But I’m just wondering also, and there’s been talk about this, could tools like GPT-4 power Twitter’s content moderation and automatically label media shared there? Could it auto-reply to overtly false claims that have the potential to do harm? Could we employ this technology to do the work that we know these social media platforms have really been struggling to keep up with using humans?

Madhu Murgia:

Yeah, no, that’s really interesting. I haven’t thought so much about the flip side of it. But the reason it’s so hard to automate content moderation is because of all of the gray areas in how people speak. Every language has different logic embedded into it, and as you say, different languages have different ways of saying things that might be harmful or toxic. You need context of that language, of that culture. And even within a language and a culture, there are so many ways to get around saying the obvious thing. And that’s been the real struggle, which is why these companies have had to use tens of thousands of humans to do that gray-area moderation. Although Facebook in particular, now Meta, has said in the past that they’re catching a lot more of their misinformation, or doing a lot more of their moderation, through automated systems when it’s the obvious stuff.

So, I do see how, with a language-generating system, if you were able to classify a piece of content, you could get a GPT-type system to say, “This is why this is wrong.” And it would just be quicker to do that rather than getting a human to type out what was wrong with that particular statement. I guess you could scale it a lot quicker.

But I don’t necessarily see how it would avoid that same problem we have about understanding nuance embedded in language and culture, and new words that evolve very quickly. For example, I know with Twitch, and even on TikTok, I’ve heard that people evade content moderation by using completely innocuous words to mean something else, and then everybody knows what it means. But obviously the machine is eventually going to pick it up, and then how quickly can you keep evolving? So, it becomes this race between human and AI. So, again, I think it would be a really interesting research question to see how much something that parses language could help to better understand the nuance that we’ve struggled with so far.
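As a sketch of the classify-then-explain idea Murgia floats here, the pattern might look something like the following; the labels, prompt wording and ask_model helper are illustrative assumptions rather than any platform’s real system, and the evolving slang and cultural context she describes remain the unsolved part.

```python
# Sketch of the "classify, then explain why" moderation pattern discussed above.
# ask_model() is a placeholder for any call to a large language model; the labels
# and prompt wording are illustrative assumptions, not a real platform's policy.
import json

MODERATION_PROMPT = """Classify the post below as ALLOW, REVIEW or REMOVE.
Then explain in one sentence why, citing the specific wording.
Respond as JSON with the keys "label" and "explanation".

Post: {post}
"""


def ask_model(prompt: str) -> str:
    """Placeholder for a call to a language-model API."""
    raise NotImplementedError


def moderate(post: str) -> dict:
    """Return a label plus a written explanation for a human moderator to review."""
    raw = ask_model(MODERATION_PROMPT.format(post=post))
    result = json.loads(raw)  # the model is asked to reply in JSON
    if result.get("label") not in {"ALLOW", "REVIEW", "REMOVE"}:
        raise ValueError("unexpected label: " + str(result.get("label")))
    return result  # borderline cases, slang and new code words still need humans
```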

Darragh Worland:

So, all the talk at first was about Bing’s chatbot, and then Bard was just released. But I think there’s some question already about whether Bard is really ready for the public. There was a report that said Bard not only willingly created a conspiracy theory, but it also fabricated citations from well-known, reputable sources. It said The New York Times reported something on a specific date when it did no such thing. What are your thoughts about Bard? I know it’s built on technology that’s different from Bing’s chatbot, which is built on ChatGPT. What are your thoughts?

Madhu Murgia:

Yeah, so Bard is from Google, and they’ve been working on this stuff probably longer than anyone else. And this is based on a model they’ve had for a while called LaMDA. I’ve been trying out Bard since it’s been out. I think what you said about making up citations, I’ve found that also with GPT-3. This is part of the hallucinations problem that we talked about. So, if you ask it, “Has the FT written anything about this topic?” I’ve had responses where it says, “Yes, here’s the link,” and it’s just a dead link. It’s done that for other publications as well. I’ve seen people on Twitter say that it’s made things up for them similarly. And it’s not just URLs to news organizations; it’s also citations to academic papers that don’t exist.

Because again, if you look at it through the filter, it’s just predicting the next word in a sentence. It’s trying to come up with what it thinks is the answer you want. So, if you’re saying, “Can you give me a citation?” It’s like, “Yeah sure. Here’s one that’s plausible based on the question you’re asking me.” So, if you know that going in, then it makes sense why you can’t trust the factual stuff that comes out of it.

I think it’s not just a Bard issue, and they obviously have flagged it up, just like OpenAI did, that you need to check the stuff that comes out of this. With that in mind, I don’t think it’s particularly worse than anything else that I’ve tried. I think Google has been really careful about the guardrails of what you can and can’t ask it, which means that the responses are more limited compared to GPT-4. But it also means that they’ve tried to be much more careful about it going rogue, for example, and saying really bad things. That was really fascinating to see: all the different ways they tried to break it.

That’s going to have to happen continuously with these models because you can’t just test it before you put it out into the world and then stop. Because really this is the real test when the models are being interacted with by tens of millions of real humans, not a select group of employees or experts because this is how it will really be. The tires will really be kicked on how can you break this and make it say bad things? So, I think they’ll have to have an ongoing process of seeing how are people using this, how is it breaking and how can we prevent it? It’s definitely not perfect. Bard is not perfect, neither is GPT-4. You can still find ways to break them today.

Darragh Worland:

We heard the same thing from Will Knight of WIRED about the tendency of ChatGPT to want to please the user. I was grinning when you were saying that because it just makes me think of this personal assistant that’s a people pleaser. It’ll just do anything to bend over backwards to serve you, even give you a link to nowhere, which is obviously not helping you. In fact, it’s doing the opposite.

Madhu Murgia:

But I think what it’s been built to do is answer your question.

Darragh Worland:

Yeah.

Madhu Murgia:

When you’re asking it, “Oh, has the FT written about it? Can you show me a link?” It’s trying to answer your question with something.

Darragh Worland:

Why can’t it just say “no”? Why is that not an acceptable answer?

Madhu Murgia:

I think it’s part of the architecture. And what I’ve heard in my reporting, rather than seen myself, is that if you simply say, “only give me right answers” or “don’t make things up” or “stick to the facts and don’t make things up,” then that actually really improves that problem.

Darragh Worland:

That’s really useful information.

Madhu Murgia:

So, tell me what you know about X? Don’t make anything up if you don’t know the answer.

Darragh Worland:

I’m sorry to keep humanizing it. However, it is like working with an intern who really wants to appear to know things they don’t.

Madhu Murgia:

You are trying to work with a tool that’s supposed to give you an answer to whatever you want. So, if you’re more specific about the parameters and say, “Don’t make anything up,” that instruction helps it to narrow down the field of what it’s able to generate.

Darragh Worland:

It would be great if the programmers could program it to say “no, but …”

Madhu Murgia:

Well, I think it’s because it’s not connected to the internet, so it doesn’t necessarily have up-to-date knowledge of what’s going on. Bing is, obviously, because it’s the search engine connected to GPT. Even Bard, they say, is grounded on Google search responses, which makes it a little bit more accurate than it would be if it weren’t trained on Google search webpages. So, one way in which it could be more accurate, which is why I mentioned it, is if you input information like a transcript or a document; then it doesn’t do that, because it has the contours and the outlines of what you’re asking it about.

So, I think giving it specific information and instructions helps to improve that. And it’s not always useful to get an answer that’s “no,” because it depends on what you’re trying to do with it.

If you’re just trying to come up with ideas for stuff, it doesn’t matter if the ideas are a bit wrong. You don’t want it to keep coming back and saying, “This is the only idea I have, because everything else is wrong or I know has incorrect facts in it.” I think the balance to strike is between it being flexible enough to allow you to be a bit more blue-sky with it, but also for it not to respond in a way that makes it seem very persuasive while it’s giving you wrong answers.
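To make those two suggestions concrete, here is a hedged sketch of a prompt that combines an explicit “don’t make anything up” instruction with a pasted document for grounding; the wording is illustrative and reduces, rather than eliminates, made-up answers.

```python
# Sketch combining the two ideas above: an explicit instruction not to fabricate,
# plus a pasted document that grounds the answer. The wording is illustrative;
# it reduces, but does not eliminate, made-up answers.

INSTRUCTIONS = (
    "Answer using only the document below. "
    "If the answer is not in the document, say 'I don't know'. "
    "Do not make anything up."
)


def build_prompt(document: str, question: str) -> str:
    """Assemble a grounded prompt from instructions, source text and a question."""
    return f"{INSTRUCTIONS}\n\nDocument:\n{document}\n\nQuestion: {question}"


# Example usage, with a transcript excerpt as the grounding document:
transcript = "Murgia: For any stories we break, we have a two-source rule..."
print(build_prompt(transcript, "What is the FT's rule for breaking stories?"))
```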

Darragh Worland:

That’s an incredibly valuable takeaway, what you just shared. Should we expect the dizzying pace of AI advances to continue, or do you expect these technologies to plateau in the near future?

Madhu Murgia:

It’s really hard for me to know the answer, without being on the inside of one of these companies, about where this ends. Because this has been a huge debate in the community as well: does it keep getting better to the point where there is some form of intelligence, more than just a predictive technology? And there’s a whole bunch of new apps that have got these plugins, which means you can book things through ChatGPT, like on Expedia, or you can try and reserve a table at a restaurant. So, it’s moving very quickly from being something that you communicate with to something that’s able to execute actions for you by plugging into the internet. So, you can see how that quickly spreads.

And, in some ways, even if the technology itself doesn’t become more powerful, the way that it’s being plugged into the internet means that it’s going to change how we live our lives very quickly. And that certainly is dizzying. And with that come all the risks and the fears around safety and bias that we’ve been talking about, which people need to keep up with. So that’s my big takeaway.

Darragh Worland:

Thanks for listening. If you get a chance to interact with Bing’s chatbot or Google’s Bard, try Madhu’s suggestion of telling it to stick to the facts and let me know if that improves your results. Drop me a note at [email protected].

Is That a Fact? is a production of the News Literacy Project, a nonpartisan education nonprofit building a national movement to create a more news-literate America. I’m your host, Darragh Worland. Our producer is Mike Webb, our editor is Timothy Cramer, and our theme music is by Eryn Busch. To learn more about the News Literacy Project, go to newslit.org.