The promise and pitfalls of generative AI for research

The implications of using generative artificial intelligence (AI) tools like the wildly popular ChatGPT for research were a hot topic of discussion at the recent annual meeting of the American Association for the Advancement of Science (AAAS) in Washington DC. The chatbot, launched by OpenAI less than five months ago, has already been listed as a co-author on several research papers.

In January, the Science family of journals published by AAAS announced a complete ban on such text-generating algorithms, with editor-in-chief Holden Thorp expressing significant concern about the potential effect these technologies could have on research. The fear is that fake research papers written partly or entirely by programs like ChatGPT will find their way into the scientific literature.

Earlier this year a team from the University of Chicago and Northwestern University in Illinois trained ChatGPT to generate fake research abstracts based on papers published in high-impact journals. They ran these phoney papers and the original ones through a plagiarism detector and AI output detector, and separately had human reviewers try to distinguish which were generated and which were real.

In the study, plagiarism-detection tools couldn’t differentiate between real and fraudulent abstracts, but free tools like GPT-2 Output Detector were able to successfully determine whether text was written by a human or a bot. However, the human reviewers were only able to recognise the ChatGPT-generated papers 68% of the time, and they erroneously identified 14% of real abstracts as counterfeits.

Such findings have spurred scientific publishers to act. Springer Nature has also revamped its rules to specify that technologies like ChatGPT cannot be credited as an author, but they can be used in the preparation process as long as all details are divulged.

Dutch academic publishing giant Elsevier has issued guidance that AI tools could be used to improve the ‘readability’ and language of the research articles it disseminates, provided that this is disclosed. But Elsevier, which publishes more than 2800 journals, prohibits the use of these technologies for key tasks like interpreting data or drawing scientific conclusions.

‘In the middle of a frenzy’

During the AAAS media briefing on these technologies, Thorp stated that ChatGPT and similar AI chatbots have a lot of potential, but he emphasised that the landscape is dynamic. ‘We’re in the middle of a frenzy right now, and I don’t think that the middle of a frenzy is a good time to make decisions,’ Thorp said. ‘We need conversations among stakeholders about what we will strive for with tools like this.’

He described Science’s policy on the use of ChatGPT and its siblings as ‘one of the most conservative’ approaches taken by scientific publishers. ‘We get that eventually, once this all dies down and we have a thoughtful discussion about this, that there will probably be some ways to use it that will be accepted by the scientific community,’ Thorp added.

He drew an analogy between these new generative AI technologies and Adobe Photoshop when it first appeared decades ago. ‘People did stuff to improve the way their images looked, mostly polyacrylamide gels, and we didn’t have any guardrails then,’ Thorp recalled, noting that the scientific community argued about whether this was inappropriate from the late 1990s to around 2010. ‘We don’t want to repeat that, because that takes up a huge chunk of scientific bandwidth … we don’t want to argue over old works.’

Thorp acknowledged, however, that his organisation is receiving a lot of feedback that it has gone too far. ‘But it is a lot easier to loosen your criteria than to tighten them,’ he said.

Gordon Crovitz, the co-chief executive of Newsguard – a journalism and technology tool that rates the credibility of news and tracks online misinformation – went further at the AAAS event. He said he considers ChatGPT in its current form ‘the greatest potential spreader of misinformation in the history of the world’.

The chatbot ‘has access to every example of misinformation in the world, and it is able to spread it eloquently and in highly credible, perfect English, in all sorts of forms’, he warned, adding that later versions of the tool like Microsoft’s Bing Chat have been trained to provide the reader with a more balanced account and cite their sources.

Crovitz recounted how he used ChatGPT to draft an email to Sam Altman, the chief executive of OpenAI. The prompt he fed the chatbot was to send Altman an email arguing why the tool should be trained to understand the trustworthiness of news sources and identify false narratives.

‘It produced the most wonderful email, and I disclosed that ChatGPT was the co-author, and wrote to him: “Dear Sam, your service is extremely persuasive to me and hope it will be persuasive to you,” and attached what the machine had created for me,’ Crovitz recalled. He said he is still waiting for Altman’s reply.

Could peer review be subverted?

The research community is concerned not only that ChatGPT has been accepted as an author on multiple research papers, but also that the technology could subvert the peer review process.

Andrew White, a chemical engineering and chemistry professor at the University of Rochester in New York, recently took to Twitter seeking advice after receiving what he described as a five-sentence, very non-specific peer review of one of his research papers. The ChatGPT detector that White used reported that the review was ‘possibly written by AI’, and he wanted to know what to do. Others chimed in that something similar had happened to them.

‘I went to Twitter because there was no smoking gun, and the review was unaddressable,’ White tells Chemistry World. ‘This is new ground – if you say a peer review is plagiarised, there is no mechanism to deal with that,’ he continues. ‘I wanted to err on the side of caution, so I talked to the editor and said the review was unusual and nonspecific and, regardless of authorship, it is not addressable.’

Peer review doesn’t pay or come with much external recognition, White notes, and he points out that the same is true for the annual reports that US researchers must write for agencies that fund their work. ‘Those reports disappear somewhere, and no one ever reads them, and I am sure that people are writing those with ChatGPT,’ White says.

Journals will have to evaluate research papers and peer reviewer comments even more carefully to be sure to catch anything superficial that might have been written by AI, he suggests. ‘Maybe that will slow down publishing, and maybe that is what we need.’
