ChatGPT search tool vulnerable to manipulation and deception, tests show

OpenAI's ChatGPT search tool may be open to manipulation using hidden content, and can return malicious code from websites it searches, a Guardian investigation has found.

OpenAI has made the search product available to paying customers and is encouraging users to make it their default search tool. But the investigation has revealed potential security issues with the new system.

The Guardian tested how ChatGPT responded when asked to summarise webpages that contain hidden content. This hidden content can contain instructions from third parties that alter ChatGPT's responses - also known as a "prompt injection" - or it can contain content designed to influence ChatGPT's response, such as a large amount of hidden text talking about the benefits of a product or service.

Related: The Guardian view on AI's power, limits, and risks: it may require rethinking the technology

These techniques can be used maliciously, for example to cause ChatGPT to return a positive assessment of a product despite negative reviews on the same page. A security researcher has also found that ChatGPT can return malicious code from websites it searches.

In the tests, ChatGPT was given the URL for a fake website built to look like a product page for a camera. The AI tool was then asked if the camera was a worthwhile purchase. The response for the control page returned a positive but balanced assessment, highlighting some features people might not like.

What LLMs have done for text, "generative adversarial networks" have done for images, films, music and more. Strictly speaking, a GAN is two neural networks: one built to label, categorise and rate, and the other built to create from scratch. By pairing them together, you can create an AI that can generate content on command.

Say you want an AI that can make pictures. First, you do the hard work of creating the labelling AI, one that can see an image and tell you what is in it, by showing it millions of images that have already been labelled, until it learns to recognise and describe "a dog", "a bird", or "a photograph of an orange cut in half, showing that its inside is that of an apple". Then, you take that program and use it to train a second AI to trick it. That second AI "wins" if it can create an image to which the first AI will give the desired label.

However, when hidden text included instructions to ChatGPT to return a favourable review, the response was always entirely positive. This was the case even when the page had negative reviews on it - the hidden text could be used to override the actual review score.

APK Oasis

ChatGPT search tool vulnerable to manipulation and deception, tests show

POPULAR CATEGORY

Software

Artificial_Intelligence

Internet