ChatGPT came into the public eye a year and a few months ago. Ever since then, there has been a growing trend across many sectors to use it to replace some of the work people do, or to offer a new way for people to find answers to whatever they may wonder.
The world of web browsers has not been spared by this trend: several browsers have integrated LLM (Large Language Model) functionality in one way or another.
Yet, even as they do so in the name of building the future, none of them seem to consider the glaring flaw in these features: LLMs are simply not suited to be conversation partners or summarization engines, and can only help with generating language at a significant risk of plagiarism.
In order to understand why all of those are fundamental problems, and not problems that are eventually going to be solved, we should examine the very nature of LLMs.
We do not want to get into a very long-winded explanation of the intricacies of LLMs here. Instead, we will settle for a shorter explanation. It might leave out some caveats, but everything said here does apply to the big popular generic LLMs out there.
Many experts in the field have already done an excellent job of explaining this in depth. Here is an interesting read: “You are not a parrot. And a chatbot is not a human”.
What are LLMs?
LLMs are just a model of what written language looks like: a mathematical description of it. Such a model is built by examining a large variety of sources, and it focuses on describing which word is most likely to follow a given sequence of other words. A bit of randomness is added to the system to make its output feel more interesting, and that output is then filtered by a second model which determines how “nice” it sounds. In several cases, this second-stage model was made by having many (underpaid) people look at what comes out of the first stage and choose whether they liked it and whether it sounded plausible.
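To make this concrete, here is a minimal, purely illustrative sketch of that first stage: picking the next word from a probability distribution, with a knob controlling how much randomness gets mixed in. The toy contexts, probabilities and the `temperature` parameter below are all invented for this example; a real LLM does the same thing over tens of thousands of tokens and billions of learned parameters.

```python
import random

# Toy stand-in for a trained model: for each short context, the probability
# of each candidate next word. These numbers are invented for illustration.
NEXT_WORD_PROBS = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "sang": 0.1},
    ("cat", "sat"): {"on": 0.8, "down": 0.2},
}

def pick_next_word(context, temperature=1.0):
    """Sample the next word; a higher temperature means more randomness."""
    probs = NEXT_WORD_PROBS[context]
    # Reweighting by 1/temperature and renormalizing is the usual trick:
    # low temperatures make the most likely word dominate.
    weights = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(weights.values())
    r = random.uniform(0.0, total)
    cumulative = 0.0
    for word, weight in weights.items():
        cumulative += weight
        if r <= cumulative:
            return word
    return word  # guard against floating-point rounding

print(pick_next_word(("the", "cat"), temperature=0.7))
```

Note that nothing in this loop knows or cares whether the chosen word is true; the second-stage filter described above only judges whether the result sounds nice.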
This has two fundamental issues:
- Copyright and privacy violations
In order to have a good idea of which word is likely to follow a set of words, it is necessary to look at a lot of text. The more text, the better, as every bit of text helps tweak the model into a more accurate representation of the language. Also, much of the text fed into it needs to be relatively recent to reflect current usage of the language.
This means there is a tremendous incentive to consume text from every recent source available, from social media to articles and books. Unfortunately, such text being baked into the model means that it is possible to cause the model to output the same text verbatim. This happens when, for a given input sequence, the model has no better choice than regurgitating the original text. As a result, these models will in some cases simply repeat copyrighted material, leading to plagiarism; the short sketch after the next point illustrates how this follows directly from the design.
Similarly, the mass of text coming from social media and other user-provided sources may well contain sensitive, private information that can be regurgitated in the same way. Some clever people have found ways to trigger this sort of behavior, and it is unlikely that it can ever be fully protected against. Being clearly aware of the risk posed by exposing private information, we have never been thrilled by the idea of it getting baked into those models.
- Plausible-sounding lies
Since the text that an LLM is built from originates in large part from the Internet at large, a lot of it is complete trash. That ranges from merely poorly written prose to factual errors and outright offensive content. Early experiments with the technology resulted in chatbots that quickly started spewing offensive language themselves, proving them unfit for purpose. This is why the output of modern LLMs is moderated by a second-stage filter.
Unfortunately, as described above, this second stage is built by people rating the output of the first stage. To make this useful, they need to examine huge amounts of output. Even the most knowledgeable people in the world could not hope to check everything for accuracy, and even if they could, they cannot know every output that will ever be produced. For those, all the filter does is help set the tone. All of this favors the kind of output that people like to see: confident-sounding text, regardless of accuracy. These models will mostly be right on widely known facts, but for everything else, it is a gamble. More often than not, they will just give a politician-grade lie.
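Going back to the first point above, verbatim regurgitation falls straight out of this design. When a passage from the training text has essentially only one continuation the model has ever seen, the most probable path through the model is the original wording itself. The sketch below reuses the toy setup from earlier and is, once again, invented purely for illustration.

```python
# A phrase copied from a single source has essentially one continuation in
# the training data, so the learned distribution puts almost all of its
# probability on the original wording. These numbers are invented.
NEXT_WORD_PROBS = {
    ("copyrighted", "opening"): {"line": 0.99, "word": 0.01},
    ("opening", "line"): {"of": 0.99, "about": 0.01},
    ("line", "of"): {"the": 0.98, "a": 0.02},
    ("of", "the"): {"novel": 0.97, "poem": 0.03},
}

def most_likely_continuation(context, length):
    """Greedy decoding: always take the most probable next word."""
    words = list(context)
    for _ in range(length):
        probs = NEXT_WORD_PROBS.get(tuple(words[-2:]))
        if probs is None:
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)

# The model reproduces the memorized phrase word for word.
print(most_likely_continuation(("copyrighted", "opening"), 4))
# -> copyrighted opening line of the novel
```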
The right thing to do
So, as we have seen, LLMs are essentially confident-sounding lying machines with a penchant for occasionally disclosing private data or plagiarising existing work. While doing so, they also use vast amounts of energy and are happy to consume all the GPUs you can throw at them, a problem we have seen before in the field of cryptocurrencies.
As such, it does not feel right to bundle any such solution into Vivaldi. There is enough misinformation going around without risking adding more to the pile. We will not use an LLM to add a chatbot, a summarization solution, or a suggestion engine that fills in forms for you, until more rigorous ways of doing those things are available.
Still, Vivaldi is about choice and we will continue to make it possible for people to use any LLM they wish online.
Despite all this, we feel that the field of machine learning in general remains an exciting one and may yet lead to features that are actually useful. In the future, we hope that it will allow us to bring good, privacy-respecting features to our users, with a focus on improving discoverability and accessibility.
We will keep striving to provide a featureful and ethical browsing experience.