The great tracker blocker conundrum - Technical


The web is the confluence point of diverse ideas, motivations, and interests. Sometimes these interests end up being at odds. At Vivaldi, we are keenly aware of the challenges of keeping the wishes of various parties in balance to allow the web to thrive.

When we introduced the Vivaldi Ad and Tracker blocker a few years ago, our goal was to make it serve our users, giving them the ability to be safe in the knowledge that they wouldn’t suffer from tracking, distractions, or other issues that come from nefarious ads. But we also wanted to allow them to support their favourite websites, as well as Vivaldi itself. This is why we settled on having three blocking levels: one for no blocking, one for full blocking, and, between those, one to block only the worst trackers.

This middle ground exists because, even today, important parts of the web remain largely ad-supported. While we understand that some of our users legitimately wish to be fully shielded from ads, those of you who wish to support specific sites that rely on ads should have that ability, too.

However, as time went by, we began to notice issues with this setup. We began experiencing dwindling revenue from our partner search engines and were told that revenue was unusually low. As we investigated, we came to the troubling realization that enabling the tracker blocker would block some traffic essential for those deals to function.

Naturally, as these partnerships are an important revenue source, I was set to investigate and suggest a solution. Unlike many of the ad-blocker fixes I’ve done in the past, I feel that this change deserves to be explained in more detail. There is a possibility that we revert this if it doesn’t work out, but I expect it to remain in place for some time.

Before getting into the fix itself, I would like to discuss the original issue. The reasons behind it are complex, but I will attempt to shed some light on them.

The basics

Search engine partnerships rely primarily on users performing searches using the search engine in question – a common use case. In return, search engines reward us with a fraction of their earnings for those searches. Typically, search engines make revenue when users interact earnestly with an ad shown as part of their search results. We usually expect search engines to serve useful ads based on the context of the user search.

When the ad blocker was set up to allow ads on our partner search engines, everything appeared to work fine, but that turned out not to be the case. The reasons for this are tied to the fight against ad fraud.

Ad fraud results from the current ad landscape, where almost all ad deals on the internet go through one of a few large tech companies, acting as intermediaries. These companies are the same big-tech companies you are already familiar with, namely Meta, Google and Microsoft.

This hegemony of big tech over the technology used for ads (or ad tech) means it’s almost impossible to have a direct trust relation between the entity wanting to advertise and the entity showing the ad. This means anyone could be unscrupulously using the system to commit ad fraud. Ad fraud aims to trigger large payments for ads by performing many non-genuine interactions with said ads.

Naturally, all the intermediaries have implemented mechanisms to detect ad fraud. If we want the few search engine ads we allow to function correctly, we need to make sure those mechanisms work. As it turns out, our tracker blocker was breaking them. In order to understand why, we need to discuss the history of ad fraud detection mechanisms.

The early days of ads

In the early days, all that was needed to get paid for an ad was to make sure it had loaded on your webpage, presumably indicating that someone had seen it. This is called CPM (Cost Per Mille [thousand impressions]). It was easy to set up, easy to get paid for, but also made it very easy to cheat. Anyone could send thousands of requests to load such an ad and rake in the money from this naive implementation.

Advertisers obviously quickly set out to mitigate this, by trying to establish whether an interaction with an ad was genuine. This implies verifying that the user agent interacting with the ad is trustworthy, and then attributing a payout for the ad interaction to the site that showed it. This is broadly called ad attribution. As such, the mitigations fit into three categories:

Ad attribution checks performed on the client.
Ad attribution checks performed on the server.
Raising the bar on the requirements for an ad interaction to count.

The first two kinds of mitigations are not very effective for CPM ads, as the site displaying the ads and benefiting from them fully controls the environment where they’re displayed. This makes CPM ads only really viable when there is full mutual trust between the advertiser and the site. As a result, CPM ads are rare today and pay close to nothing.

So, the bar started to rise on what ad interactions are required. The first step in that direction has also existed since the early days. It requires that the destination page of an ad gets loaded, presumably after clicking the ad itself. This is called CPC (Cost Per Click) and is still in use today, but not as much as it used to be. In its most trivial form, it is also easy to cheat by just loading the target link repeatedly.

Mitigating click fraud for CPC ads is easier. In this case, attribution checks are made on the ad-landing page. Since that page belongs to the company paying for the ad, they have every incentive to detect fraud accurately. CPC ads used to be fairly commonplace, but have become less popular in recent times as an even harder-to-cheat alternative has gained popularity.

Clicks alone no longer pay the bills.

Nowadays, a third kind of ad requirement is becoming the norm. It forces a would-be fraudster to perform an action that is costly in terms of time or money, to the point of negating the benefit of ad fraud. This method is called CPA (Cost Per Acquisition) and requires performing an action that benefits the advertised site. Only after a visitor buys something, subscribes, or starts to use a service does the owner of the site displaying the ad get paid.
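To make the difference between the three models concrete, here is a minimal sketch of how each one would pay out for the same activity. The class, the function names and the rates are purely illustrative assumptions, not real industry figures. The point is only that a bot generating loads and clicks earns something under CPM and CPC, and nothing under CPA.

```python
from dataclasses import dataclass


@dataclass
class AdActivity:
    """Hypothetical counters for one ad shown on one site."""
    impressions: int   # times the ad was loaded
    clicks: int        # times the ad's destination page was loaded
    acquisitions: int  # purchases, subscriptions, sign-ups


def cpm_payout(activity: AdActivity, rate_per_thousand: float) -> float:
    # Cost Per Mille: paid per thousand impressions.
    return activity.impressions / 1000 * rate_per_thousand


def cpc_payout(activity: AdActivity, rate_per_click: float) -> float:
    # Cost Per Click: paid per load of the destination page.
    return activity.clicks * rate_per_click


def cpa_payout(activity: AdActivity, rate_per_acquisition: float) -> float:
    # Cost Per Acquisition: paid only when the advertised site benefits.
    return activity.acquisitions * rate_per_acquisition


# A bot that loads the ad and its destination page many times but never buys.
bot = AdActivity(impressions=100_000, clicks=5_000, acquisitions=0)

# Made-up rates, chosen only to show the shape of the calculation.
print(cpm_payout(bot, rate_per_thousand=2.0))
print(cpc_payout(bot, rate_per_click=0.5))
print(cpa_payout(bot, rate_per_acquisition=10.0))
```

Under these assumed rates, the fraudulent activity is rewarded by the first two models and earns nothing under the third, which is why the bar kept moving in that direction.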

The question is then, how do we trust that a user agent or an IP address is actually trustworthy and not just part of a click farm, trying to produce realistic clicks (for CPC)? And how do we keep track of whether an ad led to a purchase (for CPA)? This is where the differences between client-side and server-side ad attribution come into play.

Server-side ad-attribution relies on information gathered directly by websites and shared with advertisers. This can be as simple as looking at your IP address. If your IP address is shared by many other apparent users, it may look as if you are trying to get fraudulent clicks on a CPC ad. This can happen if you share an Internet connection with many people, like at work or school, or when using security tools that hide your IP address, like a secure proxy (usually commercialized as a VPN).

If you have created an account on a shopping site, some details may be shared with advertisers. Those details can be cross-referenced with details from other shopping sites to determine if you are a well-known, trustworthy shopper.
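The IP-based check described above can be sketched in a few lines. This is a deliberately naive illustration, not how any particular advertiser does it: the function name, the input format and the threshold are assumptions, and the addresses come from ranges reserved for documentation.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def suspicious_ips(clicks: Iterable[Tuple[str, str]],
                   max_users_per_ip: int) -> Set[str]:
    """Return the IPs seen with more distinct apparent users than allowed.

    `clicks` is an iterable of (ip_address, apparent_user_id) pairs,
    a hypothetical log format used only for this example.
    """
    users_by_ip = defaultdict(set)
    for ip, user_id in clicks:
        users_by_ip[ip].add(user_id)
    return {ip for ip, users in users_by_ip.items()
            if len(users) > max_users_per_ip}


clicks = [
    ("203.0.113.7", "user-a"),
    ("203.0.113.7", "user-b"),
    ("203.0.113.7", "user-c"),
    ("198.51.100.2", "user-d"),
]

# With an assumed threshold of two users per address, only the shared
# address is flagged.
print(suspicious_ips(clicks, max_users_per_ip=2))
```

A heuristic like this cannot tell a click farm from an office, a school or a VPN exit, which is exactly the weakness described above.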

Client-side ad-attribution relies mostly on loading scripts from the advertiser, which report to the advertiser about the activity of the client. A given script can be loaded on any site that advertises using a given advertiser. These scripts can measure many different things, including what blend of sites the client visits, to decide if it’s trustworthy.

Unfortunately, such third-party scripts, which are loaded on many different sites and send information about what the user is doing to a third party, look a lot like trackers (and sometimes also act like them).

Naturally, they end up on both ad blocking and tracker blocking lists. This means tracker blocking tends to break this sort of attribution completely.

A tracker or not a tracker?

The core idea of ad attribution scripts is not malicious. They allow the different companies involved in advertising to find out if anyone cheated. In an ideal world, these companies would all act in the best interest of the web and its users, and throw away that data as soon as they have established that no cheating took place.

Unfortunately, as I’ve mentioned above, those companies are essentially the big tech companies, who, time and time again, have proven hard to trust. Some are happy to use ad attribution as a pretense to perform even more data gathering. They may promise to play nice, but they are fickle, and there is not much to stop them from going back on those promises.

Given this situation, it is no surprise that all those scripts end up on lists of known trackers. With that, we have everything we need to explain why tracker blocking affects ad revenues.

Indeed, when the scripts responsible for ad attribution are blocked, clicking on an ad, and even making a purchase won’t count for anything. Therefore, if the Vivaldi Ad blocker is set up to allow ads on a site while the tracker blocker is enabled, that site won’t make much ad revenue. This is a problem because, despite this issue, trackers at large are still harmful to your privacy, including some of the ad attribution trackers.

And this is where the competing interests come into play. Ad tech companies want to avoid ad fraud, while our users don’t want to be tracked. We must somehow keep both happy if we wish to keep providing Vivaldi for free while remaining competitive in terms of privacy, as we promised our users (a promise which, unlike some big tech companies, we intend to keep).

So, how do we do that?

Telling advertisers what they already know.

There are broadly two existing ways to tackle the ad attribution problem. We decided to reject the current industry trend, widely unpopular amongst browser users, which builds the data-gathering required to attribute ads into the browser. Making the browser a willing participant in the process of data gathering, and putting all the data required for ad attribution in the hands of browser makers goes against our principles.

The approach we chose was to allow requests to specific tracker URLs, for a limited time, in the context of the ad landing domain. Research on prior implementations did not lead us to find anything fundamentally better. This approach has several merits:

Allowed trackers are loaded only after clicking on what is obviously an ad.
Allowed trackers are only loaded for the ad landing site, and nowhere else, limiting potential damage.
All other trackers remain blocked.
The tracking responsibility remains with the advertisers, using preexisting web technologies.
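As a rough illustration of the properties listed above, here is a minimal sketch of a time-limited exception scoped to the ad landing site. It is not Vivaldi's actual implementation, which lives in the released source code. Every name here is hypothetical, and scoping by tab and matching on exact hostnames are simplifying assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class AttributionException:
    landing_host: Optional[str]        # site the ad led to
    allowed_hosts: FrozenSet[str]      # tracker hosts allowed on that site
    expires_at: float                  # time after which blocking resumes


class AttributionAllowList:
    """Hypothetical per-tab store of temporary tracker exceptions."""

    def __init__(self, lifetime_seconds: float) -> None:
        self.lifetime_seconds = lifetime_seconds
        self._by_tab: Dict[int, AttributionException] = {}

    def on_ad_click(self, tab_id: int, landing_url: str,
                    tracker_hosts: Iterable[str], now: float) -> None:
        # An exception is created only when the user clicks an ad.
        self._by_tab[tab_id] = AttributionException(
            landing_host=urlsplit(landing_url).hostname,
            allowed_hosts=frozenset(tracker_hosts),
            expires_at=now + self.lifetime_seconds,
        )

    def is_allowed(self, tab_id: int, page_url: str,
                   request_url: str, now: float) -> bool:
        exc = self._by_tab.get(tab_id)
        if exc is None:
            return False               # no ad was clicked: block as usual
        if now >= exc.expires_at:
            del self._by_tab[tab_id]   # the limited time has run out
            return False
        if urlsplit(page_url).hostname != exc.landing_host:
            return False               # the user has left the landing site
        return urlsplit(request_url).hostname in exc.allowed_hosts


allow_list = AttributionAllowList(lifetime_seconds=60)
allow_list.on_ad_click(1, "https://shop.example/landing",
                       ["attribution.example"], now=0)
print(allow_list.is_allowed(1, "https://shop.example/cart",
                            "https://attribution.example/a.js", now=10))
```

In this sketch a request passes only if an ad was clicked in that tab, the exception has not expired, the page is still on the landing host, and the request targets one of the listed tracker hosts. Anything else falls through to normal blocking.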

We have partnerships with multiple search engines, since we believe users should have the choice of who they use for search. So, we needed a tailored list of exceptions for each of them. We also wanted to be able to fix any issues arising on the fly. Therefore, I ended up implementing some new extra rules options for the ad blocker, which would provide the level of control we need. If you are interested in the technical details, you can find those here.

As with the rest of the ad blocker code, this implementation is fully available as part of our released source code bundles. This allows you to fully verify that our implementation is as described.

The best part of this solution is that the ad attribution scripts will typically only run in the context of an ad served by the same company that serves the tracking script.

In other words, if you click an ad served by Google as part of using Startpage (as I mentioned earlier almost all ad deals on the internet go through one of the big tech companies, acting as intermediaries), the Google ad attribution script loads. It can potentially track that you have been on that site. But Google would have known that anyway since you have clicked the ad and therefore, we are not telling Google more than they already knew.

By and large, we have done our best to adhere to this principle, so we expect that the impact on your privacy will be minimal, or maybe even nil.

What does all this mean for you?

In short, if you never click on ads, nothing changes. If you click an ad shown in the search results from one of our search partners, we will now allow a limited selection of ad attribution scripts to load, even if they are on our tracker blocking list.

These will be allowed to load for a limited time, and only as long as you remain on the site to which the ad led you. We believe that this limited scope approach will largely prevent any ability of those scripts to contribute to profile building and your privacy will remain safe.

You will find an indication of which trackers are being allowed by this mechanism in the Ad and Tracker Blocker popup.

Overall, both the ad blocker and the tracker blocker will keep operating as they always have and block requests for ads and trackers across the web.

“But I really don’t want any ads or trackers.”

That’s an understandable feeling, and we got you. If you would rather not see ads at all on our partner sites, and fully disable ad attribution, you can follow these steps:

For desktop:

Open Settings > Privacy and Security > Tracker and Ad Blocking
Click “Block Trackers and Ads”
Click “Manage sources”
In “Ad Blocking Sources”, uncheck “Allow ads from our partners”

For Android:

From the Vivaldi menu, open Settings
Select “Tracker and Ad blocking”
Select “Manage Ad blocking sources”
Uncheck “Allow ads from our partners”

Ad attribution is not currently implemented on iOS. However, we still display ads on partner search engines. You can disable those ads using the same steps as for Android.

If you choose to disable ads from our partners, we would appreciate a donation to support the development of Vivaldi. Developing a web browser is costly, and since we do not rely on investors, every bit counts.

At any rate, I support our users’ right to view and experience the web in the way they prefer, whether ad-free or not, for any reason. The work to improve the ad blocker will keep moving forward in the coming months, with the goal of reaching feature parity with uBlock Origin.

The fight for the open web continues

Ads on the Internet are in a complicated situation nowadays. On the one hand, they are still the only realistic way for some businesses to get funding; on the other hand, ad-tech companies have lowered the trust in ads, leading to more and more people installing an ad blocker.

The last thing we want is for most Internet businesses to retreat behind a paywall as a result of this trend, as this would be a large blow to the open web, and exclude anyone who does not have the means to join. Already we are seeing this in the news space where The Truth Is Paywalled But The Lies Are Free.

As always, we are fighting for legislation forcing ad tech companies to cease tracking users for the purpose of building profiles. We encourage the use of contextual ads, which do not require building a profile to be relevant. Furthermore, users must be able to trust that an ad presented to them is not a scam or malware.

With our funding hopefully secured for the future, we will continue the fight to bring this vision closer to reality.
