How product usage tracking can compromise privacy

Innocent feature tracking can quickly turn into unacceptable user behavior monitoring, user profiling and a money-spinner for a company. Tarquin Wilton-Jones, Security Expert at Vivaldi, describes how.

A user may not realize just how much profiling information is being gathered from the way they use a product.

When it comes to tracking feature usage of a product, there are two extremes.

At one end, there is the idea that the user’s activities are ultimately private: that the user has a right to privacy, and that a company should not actively monitor which features of the product the user favors.

At the other extreme, there is the approach of monitoring each and every action that the user takes within a product, recording their behavior, and using it to streamline processes for the user, or even to remove unused features to reduce maintenance.

In between, it’s a grey area.

As someone working in a software company, it is very easy to see the benefits of knowing which features of your product users are actually using, or whether one version of a feature is more easily understood than another. But aside from the risk that you might remove a feature simply because your users are unable to find it, rather than because they do not want it, this raises the problem of how to respect the privacy of your users.

A basic product check

The most basic check, which most companies will have for their products, is to know how many users their system or product has – this may be needed for statistical purposes, or for financial purposes when making agreements with partners.

For a server-side product such as a webmail service, counting accounts (perhaps checking whether they are active) is usually enough to satisfy this need. However, for a product that users download and install, counting users is more difficult. The most basic approach is to simply count downloads, and hope that the number of downloads from the company’s website matches the number of installs. But this paints a false picture, since users may install the software via third-party catalogs or corporate software distribution, or – indeed – may download it and never run it at all.

Therefore, it may become necessary to count users by having the installations notify the vendor when they are installed or running. This requires some kind of identification to prevent “reinstallation” from being counted as a new user and to make sure that repeatedly running the application does not cause the user count to increase. Identifying a single installation requires an identification token to be kept in the user’s profile so they can be differentiated from other users, and this identification token is then sent as part of the notification to the vendor.
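
As a concrete illustration, here is a minimal sketch of what such a notification could look like. The endpoint URL, the token location, and the payload shape are all hypothetical, chosen only to show the mechanics:

```python
# Minimal sketch of an installation-counting ping. The endpoint, token
# location, and payload shape are hypothetical, not any vendor's real API.
import json
import uuid
import urllib.request
from pathlib import Path

TOKEN_FILE = Path.home() / ".example-app" / "install-id"  # hypothetical location

def get_install_token() -> str:
    """Return a persistent random token, creating it on first run."""
    if TOKEN_FILE.exists():
        return TOKEN_FILE.read_text().strip()
    token = str(uuid.uuid4())  # random, but stable across runs of this install
    TOKEN_FILE.parent.mkdir(parents=True, exist_ok=True)
    TOKEN_FILE.write_text(token)
    return token

def send_install_ping() -> None:
    # Even this minimal payload lets the server link every ping from this
    # installation together, and the server also sees the client's IP address.
    payload = json.dumps({"install_id": get_install_token()}).encode()
    request = urllib.request.Request(
        "https://stats.example.com/ping",  # hypothetical endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request, timeout=5)
```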

All this makes it possible to tie an installation to a user. The vendor can see how often the software is run, whether the user changes IP address, whether they travel to other countries, and whether they run the software daily or weekly.

It’s all useful for a vendor who wants to know how their software is used and where their users are from, but it immediately invades the privacy of a user who just wants to use the software, and not share their habits and lifestyle with a company.

Even if the user feels it is acceptable to send back installation statistics, they cannot be expected to fully understand the privacy implications of this, and are unlikely to realize that just by allowing themselves to be counted, they also make it possible for a company to see other aspects of their life.

At Vivaldi, we have developed a system for counting users while maintaining user privacy.

How feature tracking starts

When developing a product, it is normally in a company’s interest to know where to spend its development resources. New features take development time to produce, and must be maintained in the future. Is a newly added feature being used? Should it be in a menu, or given a visible button to help more users find it? Does adding one feature cause other features to be used less – essentially showing that the added clutter makes other things harder to find? Is one language version performing better than others?

Tracking whether a feature is being used simply sends a ping – a minimal message saying “the feature was used”. This could be anonymized, or it could be tied to the user identifier. In both cases, the server that receives the message gets to see that a user from that IP address was using that feature.
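
As a sketch, such a ping needs almost no code; the only real design decision is whether the installation token from the earlier example travels with it. The endpoint and field names below are again hypothetical:

```python
# Sketch of a feature-usage ping; endpoint and field names are hypothetical.
import json
import urllib.request

def send_feature_ping(feature: str, install_id: str | None = None) -> None:
    payload = {"feature": feature}
    if install_id is not None:
        # Tying the ping to an installation token turns isolated events
        # into a per-user behavioral record on the server.
        payload["install_id"] = install_id
    request = urllib.request.Request(
        "https://stats.example.com/feature",  # hypothetical endpoint
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Even without install_id, the server still sees the sender's IP address.
    urllib.request.urlopen(request, timeout=5)
```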

Progress is not always progress

Feature tracking can quickly become a go-to approach for development. Developers may want to see exactly how a feature gets used, not just whether it gets used.

This can be done with laboratory-style testing, with users brought in as a focus group to test how they use the feature. However, that does not always represent how it will be used in the real world.

So feature tracking can become more and more detailed, timing how quickly a user gets through a certain section, checking which buttons they press, checking how they move their mouse, or noting whether they use a touch screen or keyboard to navigate.
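
To make the escalation concrete, here is a sketch of the kind of record such detailed tracking might send, compared with the single “the feature was used” ping above. Every field is hypothetical:

```python
# Sketch of an escalated telemetry event; every field is hypothetical.
# Compare with the single "the feature was used" ping above.
from dataclasses import dataclass

@dataclass
class DetailedEvent:
    feature: str                # which feature was used
    duration_ms: int            # how long the user took to get through the section
    buttons_pressed: list[str]  # every button pressed along the way
    input_device: str           # "mouse", "touch screen", or "keyboard"
    pointer_travel_px: int      # total mouse movement, enough to hint at motor skills
```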

Product development gets hungry for this information, and so the privacy policy and the end-user license agreement get updated with less rigid wording, allowing more and more feature tracking to take place, with users blindly agreeing without realizing what they are agreeing to.

Huge amounts of tracking and profiling information begin to be sent to the vendor, where they are collected in a database of user profiling information. This is normally anonymized to a degree: the profile may not be stored with a user account tied to it, but each profile is still the profile of a real person. Whether it can be tied back to that person depends on whether the system links it to a specific user account, yet either way it remains the digital representation of a person. If the data were ever exposed, someone with access to other behavioral data may be able to tie it to the actual person.

The loss of trust

All of this collected data can become a valuable asset: something that can be sold to other companies, or to advertising agencies, as “big data”. To some companies, this becomes a major source of revenue, while to others, user privacy is the most important consideration.

An important aspect of this is user trust. The user is unable to distinguish between the two types of companies, so if a company collects data, users may assume it is being used in a way they would not be comfortable with. The company may genuinely be trying to build a better product, but the user sees themselves becoming the product.

But corporate cultures change over time, and one day, even innocently collected feature usage data can begin to be seen as a financial gold mine. The frameworks that were built to improve products for the user now become a privacy-invading money-spinner for the company, violating the trust of the users who signed up for the purpose of improving the product.

As a company becomes larger, it can become very difficult to maintain the line between acceptable feature tracking and unacceptable user behavior monitoring. The staff who adhered to the original spirit are no longer the only staff working on the product. Newer staff may not realize the bounds they are overstepping. They may not feel that it is wrong to see how quickly a user moves their mouse towards a button, or whether that correlates with the user having selected high-contrast mode first – essentially leaking the information that the user is likely to have a physical disability.

Just don’t do it

This is one of the reasons that some companies, such as Vivaldi, outright refuse to collect such statistics. If the data is never collected in the first place, the collection can never escalate to the point of becoming privacy-invasive, and the data can never be leaked or compromised. It is much easier to retain the trust of users if they can see exactly what information is sent to a server-side service, and can see that nothing about them or their behavior is ever sent to the vendor.

Even in cases where server-side services collect minimal information for debugging purposes, such as HTTP access logs, that data can be removed as soon as it is no longer needed, to prevent it from becoming a statistical data store ripe for data mining, should there be a change in corporate culture. This can be clearly documented in privacy policies to make sure that the user can see that nothing will be retained for future use.
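
As an illustration, such removal can be as simple as a scheduled job that deletes log files older than a short retention window. The directory and the seven-day window below are assumptions, not a description of any vendor’s actual policy:

```python
# Sketch of a scheduled log-pruning job; the directory and retention
# window are illustrative assumptions, not any vendor's actual policy.
import time
from pathlib import Path

LOG_DIR = Path("/var/log/example-service")  # hypothetical log directory
RETENTION_SECONDS = 7 * 24 * 60 * 60        # keep access logs for seven days

def prune_old_logs() -> None:
    cutoff = time.time() - RETENTION_SECONDS
    for log_file in LOG_DIR.glob("access.log*"):
        # Delete anything not modified within the retention window, so old
        # logs never accumulate into a data store ripe for mining.
        if log_file.stat().st_mtime < cutoff:
            log_file.unlink()

if __name__ == "__main__":
    prune_old_logs()  # typically run daily from cron or a systemd timer
```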

In a privacy policy, it is much more reassuring for a user to see that “we do not collect usage statistics” than to see “we collect statistics for the following 10 purposes, and we promise not to misuse the data, but we reserve the right to update this privacy policy in the future”.

Data privacy

Legal systems are slow to respond to these changes in privacy risks. Most nations do not currently have sufficient protection for user data. GDPR has only recently become established within the EU, but other countries are still working on their equivalents. Regulations may not go far enough to protect a user from anonymized data collection. Unless they have specifically signed up for a behavioral profiling study, a user may not realize just how much information is being gathered about the way they use a product.

Feature tracking doesn’t sound very threatening, but in many cases it still creates a behavioral profile that reveals personality traits and potentially even medical conditions.

Even if we assume that the vendor will always be trustworthy, user data must be stored in such a way that, in the event of a compromised server, it will not fall into untrusted hands.

Listening to users

While all of this feature tracking goes on, it is all too easy to stop doing the most important thing: listening to users. Users are the life of the industry. Those people, those real humans, are what the product is made for. They have desires for the product that will not show up in a statistic. They may have wanted to use a feature but were unable to find it, and so the apparently unused feature gets removed.

Sometimes, there is also an unseen link between users and minor features. Many software companies rely – knowingly or unknowingly – on the goodwill of users. A user who enjoys a specific minor feature may encourage other users to use the product, even though the new users might not make use of that specific feature. Removing the feature because it is not used by many users can alienate the once-loyal user, and in the long term reduce the free promotion of the product that the user was providing. This is something that happens regularly when a company stops listening to users and starts relying on statistics.

Even though feedback from users can often be negative – people are much quicker to complain about a problem than to offer praise for a positive experience – making a user feel like their voice is heard can have a dramatic effect on product uptake. A person rather than a statistic. A welcoming community rather than a heartless corporation. I work for a company, Vivaldi Technologies, that chooses to do the right thing.

* * *

This blog was first published on Tutanota.com
