Your news is not our business

Why personalized news recommendations and privacy are not mutually exclusive

At Cliqz we aim for content personalization and increased relevance of displayed news results without collecting private data, building user profiles, or compromising our users’ privacy.

Cliqz started as a news recommendation product[1] in 2008. During development, we realized that recommendations amount to search without the user having to state an intent explicitly: essentially, search without a query. Cliqz search was born in 2013 and was quickly followed by the Cliqz browser, which enabled us to ship our search product to users. News recommendations have remained one of the pillars of Cliqz throughout this journey.

There are technologically awe-inspiring content recommendation products out there, personalized to an almost uncanny degree. Some require a laborious setup, while others learn from every move you make online, linking data points from different services across all your devices. While personalization unquestionably requires some kind of personal data, what is truly questionable is whether this data needs to be collected en masse and a priori, sent to some company’s servers and stored in the form of a user profile that can be comfortably acted upon.

In building news recommendations at Cliqz, we made a few key choices that we strongly believe in and that keep us in line with our mission of always protecting the user’s privacy:

  1. Individuals need a shared reality, which we see as a continuously updated list of current events of general interest.
  2. Sources, i.e. news domains, matter a great deal to individuals because choosing one is a personal decision that implies trust; for example, people want political news from a source in line with their beliefs, and football news from the perspective of a particular team and source.
  3. Individual preferences can be represented by relatively large groups of people acting as interest filters.

Now let us explore how these choices translate into the content recommendations offered to our users in every new tab opened in the Cliqz browser, and how we make sure we safeguard the user’s privacy.

Figure 1: Cliqztab

Top News

The shared reality mentioned above is assembled into Top News: a short, per-country list of news articles that aims to keep Cliqz users informed about current events. All articles originate from hand-picked, well-respected and trusted news outlets in the respective country.

In order to determine the most relevant news, we first collect and process all articles published by those outlets within a defined time frame and then compare them with one another to form clusters around current events. The impact of these events, their prevalence across news sources, their presence on homepages and their publication times all help us curate the list of Top News, which is updated hourly or whenever a major story breaks.
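To make the clustering and ranking steps more concrete, here is a minimal Python sketch. The post does not describe the actual similarity measure or weights, so everything below is an assumption: articles are compared by the Jaccard similarity of their title words and clustered greedily. Each cluster is then scored by how many distinct outlets cover it, how often it appears on homepages, and an exponential recency decay with a hypothetical six-hour half-life. The names `Article`, `cluster`, `cluster_score` and `top_news` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Article:
    source: str        # news outlet domain, e.g. "bbc.com"
    title: str
    on_homepage: bool  # hypothetical signal: linked from the outlet's homepage
    age_hours: float   # hours since publication


def tokens(title: str) -> set:
    # Lower-cased title words longer than three characters, punctuation stripped.
    return {w.lower().strip(".,:;!?\"'") for w in title.split() if len(w) > 3}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(articles, threshold=0.3):
    """Greedy single pass: attach each article to the first cluster whose
    seed (first) article is similar enough, otherwise start a new cluster."""
    clusters = []
    for art in articles:
        t = tokens(art.title)
        for c in clusters:
            if jaccard(t, tokens(c[0].title)) >= threshold:
                c.append(art)
                break
        else:
            clusters.append([art])
    return clusters


def cluster_score(c, half_life_hours=6.0):
    sources = len({a.source for a in c})           # prevalence across outlets
    homepage = sum(a.on_homepage for a in c)       # presence on homepages
    newest = min(a.age_hours for a in c)
    recency = 0.5 ** (newest / half_life_hours)    # exponential decay by age
    return (sources + 0.5 * homepage) * recency


def top_news(articles, k=3):
    ranked = sorted(cluster(articles), key=cluster_score, reverse=True)
    # Represent each event by its most recent article.
    return [min(c, key=lambda a: a.age_hours) for c in ranked[:k]]
```

Comparing each article only against a cluster's first article keeps the sketch short; a production system would more likely use richer text representations than title words.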

There are no privacy concerns here, as the list of Top News is identical for all users in a given country. By default, every individual browser pulls the list of Top News associated with the country and language parameters available in the browser configuration.
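Because the Top News request depends only on coarse configuration values, it can be sketched in a few lines. The endpoint URL and the `country` and `lang` parameter names below are hypothetical placeholders, not the real Cliqz API; the point is that nothing in the request is specific to the individual user.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the post does not name the real one.
TOP_NEWS_ENDPOINT = "https://news.example.invalid/top-news"


def top_news_request(locale: str, default_country: str = "de") -> str:
    """Build the Top News request URL from a browser locale such as "en-GB".

    Only a country and a language leave the browser, so every user with the
    same configuration sends exactly the same request.
    """
    parts = locale.replace("_", "-").split("-")
    language = parts[0].lower()
    country = parts[1].lower() if len(parts) > 1 else default_country
    return f"{TOP_NEWS_ENDPOINT}?{urlencode({'country': country, 'lang': language})}"
```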

Top News is what we consider need-to-know information for a large group of users, offering a matter-of-fact view of current world events.

In every new tab opened in the Cliqz browser, Top News sits alongside the information users want to know: the personalized news experience.

History-based Recommendations

The news sources that users trust and visit are stored in the browsing history of the Cliqz browser, which makes that history a log of their personal preferences. The News team at Cliqz leverages this concrete information, hosted locally in your browser on your device, to offer a more personalized and relevant content experience.

History-based Recommendations aim for focus and increased relevance of news recommendations to our users—without harming their privacy or forcing them to endure an uncomfortable setup process.

To take a practical example, imagine a fictional user, Ben, who is bilingual, keeps up to date with environmental and scientific developments and trusts the BBC for them. He is also an avid reader of book reviews on faz.net and, like almost everybody in Munich, is especially interested in recommendations for new bars, cafés and restaurants to take his new girlfriend to.

The Cliqz in-browser intelligence, running locally in Ben’s browser, checks daily whether his preferences have changed, counts his visits to different news outlets and surfaces the three most frequently visited ones; it then uses the URL structure of each publisher’s articles to align the recommendations even more closely with Ben’s interests.
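The visit counting described above can be sketched with the Python standard library. This is a hypothetical illustration, not Cliqz code: it assumes the browser can read its own history as a list of URLs and knows the set of supported news domains, and it treats `www.example.com` and `example.com` as the same domain.

```python
from collections import Counter
from urllib.parse import urlparse


def news_domain(url: str) -> str:
    # Hostname without a leading "www.", e.g. "www.bbc.com" -> "bbc.com".
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_news_domains(history_urls, supported_domains, n=3):
    """Count visits to supported news domains in the local history and
    return the n most visited ones. Nothing here leaves the device."""
    counts = Counter(
        d for d in map(news_domain, history_urls) if d in supported_domains
    )
    return [d for d, _ in counts.most_common(n)]
```

Running this once a day on the local history would be enough to pick up changes in preferences, matching the daily check mentioned above.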

At the risk of oversimplification, Ben’s browser would display relevant article recommendations based on the following information stored locally in his browser (where specific categories are further identified from the URL structure):

  • https://www.bbc.com/news/science-environment/
  • https://www.sueddeutsche.de/muenchen/
  • https://www.faz.net/aktuell/feuilleton/buecher/rezensionen/
Figure 2: Ben’s news landscape
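The category paths in the list above could be derived from the URL structure of visited articles along these lines. The heuristic is an assumption made for illustration: it treats everything before the last path segment as the section and keeps, per domain, the section Ben visits most often. `section_of` and `preferred_sections` are hypothetical names.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def section_of(url: str) -> tuple:
    """Split an article URL into (domain, section path), treating everything
    up to the last path segment as the section. Hypothetical heuristic."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    domain = host[4:] if host.startswith("www.") else host
    directory = parsed.path.rsplit("/", 1)[0] + "/"
    return domain, directory


def preferred_sections(history_urls):
    # For each domain, count visits per section and keep the most visited one.
    per_domain = defaultdict(Counter)
    for url in history_urls:
        domain, section = section_of(url)
        per_domain[domain][section] += 1
    return {d: c.most_common(1)[0][0] for d, c in per_domain.items()}
```

Some publishers append the article ID directly to the section name instead of using a separate path segment, so a real implementation would likely need per-publisher rules.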

As illustrated above, your new tab can by design contain at most twelve news tiles. The first three tiles always show the most relevant Top News, while the remaining tiles (depending on how many of the news domains in your history we support) show personalized recommendations ranked by Cliqz’ internal article score and a history domain relevance score computed locally in your browser. Your browser also removes topical duplicates across the two features, so the same story is not shown twice.
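The tile layout described above can be expressed as a short function. This is a sketch under explicit assumptions: the post does not say how the two scores are combined, so we simply multiply them, and topical deduplication is modelled with a hypothetical `topic` key per article.

```python
from dataclasses import dataclass

MAX_TILES = 12
TOP_NEWS_TILES = 3


@dataclass
class Item:
    url: str
    domain: str
    topic: str                  # hypothetical story/topic key used for deduplication
    article_score: float = 0.0  # score delivered by the news backend


def assemble_tiles(top_news, candidates, domain_relevance):
    """Fill up to 12 tiles: three Top News first, then personalized items
    ranked by article score x local domain relevance (an assumed combination),
    skipping items from unknown domains and topics already on screen."""
    tiles = list(top_news[:TOP_NEWS_TILES])
    seen_topics = {item.topic for item in tiles}
    ranked = sorted(
        candidates,
        key=lambda it: it.article_score * domain_relevance.get(it.domain, 0.0),
        reverse=True,
    )
    for item in ranked:
        if len(tiles) >= MAX_TILES:
            break
        if domain_relevance.get(item.domain, 0.0) <= 0 or item.topic in seen_topics:
            continue
        tiles.append(item)
        seen_topics.add(item.topic)
    return tiles
```

The domain relevance scores are computed in the browser, so the final ordering never needs to be known to the backend.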

The news backend holds a list of relevant articles for each of the supported news domains (roughly 500 domains, e.g. cnn.com, politico.com, faz.net, etc.). We curate these sources carefully to ensure a baseline of objectivity. This list is updated every 15 minutes to ensure the freshness of content at all times.

The criteria to decide the relevance of news articles are based on popularity and engagement with the content—information that is also available through Human Web, our socially responsible data collection initiative.
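As a rough illustration of the backend side, the following sketch ranks each domain’s articles by a weighted mix of popularity and engagement, both normalized within the domain. The field names, the default 0.7/0.3 weights and the per-domain cut-off are hypothetical; the post only states that popularity and engagement are the criteria.

```python
from collections import defaultdict


def rank_per_domain(articles, top_k=100, w_pop=0.7, w_eng=0.3):
    """Group articles (dicts with hypothetical "domain", "url", "popularity"
    and "engagement" fields) by domain and rank each group by a weighted mix
    of popularity and engagement, each normalized to [0, 1] within the domain."""
    by_domain = defaultdict(list)
    for a in articles:
        by_domain[a["domain"]].append(a)
    ranked = {}
    for domain, items in by_domain.items():
        max_pop = max(a["popularity"] for a in items) or 1
        max_eng = max(a["engagement"] for a in items) or 1

        def score(a):
            return w_pop * a["popularity"] / max_pop + w_eng * a["engagement"] / max_eng

        ranked[domain] = [a["url"] for a in sorted(items, key=score, reverse=True)[:top_k]]
    return ranked
```

Normalizing within each domain keeps a large outlet’s traffic from drowning out a smaller one, since each domain’s list is requested separately anyway.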

Each browser subscribed to History-based Recommendations pulls the lists of relevant articles for a set of domains from the news backend. This set consists of the user’s three most visited news domains, combined with other news domains chosen at random to act as noise. Note that the only information sent is domains: not interests, not profiles, just general-purpose news domains like bbc.co.uk.
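The request composition can be sketched as follows. The number of decoy domains and the shuffling step are assumptions made for illustration; the post only says that randomly chosen domains are added as noise. `domains_to_request` is a hypothetical name.

```python
import random


def domains_to_request(top_domains, supported_domains, n_noise=5, rng=None):
    """Return the domains to fetch: the user's top domains mixed with randomly
    chosen decoy domains, shuffled so that position reveals nothing."""
    rng = rng or random.Random()
    pool = sorted(set(supported_domains) - set(top_domains))
    noise = rng.sample(pool, min(n_noise, len(pool)))
    request = list(top_domains) + noise
    rng.shuffle(request)
    return request
```

One design question this raises is how often the decoys are redrawn: an observer who sees many requests from the same browser could intersect them to find the domains that always appear. The post does not say how Cliqz handles this.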

Once the browser receives the list of new articles—hundreds of them at a time, in fact—the matching process between the content of the articles and the user’s profile takes place, which—and this bears repeating—happens entirely locally within your browser.

With this protocol we ensure that the Cliqz News backend does not learn about the interests of our users, while simultaneously enabling them to get relevant articles tailored to their interests. Note that this is a strong departure from traditional recommendation systems, in which users’ profiles are stored on servers. In such systems, there will be a file recording your interest in “Bayern München”, “Brexit” and “Boris Johnson”, for example—or even worse, containing your entire browsing history, regardless of whether it is news-related or not.

Cliqz does not want to know that about you, or anyone else for that matter. Instead, a large list of articles from your preferred domains (say, bbc.co.uk and telegraph.co.uk) is pushed to your browser, where the articles are then filtered so that only those matching your interests remain. The result is tailored content, without having to disclose your interests to us. What is personal in your browser stays in your browser; it is as simple as that.
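The local filtering step could look like the sketch below, assuming the interests are stored as per-domain URL path prefixes such as those in Ben’s list. The function names are hypothetical; the key property is that the filter runs on data already on the device and sends nothing back.

```python
from urllib.parse import urlparse


def matches_interest(url, interests):
    """True if the article's path starts with one of the locally stored
    path prefixes for its domain. interests: {domain: [path prefixes]}."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    domain = host[4:] if host.startswith("www.") else host
    return any(parsed.path.startswith(prefix) for prefix in interests.get(domain, ()))


def filter_locally(pushed_urls, interests):
    # Keep only the pushed articles that match the user's local interests.
    return [u for u in pushed_urls if matches_interest(u, interests)]
```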

This, then, is our minimum viable product and version 1.

Stay tuned! We have heard you, analyzed the needs and the market, and have plenty of ideas to turn into features and to make data transfer even more secure in the new year.

To give just a few examples, we plan to:

  • Help users discover more of what they like by suggesting content from domains that are topically related to those present in a user’s history.
  • Enlarge the time frame for recommendations (our current product is based on the standard 24-hour news cycle, which makes sense given the ephemeral nature of news, but we want to evolve from a news product into a content recommendation product).
  • Increase the number of recommendations displayed.
  • Give users more local customization power: blocking a domain, blocking a topic or giving a topic more weight.
  • Consider leveraging search history—locally in your browser.
  • And many more…

In the end, what we want to achieve—with very little to no setup effort required—is to offer you three pillars of information:

  • What you need to know → Top News.
  • What you want to know → History-based Recommendations with the tweaks and upgrades mentioned above.
  • What is nice to know → Things you might love to talk about: the new Plogging initiative in your town, the new tricks of B52 by Boston Dynamics, and something that puts a smile on your face. No cat videos, though.

Footnotes


  1. The first version of Cliqz News recommended articles to users in the form of a web application by harnessing the power of the crowd. Topics were represented by large groups of Twitter users with influence in a particular topic. The content circulated within these groups was then classified and ranked, allowing us to make predictions about stories you might want to read based on detected similarities with other users’ interests.