Implementing the Dat protocol in Cliqz

How we run a node application inside Gecko

Imagine a Web where you can create a new website from your Web browser with a single click and share it with your friends, without the need for any intermediate hosting provider. In the latest version of the Cliqz desktop browser we are releasing experimental support for the distributed web protocol Dat which enables exactly this. This is a new protocol for loading websites not from centralised servers as with the ‘traditional’ web, but from other peers who are visiting the site. Protocols like Dat have the potential to massively reduce barriers for users to publish content on the Web, and we want to encourage their adoption to help users get away from centralised walled gardens such as Facebook, Medium and Blogspot.

Since the beginning of the Web, browsers have relied on client-server protocols such as HTTP and FTP to load content from the Web. Addresses in these protocols point to a specific server where content should be loaded from, for example the address http://0x65.dev/index.html instructs the browser to look for the server at 0x65.dev and request the file at /index.html. HTTPS further entrenches the server by requiring that the server proves its authenticity with a certificate. Dweb protocols, like Dat, flip this around: A Dweb URL contains the information required to both find and verify the authenticity of the content. The browser can then retrieve this data from any user (or ‘peer’) who claims to have a copy.
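
As a rough illustration of what such a URL carries, the sketch below splits a Dat URL into its parts. It assumes the common form dat://<64-character hex public key>[+version]/path. The helper name parseDatUrl and the example key are made up for illustration, and hostnames that are not raw keys (which first have to be resolved to a key) are only flagged here, not resolved.

```javascript
// Hypothetical helper: split a Dat URL into host, version and path.
function parseDatUrl(url) {
  const match = /^dat:\/\/([^\/+?#]+)(?:\+([^\/?#]+))?([^?#]*)/.exec(url);
  if (!match) {
    throw new Error(`Not a Dat URL: ${url}`);
  }
  const [, host, version, path] = match;
  return {
    host: host.toLowerCase(),
    // A raw address is a 32-byte public key written as 64 hex characters.
    isKey: /^[0-9a-f]{64}$/i.test(host),
    version: version || null,
    path: path || '/',
  };
}

const exampleKey = 'ab'.repeat(32); // placeholder, not a real archive
console.log(parseDatUrl(`dat://${exampleKey}/index.html`));
console.log(parseDatUrl('dat://orkl.seed.hex22.org/'));
```

When isKey is true, the address itself is the public key that downloaded content can be checked against, which is why no trusted server is needed to vouch for it.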

If you just want to turn on Dat support in Cliqz, skip to the end of the article for instructions!

What is Dat?

The Dat protocol enables peer-to-peer sharing of data and files across the Web. Like similar technologies such as IPFS and BitTorrent, it allows clients to validate the data they receive, so one can know that the data has been replicated properly and verify ownership. In contrast to these, Dat also supports modification of the resources at a specific address, with fast updates propagated to peers. Other useful properties include private discovery, which allows data shared privately on the network to remain so.

These features have led to a movement to use it as a new protocol for the Web, with the Beaker browser pushing innovation around what this new peer-to-peer Web could look like. The advantages of using Dat for the Web are manifold:

  • Offline-first: Every site you load is implicitly kept locally, allowing it to be navigated when offline. Similarly, changes to sites (both local and remote) will propagate when connectivity is available, but functionality will always be the same.
  • Transparent and censorship resistant: Sites are always the same for every user; the site owner cannot change site content based on your device or location, as is common on the current Web. Because sites are published entirely in Dat and there is no server-side code, all the code running the site can be seen with ‘view source’.
  • Self-archiving: Dat versions all mutations of sites, so as long as at least one peer keeps a copy of this data, the history of the site will remain accessible and viewable. This can also keep content online after the original publisher stops serving their content.
  • Enables self-publishing: Because servers are no longer required, anyone can publish a site with Dat, with no devops work. Publishing to the P2P web requires no payment, no technical expertise, and no platform lock-in.
  • Resilient: Apps and sites stay up as long as people are using them, even if the original developers have stopped hosting.
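
To make the self-archiving property concrete, here is a toy model of a versioned, append-only file store. It is only a sketch of the idea, not how Dat stores data internally; the class VersionedStore and its methods are invented for this example.

```javascript
// Toy model: every change is appended to a log and never rewritten.
class VersionedStore {
  constructor() {
    this.log = []; // entries: { path, content } or { path, deleted: true }
  }

  // The version is simply the number of entries appended so far.
  get version() {
    return this.log.length;
  }

  write(path, content) {
    this.log.push({ path, content });
    return this.version;
  }

  remove(path) {
    this.log.push({ path, deleted: true });
    return this.version;
  }

  // Read a file as it was at a given version by replaying the log prefix.
  read(path, version = this.version) {
    let result = null;
    for (const entry of this.log.slice(0, version)) {
      if (entry.path !== path) continue;
      result = entry.deleted ? null : entry.content;
    }
    return result;
  }
}

const site = new VersionedStore();
const v1 = site.write('/index.html', '<h1>Hello</h1>');
site.write('/index.html', '<h1>Hello again</h1>');
console.log(site.read('/index.html'));     // latest content
console.log(site.read('/index.html', v1)); // content as of the first version
```

Because old entries are never overwritten, any peer holding the full log can serve every historical version of the site.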

These features are compelling, but for a protocol like Dat to gain traction it needs a critical mass of adoption (a catch-22). One way to achieve that is to push it into software that many users already have and use regularly—the browser.

Implementing Dat support

Under the hood Dat is a combination of two Node.js modules[1]:

  • hyperdrive, a file-system-like abstraction over append-only logs, and
  • discovery-swarm, a network module that connects peers subscribing to the same topic.

Hyperdrive exposes a replicate method, which allows the content of the drive to be synced with any other instance of hyperdrive using the same address. This sync protocol runs over any bi-directional stream and is transport agnostic. When using discovery-swarm for networking, this transport is either plain TCP sockets or uTP, an efficient peer-to-peer protocol created for BitTorrent.

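const Hyperdrive = require('hyperdrive');
const discoverySwarm = require('discovery-swarm');
// storage, address and opts are placeholders supplied by the caller
const drive = new Hyperdrive(storage, address);
const swarm = discoverySwarm(opts);
// The discovery key is only populated once the drive is ready
drive.on('ready', () => {
  swarm.join(drive.discoveryKey);
});
swarm.on('connection', (connection) => {
  // Sync the drive with the remote peer over the raw connection
  connection.pipe(drive.replicate()).pipe(connection);
});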

To run Dat in the browser we need to run these modules inside the browser runtime, for example as a browser extension. While hyperdrive can be bundled for the web with browserify, the networking components in discovery-swarm use core Node.js networking libraries net and dgram that have no web equivalent. To properly integrate with the Dat network we must be able to open TCP and UDP sockets from the browser extension’s runtime.

Additionally, we have to instruct the browser that requests to dat:// URLs should be handled by our new custom implementation. This is not something that is supported by public APIs for WebExtensions.

Libdweb

Luckily, our desktop browser is a Firefox fork, and it has access to rich internal APIs for extending the browser’s capabilities. An example of this is the libdweb project, which was started two years ago by Mozillians aiming to add the extension APIs needed for Dweb protocols to be integrated in the browser. This project adds APIs for programmable protocol handlers, TCP sockets and UDP sockets that can be used by privileged browser extensions.

// Programmable protocol handler
browser.protocol.registerProtocol('dat', (request) => {
  return handleRequest(request);
});
// TCPSockets
const conn = await browser.TCPSocket.create({
  host: '1.2.3.4',
  port: 8888
});
await conn.write(buffer);
const response = await conn.read();

Armed with these APIs, we can replace Node’s net and dgram modules with implementations that use libdweb TCP and UDP sockets underneath. These replacements were also open source contributions by the community[2], which we packaged in a browserify transform. The result is that we can ‘browserify’ our Node code that uses raw sockets, and get a bundle that will run as a WebExtension!

browserify -g @sammacbeth/libdwebify index.js
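
The idea behind such a shim can be sketched as follows. This is not the code of the community modules mentioned above, only an illustration of the technique: a promise-based socket is wrapped in the Duplex stream interface that Node networking code expects. The write(buffer) and read() methods mirror the snippet above, while close(), the class name SocketStream and the echo stand-in are assumptions made so the example runs on its own in Node.

```javascript
const { Duplex } = require('stream');

// Adapts a promise-based socket to a Node Duplex stream.
// The socket shape (write/read/close) is assumed for this sketch.
class SocketStream extends Duplex {
  constructor(socket) {
    super();
    this.socket = socket;
    this.pendingRead = false;
  }

  _write(chunk, encoding, callback) {
    this.socket.write(chunk).then(() => callback(), callback);
  }

  _read() {
    if (this.pendingRead) return; // keep one outstanding read at a time
    this.pendingRead = true;
    this.socket.read().then((data) => {
      this.pendingRead = false;
      this.push(data); // pushing null ends the readable side
    }, (err) => this.destroy(err));
  }

  _final(callback) {
    this.socket.close().then(() => callback(), callback);
  }
}

// Stand-in for a real socket: echoes back whatever is written to it.
function createEchoSocket() {
  const queue = [];
  const waiting = [];
  const deliver = (data) => {
    if (waiting.length > 0) waiting.shift()(data);
    else queue.push(data);
  };
  return {
    write: async (chunk) => deliver(Buffer.from(chunk)),
    read: () => (queue.length > 0
      ? Promise.resolve(queue.shift())
      : new Promise((resolve) => waiting.push(resolve))),
    close: async () => deliver(null), // null marks the end of the stream
  };
}

const stream = new SocketStream(createEchoSocket());
stream.on('data', (data) => console.log('received:', data.toString()));
stream.end('hello dat');
```

Once a socket looks like a stream, code such as connection.pipe(drive.replicate()).pipe(connection) can run unchanged on top of it.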

The resulting extension, dat-webext, is open source and enables Dat sites to be loaded directly in the browser. This extension can run in Gecko-based browsers, albeit requiring privileged status in order to load experimental APIs. As it targets the Gecko platform, it is also compatible with GeckoView on Android, and we plan to bring support in a future version of Cliqz for Android.

Try it out

Dat support can be enabled in Cliqz Desktop via our new ‘Labs’ pane in settings:

[Figure: Labs pane in Cliqz settings]

After enabling Dat, there are two network settings that you can turn on if you choose to:

  • Announce IP to the network: This controls whether your IP is listed as a peer for the Dat sites you visit.
  • Upload data to other peers: This controls whether you will upload data to peers you connect to. Enabling this allows you to help keep sites online, as other users will be able to load the site based on the data you send to them.

These settings are both off by default, which is the most private way to browse the Dat network. As privacy is important for all products and features at Cliqz, we are investigating the best way to handle the different privacy models of the Dweb. Unlike for HTTP sites, where the server hosting the site can see every page visited by each specific IP, in Dat this knowledge is spread among peers. The peers you actually download content from could infer which pages you are visiting. It will be important to find the best ways to communicate these trade-offs so that users can feel safe browsing sensitive Dat sites.

Once you’ve enabled Dat in Cliqz, you can switch to the Dat version of this page!

Dat Hello World

Dat sites can introspect and modify themselves via an API called DatArchive. This API enables rich Web applications, without the need for a web server to store state changes. One example of such an application is orkl, which is a blogging platform built on Dat. Orkl has two modes of operation: read-only mode, for normal visitors reading the blog, and editing mode, which is activated when your browser is able to modify the site’s Dat archive.
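
A site might pick between two such modes roughly as sketched below. This is not Orkl's actual code. It assumes that the archive object offers a getInfo() method whose result has an isOwner flag, as in Beaker's DatArchive API; the function detectMode and the stub archive are hypothetical and exist only so the sketch runs outside a Dat-enabled browser.

```javascript
// Decide which mode to show, given a DatArchive-like object.
// Assumes getInfo() resolves to an object with an isOwner flag.
async function detectMode(archive) {
  const info = await archive.getInfo();
  return info.isOwner ? 'editing' : 'read-only';
}

// In a Dat-enabled browser the archive would come from something like
// new DatArchive(window.location.href). A stub stands in for it here.
const stubArchive = { getInfo: async () => ({ isOwner: false }) };
detectMode(stubArchive).then((mode) => console.log(mode));
```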

We can create an Orkl blog by asking the browser to ‘fork’ an existing blog. This copies all the files from the other site into a new Dat archive which you are the owner of:

DatArchive.fork('dat://orkl.seed.hex22.org/').then((archive) => {
  console.log(`Site created at ${archive.url}`);
});

You can try this out for yourself: click the button below and you will be prompted to create a new Dat site from the Orkl code. This will give you a link to your site, and when you open it you’ll be in editor mode. Once you’ve made some edits, you can share the link with a friend, who will be able to load it from you and will just see the posts without the editing controls.

For the site creation demo, first load this page over Dat

As you are hosting this site from your browser, it may become unavailable when you turn off your device, or close your browser. To remedy this you can get more peers to host the content. There are also services like Hashbase that can act as always-on peers for your site. Hashbase can also mirror your Dat site to HTTPS, to allow other browsers to view your site.

The Dat-web is still small, but you can explore some of the existing sites via Beaker Browser’s directory. There are a few example applications that show the potential of the technology, for example this pastebin alternative allows you to send snippets to friends using private links and with the data never being uploaded to a centralised server.

Summary and Next Steps

This is an experimental new feature in the Cliqz browser. We are in the first phase of testing how technologies like Dat can empower users and help the browser, as a user agent, better serve them. There are still a few rough edges in the protocol support in this version that we hope to address in the coming months. If you’d like to play with Dat, download the Cliqz browser.

Footnotes


  1. There are also partial implementations of Dat in Rust and C++, but they are not yet feature complete.

  2. Node net using browser.TCPSocket and Node dgram using browser.UDPSocket.