Privacy: X marks the spot where…



In the beginning there was the Internet… and the Internet was HTTP

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers [47]. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred. HTTP has been in use by the World-Wide Web global information initiative since 1990. (Source: RFC 2616)

The “headers” mentioned above can include something called an X header. These are fields in the request whose names begin with “X-”, and they allow non-standard or proprietary add-ons alongside the regular fields in the HTTP header.

So what’s an example of an X header?

x-h3g-network-quality – this tells you the type of network, “3G” for example. It’s a very straightforward way to add some data to the request that a browser makes to a Web server. The Web server now knows that the phone the browser made the request from is on a 3G network.
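To make that concrete, here is a minimal sketch of how a Web server might pick the X headers out of an incoming request. The raw request text, the host name www.example.com and the helper names parse_headers and extension_headers are made up for illustration; only the x-h3g-network-quality header comes from the example above.

```python
def parse_headers(raw_request: str) -> dict[str, str]:
    """Parse the header lines of a raw HTTP request into a dict.

    Header names are lower-cased because HTTP header names are
    case-insensitive.
    """
    headers: dict[str, str] = {}
    lines = raw_request.split("\r\n")
    for line in lines[1:]:  # lines[0] is the request line ("GET / HTTP/1.1")
        if not line:
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def extension_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return only the non-standard headers whose names start with "x-"."""
    return {name: value for name, value in headers.items() if name.startswith("x-")}


# Hypothetical request from a phone browser on a 3G network.
RAW_REQUEST = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "x-h3g-network-quality: 3G\r\n"
    "\r\n"
)

x_headers = extension_headers(parse_headers(RAW_REQUEST))
print(x_headers)
```

The server does not need to know in advance what x-h3g-network-quality means; the “X-” prefix alone marks it as an add-on.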

Why are X headers so important?

Well, because they’re a “standard” way to add “non-standard” data to HTTP. Pay special attention to the word “standard”. Everything is about standards.

So why all the fuss about “X marks the spot”?

Well, there’s something else I need to tell you about how the Internet works. Remember our familiar picture: two cans and a piece of string. One can (the browser) talks to the other can (the Web server). Ok. Now we have to introduce a new term: caching. A web cache is a mechanism for the temporary storage (caching) of web documents, such as HTML pages and images, to reduce bandwidth usage, server load, and perceived lag. (Source: Wikipedia) The biggest caching engine in the world is probably Google’s, or maybe Bing’s. But there are tens of thousands of cache sites out on the Web, and virtually every ISP uses them to accelerate content. Browsers can connect to a Web caching server instead of going directly to the “origin” Web server. The cache serves as an intermediary.
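The intermediary idea can be sketched in a few lines. This is a toy model, not how any real caching product works: the class TinyWebCache, the fake_origin function and the example URL are all invented for illustration, and real caches also deal with expiry, validation and much more.

```python
from typing import Callable


class TinyWebCache:
    """A toy cache that sits between the browser and the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._store: dict[str, str] = {}  # URL -> stored copy of the document

    def get(self, url: str) -> str:
        # Only go to the origin server the first time a URL is requested.
        if url not in self._store:
            self._store[url] = self._fetch_from_origin(url)
        return self._store[url]


origin_hits: list[str] = []  # records every request that reaches the origin


def fake_origin(url: str) -> str:
    """Stand-in for the origin Web server (hypothetical)."""
    origin_hits.append(url)
    return f"<html>content of {url}</html>"


cache = TinyWebCache(fake_origin)
first = cache.get("http://www.example.com/")
second = cache.get("http://www.example.com/")  # served from the cache
print(len(origin_hits))
```

The second request never reaches the origin server, which is exactly the saving in bandwidth and server load that the definition above describes.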

So what’s the big deal?

Well, some of those caching engines have critical problems when it comes to headers (Source: link): if they see something they don’t understand, they strip it out.

So what?

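Well, Do Not Track is a header, but it’s not an X header (there’s no X_DNT), and an X header is something they wouldn’t strip out. The browser sends it as a regular header, DNT: 1, which CGI-style server environments expose as HTTP_DNT=“1”. So older caching engines may not recognize that new header and may simply strip it out.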
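Here is a small sketch of the failure described above, assuming an older cache that forwards only the headers it recognizes plus anything starting with “X-”. The list of known headers, the function names and the example request are hypothetical; this models the behaviour claimed in this post, not any particular caching product.

```python
# Headers this imaginary legacy cache was built to understand (assumed list).
KNOWN_HEADERS = {"host", "user-agent", "accept", "accept-language", "cookie"}


def forward_through_legacy_cache(headers: dict[str, str]) -> dict[str, str]:
    """Return the headers the cache passes on to the origin server.

    Assumed behaviour: known headers and "X-" headers pass through,
    anything else is silently dropped.
    """
    forwarded: dict[str, str] = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered in KNOWN_HEADERS or lowered.startswith("x-"):
            forwarded[name] = value
    return forwarded


def origin_sees_dnt(headers: dict[str, str]) -> bool:
    """True if the request that reached the origin still carries DNT: 1."""
    return any(name.lower() == "dnt" and value == "1" for name, value in headers.items())


# What the browser actually sent (hypothetical request).
browser_request = {
    "Host": "www.example.com",
    "User-Agent": "ExampleBrowser/1.0",
    "DNT": "1",
    "x-h3g-network-quality": "3G",
}

at_origin = forward_through_legacy_cache(browser_request)
print(origin_sees_dnt(browser_request), origin_sees_dnt(at_origin))
```

In this model the X header survives the trip while DNT does not, so the origin server has no way of telling that the user asked not to be tracked.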

Ok… so why’s that a big deal?

Enforcement of the Do Not Track standard: that’s the big deal. How will the origin server ever know if the browser really sent a header that said “Don’t Track Me”? If an intermediary has removed it, the origin server never learns of it, and the standard becomes unenforceable. Both caching servers and Web servers will need to be upgraded to support a new standard header that is being transmitted in a non-standard way, i.e. not as an X header but as a regular header (one that was never planned for in the original HTTP design).

And that is going to be the biggest problem facing the Do Not Track standard. Unless it’s implemented as an X header rather than a standard header, “EVERY” caching engine will have to be updated to support the new standard (to make sure it doesn’t strip anything important out). There are tens of thousands of caching servers out there that will have to be modified and upgraded. Normally no one would care about this… but now, due to Privacy, they will have to.

Therefore, X marks the spot where, in this case, something is enforceable or not.

Posted in: Privacy, User Experience
