What, why and how of cookies affecting your privacy

Privacy on the web is a heated subject. You can tell because websites like Slashdot have a section dedicated to it; at the beginning of every movie you watch there is a bit that says ‘You wouldn’t steal a car’; and the media erupts into seizures every time Facebook changes its terms and conditions. The Wall Street Journal recently released a series of articles about your privacy online, and they started with the humble cookie. And it turns out that it is probably all the web analytics industry’s fault (more on that later).

What is a Cookie?

Let us start at the beginning, like the WSJ did. A cookie is a small text file that a website stores in your browser. It can only be read by the website that set it, and it can only be written to by that same website. Typically it contains a random string of letters and numbers that is unique to your browser.
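To make that concrete, here is a minimal sketch in Python of how a site might hand out such a cookie and read it back. The cookie name `visitor_id`, the one-year lifetime and both function names are assumptions made up for illustration; only `http.cookies` and `secrets` come from the standard library.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional


def make_visitor_cookie(name: str = "visitor_id") -> SimpleCookie:
    """Build a cookie holding a random ID (name and lifetime are assumptions)."""
    cookie = SimpleCookie()
    cookie[name] = secrets.token_hex(16)       # 32 random hex characters
    cookie[name]["path"] = "/"                 # sent back for every page on the site
    cookie[name]["max-age"] = 60 * 60 * 24 * 365  # keep it for roughly a year
    return cookie


def read_visitor_id(cookie_header: str, name: str = "visitor_id") -> Optional[str]:
    """Pull the ID out of the Cookie header the browser sends on later requests."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(name)
    return morsel.value if morsel else None


if __name__ == "__main__":
    cookie = make_visitor_cookie()
    # This is the header line the server would add to its response.
    print(cookie.output(header="Set-Cookie:"))
```

The browser stores the value and sends it back with each later request to the same site, which is all the "remembering" amounts to.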

Simple? Well, maybe so, but the reason it causes so much controversy is the way it is used. Initially the cookie was there so that the website you were on could write some settings into it, which allowed the site to remember you between visits (or indeed between pages in a session) and maintain some personalisation (e.g. a “Hello Alec” message). This is where it got complicated, because it turns out that you can also use this information to work out how many people have seen the website: you just count the number of distinct cookies in your log files.

Assuming that people don’t delete them from their browser, that is. Or use a different browser. Or a different computer. Or share the same computer with someone else. OK, they aren’t that accurate at all, but at least you can tie a session together. Assuming you don’t delete them. Or block them.
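The counting trick can be sketched in a few lines of Python. The log format below is invented for illustration (real server logs vary), and the `visitor_id` cookie name is an assumption. Note that the result is a count of distinct cookies, not of people, for all the reasons just listed.

```python
import re

# Assumed cookie name and value format; adjust to whatever the site really sets.
COOKIE_RE = re.compile(r"visitor_id=([0-9a-f]+)")


def count_unique_visitors(log_lines):
    """Count distinct cookie IDs seen in the log (a rough proxy for visitors)."""
    seen = set()
    for line in log_lines:
        match = COOKIE_RE.search(line)
        if match:                      # requests without the cookie are skipped
            seen.add(match.group(1))
    return len(seen)


if __name__ == "__main__":
    sample = [
        '1.2.3.4 "GET /index.html" "visitor_id=aaa111"',
        '1.2.3.4 "GET /about.html" "visitor_id=aaa111"',
        '5.6.7.8 "GET /index.html" "visitor_id=bbb222"',
        '9.9.9.9 "GET /index.html" "-"',
    ]
    print(count_unique_visitors(sample))
```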

This was all still fine, until someone pointed out that looking through the log files was a pain, because they contained every single file that was downloaded and every single request to the server, and working out what was a page, a person, a cookie, etc. took a huge volume of processing. Plus you had to spend time and effort filtering to make sure that you were only counting real people and not robots and spiders. A far better solution would be to have something on each page that only loaded once per page and didn’t get loaded by robots or spiders. Like a bit of custom JavaScript that sent a request to its own server.
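To give a feel for the filtering chore described above, here is a rough Python sketch of the kind of rule a log processor needs. The extension list and the user-agent markers are assumptions chosen for illustration; real filters are much longer and still miss things.

```python
# Assumed lists, for illustration only.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
BOT_MARKERS = ("bot", "spider", "crawler")


def is_page_view(path: str, user_agent: str) -> bool:
    """Guess whether a logged request is a real person loading a page."""
    clean_path = path.split("?", 1)[0].lower()   # ignore any query string
    if clean_path.endswith(ASSET_EXTENSIONS):    # images, stylesheets, scripts
        return False
    agent = user_agent.lower()
    return not any(marker in agent for marker in BOT_MARKERS)


if __name__ == "__main__":
    print(is_page_view("/index.html", "Mozilla/5.0 (Windows NT 10.0)"))
    print(is_page_view("/logo.png", "Mozilla/5.0 (Windows NT 10.0)"))
    print(is_page_view("/index.html", "Googlebot/2.1"))
```

Every request in the log has to pass through a check like this, which is exactly the processing cost that a tag on the page avoids.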

If you were going to do that, you might as well include in the request to the server some additional information about the page, rather than just the URL, so you end up with custom information. And it turns out that you can use this JavaScript to pick up the movement of the mouse across the page, the things that have been typed in text boxes and all sorts of other stuff. Let’s collect it all and we can work out what we want to do with it later.
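The request that such a tag sends is usually just a URL with the extra information packed into the query string. Here is a Python sketch of both ends: building the request and unpacking it at the collecting server. The collector address and the parameter names (`url`, `page_type`) are hypothetical, and in practice the building half would be JavaScript running in the browser.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_beacon_url(collector: str, page_url: str, **custom: str) -> str:
    """Pack the page URL and any custom fields into one tracking request."""
    params = {"url": page_url, **custom}
    return f"{collector}?{urlencode(params)}"


def parse_beacon(beacon_url: str) -> dict:
    """What the collecting server does: unpack the query string again."""
    query = urlsplit(beacon_url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


if __name__ == "__main__":
    beacon = build_beacon_url(
        "https://collector.example/hit",          # hypothetical collector
        "https://shop.example/basket?step=2",
        page_type="basket",
    )
    print(beacon)
    print(parse_beacon(beacon))
```

Anything the script can see on the page can be added as another parameter, which is why "collect it all" is so easy to do.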

This would all be fine if it was just information from your own website. Well, it turns out that some companies didn’t have the ability to do it themselves, so they let a third party put a bit of JavaScript on the page that pointed towards another domain’s servers and gave visitors cookies from that domain. These were third-party cookies.
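The first-party versus third-party distinction comes down to comparing the site you are visiting with the domain that set the cookie. A deliberately naive Python sketch follows. It assumes the "site" is simply the last two labels of the host name, which is wrong for suffixes such as .co.uk; browsers consult a public suffix list instead. The host names are made up.

```python
def site_of(host: str) -> str:
    """Naive assumption: the site is the last two labels of the host name."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_third_party(page_host: str, cookie_host: str) -> bool:
    """True when the cookie comes from a different site than the page."""
    return site_of(page_host) != site_of(cookie_host)


if __name__ == "__main__":
    # Cookie set by a subdomain of the site being visited
    print(is_third_party("www.news.example", "static.news.example"))
    # Cookie set by another company's servers
    print(is_third_party("www.news.example", "tracker.adnetwork.example"))
```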

Is this a Problem?

You wouldn’t have thought so: the people who allowed companies to put third-party JavaScript on their pages would keep a close eye on the data and make sure they weren’t collecting any private information (or, if they were, they would be explicit about it). And that is the case with most tracking technologies. Even with Facebook: they at least tell you that they are going to use your information (in an aggregated fashion) to give you custom adverts based on the information you enter.

It turns out that it is the uncontrolled use of third-party JavaScript and cookies that is the problem. It turns out it isn’t that easy to make money on the web. If you don’t sell anything then you have to come up with some new business models. One of the biggest is online advertising. However, this isn’t very easy to do technically, because you need to make sure you can get a different banner on each page and that it is only served the right number of times, otherwise you’ll overcharge your cu