What, why and how of cookies affecting your privacy

Privacy on the web is a heated subject.  You can tell it is because websites like Slashdot have their own section dedicated to it; at the beginning of every movie that you watch there is a bit that says ‘You wouldn’t steal a car’; the media erupts into seizures every time Facebook changes its terms and conditions.  The Wall Street Journal recently released a series of articles about your privacy online, and they started off with the humble cookie.  And it turns out that it is probably all the web analytics industry’s fault (more on that later).

What is a Cookie?

Let us start at the beginning, like the WSJ did.  A cookie is a small text file that is associated with your browser.  It can only be read by the website that sets it, and can only be written to by that same website.  Typically it contains a random string of letters and numbers that is unique to your browser.
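To make that a bit more concrete, here is a minimal sketch of what a website actually sends when it sets a cookie, parsed with Python's standard library.  The cookie name visitor_id, its value and the example.com domain are all made up for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical Set-Cookie header value; the name, value and domain are invented.
raw_header = "visitor_id=a81f3c9e27; Domain=example.com; Path=/; Max-Age=31536000"

cookie = SimpleCookie()
cookie.load(raw_header)

morsel = cookie["visitor_id"]
print(morsel.value)       # the random string that identifies this browser
print(morsel["domain"])   # the browser only sends the cookie back to this domain
print(morsel["max-age"])  # how long the browser should keep it, in seconds
```

The Domain attribute is the bit that enforces the "only the website that sets it" rule: the browser hands the cookie back to that domain and nobody else.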

Simple?  Well maybe so, but the reason it is causing so much controversy is the way that it is used.  Initially this cookie was used so that the website you were on could write some settings into it that would allow the website to remember you between visits (or indeed between pages in a session) and allow them to maintain some personalisation (eg a “Hello Alec” message).  Well this is where it got complicated, because it turns out that you can then use this information to work out how many people have seen the website, because you can count the number of cookies in your log files.

Assuming that people don’t delete them from their browser that is.  Or use a different browser.  Or a different computer.  Or more than one person using the same computer.  Ok, they aren’t that accurate at all, but at least you can tie a session together.  Assuming you don’t delete them.  Or block them.
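Counting cookies in a log file can be sketched roughly like this.  The log format and the visitor_id cookie name are assumptions for the sake of the example; real server logs vary a lot.

```python
import re

# Hypothetical log lines: IP address, request, then the cookie header ("-" if none).
LOG_LINES = [
    '10.0.0.1 "GET /index.html" "visitor_id=a81f3c9e27"',
    '10.0.0.1 "GET /about.html" "visitor_id=a81f3c9e27"',
    '10.0.0.2 "GET /index.html" "visitor_id=77b2d0c4e1"',
    '10.0.0.3 "GET /index.html" "-"',
]

# Assumed cookie name; the value is taken to be a hexadecimal string.
COOKIE_PATTERN = re.compile(r"visitor_id=([0-9a-f]+)")


def count_unique_visitors(lines):
    """Count distinct cookie values; requests without a cookie are ignored."""
    ids = set()
    for line in lines:
        match = COOKIE_PATTERN.search(line)
        if match:
            ids.add(match.group(1))
    return len(ids)


unique_visitors = count_unique_visitors(LOG_LINES)
print(unique_visitors)
```

Note that the fourth request has no cookie at all, so it simply vanishes from the count, which is exactly the accuracy problem described above.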

This was still all fine, until someone pointed out that looking through the log files was a pain, because they contained every single file that was downloaded and every single request to the server, and working out what was a page, a person, a cookie, etc took a huge volume of processing.  Plus you had to spend time and effort filtering to make sure that you were only counting real people and not robots and spiders.  A far better solution would be to have something on each page that only loaded once per page and didn’t get loaded by robots or spiders.  Like a bit of custom JavaScript that sent a request to its own server.

If you were going to do that, you might as well include in the request to the server some additional information about the page, rather than just the URL – so you end up with custom information.  And it turns out that you can use this JavaScript to pick up the movement of the mouse across the page, the things that have been typed in text boxes and all the other stuff.  Let’s collect it all and we can work out what we want to do with it later.
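As a rough illustration of what such a page tag request looks like, the sketch below builds the URL a tag might call and then unpacks it the way a collection server might.  It is written in Python rather than JavaScript to keep it self-contained, and the collector address and the parameter names (page, vid, section, basket_items) are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical collection server; not a real analytics endpoint.
COLLECTOR = "https://stats.example.net/collect"


def build_beacon_url(page_url, visitor_id, extra):
    """Build the request a page tag would send: URL, cookie id, custom data."""
    params = {"page": page_url, "vid": visitor_id}
    params.update(extra)
    return COLLECTOR + "?" + urlencode(params)


def parse_beacon(url):
    """What the collection server sees: one value per query parameter."""
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


beacon = build_beacon_url(
    "https://shop.example.com/basket",
    "a81f3c9e27",
    {"section": "checkout", "basket_items": "3"},
)
hit = parse_beacon(beacon)
print(hit)
```

One request per page view, with whatever custom fields you fancy bolted on, is much easier to process than a raw log file, and just as easy to overfill with data.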

This would all be fine if it was just information from your own website.  Well it turns out that some companies didn’t have the ability to do it themselves, so they let a third party put a bit of JavaScript on the page that pointed towards another domain’s servers and gave visitors cookies from that domain.  These were third party cookies.
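The difference between first and third party comes down to comparing the domain of the page you are looking at with the domain the cookie belongs to.  Here is a deliberately crude sketch of that comparison.  The function name and the domains are made up, and real browsers use a much more careful rule (based on a list of public suffixes) than this simple suffix match.

```python
from urllib.parse import urlsplit


def is_third_party(page_url, cookie_domain):
    """Crude check: is the cookie's domain outside the page's own domain?

    Assumption: a plain suffix match is good enough for illustration.
    """
    page_host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    first_party = page_host == domain or page_host.endswith("." + domain)
    return not first_party


page = "https://www.news.example.com/story"
print(is_third_party(page, "example.com"))              # the site's own cookie
print(is_third_party(page, "ads.tracker.example.net"))  # someone else's cookie
```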

Is this a Problem?

You wouldn’t have thought so – the people who allowed the companies to put third party JavaScript on their pages would keep a close eye on the data and make sure they weren’t collecting any private information (or if they were, they were explicit when they did so).  And that is the case with most tracking technologies.  Even with Facebook – they at least tell you that they are going to be using your information (in an aggregated fashion) to give you custom adverts based on the information you enter.

It turns out that it is the uncontrolled use of third party JavaScript and cookies that is the problem.  It turns out it isn’t that easy to make money on the web.  If you don’t sell anything then you have to come up with some new business models.  One of the biggest ones is online advertising.  However this isn’t very easy to do technically, because you need to make sure you can get a different banner on each page and that it is only served the right number of times, otherwise you’ll overcharge your customer, etc.  So what companies do is farm out their ad serving to third parties.  Who in turn allow other websites to buy ads with interactive creatives which sometimes point towards yet another third party, largely unseen by the original website, almost certainly unseen by the user.

Now in the unlikely event that a social networking site (say Facebook) wanted to buy some advertising on a big newspaper website (say the WSJ), they’d probably be allowed to put some interactive content into that advert, which would be loaded from their server, which you already have a cookie with, because you’re logged in on a different tab.  Now Facebook can serve you a personalised ad on another website without you knowing how it got your information.  Not that I would suggest that Facebook would do that, of course.

What happens next?

Well if you are a reader of this blog, you may have noticed that the EU is taking a hard line on this and is putting in place legislation to make websites more explicit about the cookies they are giving to a user.  This is good for the users, isn’t it?  Well not really, because you will only have to agree once and most of the big advertisers run on big networks.  So you could allow them to pass cookies to you on one website and still be allowing them access on another website.

The IAB (the auditors of the advertising industry) are running a campaign on how to improve your privacy and explaining to users what it is that the advertising industry does with their personal information.  Although I have to say the first time I saw it was on this blog post by Eric Peterson.  And I’m not convinced that it does that much to allay my fears as a consumer (yes I know this is a selective quote, but it was quite near the beginning):

Getting to know you. Sites use demographic data to learn more about you so they can deliver relevant content and ads. If you’re a 30-year-old woman, and an advertiser hopes to reach female users aged 18 to 35, the site recognizes that you’re “eligible” to receive ads from that particular advertiser.

This leads us nicely on to the WAA (the Web Analytics Association).  For the IAB this is big business.  In 2008 the online advertising industry was estimated to be worth $45bn a year worldwide.  This is colossal.  The web analytics industry is nowhere near that size.  Potentially the largest provider of them all, Omniture, got sold for $1.9bn to Adobe last year.  We’re basically talking peanuts.  But it is the technology built by the web analytics providers, and now used by the advertisers, that is causing the trouble.  And it appears that we are doing little about it (according to Eric Peterson).  At the moment it feels a bit like banning the printing press because it produces pornography.

Really I don’t seem to have many answers.  The Web Analytics Association seems to be too focussed on the analysts (or at least the consultants who get to use the benefits of their membership) to want to do anything about this.  The analytics providers don’t appear to see it as their job to fight the causes of the customers they provide for, although it was interesting reading WebTrends’ take on the EU cookie issue – more interesting was the silence from Omniture, Unica and even Google Analytics (maybe they have lawyers working behind the scenes).  The directive itself seems not to have got much press recently, but is meant to be aimed at improving the privacy of users, even if it isn’t a great way of doing it – surely better legislation would make websites be more explicit about who they allow to put stuff on their web pages.  Any suggestions?

