Monday, November 23, 2009

How Slashdot saw the new Cookie laws in the EU

As you may have noticed before, I am a reader of Slashdot (although only an occasional reader these days, as I pick up posts through Slashdot's Twitter updates).  Last time I got something analytics related on to their front page it was about Google and Yahoo! releasing free web analytics software (you can read the effect it had on this blog in an update I did on social bookmarking).  Last week something else caught my eye, the new EU cookie laws, and so I posted that to Slashdot as well:

Reader whencanistop writes with some details on an upcoming EU law that slipped under the radar as it was part of the package containing the "three strikes" provision, which attracted all the attention and criticism.

"A couple of weeks ago we discussed the EU cookie proposal, which has now been passed into law. While the original story broke on the Out-law blog from a law perspective ('so breathtakingly stupid that the normally law-abiding business may be tempted to bend the rules to breaking point'), there has now been followup from a couple of industry insiders. Aurelie Pols of the Web Analytics Association has blogged on how this will affect websites that want to monitor what people are looking at on their sites, while eConsultancy has blogged on how this will impact the affiliate industry. In all of this the general public is being ignored — the people who, if the law is actually implemented, will have to proceed through ridiculous screens of text every time they access a website. I know most of you guys hate cookies in general, but they are vital for websites to know how people are accessing the sites so they can work out how to improve the experience for the user."
There was a monumental response on the site with over 400 comments.  So I thought I'd put an update on here, given that I didn't have enough time to respond to them all at the time (plus I'd been to the pub and had a beer, so my arguments started making less and less sense).

You don't need Cookies/Tracking to make a site better - you can do this with Usability testing


(image courtesy of Chabster)

Can you do everything you need to do through usability testing to make your website better?  You certainly need to do usability testing.  I think I've said on here in the past that one of your key performance metrics should be a measurement from 'voice of the user'.  Is that enough?  Can you get everything you need from usability and asking users?  Let's see what we can get from those things:

  • We can ask users how they accessed the content - this will give us good impressions of where they were and how we can build up that scent so that we can get them to stay longer.  This may not be completely accurate though, because they might not have realised where they were, or they might have forgotten and made something up
  • We can ask them which content they accessed - this will tell us what they were looking at, and we can even ask them if they thought it was good and matched what they wanted.  Again this suffers from the same problem as above
  • We can watch what they do to try and improve usability - from watching them scroll, matching their eye patterns to parts of the screen, even where they are trying to enter text into a field to see how they do it.  This is amazingly useful in telling us how to build our pages and journeys up
What do we not get from usability?  Well, the real problem is essentially that we can't ask everyone.  Your choices are:

  1. Take a representation of what you think your audience are and invite them into a room to observe
  2. Ask everyone on the site, throw them a popup and hope that a decent few respond
  3. Use an existing customer database
Really to do any of these things you want to make sure you are targeting in the right way.  Eg I am under no illusions: the people who arrive at the home page of this blog want something completely different to those that came in to the most popular post.

The usability testing will give us ways of improving a user's journey through the site to get to the point where we want them to be (either at a conversion where they give the site owner money or at a non-monetary valued conversion).  It will not tell you if the users are actually doing it or not.  For this - you need to use your analytics to find out.

You can store session information in GET/POST of your links/submit buttons

(image courtesy of agent-seo)

As pointed out in Slashdot - you can technically create a unique url for each of your GET/POST items so that it contains information on a user's session.  As also pointed out, there are several reasons why you wouldn't want to do this:
  1. It's hideously search engine unfriendly because you create hundreds of duplicate urls (although your canonical url parameter should be able to get around this)
  2. You're exposing your session ids to anyone who you then link off to through your referrer information
  3. You're exposing your session ids to anyone who is doing any packet sniffing (although this could also be true if you were using cookies)
  4. You're doing some very complicated coding for something that could be done much more easily through cookies (or indeed is already being done much more easily through cookies, so you would be making companies change for no gain)
  5. It doesn't work for tracking in the real world for static content (non-secure).  The world is built on links.  I've got lots of them to other sites in this commentary already.  Imagine if each of those links had a session ID included in it for my visit.  Every time you clicked on the link, the person on the other end would think it was me again.  Now imagine that I posted one of those links to Slashdot - it would get thousands of views and become worthless in terms of tracking
If you want to read it, there was a very good response from Pieroxy on the security implications of putting the session ID, or indeed anything else, on the client side when creating transactions.
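To make the shared-link problem in point 5 concrete, here is a minimal sketch of session IDs carried in the url.  The parameter name `sid`, the example.com address and both helper functions are made up for illustration; no real site or framework is being described.

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

SESSION_PARAM = "sid"  # hypothetical parameter name, chosen for this sketch


def add_session_id(url: str, session_id: str) -> str:
    """Return the url with the session ID added as a query parameter."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query[SESSION_PARAM] = [session_id]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))


def session_from_url(url: str):
    """Return the session ID carried by the url, or None if there is none."""
    values = parse_qs(urlparse(url).query).get(SESSION_PARAM)
    return values[0] if values else None


# My visit is given a session ID, so every link I follow carries it.
my_link = add_session_id("http://example.com/post?page=2", "abc123")

# I paste that link somewhere public and three different readers click it.
visits = [session_from_url(my_link) for _reader in range(3)]

# The site only ever sees my session ID, so three people look like one.
distinct_visitors_seen = len(set(visits))
```

Every click on the pasted link arrives with the same ID, which is exactly why the figures become worthless once a link is shared widely.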

Cookies are only valuable for Marketing (ads and affiliates)


Cookies are valuable for Marketing purposes.  Extremely valuable.  Guess what?  Most of the internet is based on Marketing.  Google makes all its money through Marketing.  Slashdot makes its money through Marketing.

Ok, I'm mocking a little bit, because this money is made for them through advertising.  But one website's advertising income is another website's Marketing budget.  If the company didn't get any return on their marketing budget, they wouldn't spend it.

How do you find out how much money you make through your advertising budget?  Well you can do it the old fashioned way as people did with old style media (telly, radio, print, inserts, door drops, etc) - which basically means you take your sales results over time and map any differences in performance to timings of advertising campaigns (guesstimating is probably the word you would use).  Or you can do it in the new fangled way, which is to offer the user a cookie and then when they get to the sale page you can link that back to the advertising campaigns they saw.

Want to know why online advertising is seen as less valuable than offline media?  It's because of the reason above.  You should probably do both options to get a comparative figure for your online advertising compared to your offline.  Also, because of the current 'last click wins' attribution, you lose information about cumulative effects when you measure this way.
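As a rough sketch of what 'last click wins' throws away, assume the cookie holds an ordered list of the campaigns a visitor came through before buying.  The campaign names, the sale value and the even-split alternative are all invented for illustration, not a description of how any particular ad system works.

```python
from collections import defaultdict


def last_click(touches, sale_value):
    """Give all of the sale value to the most recent campaign seen."""
    credit = defaultdict(float)
    if touches:
        credit[touches[-1]] += sale_value
    return dict(credit)


def even_split(touches, sale_value):
    """Share the sale value equally across every campaign seen."""
    credit = defaultdict(float)
    for campaign in touches:
        credit[campaign] += sale_value / len(touches)
    return dict(credit)


# Hypothetical campaign history read back from the visitor's cookie.
cookie_touches = ["banner_ad", "price_comparison", "paid_search"]

last = last_click(cookie_touches, 90.0)
split = even_split(cookie_touches, 90.0)
```

Under last click the banner ad and the price comparison site get no credit at all, even though they may have been what started the journey.  An even split is only one of several possible alternatives.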

But that isn't all of it.  No.  You see, advertising works in several ways.  Another way advertising works is by affiliate websites.  There are lots of them.  You probably only think of traditional affiliate sites that are just link farms.  But there are lots of others.  All of your price comparison sites run solely on affiliate deals.  Invariably these sites measure how much money they should make based on the cookie that they give you when you click off their site (plus an image that they put on the final website's thank you page telling the affiliate you've paid).  Imagine if you actually had to get your car insurance quotes from every possible website instead of going to Confused, Money supermarket or compare the meerkat.  These sites wouldn't exist without their affiliate deals (they wouldn't make any money!).

So if you have a website and do marketing you want to have cookies enabled so you can work out if you are making more money (unless you gamble on the old style approach).  What if you don't spend money on Marketing?  All of your visits come for free from people typing in your url, from Google or good will links.  Again, to find out where those visits came from and how many of them there were, you don't actually need cookies.  To find out which ones are valuable and worth thanking, you do.  Because you can't link this source to that sale without those cookies.
 
The Government is regulating the tools when regulating the behaviour of those using the tools would be better

I think you could probably sum up the whole proposal by the government with the above.  They've misunderstood the applications of the cookies and decided to make it difficult for the users.  As many on Slashdot have been saying - if you want to stop murders, banning hammers isn't going to help.

So this is what cookies usually do:
  • They give you a unique identity so they can monitor your movements from page to page (for tracking purposes or to keep you in a logged in state)
  • Cookies are set from the originating domain (first party) or from another domain that the originating domain has given permission to
  • Cookies are read only from the originating domain of the cookie and only if the page you are on contains something loaded from that domain
For there to be any privacy and personal data issues the following would have to happen:
  • The originating domain would have to give away any information you entered into its page to a third party without you agreeing to it (against the data protection act in the UK)
  • The originating domain would have to allow a third party to load javascript on a page where you'd entered your personal information and allow the third party to collect that information (this would be lazy and well within legislative rights if an advertising company - for example - did do this)
The user already has the right to reject cookies through their browser if they wish.  Basically the Government is turning what is a user decision into a website decision.  Do I want cookies?  I want to decide as a user.  The Government is basically telling me with this law that I am not clever enough to work out whether I want cookies in case something bad happens, so they are going to make it implicit that I agree to them.

Then I hit this comment:
It seems to me that what you're saying is that you make a living telling people what they "should" do, and helping them to do things that way. You are being told that your model of doing things is under threat, and your objection is that this will make your life more difficult.
Is he right?  Am I biased?  God, now I'm confused. Maybe we *should* try and come up with a new way of doing it.  Any ideas?

Thursday, November 12, 2009

The difference between Accuracy and Precision

I did a Google search earlier on mentions on my blog of the words accuracy and precision.  Neither appears at all.  That was a little depressing, because I'm sure I've written about this before.  But there you go.  Anyway, I've been thinking of writing a post about statistical analysis for a while.  I'm also wary that I am grossly underqualified to write a post on statistical analysis, which is why I've put it off for so long.  Feel free to post up in the comments exactly where I've gone about this in the wrong fashion.  Let's start with some Web Analytics basics.

Data out of Web Analytics systems is not accurate.  They are set up, however, to be precise.  These are two completely different things that everyone should know about.  I would suggest that you read the wikipedia article about it, but then you'd probably get sidetracked and never come back to my blog.


Courtesy of wikipedia

To sum up in a short space - accuracy means you are closer to the true value, whereas precision means that your data points are not widely spread.

Why is this important?  Web analytics is all about trending over time.  You can find out how something is doing at the moment and you can get some insight into that - where people are coming from, where they are going to, etc.  But really what you want to do is try changing something and seeing the impact that it has on your data.

So our nice little chart up there describing the difference between accuracy and precision, or indeed all the target views that you can get of the same data if you do a Google Image search, don't really paint the full picture.  What you need to think about is how this affects your graphs over time.  You need to take that graph up there and turn that into a three dimensional time graph.

So I've done it with the visits to my blog.  I'm going to assume that I have a precision of +/- 20% each way.  That's a lot, but I've done it to illustrate a point that you'll see in a minute.


If you look at the upper and lower limits of what the real figure could be, you suddenly get a much better picture of what is going on.  That period in between May 2009 and September 2009 represents a fairly flat time for the blog.  Visits not going up or down a lot.  Or were they?  They could have been going up or down wildly.  According to Google Analytics, in two consecutive weeks I had 50 and 51 visits.  However with my 20% range on that, it could have swung anywhere from 40 to 60 in a week (or vice versa).  That would represent an increase in visits, week on week, of 50%, just from a small 20% error rate.

However if I'd had a precision of 5% over that time, the most it could have swung would be from 47 to 53, which represents a much smaller change.  This is where precision comes in very useful.  Knowing that if you completed the same experiment 50 different times you would only be out by 5%, rather than 20%, means you can forecast how your tool is doing.
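The arithmetic can be checked with a short sketch.  It assumes the error applies independently to each week's figure and looks at the worst case, where the first week was really at its lower limit and the second at its upper limit.  The function names are made up for this example.

```python
def bounds(reported, error):
    """Lower and upper limits of the real figure for a given error range."""
    return reported * (1 - error), reported * (1 + error)


def worst_case_swing(week1, week2, error):
    """Largest possible week-on-week increase, as a fraction of week one."""
    low1, _ = bounds(week1, error)
    _, high2 = bounds(week2, error)
    return (high2 - low1) / low1


for error in (0.20, 0.05):
    swing = worst_case_swing(50, 51, error)
    print(f"+/- {error:.0%}: up to {swing:.0%} apparent increase")
```

Using the reported 50 and 51 rather than a round 50 for both weeks, the figures come out slightly above the ones quoted in the text, but the comparison between the 20% and 5% ranges is the same.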

Accuracy however is a completely different matter.  Did I get 51 visits in that week or 23?  How far out is that number?  The answer to this question is that it doesn't really matter how far out I am, as long as every time I measure it I get a precise result.  Why?  Because when I change something (as I did the next week by writing a new post) I can then measure how different this is to the previous one with confidence.  Hey - you'd probably like it to be in the ball park, but as Johnny Longden said in that post I linked to up there: Cookies, javascript, people, etc all mean that you're unlikely to have a very accurate picture.

This is all well and good assuming that your data has a normal distribution.  That would mean that your figures were just as likely to be out by being up as they are by being down.  Normal distributions occur on things like height, length of foot, that sort of thing - where the typical value is well above zero in the first place.

Do web analytics data fall under a normal distribution model?

I somehow doubt that they do.  There is a law in economics called the law of diminishing returns (excuse all the wikipedia links).  The law states that the more you have of something, the less likely you are to make additional unit profits out of it because of a degree of corruption that you end up with.  I think this rule probably holds true with analytics systems as well, but possibly with different consequences.

  1. Normal distributions can't hold because of the lower limit ie you can't go below zero (and yes I know you can't go below zero for your height either, but that is different because the median is high for the height and the precision means that zero height is highly unusual - that's not the case for web analytics systems that frequently report zero visits)
  2. Poisson distributions of the data may be a good approximation, and we know that the larger the expected number, the more closely it follows a normal distribution.  The graph below shows a couple of Poisson distributions for increasing lambda (ie the number you were expecting).

That would mean that for low figures you are expecting small differences, although they will be significant in terms of percentages.  For larger figures there is likely to be a larger error margin in absolute terms, and hence the figures may not be so precise.  This is certainly true for any analytics tool that uses sampling to increase its efficiency (as Google Analytics does) or indeed any tool that limits its table lengths (as HBX does - or did).  However other tools that don't do this can frequently end up with data processing issues that may or may not cause a lag because of the large quantities being produced.
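For what it's worth, here is a small sketch of how the spread of a Poisson distribution changes with lambda.  It assumes visit counts really are Poisson, which is only a guess at an approximation, and it does not model any extra error a tool adds through sampling or table limits.

```python
import math


def poisson_pmf(k, lam):
    """Probability of seeing exactly k visits when lam are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_spread(lam):
    """Standard deviation in visits, and as a fraction of the expected figure."""
    sd = math.sqrt(lam)
    return sd, sd / lam


for lam in (1, 10, 100, 1000):
    sd, relative = poisson_spread(lam)
    print(f"lambda={lam}: +/- {sd:.1f} visits, {relative:.0%} of the expected figure")

# A zero-visit period is common at low lambda and vanishingly rare at high lambda.
p_zero_small = poisson_pmf(0, 1)
p_zero_large = poisson_pmf(0, 50)
```

Under this assumption the spread in visits grows with the square root of lambda, while the spread as a percentage shrinks.  So any loss of precision at high volumes would have to come from the tool itself (sampling, table limits) rather than from the distribution.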

So how do we work this out then?  Is there a hard and fast rule?  Well for one I won't be worrying about the accuracy of my figures.  Do I have 50 or do I have 32?  It doesn't matter.  When I go from 50 to 51 though, I want to know that this is one additional visit, or whatever in that time period.  Do I know that?  I wonder whether any analytics tool would ever like to say how precise they think their tool is.  I wonder if they know or could know, bar asking 100 random people to view a page on a daily basis for a couple of months.  And then they'd have to retry it with 10 people and 1,000 people as well just to make sure that their precision doesn't change with larger samples.

This is probably the perfect project for a student to do as a thesis, actually.  Any takers?

Monday, November 02, 2009

Two years of WhenCanIStop

Oh I hate doing these posts every so often about my website and what's been happening on it.  But then it comes around and I realise that I posted first on this blog just over two years ago, so I suppose I'd better write one of those posts that says exactly what has been going on in the last two years and how my blog posting has changed.  This is probably only really interesting for me and the odd other person who has blogged for a while and also keeps tabs on how many people look at their posts.  And of course I'm sure there are some n00bs out there who are wondering whether they should write a blog or not.

Just in case you were wondering - I first did this type of post in April 2008 when I was about to put Gatineau on my blog and wanted to do a baseline.  Then around about this point last year, I wrote the 50th post on the blog and it was almost a year in and I was quite pleased with how it was all going.  Then at the beginning of the year I wrote another one and reviewed the full calendar year's progress on the blog.  Funnily enough, I did another one when I realised that my traffic was going down a little bit, but it wasn't quite the same.

I suppose the first thing that you will probably notice is that this time last year I had written 50 posts (one year in) and this time around I'm only on post number 78 (this one).  That suggests that I've slowed down a little.  It's not quite as simple as that though:



As you can see from the graph attached there are three months when there were a large number of posts.  In the first month I'd pre-written all six of those on the company's internal blog system and released them into the wild to get a better audience on them (and if I'm going to give any advice to anyone on this one - make sure you have lots of posts to start with before you go live - that way you have a bit of a running start).

Then there is March 08 when I wrote 9 posts.  Except I didn't: in that month I had a lot of assistance from Kate Duffy, who guest posted from SES in New York.  Then that was followed up by April, in which I wrote twice on the fly because of unusual events.  Notably I had to create some sort of scent on the blog when I got on the front page of Slashdot and had to write a follow up myself.  Then I attended the Omniture Summit and wrote up my findings from my couple of days there.


Suddenly it looks a lot better, doesn't it?  Those fifty posts consisted of six pre-written and six written by someone else.  So instead of writing 50 in a year, actually it was only 38, so the 28 in the last year is much more comprehensible.  Although, that doesn't mean to say I shouldn't write more frequently.



So here are the headline figures:
  • 7,366 visits
  • 6,101 Visitors
  • 10,872 page views
That's for the whole two years, but it is quite good.  If you want a bit of one-upmanship to go on, I'll tell you which countries are the most popular as well:

  1. 3,286 Visits  United States 
  2. 1,632 Visits  United Kingdom
  3. 294 Visits  India
  4. 277 Visits  Italy
  5. 273 Visits  Canada
Thanks guys from all those countries (101 different countries looking at the blog!).

Top content:


  1. 713 Visits  Conversion Funnel Analysis: When, How and What
  2. 620 Visits  Omniture's SiteCatalyst HBX
  3. 474 Visits  SiteCatalyst Compared to HBX
  4. 401 Visits  Setting Up campaigns in HBX and Google Analytics
  5. 209 Visits  Is HBX Active Viewing Awesome?
It's hardly surprising that there are some older posts in there, because they've been there longer and had more chance to get promoted in rankings and had more time to accumulate visits.

My search traffic has been fairly constant over time (I say search traffic - it is entirely Google, with a little bit from Yahoo!).



Maybe I need to spend a bit more time looking at how I can attempt to get more traffic from search engines.  I suppose one of the things I could do is write more generalised posts ("Funnel Analysis" is my top keyword from Google) that fit across several different niches, rather than the large volume that I write about specific systems.

Interestingly my direct traffic has been quite inconsistent.  I suspect that this is partly to do with the way that I promote myself and my blog.  Whilst finishing my last job (end of 2008) I was pushing it out quite frequently in presentations and in analysis so that I could get my company a bit more focused.  In my current job there are far fewer internal stakeholders and many more Government departments who frequently won't have the skills, inclination or even time to look up the things that I am promoting.



And last, but not least, here is the number of referring sites that have sent me traffic.  These are usually fairly inconsistent and I think the ones that should get noteworthy mentions (beyond the previous posts I've written) are Twitter (@whencanistop), UserPathways (thanks James) and of course Web Analytics Demystified (which I try to post comments on frequently).




I suppose I should probably try and promote my blog elsewhere through comments on blogs and pushing it on forums, but it feels a little bit tacky - like I'm promoting myself rather than the content of my blog.  I suppose I am really, because that is what the blog is supposed to do anyway.
