Monday, October 25, 2010

Why you can't compare figures from different Analytics tools

I was prompted to write this post by Hitwise (of all people), who told me that they were running a beta test that shows a total number of visits.  One of the questions in the test was "Did the reports show the results you expected?"  The answer was, of course, no.  I measure my data with an analytics tool, and it measures a completely different set than Hitwise does.  Not only are they measuring different things to start with, they measure them in a different way, process them in a different way, exclude things in a different way, extrapolate in a different way, etc.  It's the equivalent of comparing eating an apple with a fork to eating an orange with a spoon.  And it doesn't matter anyway.

Why doesn't it matter?  Well let us go back almost a year to a post I wrote on the difference between accuracy and precision.  Remember it?  In it I said:

Web analytics is all about trending over time. You can find out how something is doing at the moment and you can get some insight into that - where people are coming from, where they are going to, etc. But really what you want to do is try changing something and seeing the impact that it has on your data.
Well for one I won't be worrying about the accuracy of my figures. Do I have 50 or do I have 32? It doesn't matter. When I go from 50 to 51 though, I want to know that this is one additional visit, or whatever in that time period.

I even gave a practical example of this when I looked at some data modelling and statistical testing I had been doing on the site I work on.

These are all well and good, but they don't look at anything other than one tool.  There is a reason for that:

Web Analytics tools will never be comparable to each other in total numbers because none of them are accurate.  

And I'm including Hitwise in my broad, sweeping statement.

Because web analytics tools are inaccurate in so many different ways to each other, they should never, ever be compared to each other.  But it still doesn't matter as long as they are all precise.
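To make the distinction concrete, here is a minimal Python sketch. The two tools, the percentage of visits each one counts and the visit figures are all invented for illustration. The point is only that two tools can disagree on every total and still agree on the trend.

```python
# Hypothetical example: two tools measure the same site, but each one
# consistently counts a different percentage of the real visits.
true_visits = [1000, 1100, 1210]  # made-up "real" weekly visits


def measure(visits, percent_counted):
    """Return what a tool that counts a fixed percentage would report."""
    return [v * percent_counted // 100 for v in visits]


def week_on_week_change(series):
    """Percentage change between consecutive weeks."""
    return [round(100 * (b - a) / a, 1) for a, b in zip(series, series[1:])]


tool_a = measure(true_visits, 80)   # assumed to miss 20% of visits
tool_b = measure(true_visits, 110)  # assumed to overcount by 10%

print(tool_a, week_on_week_change(tool_a))
print(tool_b, week_on_week_change(tool_b))
```

Neither tool reports the real totals, so comparing one tool's total with the other's tells you nothing. Both report the same week-on-week change, which is the number you actually act on.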

I know the media wants to take up its role of reporting which website is better than the other one based on ABCe audits.  But it doesn't work.  Each website is measured by a different calculation, even if they have the same analytics tool:

  • They will block different IP addresses (because everyone should block their internal IP addresses so that they can't accidentally overinflate their figures by doing silly things like unit testing against live)
  • They have different pages.  This may seem insignificant, but if the pages are significantly bigger and take longer to load then (assuming you've got your tags at the bottom of the page like you should do) it's possible that some tags won't load.
  • Different types of user may be more security conscious and more likely to block your cookie.
  • Different types of website might use different types of cookie, especially if they are sitting on multiple domains.
  • The processing done by the tool in the back end is different.  Processed peas go through a process shock horror!  It turns out that your analytics system takes all the data that it collects and processes that into your page views, visits, visitors, etc.  Funnily enough, they don't all do it in the same way (if they did, they would all be indistinguishable from each other).
  • The filtering of robots and spiders is done differently by different systems.  HBX, I remember being told, was quite clever and could remove things automatically that were seen as not human simply by their behaviour (rather than any particular technical attribute).
  • I could go on with more and more differences, but I won't.
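As a rough illustration of the list above, the following Python sketch runs the same raw hits through two different sets of exclusion rules. The hits, IP prefixes and robot markers are all made up, and real tools apply far more rules than this. It is only meant to show how two sensible configurations produce two different totals.

```python
# Made-up raw hits: (ip_address, user_agent)
raw_hits = [
    ("10.0.0.5", "Mozilla/5.0"),        # internal office traffic
    ("81.2.69.1", "Mozilla/5.0"),
    ("81.2.69.2", "Googlebot/2.1"),     # a well-known spider
    ("81.2.69.3", "Mozilla/5.0"),
    ("81.2.69.4", "SomeCrawler/1.0"),   # a robot only one tool recognises
]


def count_hits(hits, blocked_ip_prefixes, robot_markers):
    """Count the hits left after IP blocking and robot filtering."""
    kept = 0
    for ip, user_agent in hits:
        if ip.startswith(tuple(blocked_ip_prefixes)):
            continue  # excluded as internal traffic
        if any(marker in user_agent for marker in robot_markers):
            continue  # excluded as a robot or spider
        kept += 1
    return kept


# Hypothetical configurations for two tools (or two sites on one tool)
tool_a = count_hits(raw_hits, ["10."], ["Googlebot", "Crawler"])
tool_b = count_hits(raw_hits, [], ["Googlebot"])

print(tool_a, tool_b)
```

Same traffic, two defensible sets of rules, two different answers.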

Your average Guardian reader is going to be accessing content in a different way to your average The Sun reader (I jest, of course)

So does your ABCe audit matter at all then?  

Well actually it turns out it does.  Firstly you are probably quoting your figures to advertisers, press officers, the guy on the tube next to you called Bernard and all sorts of people.  What if you were making them up?  Wouldn't those people be really annoyed?  Well the ABCe do the sensible thing of making sure that you don't make them up by double checking them.  Then they do another sensible thing by making sure you don't overinflate those figures by cheating (by counting everyone twice, making all the visits yourself, etc, etc).

They also provide the likes of you and me with a sensible place to look at how websites have been performing over time.  Of course this is slightly tainted if you take into account that some websites change their measurement system (Protip: the ABCe include a little bit at the end that states the counting method), but you can still perform some measurement over time.  Whether you want to or not is a different matter.


Going back to Hitwise - one of the advantages that Hitwise has is that it does allow you to do comparisons from site to site, as you are not comparing one tool's figure with another.  You can make changes to your website (or watch as others make changes to theirs) and see the effect on each other.

What you cannot do, though, is compare the figures in Hitwise to those of another tool.  This is the reason why Hitwise doesn't need to put total visit numbers in.

Interestingly, as an aside at the end, Comscore and Nielsen who do similar things to Hitwise (but with a smaller sample size in the UK) are both trying to come up with a way of integrating their panel data with real analytics data.  This is (presumably) a way of getting around the issue that their data isn't accurate.  Given the size of their panels it may not be very precise either, but I still don't think that it is worth them doing it.  Accuracy is the wrong measure in this industry - we need precision.  Nielsen and Comscore would be far better off looking to increase their sample size to improve precision (IMO).
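To put a rough number on the sample size point, here is a small Python sketch using the standard margin of error for a proportion. It assumes a simple random sample, which a real panel is not, and the share and panel sizes are invented. Treat it as a back-of-the-envelope illustration only.

```python
import math


def margin_of_error(share, panel_size, z=1.96):
    """Approximate 95% margin of error for a share of visits measured
    from a panel, assuming a simple random sample (a simplification)."""
    return z * math.sqrt(share * (1 - share) / panel_size)


# Hypothetical: a site with a 2% share of visits, measured on three panels
for panel_size in (10_000, 40_000, 160_000):
    print(panel_size, round(margin_of_error(0.02, panel_size), 5))
```

Under these assumptions, quadrupling the panel halves the margin of error. That tightens the precision of the estimate, but it does nothing to correct any bias in who is on the panel.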

Monday, October 04, 2010

What Adobe should do for SiteCatalyst version 15

Given that my last post was number 100, I was thinking that this post should be a Web Analytics 101, but then I decided that was too cheesy even for me.  So instead, I've come up with a list of things that should definitely be in the next version of the SiteCatalyst tool powered by Omniture.  I'm well aware that SiteCatalyst version 14.9 is coming out soon, which will presumably have some nice feature updates, but this is what I think Adobe's next big step should be, so it is a bit further removed from improvements to the current system.

Before I go into anything else, I'd just like to point out the Omniture Ideas website.  It is awesome.  I am going to submit all of these to that website.  In the morning, once I have had some sleep.

1. Tagless Events

Let us start with the biggest one that is around.  Whatever you want to call them: conversions, events, goals - the things that you want people to do on your website.  In Omniture you can set them up as custom events (see the custom conversion section of my post on SiteCatalyst reports) in the tags on your page.  I'm not going to lie to you, I think for implementation they are overly complicated.  Moreover they are named completely wrong - how on earth can you have 'events' which are effectively conversions, whilst you also have 'conversions'?  I'm going to attempt to explain it to you here in layman's terms:

A custom event in SiteCatalyst can work in three ways.  It can be a:

  • Counter - every time it occurs it goes up by one.  
  • Currency - it goes up by a certain amount of currency which you can specify in the tag
  • Numeric - it can go up by a certain number which you can specify in the tag

So first off you need to make sure you set it up in the admin screen and then you also have to code it into the page.  With the counter ones you can just set them to go up by one by giving a parameter in the tag that says s.events = "event1".  For Currency and Numeric ones you should also add how much you want them to go up by in a separate variable, which is described by the products tag.  You can put in here what the product is, how many of them there have been and how much value is associated with each of them.  This is amazingly powerful if you have lots of different products on the site that you want to tag through your events (eg an online shop).
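If it helps, the three event types can be modelled in a few lines of Python. This is purely a sketch of how the totals behave and not SiteCatalyst code: the event names, the admin mapping and the record function are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical admin setup: which type each custom event has been given
EVENT_TYPES = {"event1": "counter", "event2": "currency", "event3": "numeric"}


def record(totals, event, amount=None):
    """Update the running total for one occurrence of a custom event."""
    kind = EVENT_TYPES[event]
    if kind == "counter":
        totals[event] += 1  # always goes up by exactly one
    else:
        if amount is None:
            raise ValueError(f"{event} is a {kind} event and needs an amount")
        totals[event] += amount  # goes up by whatever the tag specified


totals = defaultdict(int)
record(totals, "event1")
record(totals, "event1")
record(totals, "event2", Decimal("19.99"))  # currency value from the tag
record(totals, "event2", Decimal("5.01"))
record(totals, "event3", 3)                 # numeric value from the tag

print(dict(totals))
```

The counter needs nothing from the page beyond the fact that it happened, which is why it is the natural candidate for being defined without touching the tags.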

I don't like it.

Why not?  Let's see how you set up goals in Google Analytics:

See how easy that is?  It's one step and I don't need to change anything on the page.  In the reports, Counter custom events in Omniture behave exactly like Goals do in Google Analytics.  Omniture must come up with a way for us to create events on the fly without having to change the code on the page.

2. Custom Traffic variables defaulted to Custom Conversion variables as well

One of the biggest issues that I frequently have is where users of SiteCatalyst are using custom events in custom traffic reports.  Why?  They give back a gibberish answer.

Ok, Ok, they don't really give back a gibberish answer, but they don't necessarily give back the answer you were expecting.  Remember custom traffic variables are counter variables (page views, visits, entries, etc) whereas custom conversion variables are on page tagged campaigns.  Therefore having an event convert against a custom conversion variable makes a lot of sense.  You can choose in the interface if you want the event to occur against the most recent value, the first noted value or linearly across both (it would be nice if it could be fully attributed to both - the total of each of them doesn't need to add up to the total of all of them).  You can choose when you want the campaign to expire (after the visit, after a time length, after converting against one of your events) and you can choose a whole load of other things (see my learnings from the Summit for custom conversions as counters).
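The allocation choices described above can be sketched in Python as follows. The function and method names are my own labels, not anything from the SiteCatalyst interface, and the "full" option is the wished-for behaviour from the aside above rather than an existing setting.

```python
def allocate(values_seen, conversions, method):
    """Share `conversions` between the variable values a visitor passed
    through, listed in the order they were seen."""
    if not values_seen:
        return {}
    if method == "most_recent":
        return {values_seen[-1]: conversions}
    if method == "first":
        return {values_seen[0]: conversions}
    if method == "linear":
        share = conversions / len(values_seen)
        credit = {}
        for value in values_seen:
            credit[value] = credit.get(value, 0) + share
        return credit
    if method == "full":
        # Wished-for option: every value gets full credit, so the rows
        # no longer add up to the overall total.
        return {value: conversions for value in values_seen}
    raise ValueError(f"unknown allocation method: {method}")


journey = ["email", "search"]  # hypothetical campaign values, oldest first
for method in ("most_recent", "first", "linear", "full"):
    print(method, allocate(journey, 1, method))
```

With "linear" the credit always sums to the number of conversions. With "full" it deliberately does not.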

However having an event convert against a custom traffic variable doesn't make as much sense.  And indeed you can't really track your events against something that is just counted as a flat option.  Except in the case of page views, where it turns out you can.  How does that work?  Well instead of attributing it to the first page view (you can do this in your paths reports - if you have it enabled - where you have the choice of measuring against the entry page) or the last page view before you get to the conversion, it gives it a representative value against each of the page views.  That kind of makes sense - you might want to be able to attribute a value to all the pages in the journey.

But you might also want to see how good a page is at, say, driving registrations to a site, even though it isn't directly linked to from there.  If you want to do this, you need to use your custom conversion reports and set up the same variable as in your custom traffic report.  This seems very long winded to me.  The data is collected anyway - why can't there be a button in the admin section to just enable it?  Or better still, just auto enable it.

Yes, yes I know you can do this with a VISTA rule - but I shouldn't need an Adobe consultant to flick a switch.

3. Default popular plugins

Our site uses a desperately old version of the SiteCatalyst code before plugins became popular.  To prove that they have become popular, Omniture's website has come up with a list of them.  Some of these seem like a genius has come up with them. Why aren't they implemented as default?

The first one on this list involves taking a query string parameter from the url and putting it into a custom (traffic or conversion) variable.  I don't know of any website that doesn't contain the search term that a user typed in as a query string in the url.  Omniture can collect the url of the pages (it collects it at least once to populate the pages report to match the page names up to the urls).  Why can't it auto-populate an internal search report?  Or use an editor in the admin section that allows you to describe which of the custom traffic variables (with a switch for the custom conversion variable) you want each query string to relate to?

Even Google has the search parameters in the url

How do we know that they do it anyway?  We know because you can set up campaigns to be picked up in the javascript code from the query string.  Why can't SiteCatalyst capture all the query strings and then just use the ones described in the admin section?
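The kind of admin-driven mapping I have in mind would look something like this Python sketch. The parameter names, variable names and the mapping itself are hypothetical, and this is server-side Python rather than the SiteCatalyst javascript plugin. It only shows how little logic is involved.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical admin-section mapping: query string parameter -> variable
PARAM_TO_VARIABLE = {"q": "prop1", "cid": "campaign"}


def variables_from_url(url):
    """Pick out the mapped query string parameters from a page url."""
    params = parse_qs(urlparse(url).query)
    return {
        variable: params[param][0]
        for param, variable in PARAM_TO_VARIABLE.items()
        if param in params
    }


print(variables_from_url("http://www.example.com/search?q=web+analytics&page=2"))
# {'prop1': 'web analytics'}
```

Parameters that are not in the mapping (page in this example) are simply ignored, so capturing everything and choosing later costs nothing on the page.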

4. Update javascript file automatically

I wrote a spec a few weeks ago to put a new bit of the site under the latest version of the javascript code.  It took a bit of work by me to make sure all the plugins worked correctly and that it would be fine.  We've been doing some testing this week to make sure it works properly.  It didn't.  I went back to my code to come up with the reasons why.  In fact I went all the way back to the original code as it was spat out by SiteCatalyst.  It turns out in that short period of time, Adobe had come up with a new version of the code and we were working off H.22.

Now there is no way I am going to rewrite the entire thing, so I've told our developers that we're already out of date.  We haven't even put it live yet.

When will someone come up with a solution that gives Adobe the right to push out minor changes to the code to everyone?  Could there be a common include in the javascript file that referenced a common file that was the same for everyone?  I don't know anything about this, but I can't be the only one to have realised that their code is so out of date that SiteCatalyst is no longer supporting it.

5. Full metrics in Hierarchy reports

This is one of those words that I can never spell and always turn it into a heirarchy.  I've come up with a definition for the word: It's a list of people who are in line for the throne in order.

Anyway, enough of the pleasantries, the hierarchy reports are very, very useful.  If you set them up correctly you can drill down level by level of your content.  It's a great way to be able to drill down into sections of the site.  If you don't use the hierarchy report, I suggest you set it up.  You can of course do this with your products that you convert against as well - using classifications.

My big problem with this report is that you can't do anything else with it.  It's entirely stand alone.  I think that Adobe should improve the functionality of this report to allow all sorts of added extras.  The added extras I'm talking about are those additional metrics (like entries, exits, single access, time, etc).  

Not only should these added extra metrics be available, but we should also be able to correlate this report against any of the other reports.  The reports I'm thinking of that should be defaulted would be the traffic sources, search engines and search terms.  In fact, linking it up to the new channel report would be the most awesome thing.

Currently to get around this issue, I'm having to record each of the levels of my hierarchy in a flat format in separate custom traffic variables, each of which needs to be correlated with the others and each of which takes up time in processing.
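For anyone wanting to do the same, the flattening amounts to something like this Python sketch. The delimiter, the number of levels and the prop1, prop2, prop3 names are assumptions for illustration, not our actual setup.

```python
def flatten_hierarchy(hierarchy, delimiter="|", max_levels=3):
    """Turn one hierarchy value into one flat value per level, each
    destined for its own (hypothetical) custom traffic variable."""
    levels = [part.strip() for part in hierarchy.split(delimiter) if part.strip()]
    flat = {}
    for depth in range(1, min(len(levels), max_levels) + 1):
        # Each level keeps its parents so that values stay unique
        flat[f"prop{depth}"] = delimiter.join(levels[:depth])
    return flat


print(flatten_hierarchy("News|Sport|Football"))
# {'prop1': 'News', 'prop2': 'News|Sport', 'prop3': 'News|Sport|Football'}
```

Keeping the parent levels in each value means two sections that share a name at the third level are not lumped together, at the cost of longer values in every report.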

6. Enable visits on correlation reports

For the love of god - do this.  It is so annoying that you have to create correlations for reports and then suddenly you are reduced to just page views.

There are ways around this problem.  The options are that you can put both values in the same custom traffic variable and separate them with a colon (or some other punctuation), but this causes you problems when you want to know the visits all as a group, rather than associated with another value.  It would be so much easier if you could just correlate and keep visits.
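Here is a small Python sketch of why the colon workaround falls down for grouped visits. The visits and values are invented, and it assumes you only get visit counts per combined value back out of the report.

```python
from collections import Counter

# Made-up data: the combined "section:login state" values seen in each visit
visits = {
    "visit1": ["news:logged-in", "news:logged-out"],
    "visit2": ["news:logged-in"],
    "visit3": ["sport:logged-in"],
}


def visits_per_combined_value(visits):
    """What the report gives you: visits for each combined value."""
    counts = Counter()
    for values in visits.values():
        counts.update(set(values))  # a visit counts once per value
    return counts


def summed_visits_for_section(counts, section):
    """Adding up the report rows for one section (double counts visits)."""
    return sum(n for value, n in counts.items() if value.split(":")[0] == section)


def true_visits_for_section(visits, section):
    """The number you actually wanted: distinct visits to the section."""
    return sum(
        any(value.split(":")[0] == section for value in values)
        for values in visits.values()
    )


counts = visits_per_combined_value(visits)
print(summed_visits_for_section(counts, "news"))  # 3
print(true_visits_for_section(visits, "news"))    # 2
```

visit1 saw news in both login states, so it appears in two rows and gets counted twice when you add the rows up.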

Plus, whilst they are at it, they could add the ability to trend the correlated reports.
