AVG and the fake traffic debacle

No, this isn’t the start of my latest fantasy novel – where the schoolboy hero of the story (Larry Plotter) breaks into the realm of (final) fantasy to stop the evil AVG from taking over the world (it’s the sequel to the one where he stops Google).  I think I’ll leave J K Rowling to her set of books.  This is actually about AVG the antivirus software company and the new version of their free antivirus software (version 8).

It all started on April 24th when AVG released version 8 of their antivirus product.  In this version they included an updated version of LinkScanner, which prefetched the pages in your search results.  Not only did it prefetch each page, it would also prefetch all of that page’s content, including any images or javascript.  It wouldn’t execute the javascript, but it would certainly take it from your servers.

The Register were relatively quick to pick up on the story, and whilst they had their angle, I’m going to have mine in a minute as well.  There was a flood of comments under that article and I think they are well worth a read.  A week after The Register got hold of it, Slashdot posted the story on its front page and generated a huge volume of responses.  There was a similarly large response down under in the Whirlpool forums, because of their long tail of traffic.  Peter Cameron of AVG NZ eventually went online and told them that the feature was being taken down.

I think that, in terms of what it does for the user of LinkScanner, Steve McInerney’s comments over on the WA Yahoo! group put it very well.  But let’s step back and look at the reasons why they might do it:

  • Most of the new sites that people find are through search results
  • It is impractical to keep, on your own computer, a list of every site that has a history of serving malware
  • If you click on a link on a search engine results page (SERP) to go to a new site, you don’t want to be slowed down by your virus software checking the page before you get there (it just adds to the misery)
  • If your antivirus software knew which link you were going to click on before you clicked on it, it could get the page before you go there and check that it doesn’t have any malware
  • The antivirus software doesn’t know which link you are going to click on, so it gets all the pages and checks for viruses and malware before you go there
  • To do this, it has to download all the pages on each SERP that you look at
  • It also has to download all the images and javascript to make sure they aren’t linking to known malware/virus providers (sketched below)
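
To make the bandwidth point concrete, here is a rough sketch in Python (purely illustrative; nothing to do with AVG’s actual implementation, and the parsing is deliberately naive) of what ‘prefetch every result and everything each result embeds’ boils down to:

```python
# Illustrative sketch only: roughly the amount of fetching a SERP prefetcher
# like the one described above has to do. Not AVG's code.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ResourceCollector(HTMLParser):
    """Collects outbound links plus embedded image/script URLs from a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        elif tag in ("img", "script") and attrs.get("src"):
            self.resources.append(attrs["src"])


def prefetch_serp(serp_url):
    """Fetch every result linked from a SERP, then every image and script
    each of those pages embeds (downloaded, never executed)."""
    serp = ResourceCollector()
    serp.feed(urlopen(serp_url).read().decode("utf-8", errors="replace"))

    for href in serp.links:
        page_url = urljoin(serp_url, href)
        page = ResourceCollector()
        page.feed(urlopen(page_url).read().decode("utf-8", errors="replace"))

        for src in page.resources:
            # Each of these hits the target site's servers (and fires any
            # image beacons), even though the javascript never runs.
            urlopen(urljoin(page_url, src)).read()
```

Every one of those requests costs the end user bandwidth and the target site server load, for pages the user may never actually visit.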

The theory is there; the downsides for the user are these:

  • It eats up your bandwidth
  • Anything horrible that can get loaded by the javascript won’t be tested (that is virtually everything)
  • It eats up your bandwidth
  • It’ll slow your computer down
  • It eats up your bandwidth

So it’s great if you have unlimited bandwidth?  Well, not quite, because of the downsides at the other end of the spectrum.  This is what it does to the webmaster (or rather, to the website it is prefetching from) when it loads everything from the page (read Judah Phillips’ interesting post on it too):

  1. It eats up bandwidth
  2. Any site with ‘noscript’ fallback tags in its analytics solution will have bad data
  3. Any measurement system that uses images will have bad data
  4. Any measurement system that relies on log file analysis will have bad data
  5. Any measurement system that uses javascript-based tags will be fine

Well, that’s all well and good, you might say, but so what?  As an end user with no bandwidth issues, I’m completely covered.  Or so you think.  Well, there are solutions for webmasters for points 2, 3 and 4 above: we can simply filter those hits out by looking at the useragent (which had a strange string in it that you could match on), although at one point we were sure they were just going to switch to the useragent of a normal browser.
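
A minimal sketch of that kind of retrospective useragent filter is below (assumptions flagged: it expects a combined-format access log where the useragent is the last quoted field, and the marker string is just a placeholder; substitute whatever that strange string actually looks like in your own logs):

```python
# Sketch only: strip suspected LinkScanner hits out of an access log before
# it goes anywhere near the analytics or reporting numbers.
import re

# Placeholder, not the real signature: use whatever the odd string in the
# LinkScanner useragent looks like in your own logs.
LINKSCANNER_MARKER = "LINKSCANNER-MARKER-STRING"

# In the combined log format the useragent is the last quoted field.
LAST_QUOTED_FIELD = re.compile(r'"[^"]*"\s*$')


def is_linkscanner_hit(log_line: str) -> bool:
    match = LAST_QUOTED_FIELD.search(log_line)
    return bool(match) and LINKSCANNER_MARKER in match.group(0)


def filter_log(in_path: str, out_path: str) -> None:
    """Write a copy of the log with the suspected prefetch hits removed."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if not is_linkscanner_hit(line):
                dst.write(line)
```

Of course, if AVG had gone ahead and switched to a normal browser useragent, this whole approach would have fallen over, which is exactly the worry.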

Point one is more interesting, though.  The only way around it is to block the user agent from accessing your servers, or to reroute it to another set of servers.  What does that mean?  It means that, from an end user perspective, they are using up their bandwidth even though the sites they think they are being kept safe from aren’t actually being checked at all.  Then when they click through, they’ll be even less suspecting, because they think that AVG is protecting them.

Interestingly, this has manifested itself on our sites in a slightly different way.  Firstly, our adserving technology uses a 1×1 gif to record how many times each ad could have been loaded (which it then compares with the number of each type that actually were loaded).  So in theory this has inflated all of our ad impressions, without increasing the number of ads actually shown.
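
As a toy illustration of that counting mechanism (hypothetical endpoint name; nothing like our actual adserving stack), here is the 1×1 gif pattern in miniature.  Every request for the pixel bumps the ‘could have been loaded’ count, and a LinkScanner prefetch is indistinguishable from a real browser unless you inspect the useragent:

```python
# Toy version of a 1x1 gif "potential impression" counter. Hypothetical
# endpoint name; the point is only that any request for the pixel is counted,
# whether it came from a person or from a prefetcher.
from wsgiref.simple_server import make_server

# A minimal 1x1 transparent GIF payload.
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
         b"\x21\xf9\x04\x01\x00\x00\x00\x00"
         b"\x2c\x00\x00\x00\x00\x01\x00\x01\x00\x00"
         b"\x02\x02\x44\x01\x00\x3b")

potential_impressions = 0


def app(environ, start_response):
    global potential_impressions
    if environ.get("PATH_INFO") == "/ad-pixel.gif":
        # Counted regardless of who (or what) requested it.
        potential_impressions += 1
        start_response("200 OK", [("Content-Type", "image/gif")])
        return [PIXEL]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]


if __name__ == "__main__":
    make_server("", 8000, app).serve_forever()
```

Since LinkScanner pulls the gif but never runs the javascript, the ‘could have been loaded’ count climbs while the number of ads actually loaded stays flat, which is exactly the inflation described above.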

The other issue is that AVG LinkScanner didn’t cope very well with question marks (?), equals signs (=), etc. in URLs, so instead of trying to get the real page it was looking for, it ended up just picking up the error page (I think this is actually a 403 error, but I can’t work out why). This meant that mostly we saw a big increase in ad impressions on the error page that weren’t really there.

Fortunately it looks like AVG have backed down and are going to remove this ‘feature’ in their next update to the tool.  In the meantime we need to go back over our traffic stats and start excluding this useragent so that we can report real page impressions.  It’ll be interesting to see if the auditing body for online metrics, the ABCe, will retrospectively go back and filter these out of their audited sites’ data.  If they do, it’ll also be interesting to see whose traffic goes down the most, given the recent arguments about it.
