RIP Google Analytics, Casualty of Spam

2015/1/10 6:11

300 visits! Looks like my website had a pretty good week, no? Not bad considering I haven't posted anything in weeks. Some site must have linked to an old post. Well, let's see what Google Analytics has to say about that:

[Screenshot: bogus Google Analytics traffic]

iloveitaly.co? priceg.com? blackhatworth.com? buttons-for-website.com?

Sonofabitch.

If you have a website, this probably looks familiar. Your site's getting a ton of traffic from generic-sounding websites. Maybe you fell for it at first and clicked back to see if they indeed linked to your content. You found either that the referring site didn't reference your content, or that it wasn't even up at all. Welcome to the latest salvo in the spam wars.

Spammers have figured out that it's pretty easy to spoof Google Analytics traffic. When Google Analytics logs traffic on your site, it uses a simple key. Since all tracking is handled through JavaScript on the client side, there are no real checks to ensure that hits carrying your key actually come from your site. They can make requests directly to Google from their domains using your unique key, and their bogus traffic shows up in your reports.
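To make that concrete, here's a rough Python sketch of what a tracking hit carries. I'm assuming the query-string parameter names from Google's Measurement Protocol (`tid` for the key, `dh` for the hostname, `dr` for the referrer); the function name and the sample values are made up for illustration.

```python
from urllib.parse import parse_qs


def describe_hit(query_string):
    """Pull out the fields of a hit that name the property, host, and referrer."""
    fields = parse_qs(query_string)

    def first(name):
        # parse_qs returns a list per key; take the first value or None.
        return fields.get(name, [None])[0]

    return {
        "property_key": first("tid"),  # the only thing tying the hit to "your" site
        "hostname": first("dh"),       # self-reported by whoever sent the hit
        "referrer": first("dr"),       # also self-reported
    }


# Made-up hit: placeholder key, placeholder domains.
example = (
    "v=1&tid=UA-XXXXX-Y&cid=555&t=pageview"
    "&dh=example.invalid&dr=http%3A%2F%2Fspam.example%2F"
)
print(describe_hit(example))
```

Every field in there is supplied by the sender, which is the whole problem: nothing in the hit proves it came from a page on my server.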

Why do this? Clicks, plain and simple. A person who went through the trouble of setting up analytics tracking is probably a person with just enough vanity to immediately check who's referring to their site. In online advertising, click-through rates are pretty minuscule. I imagine the rates for this type of scam are dramatically better.

What we're left with, however, is that Google Analytics is now junk. The code is cracked. It's not worth it to constantly prune away the dozens of spam referrers.

Google could clean this up. Right now this spam probably isn't that sophisticated. Multiple users report the same sites in their logs, so the same domains are probably swamping Analytics at the same time. Google could be proactive and, at the first sign of improper activity, prevent the domain from reaching logs across its network. I hear they do this already, but certainly not on a daily basis, let alone in real time.
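I have no idea how Google actually does this internally, but the idea is simple enough to sketch in Python. Assume a batch of hits reduced to (property key, referrer domain) pairs; the function name, the threshold, and the sample data are all invented.

```python
from collections import defaultdict


def suspicious_referrers(hits, min_properties=3):
    """Return referrer domains that show up on many distinct properties.

    hits: iterable of (property_key, referrer_domain) pairs from one time window.
    min_properties: hypothetical threshold, not a tuned value.
    """
    seen = defaultdict(set)
    for property_key, domain in hits:
        seen[domain].add(property_key)
    return sorted(d for d, props in seen.items() if len(props) >= min_properties)


# Invented sample: one domain hitting three unrelated properties at once.
sample_hits = [
    ("UA-1-1", "spam.example"),
    ("UA-2-1", "spam.example"),
    ("UA-3-1", "spam.example"),
    ("UA-1-1", "friend.example"),
]
print(suspicious_referrers(sample_hits))
```

On its own this is crude: a big legitimate site also refers traffic to lots of properties, so a real filter would need more signals than a raw count.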

Relying on JavaScript alone for tracking may just be too vulnerable. I may have to investigate more server-side solutions. Any suggestions?
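As a starting point for the server-side route, here's a minimal Python sketch that counts page views straight from an access log. It assumes the common "combined" log format that Apache and nginx can write; the regex, the function name, and the sample line are my own, and it makes no attempt to filter out bots.

```python
import re
from collections import Counter

# Assumed layout: combined log format (ip, ident, user, [time], "request",
# status, bytes, "referrer", "user agent").
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def count_pageviews(lines):
    """Count successful GET requests per path, skipping lines that don't parse."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("method") == "GET" and m.group("status") == "200":
            views[m.group("path")] += 1
    return views


# Invented log line for illustration.
sample_line = (
    '203.0.113.5 - - [10/Jan/2015:06:11:00 +0000] '
    '"GET /post HTTP/1.1" 200 1234 "-" "Mozilla/5.0"'
)
print(count_pageviews([sample_line]))
```

The point is that these numbers only exist if a request actually reached my server, which is exactly the check the JavaScript approach is missing.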

[edit]

Hey Reddit, thanks for stopping in.

I should be more specific about the two types of spammers I'm seeing on Google Analytics. The first group is typified by semalt.com, a name that is instantly recognizable to many with public-facing websites. These entities wind up in your logs by actually accessing your site. I don't mind GA reporting this traffic; they're bad actors either trying to fill access logs with spam or probing my site for weaknesses to exploit. There are steps I can take on my server to prevent this traffic.
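One such step, sketched in Python: check the Referer header of each incoming request against a blocklist and refuse the ones that match. The function name and the blocklist contents are placeholders; how you hook it into your server (middleware, rewrite rule, whatever) is up to you.

```python
from urllib.parse import urlparse

# Placeholder blocklist; extend with whatever shows up in your own logs.
BLOCKED_DOMAINS = {"semalt.com"}


def is_blocked_referrer(referer_header, blocked=BLOCKED_DOMAINS):
    """True if the Referer header points at a blocked domain or a subdomain of one."""
    if not referer_header:
        return False
    host = (urlparse(referer_header).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)


print(is_blocked_referrer("http://www.semalt.com/crawler.php"))
print(is_blocked_referrer("http://example.org/some-post"))
```

This only helps against the first group, since they have to visit my site to get logged.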

This post is more concerned with the newer group, the darodars of the web. They've figured out how to game Google Analytics while bypassing my site altogether. For a clearer example, here are recent traffic results for a site that's been down for the past month and a half.

[Screenshot: bogus Google Analytics traffic from a dead site]

The blurred number is actually the Google Analytics key for that site. The fault in the logic here is in the GA process.