Small, non-competitive niche site, off-season.
I assume the difference between raw server logs and JavaScript-driven analytics (GA, Clicky, etc.) is mostly down to bots, some of which I want (Google, Bing) and the rest of which I don't.
I don't do raw logs, ugh, gave them up in the '90s. In your comparison of logs to GA, is 10% too low as a gut call?
I don't actually look at raw logs unless I think there's an issue, so my view of it is skewed.
But no, 10% wouldn't even scratch the surface. I was recently working on a site that gets under 1,000 legit uniques per month and was getting 60,000 hits (not uniques) per DAY on the server, with the worst day being 150,000 hits. So they were like 99% bots... which explained why they kept having server issues despite having almost no traffic. The vast majority of these had Asian IPs.
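For a rough sanity check of that 99% figure: 60,000 hits a day is about 1.8 million hits a month, which works out to more than 1,800 hits per legit visitor when there are under 1,000 uniques. If you do want to peek at raw logs, here is a minimal sketch that tallies hits per IP and counts hits whose user agent self-identifies as a crawler. It assumes the Apache/Nginx "combined" log format, and the token list is a hypothetical, deliberately naive one; bots that spoof a browser user agent (as the Asian-IP traffic above likely did) will only show up in the per-IP counts.

```python
import re
from collections import Counter

# Apache/Nginx "combined" log format (an assumption; adjust for your server):
# IP ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, naive list of tokens that self-identifying crawlers use.
KNOWN_BOT_TOKENS = ("bot", "crawler", "spider", "slurp")


def summarize(lines, top_n=5):
    """Return (total hits, hits with a declared-bot user agent, busiest IPs)."""
    per_ip = Counter()
    declared_bot_hits = 0
    total = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that don't match the assumed format
        total += 1
        per_ip[m["ip"]] += 1
        agent = m["agent"].lower()
        if any(tok in agent for tok in KNOWN_BOT_TOKENS):
            declared_bot_hits += 1
    return total, declared_bot_hits, per_ip.most_common(top_n)
```

Feed it `open("access.log")` (path is illustrative). A handful of IPs accounting for most of the hits on a site with almost no real visitors is usually the tell.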
The trickier thing is picking out bots that actually show up in Google Analytics. The same site mentioned above had asked me to look into specific users showing up in their GA data with a profile (this is from memory) roughly like:
- 100% bounce
- Safari
- Yahoo referrer
- from one of 2 or 3 Southern California cities (Huntington Beach was one)
I couldn't resolve this in GA because GA doesn't let you see IPs. So we installed Clicky and started watching the IPs whenever one of the cities in question came up.
Without going into the details of why they wanted to block this traffic, we were slowly able to identify the IPs and block that bot traffic.
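The matching step can be sketched as a simple filter over visit records that include the IP (which is why Clicky was needed and GA wasn't enough). This is a minimal sketch under stated assumptions: the `Visit` record and its field names are hypothetical, not Clicky's actual export schema; "bounce" is approximated as a single pageview; and only Huntington Beach comes from the thread, so the other suspect cities would need to be filled in. IPs that match the profile repeatedly are candidates to review before blocking, not proof of a bot.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Visit:
    # Hypothetical record shape; not Clicky's actual export schema.
    ip: str
    browser: str
    referrer: str
    city: str
    pageviews: int


# Huntington Beach is from the thread; add the other cities you observe.
SUSPECT_CITIES = {"Huntington Beach"}


def matches_profile(v: Visit) -> bool:
    """True if a visit fits the bounce / Safari / Yahoo / city profile."""
    return (
        v.pageviews == 1  # single-page visit, used here as "bounce"
        and v.browser.lower() == "safari"
        and "yahoo" in v.referrer.lower()
        and v.city in SUSPECT_CITIES
    )


def candidate_ips(visits, min_hits=2):
    """IPs that produced the suspect profile at least min_hits times, sorted."""
    counts = Counter(v.ip for v in visits if matches_profile(v))
    return sorted(ip for ip, n in counts.items() if n >= min_hits)
```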
But seriously, I'm a real amateur with that stuff. If I really wanted to get to the bottom of something like that, I'd see if I could hire IncrediBill or get invited as a beta tester on CrawlWall, though it's been in beta for a long time and I'm not sure if it's actually usable. I know Bill has had some serious distractions the last few years, but he's at least monitoring the Twitter feed https://twitter.com/crawlwall
RC - here's a report from a small, niche site, off-season that is running Cloudflare. CF doesn't block 100% of bots by a long shot, so the actual situation is quite a bit worse I think, but this gives you some idea.
>users showing up in their GA data with a profile roughly
Thanks. I'll take a look for suspect profiles.
>If I really wanted to get to the bottom of something like that
It's not worth that. Rules of Thumb provided by th3core members have long served me well.
In this case, the GA numbers seem about right (minus 10%), but I've always preached to proud site owners that 30% of your traffic is bots. My main interest is just assessing how well GA distinguishes between eyeballs & bots.
Ah. I think I misunderstood your question. My general feeling is that JavaScript analytics (including GA) pick up only a small percentage of bots. The tracking code only fires if the client actually executes JavaScript, so you're only going to see bots that run it, which is a small number.
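One way to see this effect in your own logs is to compare clients that fetched pages with clients that also fired a script-triggered request. This is a minimal sketch, assuming you serve your own hypothetical `/beacon.gif` from a small piece of JavaScript. GA's real collection requests go to Google's servers, so they never appear in your logs. Clients that never fire the beacon didn't run the script and would be invisible to JavaScript analytics. Headless browsers that do run it land in the "JS" set, which is the small slice of bots that does reach GA.

```python
from collections import defaultdict

# Hypothetical beacon path requested only by your own JavaScript.
BEACON_PATH = "/beacon.gif"


def split_clients(requests):
    """requests: iterable of (ip, path) pairs from a parsed access log.

    Returns (js_clients, non_js_clients): sets of IPs that did / did not
    request the beacon.
    """
    paths_by_ip = defaultdict(set)
    for ip, path in requests:
        # Drop any query string so "/beacon.gif?r=123" still matches.
        paths_by_ip[ip].add(path.split("?", 1)[0])
    js_clients = {ip for ip, paths in paths_by_ip.items() if BEACON_PATH in paths}
    non_js_clients = set(paths_by_ip) - js_clients
    return js_clients, non_js_clients
```

Comparing the size of the two sets gives a rough, log-side view of how much traffic JavaScript analytics can't see.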