Archive for the ‘Analytics’ Category

Web Analytics – More Observations

This is the fourth article in this series on Web Analytics. This post came about because I read a suggestion to compare current usage with a similar time frame in the past, so I compared information from October 2006 to October 2007. When these two reports were generated, I filtered out known spiders/crawlers and all clients coming from ISU addresses. I wanted to look at how the general public uses our site.

Metric                               % Change
Visits to the Site                   Up 37.5%
Unique Visitors                      Up 31%
Visitors Who Visited Once            Up 30%
Visitors Who Visited More Than Once  Up 35%
Page Views                           Down 18%
Visits to Home Page                  Down 9.5%
Home Page as Entry Page              Down 9.5%
Views of the Home Page               Up 10.5%
Number of Searches                   Up 27.5%
No Referrers                         Up 21%

The first thing that jumps out is that the number of visits is up 37.5% while the number of page views is down 18%. The average number of page views per visit dropped from 3.5 to 2.1, which definitely supports the "one and done" theory. What this report doesn't show is how many visits saw only one page and how many saw multiple pages. I may need to see if our version of WebTrends will give me that information.
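
Those two percentages are consistent with the drop in pages per visit; a quick back-of-the-envelope check (using the 3.5 pages/visit figure from the 2006 report):

```python
# If visits rose 37.5% and total page views fell 18%,
# pages-per-visit scales by the ratio of those two changes.
pages_per_visit_2006 = 3.5  # from the 2006 report
pages_per_visit_2007 = pages_per_visit_2006 * (1 - 0.18) / (1 + 0.375)
print(round(pages_per_visit_2007, 2))  # -> 2.09, matching the ~2.1 observed
```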

Other Observations:

  • I can’t explain why the number of views of the home page went up by 10.5% when the number of visits to it went down.
  • 9 of the top 50 entry pages in October 2006 now return a “File not found” page. These were Excel training materials that we took down because they covered an ancient version of Excel (version 5 or 6, I think). Oops.
  • In 2006, 22% of the visits entered through one of the top 50 entry pages; in 2007, that dropped to about 16%. The “Long Tail” is getting bigger.
  • In 2006, 6 of the top 50 files downloaded from the site were feeds, and that number stayed the same in 2007. However, if you score them like a cross-country meet (sum of finishing places, with the lower total winning), 2007 wins 75 to 121. All 6 feeds were in the top 25 in 2007, while only 4 were in 2006.
  • The feeds in the top 50 also changed a lot. Our main news feed went from number 41 in 2006 to numbers 2 (Atom feed) and 18 (RSS feed) in 2007. The Atom feed didn’t exist in 2006.
  • The top 5 countries stayed the same (US, Canada, Mexico, Australia, UK), but beyond that, the rankings varied widely.
  • The percent of searches from Google was 87% in 2007, up from 86% in 2006.
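
Cross-country scoring, as used for the feed comparison above, simply sums each team's finishing places, and the lower total wins. A minimal sketch (the rank lists below are placeholders for illustration, not the actual 2006/2007 download positions):

```python
def cross_country_score(places):
    """Sum of finishing places; the lower total wins."""
    return sum(places)

# Hypothetical download-rank positions for the six feeds:
ranks_2006 = [8, 15, 22, 27, 31, 41]   # placeholder values
ranks_2007 = [2, 5, 11, 18, 21, 24]    # placeholder values

print(cross_country_score(ranks_2006))  # -> 144 (higher total: worse)
print(cross_country_score(ranks_2007))  # -> 81  (lower total: better)
```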

The way people are using our site is drastically changing. Much of what I’m observing is consistent with what I’ve been reading about people’s browsing habits. It’s an exciting time to be doing Web Development!


Web Analytics – Unanswered Questions

I have many unanswered questions after looking at the web usage on our server. For many of these questions, I’m not sure it’s possible to find an answer.

  • How many times are we in the top 10 or 20 of a Google search when the person doesn’t click on our link? We can tell when they click on us; I’d like to know when we are on the search results page but don’t get the click.
  • Are there things we can do to improve the click-through rate from Google? One possibility is to change the meta description information, as described in the article “Improve snippets with a meta description makeover.” Would this help?
  • What words are people searching for, where we have information, but don’t show up on the first page of the Google search results? What words should we be targeting?
  • Of the 20,000 visits that see our home page first, how many are ISU Extension staff members accessing our site from outside the ISU network? For example, a staff member takes their laptop home, starts up Firefox or IE, and the ISU Extension home page is the browser’s default. This would register as an off-campus visit to the ISU Extension home page; however, this hit should be filtered out when considering how the public sees our page. I don’t know how to set up this filter.
  • Does it help our Google rating that a number of people are using graphics from our server on their pages?
  • A large number of visits into our site had “No Referrer”. Why? Are these actual people, or spiders/robots, etc.? This needs more evaluation.
  • How much time is appropriate to spend looking at web usage data? How much time does your organization spend on this? I think ISU Extension would benefit by making this a higher priority, maybe even to the point that it is someone’s primary job.

I’ve learned a lot about our web site by taking a more in-depth look at our log files; however, there are still a lot of unanswered questions.

Web Analytics – Outcomes

In taking a deeper look at the web logs for ISU Extension’s main site, I’ve learned a lot about how the site is used. One thing that jumped out is that “The Long Tail” definitely exists on our web server: a small number of pages get the most visits per page, but when aggregated together, the less popular pages make up the bulk of the visits. These recent posts from Seth Godin and Arpan Shah describe the long tail much better than I can.

Entry page into the site is an interesting statistic. In many cases, the first pages people see are not the ones we think are most important. We think of News, the Store, and the topic pages off our home page as being pretty important. However, only the Store showed up in the top 50 entry pages (at number 8), and that wasn’t the Store’s home page but its “Item Detail” page.

So what did show up? Three of the top 50 entry pages had to do with tree identification, a site that was first written over 10 years ago and hasn’t changed much since then. Not only did people come into this site, they used it! People moved around within it: the site accounted for 39 of the top 100 overall pages in October. To borrow a phrase from Kevin, “The people have voted with their mouse clicks.” This site is important.

Other things I learned:

  • Our home page was the entry page for about 20,000 visits and had a total of about 24,000 visits, meaning that a little over 4,000 visits (less than 1% of the total visits to the server) reached our home page after entering on a different page. This reinforces the “every page is your home page” theory.
  • For over 100,000 visits, the top entry page was a graphic file. This was likely the only file seen by these visitors.
  • One of the top referrers to our site was MySpace. Upon further investigation, I discovered people are using images from our server on their pages. Some of the other top referrers also pointed to graphic files.
  • Also making the top 50 entry pages were some old pages on using HTML. I discovered that one of these pages is linked from Wikipedia and therefore had a high ranking on Google.
  • Google was the biggest referrer (no surprise), but not by as much as I would have thought. Also in the list of top 20 referring sites were search pages from Yahoo, MSN, Live, AOL, and Ask. Absent from the list is Wikipedia.
  • Google seems to break framed pages up into their individual frames. We have a site that uses frames, and the content on that site is found through Google. However, when accessing the content through Google, you never see the frames, just the content frame. This causes it to lose its identity with ISU Extension.

I was surprised how much good information I found when digging deeper into the statistics for the entire server. My guess is that the ISU Extension site is pretty typical of what is happening at other states.

If you would like more information on this, please let me know.

Web Analytics – Tools

I spent some time last week evaluating the web usage reports that we generate for the ISU Extension web site. I took a little different approach this time, in that I was looking at the entire site, rather than individual counties, departments, projects, etc. that are hosted as folders on the site.

To do the bulk of the analysis, I used reports generated by an extremely old version of WebTrends. Someday we’ll either upgrade (expensive) or change to another tool. For now, we still get a lot of good information from the old version we’re running.

The first thing I did was adjust the report for the entire server to eliminate as many robots/spiders/crawlers as I could. I did this by filtering out browsers whose identification strings contained certain words, which took out many of them. Next, I filtered out all visits where the entry page was “/robots.txt”, which eliminated many more. This should catch the well-behaved robots.
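
That filtering can be sketched as a simple pass over the log. This is a sketch only; the keyword list and the idea of checking the entry page are the same principles the WebTrends filters use, but the specific keywords here are my assumptions:

```python
# Sketch of spider/crawler filtering. The keyword list is an
# assumption for illustration, not the exact filter we use.
BOT_KEYWORDS = ("bot", "crawler", "spider", "slurp")

def is_probable_robot(user_agent, entry_page):
    ua = user_agent.lower()
    # Well-behaved robots identify themselves in the user-agent
    # string or fetch /robots.txt as their entry page.
    return entry_page == "/robots.txt" or any(k in ua for k in BOT_KEYWORDS)

print(is_probable_robot("Googlebot/2.1", "/index.html"))  # True
print(is_probable_robot("Mozilla/4.0", "/robots.txt"))    # True
print(is_probable_robot("Mozilla/4.0", "/index.html"))    # False
```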

I then created a second report, which did some additional filtering. For the second report, I filtered out all visits where the client computer was at an ISU address, since I was mostly interested in how the public uses our site. Unfortunately, this report still contains visits from staff members when they are not on the ISU network; for example, if I access the site from home, my visit is included. I don’t know a good way to find and exclude these hits.
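
Excluding campus traffic amounts to an address-range test. A sketch using Python’s ipaddress module, with a documentation-only network standing in for the real ISU address blocks (which I’m deliberately not listing here):

```python
import ipaddress

# Placeholder network; substitute the actual campus address blocks.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # hypothetical

def is_campus_client(ip):
    """Return True if the client IP falls inside a campus range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in CAMPUS_NETWORKS)

print(is_campus_client("192.0.2.17"))    # True  -> filter out of the report
print(is_campus_client("198.51.100.9"))  # False -> keep
```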

When I saw something in the report that seemed odd, I would use two additional tools to dig into the actual IIS log file to see what was going on. The first was a UNIX-style grep. I use this tool a lot to find words/phrases in text files. I typically pipe the output into another text file, which I then load into Excel to analyze.
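
The same grep-then-Excel workflow can be sketched in a few lines: keep only the log lines that contain a phrase, then count or export them. The sample log lines and search phrase below are made up for illustration:

```python
# Grep-style filter over log lines; the matches could then be
# written to a text file and loaded into Excel for analysis.
def grep(pattern, lines):
    return [line for line in lines if pattern in line]

log_lines = [
    "GET /pages/tree_id.html 200",
    "GET /robots.txt 200",
    "GET /pages/tree_key.html 200",
]
matches = grep("tree", log_lines)
print(len(matches))  # -> 2
```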

The second tool I use is Log Parser, a free download from Microsoft that lets you run SQL queries against your log files. For some queries, like finding the top ten pages requested from your server, it is an awesome tool.
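
A Log Parser query for the top pages reads roughly like `SELECT TOP 10 cs-uri-stem, COUNT(*) AS Hits FROM ex*.log GROUP BY cs-uri-stem ORDER BY Hits DESC` (syntax from memory; check the Log Parser documentation). The same aggregation, sketched in plain Python over made-up request paths:

```python
from collections import Counter

# Equivalent of: SELECT TOP 3 cs-uri-stem, COUNT(*) ... GROUP BY cs-uri-stem
requests = [
    "/index.html", "/pages/tree_id.html", "/index.html",
    "/store/item.html", "/index.html", "/pages/tree_id.html",
]
top_pages = Counter(requests).most_common(3)
print(top_pages)
# -> [('/index.html', 3), ('/pages/tree_id.html', 2), ('/store/item.html', 1)]
```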

I learned a lot about the way our web server is used. More on that in upcoming posts.