Writing Atom Feeds in .NET Using Argotic

January 11, 2008 by bwebster

This week, one of my projects entailed generating some Atom feeds from web pages produced in our Content Management System. Because our CMS doesn’t generate feeds automatically, I used a package called Argotic Syndication Framework, by Brian Kuhn, to help build the feeds in a .NET program.

Argotic is a very well designed and implemented framework that reads and writes both Atom and RSS feeds. It also handles popular extensions to feeds, such as the iTunes extensions. Very nice having a single framework that handles both major feed formats!

I discovered a minor problem when I tried to validate the feeds using the Feed Validator. Although I didn’t see anything in the Atom specification about it, the validation failed because the <entry> items were not the last thing in the feed. I downloaded the Argotic source code and changed the output order, which was pretty easy to accomplish.
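To make the ordering issue concrete, here is a minimal Python sketch (not Argotic code, and not its API) that builds a feed with the standard library and appends the <entry> elements only after all feed-level metadata. The function names, URLs, titles, and dates are placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom(tag):
    """Return the namespace-qualified name for an Atom element."""
    return f"{{{ATOM_NS}}}{tag}"


def build_feed(feed_id, title, updated, author, entries):
    """Build an Atom feed whose <entry> elements follow all feed metadata."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    feed = ET.Element(atom("feed"))

    # Feed-level metadata is written first.
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = updated
    author_el = ET.SubElement(feed, atom("author"))
    ET.SubElement(author_el, atom("name")).text = author

    # Entries are appended last, so nothing follows them in the output.
    for item in entries:
        entry = ET.SubElement(feed, atom("entry"))
        for tag in ("id", "title", "updated"):
            ET.SubElement(entry, atom(tag)).text = item[tag]
    return feed


# Placeholder values, for illustration only.
feed = build_feed(
    "http://example.org/news/",
    "Example News",
    "2008-01-11T00:00:00Z",
    "Example Author",
    [
        {
            "id": "http://example.org/news/1",
            "title": "First item",
            "updated": "2008-01-10T00:00:00Z",
        }
    ],
)
feed_xml = ET.tostring(feed, encoding="unicode")
```

Keeping the metadata and the entries in two separate passes makes the output order easy to see and hard to break by accident.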

I also posted a bug report at the Argotic Syndication Framework web site at about 10:30 PM Monday. By 9:00 AM Tuesday, there were two responses from Brian, one of which said it was fixed for the next release. Wow, great service, Brian! Especially for a product that is freely available.

I definitely recommend using Argotic to work with feeds in .NET.

Web Controls on dnrTV

December 12, 2007 by bwebster

One of the projects I’ve been thinking about for a long time is a toolbox of controls that we can reuse. I’ve had a hard time trying to explain what that would look like, but I definitely know it’s the direction that we need to go. Not only will it be good for us, but it could provide a mechanism for sharing code with other states.

A couple of weeks ago, I decided that it would be easier to explain if I had something tangible to show people. As a first step, I fired up episode #1 of dnrTV. In this episode, Miguel Castro builds a “contact us” web control that can be dropped into any ASP.NET web page where you want a contact us form.

As he went through the process, I’d stop the video to duplicate his steps by writing my own web control. Mine was much simpler in function: it provides a search text field and button that interact with our Google search appliance. I learned a lot by watching a couple of minutes, then doing whatever he was talking about.

There were two more episodes on dnrTV where Miguel talked about web controls, both of which deal mostly with adding design-time features to the control, meaning how the control acts in Visual Studio. These are episodes number 2 and number 31.

The best thing about these three episodes is that they help me think about how we can reuse code from one product to another. I now have a “Google Search Appliance” control that I can show. It takes a little more work initially to write the code with enough flexibility and configurations to be useful in multiple projects. However, done correctly, it will save time in the long run.

I really like the dnrTV videos. In fact, as I think about holding regular developer meetings, I’m thinking we might want to use them as the various topics. We’d watch and discuss the episodes that have topics of interest. Is this a good idea?

The people who do dnrTV started by doing weekly podcasts at DotNetRocks. These are highly recommended for .NET developers as well.

Google Snippets

December 10, 2007 by bwebster

Getting your page to show up as a top link on Google is one thing; having people click on it is another. What Google shows as the snippet is a big factor in whether someone clicks.

One of the things Google may use for a snippet is the page’s “description” meta tag; see “Improve snippets with a meta description makeover.”

The following video was included in a post The anatomy of a search report. In it, they talk more about how Google determines what to include in the snippet.

[Update: The embedded video didn't seem to come through in the syndication feeds, so here is a direct link to the YouTube video "Matt Cutts Discusses Snippets." ]

We can’t control what Google uses as the snippet. We can control what we put in the description meta tag. Google can choose whether or not to use it.
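Since the description meta tag is the part we control, it can help to check what a page currently offers. The following is a minimal Python sketch using only the standard library. The class and function names and the sample page are made up for illustration, and it says nothing about how Google itself reads the tag.

```python
from html.parser import HTMLParser


class MetaDescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content")


def meta_description(html_text):
    """Return the page's meta description, or None if it has none."""
    parser = MetaDescriptionParser()
    parser.feed(html_text)
    return parser.description


# Hypothetical page, for illustration only.
SAMPLE_PAGE = """<html><head>
<title>Tree Identification</title>
<meta name="description" content="Identify common trees by leaf, bark, and fruit.">
</head><body></body></html>"""

print(meta_description(SAMPLE_PAGE))
```

Running something like this over a list of pages would show which ones have no description at all, which seems like a reasonable place to start.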

Lynette and I hope to meet this week to talk more about snippets.

Web Analytics - More Observations

December 4, 2007 by bwebster

This is the fourth article in this series on Web Analytics. This post came about because I read a suggestion to compare current usage with a similar time frame in the past, so I compared information from October 2006 to October 2007. When these two reports were generated, I filtered out known spiders/crawlers and all clients coming from ISU addresses. I wanted to look at how the general public uses our site.

Metric                                  % Change
Visits to the Site                      Up 37.5%
Unique Visitors                         Up 31%
Visitors Who Visited Once               Up 30%
Visitors Who Visited More Than Once     Up 35%
Page Views                              Down 18%
Visits to Home Page                     Down 9.5%
Home Page as Entry Page                 Down 9.5%
Views of the Home Page                  Up 10.5%
Number of Searches                      Up 27.5%
No Referrers                            Up 21%

The first thing that jumps out is that the number of visits is up 37.5% and the number of page views is down 18%. The average number of page views per visit went down from 3.5 to 2.1. This definitely supports the “one and done” theory. What this report doesn’t show is how many visits only saw one page, and how many saw multiple pages. I may need to see if our version of WebTrends will give me that information.
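The drop from 3.5 to 2.1 follows directly from the two percentage changes, as this small Python check shows. The variable names are only for illustration; the numbers are the ones quoted above.

```python
# Changes from October 2006 to October 2007, as reported above.
visits_change = 0.375       # visits up 37.5%
page_views_change = -0.18   # page views down 18%
pages_per_visit_2006 = 3.5

# Pages per visit scales by (page view growth) / (visit growth).
ratio = (1 + page_views_change) / (1 + visits_change)
pages_per_visit_2007 = pages_per_visit_2006 * ratio

print(round(pages_per_visit_2007, 1))
```

In other words, each visit in 2007 saw roughly 60% as many pages as a visit in 2006.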

Other Observations:

  • I can’t explain why the number of views of the home page went up by about 10% when the number of visits to the home page went down.
  • 9 of the top 50 entry pages in October 2006 give a “File not found” page today. These were some Excel training materials that we took down because they were for an ancient version of Excel (version 5 or 6, I think). Oops.
  • In 2006, 22% of the visits entered through one of the top 50 entry pages; in 2007, that dropped to about 16%. The “Long Tail” is getting bigger.
  • In 2006, 6 of the top 50 files downloaded from the site were feeds, and that number stayed the same in 2007. However, if you score them like a cross-country meet (add up the places, lowest score wins), 2007 wins 75 to 121. All 6 feeds were in the top 25 in 2007, while only 4 were in 2006.
  • The feeds in the top 50 also changed a lot. Our main news feed went from number 41 in 2006 to numbers 2 (Atom feed) and 18 (RSS feed) in 2007. The Atom feed didn’t exist in 2006.
  • The top 5 countries stayed the same (US, Canada, Mexico, Australia, UK), but beyond that, they were all over.
  • The percent of searches from Google was 87% in 2007, up from 86% in 2006.

The way people are using our site is drastically changing. Much of what I’m observing is consistent with what I’ve been reading about people’s browsing habits. It’s an exciting time to be doing Web Development!

Web Analytics - Unanswered Questions

November 21, 2007 by bwebster

I have many unanswered questions after looking at the web usage on our server. For many of these questions, I’m not sure it’s possible to find an answer.

  • How many times are we in the top 10 or 20 on a Google search when the person doesn’t click on our link? We can tell if they click on us. I’d like to know when we are on the search results page but don’t get the click.
  • Are there things we can do to improve the click-through rate from Google? One possibility is to change the meta description information as described in the article “Improve snippets with a meta description makeover.” Would this help?
  • What words are people searching for, where we have information, but don’t show up on the first page of the Google search results? What words should we be targeting?
  • Of the 20,000 visits that see our home page first, how many of them are ISU Extension staff members accessing our site from outside the ISU network? For example, a staff member takes their laptop home and starts up Firefox or IE, and the ISU Extension site is the default home page. This would give an off-campus visit to the ISU Extension home page. However, this hit should be filtered out when considering how the public sees our page. I don’t know how to set up this filter.
  • Does it help our Google rating that a number of people are using graphics from our server on their pages?
  • A large number of visits into our site had “No Referrer”. Why? Are these actual people, or spiders/robots, etc.? This needs more evaluation.
  • How much time is appropriate to look at web usage data? How much time does your organization spend on this? I think ISU Extension would benefit by making this a higher priority, maybe even to the point that it is someone’s primary job.

I’ve learned a lot about our web site by taking a more in-depth look at our log files. However, there are a lot of unanswered questions.

Web Analytics - Outcomes

November 15, 2007 by bwebster

In taking a deeper look at the web logs for ISU Extension’s main site, I’ve learned a lot about how the site is used. One thing that jumped out is that “The Long Tail” definitely exists on our web server. This says that a small number of pages have the highest visits per page. However, when aggregated together, the less popular web pages make up the bulk of the visits. These recent posts from Seth Godin and Arpan Shah describe the long tail much better than I can.
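As a rough way to put a number on the long tail, the Python sketch below computes what share of all visits goes to the N most-visited pages. The function name is hypothetical, and the counts are toy values made up for illustration, not figures from our logs.

```python
def top_share(visits_per_page, top_n):
    """Return the fraction of all visits that went to the top_n busiest pages."""
    counts = sorted(visits_per_page.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:top_n]) / total


# Toy data: two popular pages and twenty rarely visited ones.
toy_counts = {"/popular-1": 50, "/popular-2": 30}
toy_counts.update({f"/tail-{i}": 6 for i in range(20)})

head = top_share(toy_counts, 2)
print(f"top 2 pages: {head:.0%} of visits, everything else: {1 - head:.0%}")
```

In this toy example each tail page gets far fewer visits than either popular page, yet the tail as a whole accounts for more than half of the traffic.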

Entry page into the site is an interesting statistic. In many cases, the first pages people see are not the ones we think are most important. We think about News, the Store, and topic pages off our home page as being pretty important. However, only the store showed up on the top 50 entry pages (as number 8), and that wasn’t the store’s home page but the “Item Detail” page.

So what did show up? Three of the top 50 entry pages had to do with tree identification, a site that was first written over 10 years ago and hasn’t had much modification since then. Not only did people come into this site, they used it! People moved around within this site, because it had 39 of the top 100 overall pages in October. To borrow a phrase from Kevin, “The people have voted with their mouse clicks.” This site is important.

Other things I learned:

  • Our home page was the entry page for about 20,000 visits, and had a total of about 24,000 visits, meaning that a little over 4,000 people (less than 1% of the total visits to the server) saw our home page after coming into a different page. This reinforces the “every page is your home page” theory.
  • For over 100,000 visits, the top entry page was a graphic file. This was likely the only file seen by these visitors.
  • One of the top referrers to our site was MySpace. Upon further investigation, I discovered people are using images from our server on their pages. Some of the other top referrers were also to graphic files.
  • Also making the top 50 entry pages were some old pages on using HTML. I discovered that one of these pages is linked from Wikipedia, and therefore had high rankings on Google.
  • Google was the biggest referrer (no surprise), but not by as much as I would have thought. Also in the list of top 20 referring sites were search pages from Yahoo, MSN, Live, AOL, and Ask. Absent from the list is Wikipedia.
  • Google seems to break pages that use frames up into the individual frames. We have a site that uses frames, and the content on that site is found through Google. However, when accessing the content through Google, you never see the frames, just the content section. This causes it to lose its identity with ISU Extension.

I was surprised how much good information I found when digging deeper into the statistics for the entire server. My guess is that the ISU Extension site is pretty typical of what is happening in other states.

If you would like more information on this, please let me know.

Web Analytics - Tools

November 12, 2007 by bwebster

I spent some time last week evaluating the web usage reports that we generate for the ISU Extension web site. I took a little different approach this time, in that I was looking at the entire site, rather than individual counties, departments, projects, etc. that are hosted as folders on the site.

To do the bulk of the analysis, I used reports generated by WebTrends, an extremely old version of it. Someday we’ll either upgrade (expensive) or change to another tool. For now, we still get a lot of good from the old version we’re running.

The first thing I did was adjust the report for the entire server to eliminate as many robots/spiders/crawlers as I could. I did this by filtering out browsers whose names contained certain words, which took out many of them. Next, I filtered out all visits where the entry page was “/robots.txt”, which eliminated many more. This should catch the well-behaved robots.

I then created a second report, which did some additional filtering. For the second report, I filtered out all visits where the client computer was from iastate.edu. I was mostly interested in how the public uses our site. Unfortunately, this report does contain visits from staff members when they are not on the ISU network. For example, if I access the site from home, it is included. I don’t know a good way to find and exclude these hits.
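The two filtering passes can be sketched in a few lines of Python. This is not how WebTrends is configured. It only illustrates the logic, and the visit records, field names, and list of robot words are all assumptions made up for the example.

```python
# Assumed keyword list; the real report used its own set of words.
ROBOT_WORDS = ("bot", "crawler", "spider", "slurp")


def is_probable_robot(visit):
    """True if the browser name looks like a robot or the visit began at /robots.txt."""
    agent = visit["user_agent"].lower()
    if any(word in agent for word in ROBOT_WORDS):
        return True
    return visit["entry_page"].lower() == "/robots.txt"


def is_campus(visit):
    """True if the client host is in the iastate.edu domain."""
    host = visit["client_host"].lower()
    return host == "iastate.edu" or host.endswith(".iastate.edu")


# Hypothetical visit records, for illustration only.
visits = [
    {"client_host": "crawl-1.example.com", "user_agent": "Googlebot/2.1",
     "entry_page": "/trees/index.html"},
    {"client_host": "pc1.example.net", "user_agent": "UnknownAgent/1.0",
     "entry_page": "/robots.txt"},
    {"client_host": "office.iastate.edu", "user_agent": "Mozilla/4.0 (compatible; MSIE 7.0)",
     "entry_page": "/"},
    {"client_host": "home.example.org", "user_agent": "Mozilla/5.0 Firefox/2.0",
     "entry_page": "/store/ItemDetail.aspx"},
]

report_one = [v for v in visits if not is_probable_robot(v)]   # robots removed
report_two = [v for v in report_one if not is_campus(v)]       # campus clients removed too

print(len(visits), len(report_one), len(report_two))
```

Note that the last record would stay in the second report even if it were a staff member browsing from home, which is exactly the gap described above.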

When I saw something in the report that seemed odd, I would use two additional tools to dig into the actual IIS log file to see what was going on. The first one was a UNIX-style grep. I use this tool a lot to find words/phrases in text files. I typically redirect the output into another text file that I then load into Excel to analyze.
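For anyone without grep handy, the same find-and-save step can be approximated in Python. This is a minimal sketch; the function names and the file names in the comment are hypothetical.

```python
import re


def grep_lines(pattern, lines, ignore_case=False):
    """Yield each line that matches the regular expression, like a basic grep."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for line in lines:
        if regex.search(line):
            yield line


def grep_to_file(pattern, log_path, out_path, ignore_case=False):
    """Copy matching lines from log_path into out_path and return how many matched."""
    matched = 0
    with open(log_path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for line in grep_lines(pattern, src, ignore_case):
            dst.write(line)
            matched += 1
    return matched


# Example call with hypothetical file names:
# grep_to_file(r"/robots\.txt", "ex071001.log", "robots_hits.txt")
```

The output file is plain text with one log line per row, so it can be opened in Excel as a space-delimited file.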

The second tool I use is Log Parser. Log Parser is a free download from Microsoft, and lets you do SQL queries against your log file. For some queries, like seeing the top ten pages requested from your server, this is an awesome tool.
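Log Parser has its own SQL-style query language, which is not shown here. As a stand-in, this Python sketch does the "top ten pages" count directly, assuming the log is in the W3C extended format that IIS writes, where a "#Fields:" line names the space-separated columns. The function name and sample lines are made up for illustration.

```python
from collections import Counter


def top_pages(log_lines, limit=10):
    """Count requests per cs-uri-stem in a W3C extended log and return the busiest pages."""
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the lines that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or "cs-uri-stem" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed lines
        record = dict(zip(fields, values))
        counts[record["cs-uri-stem"].lower()] += 1
    return counts.most_common(limit)


# Made-up sample lines, for illustration only.
sample_log = [
    "#Software: Microsoft Internet Information Services 6.0",
    "#Fields: date time cs-method cs-uri-stem sc-status",
    "2007-10-01 00:00:01 GET /index.html 200",
    "2007-10-01 00:00:02 GET /trees/oak.html 200",
    "2007-10-01 00:00:03 GET /index.html 200",
]

print(top_pages(sample_log, limit=2))
```

Reading the column names from the "#Fields:" line, rather than hard-coding positions, keeps the sketch working when the server logs a different set of fields.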

I learned a lot about the way our web server is used. More on that in upcoming posts.

Google is Addictive

November 8, 2007 by bwebster

About 6 months ago, Lynette called me with a question about Google Reader. She was going on vacation, and needed to export her list of feeds so someone else could import them and follow some news sources while she was away. Because I wasn’t using Google Reader at the time, I created an account so I could help her.

At the time, I was using a Windows-based feed reader, and thought I was happy with it. It took about 5 minutes of using Google Reader to decide that I liked it better. One of the main advantages is having access to my feed reader at home or in the office.

Soon after, I added a couple of other services, notably Webmaster tools and Groups. A little later, I needed a search mechanism for a specific web site, so I created a Custom Search Engine.

Last week, Kevin did an eXtension Learn session on Google Presentation. At the time, I was interested in seeing Presentation, and also Google Docs, but didn’t think I’d start using them. During the session, Kevin gave Beth access to the presentation, and toward the end, Beth made a change to the background of the presentation they were working on, and it showed up on Kevin’s computer almost instantly. What an “Ah Ha” moment, seeing this level of collaboration.

I have now added Google Docs, and have created a couple of docs using it. I’m impressed with the way it works. It is more full-featured than I expected from a web-based package. It was extremely nice to be able to work on a document at home, after starting it at the office.

My latest addition is GMail. When I started my Google account, I was determined not to create a GMail account. Seeing how well other Google services work, I decided I needed to give this one a try.

What I’m trying to figure out now is whether I should be worried that I’ve sold out to Google. Although I haven’t given them any money, I’ve given them a whole lot of data about me. Should I be concerned about this? And what’s next to try?

Finding Feeds in Google Reader

October 17, 2007 by bwebster

During my presentation on “Consuming Feeds” last week, someone asked me how they would find blogs relevant to their subject matter. My answer was along the lines of: search for it using Google, get recommendations from coworkers on what they read, and watch for blogrolls on the blogs you find interesting.

My answer this week would be much different. That’s because I just learned about the search feature that’s built into the “Add Subscription” function in Google Reader. It will show you feeds related to what you enter into the “Add Subscription” field, along with an indication of how many people are subscribed to that feed using Google’s tools. I learned about this from the Google Operating System Blog.

For example, I’m currently interested in SharePoint, so I entered the word “SharePoint” into the “Add Subscription” field. Google Reader then found 30 feeds related to SharePoint.

This is a great feature in Google Reader. The only problem is that I already have too many subscriptions, and now it’s so easy to find more.

Consuming Feeds Presentation - Followup

October 17, 2007 by bwebster

Last week I posted the outline for the presentation I gave at the Extension Technology Showcase; see “Consuming Feeds Presentation Outline.” The presentation went well, although it was not very well attended. It doesn’t help when you’re up against a presentation on reducing stress, and another about gadgets.

In the outline, I described the opening demonstration, which would be a non-technical way to show what a feed reader can do for you. The concept was great; however, I’ll change a couple of things next time. First, I included too many packing peanuts along with the candy, and some of the candy was too light. It was hard to quickly clean out the “fluff”, leaving just the good content. Hershey’s miniatures seemed to work better than some of the other things.

More importantly, I put the candy into lunch-sized brown paper bags. These were too hard to handle. The demonstration would have been much better had I used large plastic cups as the containers to represent web sites. I could have dumped them together a lot easier, and they wouldn’t have been as messy.

I think feed readers are an important subject for Extension staff, and I hope to be able to present this information again. If so, I need a title that will attract more participants. People probably don’t know what “Consuming Feeds” means. Maybe “Browse More Web Sites in Less Time”. Please offer other suggestions.