Colin Cochrane

Colin Cochrane is a Software Developer based in Victoria, BC, specializing in C#, PowerShell, Web Development and DevOps.

Urban Dictionary and Google Sitelinks

A little while ago Google introduced some control over the sitelinks associated with a site's Google snippet listing. Through the Sitelinks feature of Google Webmaster Tools, webmasters can decide which sitelinks appear with their listing. However, I don't think this feature has been picked up by the guys over at Urban Dictionary, as the snippet below shows:

 

 I can only imagine that the Goog has decided that this "c word" page is valuable to its searchers and has thoughtfully provided a link directly to it, or perhaps the UD have intentionally provided a link to one of their most important pages after all. 

ASP.NET Custom Errors: Preventing 302 Redirects To Custom Error Pages

 
You can download the HttpModule here.
 
Defining custom error pages is a convenient way to show users a friendly page when they encounter an HTTP error such as a 404 Not Found or a 500 Server Error.  Unfortunately, ASP.NET handles custom error pages by responding with a 302 temporary redirect to the error page that was defined. For example, consider an application that has IIS configured to map all requests to it, and has the following customErrors element defined in its web.config:
 
<customErrors mode="RemoteOnly" defaultRedirect="~/error.aspx">
  <error statusCode="404" redirect="~/404.aspx" />
</customErrors>

If a user requested a page that didn't exist, then the HTTP response would look something like:

http://www.domain.com/non-existent-page.aspx --> 302 Found
http://www.domain.com/404.aspx  --> 404 Not Found
Date: Sat, 26 Jan 2008 03:08:21 GMT
Server: Microsoft-IIS/6.0
Content-Length: 24753
Content-Type: text/html; charset=utf-8
X-Powered-By: ASP.NET
 
As you can see, there is a 302 redirect that occurs to send the user to the custom error page.  This is not ideal for two reasons:

1) It's bad for SEO

When a search engine spider crawls your site and comes across a page that doesn't exist, you want to make sure you respond with an HTTP status of 404 and send it on its way.  Otherwise you may end up with duplicate content issues or indexing problems, depending on the spider and search engine.

2) It can lead to more incorrect HTTP status responses

This ties in with the first point, but can be significantly more serious.  If the custom error page is not configured to respond with the correct status code, then the HTTP response could end up looking like:

http://www.domain.com/non-existent-page.aspx --> 302 Found
http://www.domain.com/404.aspx  --> 200 OK
Date: Sat, 26 Jan 2008 03:08:21 GMT
Server: Microsoft-IIS/6.0
Content-Length: 24753
Content-Type: text/html; charset=utf-8
X-Powered-By: ASP.NET
 
This would almost guarantee duplicate content issues for the site with the search engines, as the search spiders are simply going to assume that the error page is a normal page like any other. Furthermore, it will probably cause some website and server administration headaches, as HTTP errors won't be accurately logged, making them harder to track and identify.
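
As an aside, that second issue can be mitigated even while the 302 remains: the custom error page itself can declare the correct status code. A minimal sketch, assuming a 404.aspx with a VB code-behind (the class name here is just an illustration), might be:

Partial Class Custom404
  Inherits System.Web.UI.Page

  Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
    ' Report the real status code instead of the default 200 OK.
    Response.StatusCode = 404
  End Sub

End Class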
I tried to find a solution to this problem, but I didn't have any luck finding anything, other than people who were also looking for a way to get around it.  So I did what I usually do, and created my own solution.
 
The solution comes in the form of a small HTTP module that hooks into the HttpApplication.Error event.  When an error occurs, the module checks whether the error is an HttpException.  If it is, the following process takes place:
  1. The response headers are cleared (context.Response.ClearHeaders()).
  2. The response status code is set to match the actual HttpException.GetHttpCode() value (context.Response.StatusCode = HttpException.GetHttpCode()).
  3. The customErrorsSection from the web.config is checked to see if the HTTP status code (HttpException.GetHttpCode()) is defined.
  4. If the status code is defined in the customErrorsSection, then the request is transferred, server-side, to the custom error page (context.Server.Transfer(customErrorsCollection.Get(statusCode.ToString).Redirect)).
  5. If the status code is not defined in the customErrorsSection, then the response is flushed, immediately sending the response to the client (context.Response.Flush()).

Here is the source code for the module.

Imports System.Web
Imports System.Web.Configuration

Public Class HttpErrorModule
  Implements IHttpModule

  Public Sub Dispose() Implements System.Web.IHttpModule.Dispose
    'Nothing to dispose.
  End Sub

  Public Sub Init(ByVal context As System.Web.HttpApplication) Implements System.Web.IHttpModule.Init
    AddHandler context.Error, New EventHandler(AddressOf Context_Error)
  End Sub

  Private Sub Context_Error(ByVal sender As Object, ByVal e As EventArgs)
    Dim context As HttpContext = CType(sender, HttpApplication).Context
    If (context.Error.GetType Is GetType(HttpException)) Then
      ' Get the Web application configuration.
      Dim configuration As System.Configuration.Configuration = WebConfigurationManager.OpenWebConfiguration("~/web.config")

      ' Get the customErrors section.
      Dim customErrorsSection As CustomErrorsSection = CType(configuration.GetSection("system.web/customErrors"), CustomErrorsSection)

      ' Get the collection of defined error pages.
      Dim customErrorsCollection As CustomErrorCollection = customErrorsSection.Errors

      Dim statusCode As Integer = CType(context.Error, HttpException).GetHttpCode

      ' Clear existing response headers and set the desired status code.
      context.Response.ClearHeaders()
      context.Response.StatusCode = statusCode

      ' If a custom page is defined for this status code, transfer to it server-side;
      ' otherwise flush the response immediately.
      If (customErrorsCollection.Item(statusCode.ToString) IsNot Nothing) Then
        context.Server.Transfer(customErrorsCollection.Get(statusCode.ToString).Redirect)
      Else
        context.Response.Flush()
      End If

    End If

  End Sub

End Class

The following element also needs to be added to the httpModules element in your web.config (replace the attribute values if you aren't using the downloaded binary):

<httpModules>
<add name="HttpErrorModule" type="ColinCochrane.HttpErrorModule, ColinCochrane" />
</httpModules>
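
If the application happens to run under IIS 7's integrated pipeline, the registration would likely need to go under system.webServer instead; something along these lines (again, adjust the type attribute to match your own assembly):

<system.webServer>
  <modules>
    <add name="HttpErrorModule" type="ColinCochrane.HttpErrorModule, ColinCochrane" />
  </modules>
</system.webServer>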

And there you go! No more 302 redirects to your custom error pages.

Please Don't Urinate In The Pool: The Social Media Backlash

The increasing interest of the search engine marketing community in social media has resulted in more and more discussion about how to get in on the "traffic goldrush".  As an SEO, I appreciate the enthusiasm in exploring new methods for maximizing exposure for a client's site, but as a social media user I am finding myself becoming increasingly annoyed with the number of people that are set on finding ways to game the system.

The Social Media Backlash

My focus for the purposes of this post will be StumbleUpon, which is my favourite social media community by far.  That said, most of what I say will be applicable to just about any social media community, so don't stop reading just because you're not a stumbler.  Within the StumbleUpon community there has been a surprisingly strong, and negative, reaction to those who write articles/blog posts that explore methods for leveraging StumbleUpon to drive the fabled "server crashing" levels of traffic, or dissect the inner workings of the stumbling algorithm in order to figure out how to get that traffic with the least amount of effort and contribution necessary. 

"What Did I Do?"

When one of these people would end up on the receiving end of the StumbleUpon community's ire, they would be surprised. Instinctively, with perfectly crafted link-bait in hand, they would chronicle how they fell victim to hordes of angry stumblers, and express their disappointment while condemning the community for being so harsh.  Then, with anticipation of the inevitable rush of traffic their tale would attract to their site, they would hit the "post" button and quickly submit their post to their preferred social media channels.  What they didn't realize was that they were proving the reason for the community's backlash the instant they pressed "post".

Please Don't Urinate In The Pool

To explain that reason, we need to look at why people actually use StumbleUpon.  The biggest draw is the uncanny ability it has for providing its users with a virtually endless supply of content that is almost perfectly targeted to them.  When this supply gets tainted, the user experience suffers, and the better the untainted experience is, the less tolerant users will be of any tainting.

To illustrate, allow me to capitalize on the admittedly crude analogy found in the heading of this section.  Let's think of the StumbleUpon community as a group of friends at a pool party.  They are having a lot of fun, enjoying each other's company, when they discover someone has been urinating in the pool.  The cleaner the water was before, the more everyone is going to notice the unwelcome "addition" to the water.  When they find out who urinated in the pool, they are understandably going to be angry with them.  To stretch this analogy a little further, you can be damned sure that they wouldn't be happy when they found out that someone was telling everyone methods for strategically urinating in certain areas of the pool in order to maximize the number of people who would be exposed to the urine.

For anyone who was in the group of friends, and actually used and enjoyed the pool, the idea of urinating in it wouldn't even be an option.  Or, in the case of StumbleUpon, someone who actually participated in the community and enjoyed the service wouldn't want to pollute it.

Catching Unwanted Spiders And Content Scraping Bots In ASP.NET


If you have a blog that is even moderately popular then you have likely fallen victim to some form of content scraping.  Ever since it became possible to earn money through ads on a website there have been people trying to find ways to cheat the system.  The most widespread example of this comes in the form of splogs and similar spam-based websites, which consist only of ads from Google AdSense and duplicated content that is scraped from other sites.  In this post I will share a method you can use to identify "evil" spiders and content scraping bots that are wasting your website's resources.

I'll start off by defining what is considered an "evil" spider/bot.  For our purposes here, we'll be looking at spiders and bots that ignore robots.txt and nofollow when crawling a site.  These are spiders and bots that offer no value to you in allowing them to crawl your site, as the major search engines use spiders and bots that respect these rules (with the unique exception of MSN, which employs a certain bot that presents itself as a regular user in order to identify sites that show different content to search engine spiders than to users).  

Of these valueless spiders, some are almost certainly going to be some form of content scraping bot, which is sent to literally copy the content of your site for use elsewhere.  It is in your best interest to limit how much of your content gets scraped because you want visitors coming to your site, not some spam-filled facsimile.

This method of identifying unwanted spiders involves setting a trap, which can be created as follows:

1) Create a Hidden Page

To identify these undesired visitors you need to isolate them.  Create a page on your site, but do not link to it from anywhere just yet.  For the purposes of my examples, I'll call our example page "trap.aspx".


Now you want to disallow this page in your robots.txt.
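
Assuming the trap page sits at the site root, the robots.txt entry would look something like this:

User-agent: *
Disallow: /trap.aspx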


Disallowing the trap page in robots.txt prevents good spiders from crawling it.  What is needed now is a link to the trap page with the rel="nofollow" attribute, which should be placed on your home page for maximum effect.  The link must be invisible to users, otherwise you might mistake an unwitting visitor for a bad spider.

<a rel="nofollow" href="/trap.aspx" style="display:none;"></a>

This creates a situation in which the only requests for "/trap.aspx" will be from a spider or bot that ignores both robots.txt and nofollow, which are exactly the kind of bots we want to identify.

2) Create a Log File

Create an XML document, name it "trap.xml" (or whatever you want), and place it in the App_Data folder of your application (or wherever you want, as long as the application has write access to the directory).  Open the new XML document and create an empty root element, "<trapRequests>", ensuring it has a matching closing tag.

<?xml version="1.0" encoding="utf-8"?>
<trapRequests>
</trapRequests>
 
You can use whatever storage method works best for you to log the requests; you do not need to use an XML document.  I am using XML for the purposes of this example.

3) Log What Gets Caught In The Trap

With the trap in place, you now want to keep track of the requests being made for "trap.aspx".  This can be accomplished quite easily using LINQ, as illustrated in the following example:

Imports System.Xml.Linq

Partial Class trap_aspx
  Inherits System.Web.UI.Page

  Protected Sub Page_Load(ByVal sender As Object, ByVal e As System.EventArgs) Handles Me.Load
    LogRequest(Request.UserHostAddress, Request.UserAgent)
  End Sub

  Private Sub LogRequest(ByVal ipAddress As String, ByVal userAgent As String)
    Dim logFile As XDocument

    Try
      logFile = XDocument.Load(Server.MapPath("~/App_Data/trap.xml"))
      logFile.Root.AddFirst(<request>
                              <date><%= Now.ToString %></date>
                              <ip><%= ipAddress %></ip>
                              <userAgent><%= userAgent %></userAgent>
                            </request>)
      logFile.Save(Server.MapPath("~/App_Data/trap.xml"))
    Catch ex As Exception
      My.Log.WriteException(ex)
    End Try
  End Sub

End Class

This code sets it up so every request for this page is logged with:

  1. The Date and Time of the request.
  2. The IP address of the requesting agent.
  3. The User Agent of the requesting agent.

You can, of course, customize what information is logged to your preference.  The code will need to be adjusted if you are using a different storage method.  Once done, you will end up with an XML log file (or your custom store) recording every request to "trap.aspx", which will look something like:

<?xml version="1.0" encoding="utf-8"?>
<trapRequests>
<request>
<date>12/30/2007 12:54:20 PM</date>
<ip>1.2.3.4</ip>
<userAgent>ISCRAPECONTENT/1.2</userAgent>
</request>
<request>
<date>12/30/2007 2:31:51 PM</date>
<ip>2.3.4.5</ip>
<userAgent>BADSPIDER/0.5</userAgent>
</request>
</trapRequests>
 
Now your trap is set, and any unwanted bots and spiders that find it will be logged.  You are then free to use the logged data to deny access by offending IP, user agent, or whatever criteria you decide is appropriate for your site.
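
As a rough sketch of that last step (my own illustration, not part of the trap itself), you could check each incoming request against the logged IPs in the Global.asax code-behind and turn away anything that has previously hit the trap. In practice the list should be cached rather than re-read on every request.

Imports System.Linq
Imports System.Xml.Linq

' In Global.asax (or its code-behind): deny requests from IPs that have hit the trap.
Sub Application_BeginRequest(ByVal sender As Object, ByVal e As EventArgs)
  Dim log As XDocument = XDocument.Load(Server.MapPath("~/App_Data/trap.xml"))
  Dim trappedIps = From r In log.Root.Elements("request")
                   Select r.Element("ip").Value.Trim()

  If trappedIps.Contains(Request.UserHostAddress) Then
    Response.StatusCode = 403
    Response.End()
  End If
End Sub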

How Being An SEO Analyst Made Me A Better Web Developer

Being a successful web developer requires constant effort to refine your existing abilities while expanding your skill-set to include the new technologies that are continually released, so when I first started my job as a search engine optimization analyst I was fully expecting my web development skills to dull.  I could not have been more wrong.

Before I get to the list of reasons why being an SEO analyst made me a better web developer, I'm going to give a quick overview of how I got into search engine optimization in the first place. Search engine optimization first captured my interest when I wanted to start driving more search traffic to a website I developed for the BCDA (for which I continue to volunteer my services as webmaster). Due to some dissatisfaction with our hosting provider, I decided that we would switch hosting providers as soon as our existing contract expired and go with one of my preferred hosts. However, as a not-for-profit organization the budget for the website was limited and the new hosting was going to cost a little more, so I decided to set up an AdSense account to bring in some added income. 

The expectations weren't high; I was hoping to bring in enough revenue through the website to cover hosting costs.  At that point I did not have much experience with SEO, so I started researching strategies I could use on the site.  As I read more and more articles, blogs and whitepapers, I became increasingly fascinated with the industry while applying all of the newfound knowledge to the site.  Soon after, I responded to a job posting at an SEO firm and was shortly thereafter starting my new career as an SEO analyst. 

My first few weeks at the job were spent learning procedures, familiarizing myself with the various tools that we use, and, most importantly, honing my SEO skills.  I spent the majority of my time auditing and reporting on client sites, which exposed me to a lot of different websites, programming and scripting languages, and tens of thousands of lines of code.  During this process I realized that my web development skills weren't getting worse; they were actually getting better.   The following list examines the reasons for this improvement.

1) Coding Diversity

To properly analyze a site, identify problems, and be able to offer the right solutions, I often have to go deeper than just the HTML on a website.  This means I have to be proficient in a variety of different languages, because I don't believe in pointing out problems with a site unless I can recommend how best to fix them.  Learning the different languages came quickly from the sheer volume of code I was faced with every day, and got easier with each language I learned.

2) Web Standards and Semantic Markup

In a recent post, Reducing Code Bloat Part Two: Semantic HTML, I discussed the importance of semantic HTML to lean, tidy markup for your web documents.  While I have always been a proponent of web standards and semantic markup my experience in SEO has served to solidify my beliefs.  After you have pored through 12,000 lines of markup that should have been 1000, or spent two hours implementing style modifications that should have taken five minutes, the appreciation for semantic markup and web standards is quickly realized.

3) Usability and Accessibility

Once I've optimized a site to draw more search traffic I need to help make sure that that traffic actually sticks around.  A big part of this is the usability and accessibility of a site.  There are a lot of other websites out there for people to visit, and they are not going to waste time trying to figure out how to navigate through a meandering quagmire of a design.  This aspect of my job forces me to step into the shoes of the average user, which is something that a lot of developers need to do more often.  It has also made me more mindful of accessibility when using features and technologies such as AJAX, ensuring that the site remains accessible when a feature is disabled or not supported.

4) The Value of Content

Before getting into SEO, I was among the many web developers guilty of thinking that a website's success can be ensured by implementing enough features, and that enough cool features could make up for a lack of simple, quality content.  Search engine optimization taught me the value of content, and that the right balance of innovative features and content will greatly enhance the effectiveness of both.

That covers some of the bigger reasons that working as an SEO analyst made me a better web developer.  Chances are that I will follow up on this post in the future with more reasons that I am sure to realize as I continue my career in SEO.  In fact, one of the biggest reasons I love working in search engine optimization and marketing is that it is an industry that is constantly changing and evolving, and there is always something new to learn.

Is There Added Value In XHTML To Search Engine Spiders?

The use of XHTML in the context of SEO is a matter of debate.  The consensus tends to be that using XHTML falls into the category of optimization efforts that provide certain benefits for the site as a whole (extensibility, ability to use XSL transforms) but offers little or no added value in the eyes of the search engines.  That being said, as the number of pages that search engine spiders have to crawl continues to increase every day, the limits to how much the spiders can crawl are being tested.  This has been recognized by SEOs and is reflected in efforts to trim page sizes down to make a site more appealing to the spiders.  Now it is time to start considering the significant benefits that a well-formed XHTML document can potentially offer to search engine spiders.

Parsing XML is faster than parsing HTML for one simple reason: XML documents are expected to be well-formed.  This saves the parser from having to spend the extra overhead involved in "filling in the blanks" with a non-valid document.  Dealing with XML also opens the door to the use of speedy languages like XPath that provide fast and straightforward access to any given part of an XML document.  For instance, consider the following XHTML document:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>My XHTML Document</title>
    <meta name="description" content="This is my XHTML document."/>
  </head>
  <body>
    <div>
      <h1>This is my XHTML document!</h1>
      <p>Here is some content.</p>
    </div>
  </body>
</html>

Now let's say we wanted to grab the contents of the title element from this document.  If we were to parse it as straight HTML we'd probably use a regular expression such as "<title>([^<]*)</title>" (As a quick aside, I want to clarify that HTML parsers are quite advanced and don't simply use regular expressions to read a document).  In Visual Basic the code to accomplish this would look like:

Imports System.Text.RegularExpressions

Class MyParser

  Function GetTitle(ByRef html As String) As String
    Return Regex.Match(html, "<title>([^<]*)</title>").Groups(1).Value
  End Function

End Class

If we were to use XPath, on the other hand, we would get something like this:

Imports System.Xml

Class MyParser

  Function GetTitle(ByRef reader As XmlReader) As String
    Dim doc As New XPath.XPathDocument(reader)
    Dim navigator As XPath.XPathNavigator = doc.CreateNavigator
    ' Note: with the default XHTML namespace a namespace manager (or a local-name()
    ' based expression) would be needed in practice; the path is kept simple here.
    Return navigator.SelectSingleNode("/html/head/title").Value
  End Function

End Class

Don't let the amount of code fool you.  While the first example uses 1 line of code to accomplish what takes the second example 3 lines, the real value comes when dealing with a non-trivial document.  The first method would need to enumerate the elements in the document, which would involve either very complex regular expressions with added logic (because regular expressions are not best suited for parsing HTML), or the added overhead necessary for an existing HTML parser to accurately determine how the document was "intended" to be structured.  Using XPath is a simple matter of using a different XPath expression for the "navigator.SelectSingleNode" method.

With that in mind, I constructed a very basic test to see what kind of speed differences we'd be looking at between HTML parsing and XML (using XPath) parsing.  The test was simple: I created a well-formed XHTML document consisting of a title element, meta description and keywords elements, 150 paragraphs of Lorem Ipsum, 1 <h1> element, 5 <h2> elements, 10 <h3> elements and 10 anchor elements scattered throughout the document.

The test consisted of two methods, one using XPath and one using regular expressions.  The task of each method was to simply iterate through every element in the document once, and repeat this task 10,000 times while being timed.  Once completed, it would spit out the elapsed time in milliseconds that it took to complete.  The test was kept deliberately simple because the results are only meant to very roughly illustrate the performance differences between the two methods.  It was by no means an exhaustive performance analysis and should not be considered as such.
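
For the curious, the harness was nothing fancy; a rough sketch of the sort of timing loop involved (my reconstruction, not the original test code) would be:

Imports System.Diagnostics

Module ParsingBenchmark

  ' Times a parsing routine over a fixed number of iterations and reports milliseconds.
  Sub TimeParser(ByVal parse As Action, ByVal iterations As Integer)
    Dim timer As Stopwatch = Stopwatch.StartNew()
    For i As Integer = 1 To iterations
      parse()
    Next
    timer.Stop()
    Console.WriteLine("Elapsed: {0}ms", timer.ElapsedMilliseconds)
  End Sub

End Module

Each of the two methods (the XPath walk and the regular-expression walk, whatever they are named) would be wrapped as the parse delegate and run for 10,000 iterations.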

That being said, I ran the test 10 times and averaged the results for each method, resulting in the following:

XML Parsing (Using XPATH) - 13ms

HTML Parsing (Using RegEx) - 1852ms

As I said, these results are very rough, and meant to illustrate the difference between the two methods rather than the exact times. 

These results should, however, give you something to consider in respect to the potential benefits of XHTML to a search engine spider.  We don't know how search engine spiders are parsing web documents, and that will likely never change.  We do know that search engines are constantly refining their internal processes, including spider logic, and with the substantial performance benefits of XML parsing, it doesn't seem too far-fetched to think that the search engines might have their spiders capitalizing on well-formed XHTML documents with faster XML parsing, or are at least taking a very serious look at implementing that functionality in the near future.  If you consider even a performance improvement of only 10ms, when you multiply that against the tens of thousands of pages being spidered every day, those milliseconds add up very quickly.

Reducing Code Bloat - Or How To Cut Your HTML Size In Half

When it comes to designing and developing a web site, load time is one consideration that is often ignored, or is an afterthought once the majority of the design and structure is in place.  While high-speed internet connections are becoming increasingly common, there is still a significant portion of web users out there with 56k connections, and even those with broadband connections aren't guaranteed to have a fast connection to your particular server.  Every second that a user has to wait to download your content increases the chance of that user deciding to move on.

Attempting to reduce the size of a web page is usually restricted to compressing larger images and optimizing them for web use.  This is a necessary step to managing page size, but there is another important factor that can significantly reduce the size of a page to improve download times: code bloat, or more specifically, (x)HTML code bloat.  Getting rid of that code bloat means fewer bytes for clients to download, as well as capitalizing on what the client has already downloaded.  Unfortunately this is an option that tends to be ignored due to the perceived loss of time spent combing through markup to cut out the chaff, despite the fact that clean, efficient markup with well-planned style definitions will save countless hours when it comes to upkeep and maintenance.

To demonstrate the difference between a bloated page and one with efficient markup I created two basic pages.  One uses tables, font tags, HTML style attributes and so forth to control the structure and look of the page, while the other uses minimal markup with an external stylesheet.

1) Bloated Page

[code:html]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>This Tag Soup Is Really Thick</title>
<meta name="description" content="The end result of lazy web page design resulting in a bloated mess of HTML.">
<meta name="keywords" content="tag soup, messy html, bloated html">
</head>
<body>
<center>
  <table width=700 height=800 bgcolor="gainsboro" border=0 cellpadding=0 cellspacing=0>
    <tr>
      <td valign=top width=150><table align=center border=0 cellpadding=0 cellspacing=0>
          <tr>
            <td align="left"><a href="#home" title="Not a real link"><font color="#4FB322" size="3" face="Geneva, Arial, Helvetica, sans-serif">Home Page</font></a></td>
          </tr>
          <tr>
            <td align="left"><a href="#about" title="Not a real link"><font color="#4FB322" size="3" face="Geneva, Arial, Helvetica, sans-serif">About Me</font></a></td>
          </tr>
          <tr>
            <td align="left"><a href="#links" title="Not a real link"><font color="#4FB322" size="3" face="Geneva, Arial, Helvetica, sans-serif">Links</font></a></td>
          </tr>
        </table></td>
      <td valign=top><table>
          <tr>
            <td align="center" height=64><h1><font color="red" face="Geneva, Arial, Helvetica, sans-serif">Welcome to My Site!</font></h1></td>
          </tr>
          <tr>
            <td align="left"><font color="#FFFFFF" size="3" face="Geneva, Arial, Helvetica, sans-serif">Isn&acute;t it surprisingly ugly and bland?</font></td>
          </tr>
          <tr>
            <td align="left"><font color="#FFFFFF" size="3" face="Geneva, Arial, Helvetica, sans-serif">Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Fusce est. Maecenas pharetra nibh vel turpis molestie gravida. Integer convallis odio eu nulla. Vivamus eget turpis eu neque dignissim dignissim. Fusce vel erat ut turpis pharetra molestie. Cras sollicitudin consequat sem. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Maecenas augue diam, sagittis eget, cursus at, vulputate at, nisl. Etiam scelerisque molestie nibh. Suspendisse ornare dignissim enim. Sed posuere nunc a lectus. Vestibulum luctus, nibh feugiat convallis ornare, lorem neque volutpat risus, a dapibus odio justo at erat. Donec vel lacus id urna luctus tincidunt. Morbi nunc. Donec fringilla sapien nec lectus. Duis at felis a leo porta tempor.</font></td>
          </tr>
          <tr>
            <td align="left"><font color="#FFFFFF" size="3" face="Geneva, Arial, Helvetica, sans-serif">Maecenas malesuada felis id mauris. Ut nibh eros, vestibulum nec, ornare sollicitudin, hendrerit et, ligula. Suspendisse tellus elit, rutrum ut, tempor eget, porta bibendum, magna. Nunc sem dolor, pharetra ut, fermentum in, consequat vitae, velit. Vestibulum in ipsum. Phasellus erat. Sed eget turpis tristique eros cursus gravida. Vestibulum quis pede a libero elementum varius. Nullam feugiat accumsan enim. Aenean nec mi. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;</font></td>
          </tr>
          <tr>
            <td align="left"><font color="#FFFFFF" size="3" face="Geneva, Arial, Helvetica, sans-serif">Aenean vel neque ac orci sagittis tristique. Phasellus cursus quam a mauris. Donec posuere pede a nisl. Curabitur nec ligula eu nibh accumsan sagittis. Integer lacinia. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos hymenaeos. Praesent tortor dolor, pellentesque eget, fermentum vel, mollis ut, erat. Nullam mollis. Cras rhoncus tellus ut neque. Pellentesque sed ante.</font></td>
          </tr>
          <tr align="left">
            <td><font color="#FFFFFF" size="3" face="Geneva, Arial, Helvetica, sans-serif">Donec at nunc. Nulla elementum porta elit. Donec bibendum. Fusce elit ligula, gravida et, tincidunt et, aliquam sit amet, metus. Nulla id magna. Fusce quis eros. Sed eget justo. Vivamus dictum interdum quam. Curabitur malesuada. Proin id metus. Curabitur feugiat. Nunc in turpis. Cras lobortis lobortis felis. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Mauris imperdiet aliquet ante. Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</font></td>
          </tr>
          <tr align="left">
            <td><font color="#FFFFFF" size="3" face="Geneva, Arial, Helvetica, sans-serif">Etiam tristique mauris at nibh sodales pretium. In lorem eros, laoreet eget, rhoncus et, lacinia nec, pede. Fusce a quam. Pellentesque vitae lacus. Vivamus commodo. Morbi euismod, ipsum id consectetuer ornare, nisi sem suscipit pede, vel dictum purus mauris eu leo. Proin sodales. Aliquam in pede nec eros aliquet adipiscing. Nulla a purus sed risus ullamcorper tempus. Nunc neque magna, fringilla quis, ullamcorper vitae, placerat sed, orci. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;</font></td>
          </tr>
        </table></td>
    </tr>
  </table>
</center>
</body>
</html>

[/code]

2) Cleaned Page

[code:html]

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    <title>Less Markup, More Content</title>
    <meta name="description" content="The end result of lazy web page design resulting in a bloated mess of HTML.">
    <meta name="keywords" content="tag soup, messy html, bloated html">
    <link href="style.css" rel="stylesheet" type="text/css">
  </head>
  <body>
    <div class="content">
      <ul class="menu">
        <li><a href="#home" title="Not a real link">Home Page</a></li>
        <li><a href="#home" title="Not a real link">About Me</a></li>
        <li><a href="#home" title="Not a real link">Links</a></li>
      </ul>
      <h1>Welcome To My Site!</h1>
      <p>Isn&acute;t it surprisingly ugly and bland?</p>
      <p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Fusce est.  Maecenas pharetra nibh vel turpis molestie gravida. Integer convallis  odio eu nulla. Vivamus eget turpis eu neque dignissim dignissim. Fusce  vel erat ut turpis pharetra molestie. Cras sollicitudin consequat sem.  Vestibulum ante ipsum primis in faucibus orci luctus et ultrices  posuere cubiliaCurae; Maecenas augue diam, sagittis eget, cursus at,  vulputate at, nisl. Etiam scelerisque molestie nibh. Suspendisse ornare  dignissim enim. Sed posuere nunc a lectus. Vestibulum luctus, nibh  feugiat convallis ornare, lorem neque volutpat risus, a dapibus odio  justo at erat. Donec vel lacus id urna luctus tincidunt. Morbi nunc.  Donec fringilla sapien nec lectus. Duis at felis a leo porta tempor. </p>
      <p>Maecenas malesuada felis id mauris. Ut nibh eros, vestibulum nec,  ornare sollicitudin, hendrerit et, ligula. Suspendisse tellus elit,  rutrum ut, tempor eget, porta bibendum, magna. Nunc sem dolor, pharetra  ut, fermentum in, consequat vitae, velit. Vestibulum in ipsum.  Phasellus erat. Sed eget turpis tristique eros cursus gravida.  Vestibulum quis pede a libero elementum varius. Nullam feugiat accumsan  enim. Aenean nec mi. Vestibulum ante ipsum primis in faucibus orci  luctus et ultrices posuere cubilia Curae; </p>
      <p>Aenean vel neque ac orci sagittis tristique. Phasellus cursus quam a  mauris. Donec posuere pede a nisl. Curabitur nec ligula eu nibh  accumsan sagittis. Integer lacinia. Class aptent taciti sociosqu ad  litora torquent per conubia nostra, per inceptos hymenaeos. Praesent  tortor dolor, pellentesque eget, fermentum vel, mollis ut, erat. Nullam  mollis. Cras rhoncus tellus ut neque. Pellentesque sed ante. </p>
      <p>Donec at nunc. Nulla elementum porta elit. Donec bibendum. Fusce  elit ligula, gravida et, tincidunt et, aliquam sit amet, metus. Nulla  id magna. Fusce quis eros. Sed eget justo. Vivamus dictum interdum  quam. Curabitur malesuada. Proin id metus. Curabitur feugiat. Nunc in  turpis. Cras lobortis lobortis felis. Pellentesque habitant morbi  tristique senectus et netus et malesuada fames ac turpis egestas.  Mauris imperdiet aliquet ante. Lorem ipsum dolor sit amet, consectetuer  adipiscing elit. </p>
      <p>Etiam tristique mauris at nibh sodales pretium. In lorem eros,  laoreet eget, rhoncus et, lacinia nec, pede. Fusce a quam. Pellentesque  vitae lacus. Vivamus commodo. Morbi euismod, ipsum id consectetuer  ornare, nisi sem suscipit pede, vel dictum purus mauris eu leo. Proin  sodales. Aliquam in pede nec eros aliquet adipiscing. Nulla a purus sed  risus ullamcorper tempus. Nunc neque magna, fringilla quis, ullamcorper  vitae, placerat sed, orci. Pellentesque habitant morbi tristique  senectus et netus et malesuada fames ac turpis egestas. Vestibulum ante  ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; </p>
    </div>
  </body>
</html>

[/code]

External Style Sheet

[code:html]

@charset "utf-8";
body{font:13pt Geneva, Arial, Helvetica, sans-serif;}
.menu{float:left;height:800px;list-style-type:none;width:150px;}
.menu a, .menu a:visited{color:#4FB322;}
.content{background:gainsboro;color:white;margin:auto;position:relative;width:700px;}
h1{color:red;line-height:64px;text-align:center;}
p{margin:4px;}

[/code]

Even in this basic example you can see a fairly dramatic improvement when the excess HTML is trimmed and CSS is used to control style.  The original page is 51 lines, whereas the cleaned page is only 26 lines plus 7 lines in the style sheet.  The cleaned page is a third the size of the original (counting the style sheet), and more realistically half the size, because the style sheet would be cached by most client browsers and wouldn't be downloaded for every page request.  As far as raw kilobytes, it's a difference of 6KB to 4KB, which isn't a particularly exciting difference in this case, but one that is quickly magnified as the length of the page increases.  This is especially true with dynamic applications that pull content from a database, most importantly content such as product listings that use the same markup repeated multiple times.  Fortunately, in the case of dynamic pages involving looping procedures that output the same markup with different content, cutting down the bloat can be as easy as a few modifications to those procedures, as the sketch below illustrates.
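
As a quick, hypothetical illustration of that last point (not code from the example pages above), a product-listing loop can emit one lean, styleable element per record instead of a table/font-tag block repeated for every product:

Imports System.Collections.Generic
Imports System.Text
Imports System.Web

Public Class ProductListRenderer

  ' One small element per record; all presentation stays in the style sheet.
  Public Shared Function RenderProducts(ByVal productNames As IEnumerable(Of String)) As String
    Dim html As New StringBuilder()
    html.AppendLine("<ul class=""products"">")
    For Each name As String In productNames
      html.AppendLine("  <li>" & HttpUtility.HtmlEncode(name) & "</li>")
    Next
    html.AppendLine("</ul>")
    Return html.ToString()
  End Function

End Class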

Furthermore, if you wanted to change, for instance, the font color or the line-height in the original, you would have to modify every font tag and table cell to accomplish that.  Implementing those changes in the second example requires a single modification to the style sheet.  The time saved here is once again significantly amplified when considered across multiple pages (in many cases hundreds or even thousands).

When all is said and done, this isn't meant to be a be-all, end-all guide for optimizing your markup, because I could write a book and still not cover it all.  Rather, it was meant to highlight an aspect of web page performance and optimization that is usually swept under the rug in favour of aspects that are more immediately appreciable, such as eye candy and new features.  While clean markup might not be as "glamourous" as other aspects of web development, it is important for keeping load times in check and a crucial factor in reducing the amount of time spent maintaining and updating.

Google Cracks Down On Sites That Are Losing Importance

There has been a lot of controversy in the world of SEO as of late, mostly centering around the PageRank penalties that Google has been handing out to enforce their paid-link policies.  The members of the SEO community, who do enjoy voicing their opinions from time to time, increased the size of Google's index by about 2.3% blogging about it (just kidding! I love the SEO community).  As more updates to the Toolbar PageRank happened over the weeks, more disgruntled SEOs, webmasters and site owners came out of the woodwork to express their outrage at Google for decreasing the Toolbar PageRank of their site. 

This outrage was in the form of more blogging which, inevitably, a lot of people were reading.  For a while you would be hard-pressed to find someone who wasn't writing about the latest Google PageRank Massacre.  PageRank was on everyone's mind. 

Those of us, such as myself, who pay little attention to the little green bar sat back and watched, interested in the developments regarding Google's policies and the paid-link debate.  We would sift through post after post of someone denouncing Google because their Toolbar PageRank dropped from a 3 to a 2, loudly proclaiming that they shouldn't be penalized for the 2 paid links they have when the site around the corner has nothing but paid links and cloaked text.  

As more and more people shared their tragic stories of their little green bar shrinking, I started to notice that something was missing in the SEO blog community.  No one was complaining about their ranking positions anymore, wondering how their site could not possibly be in the top ten when they had 900 characters in their meta keywords tag and were positioning a div full of keywords at -9999px to the left.  Suddenly it appeared that everyone had a perfectly optimized site with an abundance of high-value, relevant backlinks with content so useful and compelling that bounce rate was a thing of the past.  But these sites were losing Toolbar PageRank, and since they were shining beacons of optimized perfection the only explanation is that they were casualties of the PageRank Massacre.

Give me a break.

I'm going to kill the sarcasm now and get to the point.  Here are some things you might want to consider if you think you got slapped by the latest Google crackdown:

1) Does Your Site Actually Have Paid Links?

Surprisingly, there are people out there complaining about Google penalizing them in a PageRank Massacre when their sites don't actually have any paid links, or anything that even looks like a paid link.  Whether it's a cheap ploy to draw some worthless traffic or just a desire to be part of the group, it makes you look foolish.

2)  PageRank Can In Fact Decrease

Take a deep breath on this one and repeat after me: "My PageRank is not guaranteed to go up on every update".  PageRank is a relative value and if your site is not keeping up with the rest of the herd your PageRank will decrease.  This is growing more prominent every day as the level of competition continues to increase with more sites actually making an effort in SEO.

3) Stop Making Excuses

Unless you are in the minority of sites that were actually penalized without cause, step back and take a good, honest look at the state of your website.  Do you have quality content, user-friendly navigation, optimized titles and descriptions?  Is it really so perfectly optimized that the only possible explanation is Google's ire?

My point here, if somewhat long-winded, is that it is easy to hop on the "Google screwed me" bandwagon when your site doesn't perform to your expectations.  It takes a little more honesty and willingness to improve yourself, and your site, to recognize areas that need improvement and work on those.  When it comes down to it, whether you got smacked with a PageRank penalty or simply fell behind the rest of the pack, the best thing you can do is roll up your sleeves and get back to work.  The time will always be much better spent, and your site and your site's visitors will benefit because of it.

Using the ASP.NET Web.Sitemap to Manage Meta Data


In an ASP.NET application the web.sitemap is a very convenient tool for managing the site's structure, especially when used with an asp:Menu control.  One aspect of the web.sitemap that is often overlooked is the ability to add custom attributes to the <siteMapNode> elements, which provides very useful leverage for managing a site's meta data.  I think the best way to explain would be through a simple example.

Consider a small site with three pages: /default.aspx, /products.aspx, and /services.aspx. The web.sitemap for this site would look like this:

<?xml version="1.0" encoding="utf-8" ?>
<siteMap xmlns="http://schemas.microsoft.com/AspNet/SiteMap-File-1.0" >
  <siteMapNode url="~/" title="Home">
    <siteMapNode url="~/products.aspx" title="Products" />
    <siteMapNode url="~/services.aspx" title="Services" />
  </siteMapNode>
</siteMap>

 

Now let's add some custom attributes where we can set the Page Title (because the title attribute is where the asp:Menu control looks for the name of a menu item and it's probably best to leave that as is), the Meta Description, and the Meta Keywords elements. 

 

<?xml version="1.0" encoding="utf-8" ?>
<siteMap xmlns="http://schemas.microsoft.com/AspNet/SiteMap-File-1.0" >
  <siteMapNode url="~/" title="Home" pageTitle="Homepage" metaDescription="This is my homepage!" metaKeywords="homepage, keywords, etc">
    <siteMapNode url="~/products.aspx" title="Products" pageTitle="Our Products" metaDescription="These are our fine products" metaKeywords="products, widgets"/>
    <siteMapNode url="~/services.aspx" title="Services" pageTitle="Our Services" metaDescription="Services we offer" metaKeywords="services, widget cleaning"/>
  </siteMapNode>
</siteMap>

 

Now with that in place, all we need is a way to access these new attributes and use them to set the elements on the pages.  This can be accomplished by adding a module to your project; we'll call it "MetaDataFunctions" for this example.  In this module, add the following procedure.

 

Public Sub GenerateMetaTags(ByVal TargetPage As Page)
  Dim head As HtmlHead = TargetPage.Master.Page.Header
  Dim meta As New HtmlMeta

  If SiteMap.CurrentNode IsNot Nothing Then
    meta.Name = "keywords"
    meta.Content = SiteMap.CurrentNode("metaKeywords")
    head.Controls.Add(meta)

    meta = New HtmlMeta
    meta.Name = "description"
    meta.Content = SiteMap.CurrentNode("metaDescription")
    head.Controls.Add(meta)

    TargetPage.Title = SiteMap.CurrentNode("pageTitle")
  Else
    meta.Name = "keywords"
    meta.Content = "default keywords"
    head.Controls.Add(meta)

    meta = New HtmlMeta
    meta.Name = "description"
    meta.Content = "default description"
    head.Controls.Add(meta)

    TargetPage.Title = "default page title"
  End If
End Sub

 

Then all you have to do is call this procedure on the Page_Load event like so...

 

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load

    GenerateMetaTags(Me)    

End Sub

 

...and you'll be up and running.  Now you have a convenient, central location where you can see and manage your site's meta data.
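
For example, given the sitemap above, a request for products.aspx should end up rendering head markup along these lines (the exact order and rendering may vary):

<head>
  <title>Our Products</title>
  <meta name="keywords" content="products, widgets" />
  <meta name="description" content="These are our fine products" />
</head>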

SEO Best Practices - Dynamic Pages in ASP.NET


One of the greatest time-savers in web development is the use of dynamic pages to serve up database-driven content.  The most common examples are content management systems and product information pages.  More often than not, these pages hinge on a querystring parameter such as /page.aspx?id=12345 to determine which record needs to be retrieved from the database and output to the page.  What is surprising is how many sites don't adequately validate that crucial parameter.

Any parameter that can be tampered with by a user, such as a querystring, must be validated as a matter of basic security.  That being said, this validation must also adequately deal with the situation where that parameter is not valid.  Whether the parameter is for a non-existent record, or whether the parameter contains letters where it should only be numbers, the end result is the same: the expected page does not exist.  As simple as this sounds, there are countless applications out there that seem to completely ignore any sort of error handling, and are content to let Server Error in "/" Application be the extent of it.  Somewhere in the development cycle the developers of these applications decided that the default ASP.NET error page would be the best thing to show to the site's visitors, and that a 500 SERVER ERROR was the ideal response to send to any search engine spiders that might have the misfortune of coming across a link with a bad parameter in it.

With a dynamic page that depends on querystring parameters to generate its content, the following basic measures should be taken:

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load

  'Ensure that the requested URI actually has any querystring keys
  If Request.QueryString.HasKeys() Then

    'Ensure that the requested URI has the expected parameter, and that the parameter isn't empty
    If Request.QueryString("id") IsNot Nothing Then

      'Perform any additional type validation to ensure that the string value can be cast to the required type.

    Else
      Response.StatusCode = 404
      Response.Redirect("/404.aspx", True)
    End If

  Else
    Response.StatusCode = 404
    Response.Redirect("/404.aspx", True)
  End If

End Sub


This is a basic example, but it demonstrates how to perform simple validation against the querystring and properly redirect anyone who reaches the page with a bad querystring in the request URL.  A similar approach should be taken when retrieving the data, for the case where the record is not found.
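
As a sketch of the type validation mentioned in the comment above (my own illustration, assuming a numeric id parameter), Integer.TryParse keeps bad values from ever reaching the database query:

Dim productId As Integer

'Reject anything that is not a valid integer before it reaches the data layer.
If Integer.TryParse(Request.QueryString("id"), productId) Then
  'Safe to load the record using productId; respond with a 404 if no matching record exists.
Else
  Response.StatusCode = 404
  Response.Redirect("/404.aspx", True)
End If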

Another useful trick is to define the default error redirect in the web.config file (<customErrors mode="RemoteOnly" defaultRedirect="/error.aspx">) and use that page to respond to the error appropriately, using the Server.GetLastError() method to get the most recent server exception and handling it as required.
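
A minimal sketch of such an error page's code-behind (my own illustration, assuming the exception detail is only logged and never shown to visitors) might look like:

Protected Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles Me.Load
  Dim lastError As Exception = Server.GetLastError()

  If lastError Is Nothing Then
    'The error may already have been cleared by the time this page runs.
    Response.StatusCode = 500
    Return
  End If

  'Report the real status code (404, 500, etc.) rather than a generic 200.
  If TypeOf lastError Is HttpException Then
    Response.StatusCode = CType(lastError, HttpException).GetHttpCode()
  Else
    Response.StatusCode = 500
  End If

  'Log the exception for later review, then clear it.
  My.Log.WriteException(lastError)
  Server.ClearError()
End Sub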

There are many other ways to manage server responses when there is an error in your ASP.NET application.  What is most important is knowing that you need to handle these errors properly, up to and including an appropriate response to the request.