Colin Cochrane

Colin Cochrane is a Software Developer based in Victoria, BC specializing in C#, PowerShell, Web Development and DevOps.

Reducing Code Bloat Part Two - Semantic HTML

In my first post on this subject, Reducing Code Bloat - Or How To Cut Your HTML Size In Half, I demonstrated how you can significantly reduce the size of a web document by simply moving style definitions into an external style-sheet and getting rid of a table-based layout.  In this installment I will look at the practice of semantic HTML and how effective it can be at keeping your markup tidy and lean.

What is Semantic HTML?

Semantic HTML, in a nutshell, is the practice of creating HTML documents that only contain the author's "meaning" and not how that meaning is presented.  This boils down to using only structural elements and not presentational elements.  A common difficulty with semantic HTML is telling elements that represent the author's meaning apart from presentational elements, because the differences are often subtle.  Consider the following examples:

I really felt the need to <i>emphasize</i> the point.


I really felt the need to <em>emphasize</em> the point.

Many people would consider these two snippets of code to be essentially the same: a paragraph with the word "emphasize" in italics.  When considering the semantic value of the elements, however, there is one significant difference.  The <i> element is purely presentational (or visual), and it has no meaning in the semantic structure of the document.  The <em> element, on the other hand, has a meaningful semantic value in the document's structure because it defines its contents as being emphasized.  Visually, we usually see the contents of both <i> and <em> elements rendered as italicized text, which is why the difference often seems like nitpicking.  However, italicizing <em> elements is simply the default chosen by most web browsers; the element has no inherent visual style.
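To see that italics are only a browser default for <em>, consider a style-sheet rule that gives emphasis a different look without touching the markup.  This is a minimal sketch; the particular styling chosen here is an illustrative assumption, not something from the examples above.

```css
/* Illustrative assumption: show emphasis as bold, upright text instead of italics */
em{font-style:normal;font-weight:bold;}
```

With a rule like this in place, the second snippet still marks "emphasize" as emphasized, but it is no longer rendered in italics.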

Making Your Markup Work for You

Some of you might be thinking to yourself "some HTML tags have a different meaning than others? So what?".  Certainly a reasonable response, because when it comes down to it, HTML semantics can seem like pointless nitpicking.  That being said, this nitpicking drives home what HTML is really supposed to do: describe the structure and meaning of content, not its appearance.  Think of it like being a writer for a magazine.  You write your article, including paragraphs and indicating words or sentences that should be emphasized, or considered more strongly, in relation to the rest, and submit it.  You don't decide what font is used, the colour of the text, or how much space should be between each line; that's the editor's job.  It is exactly the same concept with semantic HTML: the structure and meaning of the content are the job of HTML, and the presentation is the job of the browser and CSS.

Without the burden of presentational elements you only have to worry about a core group of elements with meaningful structural value.  For instance...

This is a chunk of text <span class="red14pxitalic">where</span> random words 
are <span class="green16pxbold">styled</span> differently,
<span class="red14pxitalic">with</span> some words 
<span class="green16pxbold">being</span> red, italicized and at 14px, and others
being green, bold, and 16px.

with an external style-sheet definition...
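
A plausible version of that definition is sketched here.  The span-qualified class selectors are an assumption, chosen because the two rules then total 127 characters, which matches the count given for Example One in the comparison table.

```css
/* Assumed reconstruction: class selectors qualified with span */
span.red14pxitalic{color:red;font-size:14px;font-style:italic;}
span.green16pxbold{color:green;font-size:16px;font-weight:bold;}
```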

...turns into this...

This is a chunk of text <em>where</em> random words 
are <strong>styled</strong> differently, <em>with</em> some words 
<strong>being</strong> red, italicized and at 14px, and others being green, 
bold, and 16px.

with an external style-sheet definition...

p em{color:red;font-size:14px;font-style:italic;}
p strong{color:green;font-size:16px;font-weight:bold;}

The second example accomplishes the same visual result as the first, but contains actual semantic worth in the document's structure.  It also illustrates how much simpler it is to create a new document because you don't have to worry about style.  You create the document and use meaningful semantic elements to identify whatever parts of the content are necessary, and let the style-sheet take care of the rest (assuming of course that the style-sheet was properly defined with the necessary styles).  By using a more complete range of HTML elements you will find yourself needing <span class="whatever"> tags less and less and find your markup becoming cleaner, easier to read, and smaller.


Code Size Comparison


              HTML Characters (Including Spaces)   CSS Characters (Including Spaces)
Example One   302                                  127
Example Two   216                                  103


Part Three will continue looking at semantic HTML, as well as strategies you can use when defining your style framework.

Comments (3)

  • Jason

    12/17/2007 10:00:32 AM

    Great post! I find that this is the hardest transitional element for me in switching from the old bloated style of coding to the new pure CSS methods.  This and putting as little "extra" HTML in the document as possible, while still trying to achieve my layout via CSS.  I've found that I still tend to do something like:
    <div id="sidebar">
    <ul>
    <li>Item 1</li>
    <li>Item 2</li>
    </ul>
    </div>
    when in reality the div is purely extra - and I should have just assigned the ID to the UL and used that as the core containing element for the sidebar, etc.

    Thanks for the great posts!

  • Jason H

    1/22/2008 7:46:19 AM

    The example of CSS being the layout editor is a great way of thinking about the different roles!

  • Brian H

    2/8/2009 12:24:36 PM

    Sometimes a div is actually semantically more correct. For you, adding the div with an id would be more correct, because div means division and the sidebar is a division in the page.
