Google vs. Web Standards – Part 2

Posted: March 06, 2006 Comments(23)

Continuing from Part 1:

How could Google improve its situation? First, we need to take a step back and look at the overall picture. What is Google offering? Put simply, Google’s founding purpose was to offer the best search engine on the Web. Looking further into Google’s mission statement, the issues of accessibility and usability are brought to our attention. That is an excellent thing to read as a Google user, but do they stick to their word? A look at Google’s most commonly viewed Web pages suggests the answer is no. Invalid markup that is also less than semantic makes for an accessibility nightmare. While Google may be easy to use for the majority of its users, using the service with a disability would be a daunting task.

Google’s Home Page

At first glance, Google’s home page seems like a simple design that is self-explanatory and easy to use. While that may have been Google’s intention, it doesn’t reach everybody. Looking at the markup of Google’s home page, we’re faced with a table, with no helpful summary, that is full of images and links. There is even a form in there that we can assume is for searching the Web, but only if we have heard of Google before and know why it is there. Nothing states what the page is for, and although that may be obvious to the majority, it is not very friendly towards first-time visitors. While this may seem to some like blowing things out of proportion, the aim is to bring focus to the issue at hand: one set of solutions which will enable Google to become standardized and more accessible to its users.

How could it be changed?

The first step in changing Google’s markup domain-wide would be including a DOCTYPE. This is a natural first step that needs little explanation. Including the DOCTYPE, in short, tells Web browsers to render the page in standards mode rather than quirks mode, so the markup is treated the way you intended. The DOCTYPE also serves other purposes, but they are beyond the scope of this article (see footnote 1). The next step would be removing any styles from the markup and putting them in an external CSS document. The foundational purpose of CSS is to separate style from content, and what better way to do that than keeping the two in separate documents?
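
As a rough sketch of that second step, the presentational attributes on Google’s body element and the font rule from its inline style block could translate into CSS along the following lines. The rules are shown inside a style element only to keep the example in one piece; in practice they would live in the external file. The stylesheet path is illustrative, and the selectors are one assumed mapping, not Google’s or the proof of concept’s actual stylesheet.

```html
<!-- Before: presentation mixed into the markup (from Google's current page) -->
<!-- <body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000> -->

<!-- After: the markup only points at a stylesheet (illustrative path) -->
<link rel="stylesheet" href="docs/css/screen.css" type="text/css" media="screen" />

<!-- Contents of that stylesheet, shown inline here for illustration only -->
<style type="text/css">
  body      { background: #ffffff; color: #000000; font-family: arial, sans-serif; }
  a:link    { color: #0000cc; }  /* replaces the link attribute  */
  a:visited { color: #551a8b; }  /* replaces the vlink attribute */
  a:active  { color: #ff0000; }  /* replaces the alink attribute */
</style>
```

Declaring the background colour explicitly matters here: without it, visitors who have set a non-white default background in their browser would see the page differently from what was intended.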

After the basics have been taken care of, the home page code should be revamped and standardized according to its semantic purpose. Each element should be examined and listed. For example, the Google home page has the following elements:

  • List – Service based links
  • Image – Google logo
  • List – Services Google offers
  • Form – Used for searching
  • List – Links to customize search
  • List – Other Google company related links
  • Heading – Indicating copyright

As stated earlier, the Google home page is quite simple in nature. Keeping semantics in mind, markup should be used in the way it was intended. Currently, Google’s home page markup looks like this:

<html><head><meta http-equiv="content-type" content="text/html; charset=UTF-8"><title>Google</title><style><!--
body,td,a,p,.h{font-family:arial,sans-serif;}
.h{font-size: 20px;}
.q{color:#0000cc;}
//-->
</style>
<script>
<!--
function sf(){document.f.q.focus();}
function rwt(el,ct,cd,sg){el.href="/url?sa=t&ct="+escape(ct)+"&cd="+escape(cd)+"&url="+escape(el.href).replace(/\+/g,"%2B")+"&ei=6MYLRKv4FcygaKr6raIE"+sg;el.onmousedown="";return
true;}
// -->
</script>
</head><body bgcolor=#ffffff text=#000000 link=#0000cc vlink=#551a8b alink=#ff0000 onLoad=sf() topmargin=3 marginheight=3><center><table border=0 cellspacing=0 cellpadding=0 width=100%><tr><td align=right nowrap><font size=-1><a href="/url?sa=p&pref=ig&pval=2&q=http://www.google.com/ig%3Fhl%3Den" onmousedown="return rwt(this,'pro','hppphnu:def','')">Personalized Home</a> | <a href="https://www.google.com/accounts/Login?continue=http://www.google.com/&hl=en">Sign in</a></font></td></tr><tr height=4><td><img alt="" width=1 height=1></td></tr></table><img src="/intl/en/images/logo.gif" width=276 height=110 alt="Google"><br><br><form action=/search name=f><script><!--
function qs(el) {if (window.RegExp && window.encodeURIComponent) {var ue=el.href;var qe=encodeURIComponent(document.f.q.value);if(ue.indexOf("q=")!=-1){el.href=ue.replace(new RegExp("q=[^&$]*"),"q="+qe);}else{el.href=ue+"&q="+qe;}}return 1;}// -->
</script><table border=0 cellspacing=0 cellpadding=4><tr><td nowrap><font size=-1><b>Web</b>    <a id=1a class=q href="/imghp?hl=en&tab=wi" onClick="return qs(this);">Images</a>    <a id=2a class=q href="http://groups.google.com/grphp?hl=en&tab=wg" onClick="return qs(this);">Groups</a>    <a id=4a class=q href="http://news.google.com/nwshp?hl=en&tab=wn" onClick="return qs(this);">News</a>    <a id=5a class=q href="http://froogle.google.com/frghp?hl=en&tab=wf" onClick="return qs(this);">Froogle</a>    <a id=7a class=q href="/lochp?hl=en&tab=wl" onClick="return qs(this);">Local</a>    <b><a href="/intl/en/options/" class=q>more »</a></b></font></td></tr></table><table cellspacing=0 cellpadding=0><tr><td
width=25%> </td><td align=center><input type=hidden name=hl value=en><input maxlength=2048 size=55 name=q value="" title="Google Search"><br><input type=submit value="Google Search" name=btnG><input type=submit value="I'm Feeling Lucky" name=btnI></td><td valign=top nowrap width=25%><font size=-2>  <a href=/advanced_search?hl=en>Advanced Search</a><br>  <a href=/preferences?hl=en>Preferences</a><br>  <a href=/language_tools?hl=en>Language Tools</a></font></td></tr></table></form><br><br><font size=-1><a href="/ads/">Advertising Programs</a> - <a href=/services/>Business Solutions</a> - <a href=/intl/en/about.html>About Google</a></font><p><font size=-2>©2006 Google</font></p></center></body></html>

This code alone results in 48 errors according to the W3C Validator.

A Possible Solution

Reworking this page from the ground up — keeping standards and semantics in mind — could result in the following valid markup:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">

<head>

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="description" content="Use Google to search the Internet" />
<meta name="language" content="english, en" />
<meta name="keywords" content="search, web, images, groups, news, froogle, local" />
<meta name="author" content="Google www.google.com" />
<meta name="publisher" content="Google www.google.com" />
<meta name="robots" content="noindex, nofollow" />

<title>Google</title>

<link rel="stylesheet" href="docs/css/screen.css" type="text/css" media="screen" />

<link rel="icon" href="favicon.ico" type="image/x-icon" />
<link rel="shortcut icon" href="favicon.ico" type="image/x-icon" />

</head>

<body>

<ul class="links" id="services">
  <li><a href="#">Personalized Home</a></li>
  <li><a href="#">Sign in</a></li>
</ul>

<div><img src="images/logo.gif" alt="Google Logo" width="276" height="110" class="logo" /></div>

<ul class="navigation">
  <li>Web</li>
  <li><a href="#">Images</a></li>
  <li><a href="#">Groups</a></li>
  <li><a href="#">News</a></li>
  <li><a href="#">Froogle</a></li>
  <li><a href="#">Local</a></li>
  <li><a href="#" class="more">more »</a></li>
</ul>

<form action="null.html" method="get" class="search">
  <fieldset>
    <legend>Search</legend>
    <input class="textfield" type="text" name="query" maxlength="2048" title="Google Search" />
    <input class="button" type="submit" name="search" value="Google Search" />
    <input class="button" type="submit" name="lucky" value="I'm Feeling Lucky" />
  </fieldset>
  <ul class="options">
    <li><a href="#">Advanced Search</a></li>
    <li><a href="#">Preferences</a></li>
    <li><a href="#">Language Tools</a></li>
  </ul>
</form>

<ul class="links">
  <li><a href="#">Advertising Programs</a></li>
  <li><a href="#">Business Solutions</a></li>
  <li><a href="#">About Google</a></li>
</ul>

<h6>©2006 Google</h6>

</body>
</html>

Naturally, if Google were to adopt this solution, they would want to retain the look and feel that has become so familiar to their users. You may note that various elements are missing, including the pipes and hyphens which are used on Google’s current home page to separate certain link groups. These characters were removed because the underlying structure was altered: the links are now list items, and literal pipes and hyphens in the markup would be presentational content rather than part of the document’s meaning.
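
One way the visual separation could be restored without putting the characters back into the markup is to draw it with CSS. The following is a minimal sketch, not the stylesheet behind the proof of concept: the class name first is an assumption added for illustration, a left border stands in for the pipe character, and the rules would belong in the external stylesheet rather than in a style element.

```html
<style type="text/css">
  /* Lay the list items out on one line, as on Google's current page */
  ul.links    { list-style: none; margin: 0; padding: 0; }
  ul.links li { display: inline; padding: 0 0.5em; border-left: 1px solid #000000; }
  /* Hypothetical helper class: the first item gets no separator */
  ul.links li.first { border-left: none; }
</style>

<ul class="links" id="services">
  <li class="first"><a href="#">Personalized Home</a></li>
  <li><a href="#">Sign in</a></li>
</ul>
```

A class is used here instead of the :first-child selector because Internet Explorer 6 does not support that selector.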

Please accept the following as a proof of concept: Google’s Home Page – Validated and Semantic.

Adopting this solution would not be a major obstacle for Google. The markup is standard and semantically correct and seems to make more sense. The look and feel has been retained almost identically, yet the underlying functionality has been greatly expanded. It may be argued that the page does not in fact align exactly with Google’s current solution. In rebuttal, the differences are negligible and in all honesty not important. To correct the markup and retain absolute ‘pixel-to-pixel’ consistency would result in a bloated stylesheet and unnecessary complexity.

Google Search Results

A typical page of Google search results, at its base, contains a list of results pertaining to your query. Without question, there are many other elements included, but at the root, a Google search result page conveys a list of links in order of relevancy to your search. The markup, however, doesn’t reflect such content. The markup reflects a document containing first an unidentifiable table containing an image, some links, and a form. Upon further inspection we see some more tables with what seem to be an indication of some search results figures. Beyond that are more tables which lack a summary that contain paragraphs, links, and some other information. When looking at the markup it is virtually impossible to determine what content is included in this document.

Take a look at the markup for a Google search on ‘Web Standards’. If you are using Firefox and Chris Pederick’s Web Developer Extension, take a second to outline the table cells in this document. The number of tables used and the way in which they are included does not make much sense semantically. While some may argue that search results are tabular data, it would be difficult to determine what would define a column header. In this example, a group of search results is treated as an ordered list because the results are ranked in order of relevancy.

For those of you who may not be using Firefox or do not have the Web Developer Extension currently installed, the following image displays what you would see if you were to outline the table cells of a Google search results page on ‘Web Standards’:

Image depicting a table outline of Google search results

A Possible Solution

If the same process that was applied to Google’s home page were applied to the search results, the result would be a semantically correct, valid document that is widely accessible and usable to the majority of Google’s users.

Please accept the following as a proof of concept: Google’s Search Results – Validated and Semantic.

Both the markup of the original document and the accompanying CSS were reworked from the ground up focusing mainly on the semantics of what this particular document had to offer. The markup was then standardized and you can see what resulted. As opposed to the original document, which contained 239 validation errors, we now have a completely valid document with proper CSS.

What Was Changed?

The validation of Google’s search results was performed by stepping back and again looking at the specific elements the document had to offer. As stated before, the main focus of the document was an ordered list of search results. The first action was to convert the original results into a listed form. Next, each search item was examined. What resulted was a list containing a header which was the link to the related document. Following that was a paragraph which included the page description. Finally, the information including the URL, a link to similar pages, and a link to Google’s cache was included. This last line of text did not receive any specific markup due to the fact that it was simply part of the list item that was the search result. No further markup was needed.
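
The structure described above might be marked up roughly as follows. This is a sketch reconstructed from the description, not a copy of the proof of concept: the class name, the h2 heading level, the URLs and the result text are all placeholders.

```html
<ol class="results">
  <li>
    <!-- Header: the link to the related document -->
    <h2><a href="http://www.example.org/">Example result title</a></h2>
    <!-- Paragraph: the page description -->
    <p>Placeholder description of the page, as it would appear in the results.</p>
    <!-- Last line: plain text and links with no extra wrapper element -->
    www.example.org/ <a href="#">Cached</a> <a href="#">Similar pages</a>
  </li>
</ol>
```

Because a list item may contain both block-level elements and plain text, the final line can sit directly inside the li without any further markup.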

Once this was complete, the sponsored links were taken into consideration. These too are an ordered list, and basically reflect exactly the same elements as the main search results. This list was simply contained in a div and floated to the right. The same markup is applied to this list. Next was including the page’s header. This was accomplished first by marking up the appropriate areas. There was the header as a whole, but contained within the header there were separate spans of text, each offering different pieces of information. These spans were styled and separated, and the header was then styled to reflect that of Google’s. The next step was to include the logo, search form, services navigation, and other various links. This was done simply by porting the markup from the home page and making the necessary style adjustments to reflect properly on Google’s current design.
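
A minimal sketch of the sponsored links container could look like this. The id, the width and the placeholder content are assumptions for illustration, and the style rule would normally live in the external stylesheet.

```html
<style type="text/css">
  /* Float the sponsored list to the right of the main results; the width is a guess */
  #sponsored { float: right; width: 20%; }
</style>

<div id="sponsored">
  <ol>
    <li>
      <h2><a href="#">Placeholder sponsored link</a></h2>
      <p>Placeholder advertisement text.</p>
      www.example.com
    </li>
  </ol>
</div>
```

For the main results to flow alongside the floated box, the div would typically need to come before the main results list in the source order.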

Search result page navigation was a challenge of its own. Semantically this element is a list containing links. The challenge was in turning a list into what Google has had on display for so long, and this was probably the most difficult element to style. Continuing with the footer, a div was included to reflect the Google Toolbar promotion that is offered on many search results pages. This section varies from time to time, seemingly at random, so only one variant is shown in the example. The word ‘Free!’ was originally put in red text, and this was replaced by a styled em element to reflect the emphasis Google wanted to put on this word.
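
As a hedged illustration of both ideas, the page navigation and the emphasized word could be approached like this. The class names, the promotional text and the exact CSS values are assumptions, not the rules used in the proof of concept.

```html
<style type="text/css">
  /* Turn the list of page links into a single horizontal row */
  ul.pages    { list-style: none; margin: 0; padding: 0; text-align: center; }
  ul.pages li { display: inline; padding: 0 0.25em; }
  /* Keep the emphasis in the markup and let CSS supply the red text;
     font-style: normal assumes the original word was not italic */
  em.free { font-style: normal; color: #ff0000; }
</style>

<ul class="pages">
  <li>1</li>
  <li><a href="#">2</a></li>
  <li><a href="#">3</a></li>
  <li><a href="#">Next</a></li>
</ul>

<div class="toolbar">
  <em class="free">Free!</em> Placeholder text promoting the Google Toolbar.
</div>
```

The current page is left as plain text inside its list item, since it does not need to link to itself.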

Finally, the last elements of the footer include another div containing another simplistic search form and a list of links regarding Google’s services. Below that is a copyright header and that concludes the elements contained in a Google search results page. It should not be made more confusing than that.

Why?

Google’s mission is to make the world’s information as usable and accessible as possible. Currently, when looking closely at the foundational markup of their most popular service, they are not doing so. Making a change such as what was done here does not appear to be a gigantic obstacle and should be implemented in one form or another.

Please bear in mind that this is one possible solution of many. While you may find flaws in it, it is the author’s hope that criticism and discussion of this solution will allow it to be revised and brought to its full potential.

Footnotes
  1. Fix Your Site With the Right DOCTYPE! – A List Apart


Comments

  1. Excellent, glad to see you gave some proposed solutions to follow up on your Part 1 article.

    Something else worth mentioning is that Microsoft’s MSN Search http://search.msn.com/ is now standards compliant (strict nonetheless!) and even their homepage http://www.microsoft.com is compliant now (HTML 4.0 but still).

    As much as I love Google and what they stand for, Microsoft took a giant step ahead of them with that move and it would be nice to see Google counter with some of the improvements you’ve made.

  2. I absolutely love your stunning AND validating remake of Google’s pages. I could definitely see Google fixing their homepage with ease; however, fixing the search page would be a bit trickier. My guess is that they have hardcoded the HTML into their search code. It would probably be a difficult task to go in and change it all since the search page probably pulls the information from their database just like that. Although, if the rest of their code looks like their HTML code does, then it’s definitely time for an update anyway.

    If google does update their search pages, I hope that they keep the clean and simple layout they have now. I would really hate to see Google go the way of Yahoo and add a bunch of useless junk to the homepage.

  3. @Michael: WOW! I hadn’t even realized that before writing this article — do you know when they standardized? It is great to see that MSN Search is not only valid, but also semantic. Microsoft loses a point or two for continuing with a tabular layout, but they have made a positive improvement. Hopefully this will be some direct inspiration for Google to TIDY up their code.

    @astridas: You bring up a good point with hardcoded HTML. It is probably safe to assume such a situation exists. It may take a bit of time to change now, but in the long run they would be better off converting to a more standard method. I also agree with your point of Google retaining their look and feel. It has given them this much success — why change it?

  4. I’ve been wondering when someone was going to point that out. =P

    Well done. I’ll put it this way, I had to watch my address bar when looking at the example links… =)

  5. Ah, this is an issue that has been discussed in the past among standardistas and it’s good to see such a well thought-out article covering the whole issue. My experience in the past, from hearing Google’s remarks when responding to questions about their rejection of standards, has been that Google just doesn’t care about semantics or CSS. They think they know the best way to serve their engine (lighter versions using CSS have been made) and they won’t listen. They also insist that their search engine is some kind of “usability masterpiece,” but to them usability does not include blind users or any other users who use assistive technology. You summed it up best when you said: “There is even a form in there that we can assume is for searching the Web — that is, only if we have heard of Google before and know why it is there.” This is exactly the dilemma that any website faces when semantics are overlooked.

  6. Great article!

    I do, however, have one problem: you did not define a background colour for the body, so for users who have a colour other than white specified as their default background, the page will not render how you intended.
    😉

  7. Good article, but I’ve got one issue with semantic HTML in the article. You use only one heading in the document, and that’s an H6 for the footer copyright? How is that semantic?

  8. @Joebert: I was having the same issue when developing my versions of the Google pages. It was actually inspiration to keep going!

    @Christian: If that is truly the way Google feels about standards, semantics, and Web accessibility, they should really change their mission statement don’t you think?

    @Blake: Thank you for pointing out such an oversight. I have updated the stylesheet to include your suggestion — thanks for catching it!

    @Jay: You are correct, the search page itself does not use any header but an h6 for the footer. I took a look at what that small line of text represented, and a heading level 6 was the best I could determine as correct representation. The search results page does have its fair share of headers, using an h1 for the main header of the document and h2’s for the query results. Do you have any suggestions for a change? I am all ears and hoping for some criticism. Thanks for taking the time to really look into the example.

    @baldo: While people using older (ancient) browsers may see a Google page a bit more “artistically” (for lack of a better word) laid out, using a semantic version allows for increased accessibility. Removing the tabular layout creates a more accessible and usable Web site for those using an older browser, even though it may not look as pretty.

    Thanks to everyone for your great comments. I’m glad to hear that people aren’t just taking the examples and saying ‘that’s nice’. I’m really getting some great feedback and excellent suggestions for change, so keep them coming!

  9. Don’t think google will waste diskspace on HTML.

    Using tables allows you to use the same HTML/CSS markup for all browsers (mainly older ones) and still have nearly identical output. (And I don’t want to start a div vs. table discussion; I’m 100% for divs.) There are workarounds that allow you to output different stylesheets to different browsers, but I doubt Google will choose to use them. There is also JavaScript, but then the style would depend on users having JavaScript.

    I think the main reason google hasn’t already taken this step is all the subsites/functions that they would have to keep track of. Google.dk, google.de google.co.za and so on. They have a million hits a second if not more, by all browsers out there.(0.1% in Google-land is like 1.000 per. sec.). They will have to do some serious browser testing, way beyond a simple html validation.

  10. @Jonas: Thanks for taking the time to write your opinion! I’ve adjusted the font color due to some similar observations… is this better?

    It is absolutely doubtful that Google will change much of anything discussed here any time soon. It was just a mild experiment for me to see what the actual changes involved would be. Thanks again for posting your thoughts and I hope you find the site useful in the future.

  11. I believe that older browser support could be a reason for staying with tables. However, if you take a look at Microsoft (the king of failed compliancy and older browsers) you will notice they have started their move toward compliancy. This makes a big statement as Microsoft is the world leader in the Home market.

    It will be difficult for Google to update all of their servers with new code. Although, I think it is still something that needs to be done to move into the future instead of holding on to the past.

  12. I applaud what you have done. Google is a role model and leader and they should follow standards as much as possible. I understand the argument made by another visitor that Google wants to serve the same page to everyone, but that doesn’t mean throwing in the towel and serving “table soup”.

    One small point. When I used the View > Text Size > Increase command on your version of the Google home page, the three lines beginning with “Advanced Search” moved down and didn’t look right. The existing Google home page doesn’t work that way. I don’t think a replacement page has to work exactly the same, but I think the wrap problem I described shouldn’t be evident until the text is a lot bigger.

  13. @astridas: I’m sure it would be difficult to modify all that code — but I believe that they’re going to have to do it sooner or later if they expect to expand their service in the easiest way possible.

    @Jonas: Glad it’s better for you — I like it more too

    @John: Thanks for taking the time to comment, I really appreciate it. I also think that Google should take more of an active stance in standardizing their code. I’m glad you looked into the layout that I had come up with and I’m going to look into the scalability issue you mentioned. Hope to hear more from you in the future.

  14. It’s an economy-of-scale issue. Google’s lack of standards is intentional – they make their page viewable in the broadest number of browsers while serving the smallest number of bits in the shortest amount of time. They don’t release the actual number of page views they get each day; news sources place it somewhere between 150 million (in 2003) and 2.4 billion. Suppose the number is 500 million (not unreasonable – myspace.com gets 1.5B hits per day, and digg.com is getting close to 150M). Their current page serves up 9875 bytes. Your proof of concept page weighs in at 15265 bytes (both values include the logo). The difference, 5390 bytes, would add roughly 2.7 terabytes/day (5390 bytes × 500M page views), or very nearly a petabyte/year, to their server load (assuming 500M page views/day). The actual numbers vary (only the first page gets the big logo for instance… results pages have smaller overhead…. but then again their actual page view count might be 2 to 5 times larger) but you see the magnitude of the problem. All Google has to gain from switching to standards (other than recognition from you and me) is added overhead and longer page load times.

  15. @foobario: Thanks for writing such an insightful comment. You’re absolutely correct in what you say and undoubtedly those are major reasons for Google to stay right where they are. Bear in mind that I’m sure the proof of concept I came up with could be streamlined a great deal. Also remember that stylesheets are more often than not cached on the first visit so extrapolating bandwidth costs over the year would be a difficult task. All in all I completely agree with your comment and I’m glad you took the time to write it. Thanks!

  16. My new homepage <3

    With the help of a Greasemonkey Script

    It is quite amazing how a multi-billion dollar company can’t make such simple changes to their page… almost unbelievable.. It’s like they have 2 pages.. how can they not fix them : (

  17. I recently got down to business bringing the Google Search Appliance (a close relation to google.com) into standards-friendly territory. Read all about it on Google’s Enterprise Blog and at Joe’s Apt.

    This project has the full support of Google Enterprise and is an open source project over at code.google.com. (Search Appliance not necessary to play along at home.)

  18. Fantastic article.

    I was just looking at the Web Standards Award site (http://www.webstandardsawards.com/) the other day and their closing message saying that their “mission is complete”. I beg to differ. Looking at the sites in the archive, it’s the little guys who are creating the standards compliant sites, whilst the big companies with all the money are still ignoring the issue. I shake my head every time I see YouTube and MySpace and how popular they have become. These are the sites people are looking up to, yet they are so poorly coded.

    It’s time we started putting pressure on the big sites to switch to standards compliant design. There’s no excuse.
