Current Events: Google’s Improved Flash Indexing

Posted: July 07, 2008

News broke last week that Google has improved its algorithm to better index Flash content. Understandably, developers have been making this request for quite some time, but there are definitely some issues to consider when looking at the update.

I’ll get it out of the way early by saying that this update has absolutely no effect on the accessibility repercussions of using Flash. This algorithm change is purely search engine technology, with its own separate list of pros and cons.

What’s different, technically?

From what’s been posted about the algorithm update, the big change for Flash indexing is that Google can now see any text included in a Flash piece. The update also includes the ability to crawl any URLs referenced in the movie.

It’s important to keep in mind what content will remain absent from the indexing algorithm:

At present, we are only discovering and indexing textual content in Flash files. If your Flash files only include images, we will not recognize or index any text that may appear in those images. Similarly, we do not generate any anchor text for Flash buttons which target some URL, but which have no associated text.

Also note that we do not index FLV files, such as the videos that play on YouTube, because these files contain no text elements.

The limitation here runs parallel to the indexing of HTML, but that’s not to say HTML and SWF are on the same level. In HTML, Google is able to see the alt attribute for images, which should contain valuable information describing the content of the picture.

Something else to keep in mind is the provided list of limitations with Google’s updated ability to crawl Flash:

There are three main limitations at present, and we are already working on resolving them:

  1. Googlebot does not execute some types of JavaScript. So if your web page loads a Flash file via JavaScript, Google may not be aware of that Flash file, in which case it will not be indexed.
  2. We currently do not attach content from external resources that are loaded by your Flash files. If your Flash file loads an HTML file, an XML file, another SWF file, etc., Google will separately index that resource, but it will not yet be considered to be part of the content in your Flash file.
  3. While we are able to index Flash in almost all of the languages found on the web, currently there are difficulties with Flash content written in bidirectional languages. Until this is fixed, we will be unable to index Hebrew language or Arabic language content from Flash files.

The list really gets me curious to see how things will go as far as dynamically injecting Flash into a document and whether it will be beneficial for search engine saturation. For instance: what if Google does execute the JavaScript which inserts a Flash movie into my document, but the alternate content I provided was much more search engine friendly? How do I ensure that Google indexes the content I intended to be indexed as opposed to the more limited Flash version? There are plenty of questions to be asked in reaction to this update.
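For illustration, here’s a minimal sketch of the progressive-enhancement pattern in question: the page ships search-engine-friendly HTML by default, and script swaps in the Flash movie only when the plugin is available. All function and parameter names here are made up for the example; this is not taken from SWFObject or any particular embedding library.

```javascript
// Build the markup for the Flash movie itself.
// (Names are illustrative, not from a real library.)
function buildEmbedMarkup(swfUrl, width, height) {
  return '<object type="application/x-shockwave-flash" data="' + swfUrl +
         '" width="' + width + '" height="' + height + '"></object>';
}

// The search-friendly alternate HTML is the default; the movie is only
// swapped in when the Flash plugin is actually present. Crawlers that
// don't execute this script keep the alternate content.
function chooseContent(hasFlashPlugin, swfUrl, alternateHtml) {
  return hasFlashPlugin ? buildEmbedMarkup(swfUrl, 600, 400) : alternateHtml;
}
```

In a real page this decision runs client-side (for example, by replacing a container element’s innerHTML), which is precisely why it’s unclear which of the two versions Google would end up indexing if it starts executing that script.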

What does it mean for us?

This update from Google will probably have some interesting repercussions. For instance, it doesn’t take much time to infer who exactly was making the multitude of requests for SWF indexing. If you’re not well aware of effective ways to provide alternate content, who else are you going to blame when your client is asking why they don’t show up in any search results?

I’m not sure if anyone else shares this concern, however. Taking a step back and thinking about it: is an improved algorithm that indexes Flash content going to benefit the general public? I’m not so sure. The update can be viewed as both a positive and a negative at the same time. On one hand, it could be very beneficial when implemented properly, ensuring equivalent alternate content continues to be provided. On the other hand, many developers will now (lazily) opt out of providing alternate content simply because Google can read their text content.

My initial reaction to this update was mostly negative and only partially positive. One of the major reasons I felt that way was this paragraph from the more detailed post on Webmaster Central:

We’ve developed an algorithm that explores Flash files in the same way that a person would, by clicking buttons, entering input, and so on. Our algorithm remembers all of the text that it encounters along the way, and that content is then available to be indexed. We can’t tell you all of the proprietary details, but we can tell you that the algorithm’s effectiveness was improved by utilizing Adobe’s new Searchable SWF library.

The scariest thing to read in that paragraph, for me, is “We can’t tell you all of the proprietary details…” Now, I’m not so ignorant as to say that Google shouldn’t have some secrets guarding how they remain the big name in search, but that statement was a big reminder that virtually everything surrounding this update sits behind the closed doors of both Google and Adobe.

The saturation of the Flash plugin is remarkable, but that kind of proprietary reach is exactly what should be kept in mind with updates like this.

How does this change SEO?

I’m very interested to see how much weight is given to Flash content with this update. Do you think Google will treat text found in Flash equally to content found in an HTML document? It’s a strange notion to consider, and I’m not sure I could make an educated guess quite yet, but I’d be very interested to see what Google has to say about it over the coming months.

In the grand scheme of things, in an attempt to not think like a Web developer, this update will probably be beneficial for the general public. Poorly developed Flash sites will now be that much easier to find. Stepping back into professional opinion, I’m very leery of the update. It’s as though we’re taking a step back in a way, but are we? Flash is obviously going nowhere; it’s here to stay. Should we be glad that Adobe and Google are working together to improve the technology? It’s hard to say that we’re worse off as a result of this update. Like all things, we can only hope it’s not taken advantage of.

Comments

  1. I am less than impressed with it all, to say the least. So it can pull text out of different areas – what about making sense of that text? Simply pulling the text out may help to see keywords, but how would they be weighted the way they would be in semantic HTML, where we can use headers, emphasis, links, etc.? I think the results will still be a jumbled mess – because I don’t think it can index Flash well.

    Just as with HTML-based sites, there are some good sites that follow standards and some table-based sites that are the polar opposite. Google indexes both – but the first will always fare better due to its logical structure, and in many cases, the density of keywords found in the content.

    I don’t know how this can revolutionize or change the SEO game, it just seems like they are finally doing something that should have been done years ago.

    Regardless of it all, and as you stated in the very beginning – this has nothing to do with accessibility. Flash is still completely broken in this aspect.

  2. I think that Google and Adobe should be able to go ahead and do their own thing, meaning playing with spidering forms, Flash and JavaScript in whatever way they can. I know it will be clunky and take a while until they can really extract some semantics from these elements, but it’s great they are thinking about exposing this content.

    I just wish they would look at the bigger picture and be responsible for their own actions. Just ‘kind of’ getting that information exposed does leave web standards advocates in a bit of a lurch when trying to promote best-practice development and design.

    It’s not all about exposing content for SEO (as you mentioned), there are other factors to think about. If Google doesn’t do this then they can really mess up the web.

  3. @Nate Klaiber: Very well said. The fact that the content now being pulled has very little meaning is a major consideration. I agree, this should have been taken care of a long time ago, and maybe by now we wouldn’t be dealing with this influx of meaningless content.

    @Scott G: There are definitely pros to the update, but not without equivalent odds, don’t you think? The only end product for ‘the rest of us’ will be a proprietary link between the two technologies. You nailed it when you said that it’s not about simply exposing the content for SEO, there’s a lot to keep in mind.

    Thanks to you both for your thoughts!
