
Crawling Ain’t No Walk in the Park

If you’ve ever tried to get into a high-end nightclub, you know it’s not easy. Unless you’re somehow ‘on the list,’ you’ve got to know somebody, be somebody or have a really great outfit if you’re hoping to make the cut. There’s a bouncer at the door who decides who gets in and who stays out, so the club doesn’t get packed too full, making for an uncomfortable crowd or getting the fire marshal called to the scene.

Think of web crawlers (also called bots, ants, web spiders or automatic indexers) as those same bouncers. They decide who gets in, meaning who gets indexed on the search engine that sent them, and who stays out. They have a system and they’re not afraid to use it, much like their muscle-bound real-life counterparts.

Web crawlers are programs created by search engines and sent out to peruse the internet with one of two goals: find new websites to index, or update the content of sites that are already indexed. Good SEO is how you get the crawlers to your site. They look at several parts of your site, all of them associated with SEO: the URL, the page title, the meta tag information, the content, the links on the page and where those links lead. They look primarily for keywords in the title, content and URL. After collecting this data, the crawler starts investigating your links. Remember, links from reputable sites carry more weight with a search engine than links from unknown sites. So it isn’t the sheer number of links you have that matters, but where the links come from and where they go.
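To make that list concrete, here’s a minimal Python sketch of the on-page data a crawler might pull out of a single page (title, meta tags and outgoing links), using the standard library’s `html.parser`. The `PageInfoParser` class and the sample page are hypothetical; real search engine crawlers extract far more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageInfoParser(HTMLParser):
    """Hypothetical helper: collect the title, named meta tags and links from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.meta = {}
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            # e.g. <meta name="description" content="...">
            self.meta[attrs["name"].lower()] = attrs.get("content", "")
        elif tag == "a" and attrs.get("href"):
            # Resolve relative links against the page's own URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


html = """<html><head><title>Red Shoes | Example Shop</title>
<meta name="description" content="Hand-made red shoes."></head>
<body><a href="/sale">Sale</a> <a href="https://partner.example.org/">Partner</a></body></html>"""

parser = PageInfoParser("https://shop.example.com/shoes")
parser.feed(html)
print(parser.title)  # Red Shoes | Example Shop
print(parser.meta)   # {'description': 'Hand-made red shoes.'}
print(parser.links)  # ['https://shop.example.com/sale', 'https://partner.example.org/']
```

The links list is what the crawler works through next.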

The crawlers follow each link on each page of your site and investigate the linked pages the same way they investigated yours. It’s important to remember this, because some black hat SEO developers use the sneaky tactic of creating ghost sites full of fake links, hoping to boost their own site’s credentials with the crawlers. Unfortunately for them, this is a short-term solution. Search engines quickly catch on to this kind of tactic, and it can end with your site being removed from, and sometimes banned by, a search engine.
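For the curious, following links from page to page is essentially a graph traversal. Here’s a minimal Python sketch of a breadth-first crawl, assuming a tiny made-up `WEB` dictionary in place of real downloads (the URLs and the `crawl` function are hypothetical). The crawler keeps a queue of pages to visit and a set of pages it has already seen, so each page is fetched once even when sites link back and forth.

```python
from collections import deque

# A tiny, made-up "web": each URL maps to the links found on that page.
WEB = {
    "https://a.example/": ["https://a.example/about", "https://b.example/"],
    "https://a.example/about": ["https://a.example/"],
    "https://b.example/": ["https://c.example/", "https://a.example/"],
    "https://c.example/": [],
}


def crawl(seed, get_links, max_pages=100):
    """Breadth-first crawl: visit the seed, then every page it links to, and so on."""
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # "visit" the page; a real crawler would download and index it
        for link in get_links(url):
            if link not in seen:  # never queue the same page twice
                seen.add(link)
                frontier.append(link)
    return order


print(crawl("https://a.example/", lambda url: WEB.get(url, [])))
# ['https://a.example/', 'https://a.example/about', 'https://b.example/', 'https://c.example/']
```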

After collecting all their data, the web crawlers report back to their search engine, and your website is added to its index, the engine’s copy of the internet. The better your SEO, the better your ranking in that index the next time someone searches for your desired keywords.

Another thing to consider is the set of policies that dictate a web crawler’s behavior. Typically there are four: a selection policy, a re-visit policy, a politeness policy and a parallelization policy.

The selection policy tells the crawlers which pages to download. The World Wide Web in its entirety is simply too large to be held completely by a single search engine; as of 2005, even the largest search engine could hope to index no more than about 70% of it. The selection policy helps the crawlers make sure the pages they index are the highest-quality and most relevant ones for the engine.
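One common way to picture a selection policy is a priority list: when the crawler can’t fetch everything, it fetches the pages that look most important first. Here’s a minimal Python sketch that uses the number of known inbound links as the importance score. That scoring rule, the `select_pages` function and the example URLs are assumptions for illustration, not what any particular search engine actually uses.

```python
def select_pages(inbound_links, budget):
    """Return the `budget` candidate URLs with the most known inbound links.

    `inbound_links` maps each discovered URL to how many links point at it
    (a hypothetical importance score); ties are broken alphabetically.
    """
    ranked = sorted(inbound_links, key=lambda url: (-inbound_links[url], url))
    return ranked[:budget]


candidates = {
    "https://news.example/": 120,
    "https://shop.example/": 45,
    "https://blog.example/post": 3,
    "https://spam.example/": 0,
}
print(select_pages(candidates, budget=2))
# ['https://news.example/', 'https://shop.example/']
```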

The re-visit policy tells the crawlers when to go back and check a page for changes. While one of a crawler’s main jobs is keeping its information from going out of date, it’s still a good idea to resubmit any pages you’ve created directly to the search engine, since there’s no way to know how often the crawler will re-visit your site.
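As a rough illustration, a re-visit policy can adapt to how often a page actually changes. Here’s a minimal Python sketch under an assumed rule (halve the wait after a change, double it otherwise, within fixed limits); the rule, the limits and the function names are hypothetical, not how any specific search engine schedules its visits.

```python
import hashlib


def fingerprint(page_content):
    """A short digest of the page, used to tell whether it changed since the last visit."""
    return hashlib.sha256(page_content.encode("utf-8")).hexdigest()


def next_interval(current_hours, page_changed, min_hours=1, max_hours=24 * 30):
    """Hypothetical adaptive rule: halve the wait after a change, double it otherwise."""
    new_hours = current_hours / 2 if page_changed else current_hours * 2
    # Clamp so busy pages aren't hammered and quiet pages aren't forgotten.
    return max(min_hours, min(max_hours, new_hours))


old = fingerprint("<h1>Summer sale</h1>")
new = fingerprint("<h1>Winter sale</h1>")
interval = next_interval(24, page_changed=(old != new))
print(interval)  # 12.0
```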

The politeness policy is quite simple: it makes sure a crawler doesn’t overload a site. Crawlers fire off many requests in quick succession and can eat up a lot of a server’s bandwidth, so the politeness policy spaces those requests out. Picture too many bouncers inside the club at once, intimidating the customers and making them leave. One at a time, please.
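Here’s a minimal Python sketch of a politeness rule, assuming a fixed minimum delay between requests to the same host. The `PolitenessGate` class and the two-second default are made up for illustration. Real crawlers also read a site’s robots.txt, which Python’s standard `urllib.robotparser` module can parse.

```python
import time
from urllib.parse import urlsplit


class PolitenessGate:
    """Hypothetical helper: enforce a minimum delay between requests to the same host."""

    def __init__(self, delay_seconds=2.0, clock=time.monotonic):
        self.delay = delay_seconds
        self.clock = clock
        self.last_request = {}  # host -> time of the last request to it

    def wait_time(self, url):
        """Seconds to wait before `url` may be fetched (0.0 means go ahead now)."""
        last = self.last_request.get(urlsplit(url).netloc)
        if last is None:
            return 0.0
        return max(0.0, self.delay - (self.clock() - last))

    def record(self, url):
        """Note that we just sent a request to this URL's host."""
        self.last_request[urlsplit(url).netloc] = self.clock()


fake_now = [100.0]  # a fake clock so the example doesn't actually sleep
gate = PolitenessGate(delay_seconds=2.0, clock=lambda: fake_now[0])
gate.record("https://shop.example.com/shoes")
fake_now[0] = 100.5
print(gate.wait_time("https://shop.example.com/bags"))  # 1.5 (same host, too soon)
print(gate.wait_time("https://news.example.org/"))      # 0.0 (different host)
```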

And lastly, the parallelization policy coordinates a search engine’s many crawlers, which run at the same time, so that two of them don’t download the same page. Without it, the engine would waste time and bandwidth fetching duplicate copies of the same sites.
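A common way to keep parallel crawlers from duplicating work is to partition URLs so every page has exactly one owner. Here’s a minimal Python sketch that splits the work by host name; the `assign_crawler` function, the hash choice and the crawler count are assumptions for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def assign_crawler(url, num_crawlers):
    """Give every URL exactly one owner among `num_crawlers` crawlers.

    Hashing the host name (rather than the full URL) keeps each site with a
    single crawler, which also makes the politeness policy easier to enforce.
    """
    host = urlsplit(url).netloc.lower()
    # hashlib gives the same answer in every process, unlike Python's built-in hash().
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_crawlers


for url in ["https://shop.example.com/shoes", "https://shop.example.com/bags", "https://news.example.org/"]:
    print(assign_crawler(url, num_crawlers=3), url)
```

Both shop.example.com pages always land on the same crawler, so no other crawler will ever fetch them.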

So remember, your website is standing in line outside the hottest search engine on the internet, and unless its SEO is the equivalent of some nice Louboutin heels and a Prada handbag, it’s not getting inside.
