Monday, 6 November 2017

Submit a Site Map to the Search Engines

One good tip is to prepare a crawler page (or pages) and submit it to the search engines. This page should contain no text or content other than links to all the important pages that you want crawled. When the spider reaches this page, it follows every link and pulls all the desired pages into its index. You can also break the main crawler page into several smaller pages if it grows too large. Crawlers will not reject smaller pages, whereas larger pages may be bypassed if the crawler finds them too slow to spider.
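
As a rough illustration of the idea above, here is a minimal Python sketch that turns a list of links into one or more link-only crawler pages. The function name `build_crawler_pages`, the limit of 50 links per page, and the example URLs are all assumptions made for this sketch; they are not taken from any search engine's guidelines.

```python
from html import escape


def build_crawler_pages(links, max_links_per_page=50):
    """Split (url, title) pairs into link-only HTML pages.

    max_links_per_page is an arbitrary illustrative limit; choose
    whatever keeps each page small enough to be crawled quickly.
    """
    pages = []
    for start in range(0, len(links), max_links_per_page):
        chunk = links[start:start + max_links_per_page]
        # Each page holds nothing but a list of links.
        items = "\n".join(
            f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
            for url, title in chunk
        )
        pages.append(f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>")
    return pages


# Hypothetical site with 120 story pages.
links = [(f"/stories/{i}.html", f"Story {i}") for i in range(1, 121)]
pages = build_crawler_pages(links, max_links_per_page=50)
```

With 120 links and a limit of 50, the sketch produces three crawler pages, each of which can be saved as its own file and submitted separately.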

You need not worry that the search results will surface this "site-map" page and disappoint the visitor. That will not happen: the "site-map" has no searchable content, so it will not be included in the results, while all the other pages will. We found that wired.com had published hierarchical sets of crawler pages. The first crawler page lists all the category headlines; these links lead to a set of links with all the story headlines, which in turn lead to the news stories.

You do not have to submit every page of your site. As stated earlier, many sites restrict the number of pages you can submit. A key page, or a page that links to many inner pages, is ideal, but you should also submit some inner pages. This ensures that even if the first page is missed, the crawler still reaches other pages and, through them, all the important pages. Submit at least your 3 or 4 key pages. Choose the ones whose content and keywords are most relevant to your target search string, and verify that they link to the other pages properly.

As PR.CPWebHosting noted above, spiders may bypass long and "difficult" pages. They have their own time-out characteristics or other controls that help them get unstuck from such pages. So you do not want such a page to become your "gateway" page. One tip is to keep the page size below 100 KB.
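
A quick way to apply the size tip is to measure a page before submitting it. The sketch below is only an illustration: it assumes "100 KB" means 100 × 1024 bytes and that the page is encoded as UTF-8, and the helper name `is_within_size_limit` is made up for this example.

```python
# The 100 KB guideline from the text, taken here as 100 * 1024 bytes
# (an assumption; the text does not say which definition it uses).
MAX_PAGE_BYTES = 100 * 1024


def is_within_size_limit(html_text, limit=MAX_PAGE_BYTES):
    """Return True if the page's UTF-8 encoded size is below the limit."""
    return len(html_text.encode("utf-8")) < limit


# Hypothetical usage: a tiny link-only page easily passes the check.
small_page = "<html><body><a href='/a.html'>A</a></body></html>"
small_page_ok = is_within_size_limit(small_page)
```

This only measures the HTML itself; images and scripts loaded by the page are not counted.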

Doorblocks

Many pages on e-commerce and other functional sites are generated dynamically and contain "?" or "&" signs in their URLs. These signs separate the CGI variables. While Google will crawl these pages, many other engines will not. One inconvenient solution is to develop static equivalents of the dynamic pages and host them on your site. Another way to avoid such dynamic URLs is to rewrite them using a syntax that the crawler accepts and that the application server still understands as equivalent to the dynamic URL. The Amazon site shows dynamic URLs in such a syntax. If you are using a CPWeb Hosting Apache web server, you can use Apache rewrite rules to enable this conversion.
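
To make the rewriting idea concrete, here is a small Python sketch of one possible mapping between a dynamic URL and a crawler-friendly equivalent. The scheme (query parameters appended as `/key/value` path segments), the function names, and the example URL are assumptions for illustration only; they do not describe how Amazon or any particular server does it, and on Apache the reverse step would normally be handled by a rewrite rule rather than application code.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit


def to_static_url(dynamic_url):
    """Turn /script?k1=v1&k2=v2 into /script/k1/v1/k2/v2."""
    parts = urlsplit(dynamic_url)
    segments = [parts.path.rstrip("/")]
    for key, value in parse_qsl(parts.query):
        # Percent-encode so keys and values cannot contain "/" themselves.
        segments.append(quote(key, safe=""))
        segments.append(quote(value, safe=""))
    return "/".join(segments)


def to_dynamic_url(static_url, script_path):
    """Reverse the mapping, given the script path that prefixes the URL."""
    if not static_url.startswith(script_path):
        raise ValueError("URL does not start with the expected script path")
    rest = static_url[len(script_path):].strip("/")
    tokens = [unquote(t) for t in rest.split("/")] if rest else []
    # Pair up alternating key/value segments.
    pairs = list(zip(tokens[0::2], tokens[1::2]))
    return script_path + ("?" + urlencode(pairs) if pairs else "")


# Hypothetical product page URL.
dynamic = "/shop/product.cgi?cat=books&id=42"
static = to_static_url(dynamic)
restored = to_dynamic_url(static, "/shop/product.cgi")
```

The crawler only ever sees the static-looking form, while the application server receives the original query string after the reverse mapping.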
