Tuesday, 14 November 2017

Spider spotting

The effectiveness of your efforts in submitting your pages to search engines can be monitored and evaluated in two ways: spider spotting and URL checking.
Spiders from search engines that visit your site and crawl its pages leave distinctive traces in your access log. These can tell you whether a spider has visited, which pages it covered, and how often or for how long it stayed.

The best way to identify spider visits is to find out which visitors requested the file robots.txt from your site. In practice only spiders make this request, because the file tells them which parts of the site they should avoid crawling, and checking for it is the first thing a well-behaved crawler does. If you analyze your access log with a suitable tool, you can spot every visit that began with this request. You can then look at the host name of each such visitor and relate it to the major search engines: host names usually include the search engine company's name, since the host name identifies the site that hosts the spider. The agent (browser) name used by each search engine is another way to identify these visits. Get a list of host and agent names from available resources (these names tend to change often), and build your own list by searching your access logs for all occurrences of known engine, host, or agent names. Concentrate on the top engines, though you may find several smaller, lesser-known search engines visiting your site as well.
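As a sketch of this kind of log analysis, the snippet below picks out requests for robots.txt from access-log lines in the common Apache "combined" format. The sample lines and host names are made up for illustration:

```javascript
// Minimal sketch: spot spider visits by finding requests for /robots.txt
// in an access log (Apache "combined" log format assumed). The sample
// lines below are invented for illustration.
const logLines = [
  'spider1.altavista.com - - [14/Nov/2017:10:01:02 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Scooter/3.2"',
  'dialup-42.example.net - - [14/Nov/2017:10:05:17 +0000] "GET /index.html HTTP/1.1" 200 4521 "-" "Mozilla/4.0"',
  'crawler7.inktomi.com - - [14/Nov/2017:10:07:44 +0000] "GET /robots.txt HTTP/1.0" 200 120 "-" "Slurp/2.0"',
];

function spotSpiders(lines) {
  const visits = [];
  for (const line of lines) {
    // Host is the first field; the requested path and the agent string
    // sit inside the quoted sections of the line.
    const match = line.match(/^(\S+).*"(?:GET|HEAD) (\S+)[^"]*".*"([^"]*)"$/);
    if (match && match[2] === "/robots.txt") {
      visits.push({ host: match[1], agent: match[3] });
    }
  }
  return visits;
}

for (const v of spotSpiders(logLines)) {
  console.log(`${v.host} -> ${v.agent}`);
}
```

Feeding a real log through something like this gives you the raw list of spider visits; relating the hosts and agents to engines is the next step.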

Pay attention not only to the total number of visits but to the activity pattern of each recent visit, so you can judge how many pages were actually covered. This is a good way of verifying whether your submissions have worked, and whether other inducements, such as links from other sites, have paid off. It also lets you evaluate the submission, indexing, and page-ranking characteristics of your site separately.
Some examples of host names and agent names are as below:
• AltaVista: host name may contain altavista.com; the agent is often called Scooter.
• Excite: host name may contain atex or excite.com; the agent name is Architextspider.
• Inktomi: host and agent names contain inktomi.com; Slurp is often used as the agent name.
• Lycos: host name contains lycos.com; Lycos Spider is often part of the agent name.
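The patterns above can be turned into a small lookup table. This is only a sketch, and since these names change often the table should be treated as a starting point rather than a definitive list:

```javascript
// A small lookup based on the host/agent patterns listed above.
// These names change often, so treat this table as a starting point.
const enginePatterns = [
  { engine: "AltaVista", host: /altavista\.com/i,   agent: /scooter/i },
  { engine: "Excite",    host: /atex|excite\.com/i, agent: /architextspider/i },
  { engine: "Inktomi",   host: /inktomi\.com/i,     agent: /slurp/i },
  { engine: "Lycos",     host: /lycos\.com/i,       agent: /lycos spider/i },
];

// Returns the engine name if either the host or the agent matches, else null.
function identifyEngine(host, agent) {
  for (const p of enginePatterns) {
    if (p.host.test(host) || p.agent.test(agent)) return p.engine;
  }
  return null;
}

console.log(identifyEngine("spider1.altavista.com", "Scooter/3.2")); // AltaVista
console.log(identifyEngine("home.example.net", "Mozilla/4.0"));      // null
```

Matching on either host or agent catches visits where only one of the two is recognizable.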

Formulating a search engine submission budget is crucial. Aim for the best possible combination of free submission, paid submission, and paid placement programs.
How much would you like to spend on this exercise? If the budget is limited, options such as some of the paid programs, advertisements, and expensive directory listings will have to be forsaken, and attention given to getting the best results from limited but focused efforts.

The key is to strike a balance between free and paid programs that yields maximum ROI. You should submit your website to the free search engines such as Google, AltaVista, and WebCrawler; to at least one paid search engine such as Inktomi ($89 for submission of 3 URLs); and to the Yahoo Directory ($299 a year). Apart from these, you should consider submitting your website to a couple of "paid participation" or "paid placement" programs such as Google Adwords and Overture.

In your work, you may need to write Web pages that talk to servers whose interfaces you have no control over. Two common kinds of servers are relational databases, which may require their input in SQL, and search engines such as AltaVista or Yahoo!. These interfaces may not meet the needs of your users, or they may be too complicated for your target audience.

Using JavaScript and the principle of form modification, you can create simple, easy-to-use interfaces that meet your users' needs while satisfying the requirements of the server.
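The core of the idea can be sketched as a function that turns what the user typed into the more complicated query string the server actually expects. All of the parameter names here (q, kl, nbq) are hypothetical, not the interface of any particular engine:

```javascript
// Sketch of the form-modification idea: present the user with one simple
// search field, then build the query string the server actually expects
// before submitting. All parameter names here are hypothetical.
function buildServerQuery(userText) {
  const terms = userText.trim().split(/\s+/);
  const params = new URLSearchParams();
  params.set("q", terms.join(" AND ")); // force an AND search over all terms
  params.set("kl", "en");               // language, fixed for our audience
  params.set("nbq", "10");              // results per page
  return params.toString();
}

// In a page, a hidden form would be filled in from the simple field, e.g.:
//   document.searchForm.query.value = buildServerQuery(simpleField.value);
//   document.searchForm.submit();
console.log(buildServerQuery("used car prices"));
// q=used+AND+car+AND+prices&kl=en&nbq=10
```

The user sees a single box; the server still receives input in the exact shape it demands.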
