bethedon.com

Be the DON: Free web-based MMORPG

Be the DON is a free, turn-based mafia game based on the real-life workings of the mafia. You take up the role of a Mafioso and do everything it takes to dominate your city; how you want to earn your respect is up to you!


Alexa stats for bethedon.com


Site SEO for bethedon.com

Tags: H1: 0, H2: 0, H3: 0, H4: 0, H5: 0

Images: 7 images on this website; all 7 have alt attributes

Frames: 0 frames on this website

Flash: 0 Flash objects on this website

Size: 6,658 characters

Meta Description: Yes

Meta Keywords: Yes

Majestic Backlinks for bethedon.com

About bethedon.com

Domain: bethedon.com

MD5: 1dc81f23b2f14fc92574cec90fa46161

Keywords: Mafia,DON,Be,The,Online,Multiplayer,Game,Massive,MMORPG,MMORPG,RPG,Attack,Whore,Thug,Hire,Buy,Sell

Charset: UTF-8

Web server: Apache

IP Address: 199.250.205.19
robot-id: abcdatos

robot-name: ABCdatos BotLink

robot-cover-url: http://www.abcdatos.com/

robot-details-url: http://www.abcdatos.com/botlink/

robot-owner-name: ABCdatos

robot-owner-url: http://www.abcdatos.com/

robot-owner-email: botlink@abcdatos.com

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: windows

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent: BotLink

robot-noindex: no

robot-host: 217.126.39.167

robot-from: no

robot-useragent: ABCdatos BotLink/1.0.2 (test links)

robot-language: basic

robot-description: This robot is used to verify the availability of the

                   ABCdatos directory entries (http://www.abcdatos.com) by

                   checking HTTP HEAD. The robot runs twice a week. On HTTP

                   5xx error responses, or when unable to connect, it repeats

                   the verification some hours later to check whether the

                   failure was temporary.

robot-history: This robot was developed by the ABCdatos team to help

               with maintenance of the directory.

robot-environment: commercial

modified-date: Thu, 29 May 2003 01:00:00 GMT

modified-by: ABCdatos
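
The availability check described above, probing each directory entry with an HTTP HEAD request and treating 5xx responses or connection failures as possibly temporary, can be sketched in Python. The function names and the retry-policy labels are illustrative, not part of the BotLink code:

```python
import urllib.error
import urllib.request


def head_status(url, timeout=10):
    """Probe a URL with HTTP HEAD; return the status code, or None if
    the connection itself failed (DNS error, refused, timeout)."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code
    except (urllib.error.URLError, OSError):
        return None


def classify(status):
    """Decide what to do with a directory entry after a probe.

    5xx responses and connect failures are treated as possibly
    temporary, so the entry is re-checked some hours later instead of
    being marked dead immediately."""
    if status is None or status >= 500:
        return "retry-later"
    if status >= 400:
        return "dead"
    return "ok"
```

A weekly run would call `classify(head_status(url))` per entry and queue the `"retry-later"` ones for a second pass.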



robot-id:                       Acme.Spider

robot-name:                     Acme.Spider

robot-cover-url:                http://www.acme.com/java/software/Acme.Spider.html

robot-details-url:              http://www.acme.com/java/software/Acme.Spider.html

robot-owner-name:               Jef Poskanzer - ACME Laboratories

robot-owner-url:                http://www.acme.com/

robot-owner-email:              [email protected]

robot-status:                   active

robot-purpose:                  indexing maintenance statistics

robot-type:                     standalone

robot-platform:                 java

robot-availability:             source

robot-exclusion:                yes

robot-exclusion-useragent:      Due to a deficiency in Java it's not currently possible to set the User-Agent.

robot-noindex:                  no

robot-host:                     *

robot-from:                     no

robot-useragent:                Due to a deficiency in Java it's not currently possible to set the User-Agent.

robot-language:                 java

robot-description:              A Java utility class for writing your own robots.

robot-history:                  

robot-environment:              

modified-date:                  Wed, 04 Dec 1996 21:30:11 GMT

modified-by:                    Jef Poskanzer
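
The entries in this database share a simple `field: value` layout: a `robot-id` line opens a new record, and indented lines continue the previous field. A minimal parser sketch (the function name is my own, not part of any published tooling):

```python
def parse_robots_database(text):
    """Parse `field: value` records of the robots database into dicts.

    A `robot-id` field starts a new record; indented lines are folded
    into the previous field; blank lines are ignored."""
    records, current, last_key = [], None, None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and current is not None and last_key:
            # continuation line: append to the previous field's value
            current[last_key] = (current[last_key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            if key == "robot-id":
                current = {}
                records.append(current)
            if current is not None:
                current[key] = value.strip()
                last_key = key
    return records
```

Splitting on the first `:` only keeps URL values such as `robot-cover-url: http://...` intact.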



robot-id:           ahoythehomepagefinder

robot-name:         Ahoy! The Homepage Finder

robot-cover-url:    http://www.cs.washington.edu/research/ahoy/

robot-details-url:  http://www.cs.washington.edu/research/ahoy/doc/home.html

robot-owner-name:   Marc Langheinrich

robot-owner-url:    http://www.cs.washington.edu/homes/marclang

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      maintenance

robot-type:         standalone

robot-platform:     UNIX

robot-availability: none

robot-exclusion:    yes

robot-exclusion-useragent: ahoy

robot-noindex:      no

robot-host:         cs.washington.edu

robot-from:         no

robot-useragent:    'Ahoy! The Homepage Finder'

robot-language:     Perl 5 

robot-description:  Ahoy! is an ongoing research project at the

                    University of Washington for finding personal Homepages.

robot-history:      Research project at the University of Washington in 

                    1995/1996 

robot-environment:  research

modified-date:      Fri June 28 14:00:00 1996

modified-by:        Marc Langheinrich



robot-id: Alkaline

robot-name: Alkaline

robot-cover-url: http://www.vestris.com/alkaline

robot-details-url: http://www.vestris.com/alkaline

robot-owner-name: Daniel Doubrovkine

robot-owner-url: http://cuiwww.unige.ch/~doubrov5 

robot-owner-email: [email protected]

robot-status: development active

robot-purpose: indexing

robot-type: standalone     

robot-platform: unix windows95 windowsNT

robot-availability: binary      

robot-exclusion: yes

robot-exclusion-useragent: AlkalineBOT 

robot-noindex: yes

robot-host: *

robot-from: no

robot-useragent: AlkalineBOT

robot-language: c++

robot-description: Unix/NT internet/intranet search engine

robot-history: Vestris Inc. search engine designed at the University of

 Geneva 

robot-environment: commercial research 

modified-date: Thu Dec 10 14:01:13 MET 1998

modified-by: Daniel Doubrovkine 



robot-id:anthill

robot-name:Anthill

robot-cover-url:http://www.anthill.org/index.html

robot-details-url:http://www.anthill.org/index.html

robot-owner-name:Torsten Kaubisch

robot-owner-url:http://www.anthill.org/index.html

robot-owner-email:[email protected]

robot-status:development

robot-purpose:indexing

robot-type:standalone

robot-platform:independent

robot-availability:not yet

robot-exclusion:no (soon in V1.2)

robot-exclusion-useragent:anthill

robot-noindex:no

robot-host:anywhere

robot-from:no

robot-useragent:AnthillV1.1

robot-language:java

robot-description:Anthill is used to gather price information automatically from online stores. Support for international versions.

robot-history:This is a research project at the University of Mannheim in Germany, professorship of Prof. Martin Schader, assistant Dr. Stefan Kuhlins

robot-environment:research

modified-date:Thu, 6 Dec 2001 01:55:00 GMT

modified-by:Torsten Kaubisch



robot-id: appie

robot-name: Walhello appie

robot-cover-url: www.walhello.com

robot-details-url: www.walhello.com/aboutgl.html

robot-owner-name: Aimo Pieterse

robot-owner-url: www.walhello.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windows98

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: appie

robot-noindex: yes

robot-host: 213.10.10.116, 213.10.10.117, 213.10.10.118

robot-from: yes

robot-useragent: appie/1.1

robot-language: Visual C++

robot-description: The appie-spider is used to collect and index web pages for

 the Walhello search engine

robot-history: The spider was built in march/april 2000

robot-environment: commercial

modified-date: Thu, 20 Jul 2000 22:38:00 GMT

modified-by: Aimo Pieterse
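
Records like the one above pair `robot-exclusion: yes` with a `robot-exclusion-useragent` token (here `appie`); that token is what a site matches in its robots.txt to admit or exclude the crawler. A sketch using Python's standard `urllib.robotparser`, with a made-up robots.txt:

```python
from urllib import robotparser

# Hypothetical robots.txt: excludes the "appie" token from /private/
# while allowing every other robot everywhere.
rules = robotparser.RobotFileParser()
rules.parse([
    "User-agent: appie",
    "Disallow: /private/",
    "",
    "User-agent: *",
    "Disallow:",
])
rules.modified()  # mark the rules as loaded so can_fetch() trusts them


def allowed(agent, url):
    """True if the parsed robots.txt permits `agent` to fetch `url`."""
    return rules.can_fetch(agent, url)
```

A well-behaved crawler calls `allowed()` before every fetch and skips disallowed URLs.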



robot-id:           arachnophilia

robot-name:         Arachnophilia

robot-cover-url:    

robot-details-url:

robot-owner-name:   Vince Taluskie

robot-owner-url:    http://www.ph.utexas.edu/people/vince.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         halsoft.com

robot-from:         

robot-useragent:    Arachnophilia

robot-language:     

robot-description:  The purpose (undertaken by HaL Software) of this run was to

	collect approximately 10k html documents for testing

	automatic abstract generation

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: arale

robot-name: Arale

robot-cover-url: http://web.tiscali.it/_flat

robot-details-url: http://web.tiscali.it/_flat

robot-owner-name: Flavio Tordini

robot-owner-url: http://web.tiscali.it/_flat

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix, windows, windows95, windowsNT, os2, mac, linux

robot-availability: source, binary

robot-exclusion: no

robot-exclusion-useragent: arale

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: no

robot-language: java

robot-description: A Java multithreaded web spider. Downloads entire web sites or specific resources from the web. Renders dynamic sites to static pages.

robot-history: This is brand new.

robot-environment: hobby

modified-date: Thu, 09 Jan 2001 17:28:52 GMT

modified-by: Flavio Tordini



robot-id:           araneo

robot-name:         Araneo

robot-cover-url:    http://esperantisto.net

robot-details-url:  http://esperantisto.net/araneo/

robot-owner-name:   Arto Sarle

robot-owner-url:    http://esperantisto.net

robot-owner-email:  [email protected]

robot-status:       development

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:     Linux

robot-availability: none

robot-exclusion:    yes

robot-exclusion-useragent: araneo

robot-noindex:      yes

robot-nofollow:     yes

robot-host:         *.esperantisto.net

robot-from:         yes

robot-useragent:    Araneo/0.7 ([email protected]; http://esperantisto.net)

robot-language:     Python, Java

robot-description:  Araneo is a web robot developed for crawling and indexing web pages written in the international language Esperanto.  The database will be used to build a web search engine and auxiliary services to be published at esperantisto.net.

robot-history:      (The name Araneo means "spider" in Esperanto.)

robot-environment:  hobby, research

modified-date:      Fri, 16 Nov 2001 08:30:00 GMT

modified-by:        Arto Sarle



robot-id: araybot

robot-name: AraybOt

robot-cover-url: http://www.araykoo.com/

robot-details-url: http://www.araykoo.com/araybot.html

robot-owner-name: Guti

robot-owner-url: http://www.araykoo.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing maintenance

robot-type: standalone

robot-platform: Linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: AraybOt

robot-noindex: yes

robot-host: *

robot-from: no

robot-useragent: AraybOt/1.0 (+http://www.araykoo.com/araybot.html)

robot-language: perl5

robot-description: AraybOt is the agent software of AraykOO!, which crawls

 web sites listed in http://dmoz.org/Adult/ in order to build an adult search

 engine.

robot-history: 

robot-environment: service

modified-date: Sat, 19 Jun 2004 20:25:00 GMT+1

modified-by: Guti



robot-id:           architext

robot-name:         ArchitextSpider

robot-cover-url:    http://www.excite.com/

robot-details-url:

robot-owner-name:   Architext Software

robot-owner-url:    http://www.atext.com/spider.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *.atext.com

robot-from:         yes

robot-useragent:    ArchitextSpider

robot-language:     perl 5 and c

robot-description:  Its purpose is to generate a Resource Discovery database,

	and to generate statistics. The ArchitextSpider collects

	information for the Excite and WebCrawler search engines.

robot-history:      

robot-environment:

modified-date:      Tue Oct  3 01:10:26 1995

modified-by:



robot-id:           aretha

robot-name:         Aretha

robot-cover-url:    

robot-details-url:

robot-owner-name:   Dave Weiner

robot-owner-url:    http://www.hotwired.com/Staff/userland/ 

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      

robot-type:         

robot-platform:     Macintosh

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  A crude robot built on top of Netscape and Userland

	Frontier, a scripting system for Macs

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: ariadne

robot-name: ARIADNE

robot-cover-url: (forthcoming)

robot-details-url: (forthcoming)

robot-owner-name: Mr. Matthias H. Gross

robot-owner-url: http://www.lrz-muenchen.de/~gross/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: statistics, development of focused crawling strategies

robot-type: standalone

robot-platform: java

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: ariadne

robot-noindex: no

robot-host: dbs.informatik.uni-muenchen.de

robot-from: no

robot-useragent: Due to a deficiency in Java it's not currently possible

 to set the User-Agent.

robot-language: java

robot-description: The ARIADNE robot is a prototype of a environment for

 testing focused crawling strategies.

robot-history: This robot is part of a research project at the

 University of Munich (LMU), started in 2000.

robot-environment: research

modified-date: Mo, 13 Mar 2000 14:00:00 GMT

modified-by: Mr. Matthias H. Gross



robot-id:arks

robot-name:arks

robot-cover-url:http://www.dpsindia.com

robot-details-url:http://www.dpsindia.com

robot-owner-name:Aniruddha Choudhury

robot-owner-url:

robot-owner-email:[email protected]

robot-status:development

robot-purpose:indexing

robot-type:standalone

robot-platform:PLATFORM INDEPENDENT

robot-availability:data

robot-exclusion:yes

robot-exclusion-useragent:arks

robot-noindex:no

robot-host:dpsindia.com

robot-from:no

robot-useragent:arks/1.0

robot-language:Java 1.2

robot-description:The Arks robot is used to build the database

           for the dpsindia/lawvistas.com search service.

           The robot runs weekly, and visits sites in a random order.

robot-history:finds its roots in a software development project for a portal

robot-environment:commercial

modified-date:6 November 2000

modified-by:Aniruddha Choudhury



robot-id:           aspider

robot-name:         ASpider (Associative Spider)

robot-cover-url:    

robot-details-url:

robot-owner-name:   Fred Johansen

robot-owner-url:    http://www.pvv.ntnu.no/~fredj/

robot-owner-email:  [email protected]

robot-status:       retired

robot-purpose:      indexing

robot-type:         

robot-platform:     unix

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         nova.pvv.unit.no

robot-from:         yes

robot-useragent:    ASpider/0.09

robot-language:     perl4

robot-description:  ASpider is a CGI script that searches the web for keywords given by the user through a form.

robot-history:      

robot-environment:  hobby

modified-date:      

modified-by:



robot-id: atn.txt

robot-name: ATN Worldwide

robot-details-url:

robot-cover-url:

robot-owner-name: All That Net

robot-owner-url: http://www.allthatnet.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type:

robot-platform:

robot-availability:

robot-exclusion: yes

robot-exclusion-useragent: ATN_Worldwide

robot-noindex:

robot-nofollow:

robot-host: www.allthatnet.com

robot-from:

robot-useragent: ATN_Worldwide

robot-language:

robot-description: The ATN robot is used to build the database for the

 AllThatNet search service operated by All That Net.  The robot runs weekly,

 and visits sites in a random order.

robot-history:

robot-environment:

modified-date: July 09, 2000 17:43 GMT



robot-id: atomz

robot-name: Atomz.com Search Robot

robot-cover-url: http://www.atomz.com/help/

robot-details-url: http://www.atomz.com/

robot-owner-name: Mike Thompson

robot-owner-url: http://www.atomz.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: service

robot-exclusion: yes

robot-exclusion-useragent: Atomz

robot-noindex: yes

robot-host: www.atomz.com

robot-from: no

robot-useragent: Atomz/1.0

robot-language: c

robot-description: Robot used for web site search service.

robot-history: Developed for Atomz.com, launched in 1999.

robot-environment: service

modified-date: Tue Jul 13 03:50:06 GMT 1999

modified-by: Mike Thompson



robot-id: auresys

robot-name: AURESYS

robot-cover-url: http://crrm.univ-mrs.fr

robot-details-url: http://crrm.univ-mrs.fr      

robot-owner-name: Mannina Bruno 

robot-owner-url: ftp://crrm.univ-mrs.fr/pub/CVetud/Etudiants/Mannina/CVbruno.htm        

robot-owner-email: [email protected]     

robot-status: robot actively in use

robot-purpose: indexing,statistics

robot-type: Standalone

robot-platform: Aix, Unix

robot-availability: Protected by Password

robot-exclusion: Yes

robot-exclusion-useragent:  

robot-noindex: no

robot-host: crrm.univ-mrs.fr, 192.134.99.192

robot-from: Yes

robot-useragent: AURESYS/1.0

robot-language: Perl 5.001m

robot-description: AURESYS is used to build a personal database for

 someone searching for information. The database is structured so it

 can be analysed. AURESYS can find new servers by incrementing IP

 addresses. It generates statistics.

robot-history: This robot finds its roots in a research project at the 

 University of Marseille in 1995-1996

robot-environment: research

modified-date: Mon, 1 Jul 1996 14:30:00 GMT 

modified-by: Mannina Bruno
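
The "new server by IP incremental" discovery mentioned above amounts to stepping through a range of IP addresses and probing each one for a web server. A sketch of the enumeration step using Python's `ipaddress` module (the network range is illustrative; in a real run each candidate would get a connection attempt on port 80):

```python
import ipaddress


def candidate_hosts(network):
    """Enumerate the host addresses in a network, the way a crawler
    doing incremental-IP discovery would generate probe targets.

    hosts() skips the network and broadcast addresses, so only
    plausible server addresses are returned."""
    return [str(ip) for ip in ipaddress.ip_network(network).hosts()]
```

Each returned address would then be probed, and responders fed into the crawl queue.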



robot-id:           backrub

robot-name:         BackRub

robot-cover-url:

robot-details-url:

robot-owner-name:   Larry Page

robot-owner-url:    http://backrub.stanford.edu/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         *.stanford.edu

robot-from:         yes

robot-useragent:    BackRub/*.*

robot-language:     Java.

robot-description:

robot-history:

robot-environment:

modified-date:      Wed Feb 21 02:57:42 1996.

modified-by:



robot-id: bayspider

robot-name: BaySpider

robot-cover-url: http://www.baytsp.com/

robot-details-url: http://www.baytsp.com/

robot-owner-name: BayTSP.com,Inc

robot-owner-url:

robot-owner-email: [email protected]

robot-status: Active

robot-purpose: Copyright Infringement Tracking

robot-type: Stand Alone

robot-platform: NT

robot-availability: 24/7

robot-exclusion:

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:

robot-useragent: BaySpider

robot-language: English

robot-description:

robot-history:

robot-environment:

modified-date: 1/15/2001

modified-by: [email protected]



robot-id:                       bbot

robot-name:                     BBot

robot-cover-url:                http://www.otthon.net/search

robot-details-url:              http://www.otthon.net/search/bbot

robot-owner-name:               Istvan Fulop

robot-owner-url:                http://www.otthon.net

robot-owner-email:              poluf1 at yahoo dot co dot uk

robot-status:                   development

robot-purpose:                  indexing, maintenance

robot-type:                     standalone

robot-platform:                 windows

robot-availability:             none

robot-exclusion:                yes

robot-exclusion-useragent:      bbot

robot-noindex:                  yes

robot-nofollow:			yes

robot-host:                     *.netcologne.de

robot-from:                     yes

robot-useragent:                bbot/0.100

robot-language:                 perl

robot-description:              Mainly intended for site level search, sometimes set loose.

robot-history:                  Started project in 11/2000. Called BBot since 24/04/2003.

robot-environment:              hobby

modified-date:                  Sun, 04 May 2003 10:15:00 GMT

modified-by:                    Istvan Fulop



robot-id: bigbrother

robot-name: Big Brother

robot-cover-url: http://pauillac.inria.fr/~fpottier/mac-soft.html.en

robot-details-url:

robot-owner-name: Francois Pottier

robot-owner-url: http://pauillac.inria.fr/~fpottier/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: mac

robot-availability: binary

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: no

robot-host: *

robot-from: not as of 1.0

robot-useragent: Big Brother

robot-language: c++

robot-description: Macintosh-hosted link validation tool.

robot-history:

robot-environment: shareware

modified-date: Thu Sep 19 18:01:46 MET DST 1996

modified-by: Francois Pottier



robot-id: bjaaland

robot-name: Bjaaland

robot-cover-url: http://www.textuality.com

robot-details-url: http://www.textuality.com

robot-owner-name: Tim Bray

robot-owner-url: http://www.textuality.com

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Bjaaland

robot-noindex: no

robot-host: barry.bitmovers.net

robot-from: no

robot-useragent: Bjaaland/0.5

robot-language: perl5

robot-description: Crawls sites listed in the ODP (see http://dmoz.org)

robot-history: None, yet

robot-environment: service

modified-date: Monday, 19 July 1999, 13:46:00 PDT

modified-by: [email protected]



robot-id:           blackwidow

robot-name:         BlackWidow

robot-cover-url:    http://140.190.65.12/~khooghee/index.html

robot-details-url:

robot-owner-name:   Kevin Hoogheem

robot-owner-url:

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:         140.190.65.*

robot-from:         yes

robot-useragent:    BlackWidow

robot-language:     C, C++.

robot-description:  Started as a research project and now is used to find links

	for a random link generator.  Also is used to research the

	growth of specific sites.

robot-history:

robot-environment:

modified-date:      Fri Feb  9 00:11:22 1996.

modified-by:



robot-id: blindekuh

robot-name: Die Blinde Kuh

robot-cover-url: http://www.blinde-kuh.de/

robot-details-url: http://www.blinde-kuh.de/robot.html (german language)

robot-owner-name: Stefan R. Mueller

robot-owner-url: http://www.rrz.uni-hamburg.de/philsem/stefan_mueller/

robot-owner-email:[email protected]

robot-status: development

robot-purpose: indexing

robot-type: browser

robot-platform: unix

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: no

robot-host: minerva.sozialwiss.uni-hamburg.de

robot-from: yes

robot-useragent: Die Blinde Kuh

robot-language: perl5

robot-description: The robot is used for indexing and checking the

 registered URLs in the German-language search engine for kids.

 It is a non-commercial one-woman project of Birgit Bachmann,

 living in Hamburg, Germany.

robot-history: The robot was developed by Stefan R. Mueller

 to help with the manual checking of registered links.

robot-environment: hobby

modified-date: Mon Jul 22 1998

modified-by: Stefan R. Mueller



robot-id:Bloodhound

robot-name:Bloodhound

robot-cover-url:http://web.ukonline.co.uk/genius/bloodhound.htm

robot-details-url:http://web.ukonline.co.uk/genius/bloodhound.htm

robot-owner-name:Dean Smart

robot-owner-url:http://web.ukonline.co.uk/genius/bloodhound.htm

robot-owner-email:[email protected]

robot-status:active

robot-purpose:Web Site Download

robot-type:standalone


robot-platform:Windows95, WindowsNT, Windows98, Windows2000

robot-availability:Executable

robot-exclusion:No

robot-exclusion-useragent:Ukonline

robot-noindex:No

robot-host:*

robot-from:No

robot-useragent:None

robot-language:Perl5

robot-description:Bloodhound will download a whole web site, depending on the

 number of links to follow specified by the user.

robot-history:First version was released on 1 July 2000

robot-environment:Commercial

modified-date:1 July 2000

modified-by:Dean Smart
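
Bloodhound's depth-limited site download can be sketched as a breadth-first crawl that stops expanding links beyond a user-chosen hop count. The `fetch` callback stands in for the HTTP download; names are illustrative, not Bloodhound's actual code:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect href targets from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_depth):
    """Breadth-first download up to max_depth link hops from start_url.

    fetch(url) returns the page's HTML (or None on failure); the result
    is a {url: html} dict of every page reached within the limit."""
    seen, frontier, pages = {start_url}, [(start_url, 0)], {}
    while frontier:
        url, depth = frontier.pop(0)
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        if depth >= max_depth:
            continue  # past the hop limit: keep the page, follow no links
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)
            if target not in seen:
                seen.add(target)
                frontier.append((target, depth + 1))
    return pages
```

With `max_depth=0` only the start page is saved; each increment adds one more ring of linked pages.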



robot-id: borg-bot

robot-name: Borg-Bot

robot-cover-url: 

robot-details-url: http://www.skunkfarm.com/borgbot.htm

robot-owner-name: James Bragg

robot-owner-url: http://www.skunkfarm.com

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing statistics

robot-type: standalone

robot-platform: Linux Windows2000

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: borg-bot/0.9

robot-noindex: yes

robot-host: 24.11.13.173

robot-from: yes

robot-useragent: borg-bot/0.9

robot-language: python

robot-description: Developmental crawler to feed a search engine

robot-history:  

robot-environment: research service 

modified-date: Sat, 20 Oct 2001 04:00:00 GMT

modified-by: James Bragg



robot-id: boxseabot

robot-name: BoxSeaBot

robot-cover-url: http://www.boxsea.com/crawler

robot-details-url: http://www.boxsea.com/crawler

robot-owner-name: BoxSea Search Engine

robot-owner-url: http://www.boxsea.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: linux

robot-availability:

robot-exclusion: yes

robot-exclusion-useragent: boxseabot

robot-noindex: 

robot-host: 

robot-from:

robot-useragent: BoxSeaBot/0.5 (http://boxsea.com/crawler)

robot-language: java

robot-description: This robot is used to find pages

 for building the BoxSea search engine indices.

robot-history: The robot code uses Nutch.  Earlier

 experimental crawls were done under various user agent

 names such as NutchCVS(boxsea)

robot-environment:

modified-date: Fri, 23 Jul 2004 11:58:00 PST

modified-by: BoxSeaBot



robot-id: brightnet

robot-name: bright.net caching robot

robot-cover-url:

robot-details-url:

robot-owner-name:

robot-owner-url:

robot-owner-email:

robot-status: active 

robot-purpose: caching 

robot-type:

robot-platform: 

robot-availability: none

robot-exclusion: no

robot-noindex:

robot-host: 209.143.1.46

robot-from: no

robot-useragent: Mozilla/3.01 (compatible;)

robot-language:

robot-description:

robot-history:

robot-environment:

modified-date: Fri Nov 13 14:08:01 EST 1998

modified-by: brian d foy 



robot-id: bspider

robot-name: BSpider

robot-cover-url: not yet

robot-details-url: not yet

robot-owner-name: Yo Okumura

robot-owner-url: not yet

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: bspider

robot-noindex: yes

robot-host: 210.159.73.34, 210.159.73.35

robot-from: yes

robot-useragent: BSpider/1.0 libwww-perl/0.40

robot-language: perl

robot-description: BSpider crawls within the Japanese domain for indexing.

robot-history: Starts Apr 1997 in a research project at Fuji Xerox Corp.

 Research Lab.

robot-environment: research

modified-date: Mon, 21 Apr 1997 18:00:00 JST

modified-by: Yo Okumura



robot-id:           cactvschemistryspider

robot-name:         CACTVS Chemistry Spider

robot-cover-url:    http://schiele.organik.uni-erlangen.de/cactvs/spider.html

robot-details-url:

robot-owner-name:   W. D. Ihlenfeldt

robot-owner-url:    http://schiele.organik.uni-erlangen.de/cactvs/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing.

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         utamaro.organik.uni-erlangen.de

robot-from:         no

robot-useragent:    CACTVS Chemistry Spider

robot-language:     TCL, C

robot-description:  Locates chemical structures in Chemical MIME formats on WWW

	and FTP servers and downloads them into database searchable

	with structure queries (substructure, fullstructure,

	formula, properties etc.)

robot-history:

robot-environment:

modified-date:      Sat Mar 30 00:55:40 1996.

modified-by:



robot-id: calif

robot-name: Calif

robot-details-url: http://www.tnps.dp.ua/calif/details.html

robot-cover-url: http://www.tnps.dp.ua/calif/

robot-owner-name: Alexander Kosarev

robot-owner-url: http://www.tnps.dp.ua/~dark/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: calif

robot-noindex: yes

robot-host: cobra.tnps.dp.ua

robot-from: yes

robot-useragent: Calif/0.6 ([email protected]; http://www.tnps.dp.ua)

robot-language: c++

robot-description: Used to build searchable index

robot-history: In development stage

robot-environment: research

modified-date: Sun, 6 Jun 1999 13:25:33 GMT



robot-id: cassandra

robot-name: Cassandra

robot-cover-url: http://post.mipt.rssi.ru/~billy/search/

robot-details-url: http://post.mipt.rssi.ru/~billy/search/

robot-owner-name: Mr. Oleg Bilibin

robot-owner-url:        http://post.mipt.rssi.ru/~billy/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: crossplatform

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent:

robot-noindex: no

robot-host: www.aha.ru

robot-from: no

robot-useragent:

robot-language: java

robot-description: The Cassandra search robot is used to create and maintain an indexed database for a widespread information retrieval system

robot-history: Master of Science degree project at Moscow Institute of Physics and Technology

robot-environment: research

modified-date: Wed, 3 Jun 1998 12:00:00 GMT



robot-id: cgireader

robot-name: Digimarc Marcspider/CGI

robot-cover-url: http://www.digimarc.com/prod_fam.html

robot-details-url: http://www.digimarc.com/prod_fam.html

robot-owner-name: Digimarc Corporation

robot-owner-url: http://www.digimarc.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent:

robot-noindex:

robot-host: 206.102.3.*

robot-from:

robot-useragent: Digimarc CGIReader/1.0

robot-language: c++

robot-description: Similar to Digimarc Marcspider, Marcspider/CGI examines

    image files for watermarks, but is more focused on CGI URLs.

    In order not to waste internet bandwidth with yet another crawler,

    we have contracted with one of the major crawlers/search engines

    to provide us with a list of specific CGI URLs of interest to us.

    If a URL is to a page of interest (via CGI), then we access the

    page to get the image URLs from it, but we do not crawl to

    any other pages.

robot-history: First operation in December 1997

robot-environment: service

modified-date: Fri, 5 Dec 1997 12:00:00 GMT

modified-by: Dan Ramos



robot-id:           checkbot

robot-name:         Checkbot

robot-cover-url:    http://www.xs4all.nl/~graaff/checkbot/

robot-details-url:

robot-owner-name:   Hans de Graaff

robot-owner-url:    http://www.xs4all.nl/~graaff/checkbot/

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      maintenance

robot-type:         standalone

robot-platform:     unix,WindowsNT

robot-availability: source

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         no

robot-useragent:    Checkbot/x.xx LWP/5.x

robot-language:     perl 5

robot-description:  Checkbot checks links in a

	given set of pages on one or more servers. It reports links

	which returned an error code.

robot-history:      

robot-environment:  hobby

modified-date:      Tue Jun 25 07:44:00 1996

modified-by:        Hans de Graaff
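
Checkbot-style link validation reduces to issuing a request per link and reporting the ones that error out. A minimal sketch with the Python standard library (Checkbot itself is Perl 5 over LWP; this only illustrates the idea):

```python
from urllib.request import Request, urlopen
from urllib.error import HTTPError, URLError

def check_link(url, timeout=10):
    """Return the HTTP status code for url, or None if the host
    could not be reached at all (a link checker reports these too)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:   # 4xx/5xx responses still carry a status code
        return err.code
    except URLError:           # DNS failure, refused connection, timeout, ...
        return None

def is_broken(status):
    """A link goes into the report when it errored out or returned >= 400."""
    return status is None or status >= 400
```

Using HEAD rather than GET keeps the bandwidth cost per link low, at the price of occasionally mis-reporting servers that mishandle HEAD.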



robot-id: christcrawler

robot-name: ChristCrawler.com

robot-cover-url: http://www.christcrawler.com/search.cfm

robot-details-url: http://www.christcrawler.com/index.cfm

robot-owner-name: Jeremy DeYoung

robot-owner-url: http://www.christcentral.com/aboutus/index.cfm

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Windows NT 4.0 SP5

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: christcrawler

robot-noindex: yes

robot-host: 64.51.218.*, 64.51.219.*, 12.107.236.*, 12.107.237.*

robot-from: yes

robot-useragent: Mozilla/4.0 (compatible; ChristCrawler.com, [email protected])

robot-language: Cold Fusion 4.5

robot-description: A Christian internet spider that searches web sites to find Christian-related material

robot-history: Developed because of the growing need for a more godly influence on the Internet.

robot-environment: service

modified-date: Fri, 27 Jun 2001 00:53:12 CST

modified-by: Jeremy DeYoung



robot-id:           churl

robot-name:         churl

robot-cover-url:    http://www-personal.engin.umich.edu/~yunke/scripts/churl/

robot-details-url:

robot-owner-name:   Justin Yunke

robot-owner-url:    http://www-personal.engin.umich.edu/~yunke/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  A URL checking robot, which stays within one step of the

	local server

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: cienciaficcion

robot-name: cIeNcIaFiCcIoN.nEt

robot-cover-url: http://www.cienciaficcion.net/

robot-details-url: http://www.cienciaficcion.net/

robot-owner-name: David Fernández

robot-owner-url: http://www.cyberdark.net/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: linux

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: yes

robot-host: epervier.cqhost.net

robot-from: no

robot-useragent: cIeNcIaFiCcIoN.nEt Spider (http://www.cienciaficcion.net)

robot-language: php,perl

robot-description: Robot in charge of indexing pages for www.cienciaficcion.net

robot-history: Alcorcón (Madrid) - Europe 2000/2001

robot-environment: hobby

modified-date: Sat, 18 Aug 2001 00:38:52 GMT

modified-by: David Fernández



robot-id: cmc

robot-name: CMC/0.01

robot-details-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=robot

robot-cover-url: http://www2.next.ne.jp/music/

robot-owner-name: Shinobu Kubota.

robot-owner-url: http://www2.next.ne.jp/cgi-bin/music/help.cgi?phase=profile

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: CMC/0.01

robot-noindex: no

robot-host: haruna.next.ne.jp, 203.183.218.4

robot-from: yes


robot-useragent: CMC/0.01

robot-language: perl5

robot-description: This CMC/0.01 robot collects information on

                   pages that were registered with the music

                   specialty search service.

robot-history: This CMC/0.01 robot was made for the computer

               music center on November 4, 1997.

robot-environment: hobby

modified-date: Sat, 23 May 1998 17:22:00 GMT



robot-id: Collective

robot-name: Collective

robot-cover-url: http://web.ukonline.co.uk/genius/collective.htm

robot-details-url: http://web.ukonline.co.uk/genius/collective.htm

robot-owner-name: Dean Smart

robot-owner-url: http://web.ukonline.co.uk/genius/collective.htm

robot-owner-email: [email protected]

robot-status: development

robot-purpose: Collective is a highly configurable program designed to interrogate

 online search engines and online databases. It ignores web pages that lie

 about their content, and dead URLs. It can be very strict: it searches each

 web page it finds for your search terms to ensure those terms are present.

 Any positive URLs are added to an HTML file for you to view at any time,

 even before the program has finished. Collective can wander the web for

 days if required.

robot-type: standalone

robot-platform: Windows95, WindowsNT, Windows98, Windows2000

robot-availability: executable

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: LWP

robot-language: Perl5 (with Visual Basic front-end)

robot-description: Collective is a clever Internet search engine; all found

 URLs are guaranteed to contain your search terms.

robot-history: Development started on August 3, 2000

robot-environment: commercial

modified-date: August 3, 2000

modified-by: Dean Smart



robot-id: combine

robot-name: Combine System

robot-cover-url: http://www.ub2.lu.se/~tsao/combine.ps

robot-details-url: http://www.ub2.lu.se/~tsao/combine.ps

robot-owner-name: Yong Cao

robot-owner-url: http://www.ub2.lu.se/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: combine

robot-noindex: no

robot-host: *.ub2.lu.se

robot-from: yes

robot-useragent: combine/0.0

robot-language: c, perl5

robot-description: An open, distributed, and efficient harvester.

robot-history: A complete re-design of the NWI robot (w3index) for the DESIRE project.

robot-environment: research

modified-date: Tue, 04 Mar 1997 16:11:40 GMT

modified-by: Yong Cao



robot-id: confuzzledbot

robot-name: ConfuzzledBot

robot-cover-url: http://www.blue.lu/

robot-details-url: http://bot.confuzzled.lu/

robot-owner-name: Britz Thibaut

robot-owner-url: http://www.confuzzled.lu/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: Linux,Freebsd

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: confuzzledbot

robot-noindex: yes

robot-nofollow: yes

robot-host: *.ion.lu

robot-from: no

robot-useragent: Confuzzledbot/X.X (+http://www.confuzzled.lu/bot/)

robot-language: perl5

robot-description: The robot is used to build a searchable database

 for Luxembourgish sites. It only indexes .lu domains and Luxembourgish

 sites added to the directory.

robot-history: Developed 2000-2002. Only minor changes recently 

robot-environment: hobby

modified-date: Tue, 11 May 2004 17:45:00 CET

modified-by: Britz Thibaut



robot-id: coolbot

robot-name: CoolBot

robot-cover-url: www.suchmaschine21.de

robot-details-url: www.suchmaschine21.de

robot-owner-name: Stefan Fischerlaender

robot-owner-url: www.suchmaschine21.de

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: CoolBot

robot-noindex: yes

robot-host: www.suchmaschine21.de

robot-from: no

robot-useragent: CoolBot

robot-language: perl5

robot-description: The CoolBot robot is used to build and maintain the

 directory of the German search engine Suchmaschine21.

robot-history: none so far

robot-environment: service

modified-date: Wed, 21 Jan 2001 12:16:00 GMT

modified-by: Stefan Fischerlaender



robot-id:           core

robot-name:         Web Core / Roots

robot-cover-url:    http://www.di.uminho.pt/wc

robot-details-url:

robot-owner-name:   Jorge Portugal Andrade

robot-owner-url:    http://www.di.uminho.pt/~cbm

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, maintenance

robot-type:

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         shiva.di.uminho.pt, from www.di.uminho.pt

robot-from:         no

robot-useragent:    root/0.1

robot-language:     perl

robot-description:  Parallel robot developed at Minho University in Portugal to

	catalog relations among URLs and to support a special

	navigation aid.

robot-history:      First versions since October 1995.

robot-environment:

modified-date:      Wed Jan 10 23:19:08 1996.

modified-by:



robot-id: cosmos

robot-name: XYLEME Robot

robot-cover-url: http://xyleme.com/

robot-details-url:

robot-owner-name: Mihai Preda

robot-owner-url: http://www.mihaipreda.com/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: cosmos

robot-noindex: no

robot-nofollow: no

robot-host:

robot-from: yes

robot-useragent: cosmos/0.3

robot-language: c++

robot-description: index XML, follow HTML

robot-history:

robot-environment: service

modified-date: Fri, 24 Nov 2000 00:00:00 GMT

modified-by: Mihai Preda



robot-id: cruiser

robot-name: Internet Cruiser Robot

robot-cover-url: http://www.krstarica.com/

robot-details-url: http://www.krstarica.com/eng/url/

robot-owner-name: Internet Cruiser

robot-owner-url: http://www.krstarica.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Internet Cruiser Robot

robot-noindex: yes

robot-host: *.krstarica.com

robot-from: no

robot-useragent: Internet Cruiser Robot/2.1

robot-language: c++

robot-description: Internet Cruiser Robot is Internet Cruiser's prime index

 agent.

robot-history:

robot-environment: service

modified-date: Fri, 17 Jan 2001 12:00:00 GMT

modified-by: [email protected]



robot-id: cusco

robot-name: Cusco

robot-cover-url: http://www.cusco.pt/

robot-details-url: http://www.cusco.pt/

robot-owner-name: Filipe Costa Clerigo

robot-owner-url: http://www.viatecla.pt/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: any

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: cusco

robot-noindex: yes

robot-host: *.cusco.pt, *.viatecla.pt

robot-from: yes

robot-useragent: Cusco/3.2

robot-language: Java

robot-description: The Cusco robot is part of the CUCE indexing system. It

 gathers information from several sources: HTTP, databases or the filesystem.

 At this moment, its universe is the .pt domain, and the information it

 gathers is available at the Portuguese search engine Cusco, http://www.cusco.pt/.

robot-history: The Cusco search engine started in the company ViaTecla as a

 project to demonstrate our development capabilities and to fill the need for

 a Portuguese-specific search engine. Now we are developing new

 functionality that cannot be found in any other on-line search engine.

robot-environment: service, research

modified-date: Mon, 21 Jun 1999 14:00:00 GMT

modified-by: Filipe Costa Clerigo



robot-id: cyberspyder

robot-name: CyberSpyder Link Test

robot-cover-url: http://www.cyberspyder.com/cslnkts1.html

robot-details-url: http://www.cyberspyder.com/cslnkts1.html

robot-owner-name: Tom Aman

robot-owner-url: http://www.cyberspyder.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: link validation, some html validation

robot-type: standalone

robot-platform: windows 3.1x, windows95, windowsNT

robot-availability: binary

robot-exclusion: user configurable

robot-exclusion-useragent: cyberspyder

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: CyberSpyder/2.1

robot-language: Microsoft Visual Basic 4.0

robot-description: CyberSpyder Link Test is intended to be used as a site

 management tool to validate that HTTP links on a page are functional and to

 produce various analysis reports to assist in managing a site.

robot-history: The original robot was created to fill a widely seen need

 for an easy-to-use link-checking program.

robot-environment: commercial

modified-date: Tue, 31 Mar 1998 01:02:00 GMT

modified-by: Tom Aman



robot-id: cydralspider

robot-name: CydralSpider

robot-cover-url: http://www.cydral.com/

robot-details-url: http://en.cydral.com/help.html

robot-owner-name: Cydral

robot-owner-url: http://www.cydral.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: gather Web content for image search engine service

robot-type: standalone

robot-platform: unix; windows

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: cydralspider

robot-noindex: yes

robot-host: *.cydral.com

robot-from: yes

robot-useragent: CydralSpider/X.X (Cydral Web Image Search;

 http://www.cydral.com/)

robot-language: c++

robot-description: Advanced image spider for www.cydral.com

robot-history: Developed in 2003, the robot uses new methods to discover Web

 sites and index images

robot-environment: commercial

modified-date: Tue, 17 Jun 2004, 11:50:30 GMT

modified-by: [email protected]



robot-id: desertrealm

robot-name: Desert Realm Spider

robot-cover-url: http://www.desertrealm.com

robot-details-url: http://spider.desertrealm.com

robot-owner-name: Brian B.

robot-owner-url: http://www.desertrealm.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: cross platform

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: desertrealm, desert realm

robot-noindex: yes

robot-nofollow: yes

robot-host: *

robot-from: no

robot-useragent: DesertRealm.com; 0.2; [J];

robot-language: java 1.3, java 1.4

robot-description: The spider indexes fantasy and science fiction sites by

 using a customizable keyword algorithm. Only home pages are indexed, but all

 pages are looked at for links. Pages are visited randomly to limit impact on

 any one webserver.

robot-history: The spider was originally created to learn more about how

 search engines work.

robot-environment: hobby

modified-date: Fri, 19 Sep 2003 08:57:52 GMT

modified-by: Brian B.
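
Visiting pages "randomly to limit impact on any one webserver", as the Desert Realm entry describes, amounts to shuffling the crawl frontier instead of draining it in discovery order, so consecutive requests rarely hit the same host. A sketch of that policy (not the spider's actual Java code):

```python
import random

def drain_frontier(urls, seed=None):
    """Yield queued URLs in random order. Discovery order tends to
    cluster URLs from the same site together; shuffling spreads the
    load across hosts over time."""
    rng = random.Random(seed)  # seedable for reproducible test runs
    frontier = list(urls)
    rng.shuffle(frontier)
    yield from frontier
```

Larger crawlers usually go further and enforce an explicit per-host delay, but random ordering alone already avoids hammering a single server with a burst of requests.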



robot-id:           deweb

robot-name:         DeWeb(c) Katalog/Index

robot-cover-url:    http://deweb.orbit.de/

robot-details-url:

robot-owner-name:   Marc Mielke

robot-owner-url:    http://www.orbit.de/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing, mirroring, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         deweb.orbit.de

robot-from:         yes

robot-useragent:    Deweb/1.01

robot-language:     perl 4

robot-description:  Its purpose is to generate a Resource Discovery database,

	perform mirroring, and generate statistics. Uses a combination

	of the Informix(tm) database and WN 1.11 server software for

	indexing/resource discovery, full-text search, and text

	excerpts.

robot-history:      

robot-environment:

modified-date:      Wed Jan 10 08:23:00 1996

modified-by:



robot-id: dienstspider

robot-name: DienstSpider

robot-cover-url: http://sappho.csi.forth.gr:22000/

robot-details-url:

robot-owner-name: Antonis Sidiropoulos 

robot-owner-url: http://www.csi.forth.gr/~asidirop

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone 

robot-platform: unix

robot-availability: none

robot-exclusion:

robot-exclusion-useragent:

robot-noindex:

robot-host: sappho.csi.forth.gr 

robot-from:

robot-useragent: dienstspider/1.0  

robot-language: C

robot-description: Indexing and searching the NCSTRL (Networked Computer Science Technical Report Library) and ERCIM Collection

robot-history: The version 1.0 was the developer's master thesis project

robot-environment: research

modified-date: Fri, 4 Dec 1998 0:0:0 GMT

modified-by: [email protected]



robot-id: digger

robot-name: Digger

robot-cover-url: http://www.diggit.com/

robot-details-url:

robot-owner-name: Benjamin Lipchak

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, windows

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: digger

robot-noindex: yes

robot-host:

robot-from: yes

robot-useragent: Digger/1.0 JDK/1.3.0

robot-language: java

robot-description: indexing web sites for the Diggit! search engine

robot-history:

robot-environment: service

modified-date:

modified-by:



robot-id: diibot

robot-name: Digital Integrity Robot

robot-cover-url: http://www.digital-integrity.com/robotinfo.html

robot-details-url: http://www.digital-integrity.com/robotinfo.html

robot-owner-name: Digital Integrity, Inc.

robot-owner-url: 

robot-owner-email: [email protected]

robot-status: Production

robot-purpose: WWW Indexing

robot-type:

robot-platform: unix

robot-availability: none

robot-exclusion: yes (conforms to the robots.txt convention)

robot-exclusion-useragent: DIIbot

robot-noindex: Yes

robot-host: digital-integrity.com

robot-from:

robot-useragent: DIIbot

robot-language: Java/C

robot-description:

robot-history: 

robot-environment:

modified-date:

modified-by:



robot-id: directhit

robot-name: Direct Hit Grabber

robot-cover-url: www.directhit.com

robot-details-url: http://www.directhit.com/about/company/spider.html

robot-status: active

robot-description: Direct Hit Grabber indexes documents and

 collects Web statistics for the Direct Hit Search Engine (available at

 www.directhit.com and our partners' sites)

robot-purpose: Indexing and statistics

robot-type: standalone

robot-platform: unix

robot-language: C++

robot-owner-name: Direct Hit Technologies, Inc.

robot-owner-url: www.directhit.com

robot-owner-email: [email protected]

robot-exclusion: yes

robot-exclusion-useragent: grabber

robot-noindex: yes

robot-host: *.directhit.com

robot-from: yes

robot-useragent: grabber

robot-environment: service

modified-by: [email protected]



robot-id: dnabot

robot-name: DNAbot

robot-cover-url: http://xx.dnainc.co.jp/dnabot/

robot-details-url: http://xx.dnainc.co.jp/dnabot/

robot-owner-name: Tom Tanaka

robot-owner-url: http://xx.dnainc.co.jp

robot-owner-email: [email protected]

robot-status: development       

robot-purpose: indexing 

robot-type: standalone          

robot-platform: unix, windows, windows95, windowsNT, mac

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent:

robot-noindex: no

robot-host: xx.dnainc.co.jp

robot-from: yes 

robot-useragent: DNAbot/1.0

robot-language: java 

robot-description: A search robot written in 100% Java, with its own built-in

 database engine and web server. Currently in Japanese.

robot-history: Developed by DNA, Inc.(Niigata City, Japan) in 1998.

robot-environment: commercial

modified-date: Mon, 4 Jan 1999 14:30:00 GMT

modified-by: Tom Tanaka



robot-id: download_express

robot-name: DownLoad Express

robot-cover-url: http://www.jacksonville.net/~dlxpress

robot-details-url: http://www.jacksonville.net/~dlxpress

robot-owner-name: DownLoad Express Inc

robot-owner-url: http://www.jacksonville.net/~dlxpress

robot-owner-email: [email protected]

robot-status: active

robot-purpose: graphic download

robot-type: standalone

robot-platform: win95/98/NT

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: downloadexpress

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent:

robot-language: visual basic

robot-description: automatically downloads graphics from the web

robot-history:

robot-environment: commercial

modified-date: Wed, 05 May 1998

modified-by: DownLoad Express Inc



robot-id: dragonbot

robot-name: DragonBot

robot-cover-url: http://www.paczone.com/

robot-details-url:

robot-owner-name: Paul Law

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: DragonBot

robot-noindex: no

robot-host: *.paczone.com

robot-from: no

robot-useragent: DragonBot/1.0 libwww/5.0

robot-language: C++

robot-description: Collects web pages related to East Asia

robot-history:

robot-environment: service

modified-date: Mon, 11 Aug 1997 00:00:00 GMT

modified-by:



robot-id: dwcp

robot-name: DWCP (Dridus' Web Cataloging Project)

robot-cover-url: http://www.dridus.com/~rmm/dwcp.php3

robot-details-url: http://www.dridus.com/~rmm/dwcp.php3

robot-owner-name: Ross Mellgren (Dridus Norwind)

robot-owner-url: http://www.dridus.com/~rmm

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: java

robot-availability: source, binary, data

robot-exclusion: yes

robot-exclusion-useragent: dwcp

robot-noindex: no

robot-host: *.dridus.com

robot-from: [email protected]

robot-useragent: DWCP/2.0

robot-language: java

robot-description: The DWCP robot is used to gather information for

 Dridus' Web Cataloging Project, which is intended to catalog domains and

 urls (no content).

robot-history: Developed from scratch by Dridus Norwind.

robot-environment: hobby

modified-date: Sat, 10 Jul 1999 00:05:40 GMT

modified-by: Ross Mellgren



robot-id: e-collector

robot-name: e-collector

robot-cover-url: http://www.thatrobotsite.com/agents/ecollector.htm

robot-details-url: http://www.thatrobotsite.com/agents/ecollector.htm

robot-owner-name: Dean Smart

robot-owner-url: http://www.thatrobotsite.com

robot-owner-email: [email protected]

robot-status: Active

robot-purpose: email collector

robot-type: Collector of email addresses

robot-platform: Windows 9*/NT/2000

robot-availability: Binary

robot-exclusion: No

robot-exclusion-useragent: ecollector

robot-noindex: No

robot-host: *

robot-from: No

robot-useragent: LWP::

robot-language: Perl5

robot-description: e-collector is, in the simplest terms, an e-mail address

 collector, thus the name e-collector.

 So what? Have you ever wanted the email addresses of as many companies as

 possible that sell or supply, for example, "dried fruit"? (I personally

 don't, but this is just an example.)

 Those of you who may use this type of robot will know exactly what you can

 do with the information; first, don't spam with it. For those still not

 sure what this type of robot will do for you, take this example: you are an

 international distributor of "dried fruit", and your boss has told you that

 if you raise sales by 10% he will buy you a new car (wish I had a boss like

 that). There are thousands of shops and distributors you could be doing

 business with, but you don't know who they are, because they are in other

 countries or the nearest town and you have never heard of them before. Now

 you have the opportunity to find out who they are, with an internet address

 and a person to contact in each company, just by downloading and running

 e-collector.

 Plus it's free: you don't have to do any leg work, just run the program and

 sit back and watch your potential customers arriving.

robot-history: -

robot-environment: Service

modified-date: Weekly

modified-by: Dean Smart



robot-id: ebiness

robot-name: EbiNess

robot-cover-url: http://sourceforge.net/projects/ebiness

robot-details-url: http://ebiness.sourceforge.net/

robot-owner-name: Mike Davis

robot-owner-url: http://www.carisbrook.co.uk/mike

robot-owner-email: [email protected]

robot-status: pre-alpha

robot-purpose: statistics

robot-type: standalone

robot-platform: unix (Linux)

robot-availability: open source

robot-exclusion: yes

robot-exclusion-useragent: ebiness

robot-noindex: no

robot-host:

robot-from: no

robot-useragent: EbiNess/0.01a

robot-language: c++

robot-description: Used to build a URL relationship database, to be viewed in 3D

robot-history: Dreamed it up over some beers

robot-environment: hobby

modified-date: Mon, 27 Nov 2000 12:26:00 GMT

modified-by: Mike Davis



robot-id:           eit

robot-name:         EIT Link Verifier Robot

robot-cover-url:    http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html

robot-details-url:

robot-owner-name:   Jim McGuire

robot-owner-url:    http://www.eit.com/people/mcguire.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         

robot-useragent:    EIT-Link-Verifier-Robot/0.2

robot-language:     

robot-description:  Combination of an HTML form and a CGI script that verifies

	links from a given starting point (with some controls to

	prevent it from going off-site or running without limit)

robot-history:      Announced on 12 July 1994

robot-environment:

modified-date:      

modified-by:



robot-id: elfinbot

robot-name: ELFINBOT

robot-cover-url: http://letsfinditnow.com

robot-details-url: http://letsfinditnow.com/elfinbot.html

robot-owner-name: Lets Find It Now Ltd

robot-owner-url: http://letsfinditnow.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing for the Lets Find It Now search engine

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: elfinbot

robot-noindex: yes

robot-host: *.letsfinditnow.com

robot-from: no

robot-useragent: elfinbot

robot-language: Perl5

robot-description: ELFIN is used to index and add data to the "Lets Find It Now

 Search Engine" (http://letsfinditnow.com). The robot runs every 30 days.

robot-history:

robot-environment:

modified-date:

modified-by:



robot-id:           emacs

robot-name:         Emacs-w3 Search Engine

robot-cover-url:    http://www.cs.indiana.edu/elisp/w3/docs.html

robot-details-url:

robot-owner-name:   William M. Perry

robot-owner-url:    http://www.cs.indiana.edu/hyplan/wmperry.html

robot-owner-email:  [email protected]

robot-status:       retired

robot-purpose:      indexing

robot-type:         browser

robot-platform:     

robot-availability: 

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         yes

robot-useragent:    Emacs-w3/v[0-9\.]+

robot-language:     lisp

robot-description:  Its purpose is to generate a Resource Discovery database.

	This code has not been looked at in a while, but will be

	spruced up for the Emacs-w3 2.2.0 release sometime this

	month. It will honor the /robots.txt file at that

	time.

robot-history:      

robot-environment:

modified-date:      Fri May 5 16:09:18 1995

modified-by:



robot-id:           emcspider

robot-name:         ananzi

robot-cover-url:    http://www.empirical.com/

robot-details-url:

robot-owner-name:   Hunter Payne

robot-owner-url:    http://www.psc.edu/~hpayne/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         bilbo.internal.empirical.com

robot-from:         yes

robot-useragent:    EMC Spider

robot-language:     java

robot-description:  This spider is still in the development stages, but it

	will be hitting sites while I finish debugging it.

robot-history:

robot-environment:

modified-date:      Wed May 29 14:47:01 1996.

modified-by:



robot-id: esculapio

robot-name: esculapio

robot-cover-url: http://esculapio.cype.com

robot-details-url: http://esculapio.cype.com/details.htm

robot-owner-name: CYPE Ingenieros

robot-owner-url: http://www.cype.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: link validation

robot-type: standalone

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: esculapio

robot-noindex: yes

robot-host: 80.34.92.45

robot-from: yes

robot-useragent: esculapio/1.1

robot-language: C++

robot-description: Checks the integrity of the links between several

 domains.

robot-history: First, a research project. Now, an internal tool. Next, ???.

robot-environment: research, service

modified-date: Mon, 6 Jun 2004 08:25 +1 GMT

modified-by:



robot-id: esther

robot-name: Esther

robot-details-url: http://search.falconsoft.com/

robot-cover-url: http://search.falconsoft.com/

robot-owner-name: Tim Gustafson

robot-owner-url: http://www.falconsoft.com/

robot-owner-email:      [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix (FreeBSD 2.2.8)

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: esther

robot-noindex: no

robot-host: *.falconsoft.com

robot-from: yes

robot-useragent: esther

robot-language: perl5

robot-description: This crawler is used to build the search database at

 http://search.falconsoft.com/

robot-history: Developed by FalconSoft.

robot-environment: service

modified-date: Tue, 22 Dec 1998 00:22:00 PST



robot-id: evliyacelebi

robot-name: Evliya Celebi

robot-cover-url: http://ilker.ulak.net.tr/EvliyaCelebi

robot-details-url: http://ilker.ulak.net.tr/EvliyaCelebi

robot-owner-name: Ilker TEMIR

robot-owner-url: http://ilker.ulak.net.tr

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing Turkish content

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: N/A

robot-noindex: no

robot-nofollow: no

robot-host: 193.140.83.*

robot-from: [email protected]

robot-useragent: Evliya Celebi v0.151 - http://ilker.ulak.net.tr

robot-language: perl5

robot-history:

robot-description: crawls pages under the ".tr" domain or with Turkish character

 encoding (iso-8859-9 or windows-1254)

robot-environment: hobby

modified-date: Fri Mar 31 15:03:12 GMT 2000
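
Evliya Celebi's crawl policy, accepting pages under ".tr" or pages declaring a Turkish charset, can be approximated by inspecting the host suffix and the declared Content-Type. A sketch under those assumptions (the function name is hypothetical; this is not the robot's actual Perl code):

```python
from urllib.parse import urlparse

# Charsets the record names as markers of Turkish content.
TURKISH_CHARSETS = {"iso-8859-9", "windows-1254"}

def is_turkish_candidate(url, content_type=""):
    """Mimic the stated policy: accept any .tr host, or any page
    whose Content-Type header declares a Turkish encoding."""
    host = urlparse(url).hostname or ""
    if host == "tr" or host.endswith(".tr"):
        return True
    # e.g. content_type == "text/html; charset=windows-1254"
    for part in content_type.lower().split(";"):
        part = part.strip()
        if part.startswith("charset="):
            return part[len("charset="):] in TURKISH_CHARSETS
    return False
```

In practice a crawler would also have to fall back to the charset in the page's own meta tags, since many servers of that era omitted it from the HTTP header.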



robot-id:           nzexplorer

robot-name:         nzexplorer

robot-cover-url:    http://nzexplorer.co.nz/

robot-details-url:

robot-owner-name:   Paul Bourke

robot-owner-url:    http://bourke.gen.nz/paul.html

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:     UNIX

robot-availability: source (commercial)

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         bitz.co.nz

robot-from:         no

robot-useragent:    explorersearch

robot-language:     c++

robot-history:      Started in 1995 to provide a comprehensive index

                    to WWW pages within New Zealand. Now also used in

                    Malaysia and other countries.

robot-environment:  service

modified-date:      Tue, 25 Jun 1996

modified-by:        Paul Bourke



robot-id: fastcrawler

robot-name: FastCrawler

robot-cover-url: http://www.1klik.dk/omos/

robot-details-url: http://www.1klik.dk/omos/

robot-owner-name: 1klik.dk A/S

robot-owner-url: http://www.1klik.dk

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Windows 2000 Adv. Server

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: fastcrawler

robot-noindex: yes

robot-host: 1klik.dk

robot-from: yes

robot-useragent: FastCrawler 3.0.X ([email protected]) - http://www.1klik.dk

robot-language: C++

robot-description: FastCrawler is used to build the databases for search engines used by 1klik.dk and its partners

robot-history: Robot started in April 1999

robot-environment: commercial

modified-date: 05-08-2001

modified-by: Kim Gam-Jensen



robot-id:fdse

robot-name:Fluid Dynamics Search Engine robot

robot-cover-url:http://www.xav.com/scripts/search/

robot-details-url:http://www.xav.com/scripts/search/

robot-owner-name:Zoltan Milosevic

robot-owner-url:http://www.xav.com/

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing

robot-type:standalone

robot-platform:unix;windows

robot-availability:source;data

robot-exclusion:yes

robot-exclusion-useragent:FDSE

robot-noindex:yes

robot-host:yes

robot-from:*

robot-useragent:Mozilla/4.0 (compatible: FDSE robot)

robot-language:perl5

robot-description:Crawls remote sites as part of a shareware search engine

 program

robot-history:Developed in late 1998 over three pots of coffee

robot-environment:commercial

modified-date:Fri, 21 Jan 2000 10:15:49 GMT

modified-by:Zoltan Milosevic



robot-id:	felix

robot-name:	Felix IDE

robot-cover-url:	http://www.pentone.com

robot-details-url:	http://www.pentone.com

robot-owner-name:	The Pentone Group, Inc.

robot-owner-url:	http://www.pentone.com

robot-owner-email:	[email protected]

robot-status:	active

robot-purpose:	indexing, statistics

robot-type:	standalone

robot-platform:	windows95, windowsNT

robot-availability:	binary

robot-exclusion:	yes

robot-exclusion-useragent:	FELIX IDE

robot-noindex:	yes

robot-host:	*

robot-from:	yes

robot-useragent:	FelixIDE/1.0

robot-language:	visual basic

robot-description:	Felix IDE is a retail personal search spider sold by

  The Pentone Group, Inc.

  It supports the proprietary exclusion "Frequency: ??????????" in the

  robots.txt file. Question marks represent an integer

  indicating number of milliseconds to delay between document requests. This

  is called VDRF(tm) or Variable Document Retrieval Frequency. Note that

  users can re-define the useragent name.

robot-history:	This robot began as an in-house tool for the lucrative Felix

  IDS (Information Discovery Service) and has gone retail.

robot-environment:	service, commercial, research


modified-date:	Fri, 11 Apr 1997 19:08:02 GMT

modified-by:	Kerry B. Rogers
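
A sketch of how the proprietary "Frequency:" exclusion described in this record could appear in a site's robots.txt; the value 2000 is illustrative (2000 milliseconds between document requests), not taken from the record:

```
# Hypothetical robots.txt fragment using the VDRF(tm) extension
# supported by Felix IDE. "Frequency: 2000" asks the robot to wait
# 2000 milliseconds between document requests; standard robots
# simply ignore this non-standard field.
User-agent: FELIX IDE
Frequency: 2000
Disallow:
```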



robot-id:           ferret

robot-name:         Wild Ferret Web Hopper #1, #2, #3

robot-cover-url:    http://www.greenearth.com/

robot-details-url:

robot-owner-name:   Greg Boswell

robot-owner-url:    http://www.greenearth.com/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing maintenance statistics

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    Hazel's Ferret Web hopper

robot-language:     C++, Visual Basic, Java

robot-description:  The wild ferret web hoppers are designed as specific agents

	to retrieve data from all available sources on the internet.

	They work in an onion format hopping from spot to spot one

	level at a time over the internet. The information is

	gathered into different relational databases, known as

	"Hazel's Horde". The information is publicly available and

	will be free for the browsing at www.greenearth.com.

	Effective date of the data posting is to be

	announced.

robot-history:

robot-environment:

modified-date:      Mon Feb 19 00:28:37 1996.

modified-by:



robot-id: fetchrover

robot-name: FetchRover

robot-cover-url: http://www.engsoftware.com/fetch.htm

robot-details-url: http://www.engsoftware.com/spiders/

robot-owner-name: Dr. Kenneth R. Wadland

robot-owner-url: http://www.engsoftware.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance, statistics

robot-type: standalone

robot-platform: Windows/NT, Windows/95, Solaris SPARC

robot-availability: binary, source

robot-exclusion: yes

robot-exclusion-useragent: ESI

robot-noindex: N/A

robot-host: *

robot-from: yes

robot-useragent: ESIRover v1.0

robot-language: C++

robot-description: FetchRover fetches Web Pages.  

   It is an automated page-fetching engine. FetchRover can be

   used stand-alone or as the front-end to a full-featured Spider.

   Its database can use any ODBC compliant database server, including

   Microsoft Access, Oracle, Sybase SQL Server, FoxPro, etc.

robot-history:  Used as the front-end to SmartSpider (another Spider 

   product sold by Engineering Software, Inc.)

robot-environment: commercial, service

modified-date: Thu, 03 Apr 1997 21:49:50 EST

modified-by: Ken Wadland



robot-id: fido

robot-name: fido

robot-cover-url: http://www.planetsearch.com/

robot-details-url: http://www.planetsearch.com/info/fido.html

robot-owner-name: Steve DeJarnett

robot-owner-url: http://www.planetsearch.com/staff/steved.html

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: fido

robot-noindex: no

robot-host: fido.planetsearch.com, *.planetsearch.com, 206.64.113.*

robot-from: yes

robot-useragent: fido/0.9 Harvest/1.4.pl2

robot-language: c, perl5

robot-description: fido is used to gather documents for the search engine 

                   provided in the PlanetSearch service, which is operated by

                   the Philips Multimedia Center.  The robot runs on an

                   ongoing basis.

robot-history: fido was originally based on the Harvest Gatherer, but has since

               evolved into a new creature.  It still uses some support code

               from Harvest.

robot-environment: service

modified-date: Sat, 2 Nov 1996 00:08:18 GMT

modified-by: Steve DeJarnett



robot-id:           finnish

robot-name:         Hämähäkki

robot-cover-url:    http://www.fi/search.html

robot-details-url:  http://www.fi/www/spider.html

robot-owner-name:   Timo Metsälä

robot-owner-url:    http://www.fi/~timo/

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     UNIX

robot-availability: none

robot-exclusion:    yes

robot-exclusion-useragent:  Hämähäkki

robot-noindex:      no

robot-host:         *.www.fi

robot-from:         yes

robot-useragent:    Hämähäkki/0.2

robot-language:     C

robot-description:  Its purpose is to generate a Resource Discovery

	database from the Finnish (top-level domain .fi) www servers.

	The resulting database is used by the search engine 

	at http://www.fi/search.html.

robot-history:      (The name Hämähäkki is just Finnish for spider.)

robot-environment:

modified-date:      1996-06-25   

modified-by:        [email protected]



robot-id: fireball

robot-name: KIT-Fireball

robot-cover-url: http://www.fireball.de

robot-details-url: http://www.fireball.de/technik.html (in German)

robot-owner-name: Gruner + Jahr Electronic Media Service GmbH

robot-owner-url: http://www.ems.guj.de

robot-owner-email:[email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: KIT-Fireball

robot-noindex: yes

robot-host: *.fireball.de

robot-from: yes

robot-useragent: KIT-Fireball/2.0 libwww/5.0a

robot-language: c

robot-description: The Fireball robots gather web documents in German

 language for the database of the Fireball search service.

robot-history: The robot was developed by Benhui Chen in a research

 project at the Technical University of Berlin in 1996 and was

 re-implemented by its developer in 1997 for the present owner.

robot-environment: service 

modified-date: Mon Feb 23 11:26:08 1998

modified-by: Detlev Kalb



robot-id:           fish

robot-name:         Fish search

robot-cover-url:    http://www.win.tue.nl/bin/fish-search

robot-details-url:

robot-owner-name:   Paul De Bra

robot-owner-url:    http://www.win.tue.nl/win/cs/is/debra/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     

robot-availability: binary

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         www.win.tue.nl

robot-from:         no

robot-useragent:    Fish-Search-Robot

robot-language:     c

robot-description:  Its purpose is to discover resources on the fly; a version

	exists that is integrated into the Tübingen Mosaic

	2.4.2 browser (also written in C)

robot-history:      Originated as an addition to Mosaic for X

robot-environment:

modified-date:      Mon May 8 09:31:19 1995

modified-by:



robot-id: fouineur

robot-name: Fouineur

robot-cover-url: http://fouineur.9bit.qc.ca/

robot-details-url: http://fouineur.9bit.qc.ca/informations.html

robot-owner-name: Joel Vandal

robot-owner-url: http://www.9bit.qc.ca/~jvandal/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: unix, windows 

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: fouineur

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: Mozilla/2.0 (compatible fouineur v2.0; fouineur.9bit.qc.ca)

robot-language: perl5

robot-description: This robot automatically builds a database that is used

                   by our own search engine. It auto-detects the language

                   (French, English & Spanish) used in the HTML page. Each

                   database record generated by this robot includes: date,

                   URL, title, total words, size and de-HTMLized text. It

                   also supports server-side and client-side IMAGEMAPs.

robot-history: No existing robot did everything that we needed.

robot-environment: service

modified-date: Thu, 9 Jan 1997 22:57:28 EST

modified-by: [email protected]



robot-id:           francoroute

robot-name:         Robot Francoroute

robot-cover-url:

robot-details-url:

robot-owner-name:   Marc-Antoine Parent

robot-owner-url:    http://www.crim.ca/~maparent

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, mirroring, statistics

robot-type:         browser

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         zorro.crim.ca

robot-from:         yes

robot-useragent:    Robot du CRIM 1.0a

robot-language:     perl5, sqlplus

robot-description:  Part of the RISQ's Francoroute project for researching

	the francophone web. Uses the Accept-Language tag and reduces

	demand accordingly

robot-history:

robot-environment:

modified-date:      Wed Jan 10 23:56:22 1996.

modified-by:



robot-id: freecrawl

robot-name: Freecrawl

robot-cover-url: http://euroseek.net/

robot-owner-name: Jesper Ekhall

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Freecrawl

robot-noindex: no

robot-host: *.freeside.net

robot-from: yes

robot-useragent: Freecrawl

robot-language: c

robot-description: The Freecrawl robot is used to build a database for the

  EuroSeek service.

robot-environment: service



robot-id:           funnelweb

robot-name:         FunnelWeb

robot-cover-url:    http://funnelweb.net.au

robot-details-url:

robot-owner-name:   David Eagles

robot-owner-url:    http://www.pc.com.au

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         earth.planets.com.au

robot-from:         yes

robot-useragent:    FunnelWeb-1.0

robot-language:     c and c++

robot-description:  Its purpose is to generate a Resource Discovery database,

	and generate statistics. Localised South Pacific Discovery

	and Search Engine, plus distributed operation under

	development.

robot-history:      

robot-environment:

modified-date:      Mon Nov 27 21:30:11 1995

modified-by:



robot-id:	      gama

robot-name: gammaSpider, FocusedCrawler

robot-details-url: http://www.gammasite.com, http://www.gammasite.com/gammaSpider.html

robot-cover-url: http://www.gammasite.com

robot-owner-name: gammasite

robot-owner-url: http://www.gammasite.com

robot-owner-email:	[email protected]

robot-status: active

robot-purpose: indexing, maintenance

robot-type: standalone

robot-platform: unix, windows, windows95, windowsNT, linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: gammaSpider

robot-noindex:	no

robot-nofollow: no

robot-host: *

robot-from: no

robot-useragent: gammaSpider xxxxxxx ()/

robot-language: c++

robot-description:

  Information gathering.

  Focused crawling on a specific topic.

  Uses gammaFetcherServer.

  Commercial product.

  The user-agent name may be changed by the user.

  More features are being added.

  The product is constantly under development.

  AKA FocusedCrawler

robot-history: AKA FocusedCrawler

robot-environment: service, commercial, research

modified-date: Sun, 25 Mar 2001 18:49:52 GMT



robot-id: gazz

robot-name: gazz

robot-cover-url: http://gazz.nttrd.com/

robot-details-url: http://gazz.nttrd.com/

robot-owner-name: NTT Cyberspace Laboratories

robot-owner-url: http://gazz.nttrd.com/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: statistics

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: gazz

robot-noindex: yes

robot-host: *.nttrd.com, *.infobee.ne.jp

robot-from: yes

robot-useragent: gazz/1.0

robot-language: c

robot-description: This robot is used for research purposes.

robot-history: Its root is TITAN project in NTT.

robot-environment: research

modified-date: Wed, 09 Jun 1999 10:43:18 GMT

modified-by: [email protected]



robot-id: gcreep

robot-name: GCreep

robot-cover-url: http://www.instrumentpolen.se/gcreep/index.html

robot-details-url: http://www.instrumentpolen.se/gcreep/index.html

robot-owner-name: Instrumentpolen AB

robot-owner-url: http://www.instrumentpolen.se/ip-kontor/eng/index.html

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: browser+standalone

robot-platform: linux+mysql

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: gcreep

robot-noindex: yes

robot-host: mbx.instrumentpolen.se

robot-from: yes

robot-useragent: gcreep/1.0

robot-language: c

robot-description: Indexing robot to learn SQL

robot-history: Spare time project begun late '96, maybe early '97

robot-environment: hobby

modified-date: Fri, 23 Jan 1998 16:09:00 MET

modified-by: Anders Hedstrom



robot-id:           getbot

robot-name:         GetBot

robot-cover-url:    http://www.blacktop.com/zav/bots

robot-details-url:

robot-owner-name:   Alex Zavatone

robot-owner-url:    http://www.blacktop.com/zav

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      maintenance

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         no

robot-useragent:    ???

robot-language:     Shockwave/Director.

robot-description:  GetBot's purpose is to index all the sites it can find that

	contain Shockwave movies.  It is the first bot or spider

	written in Shockwave.  The bot was originally written at

	Macromedia on a hungover Sunday as a proof of concept. -

	Alex Zavatone 3/29/96

robot-history:

robot-environment:

modified-date:      Fri Mar 29 20:06:12 1996.

modified-by:



robot-id:           geturl

robot-name:         GetURL

robot-cover-url:    http://Snark.apana.org.au/James/GetURL/

robot-details-url:

robot-owner-name:   James Burton

robot-owner-url:    http://Snark.apana.org.au/James/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance, mirroring

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         no

robot-useragent:    GetURL.rexx v1.05

robot-language:     ARexx (Amiga REXX)

robot-description:  Its purpose is to validate links, perform mirroring, and

	copy document trees. Designed as a tool for retrieving web

	pages in batch mode without the encumbrance of a browser.

	Can be used to describe a set of pages to fetch, and to

	maintain an archive or mirror. Is not run by a central site

	and accessed by clients - is run by the end user or archive

	maintainer

robot-history:      

robot-environment:

modified-date:      Tue May 9 15:13:12 1995		

modified-by:



robot-id: golem

robot-name: Golem

robot-cover-url: http://www.quibble.com/golem/

robot-details-url: http://www.quibble.com/golem/

robot-owner-name: Geoff Duncan

robot-owner-url: http://www.quibble.com/geoff/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: mac

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: golem

robot-noindex: no

robot-host: *.quibble.com

robot-from: yes

robot-useragent: Golem/1.1

robot-language: HyperTalk/AppleScript/C++

robot-description: Golem generates status reports on collections of URLs

  supplied by clients. Designed to assist with editorial updates of

  Web-related sites or products.

robot-history: Personal project turned into a contract service for private

  clients.

robot-environment: service,research

modified-date: Wed, 16 Apr 1997 20:50:00 GMT

modified-by: Geoff Duncan



robot-id: googlebot

robot-name: Googlebot

robot-cover-url: http://www.googlebot.com/ 

robot-details-url: http://www.googlebot.com/bot.html

robot-owner-name: Google Inc.

robot-owner-url: http://www.google.com/

robot-owner-email: [email protected] 

robot-status: active

robot-purpose: indexing

robot-type: standalone 

robot-platform: Linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: googlebot

robot-noindex: yes

robot-host: googlebot.com

robot-from: yes 

robot-useragent: Googlebot/2.X (+http://www.googlebot.com/bot.html)

robot-language: c++

robot-description: Google's crawler

robot-history: Developed by Google Inc

robot-environment: commercial

modified-date: Thu Mar 29 21:00:07 PST 2001

modified-by: [email protected]



robot-id: grapnel

robot-name: Grapnel/0.01 Experiment

robot-cover-url: varies

robot-details-url: mailto:[email protected]

robot-owner-name: Philip Kallerman

robot-owner-url: [email protected]

robot-owner-email: [email protected]

robot-status: Experimental

robot-purpose: Indexing

robot-type:

robot-platform: WinNT

robot-availability: None, yet

robot-exclusion: Yes

robot-exclusion-useragent: No

robot-noindex: No

robot-host: varies

robot-from: Varies

robot-useragent:

robot-language: Perl

robot-description: Resource Discovery Experimentation

robot-history: None, hoping to make some

robot-environment:

modified-date: 7 Feb 1997

modified-by:



robot-id:griffon

robot-name:Griffon                                                               

robot-cover-url:http://navi.ocn.ne.jp/                                           

robot-details-url:http://navi.ocn.ne.jp/griffon/                                 

robot-owner-name:NTT Communications Corporate Users Business Division            

robot-owner-url:http://navi.ocn.ne.jp/                                           

robot-owner-email:[email protected]                                   

robot-status:active                                                              

robot-purpose:indexing                                                           

robot-type:standalone                                                            

robot-platform:unix                                                              

robot-availability:none                                                          

robot-exclusion:yes                                                              

robot-exclusion-useragent:griffon                                                

robot-noindex:yes                                                                

robot-nofollow:yes                                                              

robot-host:*.navi.ocn.ne.jp                                                      

robot-from:yes                                                                   

robot-useragent:griffon/1.0                                                      

robot-language:c                                                                 

robot-description:The Griffon robot is used to build the database for the OCN navi

       search service operated by NTT Communications Corporation.

       It mainly gathers pages written in Japanese.            

robot-history:Its root is TITAN project in NTT.                                  

robot-environment:service                                                        

modified-date:Mon,25 Jan 2000 15:25:30 GMT                                       

modified-by:[email protected]



robot-id: gromit

robot-name: Gromit

robot-cover-url: http://www.austlii.edu.au/

robot-details-url: http://www2.austlii.edu.au/~dan/gromit/

robot-owner-name: Daniel Austin

robot-owner-url: http://www2.austlii.edu.au/~dan/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Gromit

robot-noindex: no

robot-host: *.austlii.edu.au

robot-from: yes

robot-useragent: Gromit/1.0

robot-language: perl5

robot-description: Gromit is a Targetted Web Spider that indexes legal

 sites contained in the AustLII legal links database.

robot-history: This robot is based on the Perl5 LWP::RobotUA module.

robot-environment: research

modified-date: Wed, 11 Jun 1997 03:58:40 GMT

modified-by: Daniel Austin



robot-id: gulliver

robot-name: Northern Light Gulliver

robot-cover-url:

robot-details-url:

robot-owner-name: Mike Mulligan

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: gulliver

robot-noindex: yes

robot-host: scooby.northernlight.com, taz.northernlight.com,

  gulliver.northernlight.com

robot-from: yes

robot-useragent: Gulliver/1.1

robot-language: c

robot-description: Gulliver is a robot used to collect

  web pages for indexing and subsequent searching of the index.

robot-history: Oct 1996: development; Dec 1996-Jan 1997: crawl & debug;

  Mar 1997: crawl again;

robot-environment: service

modified-date: Wed, 21 Apr 1999 16:00:00 GMT

modified-by: Mike Mulligan



robot-id: gulperbot

robot-name: Gulper Bot

robot-cover-url: http://yuntis.ecsl.cs.sunysb.edu/

robot-details-url: http://yuntis.ecsl.cs.sunysb.edu/help/robot/

robot-owner-name: Maxim Lifantsev

robot-owner-url: http://www.cs.sunysb.edu/~maxim/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: gulper

robot-noindex: yes

robot-nofollow: yes

robot-host: yuntis*.ecsl.cs.sunysb.edu

robot-from: no

robot-useragent: Gulper Web Bot 0.2.4 (www.ecsl.cs.sunysb.edu/~maxim/cgi-bin/Link/GulperBot)

robot-language: c++

robot-description: The Gulper Bot is used to collect data for the Yuntis research search engine project.

robot-history: Developed in a research project at SUNY Stony Brook.

robot-environment: research

modified-date: Tue, 28 Aug 2001 21:40:47 GMT

modified-by: [email protected]



robot-id: hambot

robot-name: HamBot

robot-cover-url: http://www.hamrad.com/search.html

robot-details-url: http://www.hamrad.com/

robot-owner-name: John Dykstra

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, Windows95

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: hambot

robot-noindex: yes

robot-host: *.hamrad.com

robot-from:

robot-useragent:

robot-language: perl5, C++

robot-description: Two HamBot robots are used (stand alone & browser based)

 to aid in building the database for HamRad Search - The Search Engine for

 Search Engines.  The robots are run intermittently and perform nearly

 identical functions.

robot-history: A non commercial (hobby?) project to aid in building and

 maintaining the database for the HamRad search engine.

robot-environment: service

modified-date: Fri, 17 Apr 1998 21:44:00 GMT

modified-by: JD



robot-id:           harvest

robot-name:         Harvest

robot-cover-url:    http://harvest.cs.colorado.edu

robot-details-url:

robot-owner-name:   

robot-owner-url:    

robot-owner-email:  

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      

robot-host:         bruno.cs.colorado.edu

robot-from:         yes

robot-useragent:    yes

robot-language:     

robot-description:  Harvest's motivation is to index community- or topic-

	specific collections, rather than to locate and index all

	HTML objects that can be found.  Also, Harvest allows users

	to control the enumeration several ways, including stop

	lists and depth and count limits.  Therefore, Harvest

	provides a much more controlled way of indexing the Web than

	is typical of robots. Pauses 1 second between requests (by

	default).

robot-history:      

robot-environment:

modified-date:      

modified-by:
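
The polite behaviour this record describes (controlled enumeration plus a pause between requests, 1 second by default) can be sketched with the Python standard library; `polite_fetch_order` is a hypothetical helper for illustration, not Harvest code:

```python
# A minimal sketch (not Harvest's actual implementation) of the two
# politeness mechanisms described above: honouring robots.txt
# exclusions and pausing between successive requests.
import time
from urllib.robotparser import RobotFileParser

def polite_fetch_order(urls, robots_txt_lines, user_agent="harvest", delay=1.0):
    """Return the URLs a polite crawler may fetch, in order,
    sleeping `delay` seconds between successive allowed fetches."""
    rp = RobotFileParser()
    rp.parse(robots_txt_lines)      # parse the site's robots.txt rules
    allowed = []
    for url in urls:
        if rp.can_fetch(user_agent, url):
            allowed.append(url)
            time.sleep(delay)       # rate-limit, as Harvest does by default
    return allowed
```

In a real crawler the delay would be applied around the HTTP request itself; the sketch only shows where the exclusion check and the pause fit.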



robot-id: havindex

robot-name: havIndex

robot-cover-url: http://www.hav.com/

robot-details-url: http://www.hav.com/

robot-owner-name: hav.Software and Horace A. (Kicker) Vallas

robot-owner-url: http://www.hav.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Java VM 1.1

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: havIndex

robot-noindex: yes

robot-host: *

robot-from: no

robot-useragent: havIndex/X.xx[bxx]

robot-language: Java

robot-description: havIndex allows individuals to build a searchable word

 index of (user-specified) lists of URLs.  havIndex does not crawl -

 rather it requires one or more user-supplied lists of URLs to be

 indexed.  havIndex does (optionally) save URLs parsed from indexed

  pages.

robot-history: Developed to answer client requests for URL specific

 index capabilities.

robot-environment: commercial, service

modified-date: 6-27-98

modified-by: Horace A. (Kicker) Vallas



robot-id:           hi

robot-name:         HI (HTML Index) Search

robot-cover-url:    http://cs6.cs.ait.ac.th:21870/pa.html

robot-details-url:

robot-owner-name:   Razzakul Haider Chowdhury

robot-owner-url:    http://cs6.cs.ait.ac.th:21870/index.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         yes

robot-useragent:    AITCSRobot/1.1

robot-language:     perl 5

robot-description:  Its purpose is to generate a Resource Discovery database.

	This Robot traverses the net and creates a searchable

	database of Web pages. It stores the title string of the

	HTML document and the absolute url. A search engine provides

	the boolean AND & OR query models with or without filtering

	the stop list of words. A feature is provided for Web page

	owners to add their URL to the searchable database.

robot-history:      

robot-environment:

modified-date:      Wed Oct  4 06:54:31 1995

modified-by:



robot-id: hometown

robot-name: Hometown Spider Pro

robot-cover-url: http://www.hometownsingles.com

robot-details-url: http://www.hometownsingles.com

robot-owner-name: Bob Brown

robot-owner-url: http://www.hometownsingles.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: *

robot-noindex: yes

robot-host: 63.195.193.17

robot-from: no

robot-useragent: Hometown Spider Pro

robot-language: delphi

robot-description: The Hometown Spider Pro is used to maintain the indexes

 for Hometown Singles.

robot-history: Based on the Innerprise URL Spider Pro.

robot-environment: commercial

modified-date: Tue, 28 Mar 2000 16:00:00 GMT

modified-by: Hometown Singles



robot-id: wired-digital

robot-name: Wired Digital

robot-cover-url:

robot-details-url:

robot-owner-name: Bowen Dwelle

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: hotwired

robot-noindex: no

robot-host: gossip.hotwired.com

robot-from: yes

robot-useragent: wired-digital-newsbot/1.5

robot-language: perl-5.004

robot-description: this is a test

robot-history:

robot-environment: research

modified-date: Thu, 30 Oct 1997

modified-by: [email protected]



robot-id:           htdig

robot-name:         ht://Dig

robot-cover-url:    http://www.htdig.org/

robot-details-url:  http://www.htdig.org/howitworks.html

robot-owner-name:   Andrew Scherpbier

robot-owner-url:    http://www.htdig.org/author.html

robot-owner-email:  [email protected]

robot-owner-name2:  Geoff Hutchison 

robot-owner-url2:   http://wso.williams.edu/~ghutchis/

robot-owner-email2: [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     unix

robot-availability: source

robot-exclusion:    yes

robot-exclusion-useragent: htdig

robot-noindex:      yes

robot-host:         *

robot-from:         no

robot-useragent:    htdig/3.1.0b2

robot-language:     C,C++.

robot-history: This robot was originally developed for use at San Diego

 State University.

robot-environment:

modified-date:Tue, 3 Nov 1998 10:09:02 EST 

modified-by: Geoff Hutchison 



robot-id:           htmlgobble

robot-name:         HTMLgobble

robot-cover-url:    

robot-details-url:

robot-owner-name:   Andreas Ley

robot-owner-url:    

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      mirror

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         tp70.rz.uni-karlsruhe.de

robot-from:         yes

robot-useragent:    HTMLgobble v2.2

robot-language:     

robot-description:  A mirroring robot. Configured to stay within a directory,

	sleeps between requests, and the next version will use HEAD

	to check if the entire document needs to be

	retrieved

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id:           hyperdecontextualizer

robot-name:         Hyper-Decontextualizer

robot-cover-url:    http://www.tricon.net/Comm/synapse/spider/

robot-details-url:

robot-owner-name:   Cliff Hall

robot-owner-url:    http://kpt1.tricon.net/cgi-bin/cliff.cgi

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         no

robot-useragent:    no

robot-language:     Perl 5

robot-description:  Takes an input sentence and marks up each word with

	an appropriate hyper-text link.

robot-history:

robot-environment:

modified-date:      Mon May  6 17:41:29 1996.

modified-by:



robot-id: iajabot

robot-name: iajaBot

robot-cover-url:

robot-details-url: http://www.scs.carleton.ca/~morin/iajabot.html

robot-owner-name: Pat Morin

robot-owner-url: http://www.scs.carleton.ca/~morin/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, windows

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent: iajabot

robot-noindex: no

robot-host: *.scs.carleton.ca

robot-from: no

robot-useragent: iajaBot/0.1

robot-language: c

robot-description: Finds adult content

robot-history: None, brand new.

robot-environment: research

modified-date: Tue, 27 Jun 2000, 11:17:50 EDT

modified-by: Pat Morin



robot-id:           ibm

robot-name:         IBM_Planetwide

robot-cover-url:    http://www.ibm.com/%7ewebmaster/

robot-details-url:

robot-owner-name:   Ed Costello

robot-owner-url:    http://www.ibm.com/%7ewebmaster/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, maintenance, mirroring

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         www.ibm.com www2.ibm.com

robot-from:         yes

robot-useragent:    IBM_Planetwide, 

robot-language:     Perl5

robot-description:  Restricted to IBM owned or related domains.

robot-history:

robot-environment:

modified-date:      Mon Jan 22 22:09:19 1996.

modified-by:



robot-id: iconoclast

robot-name: Popular Iconoclast

robot-cover-url: http://gestalt.sewanee.edu/ic/

robot-details-url: http://gestalt.sewanee.edu/ic/info.html

robot-owner-name: Chris Cappuccio

robot-owner-url: http://sefl.satelnet.org/~ccappuc/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: statistics 

robot-type: standalone

robot-platform: unix (OpenBSD)

robot-availability: source

robot-exclusion: no 

robot-exclusion-useragent:

robot-noindex: no

robot-host: gestalt.sewanee.edu

robot-from: yes 

robot-useragent: gestaltIconoclast/1.0 libwww-FM/2.17

robot-language: c,perl5

robot-description: This guy likes statistics

robot-history: This robot has a history in mathematics and english

robot-environment: research

modified-date: Wed, 5 Mar 1997 17:35:16 CST

modified-by: [email protected]



robot-id: Ilse

robot-name: Ingrid

robot-cover-url:

robot-details-url:

robot-owner-name: Ilse c.v.

robot-owner-url: http://www.ilse.nl/

robot-owner-email: [email protected]

robot-status: Running

robot-purpose: Indexing

robot-type: Web Indexer

robot-platform: UNIX

robot-availability: Commercial as part of search engine package

robot-exclusion: Yes

robot-exclusion-useragent: INGRID/0.1

robot-noindex: Yes

robot-host: bart.ilse.nl

robot-from: Yes

robot-useragent: INGRID/0.1

robot-language: C

robot-description:  

robot-history:

robot-environment:

modified-date: 06/13/1997

modified-by: Ilse



robot-id: imagelock

robot-name: Imagelock 

robot-cover-url:

robot-details-url:

robot-owner-name: Ken Belanger  

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: maintenance      

robot-type:

robot-platform: windows95

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: no

robot-host: 209.111.133.*

robot-from: no

robot-useragent: Mozilla 3.01 PBWF (Win95)

robot-language:

robot-description: searches for image links

robot-history:

robot-environment: service

modified-date: Tue, 11 Aug 1998 17:28:52 GMT

modified-by: [email protected]



robot-id:           incywincy

robot-name:         IncyWincy

robot-cover-url:    http://osiris.sunderland.ac.uk/sst-scripts/simon.html

robot-details-url:

robot-owner-name:   Simon Stobart

robot-owner-url:    http://osiris.sunderland.ac.uk/sst-scripts/simon.html

robot-owner-email:  [email protected]

robot-status:

robot-purpose:

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         osiris.sunderland.ac.uk

robot-from:         yes

robot-useragent:    IncyWincy/1.0b1

robot-language:     C++

robot-description:  Various research projects at the University of

	Sunderland.

robot-history:

robot-environment:

modified-date:      Fri Jan 19 21:50:32 1996.

modified-by:



robot-id: informant

robot-name: Informant

robot-cover-url: http://informant.dartmouth.edu/

robot-details-url: http://informant.dartmouth.edu/about.html

robot-owner-name: Bob Gray

robot-owner-name2: Aditya Bhasin

robot-owner-name3: Katsuhiro Moizumi

robot-owner-name4: Dr. George V. Cybenko

robot-owner-url: http://informant.dartmouth.edu/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent: Informant

robot-noindex: no

robot-host: informant.dartmouth.edu

robot-from: yes

robot-useragent: Informant

robot-language: c, c++

robot-description: The Informant robot continually checks the Web pages

 that are relevant to user queries.  Users are notified of any new or

 updated pages.  The robot runs daily, but the number of hits per site

 per day should be quite small, and these hits should be randomly

 distributed over several hours.  Since the robot does not actually 

 follow links (aside from those returned from the major search engines 

 such as Lycos), it does not fall victim to the common looping problems.

 The robot will support the Robot Exclusion Standard by early December, 1996.

robot-history: The robot is part of a research project at Dartmouth College.  

 The robot may become part of a commercial service (at which time it may be 

 subsumed by some other, existing robot).

robot-environment: research, service

modified-date: Sun, 3 Nov 1996 11:55:00 GMT

modified-by: Bob Gray



robot-id:           infoseek

robot-name:         InfoSeek Robot 1.0

robot-cover-url:    http://www.infoseek.com

robot-details-url:

robot-owner-name:   Steve Kirsch

robot-owner-url:    http://www.infoseek.com

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         corp-gw.infoseek.com

robot-from:         yes

robot-useragent:    InfoSeek Robot 1.0

robot-language:     python

robot-description:  Its purpose is to generate a Resource Discovery database.

	Collects WWW pages for both InfoSeek's free WWW search and

	commercial search. Uses a unique proprietary algorithm to

	identify the most popular and interesting WWW pages. Very

	fast, but never has more than one request per site

	outstanding at any given time. Has been refined for more

	than a year.

robot-history:      

robot-environment:

modified-date:      Sun May 28 01:35:48 1995

modified-by:



robot-id:           infoseeksidewinder

robot-name:         Infoseek Sidewinder

robot-cover-url:    http://www.infoseek.com/

robot-details-url:

robot-owner-name:   Mike Agostino

robot-owner-url:    http://www.infoseek.com/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    Infoseek Sidewinder

robot-language:     C

robot-description:  Collects WWW pages for InfoSeek's free WWW search

	services. Uses a unique, incremental, very fast proprietary

	algorithm to find WWW pages.

robot-history:

robot-environment:

modified-date:      Sat Apr 27 01:20:15 1996.

modified-by:



robot-id: infospider

robot-name: InfoSpiders

robot-cover-url: http://www-cse.ucsd.edu/users/fil/agents/agents.html

robot-owner-name: Filippo Menczer

robot-owner-url: http://www-cse.ucsd.edu/users/fil/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: search

robot-type: standalone

robot-platform: unix, mac

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: InfoSpiders

robot-noindex: no

robot-host: *.ucsd.edu

robot-from: yes

robot-useragent: InfoSpiders/0.1

robot-language: c, perl5

robot-description: application of artificial life algorithm to adaptive

 distributed information retrieval

robot-history: UC San Diego, Computer Science Dept. PhD research project

 (1995-97) under supervision of Prof. Rik Belew

robot-environment: research

modified-date: Mon, 16 Sep 1996 14:08:00 PDT



robot-id:  inspectorwww

robot-name:  Inspector Web

robot-cover-url:  http://www.greenpac.com/inspector/

robot-details-url:  http://www.greenpac.com/inspector/ourrobot.html

robot-owner-name:  Doug Green

robot-owner-url:  http://www.greenpac.com

robot-owner-email:  [email protected]

robot-status:  active:  robot significantly developed, but still undergoing fixes

robot-purpose:  maintenance:  link validation, html validation, image size

 validation, etc

robot-type:  standalone

robot-platform: unix

robot-availability:  free service and more extensive commercial service

robot-exclusion:  yes

robot-exclusion-useragent:  inspectorwww

robot-noindex:  no

robot-host:  www.corpsite.com, www.greenpac.com, 38.234.171.*

robot-from:  yes

robot-useragent:  inspectorwww/1.0 http://www.greenpac.com/inspectorwww.html

robot-language:  c

robot-description:  Provides inspection reports which give advice to WWW

 site owners on missing links, image resize problems, syntax errors, etc.

robot-history:  development started in Mar 1997

robot-environment:  commercial

modified-date:  Tue Jun 17 09:24:58 EST 1997

modified-by:  Doug Green



robot-id:           intelliagent

robot-name:         IntelliAgent

robot-cover-url:    http://www.geocities.com/SiliconValley/3086/iagent.html

robot-details-url:

robot-owner-name:   David Reilly

robot-owner-url:    http://www.geocities.com/SiliconValley/3086/index.html

robot-owner-email:  [email protected]

robot-status:       development

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:         sand.it.bond.edu.au

robot-from:         no

robot-useragent:    'IAGENT/1.0'

robot-language:     C

robot-description:  IntelliAgent is still in development. Indeed, it is very far

	from completion. I'm planning to limit the depth at which it

	will probe, so hopefully IAgent won't cause anyone much of a

	problem. At the end of its completion, I hope to publish

	both the raw data and original source code.

robot-history:

robot-environment:

modified-date:      Fri May 31 02:10:39 1996.

modified-by:



robot-id: irobot

robot-name: I, Robot

robot-cover-url: http://irobot.mame.dk/

robot-details-url: http://irobot.mame.dk/about.phtml

robot-owner-name: [mame.dk]

robot-owner-url: http://www.mame.dk/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: irobot

robot-noindex: yes

robot-host: *.mame.dk, 206.161.121.*

robot-from: no

robot-useragent: I Robot 0.4 ([email protected])

robot-language: c

robot-description: I Robot is used to build a fresh database for the

 emulation community. Primary focus is information on emulation and

 especially old arcade machines. Primarily English sites will be indexed, and

 only if they have their own domain. Sites are added manually based on

 submissions after they have been evaluated.

robot-history: The robot was started in June 2000

robot-environment1: service

robot-environment2: hobby

modified-date: Fri, 27 Oct 2000 09:08:06 GMT

modified-by: BombJack [email protected]



robot-id:iron33

robot-name:Iron33

robot-cover-url:http://verno.ueda.info.waseda.ac.jp/iron33/

robot-details-url:http://verno.ueda.info.waseda.ac.jp/iron33/history.html

robot-owner-name:Takashi Watanabe

robot-owner-url:http://www.ueda.info.waseda.ac.jp/~watanabe/

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing, statistics

robot-type:standalone

robot-platform:unix

robot-availability:source

robot-exclusion:yes

robot-exclusion-useragent:Iron33

robot-noindex:no

robot-host:*.folon.ueda.info.waseda.ac.jp, 133.9.215.*

robot-from:yes

robot-useragent:Iron33/0.0

robot-language:c

robot-description:The robot "Iron33" is used to build the

                  database for the WWW search engine "Verno".

robot-history:

robot-environment:research

modified-date:Fri, 20 Mar 1998 18:34 JST

modified-by:Watanabe Takashi



robot-id:           israelisearch

robot-name:         Israeli-search

robot-cover-url:    http://www.idc.ac.il/Sandbag/

robot-details-url:

robot-owner-name:   Etamar Laron

robot-owner-url:    http://www.xpert.com/~etamar/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing.

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         dylan.ius.cs.cmu.edu

robot-from:         no

robot-useragent:    IsraeliSearch/1.0

robot-language:     C

robot-description:  Complete software designed to collect information in a

	distributed workload and support context queries. Intended

	to be a complete, updated resource for Israeli sites and

	information related to Israel or Israeli

	society.

robot-history:

robot-environment:

modified-date:      Tue Apr 23 19:23:55 1996.

modified-by:



robot-id: javabee

robot-name: JavaBee

robot-cover-url: http://www.javabee.com

robot-details-url:

robot-owner-name:ObjectBox

robot-owner-url:http://www.objectbox.com/

robot-owner-email:[email protected]

robot-status:Active

robot-purpose:Stealing Java Code

robot-type:standalone

robot-platform:Java

robot-availability:binary

robot-exclusion:no

robot-exclusion-useragent:

robot-noindex:no

robot-host:*

robot-from:no

robot-useragent:JavaBee

robot-language:Java

robot-description:This robot is used to grab java applets and run them

 locally, overriding the implemented security.

robot-history:

robot-environment:commercial

modified-date:

modified-by:



robot-id: JBot

robot-name: JBot Java Web Robot

robot-cover-url: http://www.matuschek.net/software/jbot

robot-details-url: http://www.matuschek.net/software/jbot

robot-owner-name: Daniel Matuschek

robot-owner-url: http://www.matuschek.net

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: Java

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: JBot

robot-noindex: no

robot-host: *

robot-from: -

robot-useragent: JBot (but can be changed by the user)

robot-language: Java

robot-description: Java web crawler to download web sites

robot-history: -

robot-environment: hobby

modified-date: Thu, 03 Jan 2000 16:00:00 GMT

modified-by: Daniel Matuschek 



robot-id: jcrawler

robot-name: JCrawler

robot-cover-url: http://www.nihongo.org/jcrawler/

robot-details-url:

robot-owner-name: Benjamin Franz

robot-owner-url: http://www.nihongo.org/snowhare/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: jcrawler

robot-noindex: yes

robot-host: db.netimages.com

robot-from: yes

robot-useragent: JCrawler/0.2

robot-language: perl5

robot-description: JCrawler is currently used to build the Vietnam topic

                   specific WWW index for VietGATE. It schedules visits

                   randomly, but will not visit a site more than once

                   every two minutes. It uses a subject matter relevance

                   pruning algorithm to determine what pages to crawl

                   and index and will not generally index pages with

                   no Vietnam related content. Uses Unicode internally,

                   and detects and converts several different Vietnamese

                   character encodings.

robot-history:

robot-environment: service

modified-date: Wed, 08 Oct 1997 00:09:52 GMT

modified-by: Benjamin Franz



robot-id: askjeeves

robot-name: AskJeeves

robot-cover-url: http://www.ask.com

robot-details-url: 

robot-owner-name: Ask Jeeves, Inc.

robot-owner-url: http://www.ask.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, maintenance

robot-type: standalone

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: "Teoma" or "Ask Jeeves" or "Jeeves"

robot-noindex: Yes

robot-host: ez*.directhit.com

robot-from: No

robot-useragent: Mozilla/2.0 (compatible; Ask Jeeves/Teoma) 

robot-language: c++

robot-description: Ask Jeeves / Teoma spider

robot-history: Developed by Direct Hit Technologies which was acquired by

 Ask Jeeves in 2000.

robot-environment: service

modified-date: Fri Jan 17 15:20:08 EST 2003

modified-by: [email protected]



robot-id: jobo

robot-name: JoBo Java Web Robot

robot-cover-url: http://www.matuschek.net/software/jobo/

robot-details-url: http://www.matuschek.net/software/jobo/

robot-owner-name: Daniel Matuschek

robot-owner-url: http://www.matuschek.net

robot-owner-email: [email protected]

robot-status: active

robot-purpose: downloading, mirroring, indexing

robot-type: standalone

robot-platform: unix, windows, os/2, mac

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: jobo

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: JoBo (can be modified by the user)

robot-language: java

robot-description: JoBo is a web site download tool. The core web spider can be used for any purpose.

robot-history: JoBo was developed as a simple download tool and became a full-featured web spider during development

robot-environment: hobby

modified-date: Fri, 20 Apr 2001 17:00:00 GMT

modified-by: Daniel Matuschek 



robot-id:           jobot

robot-name:         Jobot

robot-cover-url:    http://www.micrognosis.com/~ajack/jobot/jobot.html

robot-details-url:

robot-owner-name:   Adam Jack

robot-owner-url:    http://www.micrognosis.com/~ajack/index.html

robot-owner-email:  [email protected]

robot-status:       inactive

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         supernova.micrognosis.com

robot-from:         yes

robot-useragent:    Jobot/0.1alpha libwww-perl/4.0

robot-language:     perl 4

robot-description:  Its purpose is to generate a Resource Discovery database.

	Intended to seek out sites of potential "career interest".

	Hence - Job Robot.

robot-history:      

robot-environment:

modified-date:      Tue Jan  9 18:55:55 1996

modified-by:



robot-id:           joebot

robot-name:         JoeBot

robot-cover-url:

robot-details-url:

robot-owner-name:   Ray Waldin

robot-owner-url:    http://www.primenet.com/~rwaldin

robot-owner-email:  [email protected]

robot-status:

robot-purpose:

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    JoeBot/x.x, 

robot-language:     java

robot-description:  JoeBot is a generic web crawler implemented as a

	collection of Java classes which can be used in a variety of

	applications, including resource discovery, link validation,

	mirroring, etc.  It currently limits itself to one visit per

	host per minute.

robot-history:

robot-environment:

modified-date:      Sun May 19 08:13:06 1996.

modified-by:



robot-id:           jubii

robot-name:         The Jubii Indexing Robot

robot-cover-url:    http://www.jubii.dk/robot/default.htm

robot-details-url:

robot-owner-name:   Jakob Faarvang

robot-owner-url:    http://www.cybernet.dk/staff/jakob/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing, maintenance

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         any host in the cybernet.dk domain

robot-from:         yes

robot-useragent:    JubiiRobot/version#

robot-language:     visual basic 4.0

robot-description:  Its purpose is to generate a Resource Discovery database

	and validate links. Used for indexing the .dk top-level

	domain as well as other Danish sites for a Danish web

	database, and for link validation.

robot-history:      Will be in constant operation from Spring

	1996

robot-environment:

modified-date:      Sat Jan  6 20:58:44 1996

modified-by:



robot-id:           jumpstation

robot-name:         JumpStation

robot-cover-url:    http://js.stir.ac.uk/jsbin/jsii

robot-details-url:

robot-owner-name:   Jonathon Fletcher

robot-owner-url:    http://www.stir.ac.uk/~jf1

robot-owner-email:  [email protected] 

robot-status:       retired

robot-purpose:      indexing

robot-type:

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         *.stir.ac.uk

robot-from:         yes

robot-useragent:    jumpstation

robot-language:     perl, C, c++

robot-description:

robot-history:      Originated as a weekend project in 1993.

robot-environment:

modified-date:      Tue May 16 00:57:42 1995.

modified-by:



robot-id: kapsi

robot-name: image.kapsi.net

robot-cover-url: http://image.kapsi.net/

robot-details-url: http://image.kapsi.net/index.php?page=robot

robot-owner-name: Jaakko Heusala

robot-owner-url: http://huoh.kapsi.net/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: image.kapsi.net

robot-noindex: no

robot-host: addr-212-50-142-138.suomi.net

robot-from: yes

robot-useragent: image.kapsi.net/1.0

robot-language: perl

robot-description: The image.kapsi.net robot is used to build the database for the image.kapsi.net search service. The robot currently runs at random times.

robot-history: The robot was built for image.kapsi.net's database in 2001.

robot-environment: hobby, research

modified-date: Thu, 13 Dec 2001 23:28:23 EET

modified-by:



robot-id:           katipo

robot-name:         Katipo

robot-cover-url:    http://www.vuw.ac.nz/~newbery/Katipo.html

robot-details-url:  http://www.vuw.ac.nz/~newbery/Katipo/Katipo-doc.html

robot-owner-name:   Michael Newbery

robot-owner-url:    http://www.vuw.ac.nz/~newbery

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      maintenance

robot-type:         standalone

robot-platform:     Macintosh

robot-availability: binary

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         yes

robot-useragent:    Katipo/1.0

robot-language:     c

robot-description:  Watches all the pages you have previously visited

	and tells you when they have changed.


robot-history:      

robot-environment:  commercial (free)

modified-date:      Tue, 25 Jun 96 11:40:07 +1200

modified-by:        Michael Newbery



robot-id:               kdd

robot-name:             KDD-Explorer

robot-cover-url:        http://mlc.kddvw.kcom.or.jp/CLINKS/html/clinks.html

robot-details-url:      not available

robot-owner-name:       Kazunori Matsumoto

robot-owner-url:        not available

robot-owner-email:      [email protected]

robot-status:           development (to be active in June 1997)

robot-purpose:          indexing

robot-type:             standalone

robot-platform:         unix

robot-availability:     none

robot-exclusion:        yes

robot-exclusion-useragent:KDD-Explorer

robot-noindex:          no

robot-host:             mlc.kddvw.kcom.or.jp

robot-from:             yes

robot-useragent:        KDD-Explorer/0.1

robot-language:         c

robot-description:      KDD-Explorer is used for indexing valuable documents

                which will be retrieved via an experimental cross-language

                search engine, CLINKS.

robot-history:          This robot was designed in the Knowledge-based Information

                        Processing Laboratory, KDD R&D Laboratories, 1996-1997

robot-environment:      research

modified-date:          Mon, 2 June 1997 18:00:00 JST

modified-by:            Kazunori Matsumoto



robot-id:kilroy

robot-name:Kilroy

robot-cover-url:http://purl.org/kilroy

robot-details-url:http://purl.org/kilroy

robot-owner-name:OCLC

robot-owner-url:http://www.oclc.org

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing,statistics

robot-type:standalone

robot-platform:unix,windowsNT

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:*

robot-noindex:no

robot-host:*.oclc.org

robot-from:no

robot-useragent:yes

robot-language:java

robot-description:Used to collect data for several projects. 

 Runs constantly and visits site no faster than once every 90 seconds.

robot-history:none

robot-environment:research,service

modified-date:Thursday, 24 Apr 1997 20:00:00 GMT

modified-by:tkac



robot-id: ko_yappo_robot

robot-name: KO_Yappo_Robot

robot-cover-url: http://yappo.com/info/robot.html

robot-details-url: http://yappo.com/

robot-owner-name: Kazuhiro Osawa

robot-owner-url: http://yappo.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: ko_yappo_robot

robot-noindex: yes

robot-host: yappo.com,209.25.40.1

robot-from: yes

robot-useragent: KO_Yappo_Robot/1.0.4(http://yappo.com/info/robot.html)

robot-language: perl

robot-description: The KO_Yappo_Robot robot is used to build the database

           for the Yappo search service by k,osawa

           (part of AOL).

           The robot runs on random days, and visits sites in a random order.

robot-history: The robot is a hobby project of k,osawa,

           started in Tokyo in 1997

robot-environment: hobby

modified-date: Fri, 18 Jul 1996 12:34:21 GMT

modified-by: KO



robot-id: labelgrabber

robot-name: LabelGrabber

robot-cover-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm

robot-details-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm

robot-owner-name: Kyle Jamieson

robot-owner-url: http://www.w3.org/PICS/refcode/LabelGrabber/index.htm

robot-owner-email: [email protected]

robot-status: active

robot-purpose: Grabs PICS labels from web pages, submits them to a label bureau

robot-type: standalone

robot-platform: windows, windows95, windowsNT, unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: label-grabber

robot-noindex: no

robot-host: head.w3.org

robot-from: no

robot-useragent: LabelGrab/1.1

robot-language: java

robot-description: The label grabber searches for PICS labels and submits

 them to a label bureau

robot-history: N/A

robot-environment: research

modified-date: Wed, 28 Jan 1998 17:32:52 GMT

modified-by: [email protected]



robot-id: larbin

robot-name: larbin

robot-cover-url: http://para.inria.fr/~ailleret/larbin/index-eng.html

robot-owner-name: Sebastien Ailleret

robot-owner-url: http://para.inria.fr/~ailleret/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: Your imagination is the only limit

robot-type: standalone

robot-platform: Linux

robot-availability: source (GPL), mail me for customization

robot-exclusion: yes

robot-exclusion-useragent: larbin

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: larbin (+mail)

robot-language: c++

robot-description: Crawling the web, such is my passion

robot-history: french research group (INRIA Verso)

robot-environment: hobby

modified-date: 2000-3-28

modified-by: Sebastien Ailleret



robot-id: legs

robot-name: legs

robot-cover-url: http://www.MagPortal.com/

robot-details-url:

robot-owner-name: Bill Dimm

robot-owner-url: http://www.HotNeuron.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: legs

robot-noindex: no

robot-host:

robot-from: yes

robot-useragent: legs

robot-language: perl5

robot-description: The legs robot is used to build the magazine article

 database for MagPortal.com.

robot-history:

robot-environment: service

modified-date: Wed, 22 Mar 2000 14:10:49 GMT

modified-by: Bill Dimm



robot-id: linkidator

robot-name: Link Validator

robot-cover-url:

robot-details-url:

robot-owner-name: Thomas Gimon

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix, windows

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Linkidator

robot-noindex: yes

robot-nofollow: yes

robot-host: *.mitre.org

robot-from: yes

robot-useragent: Linkidator/0.93

robot-language: perl5

robot-description: Recursively checks all links on a site, looking for

 broken or redirected links.  Checks all off-site links using HEAD

 requests and does not progress further.  Designed to behave well and to

 be very configurable.

robot-history: Built using WWW-Robot-0.022 perl module.  Currently in

 beta test.  Seeking approval for public release.

robot-environment: internal

modified-date: Fri, 20 Jan 2001 02:22:00 EST

modified-by: Thomas Gimon



robot-id:linkscan

robot-name:LinkScan

robot-cover-url:http://www.elsop.com/

robot-details-url:http://www.elsop.com/linkscan/overview.html

robot-owner-name:Electronic Software Publishing Corp. (Elsop)

robot-owner-url:http://www.elsop.com/

robot-owner-email:[email protected]

robot-status:Robot actively in use

robot-purpose:Link checker, SiteMapper, and HTML Validator

robot-type:Standalone

robot-platform:Unix, Linux, Windows 98/NT

robot-availability:Program is shareware

robot-exclusion:No

robot-exclusion-useragent:

robot-noindex:Yes

robot-host:*

robot-from:

robot-useragent:LinkScan Server/5.5 | LinkScan Workstation/5.5

robot-language:perl5

robot-description:LinkScan checks links, validates HTML and creates site maps

robot-history: First developed by Elsop in January,1997

robot-environment:Commercial

modified-date:Fri, 3 September 1999 17:00:00 PDT

modified-by: Kenneth R. Churilla



robot-id: linkwalker

robot-name: LinkWalker

robot-cover-url: http://www.seventwentyfour.com

robot-details-url: http://www.seventwentyfour.com/tech.html

robot-owner-name: Roy Bryant

robot-owner-url: 

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance, statistics

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: linkwalker

robot-noindex: yes

robot-host: *.seventwentyfour.com

robot-from: yes

robot-useragent: LinkWalker

robot-language: c++

robot-description: LinkWalker generates a database of links.

 We send reports of bad ones to webmasters.

robot-history: Constructed late 1997 through April 1998.

 In full service April 1998.

robot-environment: service

modified-date: Wed, 22 Apr 1998

modified-by: Roy Bryant



robot-id:lockon

robot-name:Lockon

robot-cover-url:

robot-details-url:

robot-owner-name:Seiji Sasazuka & Takahiro Ohmori

robot-owner-url:

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing

robot-type:standalone

robot-platform:UNIX

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:Lockon

robot-noindex:yes

robot-host:*.hitech.tuis.ac.jp

robot-from:yes

robot-useragent:Lockon/xxxxx

robot-language:perl5 

robot-description:This robot gathers only HTML documents.

robot-history:This robot was developed at the Tokyo University of Information Sciences in 1998.

robot-environment:research

modified-date:Tue, 10 Nov 1998 20:00:00 GMT

modified-by:Seiji Sasazuka & Takahiro Ohmori



robot-id:logo_gif

robot-name: logo.gif Crawler

robot-cover-url: http://www.inm.de/projects/logogif.html

robot-details-url:

robot-owner-name: Sevo Stille

robot-owner-url: http://www.inm.de/people/sevo

robot-owner-email: [email protected]

robot-status: under development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: logo_gif_crawler

robot-noindex: no

robot-host: *.inm.de

robot-from: yes

robot-useragent: logo.gif crawler

robot-language: perl

robot-description: meta-indexing engine for corporate logo graphics

 The robot runs at irregular intervals and will only pull a start page and

 its associated /.*logo\.gif/i (if any). It will be terminated once a

 statistically significant number of samples has been collected.

robot-history: logo.gif is part of the design diploma of Markus Weisbeck,

 and tries to analyze the abundance of the logo metaphor in WWW

 corporate design.

 The crawler and image database were written by Sevo Stille and Peter

 Frank of the Institut für Neue Medien, respectively.

robot-environment: research, statistics

modified-date: 25.5.97

modified-by: Sevo Stille



robot-id:           lycos

robot-name:         Lycos

robot-cover-url:    http://lycos.cs.cmu.edu/

robot-details-url:

robot-owner-name:   Dr. Michael L. Mauldin

robot-owner-url:    http://fuzine.mt.cs.cmu.edu/mlm/home.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         fuzine.mt.cs.cmu.edu, lycos.com

robot-from:         

robot-useragent:    Lycos/x.x

robot-language:     

robot-description:  This is a research program in providing information

	retrieval and discovery in the WWW, using a finite memory

	model of the web to guide intelligent, directed searches for

	specific information needs.

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id:           macworm

robot-name:         Mac WWWWorm

robot-cover-url:    

robot-details-url:

robot-owner-name:   Sebastien Lemieux

robot-owner-url:    

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     Macintosh

robot-availability: none

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     hypercard

robot-description:  A French keyword-searching robot for the Mac. The author

	has decided not to release this robot to the public.

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: magpie

robot-name: Magpie

robot-cover-url:

robot-details-url:

robot-owner-name: Keith Jones

robot-owner-url: 

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: unix

robot-availability:

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: no

robot-host: *.blueberry.co.uk, 194.70.52.*, 193.131.167.144

robot-from: no

robot-useragent: Magpie/1.0

robot-language: perl5

robot-description: Used to obtain information from a specified list of web pages for local indexing. Runs every two hours, and visits only a small number of sites.

robot-history: Part of a research project. Alpha testing from 10 July 1996, Beta testing from 10 September.

robot-environment: research

modified-date: Wed, 10 Oct 1996 13:15:00 GMT

modified-by: Keith Jones



robot-id: marvin

robot-name: marvin/infoseek

robot-details-url:

robot-cover-url: http://www.infoseek.de/

robot-owner-name: WSI Webseek Infoservice GmbH & Co KG.

robot-owner-url: http://www.infoseek.de/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: marvin

robot-noindex: yes

robot-nofollow: yes

robot-host: arthur*.sda.t-online.de

robot-from: yes

robot-useragent: marvin/infoseek ([email protected])

robot-language: java

robot-description: 

robot-history: Born 4 Feb 2001; replaces Infoseek Sidewinder

robot-environment: commercial

modified-date: Fri, 11 May 2001 17:28:52 GMT



robot-id: mattie

robot-name: Mattie

robot-cover-url: http://www.mcw.aarkayn.org

robot-details-url: http://www.mcw.aarkayn.org/web/mattie.asp

robot-owner-name: Matt

robot-owner-url: http://www.mcw.aarkayn.org

robot-owner-email: [email protected]

robot-status: Active

robot-purpose: Procurement Spider

robot-type: Standalone

robot-platform: UNIX

robot-availability: None

robot-exclusion: Yes

robot-exclusion-useragent: mattie

robot-noindex: N/A

robot-nofollow: Yes

robot-host: mattie.mcw.aarkayn.org

robot-from: Yes

robot-useragent: M/3.8

robot-language: C++

robot-description: Mattie is an all-source procurement spider.

robot-history: Created 2000 Mar. 03 Fri. 18:48:16 -0500 GMT (R) as an MP3

 spider, Mattie was reborn 2002 Jul. 07 Sun. 03:47:29 -0500 GMT (R) as an

 all-source procurement spider.

robot-environment: Hobby

modified-date: Fri, 13 Sep 2002 00:36:13 GMT

modified-by: Matt



robot-id: mediafox

robot-name: MediaFox

robot-cover-url: none

robot-details-url: none

robot-owner-name: Lars Eilebrecht   

robot-owner-url: http://www.home.unix-ag.org/sfx/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing and maintenance

robot-type: standalone

robot-platform: (Java)

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: mediafox

robot-noindex: yes

robot-host: 141.99.*.*

robot-from: yes

robot-useragent: MediaFox/x.y

robot-language: Java

robot-description: The robot is used to index meta information of a

                   specified set of documents and update a database

                   accordingly.

robot-history: Project at the University of Siegen

robot-environment: research

modified-date: Fri Aug 14 03:37:56 CEST 1998

modified-by: Lars Eilebrecht



robot-id:merzscope

robot-name:MerzScope

robot-cover-url:http://www.merzcom.com

robot-details-url:http://www.merzcom.com

robot-owner-name:(Client based robot)

robot-owner-url:(Client based robot)

robot-owner-email:

robot-status:actively in use

robot-purpose:WebMapping

robot-type:standalone

robot-platform:	(Java Based) unix,windows95,windowsNT,os2,mac etc ..

robot-availability:binary

robot-exclusion: yes

robot-exclusion-useragent: MerzScope

robot-noindex: no

robot-host:(Client Based)

robot-from:

robot-useragent: MerzScope

robot-language:	java

robot-description: The robot is part of a Web-Mapping package called MerzScope,

	 to be used mainly by consultants and webmasters to create and

	 publish maps on and of the World Wide Web.

robot-history: 

robot-environment:

modified-date: Fri, 13 March 1997 16:31:00

modified-by: Philip Lenir, MerzScope lead developer



robot-id:		meshexplorer

robot-name:		NEC-MeshExplorer

robot-cover-url:	http://netplaza.biglobe.or.jp/

robot-details-url:	http://netplaza.biglobe.or.jp/keyword.html

robot-owner-name:	web search service maintenance group

robot-owner-url:	http://netplaza.biglobe.or.jp/keyword.html

robot-owner-email:	[email protected]

robot-status:		active

robot-purpose:		indexing

robot-type:		standalone

robot-platform:		unix

robot-availability:	none

robot-exclusion:	yes

robot-exclusion-useragent:	NEC-MeshExplorer

robot-noindex:		no

robot-host:		meshsv300.tk.mesh.ad.jp

robot-from:		yes

robot-useragent:	NEC-MeshExplorer

robot-language:		c

robot-description:	The NEC-MeshExplorer robot is used to build database for the NETPLAZA

 search service operated by NEC Corporation. The robot searches URLs

 around sites in Japan (the JP domain).

 The robot runs every day, and visits sites in a random order.

robot-history: Prototype version of this robot was developed in C&C Research

 Laboratories, NEC Corporation. Current robot (Version 1.0) is based

 on the prototype and has more functions.

robot-environment:	research

modified-date:		Jan 1, 1997

modified-by:		Nobuya Kubo, Hajime Takano



robot-id: MindCrawler

robot-name: MindCrawler

robot-cover-url: http://www.mindpass.com/_technology_faq.htm

robot-details-url:

robot-owner-name: Mindpass

robot-owner-url: http://www.mindpass.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: MindCrawler

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: MindCrawler

robot-language: c++

robot-description: 

robot-history:

robot-environment:

modified-date: Tue Mar 28 11:30:09 CEST 2000

modified-by:



robot-id: mnogosearch

robot-name: mnoGoSearch search engine software

robot-cover-url: http://www.mnogosearch.org

robot-details-url: http://www.mnogosearch.org/features.html

robot-owner-name: Lavtech.com corp.

robot-owner-url: http://www.mnogosearch.org

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, windows, mac

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: udmsearch

robot-noindex: yes

robot-host: *

robot-from: no

robot-useragent: UdmSearch

robot-language: c

robot-description: mnoGoSearch search engine software (formerly known

 as UdmSearch) is an advanced search solution for large-scale websites

 and intranets. It is based on an SQL database and supports numerous

 features.

robot-history: Formerly known as UdmSearch, it was developed as the search

  engine for the Russian republic of Udmurtia.

robot-environment: commercial

modified-date: Wed, 12 Sep 2001

modified-by: Dmitry Tkatchenko



robot-id:moget

robot-name:moget

robot-cover-url:

robot-details-url:

robot-owner-name:NTT-ME Information Xing, Inc.

robot-owner-url:http://www.nttx.co.jp

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing,statistics

robot-type:standalone

robot-platform:unix

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:moget

robot-noindex:yes

robot-host:*.goo.ne.jp

robot-from:yes

robot-useragent:moget/1.0

robot-language:c

robot-description: This robot is used to build the database for the search service operated by goo.

robot-history:

robot-environment:service

modified-date:Thu, 30 Mar 2000 18:40:37 GMT

modified-by:[email protected]



robot-id:           momspider

robot-name:         MOMspider

robot-cover-url:    http://www.ics.uci.edu/WebSoft/MOMspider/

robot-details-url:

robot-owner-name:   Roy T. Fielding

robot-owner-url:    http://www.ics.uci.edu/dir/grad/Software/fielding

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      maintenance, statistics

robot-type:         standalone

robot-platform:     UNIX

robot-availability: source

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         yes

robot-useragent:    MOMspider/1.00 libwww-perl/0.40

robot-language:     perl 4

robot-description:  Validates links and generates statistics. It can be

	run from anywhere.

robot-history:      Originated as a research project at the University of

	California, Irvine, in 1993. Presented at the First

	International WWW Conference in Geneva, 1994.

robot-environment:

modified-date:      Sat May 6 08:11:58 1995	

modified-by:        [email protected]



robot-id:           monster

robot-name:         Monster

robot-cover-url:    http://www.neva.ru/monster.list/russian.www.html

robot-details-url:  

robot-owner-name:   Dmitry Dicky

robot-owner-url:    http://wild.stu.neva.ru/

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      maintenance, mirroring

robot-type:         standalone

robot-platform:     UNIX (Linux)

robot-availability: binary

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         wild.stu.neva.ru

robot-from:         

robot-useragent:    Monster/vX.X.X -$TYPE ($OSTYPE)

robot-language:     C

robot-description:  The Monster has two parts - a Web searcher and a Web analyzer.

	The searcher compiles a list of the WWW sites in a desired

	domain (for example, all WWW sites in the mit.edu, com,

	or org domains).

	In the User-agent field, $TYPE is set to 'Mapper' for the Web

	searcher and 'StAlone' for the Web analyzer.

robot-history:      A full (presumably complete) list of ex-USSR sites has now been produced.

robot-environment:  

modified-date:      Tue Jun 25 10:03:36 1996

modified-by:



robot-id: motor

robot-name: Motor

robot-cover-url: http://www.cybercon.de/Motor/index.html

robot-details-url:

robot-owner-name: Mr. Oliver Runge, Mr. Michael Goeckel

robot-owner-url: http://www.cybercon.de/index.html

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: mac

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: Motor

robot-noindex: no

robot-host: Michael.cybercon.technopark.gmd.de

robot-from: yes

robot-useragent: Motor/0.2

robot-language: 4th dimension

robot-description: The Motor robot is used to build the database for the

 www.webindex.de search service operated by CyberCon. The robot is under

 development - it runs at random intervals and visits sites in a

 priority-driven order (.de/.ch/.at first; root and robots.txt first).

robot-history: 

robot-environment: service

modified-date: Wed, 3 Jul 1996 15:30:00 +0100

modified-by: Michael Goeckel ([email protected])



robot-id: msnbot

robot-name: MSNBot

robot-cover-url: http://search.msn.com

robot-details-url: http://search.msn.com/msnbot.htm

robot-owner-name: Microsoft Corp.

robot-owner-url: http://www.microsoft.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Windows Server 2000, Windows Server 2003

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: msnbot

robot-noindex: yes

robot-host: 

robot-from: yes

robot-useragent: MSNBOT/0.1 (http://search.msn.com/msnbot.htm)

robot-language: C++

robot-description: MSN Search Crawler

robot-history: Developed by Microsoft Corp.

robot-environment: commercial

modified-date: June 23, 2003

modified-by: [email protected]



robot-id: muncher

robot-name: Muncher

robot-details-url: http://www.goodlookingcooking.co.uk/info.htm

robot-cover-url: http://www.goodlookingcooking.co.uk

robot-owner-name: Chris Ridings

robot-owner-url: http://www.goodlookingcooking.co.uk

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: muncher

robot-noindex: yes

robot-nofollow: yes

robot-host: www.goodlookingcooking.co.uk

robot-from: no

robot-useragent: yes

robot-language: perl

robot-description: Used to build the index for www.goodlookingcooking.co.uk.

 Seeks out cooking and recipe pages.

robot-history: Private project, September 2001

robot-environment: hobby

modified-date: Wed, 5 Sep 2001 19:21:00 GMT



robot-id: muninn

robot-name: Muninn

robot-cover-url: http://people.freenet.de/Muninn/eyrie.html

robot-details-url: http://people.freenet.de/Muninn/

robot-owner-name: Sandra Groth

robot-owner-url: http://santana.dynalias.net/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: source, data

robot-exclusion: yes

robot-exclusion-useragent: muninn

robot-noindex: yes

robot-nofollow: yes

robot-host: santana.dynalias.net, 80.185.*, *

robot-from: yes

robot-useragent: Muninn/0.1 libwww-perl-5.76

 (http://people.freenet.de/Muninn/)

robot-language: Perl5

robot-description: Muninn looks at museums within my reach and tells me about

 current exhibitions.

robot-history: It's hard to keep track of things. Automation helps.

robot-environment: hobby

modified-date: Thu Jun  3 16:36:47 CEST 2004

modified-by: Sandra Groth



robot-id: muscatferret

robot-name: Muscat Ferret

robot-cover-url: http://www.muscat.co.uk/euroferret/

robot-details-url:

robot-owner-name: Olly Betts

robot-owner-url: http://www.muscat.co.uk/~olly/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: MuscatFerret

robot-noindex: yes

robot-host: 193.114.89.*, 194.168.54.11

robot-from: yes

robot-useragent: MuscatFerret/

robot-language: c, perl5

robot-description: Used to build the database for the EuroFerret

 

robot-history:

robot-environment: service

modified-date: Tue, 21 May 1997 17:11:00 GMT

modified-by: [email protected]



robot-id: mwdsearch

robot-name: Mwd.Search

robot-cover-url: (none)

robot-details-url: (none)

robot-owner-name: Antti Westerberg

robot-owner-url: (none)

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix (Linux)

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: MwdSearch

robot-noindex: yes

robot-host: *.fifi.net

robot-from: no

robot-useragent: MwdSearch/0.1

robot-language: perl5, c

robot-description: Robot for indexing Finnish (top-level domain .fi)

                   web pages for a search engine called Fifi.

                   Visits sites in random order.

robot-history: (none)

robot-environment: service (+ commercial)

modified-date: Mon, 26 May 1997 15:55:02 EEST

modified-by: [email protected]



robot-id: myweb

robot-name: Internet Shinchakubin

robot-cover-url: http://naragw.sharp.co.jp/myweb/home/

robot-details-url:

robot-owner-name: SHARP Corp.

robot-owner-url: http://naragw.sharp.co.jp/myweb/home/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: find new links and changed pages

robot-type: standalone

robot-platform: Windows98

robot-availability: binary as bundled software

robot-exclusion: yes

robot-exclusion-useragent: sharp-info-agent

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: Mozilla/4.0 (compatible; sharp-info-agent v1.0; )

robot-language: Java

robot-description: Makes a list of new links and changed pages based

      on the user's frequently clicked pages over the past 31 days.

      The client may run this software once or a few times a day,

      manually or at a specified time.

robot-history: shipped for SHARP's PC users since Feb 2000

robot-environment: commercial

modified-date: Fri, 30 Jun 2000 19:02:52 JST

modified-by: Katsuo Doi 



robot-id: NDSpider

robot-name: NDSpider

robot-cover-url: http://www.NationalDirectory.com/addurl

robot-details-url: http://www.NationalDirectory.com/addurl

robot-owner-name: NationalDirectory.com

robot-owner-url: http://www.NationalDirectory.com

robot-owner-email: [email protected]

robot-status: Active

robot-purpose: Indexing

robot-type: Standalone

robot-platform: Unix platform

robot-availability: None

robot-exclusion: Yes

robot-exclusion-useragent:

robot-noindex:

robot-host: Blowfish.NationalDirectory.net

robot-from:

robot-useragent: NDSpider/1.5

robot-language: C

robot-description: It is designed to index the web.

robot-history: Development started on  05 December 1996

robot-environment: UNIX

modified-date: 14 March 2004

modified-by:



robot-id:           netcarta

robot-name:         NetCarta WebMap Engine

robot-cover-url:    http://www.netcarta.com/

robot-details-url:

robot-owner-name:   NetCarta WebMap Engine

robot-owner-url:    http://www.netcarta.com/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, maintenance, mirroring, statistics

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    NetCarta CyberPilot Pro

robot-language:     C++.

robot-description:  The NetCarta WebMap Engine is a general purpose, commercial

	spider. Packaged with a full GUI in the CyberPilot Pro

	product, it acts as a personal spider to work with a browser

	to facilitate context-based navigation.  The WebMapper

	product uses the robot to manage a site (site copy, site

	diff, and extensive link management facilities).  All

	versions can create publishable NetCarta WebMaps, which

	capture the crawled information.  If the robot sees a

	published map, it will return the published map rather than

	continuing its crawl. Since this is a personal spider, it

	will be launched from multiple domains. This robot tends to

	focus on a particular site.  No instance of the robot should

	have more than one outstanding request out to any given site

	at a time. The User-agent field contains a coded ID

	identifying the instance of the spider; specific users can

	be blocked via robots.txt using this ID.

robot-history:

robot-environment:

modified-date:      Sun Feb 18 02:02:49 1996.

modified-by:



robot-id:  netmechanic

robot-name:  NetMechanic

robot-cover-url: http://www.netmechanic.com

robot-details-url: http://www.netmechanic.com/faq.html

robot-owner-name: Tom Dahm

robot-owner-url:  http://iquest.com/~tdahm

robot-owner-email: [email protected]

robot-status: development

robot-purpose: Link and HTML validation

robot-type: standalone with web gateway

robot-platform: UNIX

robot-availability: via web page

robot-exclusion: Yes

robot-exclusion-useragent: WebMechanic

robot-noindex: no

robot-host: 206.26.168.18

robot-from: no

robot-useragent: NetMechanic

robot-language: C

robot-description:  NetMechanic is a link validation and

 HTML validation robot run using a web page interface.

robot-history:

robot-environment:

modified-date: Sat, 17 Aug 1996 12:00:00 GMT

modified-by:



robot-id: netscoop

robot-name: NetScoop

robot-cover-url: http://www-a2k.is.tokushima-u.ac.jp/search/index.html

robot-owner-name: Kenji Kita

robot-owner-url: http://www-a2k.is.tokushima-u.ac.jp/member/kita/index.html

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: UNIX

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: NetScoop

robot-host: alpha.is.tokushima-u.ac.jp, beta.is.tokushima-u.ac.jp

robot-useragent: NetScoop/1.0 libwww/5.0a

robot-language: C

robot-description: The NetScoop robot is used to build the database

                   for the NetScoop search engine.

robot-history: The robot has been used in the research project

               at the Faculty of Engineering, Tokushima University, Japan,

               since Dec. 1996.

robot-environment: research

modified-date: Fri, 10 Jan 1997.

modified-by: Kenji Kita



robot-id: newscan-online

robot-name: newscan-online

robot-cover-url: http://www.newscan-online.de/

robot-details-url: http://www.newscan-online.de/info.html

robot-owner-name: Axel Mueller

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Linux

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: newscan-online

robot-noindex: no

robot-host: *newscan-online.de

robot-from: yes

robot-useragent: newscan-online/1.1

robot-language: perl

robot-description: The newscan-online robot is used to build a database for

 the newscan-online news search service operated by smart information

 services. The robot runs daily and visits predefined sites in a random order.

robot-history: This robot finds its roots in a prereleased software for

 news filtering for Lotus Notes in 1995.

robot-environment: service

modified-date: Fri, 9 Apr 1999 11:45:00 GMT

modified-by: Axel Mueller



robot-id:           nhse

robot-name:         NHSE Web Forager

robot-cover-url:    http://nhse.mcs.anl.gov/

robot-details-url:

robot-owner-name:   Robert Olson

robot-owner-url:    http://www.mcs.anl.gov/people/olson/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *.mcs.anl.gov

robot-from:         yes

robot-useragent:    NHSEWalker/3.0

robot-language:     perl 5

robot-description:  to generate a Resource Discovery database

robot-history:      

robot-environment:

modified-date:      Fri May 5 15:47:55 1995

modified-by:



robot-id:           nomad

robot-name:         Nomad

robot-cover-url:    http://www.cs.colostate.edu/~sonnen/projects/nomad.html

robot-details-url:

robot-owner-name:   Richard Sonnen

robot-owner-url:    http://www.cs.colostate.edu/~sonnen/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:         *.cs.colostate.edu

robot-from:         no

robot-useragent:    Nomad-V2.x

robot-language:     Perl 4

robot-description:

robot-history:      Developed in 1995 at Colorado State University.

robot-environment:

modified-date:      Sat Jan 27 21:02:20 1996.

modified-by:



robot-id:           northstar

robot-name:         The NorthStar Robot

robot-cover-url:    http://comics.scs.unr.edu:7000/top.html

robot-details-url:

robot-owner-name:   Fred Barrie

robot-owner-url:    

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      

robot-host:         frognot.utdallas.edu, utdallas.edu, cnidir.org

robot-from:         yes

robot-useragent:    NorthStar

robot-language:     

robot-description:  Recent runs (26 April 94) will concentrate on textual

	analysis of the Web versus GopherSpace (from the Veronica

	data) as well as indexing.

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: objectssearch

robot-name: ObjectsSearch

robot-cover-url: http://www.ObjectsSearch.com/

robot-details-url: 

robot-owner-name: Software Objects, Inc

robot-owner-url: http://www.thesoftwareobjects.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, windows

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: ObjectsSearch

robot-noindex: yes

robot-host:

robot-from: yes

robot-useragent: ObjectsSearch/0.01

robot-language: java

robot-description: Objects Search Spider

robot-history: Developed by Software Objects Inc.

robot-environment: commercial

modified-date: Friday March 05, 2004

modified-by: [email protected]



robot-id: occam

robot-name: Occam

robot-cover-url: http://www.cs.washington.edu/research/projects/ai/www/occam/

robot-details-url:

robot-owner-name: Marc Friedman

robot-owner-url: http://www.cs.washington.edu/homes/friedman/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Occam

robot-noindex: no

robot-host: gentian.cs.washington.edu, sekiu.cs.washington.edu, saxifrage.cs.washington.edu

robot-from: yes

robot-useragent: Occam/1.0

robot-language: CommonLisp, perl4

robot-description: The robot takes high-level queries, breaks them down into

                multiple web requests, and answers them by combining disparate

                data gathered in one minute from numerous web sites, or from

                the robot's cache.  Currently the only user is me.

robot-history: The robot is a descendant of Rodney,

               an earlier project at the University of Washington.

robot-environment: research

modified-date: Thu, 21 Nov 1996 20:30 GMT

modified-by: [email protected] (Marc Friedman)



robot-id:           octopus

robot-name:         HKU WWW Octopus

robot-cover-url:    http://phoenix.cs.hku.hk:1234/~jax/w3rui.shtml

robot-details-url:

robot-owner-name:   Law Kwok Tung , Lee Tak Yeung , Lo Chun Wing

robot-owner-url:    http://phoenix.cs.hku.hk:1234/~jax

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no.

robot-exclusion-useragent:

robot-noindex:

robot-host:         phoenix.cs.hku.hk

robot-from:         yes

robot-useragent:    HKU WWW Robot, 

robot-language:     Perl 5, C, Java.

robot-description:  HKU Octopus is an ongoing project for resource discovery in

	the Hong Kong and China WWW domain. It is a research

	project conducted by three undergraduates at the University

	of Hong Kong.

robot-history:

robot-environment:

modified-date:      Thu Mar  7 14:21:55 1996.

modified-by:



robot-id:OntoSpider

robot-name:OntoSpider

robot-cover-url:http://ontospider.i-n.info

robot-details-url:http://ontospider.i-n.info

robot-owner-name:C. Fenijn

robot-owner-url:http://ontospider.i-n.info

robot-owner-email:[email protected]

robot-status:development

robot-purpose:statistics

robot-type:standalone

robot-platform:unix

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:

robot-noindex:no

robot-host:ontospider.i-n.info

robot-from:no

robot-useragent:OntoSpider/1.0 libwww-perl/5.65

robot-language:perl5

robot-description:Focused crawler for research purposes

robot-history:Research

robot-environment:research

modified-date:Sun Mar 28 14:39:38

modified-by:C. Fenijn



robot-id:			openfind

robot-name:			Openfind data gatherer

robot-cover-url:		http://www.openfind.com.tw/

robot-details-url:		http://www.openfind.com.tw/robot.html

robot-owner-name:

robot-owner-url:

robot-owner-email:		[email protected]

robot-status:			active

robot-purpose:			indexing

robot-type:			standalone

robot-platform:

robot-availability:

robot-exclusion:		yes

robot-exclusion-useragent:

robot-noindex:

robot-host:			66.7.131.132

robot-from:

robot-useragent:		Openfind data gatherer, Openbot/3.0+([email protected];+http://www.openfind.com.tw/robot.html)

robot-language:

robot-description:

robot-history:

robot-environment:

modified-date:			Thu, 26 Apr 2001 02:55:21 GMT

modified-by:			stanislav shalunov 



robot-id: orb_search

robot-name: Orb Search

robot-cover-url: http://orbsearch.home.ml.org

robot-details-url: http://orbsearch.home.ml.org

robot-owner-name: Matt Weber

robot-owner-url: http://www.weberworld.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: Orbsearch/1.0

robot-noindex: yes

robot-host: cow.dyn.ml.org, *.dyn.ml.org

robot-from: yes

robot-useragent: Orbsearch/1.0

robot-language: Perl5

robot-description: Orbsearch builds the database for Orb Search Engine.

  It runs when requested.

robot-history: This robot was started as a hobby.

robot-environment: hobby

modified-date: Sun, 31 Aug 1997 02:28:52 GMT

modified-by: Matt Weber



robot-id: packrat

robot-name: Pack Rat

robot-cover-url: http://web.cps.msu.edu/~dexterte/isl/packrat.html

robot-details-url: 

robot-owner-name: Terry Dexter

robot-owner-url: http://web.cps.msu.edu/~dexterte

robot-owner-email: [email protected]

robot-status: development

robot-purpose: both maintenance and mirroring

robot-type: standalone

robot-platform: unix

robot-availability:  none at the moment; source when developed.

robot-exclusion: yes 

robot-exclusion-useragent: packrat or *

robot-noindex: no, not yet

robot-host: cps.msu.edu

robot-from: 

robot-useragent: PackRat/1.0

robot-language: perl with libwww-5.0

robot-description: Used for local maintenance and for gathering

	web pages so that local statistical info can be used in

	artificial intelligence programs. Funded by NEMOnline.

robot-history: In the making...

robot-environment: research

modified-date: Tue, 20 Aug 1996 15:45:11

modified-by: Terry Dexter



robot-id:pageboy

robot-name:PageBoy

robot-cover-url:http://www.webdocs.org/

robot-details-url:http://www.webdocs.org/ 

robot-owner-name:Chihiro Kuroda 

robot-owner-url:http://www.webdocs.org/

robot-owner-email:[email protected]

robot-status:development

robot-purpose:indexing

robot-type:standalone

robot-platform:unix

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:pageboy

robot-noindex:yes

robot-nofollow:yes

robot-host:*.webdocs.org

robot-from:yes

robot-useragent:PageBoy/1.0

robot-language:c

robot-description:The robot visits at regular intervals.

robot-history:none

robot-environment:service

modified-date:Fri, 21 Oct 1999 17:28:52 GMT

modified-by:webdocs



robot-id: parasite

robot-name: ParaSite

robot-cover-url: http://www.ianett.com/parasite/

robot-details-url: http://www.ianett.com/parasite/

robot-owner-name: iaNett.com

robot-owner-url: http://www.ianett.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: ParaSite

robot-noindex: yes

robot-nofollow: yes

robot-host: *.ianett.com

robot-from: yes

robot-useragent: ParaSite/0.21 (http://www.ianett.com/parasite/)

robot-language: c++

robot-description: Builds index for ianett.com search database. Runs

 continuously.

robot-history: Second generation of ianett.com spidering technology,

 originally called Sven.

robot-environment: service

modified-date: July 28, 2000

modified-by: Marty Anstey



robot-id:               patric

robot-name:             Patric

robot-cover-url:        http://www.nwnet.net/technical/ITR/index.html

robot-details-url:      http://www.nwnet.net/technical/ITR/index.html

robot-owner-name:       [email protected]

robot-owner-url:        http://www.nwnet.net/company/staff/toney

robot-owner-email:      [email protected]

robot-status:           development

robot-purpose:          statistics

robot-type:             standalone

robot-platform:         unix

robot-availability:     data

robot-exclusion:        yes

robot-exclusion-useragent: patric       

robot-noindex:          yes     

robot-host:             *.nwnet.net     

robot-from:             no

robot-useragent:        Patric/0.01a            

robot-language:         perl

robot-description:      (contained at http://www.nwnet.net/technical/ITR/index.html )

robot-history:          (contained at http://www.nwnet.net/technical/ITR/index.html )

robot-environment:      service 

modified-date:          Thu, 15 Aug 1996

modified-by:            [email protected]



robot-id: pegasus

robot-name: pegasus

robot-cover-url: http://opensource.or.id/projects.html

robot-details-url: http://pegasus.opensource.or.id

robot-owner-name: A.Y.Kiky Shannon

robot-owner-url: http://go.to/ayks

robot-owner-email: [email protected]

robot-status: inactive - open source

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: source, binary

robot-exclusion: yes

robot-exclusion-useragent: pegasus

robot-noindex: yes

robot-host: *

robot-from: yes

robot-useragent: web robot PEGASUS

robot-language: perl5

robot-description: pegasus gathers information from HTML pages (7 important

 tags). The indexing process can be started based on starting URL(s) or a range

 of IP addresses.

robot-history: This robot was created as an implementation of a final project on

 Informatics Engineering Department, Institute of Technology Bandung, Indonesia.

robot-environment: research

modified-date: Fri, 20 Oct 2000 14:58:40 GMT

modified-by: A.Y.Kiky Shannon



robot-id:           perignator

robot-name:         The Peregrinator

robot-cover-url:    http://www.maths.usyd.edu.au:8000/jimr/pe/Peregrinator.html

robot-details-url:

robot-owner-name:   Jim Richardson

robot-owner-url:    http://www.maths.usyd.edu.au:8000/jimr.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         yes

robot-useragent:    Peregrinator-Mathematics/0.7

robot-language:     perl 4

robot-description:  This robot is being used to generate an index of documents

	on Web sites connected with mathematics and statistics. It

	ignores off-site links, so does not stray from a list of

	servers specified initially.

robot-history:      commenced operation in August 1994

robot-environment:

modified-date:      

modified-by:
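The Peregrinator record above says the robot ignores off-site links so it never strays from the list of servers it was given initially. A minimal sketch of that filtering step, in Python rather than the robot's actual Perl 4, with an illustrative host list:

```python
from urllib.parse import urljoin, urlparse

# Illustrative allow-list; the real robot reads its server list at startup.
ALLOWED_HOSTS = {"www.maths.usyd.edu.au"}

def on_site_links(base_url, hrefs):
    """Resolve hrefs against base_url, keeping only links to allowed hosts."""
    kept = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).hostname in ALLOWED_HOSTS:
            kept.append(absolute)
    return kept
```

Because the check runs on every resolved link before it is queued, the crawl frontier can never contain an off-site URL in the first place.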



robot-id: perlcrawler

robot-name: PerlCrawler 1.0

robot-cover-url: http://perlsearch.hypermart.net/

robot-details-url: http://www.xav.com/scripts/xavatoria/index.html

robot-owner-name: Matt McKenzie 

robot-owner-url: http://perlsearch.hypermart.net/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: perlcrawler

robot-noindex: yes

robot-host: server5.hypermart.net

robot-from: yes

robot-useragent: PerlCrawler/1.0 Xavatoria/2.0

robot-language: perl5

robot-description: The PerlCrawler robot is designed to index and build

 a database of pages relating to the Perl programming language.

robot-history: Originated in modified form on 25 June 1998

robot-environment: hobby

modified-date: Fri, 18 Dec 1998 23:37:40 GMT

modified-by: Matt McKenzie



robot-id:           phantom

robot-name:         Phantom

robot-cover-url:    http://www.maxum.com/phantom/

robot-details-url:

robot-owner-name:   Larry Burke

robot-owner-url:    http://www.aktiv.com/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     Macintosh

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    Duppies

robot-language:

robot-description:  Designed to allow webmasters to provide a searchable index

	of their own site as well as to other sites, perhaps with

	similar content.

robot-history:

robot-environment:

modified-date:      Fri Jan 19 05:08:15 1996.

modified-by:



robot-id: phpdig

robot-name: PhpDig

robot-cover-url: http://phpdig.toiletoine.net/

robot-details-url: http://phpdig.toiletoine.net/

robot-owner-name: Antoine Bajolet

robot-owner-url: http://phpdig.toiletoine.net/

robot-owner-email: [email protected]

robot-status: *

robot-purpose: indexing

robot-type: standalone

robot-platform: all supported by Apache/php/mysql

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: phpdig

robot-noindex: yes

robot-host: yes

robot-from: no

robot-useragent: phpdig/x.x.x

robot-language: php 4.x

robot-description: Small robot and search engine written in php.

robot-history: first written 2001-03-30

robot-environment: hobby

modified-date: Sun, 21 Nov 2001 20:01:19 GMT

modified-by: Antoine Bajolet



robot-id: piltdownman

robot-name: PiltdownMan

robot-cover-url: http://profitnet.bizland.com/

robot-details-url: http://profitnet.bizland.com/piltdownman.html

robot-owner-name: Daniel Vil�

robot-owner-url: http://profitnet.bizland.com/aboutus.html

robot-owner-email: [email protected]

robot-status: active

robot-purpose: statistics

robot-type: standalone

robot-platform: windows95, windows98, windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: piltdownman

robot-noindex: no

robot-nofollow: no

robot-host: 62.36.128.*, 194.133.59.*, 212.106.215.*

robot-from: no

robot-useragent: PiltdownMan/1.0 [email protected]

robot-language: c++

robot-description: The PiltdownMan robot is used to get a

                   list of links from the search engines

                   in our database. These links are

                   followed, and the page that they refer

                   to is downloaded to get some statistics

                   from them.

                   The robot runs once a month, more or

                   less, and visits the first 10 pages

                   listed in every search engine, for a

                   group of keywords.

robot-history: To maintain a database of search engines,

               we needed an automated tool. That's why

               we began the creation of this robot.

robot-environment: service

modified-date: Mon, 13 Dec 1999 21:50:32 GMT

modified-by: Daniel Vil�



robot-id: pimptrain

robot-name: Pimptrain.com's robot

robot-cover-url: http://www.pimptrain.com/search.cgi

robot-details-url: http://www.pimptrain.com/search.cgi

robot-owner-name: Bryan Ankielewicz

robot-owner-url: http://www.pimptrain.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: source;data

robot-exclusion: yes

robot-exclusion-useragent: Pimptrain

robot-noindex: yes

robot-host: pimptrain.com

robot-from: *

robot-useragent: Mozilla/4.0 (compatible: Pimptrain's robot)

robot-language: perl5

robot-description: Crawls remote sites as part of a search engine program

robot-history: Implemented in 2001

robot-environment: commercial

modified-date: May 11, 2001

modified-by: Bryan Ankielewicz



robot-id:           pioneer

robot-name:         Pioneer

robot-cover-url:    http://sequent.uncfsu.edu/~micah/pioneer.html

robot-details-url:

robot-owner-name:   Micah A. Williams

robot-owner-url:    http://sequent.uncfsu.edu/~micah/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         *.uncfsu.edu or flyer.ncsc.org

robot-from:         yes

robot-useragent:    Pioneer

robot-language:     C.

robot-description:  Pioneer is part of an undergraduate research

	project.

robot-history:

robot-environment:

modified-date:      Mon Feb  5 02:49:32 1996.

modified-by:



robot-id:           pitkow

robot-name:         html_analyzer

robot-cover-url:    

robot-details-url:

robot-owner-name:   James E. Pitkow

robot-owner-url:    

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  to check validity of Web servers. I'm not sure if it has

	ever been run remotely.

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: pjspider

robot-name: Portal Juice Spider

robot-cover-url: http://www.portaljuice.com

robot-details-url: http://www.portaljuice.com/pjspider.html

robot-owner-name: Nextopia Software Corporation

robot-owner-url: http://www.portaljuice.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: pjspider

robot-noindex: yes

robot-host: *.portaljuice.com, *.nextopia.com

robot-from: yes

robot-useragent: PortalJuice.com/4.0

robot-language: C/C++

robot-description: Indexing web documents for Portal Juice vertical portal

 search engine

robot-history: Indexing the web since 1998 for the purposes of offering our

 commerical Portal Juice search engine services.

robot-environment: service

modified-date: Wed Jun 23 17:00:00 EST 1999

modified-by: [email protected]



robot-id:           pka

robot-name:         PGP Key Agent

robot-cover-url:    http://www.starnet.it/pgp

robot-details-url:

robot-owner-name:   Massimiliano Pucciarelli

robot-owner-url:    http://www.starnet.it/puma

robot-owner-email:  [email protected]

robot-status:       Active

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     UNIX, Windows NT

robot-availability: none

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         salerno.starnet.it

robot-from:         yes

robot-useragent:    PGP-KA/1.2

robot-language:     Perl 5

robot-description:  This program searches for the PGP public key of the

                    specified user.

robot-history:      Originated as a research project at Salerno 

                    University in 1995.

robot-environment:  Research

modified-date:      June 27 1996.

modified-by:        Massimiliano Pucciarelli



robot-id: plumtreewebaccessor

robot-name: PlumtreeWebAccessor 

robot-cover-url:

robot-details-url: http://www.plumtree.com/

robot-owner-name: Joseph A. Stanko 

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing for the Plumtree Server

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: PlumtreeWebAccessor

robot-noindex: yes

robot-host:

robot-from: yes

robot-useragent: PlumtreeWebAccessor/0.9

robot-language: c++

robot-description: The Plumtree Web Accessor is a component that

 customers can add to the

        Plumtree Server to index documents on the World Wide Web.

robot-history:

robot-environment: commercial

modified-date: Thu, 17 Dec 1998

modified-by: Joseph A. Stanko 



robot-id: poppi

robot-name: Poppi

robot-cover-url: http://members.tripod.com/poppisearch

robot-details-url: http://members.tripod.com/poppisearch

robot-owner-name: Antonio Provenzano

robot-owner-url:

robot-owner-email:

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix/linux

robot-availability: none

robot-exclusion:

robot-exclusion-useragent:

robot-noindex: yes

robot-host:

robot-from:

robot-useragent: Poppi/1.0

robot-language: C

robot-description: Poppi is a crawler to index the web that runs weekly 

 gathering and indexing hypertextual, multimedia and executable file 

 formats

robot-history: Created by Antonio Provenzano in April 2000; acquired

 by Tomi Officine Multimediali srl and soon to be released as a

 commercial service

robot-environment: service

modified-date: Mon, 22 May 2000 15:47:30 GMT

modified-by: Antonio Provenzano



robot-id: portalb

robot-name: PortalB Spider

robot-cover-url: http://www.portalb.com/

robot-details-url:

robot-owner-name: PortalB Spider Bug List

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: PortalBSpider

robot-noindex: yes

robot-nofollow: yes

robot-host: spider1.portalb.com, spider2.portalb.com, etc.

robot-from: no

robot-useragent: PortalBSpider/1.0 ([email protected])

robot-language: C++

robot-description: The PortalB Spider indexes selected sites for

 high-quality business information.

robot-history:

robot-environment: service



robot-id: psbot

robot-name: psbot

robot-cover-url: http://www.picsearch.com/

robot-details-url: http://www.picsearch.com/bot.html

robot-owner-name: picsearch AB

robot-owner-url: http://www.picsearch.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: psbot

robot-noindex: yes

robot-nofollow: yes

robot-host: *.picsearch.com

robot-from: yes

robot-useragent: psbot/0.X (+http://www.picsearch.com/bot.html)

robot-language: c, c++

robot-description: Spider for www.picsearch.com 

robot-history: Developed and tested in 2000/2001

robot-environment: commercial

modified-date: Tue, 21 Aug 2001 10:55:38 CEST 2001

modified-by: [email protected]



robot-id: Puu

robot-name: GetterroboPlus Puu

robot-details-url: http://marunaka.homing.net/straight/getter/

robot-cover-url: http://marunaka.homing.net/straight/

robot-owner-name: marunaka

robot-owner-url: http://marunaka.homing.net

robot-owner-email: [email protected]

robot-status: active: robot actively in use

robot-purpose: gathering, maintenance

  - gathering: gathers data from the original standard tags for Puu, which

 contain the information of the sites registered in my search engine.

  - maintenance: link validation

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes (Puu patrols only URLs registered in my search engine)

robot-exclusion-useragent:  Getterrobo-Plus

robot-noindex:  no

robot-host: straight FLASH!! Getterrobo-Plus, *.homing.net

robot-from: yes

robot-useragent: straight FLASH!! GetterroboPlus 1.5

robot-language: perl5

robot-description:

  The Puu robot is used to gather data from sites registered in the search

 engine "straight FLASH!!", to build an announcement page on the state of

 renewal of registered sites in "straight FLASH!!".

 Robot runs every day.

robot-history:

  This robot patrols the sites registered in the search engine "straight FLASH!!"

robot-environment: hobby

modified-date: Fri, 26 Jun 1998



robot-id:           python

robot-name:         The Python Robot

robot-cover-url:    http://www.python.org/

robot-details-url:  

robot-owner-name:   Guido van Rossum

robot-owner-url:    http://www.python.org/~guido/

robot-owner-email:  [email protected]

robot-status:       retired

robot-purpose:      

robot-type:         

robot-platform:     

robot-availability: none

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: raven 

robot-name: Raven Search

robot-cover-url: http://ravensearch.tripod.com

robot-details-url: http://ravensearch.tripod.com

robot-owner-name: Raven Group

robot-owner-url: http://ravensearch.tripod.com

robot-owner-email: [email protected]

robot-status: Development: robot under development

robot-purpose: Indexing: gather content for commercial query engine.

robot-type: Standalone: a separate program

robot-platform: Unix, Windows98, WindowsNT, Windows2000

robot-availability: None

robot-exclusion: Yes

robot-exclusion-useragent: Raven

robot-noindex: Yes

robot-nofollow: Yes

robot-host: 192.168.1.*

robot-from: Yes

robot-useragent: Raven-v2

robot-language: Perl-5

robot-description: Raven was written for the express purpose of indexing the web.

 It can parallel process hundreds of URLs at a time. It runs on a sporadic basis

 as testing continues. It is really several programs running concurrently.

 It takes four computers to run Raven Search. Scalable in sets of four.

robot-history: This robot is new. First active on March 25, 2000.

robot-environment: Commercial: is a commercial product. Possibly GNU later ;-)

modified-date: Fri, 25 Mar 2000 17:28:52 GMT

modified-by: Raven Group



robot-id:           rbse

robot-name:         RBSE Spider

robot-cover-url:    http://rbse.jsc.nasa.gov/eichmann/urlsearch.html

robot-details-url:

robot-owner-name:   David Eichmann

robot-owner-url:    http://rbse.jsc.nasa.gov/eichmann/home.html

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      indexing, statistics

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      

robot-host:         rbse.jsc.nasa.gov (192.88.42.10)

robot-from:         

robot-useragent:    

robot-language:     C, oracle, wais

robot-description:  Developed and operated as part of the NASA-funded Repository

	Based Software Engineering Program at the Research Institute

	for Computing and Information Systems, University of Houston

	- Clear Lake.

robot-history:      

robot-environment:

modified-date:      Thu May 18 04:47:02 1995

modified-by:



robot-id:           resumerobot

robot-name:         Resume Robot

robot-cover-url:    http://www.onramp.net/proquest/resume/robot/robot.html

robot-details-url:

robot-owner-name:   James Stakelum

robot-owner-url:    http://www.onramp.net/proquest/resume/java/resume.html

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing.

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    Resume Robot

robot-language:     C++.

robot-description:

robot-history:

robot-environment:

modified-date:      Tue Mar 12 15:52:25 1996.

modified-by:



robot-id: rhcs

robot-name: RoadHouse Crawling System

robot-cover-url: http://stage.perceval.be (under development)

robot-details-url:

robot-owner-name: Gregoire Welraeds, Emmanuel Bergmans

robot-owner-url: http://www.perceval.be

robot-owner-email: [email protected]

robot-status: development

robot-purpose1: indexing

robot-purpose2: maintenance

robot-purpose3: statistics

robot-type: standalone

robot-platform1: unix (FreeBSD & Linux)

robot-availability: none

robot-exclusion: no (under development)

robot-exclusion-useragent: RHCS

robot-noindex: no (under development)

robot-host: stage.perceval.be

robot-from: no

robot-useragent: RHCS/1.0a

robot-language: c

robot-description: robot used to build the database for the RoadHouse search service project operated by Perceval

robot-history: The need for this robot finds its roots in the current RoadHouse directory, not maintained since 1997

robot-environment: service

modified-date: Fri, 26 Feb 1999 12:00:00 GMT

modified-by: Gregoire Welraeds



robot-id: rixbot

robot-name: RixBot

robot-cover-url: http://www.oops-as.no/rix

robot-details-url: http://www.oops-as.no/roy/rix

robot-owner-name: HY

robot-owner-url: http://www.oops-as.no/roy

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: mac

robot-exclusion: yes

robot-exclusion-useragent: RixBot

robot-noindex: yes

robot-nofollow: yes

robot-host: www.oops-as.no

robot-from: no

robot-useragent: RixBot (http://www.oops-as.no/rix/)

robot-language: REBOL

robot-description: The RixBot indexes any page containing the word "rebol".

robot-history: Hobby project

robot-environment: Hobby

modified-date: Fri, 14 May 2004 19:58:52 GMT



robot-id: roadrunner

robot-name: Road Runner: The ImageScape Robot

robot-owner-name: LIM Group

robot-owner-email: [email protected]

robot-status: development/active

robot-purpose: indexing

robot-type: standalone

robot-platform: UNIX

robot-exclusion: yes

robot-exclusion-useragent: roadrunner

robot-useragent: Road Runner: ImageScape Robot ([email protected])

robot-language: C, perl5

robot-description: Create Image/Text index for WWW

robot-history: ImageScape Project

robot-environment: commercial service

modified-date: Dec. 1st, 1996



robot-id: robbie

robot-name: Robbie the Robot

robot-cover-url:

robot-details-url:

robot-owner-name: Robert H. Pollack

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, windows95, windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Robbie

robot-noindex: no

robot-host: *.lmco.com

robot-from: yes

robot-useragent: Robbie/0.1

robot-language: java

robot-description: Used to define document collections for the DISCO system.

                   Robbie is still under development and runs several

                   times a day, but usually only for ten minutes or so.

                   Sites are visited in the order in which references

                   are found, but no host is visited more than once in

                   any two-minute period.

robot-history: The DISCO system is a resource-discovery component in

               the OLLA system, which is a prototype system, developed

               under DARPA funding, to support computer-based education

               and training.

robot-environment: research

modified-date: Wed,  5 Feb 1997 19:00:00 GMT

modified-by:
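Robbie's description above states a simple politeness rule: URLs are visited in discovery order, but no host is contacted more than once in any two-minute period. That amounts to a per-host timestamp check before each fetch; a minimal sketch with illustrative names and an injectable clock (not the robot's actual Java implementation):

```python
import time

# Assumed delay from the record above: two minutes between visits to one host.
HOST_DELAY = 120.0

class PoliteFrontier:
    def __init__(self, delay=HOST_DELAY, clock=time.monotonic):
        self.delay = delay
        self.clock = clock      # injectable for testing
        self.last_visit = {}    # host -> timestamp of the last fetch

    def ready(self, host):
        """True if enough time has passed since this host was last visited."""
        last = self.last_visit.get(host)
        return last is None or self.clock() - last >= self.delay

    def record(self, host):
        """Note that a fetch from this host just happened."""
        self.last_visit[host] = self.clock()
```

A crawler using this would pop URLs in discovery order, skip (requeue) any whose host is not `ready()`, and call `record()` after each successful fetch.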





robot-id: robi

robot-name: ComputingSite Robi/1.0

robot-cover-url: http://www.computingsite.com/robi/

robot-details-url: http://www.computingsite.com/robi/

robot-owner-name: Tecor Communications S.L.

robot-owner-url: http://www.tecor.com/

robot-owner-email: [email protected]

robot-status: Active

robot-purpose: indexing,maintenance

robot-type: standalone

robot-platform: UNIX

robot-availability:

robot-exclusion: yes

robot-exclusion-useragent: robi

robot-noindex: no

robot-host: robi.computingsite.com

robot-from:

robot-useragent: ComputingSite Robi/1.0 ([email protected])

robot-language: python

robot-description: Intelligent agent used to build the ComputingSite Search

 Directory.

robot-history: It was born on August 1997.

robot-environment: service

modified-date: Wed, 13 May 1998 17:28:52 GMT

modified-by: Jorge Alegre



robot-id: robocrawl

robot-name: RoboCrawl Spider

robot-cover-url: http://www.canadiancontent.net/

robot-details-url: http://www.canadiancontent.net/corp/spider.html

robot-owner-name: Canadian Content Interactive Media

robot-owner-url: http://www.canadiancontent.net/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: RoboCrawl

robot-noindex: yes

robot-host: ncc.canadiancontent.net, ncc.air-net.no, canadiancontent.net, spider.canadiancontent.net

robot-from: no

robot-useragent: RoboCrawl (http://www.canadiancontent.net)

robot-language: C and C++

robot-description: The Canadian Content robot indexes for its search database.

robot-history: Our robot is a newer project at Canadian Content.

robot-environment: service

modified-date: July 30th, 2001

modified-by: Christopher Walsh and Adam Rutter



robot-id: robofox

robot-name: RoboFox

robot-cover-url:

robot-details-url:

robot-owner-name: Ian Hicks

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: site download

robot-type: standalone

robot-platform: windows9x, windowsme, windowsNT4, windows2000

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent: robofox

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: Robofox v2.0

robot-language: Visual FoxPro

robot-description: scheduled utility to download and database a domain

robot-history:

robot-environment: service

modified-date: Tue, 6 Mar 2001 02:15:00 GMT

modified-by: Ian Hicks



robot-id: robozilla

robot-name: Robozilla

robot-cover-url: http://dmoz.org/

robot-details-url: http://www.dmoz.org/newsletter/2000Aug/robo.html

robot-owner-name: "Rob O'Zilla"

robot-owner-url: http://dmoz.org/profiles/robozilla.html

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-availability: none

robot-exclusion: no

robot-noindex: no

robot-host: directory.mozilla.org

robot-useragent: Robozilla/1.0

robot-description: Robozilla visits all the links within the Open Directory

 periodically, marking the ones that return errors for review.

robot-environment: service



robot-id:           roverbot

robot-name:         Roverbot

robot-cover-url:    http://www.roverbot.com/

robot-details-url:

robot-owner-name:   GlobalMedia Design (Andrew Cowan & Brian

	Clark)

robot-owner-url:    http://www.radzone.org/gmd/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         roverbot.com

robot-from:         yes

robot-useragent:    Roverbot

robot-language:     perl5

robot-description:  Targeted email gatherer utilizing user-defined seed points

	and interacting with both the webserver and MX servers of

	remote sites.

robot-history:

robot-environment:

modified-date:      Tue Jun 18 19:16:31 1996.

modified-by:



robot-id: rules

robot-name: RuLeS

robot-cover-url: http://www.rules.be

robot-details-url: http://www.rules.be

robot-owner-name: Marc Wils

robot-owner-url: http://www.rules.be

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: yes

robot-noindex: yes

robot-host: www.rules.be

robot-from: yes

robot-useragent: RuLeS/1.0 libwww/4.0

robot-language: Dutch (Nederlands)

robot-description: 

robot-history: none

robot-environment: hobby

modified-date: Sun, 8 Apr 2001 13:06:54 CET

modified-by: Marc Wils



robot-id:           safetynetrobot

robot-name:         SafetyNet Robot

robot-cover-url:    http://www.urlabs.com/

robot-details-url:

robot-owner-name:   Michael L. Nelson

robot-owner-url:    http://www.urlabs.com/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing.

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no.

robot-exclusion-useragent:

robot-noindex:

robot-host:         *.urlabs.com

robot-from:         yes

robot-useragent:    SafetyNet Robot 0.1, 

robot-language:     Perl 5

robot-description:  Finds URLs for K-12 content management.

robot-history:

robot-environment:

modified-date:      Sat Mar 23 20:12:39 1996.

modified-by:



robot-id: scooter

robot-name: Scooter

robot-cover-url: http://www.altavista.com/

robot-details-url: http://www.altavista.com/av/content/addurl.htm

robot-owner-name: AltaVista

robot-owner-url: http://www.altavista.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Scooter

robot-noindex: yes

robot-host: *.av.pa-x.dec.com

robot-from: yes

robot-useragent: Scooter/2.0 G.R.A.B. V1.1.0

robot-language: c

robot-description: Scooter is AltaVista's prime index agent.

robot-history: Version 2 of Scooter/1.0 developed by Louis Monier of WRL.

robot-environment: service

modified-date: Wed, 13 Jan 1999 17:18:59 GMT

modified-by: [email protected]



robot-id: search_au

robot-name: Search.Aus-AU.COM

robot-details-url: http://Search.Aus-AU.COM/

robot-cover-url: http://Search.Aus-AU.COM/

robot-owner-name: Dez Blanchfield

robot-owner-url: not currently available

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: mac, unix, windows95, windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Search-AU

robot-noindex: yes

robot-host: Search.Aus-AU.COM, 203.55.124.29, 203.2.239.29

robot-from: no

robot-useragent: not available

robot-language: c, perl, sql

robot-description: Search-AU is a development tool I have built

 to investigate the power of a search engine and web crawler

 and to give me access to a database of web content (HTML / URLs)

 and addresses etc. from which I hope to build more accurate stats

 about the .au zone's web content.

 The robot started crawling from http://www.geko.net.au/ on

 March 1st, 1998, and after nine days had 70 MB of compressed ASCII

 in a database to work with. I hope to run a refresh of the crawl

 every month initially, and soon every week, bandwidth and CPU allowing.

 If the project warrants further development, I will turn it into

 an Australian (.au) zone search engine and make it commercially

 available for advertising to cover the costs, which are starting

 to mount up. --dez (980313 - black friday!)

robot-environment: hobby

modified-date: Fri Mar 13 10:03:32 EST 1998



robot-id: search-info

robot-name: Sleek

robot-cover-url: http://search-info.com/

robot-details-url:

robot-owner-name: Lawrence R. Hughes, Sr.

robot-owner-url: http://hughesnet.net/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Unix, Linux, Windows

robot-availability: source;data

robot-exclusion: yes

robot-exclusion-useragent: robots.txt

robot-noindex: yes

robot-host: yes

robot-from: yes

robot-useragent: Mozilla/4.0 (Sleek Spider/1.2)

robot-language: perl5

robot-description: Crawls remote sites and performs link popularity checks before inclusion.

robot-history: Hybrid of the FDSE crawler by Zoltan Milosevic; current modifications started 1/10/2002

robot-environment: hobby

modified-date: Mon, 14 Jan 2002 08:02:23 GMT

modified-by: Lawrence R. Hughes, Sr.



robot-id: searchprocess

robot-name: SearchProcess

robot-cover-url: http://www.searchprocess.com

robot-details-url: http://www.intelligence-process.com

robot-owner-name: Mannina Bruno

robot-owner-url: http://www.intelligence-process.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: statistics

robot-type: browser

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: searchprocess

robot-noindex: yes

robot-host: searchprocess.com

robot-from: yes

robot-useragent: searchprocess/0.9

robot-language: perl

robot-description: An intelligent online agent. SearchProcess is used to

 provide structured information to users.

robot-history: This is the son of Auresys

robot-environment: Service freeware

modified-date: Wed, 22 Dec 1999

modified-by: Mannina Bruno



robot-id:           senrigan

robot-name:         Senrigan

robot-cover-url:    http://www.info.waseda.ac.jp/search-e.html

robot-details-url:

robot-owner-name:   TAMURA Kent

robot-owner-url:    http://www.info.waseda.ac.jp/muraoka/members/kent/

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     Java

robot-availability: none

robot-exclusion:    yes

robot-exclusion-useragent: Senrigan

robot-noindex:      yes

robot-host:         aniki.olu.info.waseda.ac.jp

robot-from:         yes

robot-useragent:    Senrigan/xxxxxx

robot-language:     Java

robot-description:  This robot now fetches HTML only from the .jp domain.

robot-history:      It has been running since Dec 1994

robot-environment:  research

modified-date:      Mon Jul  1 07:30:00 GMT 1996

modified-by:        TAMURA Kent




robot-id:           sgscout

robot-name:         SG-Scout

robot-cover-url:    http://www-swiss.ai.mit.edu/~ptbb/SG-Scout/SG-Scout.html

robot-details-url:

robot-owner-name:   Peter Beebee

robot-owner-url:    http://www-swiss.ai.mit.edu/~ptbb/personal/index.html

robot-owner-email:  [email protected], [email protected]

robot-status:       active

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         beta.xerox.com

robot-from:         yes

robot-useragent:    SG-Scout

robot-language:     

robot-description:  Does a "server-oriented" breadth-first search in a

	round-robin fashion, with multiple processes.

robot-history:      Run since 27 June 1994, for an internal XEROX research

	project

robot-environment:

modified-date:      

modified-by:



robot-id:shaggy

robot-name:ShagSeeker

robot-cover-url:http://www.shagseek.com

robot-details-url:

robot-owner-name:Joseph Reynolds

robot-owner-url:http://www.shagseek.com

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing

robot-type:standalone

robot-platform:unix

robot-availability:data

robot-exclusion:yes

robot-exclusion-useragent:Shagseeker

robot-noindex:yes

robot-host:shagseek.com

robot-from:

robot-useragent:Shagseeker at http://www.shagseek.com /1.0

robot-language:perl5

robot-description:Shagseeker is the gatherer for the Shagseek.com search 

 engine and goes out weekly.

robot-history:none yet

robot-environment:service

modified-date:Mon 17 Jan 2000 10:00:00 EST

modified-by:Joseph Reynolds



robot-id: shaihulud

robot-name: Shai'Hulud

robot-cover-url: 

robot-details-url:

robot-owner-name: Dimitri Khaoustov

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: mirroring

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: no

robot-exclusion-useragent: 

robot-noindex: no

robot-host: *.rdtex.ru

robot-from:

robot-useragent: Shai'Hulud

robot-language: C

robot-description: Used to build mirrors for internal use

robot-history: This robot finds its roots in a research project at RDTeX 

        Perspective Projects Group in 1996

robot-environment: research

modified-date: Mon, 5 Aug 1996 14:35:08 GMT

modified-by: Dimitri Khaoustov



robot-id: sift

robot-name: Sift

robot-cover-url: http://www.worthy.com/

robot-details-url: http://www.worthy.com/

robot-owner-name: Bob Worthy    

robot-owner-url: http://www.worthy.com/~bworthy  

robot-owner-email: [email protected]

robot-status: development, active  

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: sift

robot-noindex: yes

robot-host: www.worthy.com

robot-from:

robot-useragent: libwww-perl-5.41

robot-language: perl

robot-description: Subject directed (via key phrase list) indexing.

robot-history: Built on libwww, of course; implementation using MySQL, August 1999.

 Indexing Search and Rescue sites.

robot-environment: research, service

modified-date: Sat, 16 Oct 1999 19:40:00 GMT

modified-by: Bob Worthy
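
Sift's subject-directed approach, indexing only pages that match a key-phrase list, can be sketched as follows. The phrase list and helper name are illustrative, not Sift's actual configuration:

```python
# Illustrative key-phrase list; Sift's real list is not published here.
KEY_PHRASES = ["search and rescue", "swiftwater", "k-9 unit"]

def matches_subject(text: str) -> bool:
    """True if the page text contains any key phrase (case-insensitive)."""
    lower = text.lower()
    return any(phrase in lower for phrase in KEY_PHRASES)

print(matches_subject("Regional Search and Rescue volunteers"))  # True
print(matches_subject("Cooking recipes"))                        # False
```

A page passing this filter would then be indexed; anything else is skipped, which keeps the index on-topic.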



robot-id: simbot

robot-name: Simmany Robot Ver1.0

robot-cover-url: http://simmany.hnc.net/

robot-details-url: http://simmany.hnc.net/irman1.html

robot-owner-name: Youngsik, Lee

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development & active

robot-purpose: indexing, maintenance, statistics

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: SimBot

robot-noindex: no 

robot-host: sansam.hnc.net

robot-from: no

robot-useragent: SimBot/1.0

robot-language: C

robot-description: The Simmany Robot is used to build the Map (DB) for

 the Simmany service operated by HNC (Hangul & Computer Co., Ltd.). The

 robot runs weekly, and visits sites that have useful Korean

 information in a defined order.

robot-history: This robot is a part of the Simmany service and the simmini

 products. Simmini is a Web product that makes use of the indexing

 and retrieval modules of Simmany.

robot-environment: service, commercial

modified-date: Thu, 19 Sep 1996 07:02:26 GMT

modified-by: Youngsik, Lee



robot-id: site-valet

robot-name: Site Valet

robot-cover-url: http://valet.webthing.com/

robot-details-url: http://valet.webthing.com/

robot-owner-name: Nick Kew

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: Site Valet

robot-noindex: no

robot-host: valet.webthing.com,valet.*

robot-from: yes

robot-useragent: Site Valet

robot-language: perl

robot-description: a deluxe site monitoring and analysis service

robot-history: builds on cg-eye, the WDG Validator, and the Link Valet

robot-environment: service

modified-date: Tue, 27 June 2000

modified-by: [email protected]



robot-id:           sitetech

robot-name:         SiteTech-Rover

robot-cover-url:    http://www.sitetech.com/

robot-details-url:

robot-owner-name:   Anil Peres-da-Silva

robot-owner-url:    http://www.sitetech.com

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    SiteTech-Rover

robot-language:     C++

robot-description:  Originated as part of a suite of Internet Products to

        organize, search & navigate Intranet sites and to validate

        links in HTML documents.

robot-history: This robot originally went by the name of LiberTech-Rover

robot-environment:

modified-date:      Fri Aug 9 17:06:56 1996.

modified-by: Anil Peres-da-Silva



robot-id: skymob

robot-name: Skymob.com

robot-cover-url: http://www.skymob.com/

robot-details-url: http://www.skymob.com/about.html

robot-owner-name: Have IT Now Limited.

robot-owner-url: http://www.skymob.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: skymob

robot-noindex: no

robot-host: www.skymob.com

robot-from: [email protected]

robot-useragent: aWapClient

robot-language: c++

robot-description: WAP content Crawler.

robot-history: new

robot-environment: service

modified-date: Thu Sep  6 17:50:32 BST 2001

modified-by: Owen Lydiard



robot-id:slcrawler

robot-name:SLCrawler

robot-cover-url:

robot-details-url:

robot-owner-name:Inxight Software

robot-owner-url:http://www.inxight.com

robot-owner-email:[email protected]

robot-status:active

robot-purpose:To build the site map.

robot-type:standalone

robot-platform:windows, windows95, windowsNT

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:SLCrawler/2.0

robot-noindex:no

robot-host:n/a

robot-from:

robot-useragent:SLCrawler

robot-language:Java

robot-description:To build the site map.

robot-history:SLCrawler crawls HTML pages on the Internet.

robot-environment: commercial: is a commercial product

modified-date:Nov. 15, 2000

modified-by:Karen Ng



robot-id: slurp

robot-name: Inktomi Slurp

robot-cover-url: http://www.inktomi.com/

robot-details-url: http://www.inktomi.com/slurp.html

robot-owner-name: Inktomi Corporation

robot-owner-url: http://www.inktomi.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: slurp

robot-noindex: yes

robot-host: *.inktomi.com

robot-from: yes

robot-useragent: Slurp/2.0

robot-language: C/C++

robot-description: Indexing documents for the HotBot search engine

		(www.hotbot.com), collecting Web statistics

robot-history: Switch from Slurp/1.0 to Slurp/2.0 November 1996

robot-environment: service

modified-date: Fri Feb 28 13:57:43 PST 1997

modified-by: [email protected]



robot-id: smartspider

robot-name: Smart Spider

robot-cover-url: http://www.travel-finder.com

robot-details-url: http://www.engsoftware.com/robots.htm

robot-owner-name: Ken Wadland

robot-owner-url: http://www.engsoftware.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windows95, windowsNT

robot-availability: data, binary, source

robot-exclusion: Yes

robot-exclusion-useragent: ESI

robot-noindex: Yes

robot-host: 207.16.241.*

robot-from: Yes

robot-useragent: ESISmartSpider/2.0

robot-language: C++

robot-description: Classifies sites using a Knowledge Base. Robot collects

 web pages which are then parsed and fed to the Knowledge Base. The

 Knowledge Base classifies the sites into any of hundreds of categories

 based on the vocabulary used. Currently used by: //www.travel-finder.com

 (Travel and Tourist Info) and //www.golightway.com (Christian Sites).

 Several options exist to control whether sites are discovered and/or

 classified fully automatically, fully manually, or somewhere in between.

robot-history: Feb '96 -- Product design begun. May '96 -- First data

 results published by Travel-Finder. Oct '96 -- Generalized and announced

 as a product for other sites. Jan '97 -- First data results published by

 GoLightWay.

robot-environment: service, commercial

modified-date: Mon, 13 Jan 1997 10:41:00 EST

modified-by: Ken Wadland



robot-id: snooper

robot-name: Snooper

robot-cover-url: http://darsun.sit.qc.ca

robot-details-url:

robot-owner-name: Isabelle A. Melnick

robot-owner-url:

robot-owner-email: [email protected]

robot-status: part under development and part active

robot-purpose:

robot-type:

robot-platform:

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: snooper

robot-noindex:

robot-host:

robot-from:

robot-useragent: Snooper/b97_01

robot-language:

robot-description:

robot-history:

robot-environment:

modified-date:

modified-by:



robot-id: solbot

robot-name: Solbot

robot-cover-url: http://kvasir.sol.no/

robot-details-url:

robot-owner-name: Frank Tore Johansen

robot-owner-url:

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: solbot

robot-noindex: yes

robot-host: robot*.sol.no

robot-from:

robot-useragent: Solbot/1.0 LWP/5.07

robot-language: perl, c

robot-description: Builds data for the Kvasir search service.  Only searches

 sites which end with one of the following domains: "no", "se", "dk", "is", "fi"

robot-history: This robot is the result of a three-year-old late-night hack, when

 the Verity robot (of that time) was unable to index sites with iso8859

 characters (in URLs and other places), and we just _had_ to have something up and going the next day...

robot-environment: service

modified-date: Tue Apr  7 16:25:05 MET DST 1998

modified-by: Frank Tore Johansen 
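
Solbot's crawl is restricted to the Nordic top-level domains listed in its description. A minimal sketch in Python of the kind of host filter that policy implies; the helper name is ours, not Solbot's:

```python
from urllib.parse import urlparse

# The TLDs named in Solbot's description.
NORDIC_TLDS = {"no", "se", "dk", "is", "fi"}

def in_scope(url: str) -> bool:
    """Return True if the URL's host ends in one of the allowed TLDs."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] in NORDIC_TLDS

print(in_scope("http://kvasir.sol.no/"))    # True
print(in_scope("http://www.example.com/"))  # False
```

Out-of-scope URLs would simply never enter the crawl queue.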



robot-id:speedy

robot-name:Speedy Spider

robot-cover-url:http://www.entireweb.com/

robot-details-url:http://www.entireweb.com/speedy.html

robot-owner-name:WorldLight.com AB

robot-owner-url:http://www.worldlight.com

robot-owner-email:[email protected]

robot-status:active

robot-purpose:indexing

robot-type:standalone

robot-platform:Windows

robot-availability:none

robot-exclusion:yes

robot-exclusion-useragent:speedy

robot-noindex:yes

robot-host:router-00.sverige.net, 193.15.210.29, *.entireweb.com,

 *.worldlight.com

robot-from:yes

robot-useragent:Speedy Spider ( http://www.entireweb.com/speedy.html )

robot-language:C, C++

robot-description:Speedy Spider is used to build the database

           for the Entireweb.com search service operated by WorldLight.com

           (part of WorldLight Network).

           The robot runs constantly, and visits sites in a random order.

robot-history:This robot is a part of the highly advanced search engine

 Entireweb.com, that was developed in Halmstad, Sweden during 1998-2000.

robot-environment:service, commercial

modified-date:Mon, 17 July 2000 11:05:03 GMT

modified-by:Marcus Andersson



robot-id: spider_monkey

robot-name: spider_monkey

robot-cover-url: http://www.mobrien.com/add_site.html

robot-details-url: http://www.mobrien.com/add_site.html

robot-owner-name: MPRM Group Limited

robot-owner-url: http://www.mobrien.com

robot-owner-email: [email protected]

robot-status: robot actively in use

robot-purpose: gather content for a free indexing service

robot-type: FDSE robot

robot-platform: unix

robot-availability: bulk data gathered by robot available

robot-exclusion: yes

robot-exclusion-useragent: spider_monkey

robot-noindex: yes

robot-host: snowball.ionsys.com

robot-from: yes

robot-useragent: mouse.house/7.1

robot-language: perl5

robot-description: Robot runs every 30 days for a full index and weekly

 on a list of accumulated visitor requests

robot-history: This robot is under development and currently active

robot-environment: written as an employee / guest service

modified-date: Mon, 22 May 2000 12:28:52 GMT

modified-by: MPRM Group Limited



robot-id: spiderbot

robot-name: SpiderBot

robot-cover-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/cover.htm

robot-details-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/details.htm

robot-owner-name: Ignacio Cruzado Nuño

robot-owner-url: http://pisuerga.inf.ubu.es/lsi/Docencia/TFC/ITIG/icruzadn/icruzadn.htm

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, mirroring

robot-type: standalone, browser

robot-platform: unix, windows, windows95, windowsNT

robot-availability: source, binary, data

robot-exclusion: yes

robot-exclusion-useragent: SpiderBot/1.0

robot-noindex: yes

robot-host: *

robot-from: yes

robot-useragent: SpiderBot/1.0

robot-language: C++, Tcl

robot-description: Recovers Web Pages and saves them on your hard disk.  Then it reindexes them.

robot-history: This robot belongs to Ignacio Cruzado Nuño's end-of-studies thesis "Recuperador páginas Web", submitted for the degree of Technical Engineer in Management Informatics at the University of Burgos, Spain.

robot-environment: research

modified-date: Sun, 27 Jun 1999 09:00:00 GMT

modified-by: Ignacio Cruzado Nuño



robot-id: spiderline

robot-name: Spiderline Crawler

robot-cover-url: http://www.spiderline.com/

robot-details-url: http://www.spiderline.com/

robot-owner-name: Benjamin Benson

robot-owner-url: http://www.spiderline.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: free and commercial services

robot-exclusion: yes

robot-exclusion-useragent: spiderline

robot-noindex: yes

robot-host: *.spiderline.com, *.spiderline.org

robot-from: no

robot-useragent: spiderline/3.1.3

robot-language: c, c++

robot-description: 

robot-history: Developed for Spiderline.com, launched in 2001.

robot-environment: service

modified-date: Wed, 21 Feb 2001 03:36:39 GMT

modified-by: Benjamin Benson



robot-id:spiderman

robot-name:SpiderMan

robot-cover-url:http://www.comp.nus.edu.sg/~leunghok

robot-details-url:http://www.comp.nus.edu.sg/~leunghok/honproj.html

robot-owner-name:Leung Hok Peng , The School Of Computing Nus , Singapore

robot-owner-url:http://www.comp.nus.edu.sg/~leunghok

robot-owner-email:[email protected]

robot-status:development & active

robot-purpose:user searching using IR techniques

robot-type:standalone

robot-platform:Java 1.2

robot-availability:binary&source

robot-exclusion:no

robot-exclusion-useragent:nil

robot-noindex:no

robot-host:NA

robot-from:NA

robot-useragent:SpiderMan 1.0

robot-language:java

robot-description:Allows any user to search the Web given a query string

robot-history:Originated from The Center for Natural Product Research and The

 School of computing National University Of Singapore

robot-environment:research

modified-date:08/08/1999

modified-by:Leung Hok Peng and Dr Hsu Wynne



robot-id: spiderview

robot-name: SpiderView(tm)

robot-cover-url: http://www.northernwebs.com/set/spider_view.html

robot-details-url: http://www.northernwebs.com/set/spider_sales.html

robot-owner-name: Northern Webs

robot-owner-url: http://www.northernwebs.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix, nt

robot-availability: source

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex:

robot-host: bobmin.quad2.iuinc.com, *

robot-from: No

robot-useragent: Mozilla/4.0 (compatible; SpiderView 1.0;unix)

robot-language: perl

robot-description: SpiderView is a server based program which can spider

 a webpage, testing the links found on the page, evaluating your server

 and its performance.

robot-history: This is an offshoot http retrieval program based on our

 Medibot software.

robot-environment: commercial

modified-date:

modified-by:



robot-id:           spry

robot-name:         Spry Wizard Robot

robot-cover-url:    http://www.spry.com/wizard/index.html

robot-details-url:

robot-owner-name:   spry

robot-owner-url:    http://www.spry.com/index.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      

robot-host:         wizard.spry.com or tiger.spry.com

robot-from:         no

robot-useragent:    no

robot-language:     

robot-description:  Its purpose is to generate a Resource Discovery database.

	Spry is refusing to give any comments about this

	robot.

robot-history:      

robot-environment:

modified-date:      Tue Jul 11 09:29:45 GMT 1995

modified-by:



robot-id: ssearcher

robot-name: Site Searcher

robot-cover-url: www.satacoy.com

robot-details-url: www.satacoy.com

robot-owner-name: Zackware

robot-owner-url: www.satacoy.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windows95, windows98, windowsNT

robot-availability: binary

robot-exclusion: no

robot-exclusion-useragent:

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: ssearcher100

robot-language: C++

robot-description: Site Searcher scans web sites for specific file types.

 (JPG, MP3, MPG, etc)

robot-history:  Released 4/4/1999

robot-environment: hobby

modified-date: 04/26/1999



robot-id: suke

robot-name: Suke

robot-cover-url: http://www.kensaku.org/

robot-details-url: http://www.kensaku.org/

robot-owner-name: Yosuke Kuroda

robot-owner-url: http://www.kensaku.org/yk/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: FreeBSD3.*

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: suke

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: suke/*.*

robot-language: c

robot-description: This robot mainly visits sites in Japan.

robot-history: since 1999

robot-environment: service



robot-id: suntek

robot-name: suntek search engine

robot-cover-url: http://www.portal.com.hk/

robot-details-url: http://www.suntek.com.hk/

robot-owner-name: Suntek Computer Systems

robot-owner-url: http://www.suntek.com.hk/

robot-owner-email: [email protected]

robot-status: operational

robot-purpose: to create a search portal on Asian web sites

robot-type:

robot-platform: NT, Linux, UNIX

robot-availability: available now

robot-exclusion:

robot-exclusion-useragent:

robot-noindex: yes

robot-host: search.suntek.com.hk

robot-from: yes

robot-useragent: suntek/1.0

robot-language: Java

robot-description: A multilingual search engine with emphasis on Asian content

robot-history:

robot-environment:

modified-date:

modified-by:



robot-id: sven

robot-name: Sven

robot-cover-url: 

robot-details-url: http://marty.weathercity.com/sven/

robot-owner-name: Marty Anstey

robot-owner-url: http://marty.weathercity.com/

robot-owner-email: [email protected]

robot-status: Active

robot-purpose: indexing

robot-type: standalone

robot-platform: Windows

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent: 

robot-noindex: no

robot-host: 24.113.12.29

robot-from: no

robot-useragent:

robot-language: VB5

robot-description: Used to gather sites for netbreach.com. Runs constantly.

robot-history: Developed as an experiment in web indexing.

robot-environment: hobby, service

modified-date: Tue, 3 Mar 1999 08:15:00 PST

modified-by: Marty Anstey



robot-id:                      sygol

robot-name:                    Sygol 

robot-cover-url:               http://www.sygol.com

robot-details-url:             http://www.sygol.com/who.asp

robot-owner-name:              Giorgio Galeotti

robot-owner-url:               http://www.sygol.com

robot-owner-email:             [email protected]

robot-status:                  active

robot-purpose:                 indexing: gather pages for the Sygol search engine

robot-type:                    standalone

robot-platform:                All Windows from 95 to latest.

robot-availability:            none

robot-exclusion:               yes

robot-exclusion-useragent:     http://www.sygol.com

robot-noindex:                 no

robot-host:                    http://www.sygol.com

robot-from:                    No

robot-useragent:               http://www.sygol.com

robot-language:                Visual Basic

robot-description:             Very standard robot: it gets all words and

 links from a page and then indexes the former and stores the latter for further

 crawling.

robot-history:                 It all started in 1999 as a hobby to try

 crawling the web and putting together a good search engine with very little

 hardware resources.

robot-environment:             Hobby

modified-date:                 Mon, 07 Jun 2004 14:50:01 GMT

modified-by:                   Giorgio Galeotti



robot-id: tach_bw

robot-name: TACH Black Widow

robot-cover-url: http://theautochannel.com/~mjenn/bw.html

robot-details-url: http://theautochannel.com/~mjenn/bw-syntax.html

robot-owner-name: Michael Jennings

robot-owner-url: http://www.spd.louisville.edu/~mejenn01/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: maintenance: link validation

robot-type: standalone

robot-platform: UNIX, Linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: tach_bw

robot-noindex: no

robot-host: *.theautochannel.com

robot-from: yes

robot-useragent: Mozilla/3.0 (Black Widow v1.1.0; Linux 2.0.27; Dec 31 1997 12:25:00)

robot-language: C/C++

robot-description: Exhaustively recurses a single site to check for broken links

robot-history: Corporate application begun in 1996 for The Auto Channel

robot-environment: commercial

modified-date: Thu, Jan 23 1997 23:09:00 GMT

modified-by: Michael Jennings



robot-id:tarantula

robot-name: Tarantula

robot-cover-url: http://www.nathan.de/nathan/software.html#TARANTULA

robot-details-url: http://www.nathan.de/

robot-owner-name: Markus Hoevener

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: yes

robot-noindex: yes

robot-host: yes

robot-from: no

robot-useragent: Tarantula/1.0

robot-language: C

robot-description: Tarantula gathers information for the German search engine Nathan

robot-history: Started February 1997

robot-environment: service

modified-date: Mon, 29 Dec 1997 15:30:00 GMT

modified-by: Markus Hoevener



robot-id:           tarspider

robot-name:         tarspider

robot-cover-url:    

robot-details-url:

robot-owner-name:   Olaf Schreck

robot-owner-url:    http://www.chemie.fu-berlin.de/user/chakl/ChaklHome.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      mirroring

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         [email protected]

robot-useragent:    tarspider

robot-language:     

robot-description:  

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id:           tcl

robot-name:         Tcl W3 Robot

robot-cover-url:    http://hplyot.obspm.fr/~dl/robo.html

robot-details-url:

robot-owner-name:   Laurent Demailly

robot-owner-url:    http://hplyot.obspm.fr/~dl/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         hplyot.obspm.fr

robot-from:         yes

robot-useragent:    dlw3robot/x.y (in TclX by http://hplyot.obspm.fr/~dl/)

robot-language:     tcl

robot-description:  Its purpose is to validate links, and generate

	statistics.

robot-history:      

robot-environment:

modified-date:      Tue May 23 17:51:39 1995

modified-by:



robot-id: techbot

robot-name: TechBOT

robot-cover-url: http://www.techaid.net/

robot-details-url: http://www.techaid.net/TechBOT/

robot-owner-name: TechAID Internet Services

robot-owner-url: http://www.techaid.net/

robot-owner-email: [email protected]

robot-status: active

robot-purpose:statistics, maintenance

robot-type: standalone

robot-platform: Unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: TechBOT

robot-noindex: yes

robot-host: techaid.net

robot-from: yes

robot-useragent: TechBOT

robot-language: perl5

robot-description: TechBOT is constantly upgraded. Currently it is used for

 link validation, load-time measurement, HTML validation and much more.

robot-history: TechBOT started his life as a Page Change Detection robot,

 but has taken on many new and exciting roles.

robot-environment: service

modified-date: Sat, 18 Dec 1998 14:26:00 EST

modified-by: [email protected]



robot-id: templeton

robot-name: Templeton

robot-cover-url: http://www.bmtmicro.com/catalog/tton/

robot-details-url: http://www.bmtmicro.com/catalog/tton/

robot-owner-name: Neal Krawetz

robot-owner-url: http://www.cs.tamu.edu/people/nealk/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: mirroring, mapping, automating web applications

robot-type: standalone

robot-platform: OS/2, Linux, SunOS, Solaris

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: templeton

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: Templeton/{version} for {platform}

robot-language: C

robot-description: Templeton is a very configurable robot for mirroring, mapping, and automating applications on retrieved documents.

robot-history: This robot was originally created as a test-of-concept.

robot-environment: service, commercial, research, hobby

modified-date: Sun, 6 Apr 1997 10:00:00 GMT

modified-by: Neal Krawetz



robot-id: titin

robot-name: TitIn

robot-cover-url: http://www.foi.hr/~dpavlin/titin/

robot-details-url: http://www.foi.hr/~dpavlin/titin/tehnical.htm

robot-owner-name: Dobrica Pavlinusic

robot-owner-url: http://www.foi.hr/~dpavlin/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: unix

robot-availability: data, source on request

robot-exclusion: yes

robot-exclusion-useragent: titin

robot-noindex: no

robot-host: barok.foi.hr

robot-from: no

robot-useragent: TitIn/0.2

robot-language: perl5, c

robot-description:

        TitIn is used to index the titles of all Web servers in

        the .hr domain.

robot-history:

        It was created in December 1996 out of a desperate need for a

        central index of Croatian web servers.

robot-environment: research

modified-date: Thu, 12 Dec 1996 16:06:42 MET

modified-by: Dobrica Pavlinusic



robot-id:           titan

robot-name:         TITAN

robot-cover-url:    http://isserv.tas.ntt.jp/chisho/titan-e.html

robot-details-url:  http://isserv.tas.ntt.jp/chisho/titan-help/eng/titan-help-e.html

robot-owner-name:   Yoshihiko HAYASHI

robot-owner-url:    

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      indexing

robot-type:         standalone

robot-platform:     SunOS 4.1.4

robot-availability: none

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         nlptitan.isl.ntt.jp

robot-from:         yes

robot-useragent:    TITAN/0.1

robot-language:     perl 4

robot-description:  Its purpose is to generate a Resource Discovery

    database and copy document trees. Our primary goal is to develop

    an advanced method for indexing WWW documents. Uses libwww-perl.

robot-history:      

robot-environment:

modified-date:      Mon Jun 24 17:20:44 PDT 1996

modified-by:        Yoshihiko HAYASHI



robot-id:           tkwww

robot-name:         The TkWWW Robot

robot-cover-url:    http://fang.cs.sunyit.edu/Robots/tkwww.html

robot-details-url:

robot-owner-name:   Scott Spetka

robot-owner-url:    http://fang.cs.sunyit.edu/scott/scott.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  It is designed to search Web neighborhoods to find pages

	that may be logically related. The Robot returns a list of

	links that looks like a hot list. The search can be by

	keyword, or all links at a distance of one or two hops may be

	returned. The TkWWW Robot is described in a paper presented

	at the WWW94 Conference in Chicago.

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: tlspider

robot-name: TLSpider

robot-cover-url: n/a

robot-details-url: n/a

robot-owner-name: topiclink.com

robot-owner-url: topiclink.com

robot-owner-email: [email protected]

robot-status: not activated

robot-purpose: to get web sites and add them to the TopicLink future directory

robot-type: development (robot under development)

robot-platform: linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: topiclink

robot-noindex: no

robot-host: tlspider.topiclink.com (not available yet)

robot-from: no

robot-useragent: TLSpider/1.1

robot-language: perl5

robot-description: This robot runs 2 days a week, gathering information for

 TopicLink.com

robot-history: This robot was created to serve the Internet search engine

 TopicLink.com

robot-environment: service

modified-date: Fri, 10 Sep 1999 17:28:00 GMT

modified-by: TopicLink Spider Team



robot-id:           ucsd

robot-name:         UCSD Crawl

robot-cover-url:    http://www.mib.org/~ucsdcrawl

robot-details-url:

robot-owner-name:   Adam Tilghman

robot-owner-url:    http://www.mib.org/~atilghma

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, statistics

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         nuthaus.mib.org scilib.ucsd.edu

robot-from:         yes

robot-useragent:    UCSD-Crawler

robot-language:     Perl 4

robot-description:  Should hit ONLY within UC San Diego - trying to count

	servers here.

robot-history:

robot-environment:

modified-date:      Sat Jan 27 09:21:40 1996.

modified-by:



robot-id: udmsearch

robot-name: UdmSearch

robot-details-url: http://mysearch.udm.net/

robot-cover-url: http://mysearch.udm.net/

robot-owner-name: Alexander Barkov

robot-owner-url: http://mysearch.udm.net/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, validation

robot-type: standalone

robot-platform: unix

robot-availability: source, binary

robot-exclusion: yes

robot-exclusion-useragent: UdmSearch

robot-noindex: yes

robot-host: *

robot-from: no

robot-useragent: UdmSearch/2.1.1

robot-language: c

robot-description: UdmSearch is free web search engine software for

 intranet/small-domain Internet servers

robot-history: Developed since 1998; its original purpose was a search engine

 for the Republic of Udmurtia, http://search.udm.net

robot-environment: hobby

modified-date: Mon, 6 Sep 1999 10:28:52 GMT



robot-id: uptimebot

robot-name: UptimeBot

robot-cover-url: http://www.uptimebot.com

robot-details-url: http://www.uptimebot.com

robot-owner-name: UCO team

robot-owner-url: http://www.uptimebot.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, statistics

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: uptimebot

robot-noindex: no

robot-host: uptimebot.com

robot-from: no

robot-useragent: uptimebot

robot-language: c++

robot-description: UptimeBot is a web crawler that checks the return codes of web

 servers and computes aggregate statistics on current server status. The robot

 runs daily and visits sites in a random order.

robot-history: This robot is a local research product of the UptimeBot team.

robot-environment: research

modified-date: Sat, 19 March 2004 21:19:03 GMT

modified-by: UptimeBot team



robot-id:                   urlck

robot-name:                 URL Check

robot-cover-url:            http://www.cutternet.com/products/webcheck.html

robot-details-url:          http://www.cutternet.com/products/urlck.html

robot-owner-name:           Dave Finnegan

robot-owner-url:            http://www.cutternet.com

robot-owner-email:          [email protected]

robot-status:               active

robot-purpose:              maintenance

robot-type:                 standalone

robot-platform:             unix

robot-availability:         binary

robot-exclusion:            yes

robot-exclusion-useragent:  urlck

robot-noindex:              no

robot-host:                 *

robot-from:                 yes

robot-useragent:            urlck/1.2.3

robot-language:             c

robot-description:          The robot is used to manage, maintain, and modify

                            web sites.  It builds a database detailing the

                            site, builds HTML reports describing the site, and

                            can be used to up-load pages to the site or to

                            modify existing pages and URLs within the site.  It

                            can also be used to mirror whole or partial sites.

                            It supports HTTP, File, FTP, and Mailto schemes.

robot-history:              Originally designed to validate URLs.

robot-environment:          commercial

modified-date:              July 9, 1997

modified-by:                Dave Finnegan



robot-id: us

robot-name: URL Spider Pro

robot-cover-url: http://www.innerprise.net

robot-details-url: http://www.innerprise.net/us.htm

robot-owner-name: Innerprise

robot-owner-url: http://www.innerprise.net

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Windows9x/NT

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: *

robot-noindex: yes

robot-host: *

robot-from: no

robot-useragent: URL Spider Pro

robot-language: delphi

robot-description: Used for building a database of web pages.

robot-history: Project started July 1998.

robot-environment: commercial

modified-date: Mon, 12 Jul 1999 17:50:30 GMT

modified-by: Innerprise



robot-id: valkyrie

robot-name: Valkyrie

robot-cover-url: http://kichijiro.c.u-tokyo.ac.jp/odin/

robot-details-url: http://kichijiro.c.u-tokyo.ac.jp/odin/robot.html

robot-owner-name: Masanori Harada

robot-owner-url: http://www.graco.c.u-tokyo.ac.jp/~harada/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Valkyrie libwww-perl

robot-noindex: no

robot-host: *.c.u-tokyo.ac.jp

robot-from: yes

robot-useragent: Valkyrie/1.0 libwww-perl/0.40

robot-language: perl4

robot-description: used to collect resources from Japanese Web sites for ODIN search engine.

robot-history: This robot has been used since Oct. 1995 for author's research.

robot-environment: service research

modified-date: Thu Mar 20 19:09:56 JST 1997

modified-by: [email protected]



robot-id: verticrawl

robot-name: Verticrawl

robot-cover-url: http://www.verticrawl.com/

robot-details-url: http://www.verticrawl.com/

robot-owner-name: Datamean, Malinge, Lhuisset

robot-owner-url: http://www.verticrawl.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, searching, and classifying URLs in a global ASP search & Appliance search solution

robot-type: standalone

robot-platform: Unix, Linux

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: verticrawl

robot-noindex: yes

robot-host: www.verticrawl.com

robot-from: yes

robot-useragent:  Verticrawlbot

robot-language:  c, perl, php

robot-description: Verticrawl is a global search engine dedicated to application service providing, delivered as an ASP search & appliance search solution

robot-history: Verticrawl is based on web services for knowledge management, Web portal services, and site-search solutions

robot-environment: commercial

modified-date: Mon, 27 Jul 2006 17:28:52 GMT

modified-by: [email protected]



robot-id: victoria

robot-name: Victoria

robot-cover-url:

robot-details-url:

robot-owner-name: Adrian Howard

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Victoria

robot-noindex: yes

robot-host:

robot-from:

robot-useragent: Victoria/1.0

robot-language: perl,c

robot-description: Victoria is part of a groupware product produced

 by Victoria Real Ltd. (voice: +44 [0]1273 774469,

 fax: +44 [0]1273 779960 email: [email protected]).

 Victoria is used to monitor changes in W3 documents,

 both intranet and internet based.

 Contact Victoria Real for more information.

robot-history:

robot-environment: commercial

modified-date: Fri, 22 Nov 1996 16:45 GMT

modified-by: [email protected]



robot-id:           visionsearch

robot-name:         vision-search

robot-cover-url:    http://www.ius.cs.cmu.edu/cgi-bin/vision-search

robot-details-url:

robot-owner-name:   Henry A. Rowley

robot-owner-url:    http://www.cs.cmu.edu/~har

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing.

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:

robot-host:         dylan.ius.cs.cmu.edu

robot-from:         no

robot-useragent:    vision-search/3.0

robot-language:     Perl 5

robot-description:  Intended to be an index of computer vision pages, containing

	all pages within n links (for some small

	n) of the Vision Home Page

robot-history:

robot-environment:

modified-date:      Fri Mar  8 16:03:04 1996

modified-by:



robot-id: voidbot

robot-name: void-bot

robot-cover-url: http://www.void.be/

robot-details-url: http://www.void.be/void-bot.html

robot-owner-name: Tristan Crombez

robot-owner-url: http://www.void.be/tristan/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing,maintenance

robot-type: standalone

robot-platform: FreeBSD,Linux

robot-availability: none

robot-exclusion: no

robot-exclusion-useragent: void-bot

robot-noindex: no

robot-host: void.be

robot-from: no

robot-useragent: void-bot/0.1 ([email protected]; http://www.void.be/)

robot-language: perl5

robot-description: The void-bot is

 used to build a database for the void search service, as well as for link

 validation.

robot-history: Development was started in October 2003; spidering

 began in January 2004.

robot-environment: research

modified-date: Mon, 9 Feb 2004 11:51:10 GMT

modified-by: [email protected]



robot-id: voyager

robot-name: Voyager

robot-cover-url: http://www.lisa.co.jp/voyager/

robot-details-url:

robot-owner-name: Voyager Staff

robot-owner-url: http://www.lisa.co.jp/voyager/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing, maintenance

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Voyager 

robot-noindex: no

robot-host: *.lisa.co.jp

robot-from: yes

robot-useragent: Voyager/0.0

robot-language: perl5 

robot-description: This robot is used to build the database for the

                   Lisa Search service.  The robot is launched manually

                   and visits sites in a random order.

robot-history:

robot-environment: service

modified-date: Mon, 30 Nov 1998 08:00:00 GMT

modified-by: Hideyuki Ezaki



robot-id: vwbot

robot-name: VWbot

robot-cover-url: http://vancouver-webpages.com/VWbot/

robot-details-url: http://vancouver-webpages.com/VWbot/aboutK.shtml

robot-owner-name: Andrew Daviel

robot-owner-url:  http://vancouver-webpages.com/~admin/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: VWbot_K

robot-noindex: yes

robot-host: vancouver-webpages.com

robot-from: yes

robot-useragent: VWbot_K/4.2

robot-language: perl4

robot-description: Used to index BC sites for the searchBC database. Runs daily.

robot-history: Originally written fall 1995. Actively maintained.

robot-environment: service commercial research

modified-date: Tue, 4 Mar 1997 20:00:00 GMT

modified-by: Andrew Daviel



robot-id:             w3index

robot-name:           The NWI Robot

robot-cover-url:            http://www.ub2.lu.se/NNC/projects/NWI/the_nwi_robot.html

robot-owner-name:     Sigfrid Lundberg, Lund university, Sweden

robot-owner-url:      http://nwi.ub2.lu.se/~siglun

robot-owner-email:    [email protected]

robot-status:         active

robot-purpose:        discovery,statistics

robot-type:           standalone

robot-platform:       UNIX

robot-availability:   none (at the moment)

robot-exclusion:      yes

robot-noindex:        No

robot-host:   nwi.ub2.lu.se, mars.dtv.dk and a few others

robot-from:   yes

robot-useragent:      w3index

robot-language:       perl5

robot-description:    A resource discovery robot, used primarily for

	the indexing of the Scandinavian Web

robot-history:        It is about a year or so old.

	Written by Anders Ardö, Mattias Borrell,

	Håkan Ardö and myself.

robot-environment: service,research

modified-date:        Wed Jun 26 13:58:04 MET DST 1996

modified-by:          Sigfrid Lundberg



robot-id:           w3m2

robot-name:         W3M2

robot-cover-url:    http://tronche.com/W3M2

robot-details-url:

robot-owner-name:   Christophe Tronche

robot-owner-url:    http://tronche.com/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing, maintenance, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         yes

robot-useragent:    W3M2/x.xxx

robot-language:     Perl 4, Perl 5, and C++

robot-description:  to generate a Resource Discovery database, validate links,

	validate HTML, and generate statistics

robot-history:      

robot-environment:

modified-date:      Fri May 5 17:48:48 1995

modified-by:



robot-id: wallpaper

robot-name: WallPaper (alias crawlpaper)

robot-cover-url: http://www.crawlpaper.com/

robot-details-url: http://sourceforge.net/projects/crawlpaper/

robot-owner-name: Luca Piergentili

robot-owner-url: http://www.geocities.com/lpiergentili/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windows

robot-availability: source, binary

robot-exclusion: yes

robot-exclusion-useragent: crawlpaper

robot-noindex: no

robot-host:

robot-from:

robot-useragent: CrawlPaper/n.n.n (Windows n)

robot-language: C++

robot-description: a crawler for picture download and offline browsing

robot-history: started as a screensaver, the program has evolved into a crawler

 including an audio player, etc.

robot-environment: hobby

modified-date: Mon, 25 Aug 2003 09:00:00 GMT

modified-by:



robot-id:           wanderer

robot-name:         the World Wide Web Wanderer

robot-cover-url:    http://www.mit.edu/people/mkgray/net/

robot-details-url:

robot-owner-name:   Matthew Gray

robot-owner-url:    http://www.mit.edu:8001/people/mkgray/mkgray.html

robot-owner-email:  [email protected]

robot-status:       active

robot-purpose:      statistics

robot-type:         standalone

robot-platform:     unix

robot-availability: data

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *.mit.edu

robot-from:         

robot-useragent:    WWWWanderer v3.0

robot-language:     perl4

robot-description:  Run initially in June 1993, its aim is to measure

                    the growth in the web.

robot-history:      

robot-environment:  research

modified-date:      

modified-by:



robot-id: wapspider

robot-name: w@pSpider by wap4.com

robot-cover-url: http://mopilot.com/

robot-details-url: http://wap4.com/portfolio.htm

robot-owner-name: Dieter Kneffel

robot-owner-url: http://wap4.com/ (corporate)

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing, maintenance (special: dedicated to wap/wml pages)

robot-type: standalone

robot-platform: unix

robot-availability: data

robot-exclusion: yes

robot-exclusion-useragent: wapspider

robot-noindex: [does not apply for wap/wml pages!]

robot-host: *.wap4.com, *.mopilot.com

robot-from: yes

robot-useragent: w@pSpider/xxx (unix) by wap4.com

robot-language: c, php, sql

robot-description: wapspider is used to build the database for

 mopilot.com, a search engine for mobile content; it is specially

 designed to crawl WML pages. HTML is indexed, but HTML links are

 (currently) not followed

robot-history: this robot was developed by wap4.com in 1999 for the

 world's first WAP search engine

robot-environment: service, commercial, research

modified-date: Fri, 23 Jun 2000 14:33:52 MESZ

modified-by: Dieter Kneffel, [email protected]



robot-id: webbandit

robot-name: WebBandit Web Spider

robot-cover-url: http://pw2.netcom.com/~wooger/

robot-details-url: http://pw2.netcom.com/~wooger/

robot-owner-name: Jerry Walsh

robot-owner-url: http://pw2.netcom.com/~wooger/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: Resource Gathering / Server Benchmarking

robot-type: standalone application

robot-platform: Intel - Windows 95

robot-availability: source, binary

robot-exclusion: no

robot-exclusion-useragent: WebBandit/1.0

robot-noindex: no

robot-host: ix.netcom.com

robot-from: no

robot-useragent: WebBandit/1.0

robot-language: C++

robot-description: multithreaded, hyperlink-following,

 resource-finding web spider

robot-history: Inspired by reading the

 Internet Programming book by Jamsa/Cope

robot-environment: commercial

modified-date: 11/21/96

modified-by: Jerry Walsh



robot-id: webcatcher

robot-name: WebCatcher

robot-cover-url: http://oscar.lang.nagoya-u.ac.jp

robot-details-url:

robot-owner-name: Reiji SUZUKI

robot-owner-url: http://oscar.lang.nagoya-u.ac.jp/~reiji/index.html

robot-owner-email: [email protected]

robot-owner-name2: Masatoshi SUGIURA

robot-owner-url2: http://oscar.lang.nagoya-u.ac.jp/~sugiura/index.html

robot-owner-email2: [email protected]

robot-status: development

robot-purpose: indexing  

robot-type: standalone   

robot-platform: unix, windows, mac

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: webcatcher

robot-noindex: no

robot-host: oscar.lang.nagoya-u.ac.jp

robot-from: no

robot-useragent: WebCatcher/1.0

robot-language: perl5

robot-description: WebCatcher gathers web pages

                   that Japanese college students want to visit.

robot-history: This robot finds its roots in a research project 

           at Nagoya University in 1998.

robot-environment: research

modified-date: Fri, 16 Oct 1998 17:28:52 JST

modified-by: "Reiji SUZUKI" 



robot-id:           webcopy

robot-name:         WebCopy

robot-cover-url:    http://www.inf.utfsm.cl/~vparada/webcopy.html

robot-details-url:

robot-owner-name:   Victor Parada

robot-owner-url:    http://www.inf.utfsm.cl/~vparada/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      mirroring

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         *

robot-from:         no

robot-useragent:    WebCopy/(version)

robot-language:     perl 4 or perl 5

robot-description:  Its purpose is to perform mirroring. WebCopy can retrieve

	files recursively using the HTTP protocol. It can be used as a

	delayed browser or as a mirroring tool. It cannot jump from

	one site to another.

robot-history:      

robot-environment:

modified-date:      Sun Jul 2 15:27:04 1995

modified-by:



robot-id:           webfetcher

robot-name:         webfetcher

robot-cover-url:    http://www.ontv.com/

robot-details-url:

robot-owner-name:

robot-owner-url:    http://www.ontv.com/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      mirroring

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:         *

robot-from:         yes

robot-useragent:    WebFetcher/0.8

robot-language:     C++

robot-description:  Don't wait! OnTV's WebFetcher mirrors whole sites down to

	your hard disk on a TV-like schedule. Catch w3

	documentation. Catch discovery.com without waiting! A fully

	operational web robot for NT/95 today, most UNIX soon, Mac

	tomorrow.

robot-history:

robot-environment:

modified-date:      Sat Jan 27 10:31:43 1996.

modified-by:



robot-id:           webfoot

robot-name:         The Webfoot Robot

robot-cover-url:    

robot-details-url:

robot-owner-name:   Lee McLoughlin

robot-owner-url:    http://web.doc.ic.ac.uk/f?/lmjm

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      

robot-host:         phoenix.doc.ic.ac.uk

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  

robot-history:      First spotted in Mid February 1994

robot-environment:

modified-date:      

modified-by:



robot-id: webinator

robot-name: Webinator

robot-details-url: http://www.thunderstone.com/texis/site/pages/webinator4_admin.html

robot-cover-url: http://www.thunderstone.com/texis/site/pages/webinator.html

robot-owner-name: 

robot-owner-email: 

robot-status: active, under further enhancement.

robot-purpose: information retrieval

robot-type: standalone

robot-exclusion: yes

robot-noindex: yes

robot-exclusion-useragent: T-H-U-N-D-E-R-S-T-O-N-E

robot-host: several

robot-from: No

robot-language: Texis Vortex

robot-history: 

robot-environment: Commercial



robot-id:           weblayers

robot-name:         weblayers

robot-cover-url:    http://www.univ-paris8.fr/~loic/weblayers/

robot-details-url:

robot-owner-name:   Loic Dachary

robot-owner-url:    http://www.univ-paris8.fr/~loic/

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    weblayers/0.0

robot-language:     perl 5

robot-description:  Its purpose is to validate, cache and maintain links. It is

	designed to maintain the cache generated by the emacs

	w3 mode (N*tscape replacement) and to support annotated

	documents (keeping them in sync with the original document via

	diff/patch).

robot-history:      

robot-environment:

modified-date:      Fri Jun 23 16:30:42 FRE 1995

modified-by:



robot-id:           weblinker

robot-name:         WebLinker

robot-cover-url:    http://www.cern.ch/WebLinker/

robot-details-url:

robot-owner-name:   James Casey

robot-owner-url:    http://www.maths.tcd.ie/hyplan/jcasey/jcasey.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      

robot-host:         

robot-from:         

robot-useragent:    WebLinker/0.0 libwww-perl/0.1

robot-language:     

robot-description:  It traverses a section of the web, doing URN->URL conversion.

        It will be used as a post-processing tool on documents created

	by automatic converters such as LaTeX2HTML or WebMaker. At

	the moment it works at full speed, but is restricted to

	local sites. External GETs will be added, but these will be

	running slowly. WebLinker is meant to be run locally, so if

	you see it elsewhere let the author know!

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id:           webmirror

robot-name:         WebMirror

robot-cover-url:    http://www.winsite.com/pc/win95/netutil/wbmiror1.zip

robot-details-url:

robot-owner-name:   Sui Fung Chan

robot-owner-url:    http://www.geocities.com/NapaVally/1208

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      mirroring

robot-type:         standalone

robot-platform:     Windows95

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         no

robot-useragent:    no

robot-language:     C++

robot-description:  It downloads web pages to the hard drive for off-line

	browsing.

robot-history:

robot-environment:

modified-date:      Mon Apr 29 08:52:25 1996.

modified-by:



robot-id: webmoose

robot-name: The Web Moose

robot-cover-url: 

robot-details-url: http://www.nwlink.com/~mikeblas/webmoose/

robot-owner-name: Mike Blaszczak

robot-owner-url: http://www.nwlink.com/~mikeblas/

robot-owner-email: [email protected]

robot-status: development

robot-purpose: statistics, maintenance

robot-type: standalone

robot-platform: Windows NT

robot-availability: data

robot-exclusion: no

robot-exclusion-useragent: WebMoose

robot-noindex: no

robot-host: msn.com

robot-from: no

robot-useragent: WebMoose/0.0.0000

robot-language: C++

robot-description: This robot collects statistics and verifies links.

 It builds a graph of its visit path.

robot-history: This robot is under development.

 It will support ROBOTS.TXT soon.

robot-environment: hobby

modified-date: Fri, 30 Aug 1996 00:00:00 GMT

modified-by: Mike Blaszczak



robot-id: webquest

robot-name: WebQuest

robot-cover-url:

robot-details-url:

robot-owner-name: TaeYoung Choi

robot-owner-url: http://www.cosmocyber.co.kr:8080/~cty/index.html

robot-owner-email: [email protected]

robot-status: development

robot-purpose: indexing

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: webquest

robot-noindex: no

robot-host: 210.121.146.2, 210.113.104.1, 210.113.104.2

robot-from: yes

robot-useragent: WebQuest/1.0

robot-language: perl5

robot-description: WebQuest will be used to build the databases for various web

 search service sites which will be in service by early 1998. Until the end of

 Jan. 1998, WebQuest will run from time to time. Since then, it will run

 daily (for a few hours and very slowly).

robot-history: The development of WebQuest was motivated by the need for a

 customized robot in various projects of COSMO Information & Communication Co.,

 Ltd. in Korea.

robot-environment: service

modified-date: Tue, 30 Dec 1997 09:27:20 GMT

modified-by: TaeYoung Choi



robot-id: webreader

robot-name: Digimarc MarcSpider

robot-cover-url: http://www.digimarc.com/prod_fam.html

robot-details-url: http://www.digimarc.com/prod_fam.html

robot-owner-name: Digimarc Corporation

robot-owner-url: http://www.digimarc.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: windowsNT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent:

robot-noindex:

robot-host: 206.102.3.*

robot-from: yes

robot-useragent: Digimarc WebReader/1.2

robot-language: c++

robot-description: Examines image files for watermarks. 

 In order not to waste Internet bandwidth with yet

 another crawler, we have contracted with one of the major crawlers/search

 engines to provide us with a list of specific URLs of interest to us.  If a

 URL is to an image, we may read the image, but we do not crawl to any other

 URLs.  If a URL is to a page of interest (usually due to CGI), then we

 access the page to get the image URLs from it, but we do not crawl to any

 other pages.

robot-history: First operation in August 1997.

robot-environment: service

modified-date: Mon, 20 Oct 1997 16:44:29 GMT

modified-by: Brian MacIntosh



robot-id: webreaper

robot-name: WebReaper

robot-cover-url: http://www.otway.com/webreaper

robot-details-url:

robot-owner-name: Mark Otway

robot-owner-url: http://www.otway.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing/offline browsing

robot-type: standalone

robot-platform: windows95, windowsNT

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: webreaper

robot-noindex: no

robot-host: *

robot-from: no

robot-useragent: WebReaper [[email protected]]

robot-language: c++

robot-description: Freeware app which downloads and saves sites locally for

 offline browsing.

robot-history: Written for personal use, and then distributed to the public

 as freeware.

robot-environment: hobby

modified-date: Thu, 25 Mar 1999 15:00:00 GMT

modified-by: Mark Otway



robot-id:                       webs

robot-name:                     webs

robot-cover-url:                http://webdew.rnet.or.jp/

robot-details-url:              http://webdew.rnet.or.jp/service/shank/NAVI/SEARCH/info2.html#robot

robot-owner-name:               Recruit Co.Ltd,

robot-owner-url:                

robot-owner-email:              [email protected]

robot-status:                   active

robot-purpose:                  statistics

robot-type:                     standalone

robot-platform:                 unix

robot-availability:             none

robot-exclusion:                yes

robot-exclusion-useragent:      webs

robot-noindex:                  no

robot-host:                     lemon.recruit.co.jp

robot-from:                     yes

robot-useragent:                [email protected]

robot-language:                 perl5

robot-description:              The webs robot gathers the last-modified

                                dates of WWW servers' top pages. The collected

                                statistics determine the priority of WWW server

                                data collection for the webdew indexing service.

                                Indexing in webdew is done manually.

robot-history:

robot-environment:              service

modified-date:                  Fri,  6 Sep 1996 10:00:00 GMT

modified-by:



robot-id:           websnarf

robot-name:         Websnarf

robot-cover-url:    

robot-details-url:

robot-owner-name:   Charlie Stross

robot-owner-url:    

robot-owner-email:  [email protected]

robot-status:       retired

robot-purpose:      

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: webspider

robot-name: WebSpider

robot-details-url: http://www.csi.uottawa.ca/~u610468

robot-cover-url:

robot-owner-name: Nicolas Fraiji

robot-owner-email: [email protected]

robot-status: active, under further enhancement.

robot-purpose: maintenance, link diagnostics

robot-type: standalone

robot-exclusion: yes

robot-noindex: no

robot-exclusion-useragent: webspider

robot-host: several

robot-from: Yes

robot-language: Perl4

robot-history: developed as a course project at the University of

     Ottawa, Canada in 1996.

robot-environment: Educational use and Research



robot-id:           webvac

robot-name:         WebVac

robot-cover-url:    http://www.federated.com/~tim/webvac.html

robot-details-url:

robot-owner-name:   Tim Jensen

robot-owner-url:    http://www.federated.com/~tim

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      mirroring

robot-type:         standalone

robot-platform:

robot-availability:

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         no

robot-useragent:    webvac/1.0

robot-language:     C++

robot-description:

robot-history:

robot-environment:

modified-date:      Mon May 13 03:19:17 1996.

modified-by:



robot-id:           webwalk

robot-name:         webwalk

robot-cover-url:    

robot-details-url:

robot-owner-name:   Rich Testardi

robot-owner-url:    

robot-owner-email:  

robot-status:       retired

robot-purpose:      indexing, maintenance, mirroring, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    yes

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         yes

robot-useragent:    webwalk

robot-language:     c

robot-description:  Its purpose is to generate a Resource Discovery database,

	validate links, validate HTML, perform mirroring, copy

	document trees, and generate statistics. Webwalk is easily

	extensible to perform virtually any maintenance function

	which involves web traversal, in a way much like the '-exec'

	option of the find(1) command. Webwalk is usually used

	behind the HP firewall.

robot-history:      

robot-environment:

modified-date:      Wed Nov 15 09:51:59 PST 1995

modified-by:



robot-id: webwalker

robot-name: WebWalker

robot-cover-url:

robot-details-url:

robot-owner-name: Fah-Chun Cheong

robot-owner-url: http://www.cs.berkeley.edu/~fccheong/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: WebWalker

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: WebWalker/1.10

robot-language: perl4

robot-description: WebWalker performs WWW traversal for individual

                   sites and tests for the integrity of all hyperlinks

                   to external sites. 

robot-history: A Web maintenance robot for expository purposes,

               first published in the book "Internet Agents: Spiders,

               Wanderers, Brokers, and Bots" by the robot's author.

robot-environment: hobby

modified-date: Thu, 25 Jul 1996 16:00:52 PDT

modified-by: Fah-Chun Cheong



robot-id:           webwatch

robot-name:         WebWatch

robot-cover-url:    http://www.specter.com/users/janos/specter

robot-details-url:

robot-owner-name:   Joseph Janos

robot-owner-url:    http://www.specter.com/users/janos/specter

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      maintenance, statistics

robot-type:         standalone

robot-platform:     

robot-availability: 

robot-exclusion:    no

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         

robot-from:         no

robot-useragent:    WebWatch

robot-language:     c++

robot-description:  Its purpose is to validate HTML and generate statistics.

	It checks URLs modified since a given date.

robot-history:      

robot-environment:

modified-date:      Wed Jul 26 13:36:32 1995

modified-by:



robot-id: wget

robot-name: Wget

robot-cover-url: ftp://gnjilux.cc.fer.hr/pub/unix/util/wget/

robot-details-url:

robot-owner-name: Hrvoje Niksic

robot-owner-url:

robot-owner-email: [email protected]

robot-status: development

robot-purpose: mirroring, maintenance

robot-type: standalone

robot-platform: unix

robot-availability: source

robot-exclusion: yes

robot-exclusion-useragent: wget

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: Wget/1.4.0

robot-language: C

robot-description:

  Wget is a utility for retrieving files using HTTP and FTP protocols.

  It works non-interactively, and can retrieve HTML pages and FTP

  trees recursively.  It can be used for mirroring Web pages and FTP

  sites, or for traversing the Web gathering data.  It is run by the

  end user or archive maintainer.

robot-history:

robot-environment: hobby, research

modified-date: Mon, 11 Nov 1996 06:00:44 MET

modified-by: Hrvoje Niksic



robot-id: whatuseek

robot-name: whatUseek Winona

robot-cover-url: http://www.whatUseek.com/

robot-details-url: http://www.whatUseek.com/

robot-owner-name: Neil Mansilla

robot-owner-url: http://www.whatUseek.com/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: Robot used for site-level search and meta-search engines.

robot-type: standalone

robot-platform: unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: winona

robot-noindex: yes

robot-host: *.whatuseek.com, *.aol2.com

robot-from: no

robot-useragent: whatUseek_winona/3.0

robot-language: c++

robot-description: The whatUseek robot, Winona, is used for site-level

 search engines.  It is also implemented in several meta-search engines.

robot-history: Winona was developed in November of 1996.

robot-environment: service

modified-date: Wed, 17 Jan 2001 11:52:00 EST

modified-by: Neil Mansilla



robot-id: whowhere

robot-name: WhoWhere Robot

robot-cover-url: http://www.whowhere.com

robot-details-url: 

robot-owner-name: Rupesh Kapoor

robot-owner-url: 

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: Sun Unix

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: whowhere

robot-noindex: no

robot-host: spica.whowhere.com

robot-from: no

robot-useragent: 

robot-language: C/Perl

robot-description: Gathers data for email directory from web pages

robot-history: 

robot-environment: commercial

modified-date: 

modified-by:



robot-id: wlm

robot-name: Weblog Monitor

robot-details-url: http://www.metastatic.org/wlm/

robot-cover-url: http://www.metastatic.org/wlm/

robot-owner-name: Casey Marshall

robot-owner-url: http://www.metastatic.org/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: statistics

robot-type: standalone

robot-platform: unix, windows

robot-availability: source, data

robot-exclusion: no

robot-exclusion-useragent: wlm

robot-noindex: no

robot-nofollow: no

robot-host: blossom.metastatic.org

robot-from: no

robot-useragent: wlm-1.1

robot-language: java

robot-description: Builds the 'Picture of Weblogs' applet. See

 http://www.metastatic.org/wlm/.

robot-environment: hobby

modified-date: Fri, 2 Nov 2001 04:55:00 PST



robot-id:           wmir

robot-name:         w3mir

robot-cover-url:    http://www.ifi.uio.no/~janl/w3mir.html

robot-details-url:

robot-owner-name:   Nicolai Langfeldt

robot-owner-url:    http://www.ifi.uio.no/~janl/w3mir.html

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      mirroring.

robot-type:         standalone

robot-platform:     UNIX, WindowsNT

robot-availability:

robot-exclusion:    no.

robot-exclusion-useragent:

robot-noindex:

robot-host:

robot-from:         yes

robot-useragent:    w3mir

robot-language:     Perl

robot-description:  W3mir uses the If-Modified-Since HTTP header and recurses

	only the directory and subdirectories of its start

	document.  Known to work on U*ixes and Windows

	NT.

robot-history:

robot-environment:

modified-date:      Wed Apr 24 13:23:42 1996.

modified-by:



robot-id: wolp

robot-name: WebStolperer

robot-cover-url: http://www.suchfibel.de/maschinisten

robot-details-url: http://www.suchfibel.de/maschinisten/text/werkzeuge.htm (in German)

robot-owner-name: Marius Dahler

robot-owner-url: http://www.suchfibel.de/maschinisten

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix, NT

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: WOLP

robot-noindex: yes

robot-host: www.suchfibel.de

robot-from: yes

robot-useragent: WOLP/1.0 mda/1.0

robot-language: perl5

robot-description: The robot gathers information about specified

 web-projects and generates knowledge bases in JavaScript or its own

 format.

robot-environment: hobby

modified-date: 22 Jul 1998

modified-by: Marius Dahler



robot-id:           wombat

robot-name:         The Web Wombat 

robot-cover-url:    http://www.intercom.com.au/wombat/

robot-details-url:

robot-owner-name:   Internet Communications

robot-owner-url:    http://www.intercom.com.au/

robot-owner-email:  [email protected]

robot-status:

robot-purpose:      indexing, statistics.

robot-type:

robot-platform:

robot-availability:

robot-exclusion:    no.

robot-exclusion-useragent:

robot-noindex:

robot-host:         qwerty.intercom.com.au

robot-from:         no

robot-useragent:    no

robot-language:     IBM Rexx/VisualAge C++ under OS/2.

robot-description:  The robot is the basis of the Web Wombat search engine

	(Australian/New Zealand content ONLY).

robot-history:

robot-environment:

modified-date:      Thu Feb 29 00:39:49 1996.

modified-by:



robot-id:           worm

robot-name:         The World Wide Web Worm

robot-cover-url:    http://www.cs.colorado.edu/home/mcbryan/WWWW.html

robot-details-url:

robot-owner-name:   Oliver McBryan

robot-owner-url:    http://www.cs.colorado.edu/home/mcbryan/Home.html

robot-owner-email:  [email protected]

robot-status:       

robot-purpose:      indexing

robot-type:         

robot-platform:     

robot-availability: 

robot-exclusion:    

robot-exclusion-useragent:

robot-noindex:      no

robot-host:         piper.cs.colorado.edu

robot-from:         

robot-useragent:    

robot-language:     

robot-description:  An indexing robot with quite flexible search options.

robot-history:      

robot-environment:

modified-date:      

modified-by:



robot-id: wwwc

robot-name: WWWC Ver 0.2.5

robot-cover-url: http://www.kinet.or.jp/naka/tomo/wwwc.html

robot-details-url:

robot-owner-name: Tomoaki Nakashima.

robot-owner-url: http://www.kinet.or.jp/naka/tomo/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: maintenance

robot-type: standalone

robot-platform: windows, windows95, windowsNT

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: WWWC

robot-noindex: no

robot-host:

robot-from: yes

robot-useragent: WWWC/0.25 (Win95)

robot-language: c

robot-description:

robot-history: 1997

robot-environment: hobby

modified-date: Tue, 18 Feb 1997 06:02:47 GMT

modified-by: Tomoaki Nakashima ([email protected])



robot-id: wz101

robot-name: WebZinger

robot-details-url: http://www.imaginon.com/wzindex.html

robot-cover-url: http://www.imaginon.com

robot-owner-name: ImaginOn, Inc

robot-owner-url: http://www.imaginon.com

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: windows95, windowsNT 4, mac, solaris, unix

robot-availability: binary

robot-exclusion: no

robot-exclusion-useragent: none

robot-noindex: no

robot-host: http://www.imaginon.com/wzindex.html *

robot-from: no

robot-useragent: none

robot-language: java

robot-description: commercial Web bot that accepts plain-text queries and uses

 WebCrawler, Lycos or Excite to get URLs, then visits sites.  If the user's

 filter parameters are met, it downloads one picture and a paragraph of text,

 then plays back a slide show of one text paragraph plus an image from each site.

robot-history: developed by ImaginOn in 1996 and 1997

robot-environment: commercial

modified-date: Wed, 11 Sep 1997 02:00:00 GMT

modified-by: [email protected]



robot-id: xget

robot-name: XGET

robot-cover-url: http://www2.117.ne.jp/~moremore/x68000/soft/soft.html

robot-details-url: http://www2.117.ne.jp/~moremore/x68000/soft/soft.html

robot-owner-name: Hiroyuki Shigenaga

robot-owner-url: http://www2.117.ne.jp/~moremore/

robot-owner-email: [email protected]

robot-status: active

robot-purpose: mirroring

robot-type: standalone

robot-platform: X68000, X68030

robot-availability: binary

robot-exclusion: yes

robot-exclusion-useragent: XGET

robot-noindex: no

robot-host: *

robot-from: yes

robot-useragent: XGET/0.7

robot-language: c

robot-description: Its purpose is to retrieve updated files. It is run by the end user.

robot-history: 1997

robot-environment: hobby

modified-date: Fri, 07 May 1998 17:00:00 GMT

modified-by: Hiroyuki Shigenaga



robot-id: Nederland.zoek

robot-name: Nederland.zoek

robot-cover-url: http://www.nederland.net/

robot-details-url: 

robot-owner-name: System Operator Nederland.net

robot-owner-url: 

robot-owner-email: [email protected]

robot-status: active

robot-purpose: indexing

robot-type: standalone

robot-platform: unix (Linux)

robot-availability: none

robot-exclusion: yes

robot-exclusion-useragent: Nederland.zoek

robot-noindex: no

robot-host: 193.67.110.*

robot-from: yes

robot-useragent: Nederland.zoek

robot-language: c

robot-description: This robot indexes all .nl sites for the search engine of Nederland.net.

robot-history: Developed at Computel Standby in Apeldoorn, The Netherlands

robot-environment: service

modified-date: Sat, 8 Feb 1997 01:10:00 CET

modified-by: Sander Steffann 