Mercator, the "Altavista" robot

http://mercator.comm.nsdlib.org/

The authors are working for Microsoft now :-)

heritrix
crawler4j

Mainly dead or unusable:

jspider
websphinx

JavaScript support

phantomjs http://code.google.com/p/phantomjs/
https://github.com/mikeal/spider
https://github.com/joshfire/node-crawler

PHP

http://www.makeuseof.com/tag/build-basic-web-crawler-pull-information-website/