The following minimum requirements must be met to run phpcrawl (v 0.8) in basic single-process mode:
To run phpcrawl in multi-process mode, some additional requirements must be met:
PHPCrawl is a framework, written in PHP, for crawling (spidering) websites; in short, a webcrawler library or crawler engine for PHP.
PHPCrawl “spiders” websites and passes information about every document it finds (pages, links, files and so on) to the user of the library for further processing.
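The crawl-and-callback cycle described above can be sketched in Python. This is a language-neutral illustration under stated assumptions, not PHPCrawl's API: the names `crawl`, `DocumentInfo` and `handle_document_info` are hypothetical, and the `fetch` function stands in for real HTTP requests so the sketch runs without network access.

```python
from collections import deque
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import Callable, List, Tuple
from urllib.parse import urljoin


@dataclass
class DocumentInfo:
    """Hypothetical record handed to the user callback for each fetched document."""
    url: str
    content_type: str
    links: List[str] = field(default_factory=list)


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Assumption: fetch(url) returns (Content-Type, body); a real crawler would do HTTP here.
Fetcher = Callable[[str], Tuple[str, str]]


def crawl(start_url: str, fetch: Fetcher,
          handle_document_info: Callable[[DocumentInfo], None]) -> None:
    """Breadth-first crawl: fetch each URL once, extract its links,
    and pass information about the document to the user callback."""
    queue = deque([start_url])
    seen = {start_url}  # URLs already queued, so each is fetched only once
    while queue:
        url = queue.popleft()
        content_type, body = fetch(url)
        links: List[str] = []
        if content_type.startswith("text/html"):
            parser = LinkExtractor()
            parser.feed(body)
            parser.close()
            # Resolve relative links against the current page URL.
            links = [urljoin(url, href) for href in parser.hrefs]
        handle_document_info(DocumentInfo(url, content_type, links))
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)


if __name__ == "__main__":
    # Tiny in-memory "website" used instead of real HTTP requests.
    SITE = {
        "http://example.com/": ("text/html", '<a href="/a">A</a> <a href="/logo.png">logo</a>'),
        "http://example.com/a": ("text/html", '<a href="/">home</a>'),
        "http://example.com/logo.png": ("image/png", ""),
    }
    crawl("http://example.com/", lambda u: SITE[u],
          lambda info: print(info.url, info.content_type))
```

Visiting URLs breadth-first and remembering which ones are already queued means each document is reported to the callback exactly once, even when pages link back to each other.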
PHPCrawl is highly configurable and provides many options for controlling the crawler's behaviour, such as URL and Content-Type filters, cookie handling, robots.txt handling, limiting options, multiprocessing and more.
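The following minimal Python sketch shows how URL filters, Content-Type filters and robots.txt handling might decide what a crawler follows and what it receives. `CrawlerConfig`, `should_follow`, `should_receive` and `parse_robots` are hypothetical names chosen for illustration and are not PHPCrawl's options; only `urllib.robotparser` is a real module (from the Python standard library).

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional
from urllib.robotparser import RobotFileParser


@dataclass
class CrawlerConfig:
    """Hypothetical option set illustrating typical crawler filters (not PHPCrawl's API)."""
    url_filter_rules: List[str] = field(default_factory=list)            # matching URLs are skipped
    content_type_receive_rules: List[str] = field(default_factory=list)  # only matching types are received
    user_agent: str = "example-crawler"
    robots: Optional[RobotFileParser] = None  # parsed robots.txt, or None to ignore it

    def should_follow(self, url: str) -> bool:
        """Decide before requesting a URL whether the crawler should fetch it."""
        if any(re.search(p, url, re.IGNORECASE) for p in self.url_filter_rules):
            return False
        if self.robots is not None and not self.robots.can_fetch(self.user_agent, url):
            return False
        return True

    def should_receive(self, content_type: str) -> bool:
        """Decide, once the Content-Type header is known, whether to receive the body."""
        if not self.content_type_receive_rules:
            return True  # no rules configured: receive everything
        return any(re.search(p, content_type) for p in self.content_type_receive_rules)


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    config = CrawlerConfig(
        url_filter_rules=[r"\.(jpe?g|gif|png)$"],  # skip image URLs
        content_type_receive_rules=[r"^text/html"],
        robots=parse_robots("User-agent: *\nDisallow: /private/\n"),
    )
    for url in ["http://example.com/a", "http://example.com/logo.png",
                "http://example.com/private/page"]:
        print(url, config.should_follow(url))
```

Filtering URLs before they are requested saves traffic, whereas a Content-Type filter can only be applied once the response headers have arrived.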
PHPCrawl is completely free open-source software, licensed under the GNU General Public License v2.
To get a first impression of how to use the crawler, take a look at the quickstart guide or at one of the examples in the manual section.
A complete reference of all available options and methods of the framework can be found in the class reference section.