Hello. I have built a professional web crawling and parsing tool in Python (and have been using it for more than 5 years).
My PyWebCrawler's features:
Pros:
1. Multi-threading (parallel execution).
2. A very simple crawl-rules language, based on XPath and CSS selectors. (I'll send you examples in a private message.)
3. Automatically stores crawl results in any database system (Oracle, MySQL, MSSQL, MS Access, DB2, PostgreSQL, Firebird, etc.) or other data formats (HTML, CSV, XML).
There are some custom modules (from open-source projects) and support for Python's SQLAlchemy.
It will also create the database structure, with all tables, if they do not exist.
4. Proxy support.
5. SSL support.
6. Crawls a whole website with only a few lines of pure Python code.
7. Uses Python's urllib2 or the cURL library for Python (with custom User-Agent headers, cookies, auto-redirects, etc.).
8. Support for any encoding (with automatic detection of the website's and content's encoding).
9. Cache support (cached web content is automatically refreshed when it expires).
10. Auto-login on websites (automatically finds and fills the login form).
11. CAPTCHA detection, plus two ways to enter the CAPTCHA:
    through a user dialog, or
    through a web site: the CAPTCHA image is posted to a URL, and the crawler waits for the answer (so anyone with access rights can enter it).
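A minimal sketch of how multi-threaded crawling (feature 1) can work, using Python's standard-library ThreadPoolExecutor. The fetch function and URL list here are placeholders for illustration, not PyWebCrawler's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder fetch function; a real crawler would issue an HTTP request here.
def fetch(url):
    return f"<html>content of {url}</html>"

urls = [f"https://example.com/page/{i}" for i in range(8)]

# Fetch pages in parallel with a pool of worker threads.
with ThreadPoolExecutor(max_workers=4) as pool:
    pages = list(pool.map(fetch, urls))
```

A thread pool suits crawling well because the work is I/O-bound: threads waiting on the network don't block each other.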
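Creating the database structure when tables don't yet exist (feature 3) can be done with a `CREATE TABLE IF NOT EXISTS` statement. Shown here with the standard-library sqlite3 module for portability; the schema is made up for the example (the tool itself reportedly does this through SQLAlchemy):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Create the results table only if it does not already exist,
# so the script is safe to run repeatedly.
conn.execute("""
    CREATE TABLE IF NOT EXISTS crawl_results (
        id INTEGER PRIMARY KEY,
        url TEXT NOT NULL,
        fetched_at TEXT,
        body TEXT
    )
""")
conn.execute("INSERT INTO crawl_results (url, body) VALUES (?, ?)",
             ("https://example.com", "<html>...</html>"))
rows = conn.execute("SELECT url FROM crawl_results").fetchall()
```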
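Auto-detecting a page's encoding (feature 8) usually means checking the HTTP Content-Type header first and then the HTML meta charset. A stdlib-only sketch, with a made-up helper name:

```python
import re

def detect_charset(headers, body_bytes, default="utf-8"):
    """Guess a page's encoding from HTTP headers, then the HTML meta tag."""
    # 1. Content-Type header, e.g. "text/html; charset=windows-1251"
    m = re.search(r"charset=([\w-]+)", headers.get("Content-Type", ""), re.I)
    if m:
        return m.group(1).lower()
    # 2. <meta charset="..."> near the top of the document
    m = re.search(rb'<meta[^>]+charset=["\']?([\w-]+)', body_bytes[:2048], re.I)
    if m:
        return m.group(1).decode("ascii").lower()
    return default

charset = detect_charset({"Content-Type": "text/html; charset=windows-1251"}, b"")
```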
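The cache-with-expiration behaviour (feature 9) can be sketched like this; class and method names are illustrative, not taken from the tool:

```python
import time

class ExpiringCache:
    """Cache web content and re-fetch entries after they expire."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # url -> (timestamp, content)

    def get(self, url, fetch):
        entry = self._store.get(url)
        now = time.time()
        # Re-fetch if the URL is missing or the cached copy has expired.
        if entry is None or now - entry[0] > self.ttl:
            content = fetch(url)
            self._store[url] = (now, content)
            return content
        return entry[1]

cache = ExpiringCache(ttl_seconds=60)
page = cache.get("https://example.com", fetch=lambda u: "fresh content")
```

Within the TTL, repeated requests for the same URL are served from memory instead of hitting the site again.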
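Finding the login form to fill in automatically (feature 10) boils down to locating a form and collecting its input-field names. A stdlib sketch with html.parser, assuming for simplicity that the first form on the page is the login form:

```python
from html.parser import HTMLParser

class LoginFormFinder(HTMLParser):
    """Collect the input-field names inside <form> tags on the page."""
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            name = dict(attrs).get("name")
            if name:
                self.fields.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

html = '<form action="/login"><input name="user"><input name="pass"></form>'
finder = LoginFormFinder()
finder.feed(html)
# finder.fields now lists the field names to fill in and submit
```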