Creating a Python crawler

In progress, posted Mar 15, 2014, paid on delivery

For a new project, I am looking for someone who can program a Python crawler.

The Python crawler shall crawl the web and save all domain names that have certain top-level-domain (TLD) endings.

There shall be an option to easily define the top-level endings (e.g. "de" / "at" / "ch").
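One way the configurable endings could look is sketched below. This is only an illustration: the names `ALLOWED_TLDS` and `has_allowed_tld` are made up here, and the set could just as well be read from a config file or the command line.

```python
# Hypothetical configuration: the TLD endings the crawler should keep.
ALLOWED_TLDS = {"de", "at", "ch"}


def has_allowed_tld(hostname, allowed_tlds=ALLOWED_TLDS):
    """Return True if the hostname's last label is one of the allowed TLDs."""
    # Normalise case and drop a trailing dot ("example.de." is a valid FQDN).
    labels = hostname.lower().rstrip(".").split(".")
    # A bare TLD ("de") is not a domain, so require at least two labels.
    return len(labels) >= 2 and labels[-1] in allowed_tlds


print(has_allowed_tld("www.example.de"))   # True
print(has_allowed_tld("example.com"))      # False
```

Changing the set is then the only edit needed to add or remove an ending.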

The second-level domains found are stored in a database together with their TLD.

Domains with the top-level ending "de" are saved in table 1.

Domains with the top-level ending "at" are saved in table 2.

Domains with the top-level ending "ch" are saved in table 3.

(and so on)
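The one-table-per-TLD layout might be sketched as follows. Several things here are assumptions: the posting asks for MySQL, but the sketch uses Python's built-in `sqlite3` so that it runs without a server, and the table names (`domains_de`, `domains_at`, ...) and the helpers `table_for` and `store_domain` are hypothetical.

```python
import re
import sqlite3


def table_for(tld):
    """Map a TLD to a hypothetical table name such as 'domains_de'."""
    # Table names cannot be bound as SQL parameters, so validate strictly.
    if not re.fullmatch(r"[a-z]{2,24}", tld):
        raise ValueError(f"unexpected TLD: {tld!r}")
    return f"domains_{tld}"


def store_domain(conn, domain):
    """Insert e.g. 'example.de' into the table for its TLD, skipping duplicates."""
    tld = domain.rsplit(".", 1)[-1]
    table = table_for(tld)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (domain TEXT PRIMARY KEY)")
    conn.execute(f"INSERT OR IGNORE INTO {table} (domain) VALUES (?)", (domain,))
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, for the sketch only
for name in ["example.de", "example.at", "example.de"]:
    store_domain(conn, name)
```

The primary key keeps each domain from being stored twice. With MySQL the same idea would need a MySQL driver and `INSERT IGNORE` in place of SQLite's `INSERT OR IGNORE`.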

If a domain name does not match the configured TLDs, it is not stored at all.

ATTENTION please:

Only the second-level domain with its TLD (example: "[url removed, login to view]") has to be stored, NOT every single URL that can be found on a given second-level domain (e.g. [url removed, login to view]; [url removed, login to view], ...).
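Reducing a full URL to just its second-level domain plus TLD could be done roughly like this. The function name `registered_domain` and the `example.*` hosts are placeholders (the original example URLs were removed from the listing), and the "keep the last two labels" rule is a simplifying assumption.

```python
from urllib.parse import urlsplit


def registered_domain(url):
    """Return 'second-level.tld' for a URL, or None if it has no usable host."""
    host = urlsplit(url).hostname  # lowercased, without port or credentials
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    if len(labels) < 2:
        return None
    # Naive rule: keep only the last two labels of the host name.
    return ".".join(labels[-2:])


print(registered_domain("http://www.example.de/path/page.html"))  # example.de
print(registered_domain("https://shop.Example.CH:8080/a?b=1"))    # example.ch
print(registered_domain("/relative/path"))                        # None
```

The naive rule breaks for registrations under multi-label suffixes such as "co.at", where it would return the suffix itself, and it does not reject IP-address hosts. A real crawler would probably consult the Public Suffix List for the former and filter out the latter.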

MySQL PHP Python Software Architecture

Project ID: #5560761

About the project

2 proposals, remote project, active Mar 15, 2014

2 freelancers are bidding on this job, at an average of $38/hour

pythonpower

Hello sir, I propose a Python script that takes a URL as input, fetches it, and discovers every href in it; if the TLD is in your list, the script saves the domain in the DB. Then, as a second step, ...

$45 USD in 3 days
(6 reviews)
3.4
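The first step pythonpower describes (take a page, discover every href, keep the hosts whose TLD is on the list) might look roughly like the sketch below. It is not the bidder's code: `LinkCollector` and `domains_on_page` are hypothetical names, fetching the page and saving to the database are left out, and the host is reduced to its last two labels as a simplification.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def domains_on_page(html, base_url, allowed_tlds):
    """Return the set of 'second-level.tld' names linked from the page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    found = set()
    for link in collector.links:
        host = urlsplit(link).hostname
        if not host:
            continue  # e.g. mailto: or javascript: links
        labels = host.rstrip(".").split(".")
        if len(labels) >= 2 and labels[-1] in allowed_tlds:
            found.add(".".join(labels[-2:]))
    return found


page = (
    '<a href="http://www.example.de/x">a</a>'
    '<a href="/local">b</a>'
    '<a href="https://example.com/">c</a>'
    '<a href="http://shop.example.at/y?z=1">d</a>'
)
result = domains_on_page(page, "http://start.example.ch/index.html", {"de", "at"})
print(sorted(result))  # ['example.at', 'example.de']
```

Returning a set means a domain linked many times from one page is reported once.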
rboshra

I have previously implemented web crawlers using Python. Adding the extra features required will not be an issue. My code is guaranteed to be well commented and simple to extend in future development.

$30 USD in 3 days
(0 reviews)
0.0