Creating a Python crawler
$10-30 USD
Paid on delivery
For a new project, I am looking for someone who can program a Python crawler.
The crawler shall crawl the web and save all domain names that end in certain top-level domains (TLDs).
It must be easy to define which TLDs are collected (e.g. "de" / "at" / "ch").
The second-level domains that are found are stored together with their TLD in a database.
Domains with the TLD "de" are saved in table 1.
Domains with the TLD "at" are saved in table 2.
Domains with the TLD "ch" are saved in table 3.
(and so on)
If a domain name does not match one of the configured TLDs, it is not stored at all.
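The storage rules above could be sketched roughly as follows. This is only an illustration, not a required design: SQLite, the `ALLOWED_TLDS` setting, the `domains_<tld>` table names and the function names are all assumptions made for the example, and it assumes plain alphabetic TLDs such as "de".

```python
import sqlite3

# Hypothetical setting: the TLDs to collect. Edit this tuple to change them.
ALLOWED_TLDS = ("de", "at", "ch")


def table_name(tld):
    """Return the (assumed) table name for one TLD, e.g. 'domains_de'."""
    if tld not in ALLOWED_TLDS:
        raise ValueError(f"TLD not configured: {tld}")
    return f"domains_{tld}"


def init_db(conn):
    """Create one table per configured TLD."""
    for tld in ALLOWED_TLDS:
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table_name(tld)} "
            "(domain TEXT PRIMARY KEY)"
        )


def store_domain(conn, domain):
    """Store e.g. 'example.de' in the table for its TLD.

    Returns False (and stores nothing) if the TLD is not configured.
    """
    domain = domain.lower().rstrip(".")
    name, _, tld = domain.rpartition(".")
    if not name or tld not in ALLOWED_TLDS:
        return False
    # PRIMARY KEY plus INSERT OR IGNORE keeps each domain only once.
    conn.execute(
        f"INSERT OR IGNORE INTO {table_name(tld)} (domain) VALUES (?)",
        (domain,),
    )
    return True
```

With this layout, adding a new TLD would only mean extending `ALLOWED_TLDS` and calling `init_db` again.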
ATTENTION please:
Only the second-level domain together with its TLD (example: "[url removed, login to view]") has to be stored, NOT every single URL that can be found on that second-level domain (e.g. [url removed, login to view]; [url removed, login to view], ...).
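One possible way to reduce any crawled URL to just "second-level domain + TLD" is sketched below. The function name `registered_domain` is hypothetical and the hostnames in the usage note are placeholders, not the URLs removed above. The sketch simply keeps the last two labels of the hostname, so it would mis-handle names registered under a second-level suffix such as "co.at"; a real crawler would probably need the Public Suffix List for those cases.

```python
from urllib.parse import urlsplit


def registered_domain(url):
    """Return 'second-level.tld' for a URL, or None if there is no usable hostname."""
    host = urlsplit(url).hostname  # lowercased, without port or credentials
    if not host:
        return None
    labels = host.rstrip(".").split(".")
    if len(labels) < 2 or not all(labels):
        return None
    if labels[-1].isdigit():  # looks like an IPv4 address, not a domain name
        return None
    # Naive rule: keep only the last two labels.
    return ".".join(labels[-2:])
```

For instance, both `http://www.example.de/a/b?x=1` and `http://example.de/other` reduce to `example.de`, so the domain is stored once no matter how many of its pages are crawled.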
Project ID: #5560761
About the project
2 freelancers are bidding on this job, with an average bid of $38/hour
Hello sir, I propose a Python script that takes a URL as input, fetches it, discovers every href in it and, if the TLD is in your list, saves it in the DB. Then, as a second step, ...
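The "discover every href" step of such a script might look roughly like this, using only the standard library. It is a sketch under assumptions: the names `HrefCollector` and `extract_links` are made up for illustration, and fetching the page (for example with `urllib.request`) is left out so the example works on an HTML string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all absolute link targets found in an HTML string."""
    parser = HrefCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Each returned link could then be checked against the configured TLD list before anything is written to the database.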
I have previously implemented web crawlers in Python. Adding the extra features required will not be an issue. My code is guaranteed to be well commented and simple to extend in future development.