Create a reusable spider to scrape information from a website

In progress · Posted Aug 26, 2014 · Paid on delivery

We need a reusable script that iterates through many web pages and pulls a table of information from each page.
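The posting does not describe the target pages, so the sketch below assumes the wanted data is the first top-level `<table>` on each page. It uses only the standard-library `html.parser`, and the class and function names (`FirstTableParser`, `extract_table`) are hypothetical. Nested tables are not handled. A page without a table yields an empty list, which makes "no valid data" easy to detect.

```python
from html.parser import HTMLParser


class FirstTableParser(HTMLParser):
    """Collect cell text from the first <table> on a page.

    Assumption: the wanted data is the first top-level table; nested
    tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._in_table = False
        self._done = False      # set once the first table has closed
        self._row = None        # cells of the row currently being read
        self._cell = None       # text fragments of the cell being read

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self._close_row()       # tolerate an omitted </tr>
            self._row = []
        elif self._in_table and tag in ("td", "th"):
            self._close_cell()      # tolerate an omitted </td>
            if self._row is not None:
                self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table or self._done:
            return
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table":
            self._close_row()
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def extract_table(html):
    """Return the first table's rows, or [] when the page has no table."""
    parser = FirstTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

A hand-written parser keeps the script dependency-free. If installing packages is acceptable, a library such as BeautifulSoup would be more forgiving of badly formed markup.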

The script will need to iterate through a list of 600,000 URLs. Not every URL will return a table of data, so we need to record only those that return valid data.
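A minimal sketch of the iteration step follows, assuming the URLs sit one per line in a text file. The file name, the User-Agent string and the function names (`read_urls`, `fetch_html`, `iter_valid_tables`) are hypothetical. URLs are streamed from the file rather than loaded at once. A download error or a page without table rows is skipped rather than stopping the run, so only pages with valid data are passed on. The table extractor is injected as a parameter, and so is the fetch function, which lets it be swapped out for testing.

```python
import http.client
import urllib.request


def read_urls(path):
    """Yield one URL per non-blank line, streaming the file instead of loading it all."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            url = line.strip()
            if url:
                yield url


def fetch_html(url, timeout=30):
    """Download one page and decode it as text (UTF-8 is an assumption)."""
    # Hypothetical User-Agent string; replace it with one that identifies you.
    req = urllib.request.Request(url, headers={"User-Agent": "table-spider/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def iter_valid_tables(urls, extract, fetch=fetch_html):
    """Yield (url, rows) only for pages where extract() finds table rows.

    Download errors and pages without a table are skipped rather than
    stopping the whole run.
    """
    for url in urls:
        try:
            html = fetch(url)
        except (OSError, ValueError, http.client.HTTPException) as exc:
            # URLError, HTTPError and timeouts are OSError subclasses;
            # ValueError covers malformed URLs.
            print(f"skip {url}: {exc}")
            continue
        rows = extract(html)
        if rows:
            yield url, rows
```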

It is very important not to crash the website being scraped, so there must be a delay of 2-3 seconds between consecutive requests to the server.
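One way to enforce the 2-3 second gap is sketched below with a hypothetical `Throttle` class. Each gap is a random value between 2 and 3 seconds, measured from the start of the previous request, so time already spent downloading and parsing counts toward the delay. The clock and sleep functions are parameters only so the behaviour can be checked without really waiting.

```python
import random
import time


class Throttle:
    """Keep a random 2-3 second gap between the starts of consecutive requests."""

    def __init__(self, min_delay=2.0, max_delay=3.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.max_delay = max_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None   # time at which the previous request was allowed

    def wait(self):
        """Block until the next request may be sent."""
        if self._last is not None:
            gap = random.uniform(self.min_delay, self.max_delay)
            remaining = self._last + gap - self._clock()
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()
```

Call `throttle.wait()` immediately before each download. At roughly 2.5 seconds per request, 600,000 URLs amount to about 1.5 million seconds, or a little over 17 days of run time, which is worth planning for.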

The scraping results should be stored in a CSV file.
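The sketch below writes results with the standard-library `csv` module, using a hypothetical `write_table_rows` helper. Prefixing each row with its source URL is an assumption about the desired layout. The buffer is flushed after every page, so an interrupted run loses little, and opening the output in append mode lets a restarted run add to the same file.

```python
import csv


def write_table_rows(out, url, rows):
    """Append one page's table rows to an open CSV file, prefixing each row with its URL.

    Returns the number of rows written.
    """
    writer = csv.writer(out)
    for row in rows:
        writer.writerow([url, *row])
    out.flush()  # flush Python's buffer after each page so an interrupted run loses little
    return len(rows)
```

Usage: open the file once with `open("results.csv", "a", newline="", encoding="utf-8")`, as the `csv` documentation recommends `newline=""`. Then call `write_table_rows(out, url, rows)` for each page that returned data. Cells containing commas or quotes are escaped by the `csv` writer automatically.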

Python

Project ID: #6372625

About the project

Remote project · Active Aug 26, 2014