crawler
$500-700 USD
Paid on delivery
*Expert rating removed, as it does not seem to be very popular here.
The crawler needs to be capable of completing the tasks below:
1. Collect external links from "list pages" specified by URL (e.g. specified categories of dmoz, Yahoo, and 5-10 other major link directories).
1. Analyze the PageRank of the links and save only sites with a PageRank higher than 3.
2. Find the RSS feed(s) on the pages found above. Save the RSS feed(s) in the DB and make the crawl results downloadable as CSV.
2. Collect Twitter URLs from specified sections of [[url removed, login to view]][1]. Visit each Twitter account whose follower count is greater than X (X is defined at the start of the crawl, together with the URL of the wefollow section) and save the RSS feed of that account.
1. There is a DFD image that clarifies this, but I can't attach it here...
3. A minimalistic UI where the results of each crawl can be accessed in a list and downloaded as CSV. The list contains: date of the crawl, URL where the crawl started.
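
To make the first task more concrete, here is a rough sketch of how I picture the parsing side. It is only an illustration, not a required design. It assumes Python with the standard library only and works on HTML that has already been downloaded. All function names (`external_links`, `find_feeds`, `keep_ranked`, `to_csv`) are made up for this example, and `rank_of` is a hypothetical callable standing in for whatever PageRank source ends up being used.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# MIME types commonly used for feed autodiscovery <link> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class _PageParser(HTMLParser):
    """Collects <a href> targets and <link rel="alternate"> feed URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        url = urljoin(self.base_url, href)  # resolve relative links
        if tag == "a":
            self.links.append(url)
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            mime = (attrs.get("type") or "").lower()
            if "alternate" in rel and mime in FEED_TYPES:
                self.feeds.append(url)


def _parse(html, base_url):
    parser = _PageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser


def external_links(html, base_url):
    """Unique http(s) links on a list page that point to a different host."""
    base_host = urlparse(base_url).netloc.lower()
    seen, result = set(), []
    for url in _parse(html, base_url).links:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == base_host or url in seen:
            continue  # skip internal links and duplicates
        seen.add(url)
        result.append(url)
    return result


def find_feeds(html, base_url):
    """Feed URLs advertised in the page's <link rel="alternate"> tags."""
    return _parse(html, base_url).feeds


def keep_ranked(urls, rank_of, threshold=3):
    """Keep URLs ranked strictly higher than threshold (rank_of is hypothetical)."""
    return [u for u in urls if rank_of(u) > threshold]


def to_csv(rows):
    """Serialize (site, feed) pairs as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["site", "feed"])
    writer.writerows(rows)
    return buf.getvalue()
```

Downloading the pages, storing results in the DB, and the wefollow/Twitter part are left out of this sketch on purpose.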
## Deliverables
DFD attached.
Project ID: #3608648