Objective: to know when a story of interest to us appears on the monitored websites.
Program a web spider that periodically fetches the web pages from a list of URLs, looks for keywords, raises an alert when it finds a match, and stores the match in a database.
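The keyword search step could be sketched as below. This is only an illustration: the names `KeywordHit` and `KeywordScanner` are hypothetical, the matching is assumed to be a case-insensitive substring search, and the page text is assumed to have been downloaded already.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record of one keyword occurrence in a page.
public class KeywordHit
{
    public string Keyword;
    public int Offset; // character offset of the occurrence in the page text
}

public static class KeywordScanner
{
    // Returns every occurrence of every keyword, ignoring case.
    public static List<KeywordHit> Scan(string pageText, IEnumerable<string> keywords)
    {
        var hits = new List<KeywordHit>();
        foreach (string keyword in keywords)
        {
            if (string.IsNullOrEmpty(keyword)) continue;
            int pos = 0;
            while ((pos = pageText.IndexOf(keyword, pos, StringComparison.OrdinalIgnoreCase)) >= 0)
            {
                hits.Add(new KeywordHit { Keyword = keyword, Offset = pos });
                pos += keyword.Length; // continue searching after this occurrence
            }
        }
        return hits;
    }
}
```

Each hit would then be checked against the database and, if new, raised as an alert.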
Stored content must include the entire content of the page, that is, HTML, images, Flash objects, etc.
The same alert must not be raised twice. This can be accomplished by storing a "signature" of the match in a database, that is, the 3 words before and the 3 words after the match. If the words around a match are the same as those of an already stored match, then it is the same match.
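A minimal sketch of this signature check follows. It assumes that "words" are whitespace-separated tokens and that the comparison ignores case; neither is stated in the requirement. The names `MatchSignature` and `SeenMatches` are hypothetical, and the in-memory `HashSet` only stands in for the database lookup.

```csharp
using System;
using System.Collections.Generic;

public static class MatchSignature
{
    // Splits text into words on whitespace (an assumption about what a "word" is).
    public static string[] Words(string text)
    {
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
    }

    // Joins up to n words before and n words after the keyword at wordIndex.
    public static string Build(string[] words, int wordIndex, int n = 3)
    {
        int start = Math.Max(0, wordIndex - n);
        int end = Math.Min(words.Length - 1, wordIndex + n);
        var parts = new List<string>();
        for (int i = start; i <= end; i++)
        {
            if (i != wordIndex) parts.Add(words[i].ToLowerInvariant());
        }
        return string.Join(" ", parts.ToArray());
    }
}

// Stand-in for the database table of stored signatures.
public class SeenMatches
{
    private readonly HashSet<string> signatures = new HashSet<string>();

    // Returns true only the first time a signature is seen.
    public bool IsNew(string signature)
    {
        return signatures.Add(signature);
    }
}
```

For example, with the keyword "budget" at word index 6:

```csharp
string[] w = MatchSignature.Words("The city council approved the new budget on Monday after a long debate");
string sig = MatchSignature.Build(w, 6); // "approved the new on monday after"
var seen = new SeenMatches();
bool first = seen.IsNew(sig);  // true: raise the alert
bool second = seen.IsNew(sig); // false: same match, skip it
```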
There must be a "match browser" showing the match itself (with the ten words before and after it), date, time, URL, and the source HTML of the web page that contained the match. It must be possible to restrict/sort/group the list by date, keywords, and URL.
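The restrict/sort/group operations behind the match browser could look roughly like this. The `StoredMatch` class and its field names are hypothetical, and the sketch works on an in-memory list with LINQ; in the real application the same filtering might instead be done with SQL queries against MySQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory shape of one stored match.
public class StoredMatch
{
    public string Keyword;
    public string Url;
    public DateTime FoundAt;
    public string Context; // the match with ten words before and after
}

public static class MatchQueries
{
    // Restrict to one keyword (ignoring case), newest first.
    public static List<StoredMatch> ByKeyword(IEnumerable<StoredMatch> all, string keyword)
    {
        return all
            .Where(m => string.Equals(m.Keyword, keyword, StringComparison.OrdinalIgnoreCase))
            .OrderByDescending(m => m.FoundAt)
            .ToList();
    }

    // Group by URL and count the matches found on each one.
    public static Dictionary<string, int> CountByUrl(IEnumerable<StoredMatch> all)
    {
        return all.GroupBy(m => m.Url).ToDictionary(g => g.Key, g => g.Count());
    }
}
```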
For matches found in RSS feeds, the complete news item must be extracted and stored.
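One possible way to pull the complete item out of a feed is sketched here. It assumes an RSS 2.0 feed with un-namespaced `item`, `title`, `link`, and `description` elements; the names `FeedItem` and `RssMatcher` are hypothetical, and the feed XML is assumed to have been downloaded already.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

// Hypothetical container for one complete feed item.
public class FeedItem
{
    public string Title;
    public string Link;
    public string Description;
}

public static class RssMatcher
{
    // Returns every item whose title or description contains the keyword (ignoring case).
    public static List<FeedItem> FindItems(string rssXml, string keyword)
    {
        XDocument doc = XDocument.Parse(rssXml);
        var result = new List<FeedItem>();
        foreach (XElement item in doc.Descendants("item"))
        {
            var feedItem = new FeedItem
            {
                // Casting a missing element to string yields null, so fall back to "".
                Title = (string)item.Element("title") ?? "",
                Link = (string)item.Element("link") ?? "",
                Description = (string)item.Element("description") ?? ""
            };
            string haystack = feedItem.Title + " " + feedItem.Description;
            if (haystack.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                result.Add(feedItem);
            }
        }
        return result;
    }
}
```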
The application must be programmed in C#, targeting Microsoft's own .NET CLI. The database must be MySQL. It must use the Windows Forms API 100% (no Gtk#) and 100% managed code; do not use Windows-specific DLLs. It must not hardcode backslashes in, for example, internal paths; it must use the constants that the system supplies.
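As an illustration of the last point, paths can be built with `System.IO.Path` instead of literal backslashes. The `StoragePaths` class and the directory layout (base directory, then host, then file name) are assumptions made for this sketch only.

```csharp
using System;
using System.IO;

public static class StoragePaths
{
    // Builds baseDir/host/fileName using the platform's own separator.
    public static string PageFile(string baseDir, string host, string fileName)
    {
        return Path.Combine(Path.Combine(baseDir, host), fileName);
    }
}
```

`Path.Combine` inserts `Path.DirectorySeparatorChar` where needed, so the same code yields backslashes on Windows and forward slashes elsewhere.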