Create a keyword/phrase counting program for webpages listed in a search result
$30-150 USD
Cancelled
Posted almost 13 years ago
Paid on delivery
The program should accept a query term and search for it on Google. Every webpage listed in the search results should then be analyzed, and a final list of common phrases should be produced.
It should include an option to specify how many Google search results to include, up to 100.
It should issue only a single Google query per term, since up to 100 results can be obtained in a single query from Google.
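The single-query requirement could be sketched as follows. This is only an illustration: the function name `build_search_url` is hypothetical, and the `q` and `num` URL parameters are an assumption about how the search endpoint accepts the query and the result count; the posting itself only says that up to 100 results can be requested at once.

```python
from urllib.parse import urlencode

MAX_RESULTS = 100  # upper limit stated in the posting


def build_search_url(query: str, num_results: int = MAX_RESULTS) -> str:
    """Return one search URL that asks for up to `num_results` results."""
    # Clamp the requested count to the 1..100 range.
    count = max(1, min(num_results, MAX_RESULTS))
    # Assumption: the endpoint honors `q` (query) and `num` (result count).
    params = urlencode({"q": query, "num": count})
    return f"https://www.google.com/search?{params}"


print(build_search_url("lyme disease", 250))
# https://www.google.com/search?q=lyme+disease&num=100
```

Clamping the count keeps the user-facing option within the stated limit, so one request per term is always enough.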
After this list of URLs is obtained from the Google results, a list should be produced of the most common 2-, 3-, 4-, and 5-word combinations across all webpages in the list. All special characters and code should be removed, such as -.,! or any HTML, JavaScript, etc. Note that this applies to webpages, not sites, so only the specific page URL found in the Google query needs to be analyzed.
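One possible way to do the cleanup step, using only the Python standard library, is sketched below. The names `TextExtractor` and `clean_words` are hypothetical, and the choice to lowercase everything and keep only ASCII letters and digits is an assumption, since the posting does not say how case or non-English characters should be handled.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside a script or style element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def clean_words(html: str) -> list[str]:
    """Strip markup, code, and special characters; return lowercase words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    text = " ".join(parser.parts).lower()
    # Assumption: anything other than a-z, 0-9, or whitespace is "special".
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


page = "<html><script>var x = 1;</script><p>Lyme disease, explained!</p></html>"
print(clean_words(page))
# ['lyme', 'disease', 'explained']
```

Replacing special characters with a space rather than deleting them means a hyphenated term such as "tick-borne" becomes two words, which may or may not be what the client wants.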
Finally, a list should be saved or output for each word-combination length.
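A minimal sketch of the output step might write one text file per combination length. The function name `save_phrase_lists`, the file naming scheme, the tab-separated layout, and the default cutoff of 50 phrases are all assumptions for illustration; the posting leaves the output format open.

```python
from collections import Counter
from pathlib import Path


def save_phrase_lists(
    counts_by_size: dict[int, Counter], out_dir: str, top: int = 50
) -> list[Path]:
    """Write the `top` most common phrases for each length to its own file."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    written = []
    for size, counts in sorted(counts_by_size.items()):
        # Hypothetical naming scheme: 2_word_phrases.txt, 3_word_phrases.txt, ...
        path = folder / f"{size}_word_phrases.txt"
        lines = [f"{count}\t{phrase}" for phrase, count in counts.most_common(top)]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Calling `save_phrase_lists({2: Counter({"lyme disease": 3})}, "output")` would create `output/2_word_phrases.txt` with one line holding the count and the phrase.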
**For example**, if the query term is "lyme disease", the program will perform a Google search for "lyme disease". The URLs of the 100 results returned for this query (the maximum Google allows for one query) will be added to a list. The text content of every URL will then be scraped, stripped of special characters and code, and analyzed to produce the most common 2-, 3-, 4-, and 5-word combinations. These lists will then be saved as a text file or output another way (for example, to the console).
A word combination means words that appear next to each other, separated by a space. So the text "one word two word three word" could yield a two-word combination of "word two" or a three-word combination of "two word three".
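The definition above corresponds to what is usually called a word n-gram. The sketch below shows one way to generate and tally them; the helper names `ngrams` and `count_phrases` are hypothetical, and counting each page separately (so that no phrase spans two pages) is an assumption about the intended behavior.

```python
from collections import Counter


def ngrams(words: list[str], n: int) -> list[str]:
    """Return every run of n adjacent words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_phrases(pages: list[list[str]], sizes=(2, 3, 4, 5)) -> dict[int, Counter]:
    """Tally n-grams of each size across all pages, one page at a time."""
    counts = {n: Counter() for n in sizes}
    for words in pages:
        for n in sizes:
            counts[n].update(ngrams(words, n))
    return counts


words = "one word two word three word".split()
print(ngrams(words, 2))
# ['one word', 'word two', 'two word', 'word three', 'three word']
print(ngrams(words, 3))
# ['one word two', 'word two word', 'two word three', 'word three word']
```

The most common combinations for a given length can then be read off with `count_phrases(pages)[2].most_common(10)`, where `pages` holds one cleaned word list per webpage.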