Cancelled

Create a keyword/phrase counting program for webpages listed in a search result

This program should accept a query term and search for it using Google. Every webpage listed in that Google search should then be analyzed, and a final list of common phrases should be produced.

It should include an option to specify the number of Google search results to include, up to 100.

It should issue only a single Google query per term, since up to 100 results can be obtained in a single query from Google.
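The single-query requirement could be met by building one request URL per term. The following is a minimal sketch, assuming Google's web search accepts a `num` query parameter for the result count, as the posting implies. `build_search_url` is a hypothetical helper name, and fetching and parsing the results page is left out.

```python
from urllib.parse import urlencode

MAX_RESULTS = 100  # upper limit stated in the posting


def build_search_url(query, num_results=MAX_RESULTS):
    """Build one search URL for a term (hypothetical helper).

    Assumes the search endpoint honors a `num` parameter; the requested
    count is clamped to the range 1..100.
    """
    clamped = max(1, min(int(num_results), MAX_RESULTS))
    return "https://www.google.com/search?" + urlencode({"q": query, "num": clamped})


print(build_search_url("lyme disease", 250))
```

Clamping the count in one place keeps the "up to 100" option and the single-query rule consistent with each other.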

After this list of URLs is obtained from the Google results, a list should be produced of the most common 2-, 3-, 4-, and 5-word combinations across all webpages in the list. All special characters (such as - . , !) and all code (HTML, JavaScript, etc.) should be removed. Note that this applies to webpages, not sites, so only the specific page URL found in the Google results needs to be analyzed.
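The cleaning step could look something like the sketch below, which uses only the Python standard library. It assumes English text (anything other than a-z and 0-9 is treated as a special character), and `clean_page` and `_TextExtractor` are hypothetical names introduced for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script>, <style>, and <noscript>."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self.parts.append(data)


def clean_page(html):
    """Return the page's words, lowercased, with markup and punctuation removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts).lower()
    # Assumption: every character other than a-z, 0-9, or whitespace is "special".
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return text.split()


print(clean_page("<p>Lyme disease, explained!</p><script>var x = 1;</script>"))
```

Replacing special characters with a space rather than deleting them means a hyphenated term such as "tick-borne" becomes two words; whether that is the desired behavior is a choice the posting leaves open.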

Finally, a list should be saved or output for each of the word combination sizes.
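One way to save the lists is one text file per combination size. This is a sketch under assumptions: the file naming scheme, the tab-separated layout, and the `top` cutoff are not specified in the posting, and `save_phrase_lists` is a hypothetical helper that expects a dict mapping each size to a `collections.Counter` of phrases.

```python
from collections import Counter
from pathlib import Path


def save_phrase_lists(counts, out_dir, top=50):
    """Write the `top` most common phrases for each size to its own file.

    `counts` maps a combination size (2, 3, 4, 5) to a Counter of phrases.
    Returns the list of files written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for n, counter in sorted(counts.items()):
        target = out_path / f"phrases_{n}_words.txt"  # assumed naming scheme
        with target.open("w", encoding="utf-8") as fh:
            for phrase, count in counter.most_common(top):
                fh.write(f"{count}\t{phrase}\n")
        written.append(target)
    return written
```

Printing the same `count<TAB>phrase` lines to the console instead would cover the "output another way" option.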

**For example**, if the query term is "lyme disease", this program will perform a Google search for "lyme disease". The 100 results returned for this query (the maximum Google allows for one query) will have their URLs added to a list. Every URL will then have its text content scraped, special characters and code removed, and the remaining text analyzed to produce the most common 2-, 3-, 4-, and 5-word combinations. These lists will then be saved as a text file or output another way (for example, to the console).

A word combination means words that are found together, separated by a space. So the text "one word two word three word" could yield a two-word combination of "word two" or a three-word combination of "two word three".
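The definition above corresponds to a sliding window over the word list. The sketch below assumes each page has already been reduced to a list of cleaned words; `ngrams` and `count_phrases` are hypothetical names.

```python
from collections import Counter


def ngrams(words, n):
    """Return every run of n consecutive words, joined by single spaces."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_phrases(pages, sizes=(2, 3, 4, 5)):
    """Count word combinations of each size across all pages.

    `pages` is an iterable of word lists, one per webpage. Windows never
    span two pages, because each page is processed separately.
    """
    counts = {n: Counter() for n in sizes}
    for words in pages:
        for n in sizes:
            counts[n].update(ngrams(words, n))
    return counts


words = "one word two word three word".split()
print(ngrams(words, 2))
print(ngrams(words, 3))
```

Calling `count_phrases(pages)[2].most_common(10)` would then give the ten most frequent two-word combinations across all pages.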

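Skills: Engineering, PHP, Project Management, Script Installation, Shell Scripting, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing, Windows Desktop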

About the Employer:
( 0 reviews ) United States

Project ID: #3347905