Closed

Create Google Site Search Crawler for Jonathan Siennicki

The Jonathan Siennicki Project: Create software in C or C++ (or a scripting language such as Python or PHP) that is multithreaded, supports HTTP and HTTPS proxies, and can scrape Google, Yahoo, and Bing results to identify sites that use Google Site Search (and competing services such as [url removed], Algolia, etc.) by their JavaScript and/or other footprints (similar to scraping Google results to see which sites have Google Analytics).
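As a rough illustration of footprint matching, the Python sketch below checks a page's HTML for known substrings, the same way one would spot Google Analytics by its script URL. The footprint strings are illustrative assumptions only (a Google Custom Search script URL and its `gcse-search` element class, plus Algolia host and client names). The real list would come from the attached footprint document, and `FOOTPRINTS` and `detect_platforms` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical example footprints (assumption): the real ones are in the
# client's footprint document and should be loaded from configuration.
FOOTPRINTS: Dict[str, List[str]] = {
    "Google Site Search": ["cse.google.com/cse.js", "google.com/cse/cse.js", "gcse-search"],
    "Algolia": ["algolia.net", "algolianet.com", "algoliasearch"],
}


def detect_platforms(html: str, footprints: Dict[str, List[str]] = FOOTPRINTS) -> List[str]:
    """Return the platforms whose footprint substrings occur in the HTML (case-insensitive)."""
    page = html.lower()
    found = []
    for platform, patterns in footprints.items():
        # A single matching footprint is enough to flag the platform.
        if any(p.lower() in page for p in patterns):
            found.append(platform)
    return found
```

Additional footprints can be configured by passing a different dictionary, so the matcher itself does not change when new platforms are added.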

Crawled results will be filtered to remove duplicate domains.

So if a site appears both with www. and without it, we want to use only one.
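A minimal sketch of the www/non-www deduplication in Python, assuming the crawler has collected result URLs as plain strings. `unique_domains` is a hypothetical helper: it takes each URL's hostname (which `urlsplit` already lowercases), strips a leading `www.`, and keeps the first occurrence of each.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def unique_domains(urls: Iterable[str]) -> List[str]:
    """Return each distinct host once, treating "www.host" and "host" as the same site."""
    seen = set()
    kept = []
    for url in urls:
        # urlsplit only fills in the hostname when a scheme is present.
        if "//" not in url:
            url = "http://" + url
        host = urlsplit(url).hostname or ""
        if host.startswith("www."):
            host = host[len("www."):]
        if host and host not in seen:
            seen.add(host)
            kept.append(host)
    return kept
```

This deliberately does not collapse other subdomains (shop.example.com stays separate). Reducing every host to its registered domain correctly requires the Public Suffix List (for cases like example.co.uk), which the standard library does not provide.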

Tasks:

1. Identify the contact page (most often [url removed] and/or [url removed], but this can vary).

Scrape the contact name (if it appears), the email addresses, and the telephone numbers.

This information should be saved in an Excel file.

It would be great if it could also submit to WordPress sites, etc., and perhaps support a captcha API for that purpose. There is lots of code on GitHub/SourceForge for this.

2. Extract WHOIS contact information and, if possible, the contact information on the website's contact page as well. The only fields we want are the website and the webmaster's contact name, email address, and telephone number. The software will have an option to save the output, grouped by the search engine it was crawled from, in Excel, CSV, or TXT format.
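One way to save results grouped by the search engine they came from, sketched with Python's standard `csv` module. The record layout (one dict per site with an `engine` key plus `website`, `contact_name`, `email`, `phone`) and the function name `to_csv_by_engine` are assumptions for illustration. Missing fields are written as empty cells, and extra keys such as `engine` are left out of the rows.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical column layout (assumption), matching the four fields requested.
FIELDS = ["website", "contact_name", "email", "phone"]


def to_csv_by_engine(records: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Group records by their 'engine' key and render each group as CSV text."""
    groups: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for rec in records:
        groups[rec["engine"]].append(rec)

    out: Dict[str, str] = {}
    for engine, rows in groups.items():
        buf = io.StringIO()
        # restval fills missing columns; extrasaction="ignore" drops the 'engine' key.
        writer = csv.DictWriter(buf, fieldnames=FIELDS, restval="",
                                extrasaction="ignore", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        out[engine] = buf.getvalue()
    return out
```

Each returned string could be written to its own file, one per engine. Excel opens CSV files directly; producing native .xlsx would need a third-party library such as openpyxl.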

3. The software should have a built-in WYSIWYG editor, support multiple SMTP credentials and proxies for sending emails, and be able to run the scraping and then queue the emails immediately afterwards.

Software menu options:

Define how many sites to crawl. Define email-related tasks. There should be menu options to output success/error logs, choose which search engines to crawl (all selected by default), choose which search engines to use (all selected by default), configure additional footprints, and output the results (compile a database) in Excel or CSV format.

The menu should also show how many proxies are working, and the software should pick among them at random when extracting from search engines.

Output:

For example, if we selected "Google Site Search" and used Google, Bing, and Yahoo to get the results, we should be able to build a database from that.

By default, all search engines are used to find all the results for each site-search platform, and duplicate domains are removed before following up with the website CONTACT and WHOIS checks.

Note: There is tons of open-source code on SourceForge and GitHub for proxies, scraping, WHOIS lookups, and everything else described here. So this is like assembling Lego.

If you can create the Jonathan Siennicki software, please let me know which language you would use (it can be web-based, but a binary application is preferred). Please give a price, a delivery time, and any software resume you have to convince us you're the right person for the job.

We've attached a document listing the footprints of the various platforms that offer a service similar to Google Site Search.

Skills: Google App Engine, JavaScript, PHP, Python, Web Scraping


About the employer:
(0 reviews) Israel

Project ID: #16532957

13 freelancers bid an average of $492 on this job

sun0815

Hello. I have full experience with C++, C, C#, .NET, ASP.NET, and Windows desktop application development. You will be satisfied with my great result. I can implement the Google search crawler. Best regards

$555 USD in 10 days
(21 reviews)
5.9
$555 USD in 10 days
(47 reviews)
5.6
$277 USD in 10 days
(24 reviews)
5.1
$396 USD in 10 days
(6 reviews)
4.0
$400 USD in 10 days
(5 reviews)
3.8
hungvotan

With 10 years of experience in WordPress, PHP, WooCommerce, and SEO, I can help you finish the projects you propose. We can communicate with each other easily.

$666 USD in 10 days
(5 reviews)
2.7
$555 USD in 10 days
(2 reviews)
2.9
$555 USD in 10 days
(2 reviews)
2.0
Morvika

Hi there, I’d like to be considered for your writing position. I’m a strategic writer with a strong background developing online content, including blog posts, social media posts, articles, press releases and other br…

$555 USD in 10 days
(2 reviews)
0.2
WondersoftMS

Hello, hope you are doing well! Wondersoft Multimedia Solutions is a 5-year-young company in mobile and website development with a development center in India. Our experienced and enthusiastic development team has dev…

$500 USD in 10 days
(0 reviews)
0.0
lokeshab

11+ years of IT experience. Extensive knowledge in full-stack web development using HTML5, CSS3, LESS, SASS, PostCSS, JavaScript, ES6, jQuery, ReactJS, Redux, Flux, D3, Node.js, Angular 2.0, GUI designs, webpack, Gu…

$555 USD in 15 days
(0 reviews)
0.0
$277 USD in 5 days
(0 reviews)
0.0
$555 USD in 10 days
(0 reviews)
0.0