Website Parser/Scraper Needed [Java]

Cancelled. Posted 7 years ago. Paid on delivery.

I require a web-scraping application that needs no additional dependencies (i.e., nothing beyond a base Java/JVM/JRE install).
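Because the only permitted dependency is the JDK itself, page downloads have to use the standard library. The following is a minimal sketch that assumes Java 11 or later, where java.net.http.HttpClient was added. On an older JRE, java.net.HttpURLConnection is the built-in alternative. The class and method names are hypothetical. The toUri helper exists because java.net.URI rejects the literal "|" character that appears in the sample URL's query string.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper: downloads raw page HTML using only the JDK (Java 11+).
final class PageFetcher {
    // One shared client that follows redirects as a browser normally would.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(15))
            .build();

    // java.net.URI rejects some characters browsers tolerate; only '|' and spaces
    // are percent-encoded here, which covers the sample URL.
    static URI toUri(String url) {
        return URI.create(url.trim().replace(" ", "%20").replace("|", "%7C"));
    }

    // Plain GET request; the 30-second timeout is an arbitrary choice.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(toUri(url))
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
    }

    // Returns the page source, failing on any non-2xx status code.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpResponse<String> response =
                CLIENT.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("HTTP " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

Per the requirements, the string passed to fetch should be whatever is currently in the URL text box. This sketch has not been checked against the site to see whether it returns the same HTML to a non-browser client.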

*** READ ENTIRE PROJECT REQUIREMENT BEFORE BIDDING ***

The scraper must log to an Excel document (see the sample in the files section).

Sample URL for testing:

[url removed, login to view]|&bedrooms=0&bathrooms=1&Accessible=False&pictures=False&pets=False&ac=False&AgeRestricted=False&smoking=False&coveredParking=False&MaxSqFt=5000&MinSqFt=0&keyword=&sortBy=LastUpdate

Scraper requirements:

- the GUI must match the design of the sample in the files section (Scraper [url removed, login to view])

- accept direct input of URL to begin scraping

- allow adding/editing/deleting of saved URLs

- - a saved URL item will contain a title and the URL. Only the title will be displayed in the GUI list

- clicking an entry in the list will load the URL into the URL text box above the list

- select an output file: type the target file location, or browse to it with the "..." button

- - browse dialog must filter for .xlsx by default

- - the scraper must log data into an Excel document

- - if the file exists, data will be appended to it

- - - set all entries in the "Active" column to "No"

- - if the file does not exist, the scraper must create it

- - - First row must contain the following headings: Active, Last Active, Landlord Name, Phone, Contacted, Notes

- "Scrape" button will begin scraping the URL in the URL text box.

- - MUST use the text in the box, as it might be manually edited before running

- connect to the website and determine the number of listings/pages returned

- visit each listing's page

- save each landlord's name and telephone number (on the right side of the page)

- - if the Excel file selected earlier exists, scan it to see whether the landlord's information is already in the file (excluding '

- - if already in the file

- - - update record's "Active" field to "Yes"

- - - update record's "Last Active" field to current date (YYYY/MM/DD format)

- - - go to next property

- - if not already in file

- - - add a record to the file using the field structure below
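The update-or-add pass described in the steps above can be sketched in memory, separately from any Excel reading or writing. The sketch makes several assumptions. The existing rows have already been loaded from the workbook without the heading row. The columns follow the required heading order. Phone numbers are already normalized, and the phone number is the duplicate key, as the corrections at the end of the posting state. Newly added records are marked Active "Yes", which the posting implies but does not state. ContactMerger and merge are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical merge step. Row layout (assumed to match the required headings):
// 0 Active, 1 Last Active, 2 Landlord Name, 3 Phone, 4 Contacted, 5 Notes
final class ContactMerger {
    // "Last Active" uses the YYYY/MM/DD format from the requirements.
    static final DateTimeFormatter LAST_ACTIVE = DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // rows: existing data rows (no heading row); scraped: normalized phone -> landlord name
    static void merge(List<String[]> rows, Map<String, String> scraped, LocalDate today) {
        String date = today.format(LAST_ACTIVE);
        // Step 1: mark every existing entry inactive and index it by phone number.
        Map<String, String[]> byPhone = new HashMap<>();
        for (String[] row : rows) {
            row[0] = "No";
            byPhone.put(row[3], row);
        }
        // Step 2: reactivate known landlords or append new ones.
        for (Map.Entry<String, String> entry : scraped.entrySet()) {
            String phone = entry.getKey();
            String[] existing = byPhone.get(phone);
            if (existing != null) {
                existing[0] = "Yes";
                existing[1] = date; // name, Contacted and Notes are left untouched
            } else {
                String[] added = {"Yes", date, entry.getValue(), phone, "No", ""};
                rows.add(added);
                byPhone.put(phone, added);
            }
        }
    }
}
```

Keeping this step free of file I/O makes it easy to test on its own. Passing a LinkedHashMap for the scraped data keeps new records in the order the listings were visited.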

The scraper must go through every page of returned results to collect all data; links to each page are at the bottom of the page. Visual progress should be displayed as the scraper runs and also stored in a log file named "[url removed, login to view]", a name representing the date and time the scraper was executed. See "console and log [url removed, login to view]" for an example of what both should look like.
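Writing each progress line to both the console and a per-run log file can be sketched with java.nio alone. The exact file name required here is not visible in the posting, so the yyyy-MM-dd_HH-mm-ss pattern below is a hypothetical stand-in. System.out also stands in for the GUI's console area. RunLog is a hypothetical name.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical progress logger: echoes each message to the console and to a log file
// whose name encodes the date and time the run started.
final class RunLog implements AutoCloseable {
    // Placeholder naming pattern; adjust to match the sample log in the files section.
    private static final DateTimeFormatter FILE_NAME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd_HH-mm-ss'.log'");
    private final BufferedWriter out;
    final Path file;

    RunLog(Path directory, LocalDateTime start) throws IOException {
        file = directory.resolve(start.format(FILE_NAME));
        out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    void log(String message) throws IOException {
        System.out.println(message); // in the GUI this would append to the console area
        out.write(message);
        out.newLine();
        out.flush(); // flush per line so the log survives a crash mid-run
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}
```

Capturing the start time once, when the Scrape button is pressed, keeps the file name fixed for the whole run even if the run crosses midnight.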

Fields: Active (Yes/No)

Last Active (YYYY/MM/DD format)

Landlord Name (as it appears)

Phone (###-###-#### format, no ( ) around first digits)

Contacted ("No" by default for all new contacts; do not alter for existing contacts)

Notes (leave blank, do not edit)
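Because no dependencies are allowed, a library such as Apache POI is not an option. An .xlsx file is a zip archive of SpreadsheetML XML parts, so a bare-bones workbook with these six columns can be produced with java.util.zip. The sketch below writes only inline-string cells, with no styles or shared strings. It should open in Excel, but it should be checked against the sample file. It does not cover reading an existing file for the append case. When Excel saves a workbook, it usually moves text into xl/sharedStrings.xml, so a reader would have to handle that part too. XlsxWriter is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical minimal .xlsx writer: one sheet, every cell an inline string.
final class XlsxWriter {
    private static final String MAIN = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
    private static final String PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships";
    private static final String DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static void write(Path target, List<String[]> rows) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(target))) {
            put(zip, "[Content_Types].xml",
                "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
                + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
                + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
                + "<Override PartName=\"/xl/workbook.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml\"/>"
                + "<Override PartName=\"/xl/worksheets/sheet1.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml\"/>"
                + "</Types>");
            put(zip, "_rels/.rels", "<Relationships xmlns=\"" + PKG_REL + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + DOC_REL + "/officeDocument\" Target=\"xl/workbook.xml\"/>"
                + "</Relationships>");
            put(zip, "xl/workbook.xml", "<workbook xmlns=\"" + MAIN + "\" xmlns:r=\"" + DOC_REL + "\">"
                + "<sheets><sheet name=\"Sheet1\" sheetId=\"1\" r:id=\"rId1\"/></sheets></workbook>");
            put(zip, "xl/_rels/workbook.xml.rels", "<Relationships xmlns=\"" + PKG_REL + "\">"
                + "<Relationship Id=\"rId1\" Type=\"" + DOC_REL + "/worksheet\" Target=\"worksheets/sheet1.xml\"/>"
                + "</Relationships>");
            put(zip, "xl/worksheets/sheet1.xml", sheetXml(rows));
        }
    }

    // Builds the worksheet XML; cells must be non-null (use "" for blank Notes).
    static String sheetXml(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("<worksheet xmlns=\"" + MAIN + "\"><sheetData>");
        for (int r = 0; r < rows.size(); r++) {
            sb.append("<row r=\"").append(r + 1).append("\">");
            String[] cells = rows.get(r);
            for (int c = 0; c < cells.length; c++) {
                // Single letters A..Z are enough for the six required columns.
                sb.append("<c r=\"").append((char) ('A' + c)).append(r + 1)
                  .append("\" t=\"inlineStr\"><is><t>").append(escape(cells[c])).append("</t></is></c>");
            }
            sb.append("</row>");
        }
        return sb.append("</sheetData></worksheet>").toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;").replace("\"", "&quot;");
    }

    private static void put(ZipOutputStream zip, String name, String xml) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>" + xml;
        zip.write(doc.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }
}
```

Storing the phone number as a text cell rather than a number keeps the leading 1 on 800 numbers and stops Excel from reformatting long digit strings.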

The end state is an Excel document I can keep updating with new contacts based on scrapes of the [url removed, login to view] website.

The deliverable includes all source code files.

To be considered for this project, you MUST:

- Have a bid proposal within the posted project budget

- Include the phrase "Java is more than just coffee" as the first line in your bid proposal

- State when you will be able to begin actively working the project, and if you are working any other projects at this time

- Visit the sample URL, select the first property, and confirm you can see the name and phone number on the page

*** Your bid will not be considered if you do not conduct the steps above ***


If you cannot see the name and number on the right side of the property's page, you may need to click the green "View Phone Number" button. Because you will be parsing the page source, however, the phone number is available directly in this page's HTML. The markup looks like this:

<div class="printcontact">
<h1>Landlord:
<span id="ctl00_MainContentPlaceHolder_LLNameLblPrint">Russell Thompson</span></h1>
<b>Phone:
<span id="ctl00_MainContentPlaceHolder_lblPhoneDisplayPrint">(682) 564-4245</span></b>
</div>
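Because the two values sit in spans with fixed ids, a listing page can be handled without a full HTML parser. The following sketch matches those exact ids with regular expressions. It assumes the markup stays as shown above. It does not decode HTML entities such as &amp; that may appear in names. ListingParser is a hypothetical name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for the landlord name and phone, keyed on the span ids shown above.
// Regex is adequate for this one known fragment; it is not a general HTML parser.
final class ListingParser {
    private static final Pattern NAME = Pattern.compile(
            "id=\"ctl00_MainContentPlaceHolder_LLNameLblPrint\"[^>]*>([^<]*)</span>");
    private static final Pattern PHONE = Pattern.compile(
            "id=\"ctl00_MainContentPlaceHolder_lblPhoneDisplayPrint\"[^>]*>([^<]*)</span>");

    static Optional<String> name(String html) { return find(NAME, html); }

    static Optional<String> phone(String html) { return find(PHONE, html); }

    // Returns the trimmed text of the first matching span, or empty if the id is absent.
    private static Optional<String> find(Pattern pattern, String html) {
        Matcher m = pattern.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }
}
```

An empty result is a useful signal to log and skip the listing rather than write a blank row.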


=== corrections ===
- An entry is considered a duplicate if the phone number matches. Do not worry about the name.
- Do not put dashes or ( ) in the phone numbers
- For an 800 number, ensure a 1 is placed in front of the number (18005555555)
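The three corrections fit together if each scraped phone number is normalized before it is compared or stored. The normalized string then serves as the duplicate key. A minimal sketch follows, with PhoneNumbers as a hypothetical name. It handles only the 800 prefix named above. Other toll-free prefixes pass through unchanged.

```java
// Hypothetical normalizer: digits only, plus a leading 1 for 10-digit 800 numbers.
final class PhoneNumbers {
    static String normalize(String raw) {
        String digits = raw.replaceAll("\\D", ""); // drops "(", ")", "-", spaces, etc.
        // Only the 800 prefix is covered by the requirements; an 11-digit "1800..." is kept as-is.
        if (digits.length() == 10 && digits.startsWith("800")) {
            return "1" + digits;
        }
        return digits;
    }
}
```

For example, "(682) 564-4245" becomes 6825644245, while "(800) 555-5555" and "1-800-555-5555" both become 18005555555. Both 800 forms therefore count as the same contact.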

Java, Software Development, Web Scraping

Project ID: #11385508

About the project

6 proposals. Remote project. Active 7 years ago.

6 freelancers are bidding on this job, with an average bid of $155/hour

shafaqat11

Hello Sir, How are you? I understand your job and very much excited to offer my services for your job. Please feel free to contact me directly to discuss this position further. I am all time online on Skype and …

$20 USD in 2 days
(43 reviews)
5.1
arhamIT2020

Dear Sir, I read your job description very carefully. I am ready to start this project and can show a sample. If you are interested, please discuss the project over live chat. Thank you, Arham IT

$166 USD in 5 days
(4 reviews)
2.6
hoangduong97

Hello, I am experienced and specialized in C/C++ and Java programming, I have experience in Android. I have taken part in many competitions, including national competition, in which I won 2nd prize. I am currently a Co …

$111 USD in 1 day
(3 reviews)
0.9
elasolova35

Java is more than just coffee I have a lot of experience with custom web scraping, you can check my previous projects about scraping. Also, I am good at web design so that the a good at designing gui/s, or it can …

$100 USD in 7 days
(2 reviews)
0.2