Build CSS Selectors for ~600 News Sites

Closed. Posted 3 years ago. Paid on delivery.

We need CSS selectors for a list of ~600 websites, mainly news sites. The selectors should identify where the news articles (or similar items) appear on each site and where each article's title, URL and (optionally) content can be found. We will use them to build a search index over the sites by following the links to all the articles.

For this, we will give you a list of the sites' URLs as an Excel file. You should send back an Excel file that contains the CSS selectors for them. An example file is attached.

General remarks:

- We use a library called JSoup, which extends CSS selectors with some useful extra features; you can read about them here: [login to view URL]. Plain CSS selectors should be good enough for most websites, but JSoup's features might be useful in specific cases.

- Each URL in our list points to a page that lists some kind of item: news articles, forum posts, items in a shop, etc. You should build CSS selectors that match the individual items, e.g. each news article or each forum post separately.

- We plan to feed the CSS selectors into software, so every entry in the Excel file you send back has to be a valid CSS selector. If you cannot find a CSS selector for a URL, please leave the corresponding field empty. (Do *not* mark it with something like "?" or "unclear", because our software will not be able to understand such entries.)

- If you are unsure about how to handle some specific case, please ask.

- We need the results as soon as possible. It would be great if you could send back first results even before you've finished with the complete list.

- Probably the easiest way to find CSS selectors for a page is to use your browser's "developer tools". I've attached a screenshot of what they look like in Chrome. If you click the symbol at the far left of the top row shown in the screenshot, you can hover over or click an element on the website and see the corresponding HTML element in the bottom row.
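
The rule above about valid entries (a field is either a selector or empty, never a note such as "?") can be checked before sending the file back. The following is a minimal sketch in Python; the helper name `is_acceptable_cell` and the placeholder list are assumptions for illustration, and the check is only a rough sanity test, not a CSS or JSoup parser.

```python
# Hypothetical helper, not part of the original task description.
# Placeholder words that must never appear in a selector field (assumed list).
PLACEHOLDERS = {"?", "unclear", "unknown", "n/a"}


def is_acceptable_cell(value: str) -> bool:
    """Return True if a cell is empty or looks like it could be a selector.

    Rough sanity check only: it rejects placeholder words, surrounding
    quotation marks, and unbalanced brackets. It does not parse CSS.
    """
    text = value.strip()
    if text == "":
        return True  # empty means "no selector found", which is allowed
    if text.lower() in PLACEHOLDERS:
        return False
    if text.startswith(('"', "'")):
        return False  # selectors are entered without quotation marks
    return (text.count("[") == text.count("]")
            and text.count("(") == text.count(")"))
```

A cell that fails this check should be corrected or cleared rather than annotated.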

The following steps explain how to fill out the columns of the Excel sheet:

1. Selector Article

Choose a selector that all articles share.

Example 1: On the page [login to view URL] (also see the attached screenshot), every article is wrapped in a tag <article class="article hp">. The full CSS selector for the leftmost article is "body > div.page_container > div.page_content > div > section > [login to view URL] > div:nth-child(1) > [login to view URL] > ul > li.item_32338249.item.hppos0.new.item_id > article", but we want a selector that *all* articles share, so "article" is the CSS selector that you should enter in the Excel file (without the quotation marks).

Example 2: On [login to view URL] there are several columns of articles that have different CSS selectors. Fortunately, selectors can be combined with a comma (,), so the correct entry in the Excel file is "[login to view URL] li,[login to view URL] li" (without the quotation marks).
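
As a small illustration of combining selectors with a comma, here is a sketch in Python. The function name `combine_selectors` and the two example selectors `ul.left li` and `ul.right li` are made up for illustration; they are not the selectors of the site in Example 2.

```python
def combine_selectors(*selectors: str) -> str:
    """Join alternative selectors into one comma-separated selector group."""
    parts = [s.strip() for s in selectors if s.strip()]
    return ",".join(parts)


# Hypothetical selectors for two article columns on the same page.
print(combine_selectors("ul.left li", "ul.right li"))
# prints: ul.left li,ul.right li
```

The combined entry matches every element that matches at least one of the parts.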

2. Selector URL

Choose a selector for the link element that links to the article's own page. This selector has to be relative to the article selector from step 1.

Example 1: For [login to view URL], this is ".article_content a" (without the "").
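
To make "relative to the article selector" concrete: assuming the relative selector is evaluated inside each element matched in step 1, it behaves roughly like a descendant selector appended to the article selector. The sketch below (Python, hypothetical helper `scope_to_article`) builds that page-level selector, which can be pasted into the browser's developer tools to check that it finds one link per article. It splits on commas naively, so it is not suitable for selectors that contain commas inside brackets or parentheses.

```python
from typing import List


def split_group(selector: str) -> List[str]:
    """Split a comma-separated selector group into its parts (naive split)."""
    return [part.strip() for part in selector.split(",") if part.strip()]


def scope_to_article(article_selector: str, relative_selector: str) -> str:
    """Approximate a relative selector as a page-level descendant selector."""
    return ", ".join(
        f"{article} {relative}"
        for article in split_group(article_selector)
        for relative in split_group(relative_selector)
    )


print(scope_to_article("article", ".article_content a"))
# prints: article .article_content a
```

This is only an approximation for manual checking; the Excel file should still contain the short relative selector, not the combined one.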

3. Selector Title

Choose a selector for something that could be used as a title for the article. If there is nothing better, point to the text of the link element from the previous step. Again, the selector has to be relative to the selector from step 1.

Example 1: For [login to view URL], ".article_content h2" or ".article_content a" would both be good. In that case you can choose either one.

4. Selector Content (optional)

Choose a selector for the actual text or teaser text of the article. If there is no teaser text, or the teaser text is not in an HTML element below the selector from step 1, or you are simply unsure about it, you can skip this step.

5. Selector Published Date

Choose a selector for the published date. Leave the field empty if no published date is available.
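
The five columns can be pictured as one record per site. The following Python sketch is an illustration only: the class `SelectorRow`, its field names and the consistency rules in `problems` are assumptions, not the format of the attached example file. It encodes two inferences from the steps above: the selectors from steps 2 to 5 are relative to step 1, so they are meaningless without an article selector, and a usable row is assumed to need at least the URL and title selectors.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SelectorRow:
    """One row of the result sheet; fields mirror the five steps."""
    url: str                  # page URL from the provided list
    article: str = ""         # step 1: Selector Article
    link: str = ""            # step 2: Selector URL (relative to article)
    title: str = ""           # step 3: Selector Title (relative to article)
    content: str = ""         # step 4: Selector Content (optional)
    published_date: str = ""  # step 5: Selector Published Date

    def problems(self) -> List[str]:
        """Return human-readable consistency problems for this row."""
        relative = {
            "Selector URL": self.link,
            "Selector Title": self.title,
            "Selector Content": self.content,
            "Selector Published Date": self.published_date,
        }
        if not self.article.strip():
            # Relative selectors cannot be applied without an article selector.
            return [f"{name} is set but Selector Article is empty"
                    for name, value in relative.items() if value.strip()]
        # Assumption: URL and title selectors are expected for every article.
        return [f"{name} is missing"
                for name in ("Selector URL", "Selector Title")
                if not relative[name].strip()]
```

A row with all selector fields empty reports no problems, which matches the instruction to leave fields empty when no selector can be found.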

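Data Entry, Excel, CSS, Web Scraping, HTML

Project ID: #28197116

About the project

9 proposals. Remote project. Active 3 years ago.

9 freelancers are bidding on this job, at an average of €190/hour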

PMPMPM1985

Hi, I have been working on HTML, CSS, jQuery for many years, and I can do this job for you. I assume you need these selectors for scraping, and I can easily handle them. I have well understood the requirement to care ...

€500 EUR in 3 days
(14 reviews)
4.8
BHTECHNOLOGY

Hi Sir, I will make your website in professional way and mobile responsive. I will not charge anything until you are satisfied with the product. Message me if you are interested and for more information. Thank you

€167 EUR in 10 days
(0 reviews)
0.0
Tripplen

Hello there, I have a strong background in handling technical projects. I will complete the project as stipulated in your requirements. Let's chat for further details about your project. Thank you. P.S. I have a degree in ...

€140 EUR in 1 day
(0 reviews)
0.0
SadiyaTKhan

Accurate, fast keying skills and sound knowledge of computer applications. Skilled in planning and organizing with the ability to complete tasks on deadline. An independent worker who successfully meets the challenges ...

€140 EUR in 7 days
(0 reviews)
0.0
paulineatwork

Dear Client, I worked for 8 years in a marketing company, collecting data, doing research and updating the databases. I also did some marketing mailings, but due to the pandemic the company closed. I am here to apply for the job post opp ...

€230 EUR in 7 days
(0 reviews)
0.0
creativeimplanet

Greetings, I have read your project scope and am offering our service for your project. We provide digital solutions to our clients for their projects, and we can start your work right now. We have over 20 years of ex ...

€140 EUR in 2 days
(0 reviews)
0.0
Hagarsal

Hi there, I've read your post with interest and I am confident I can do the job. I applied for this job looking for a review more than money. I have solid knowledge of CSS, SCSS, HTML and developer tools. Send me a qu ...

€140 EUR in 7 days
(0 reviews)
0.0