
Development of web scraping modules in Scrapy (repost)

$30-100 USD

Cancelled
Posted over 13 years ago

Paid on delivery

You should develop spider modules to extract the ads from the following five Italian sites, using the Python scraping framework Scrapy:

[login to view URL] [login to view URL] [login to view URL] [login to view URL] [login to view URL]

Please see the attached file for a Scrapy example project with two scraping modules for [login to view URL] and [login to view URL].

Specifically, your task will be:

* Find good starting URLs for the five specified sites, to ensure that the sites can be widely scraped for new ads.
* Develop the spider modules for the five sites. The scraping modules MUST be robust, i.e. you MUST NEVER use full (absolute) XPath paths to extract the requested elements. Instead, use relative expressions based on attributes (such as id, class, width, etc.) or on any identifying feature, like contains(@href, 'image').
* Ensure that all available fields (described later), where present in the ads, are extracted. Not all sites have all fields in their ads: you should check which fields are present in the ads of each site, and extract them.

**A basic knowledge of Italian is required to work on this project.**

## Deliverables

Here is a list of the fields which you should extract from the ads, where present. Please note that some sites have only a few of these fields:

* source = (string, fixed) the name of the site, e.g. [login to view URL] (without the http://)
* title = (string) the title of the ad, e.g. "Appartamento" or "Villa" (as in the examples, you should remove the city/province/region from the title)
* city = (string) the city where the building is located
* province = (string) the province where the building is located
* region = (string) the region where the building is located
* area = (string) for larger cities, the area of the city where the building is located
* address = (string) the address of the building
* description = (string) the description of the building
* sale_rent = (integer) 0 if the building is for sale, 1 if the building is for rent (suggestion: check for the words "vendita" and "affitto" in the ad, as in the examples)
* publish_date = (date) the date when the ad was published
* price = (integer) the price of the building (default -1 if not specified)
* building_type = (string) the type of the building, e.g. "Residenziale"
* building_surface = (integer) the building surface, in square meters (default -1 if not specified)
* rooms = (integer) the number of rooms of the building (default -1 if not specified)
* bathrooms = (integer) the number of bathrooms of the building (default -1 if not specified)
* box_type = (string) the type/description of the garage ("box"), if the building has one
* box_surface = (integer) the surface of the garage ("box") in square meters, if the building has one (default -1 if not specified)
* has_balcony = (integer) 0 if the ad says that the building doesn't have a balcony, 1 if the ad says it has a balcony, -1 if unspecified
* has_terrace = (integer) 0 if the ad says that the building doesn't have a terrace, 1 if the ad says it has a terrace, -1 if unspecified
* has_elevator = (integer) 0 if the ad says that the building doesn't have an elevator, 1 if the ad says it has an elevator, -1 if unspecified
* garden_type = (string) the type of the garden
* garden_surface = (integer) the garden surface, in square meters (default -1 if not specified)
* floor = (integer) the floor of the building (default -1 if not specified)
* heating_type = (string) the type of the heating, e.g. "Autonomo" or "Centralizzato"
* building_condition = (string) the condition of the building, e.g. "Ottimo" or "Buono" or "Ristrutturato"

You will need to install Scrapy, along with these Python modules:

* libxml2
* lxml
* pywin32 (if you work in win32)
* Twisted
* [login to view URL]
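The field conventions listed under Deliverables (integers defaulting to -1, the 0/1 `sale_rent` flag, and the 0/1/-1 flags for balcony, terrace and elevator) could be normalised with a few small helpers before the values are stored. The following is only a minimal sketch using the standard library: the helper names `parse_int`, `sale_or_rent` and `tri_state` are hypothetical, the negation patterns are assumptions about how ads might be worded, and returning -1 for `sale_rent` when neither keyword appears is an assumption that the posting does not specify. It is not taken from the attached example project.

```python
import re

# Hypothetical helper names; none of them come from the posting or from Scrapy.
_INT_RE = re.compile(r"\d+")


def parse_int(text, default=-1):
    """Return the first integer found in `text`, or `default` if there is none.

    Dots are dropped first, on the assumption that listings write
    thousands as "250.000". Decimal values are not handled.
    """
    if not text:
        return default
    match = _INT_RE.search(text.replace(".", ""))
    return int(match.group()) if match else default


def sale_or_rent(text, default=-1):
    """Return 0 for sale ("vendita"), 1 for rent ("affitto").

    Assumption: `default` is returned when neither word appears, and
    "affitto" wins if both appear.
    """
    lowered = (text or "").lower()
    if "affitto" in lowered:
        return 1
    if "vendita" in lowered:
        return 0
    return default


def tri_state(text, keyword):
    """Return 1 if the ad mentions `keyword`, 0 if it negates it, -1 otherwise."""
    lowered = (text or "").lower()
    if keyword not in lowered:
        return -1
    # Assumed negation patterns ("senza balcone", "no balcone"); real ads
    # may phrase this differently and need per-site rules.
    if re.search(r"\b(senza|no)\s+" + re.escape(keyword), lowered):
        return 0
    return 1


# Example with made-up ad text (not from any of the five sites).
ad_text = "Vendita appartamento con balcone, senza ascensore"
fields = {
    "sale_rent": sale_or_rent(ad_text),
    "price": parse_int("€ 250.000"),
    "building_surface": parse_int("85 mq"),
    "rooms": parse_int(None),
    "has_balcony": tri_state(ad_text, "balcone"),
    "has_terrace": tri_state(ad_text, "terrazzo"),
    "has_elevator": tri_state(ad_text, "ascensore"),
}
```

In a spider, helpers like these would typically be applied to the strings returned by the relative XPath expressions, so that each site's module only has to locate the text and the defaults stay consistent across all five sites.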
Project ID: 3710814

About the project

1 proposal
Remote project
Active 14 years ago

1 freelancer is bidding an average of $340 USD for this job
See private message.
$340 USD in 10 days
0.0 (1 review)
0.0
0.0

About the client

Biella, Italy
5.0
19
Payment method verified
Member since Feb 26, 2004

Client verification

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)