
Body Text Extraction

$100-500 USD

In progress
Posted more than 18 years ago


Paid on delivery
Articles published on the WWW often contain extraneous clutter. Most articles consist of a main body, which constitutes the relevant part of the page. Surrounding this body is irrelevant information such as copyright notices, advertising, and links to sponsors. The technology needed to extract only the relevant text is commonly used in clustering and categorization applications that work with HTML.

Below are a couple of URLs to one example we have found for solving the body text extraction problem, but that tool is written in Python and Perl. Do not limit your bids to this solution if you have other ideas or experience to offer.

<[login to view URL]>
<[login to view URL]>

1. We need a Java function
2. that can be embedded within our tool,
3. that can take an HTML-based document currently stored in memory and produce just the main body of text,
4. with no text associated with buttons, advertisements, etc.
5. The function needs to work on all types of HTML-based documents.

We are NOT looking for an HTML stripper or an HTML-to-text converter. We are less concerned about the format, or the chance of losing some information, than we are that the output contains the majority of the main content of the document. Our tools will then learn from the context and knowledge within the extracted text.

Please provide a bid along with a high-level description of how you will solve this problem. Also provide information on any experience you may have in this area of work.

## Deliverables

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd-party components, etc. unless all copyright ramifications are explained AND AGREED TO by the Buyer on the site per the coder's Seller Legal Agreement.)

## Platform

All platforms that run Java
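The posting asks for a Java function that takes an in-memory HTML document and returns its main body rather than stripping all tags. One well-known approach, assumed here and not necessarily the one behind the elided URLs, is the tag-density heuristic known as Body Text Extraction (BTE), described by Finn, Kushmerick and Smyth: treat the page as a sequence of tag tokens and word tokens, then pick the contiguous window that holds as many words and as few tags as possible. Scoring each word +1 and each tag -1 turns this into a maximum-sum subarray problem. The sketch below is illustrative only; the class name `BodyTextExtractor`, the method `extractBody`, and the regex tokenizer (which ignores edge cases such as `>` inside attribute values) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of BTE-style body text extraction; not a production parser. */
public final class BodyTextExtractor {

    // Matches either one HTML tag or one run of non-whitespace, non-'<' characters (a word).
    // Assumption: this simple regex is good enough to separate tags from words.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^\\s<]+");

    // Script, style, and comment contents are not visible text, so drop them first.
    private static final Pattern NON_CONTENT = Pattern.compile(
            "(?is)<script\\b.*?</script>|<style\\b.*?</style>|<!--.*?-->");

    private BodyTextExtractor() {}

    /** Returns the words inside the token window with the most words and fewest tags. */
    public static String extractBody(String html) {
        String cleaned = NON_CONTENT.matcher(html).replaceAll(" ");
        List<String> tokens = new ArrayList<>();
        List<Boolean> isTag = new ArrayList<>();
        Matcher m = TOKEN.matcher(cleaned);
        while (m.find()) {
            String t = m.group();
            tokens.add(t);
            isTag.add(t.startsWith("<"));
        }

        // Maximizing (tags before i) + (words in [i, j]) + (tags after j) equals
        // (total tags) + sum over [i, j] of (+1 per word, -1 per tag), so the best
        // window is a maximum-sum subarray, found here with Kadane's algorithm.
        int bestSum = Integer.MIN_VALUE, bestStart = 0, bestEnd = -1;
        int runSum = 0, runStart = 0;
        for (int n = 0; n < tokens.size(); n++) {
            runSum += isTag.get(n) ? -1 : 1;
            if (runSum > bestSum) {
                bestSum = runSum;
                bestStart = runStart;
                bestEnd = n;
            }
            if (runSum < 0) { // a negative prefix can never help a later window
                runSum = 0;
                runStart = n + 1;
            }
        }

        // Emit only the words (not the tags) inside the chosen window.
        StringBuilder out = new StringBuilder();
        for (int n = bestStart; n <= bestEnd; n++) {
            if (!isTag.get(n)) {
                if (out.length() > 0) out.append(' ');
                out.append(tokens.get(n));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<html><body><div><a href='/'>Home</a></div><div><a>Ads</a><a>Login</a></div>"
                + "<p>Main article text goes here with many words in a row.</p>"
                + "<div><a>Copyright</a></div></body></html>";
        // Prints: Main article text goes here with many words in a row.
        System.out.println(extractBody(page));
    }
}
```

Running `main` keeps the paragraph and drops the navigation links and the copyright footer. Because the window must be contiguous and runs in a single linear pass, body text broken up by heavy inline markup may be truncated, which fits the posting's stated tolerance for losing some information in exchange for capturing most of the main content.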
Project ID: 3862695

About this project

11 proposals
Remote project
Active 19 years ago

Awarded to:
See private message.
$175 USD in 10 days
5.0 (5 reviews)
3.5
11 freelancers are bidding on average $189 USD for this job
See private message.
$102 USD in 10 days
4.9 (114 reviews)
7.0
See private message.
$425 USD in 10 days
4.9 (147 reviews)
6.5
See private message.
$119 USD in 10 days
5.0 (122 reviews)
5.6
See private message.
$169.15 USD in 10 days
4.0 (23 reviews)
5.1
See private message.
$127.50 USD in 10 days
2.5 (14 reviews)
5.2
See private message.
$255 USD in 10 days
5.0 (14 reviews)
3.6
See private message.
$425 USD in 10 days
4.6 (15 reviews)
3.4
See private message.
$93.50 USD in 10 days
5.0 (12 reviews)
3.2
See private message.
$102 USD in 10 days
5.0 (7 reviews)
2.6
See private message.
$85 USD in 10 days
0.0 (0 reviews)
0.0

About the client

United States
5.0
15
Member since August 26, 2004

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)