NLP Python Linux Tool needed

Completed. Posted 3 years ago. Paid on delivery.

I need a tool that works similarly to the [login to view URL] pipeline and supports the following languages: PL, DE, IT, EN, UA, FR, CZ.

The tool should process parallel texts in plain-text format, as found in the [login to view URL] repository (to be more precise, Moses format). The program should automatically detect each file's language from its file extension.
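A minimal sketch of extension-based language detection, assuming Moses-style file names such as `corpus.de-en.de`. The function name `detect_language` and the alias table are illustrative assumptions, not part of the brief: the posting writes UA and CZ, while the ISO 639-1 codes for Ukrainian and Czech are `uk` and `cs`, so the sketch accepts both spellings.

```python
from pathlib import Path

# Languages requested in the brief, as ISO 639-1 codes.
SUPPORTED = {"pl", "de", "it", "en", "uk", "fr", "cs"}

# Assumption: files may use the country-style codes from the posting.
ALIASES = {"ua": "uk", "cz": "cs"}


def detect_language(path: str) -> str:
    """Return the language code taken from the last file extension."""
    suffix = Path(path).suffix.lstrip(".").lower()
    code = ALIASES.get(suffix, suffix)
    if code not in SUPPORTED:
        raise ValueError(f"cannot detect a supported language from {path!r}")
    return code
```

For example, `detect_language("corpus.de-en.de")` yields `"de"`, and a file without a recognised extension raises `ValueError` instead of guessing.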

The tool should have the following capabilities, executed one after another in exactly that order:

Step 1: lowercase the whole text (optional, disabled by default)

Step 2: pre-clean the text (optional, enabled by default). We want to use the [login to view URL] scripts, i.e. [login to view URL], [login to view URL], [login to view URL], [login to view URL]. Perhaps you will find something else essential?

Step 3: normalize punctuation marks (optional, enabled by default). We want to use the same tool as here: [login to view URL], i.e. [login to view URL]

Step 4: tokenization. This should be performed with the spaCy tool, and for Polish with SpaCy-pl [login to view URL]

Step 5: truecasing (optional, enabled by default). You can use a fragment of [login to view URL], because the whole thing comes from [login to view URL] anyway.

I don't have any prepared models. I want the models to be trained on the input data and then applied to the same data, just as it is done in Moses.

Step 6: split words into subword units with the BPE algorithm [login to view URL] (optional, enabled by default with a dictionary size of 50,000). The dictionary size must be adjustable with a parameter.
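The step order and defaults above could be exposed on the command line roughly as follows. This is only a sketch: the flag names (`--lowercase`, `--no-clean`, `--bpe-size` and so on) are hypothetical choices, not something the brief prescribes, and tokenization is treated as always on because the brief does not mark it optional.

```python
import argparse

# Fixed execution order taken from the six steps in the brief.
STEP_ORDER = ["lowercase", "clean", "normalize", "tokenize", "truecase", "bpe"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose defaults mirror the brief (flag names are hypothetical)."""
    p = argparse.ArgumentParser(description="Parallel corpus preprocessing")
    # Step 1 is off unless requested.
    p.add_argument("--lowercase", action="store_true", help="lowercase all text")
    # Steps 2, 3, 5 and 6 are on unless switched off.
    p.add_argument("--no-clean", dest="clean", action="store_false")
    p.add_argument("--no-normalize", dest="normalize", action="store_false")
    p.add_argument("--no-truecase", dest="truecase", action="store_false")
    p.add_argument("--no-bpe", dest="bpe", action="store_false")
    p.add_argument("--bpe-size", type=int, default=50000,
                   help="BPE dictionary size")
    return p


def enabled_steps(args: argparse.Namespace) -> list:
    """Return the names of the steps to run, in the required order."""
    return [name for name in STEP_ORDER
            if name == "tokenize" or getattr(args, name)]
```

With no flags, `enabled_steps(build_parser().parse_args([]))` lists every step except lowercasing, and `--bpe-size 32000` overrides the 50,000 default.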

The result should be plain text encoded in UTF-8, in the same format as the input. The number of lines MUST MATCH; the text must still be PARALLEL after processing. The program should print to the console what it is currently doing, run easily under Ubuntu Linux, and be easy to install. Ideally it should come with an installation script. You will also need to write short documentation and a user manual with simple examples.
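One way to keep the output parallel is to apply every step strictly line by line and to verify the line counts before writing. The sketch below assumes each step is a plain function from one string to one string; the helper names `apply_linewise` and `check_parallel` are illustrative, not taken from the brief.

```python
from typing import Callable, Iterable, List


def apply_linewise(lines: Iterable[str], step: Callable[[str], str]) -> List[str]:
    """Apply a step to each line, always producing exactly one output line."""
    out = []
    for line in lines:
        result = step(line.rstrip("\n"))
        # Collapse any line breaks a step might introduce, so the
        # output line count always equals the input line count.
        out.append(" ".join(result.splitlines()))
    return out


def check_parallel(src: List[str], tgt: List[str]) -> None:
    """Raise if the two sides of the corpus no longer line up."""
    if len(src) != len(tgt):
        raise ValueError(
            f"line count mismatch: {len(src)} source vs {len(tgt)} target"
        )
```

Empty lines are kept as empty lines rather than dropped, so a step can never shift one side of the corpus against the other.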

Python Perl Linux UNIX Programming

Project ID: #26553197

About the project

7 proposals Remote project Active 3 years ago

Awarded to:

computerroman

We have discussed the project in the chat, so I am just putting enough characters here to bid, because a bid needs more than 100 characters.

$150 USD in 7 days
(15 reviews)
4.2

7 freelancers are bidding on this job, with an average bid of $151/hour

Demenntor

Dear Employer, I have read the project details and am confident I can work on the NLP Python Linux tool. I have extensive knowledge of Perl, Python, Linux and UNIX. Kindly message me so that we can discuss the work further.

$200 USD in 2 days
(18 reviews)
4.3
engrfarooq04

Hi, good day. I read your project description very carefully. I have rich experience in Python, Linux and C programming, and excellent software architecture skills. I'm really confident about your project, and very ...

$200 USD in 7 days
(1 review)
2.6
rukshanlancer

Hi, thanks for contacting me. I've carefully checked your requirements and am really interested in this job. I'm a full-stack developer working on large-scale websites. I can complete your project on time an ...

$100 USD in 2 days
(0 reviews)
0.0
utkarsh7238

Hey, I can help you with the NLP Python Linux tool. How soon do you want it completed? Let's talk about your project. Waiting for your response!

$30 USD in 1 day
(0 reviews)
0.0
oxanarvayva

Hi Agnieszka K.! I have read your job description and assure you that I am a perfect fit for the job. I am available NOW and can start immediately. Looking forward to your reply. Thanks

$150 USD in 3 days
(0 reviews)
0.0