Need help building and evaluating a named entity recognition (NER) system via sequence tagging, or performing a systematic comparison of existing NER approaches; the choice is yours.

Closed · Posted 5 years ago · Paid on delivery

3.0.1 Option I: Implementation

• Implement the Viterbi algorithm for predicting the best tag sequence according to a learnt model;

• Use this to construct a Maximum Entropy Markov Model;

• Explore possible feature sets and perform experiments comparing them;

• Evaluate the performance of your system on English and German;

• Describe your experiments, results and analysis in a report.
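The decoding step in the first two bullets can be sketched as follows. This is a minimal illustration, not a reference solution: the `log_score(i, prev_tag, tag)` callback and the `<s>` start symbol are assumptions standing in for whatever trained maximum-entropy classifier supplies log P(tag | previous tag, observations at position i).

```python
def viterbi(n_tokens, tags, log_score, start="<s>"):
    """Return the highest-scoring tag sequence for a sentence of n_tokens words.

    log_score(i, prev_tag, tag) is a hypothetical interface: it is assumed to
    return log P(tag | prev_tag, x, i) from a locally normalised (MEMM-style) model.
    """
    if n_tokens == 0:
        return []
    # best[i][t]: best log-probability of any tag path that ends in tag t at position i
    best = [{t: log_score(0, start, t) for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]
    for i in range(1, n_tokens):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_score(i, p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score
            back[i][t] = prev
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(n_tokens - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

Working in log space avoids numerical underflow on long sentences, and the run time is proportional to the sentence length times the square of the number of tags.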

For Option I, your submission should include:

• your report (∼3 pages, not including tables/diagrams);

• a zipfile containing your code and README instructions on how to run it. Please do not include the data, but assume the README directory contains the conll03 subdirectory.

3.0.2 Option II: Application

• Find existing NER systems and apply them to given text;

• Critically compare NER system descriptions;

• Systematically analyse and compare the errors made by NER systems;

• Describe your experiments, results and analysis in a report.
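One possible starting point for the systematic error comparison is to convert each system's tags into entity spans and bucket every disagreement with the gold annotation. The sketch below assumes BIO/IOB-style tags such as `B-PER` and `I-PER`; the function names and the five error categories are illustrative choices, not part of the assignment.

```python
from collections import Counter

def bio_to_spans(tags):
    """Convert a BIO/IOB tag sequence into a set of (start, end, type) spans, end exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        prefix, _, label = tag.partition("-")
        # An open entity ends at "O", at a "B-" tag, or when the type changes.
        if start is not None and (prefix != "I" or label != etype):
            spans.add((start, i, etype))
            start, etype = None, None
        # An entity starts at "B-", or at an "I-" tag when no entity is open.
        if prefix in ("B", "I") and start is None:
            start, etype = i, label
    return spans

def error_profile(gold_tags, pred_tags):
    """Count correct, wrong_type, boundary, spurious and missed entities for one sentence."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    counts = Counter()
    for p in pred:
        if p in gold:
            counts["correct"] += 1
        elif any(p[:2] == g[:2] for g in gold):
            counts["wrong_type"] += 1   # same boundaries, different entity type
        elif any(overlaps(p, g) for g in gold):
            counts["boundary"] += 1     # overlaps a gold entity, boundaries differ
        else:
            counts["spurious"] += 1     # no gold entity at this position
    for g in gold:
        if not any(overlaps(g, p) for p in pred):
            counts["missed"] += 1       # gold entity with no overlapping prediction
    return counts
```

Summing these counters per system over the same sentences gives a table that can be compared side by side in the report.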

For Option II, your submission should include:

• your report (∼4 pages, not including tables/diagrams);

• a zipfile containing any code or notebooks used in analysis and README instructions on how to run it. Please include the [login to view URL] data tagged by each system.

Data Set

The main dataset (eng) is a collection of newswire articles (1996-7) from Reuters, which was developed for the Computational Natural Language Learning (CoNLL) 2003 shared task. It uses the typical set of four entity types: person (PER), organisation (ORG), location (LOC) and miscellaneous (MISC). The second dataset (deu) is the German data from CoNLL 2003, which has an extra column (the second) holding the lemma (base form) of each word. Your system should be primarily developed for English, but also tested on German for comparison.
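A reader for this kind of data might look like the sketch below. It assumes the usual CoNLL-2003 layout: one token per line with whitespace-separated columns, the token first and the entity tag last, a blank line between sentences, and `-DOCSTART-` lines marking document boundaries. The prepared files may differ, so check them before relying on this. The sample rows are made up for illustration.

```python
def read_conll(lines):
    """Yield sentences as lists of column lists from CoNLL-style lines.

    Assumed layout: whitespace-separated columns, token first, entity tag last.
    """
    sentence = []
    for line in lines:
        line = line.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split())
    if sentence:  # the file may not end with a blank line
        yield sentence

# Illustrative rows only: English-style (4 columns) and German-style (lemma in column 2).
sample = [
    "-DOCSTART- -X- O O",
    "",
    "EU NNP I-NP I-ORG",
    "rejects VBZ I-VP O",
    "",
    "Berlin Berlin NE I-NC I-LOC",
]
sentences = list(read_conll(sample))
tokens = [[row[0] for row in sent] for sent in sentences]
tags = [[row[-1] for row in sent] for sent in sentences]
```

Indexing the tag as `row[-1]` rather than by a fixed column number lets the same reader handle both the English files and the German files with their extra lemma column.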

The prepared data can be downloaded from here. By downloading the data you agree 1) to only use it for this assignment, 2) to delete all copies after you are finished with the assignment, and 3) not to distribute it to anybody.

The data is split into subsets:

• training ([login to view URL]) - to be used for training your system;

• development ([login to view URL]) - to be used for development experiments;

• held-out test ([login to view URL]) - to be used only once features and algorithms are finalised.
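Scoring on the development or test split is commonly done at the entity level, where a prediction counts only if both its boundaries and its type match the gold annotation exactly. The sketch below assumes entities are already represented as sets of hashable tuples such as `(sentence_index, start, end, type)`; that representation and the function name are illustrative assumptions.

```python
def precision_recall_f1(gold_spans, pred_spans):
    """Entity-level exact-match precision, recall and F1 over two sets of spans."""
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Keeping the held-out test split out of this loop until features and algorithms are fixed, as the list above requires, avoids tuning on the test data.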

For further information, see [login to view URL]

Do not download and build the data set from the above URL. It does not include the text.

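Machine Learning (ML) · Natural Language · Neural Networks · Python · Software Development

Project ID: #17138164

About the project

4 proposals · Remote project · Active 5 years ago

4 freelancers are bidding on this job, at an average of $236/hour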

schoudhary1553

Hello, I can help you with your project to build and evaluate a named entity recognition system. I have more than 5 years of experience in Machine Learning, Natural Language, Neural Networks, Python, Software Development.

$250 AUD in 3 days
(40 reviews)
6.1
shivampanchal

I have good hands-on experience with Advanced Excel, R, Python, BI tools and technologies, AI and Big Data. I have good knowledge of DL/ML algorithms and have also developed dashboards and web applications. My ar…

$200 AUD in 3 days
(36 reviews)
6.4