Data Mining Using C++

Completed · Posted 6 years ago · Paid on delivery

Please read the project description and see the attachments.

1 Introduction

This project requires you to explore classification algorithms on a real-world dataset and to write a report explaining your experimental results. The implementation language must be C++. Your program must also be able to interpret the data format specified below, classify instances, and produce informative statistics such as accuracy, false positive rate, and false negative rate. You are free to design whatever user interface you like for your program, but you must fully document it.
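The statistics named above can be derived from a confusion matrix. The following is a minimal sketch rather than a required design: it assumes a binary task in which one class is treated as "positive", and the names `Confusion`, `tally`, `ratio`, `accuracy`, `falsePositiveRate`, and `falseNegativeRate` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts for a binary task (assumption: true = positive class).
struct Confusion {
    std::size_t tp = 0, fp = 0, tn = 0, fn = 0;
};

// Tally predictions against the ground truth, one entry per instance.
Confusion tally(const std::vector<bool>& truth, const std::vector<bool>& pred) {
    assert(truth.size() == pred.size());
    Confusion c;
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (truth[i]) {
            if (pred[i]) ++c.tp; else ++c.fn;
        } else {
            if (pred[i]) ++c.fp; else ++c.tn;
        }
    }
    return c;
}

// Safe ratio: returns 0 when the denominator is 0.
double ratio(std::size_t num, std::size_t den) {
    return den == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(den);
}

// Fraction of all instances classified correctly.
double accuracy(const Confusion& c) {
    return ratio(c.tp + c.tn, c.tp + c.tn + c.fp + c.fn);
}

// Fraction of actual negatives that were predicted positive.
double falsePositiveRate(const Confusion& c) {
    return ratio(c.fp, c.fp + c.tn);
}

// Fraction of actual positives that were predicted negative.
double falseNegativeRate(const Confusion& c) {
    return ratio(c.fn, c.fn + c.tp);
}
```

Calling `tally` once on the training predictions and once on the test predictions gives both sets of statistics from the same code path.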

2 Algorithm

• Your algorithm should be based on the following classification algorithms: KNN, Decision Tree, SVM, Naive Bayes, Logistic Regression. A straightforward implementation of a single method will usually not lead to satisfactory performance. Your algorithm can be a combination of methods and should incorporate one or more data mining techniques when the situation arises. These techniques include (and are certainly not limited to):

   - Handling imbalanced datasets

   - Proper imputation methods for missing values

   - Different treatment of various types of features: continuous, discrete, categorical, etc.
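As one illustration of the imputation technique listed above, the sketch below fills missing categorical values with the most frequent value of the column (mode imputation). This is only one of several reasonable choices, not a prescribed method. It assumes missing cells are marked with a sentinel string such as `"?"`, which the project text does not specify; `Row`, `columnMode`, and `imputeColumn` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One instance as raw text fields (assumption: parsing has already been done).
using Row = std::vector<std::string>;

// Most frequent non-missing value in column `col`; empty string if none exists.
// Ties are broken in favor of the lexicographically smallest value.
std::string columnMode(const std::vector<Row>& rows, std::size_t col,
                       const std::string& missing) {
    std::map<std::string, std::size_t> counts;
    for (const Row& r : rows) {
        if (col < r.size() && r[col] != missing) ++counts[r[col]];
    }
    std::string best;
    std::size_t bestCount = 0;
    for (const auto& kv : counts) {
        if (kv.second > bestCount) {
            best = kv.first;
            bestCount = kv.second;
        }
    }
    return best;
}

// Replace missing cells in column `col` with `fill`; returns the number changed.
std::size_t imputeColumn(std::vector<Row>& rows, std::size_t col,
                         const std::string& missing, const std::string& fill) {
    std::size_t changed = 0;
    for (Row& r : rows) {
        if (col < r.size() && r[col] == missing) {
            r[col] = fill;
            ++changed;
        }
    }
    return changed;
}
```

A common design choice is to compute the fill value with `columnMode` on the training set only and then apply it to both the training and test sets with `imputeColumn`, so that no information from the test set leaks into training.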

3 Data

You'll be examining the behavior of your model on a dataset from the UCI machine learning lab. The dataset is represented in a standard format, consisting of 3 files. The first file, [login to view URL], describes the categories and features of the dataset. It also has some empirical results for your reference. The other two files are [login to view URL] and [login to view URL], containing the actual data instances, formatted at one instance per line, as follows:

$$
\begin{aligned}
&F_1^1,\ F_1^2,\ \ldots,\ F_1^k,\ \mathit{label}_1 \\
&F_2^1,\ F_2^2,\ \ldots,\ F_2^k,\ \mathit{label}_2 \\
&\qquad\vdots \\
&F_n^1,\ F_n^2,\ \ldots,\ F_n^k,\ \mathit{label}_n
\end{aligned}
$$

where $F_i^j$ and $\mathit{label}_i$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$) represent the value of the $j$th feature and the class category for the $i$th instance, respectively.
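A reader for this one-instance-per-line format might look like the sketch below. It is an illustration under assumptions: the field delimiter is taken to be a comma (passed as a parameter so it can be changed), surrounding whitespace is trimmed, and every field is kept as raw text. `Instance`, `trim`, and `parseLine` are hypothetical names.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct Instance {
    std::vector<std::string> features;  // the k feature values, as raw text
    std::string label;                  // the class category (last field)
};

// Strip leading and trailing whitespace from a field.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const std::size_t b = s.find_first_not_of(ws);
    if (b == std::string::npos) return "";
    const std::size_t e = s.find_last_not_of(ws);
    return s.substr(b, e - b + 1);
}

// Parse one line into `out`. Returns false for blank lines or lines with
// fewer than two fields (at least one feature plus the label is required).
bool parseLine(const std::string& line, Instance& out, char delim = ',') {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) {
        fields.push_back(trim(field));
    }
    if (fields.size() < 2) return false;
    out.label = fields.back();
    fields.pop_back();
    out.features = fields;
    return true;
}
```

Reading a whole file then reduces to calling `std::getline` on a `std::ifstream` and passing each line to `parseLine`, skipping lines for which it returns false. Converting continuous features to numbers is deliberately left to a later step, since the feature types depend on the dataset description file.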

The data you will be examining was extracted from the Census Bureau database. Each instance contains an individual's educational, demographic, and family information. The prediction task is to determine whether a person makes over 50K a year. You should use [login to view URL] to train your classifier and [login to view URL] to evaluate the performance of your learning algorithm.
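To make the train-then-evaluate split concrete, here is a minimal sketch of a majority-class baseline: "training" just records the most frequent label in the training file, and "evaluation" measures how often that constant prediction is right on the test file. This is not one of the required algorithms; it is a reference point that a real classifier should beat, which is especially informative when the classes are imbalanced. `majorityLabel` and `constantAccuracy` are hypothetical names.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// "Training": return the most frequent label in the training set.
// Returns an empty string if the training set is empty.
std::string majorityLabel(const std::vector<std::string>& trainLabels) {
    std::map<std::string, std::size_t> counts;
    for (const std::string& l : trainLabels) ++counts[l];
    std::string best;
    std::size_t bestCount = 0;
    for (const auto& kv : counts) {
        if (kv.second > bestCount) {
            best = kv.first;
            bestCount = kv.second;
        }
    }
    return best;
}

// "Evaluation": accuracy of always predicting `predicted` on the given labels.
// Returns 0 for an empty label set.
double constantAccuracy(const std::vector<std::string>& labels,
                        const std::string& predicted) {
    if (labels.empty()) return 0.0;
    std::size_t correct = 0;
    for (const std::string& l : labels) {
        if (l == predicted) ++correct;
    }
    return static_cast<double>(correct) / static_cast<double>(labels.size());
}
```

The same pattern applies to any real model: fit on the training labels and features only, then report accuracy separately on the training set and on the held-out test set.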

4 Your Mission...

Deliverables for this project are:

• Code to implement the classification algorithm for the data file formats given above

• A README file, with simple, clear instructions on how to compile and run your code

• Testing statistics for the application of your learning algorithm. At a minimum you should provide training set accuracy and test set accuracy

• A discussion of data mining techniques employed in your algorithm

• A report analyzing the behavior of your algorithm on the dataset, including any unusual or anomalous (in your opinion) behavior

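Skills: Algorithms, C++ Programming, Data Mining, Machine Learning (ML)

Project ID: #15791812

About the project

8 proposals · Remote project · Active 6 years ago

Awarded to: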

Yknox

I'm very interested in your project. I'm a good C++, Java, Math, ML, and Algorithm expert, and I'm quite experienced in these jobs. Let's go ahead together; I want to provide continuous service for you. Relevant Skills and Expe...

$210 USD in 2 days
(647 reviews)
8.8

8 freelancers are bidding on this job, with an average bid of $104/hour

Programmer59

Hello Sir, I will do your work and I assure you of quality work. I have a team of professional developers. Relevant Skills and Experience: I am an expert in Algorithm, C++ Programming, Data Mining, Machine Learning...

$155 USD in 3 days
(11 reviews)
4.8

fastlabindia

I am good at JAVA, ASP, DOT NET, Android, Java, C/C++, AJAX, JavaScript, C#, Visual Basic, JQUERY, etc. Relevant Skills and Experience: I am good at JAVA, ASP, DOT NET, Android, Java, C/C++, AJAX, JavaScript, C...

$30 USD in 1 day
(7 reviews)
4.0