Data Mining Using C++
$30-250 USD
Paid on delivery
Please read the project and see the attachments
1 Introduction
This project requires you to explore classification algorithms on a real-world dataset and write a
report explaining your experimental results. The implementation language must be C++. Your program must also be able to interpret the data format specified below,
classify instances, and produce useful statistics such as accuracy, false positive rate,
and false negative rate. You are free to design whatever user interface you like for your program, but
you must fully document it.
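The statistics named above can all be derived from a confusion matrix. The following is a minimal sketch, assuming a binary task in which `true` marks the positive class; the names `ConfusionMatrix` and `evaluate` are illustrative and not part of the project specification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts of the four possible outcomes for a binary classifier.
struct ConfusionMatrix {
    std::size_t tp = 0, fp = 0, tn = 0, fn = 0;

    // Fraction of all instances classified correctly.
    double accuracy() const {
        std::size_t total = tp + fp + tn + fn;
        return total == 0 ? 0.0 : static_cast<double>(tp + tn) / total;
    }
    // FPR = FP / (FP + TN): share of actual negatives flagged as positive.
    double falsePositiveRate() const {
        std::size_t negatives = fp + tn;
        return negatives == 0 ? 0.0 : static_cast<double>(fp) / negatives;
    }
    // FNR = FN / (FN + TP): share of actual positives that were missed.
    double falseNegativeRate() const {
        std::size_t positives = fn + tp;
        return positives == 0 ? 0.0 : static_cast<double>(fn) / positives;
    }
};

// Compare true labels with predictions (true = positive class, an assumption).
ConfusionMatrix evaluate(const std::vector<bool>& actual,
                         const std::vector<bool>& predicted) {
    assert(actual.size() == predicted.size());
    ConfusionMatrix m;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i]) {
            if (predicted[i]) ++m.tp; else ++m.fn;
        } else {
            if (predicted[i]) ++m.fp; else ++m.tn;
        }
    }
    return m;
}
```

Calling `evaluate` once on the training predictions and once on the test predictions gives both sets of statistics from the same code path.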
2 Algorithm
Your algorithm should be based on the following classification algorithms: KNN, Decision Tree, SVM, Naive Bayes, Logistic Regression.
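As a point of reference for the simplest of these methods, here is a minimal KNN sketch. It assumes all features have already been converted to numbers and that labels are 0 or 1; the `Sample` type and the function names are hypothetical, and a real submission would also need feature scaling and categorical encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One training instance: numeric features plus a binary label (assumed 0 or 1).
struct Sample {
    std::vector<double> features;
    int label;
};

// Squared Euclidean distance; the square root is not needed for ranking.
double squaredDistance(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Predict the label of `query` by majority vote among its k nearest neighbours.
int knnPredict(const std::vector<Sample>& train,
               const std::vector<double>& query, std::size_t k) {
    assert(!train.empty() && k > 0);
    k = std::min(k, train.size());  // never ask for more neighbours than exist

    std::vector<std::pair<double, int>> dist;  // (distance, label)
    dist.reserve(train.size());
    for (const Sample& s : train)
        dist.emplace_back(squaredDistance(s.features, query), s.label);

    // Only the k smallest distances need to be in sorted order.
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());

    std::size_t positives = 0;
    for (std::size_t i = 0; i < k; ++i)
        if (dist[i].second == 1) ++positives;
    return 2 * positives > k ? 1 : 0;  // ties fall back to class 0
}
```

This brute-force version compares the query against every training instance, so each prediction costs time proportional to the size of the training set.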
Usually a straightforward implementation of one method will not lead to satisfactory performance.
Your algorithm can be a combination of methods and should incorporate one or more
data mining techniques when the situation arises. These techniques include (but are certainly
not limited to):
- Handling imbalanced datasets
- Proper imputation methods for missing values
- Different treatment of various types of features: continuous, discrete, categorical, etc.
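Two of the techniques listed above can be sketched briefly. The code below assumes that missing categorical values are marked with the string `"?"` (an assumption; check the dataset description file) and fills them with the most frequent observed value. It also computes inverse-frequency class weights as one possible way to compensate for class imbalance. Both function names are illustrative.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Replace each missing marker in one categorical column with the column's
// most frequent observed value (the mode). Returns the fill value, or an
// empty string if the column contains no observed values.
std::string imputeWithMode(std::vector<std::string>& column,
                           const std::string& missing = "?") {
    std::map<std::string, std::size_t> counts;
    for (const std::string& v : column)
        if (v != missing) ++counts[v];

    std::string mode;
    std::size_t best = 0;
    for (const auto& kv : counts)
        if (kv.second > best) { best = kv.second; mode = kv.first; }

    if (best == 0) return mode;  // nothing observed: leave the column as-is
    for (std::string& v : column)
        if (v == missing) v = mode;
    return mode;
}

// Inverse-frequency class weights: w_c = N / (numClasses * count_c).
// Rare classes receive weights above 1, frequent classes below 1.
std::map<int, double> classWeights(const std::vector<int>& labels) {
    std::map<int, std::size_t> counts;
    for (int y : labels) ++counts[y];

    std::map<int, double> weights;
    const double n = static_cast<double>(labels.size());
    const double numClasses = static_cast<double>(counts.size());
    for (const auto& kv : counts)
        weights[kv.first] = n / (numClasses * static_cast<double>(kv.second));
    return weights;
}
```

Continuous features would need a different fill, such as the mean or median. The weights can be applied in a weighted loss or a weighted vote, depending on the classifier chosen.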
3 Data
You'll be examining the behavior of your model on a dataset from the UCI machine learning lab.
The dataset is represented in a standard format, consisting of 3 files. The first file, [login to view URL],
describes the categories and features of the dataset. It also has some empirical results for your reference.
The other two files are [login to view URL] and [login to view URL], containing the
actual data instances, formatted at one instance per line, as follows:
$F_1^1, F_1^2, \ldots, F_1^k, label_1$
$F_2^1, F_2^2, \ldots, F_2^k, label_2$
$\vdots$
$F_n^1, F_n^2, \ldots, F_n^k, label_n$
where $F_i^j$ and $label_i$ ($i = 1, \ldots, n$; $j = 1, \ldots, k$) represent the value of the $j$th feature and the class category
for the $i$th instance, respectively.
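A line in this format can be split into its feature fields and label with a few lines of standard C++. This is a minimal sketch, assuming comma-separated fields with optional surrounding whitespace; the `Instance` type and the function names are hypothetical, and the `sep` parameter can be changed if the files use a different delimiter.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One parsed line: the raw text of features F^1..F^k, then the class label.
struct Instance {
    std::vector<std::string> features;
    std::string label;
};

// Strip leading and trailing whitespace from a field.
std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Split `line` on `sep`; the last field is the label, the rest are features.
// Returns false for blank lines or lines with fewer than two fields.
bool parseLine(const std::string& line, Instance& out, char sep = ',') {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, sep))
        fields.push_back(trim(field));

    if (fields.size() < 2) return false;
    out.label = fields.back();
    fields.pop_back();
    out.features = fields;
    return true;
}
```

Features are kept as strings at this stage so that continuous and categorical columns can be converted separately later. For a made-up line such as `"39, Private, >50K"`, `parseLine` would yield two features and the label `>50K`.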
The data you will be examining was extracted from the census bureau database. Each instance
contains an individual's educational, demographic and family information. The prediction task is to
determine whether a person makes over 50K a year. You should use [login to view URL] to
train your classifier and use [login to view URL] to evaluate the performance of your learning
algorithm.
4 Your Mission...
Deliverables for this project are:
- Code to implement the classification algorithm for the data file formats given above
- A README file, with simple, clear instructions on how to compile and run your code
- Testing statistics for the application of your learning algorithm. At a minimum you should provide training set accuracy and test set accuracy
- A discussion of the data mining techniques employed in your algorithm
- A report analyzing the behavior of your algorithm on the dataset, including any unusual or anomalous (in your opinion) behavior
Project ID: #15791812
About the project
8 freelancers are bidding on this job, with an average bid of $104/hour
Hello Sir, I will do your work and I assure you of quality work; I have a team of professional developers. Relevant Skills and Experience: I am an expert in Algorithm, C++ Programming, Data Mining, Machine Learning
I am good in Java, ASP, .NET, Android, C/C++, AJAX, JavaScript, C#, Visual Basic, jQuery, etc. Relevant Skills and Experience: I am good in Java, ASP, .NET, Android, C/C++, AJAX, JavaScript, C