R Code to Calculate Random Forest Out-of-Bag Estimate of Error (Revised Price)
$30-250 USD
Closed
Posted almost 8 years ago
Paid on delivery
Will pay $50 for the project to start immediately and be completed within 24 hours (by June 18th, 11:30pm GMT). $10 bonus if completed within the next 4.5 hours (by June 18th, 4:00am GMT). I am available to work with you by chat until then.
Preference given to freelancers who have R and Random Forest experience. You will need to be very familiar with Random Forests and R, as I am not and cannot provide much assistance.
Essentially, I am looking for a small enhancement of the Random Forest process in the R GUI called Rattle. From what I can tell by looking at the R package called party, it includes a number of functions that might mean adding only 5-15 lines of code to what I already have (although I could certainly be off on that estimate).
Using the R GUI called Rattle, I can easily select my dataset (see below) and choose a single Y, as well as the random seed, and choose the ratio of training to testing data. Next, I execute the RF (Random Forest) model choosing only the number of trees (default is 500) and the number of predictors (default is the integer of the square root of m total predictors). From this, R (through Rattle's code) gives me the Out-of-Bag Error and the traditional 2x2 classification grid for both training and testing data. Not including the 5 seconds it takes R to run the code, I can set up this scenario from scratch in less than 1 minute. Due to Rattle’s limitations, I can only execute for a single Y at a time. This issue, as well as the inability to aggregate those Out-of-Bag results, is my problem.
The algorithm above is outlined very succinctly at [login to view URL]~dzeng/BIOS740/[login to view URL] on the first page under the title “The algorithm” and is covered in the listed points 1, 2, 3 and 1. Essentially, what I need done is the very next point they list that says:
2. Aggregated the OOB predictions. (On the average, each data point would be out-of-bag around 36% of the times, so aggregate these predictions.) Calculate the error rate, and call it the OOB estimate of error rate.
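The aggregation step quoted above can be sketched as follows. This is only an illustration in Python, not the requested R code: the function name `oob_vote_counts` is hypothetical, and a majority-class rule stands in for a fitted tree. It tallies, for each observation, how often it was predicted 0 or 1 while out-of-bag, and checks that the out-of-bag share is close to the roughly 36% quoted.

```python
import math
import random


def oob_vote_counts(y, n_rounds, seed=42):
    """Tally out-of-bag predictions per observation over n_rounds bootstrap samples.

    Assumption: the "model" is a placeholder that predicts the majority class
    of the in-bag labels. A real run would fit a tree on the in-bag rows.
    """
    rng = random.Random(seed)
    n = len(y)
    # counts[i][k] = number of rounds in which row i was OOB and predicted as k
    counts = [[0, 0] for _ in range(n)]
    for _ in range(n_rounds):
        in_bag = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        in_bag_set = set(in_bag)
        ones = sum(y[i] for i in in_bag)
        pred = 1 if 2 * ones > n else 0  # stand-in for a fitted tree's prediction
        for i in range(n):
            if i not in in_bag_set:  # row i was left out of this bootstrap sample
                counts[i][pred] += 1
    return counts


y = [0] * 150 + [1] * 50  # toy labels skewed towards 0
n_rounds = 500
counts = oob_vote_counts(y, n_rounds)

# Share of (row, round) pairs in which the row was out-of-bag
oob_fraction = sum(c[0] + c[1] for c in counts) / (len(y) * n_rounds)
# Probability that one bootstrap sample of size n misses a given row
expected = (1 - 1 / len(y)) ** len(y)
print(round(oob_fraction, 3), round(expected, 3), round(math.exp(-1), 3))
```

The printed values should all sit near 0.37, which is where the "around 36%" figure comes from: the chance that a row is missed by a bootstrap sample of size n is (1 - 1/n)^n, which approaches 1/e as n grows.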
However, because my data are skewed towards y-values of 0, I am really after the PPV (Positive Predictive Value, i.e. how often the prediction is correct when a 1 is predicted for Yn) rather than the global OOB error of the models. I am therefore more interested in the raw prediction counts, so I can calculate the error rates myself.
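As a rough illustration of why the raw counts are enough, the sketch below computes both the PPV and the overall error rate from per-observation tallies. It is written in Python for illustration only; the function names and the `(actual_y, n_pred_0, n_pred_1)` row layout are assumptions, and the numbers are made up.

```python
def ppv_from_counts(rows):
    """PPV = correct predictions of 1 / all predictions of 1.

    rows: iterable of (actual_y, n_pred_0, n_pred_1), one tuple per observation.
    """
    true_pos = sum(n1 for y, _, n1 in rows if y == 1)
    pred_pos = sum(n1 for _, _, n1 in rows)
    return true_pos / pred_pos if pred_pos else float("nan")


def error_rate_from_counts(rows):
    """Overall error rate = wrong predictions / all predictions."""
    wrong = sum(n1 if y == 0 else n0 for y, n0, n1 in rows)
    total = sum(n0 + n1 for _, n0, n1 in rows)
    return wrong / total if total else float("nan")


# Made-up tallies for four observations
rows = [(1, 2, 8), (0, 9, 1), (0, 7, 3), (1, 5, 5)]
ppv = ppv_from_counts(rows)
err = error_rate_from_counts(rows)
print(ppv, err)
```

With skewed data the two figures can diverge sharply, because the overall error rate is dominated by the many observations whose true value is 0.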
I will supply a CSV data sample of ~4000 observations (~50/50 training/testing split) with multiple binary Y's and multiple binary X's and one continuous X (an integer ranging from 0 to ~30) for each observation. I can even supply the R code from Rattle for the procedure I am currently using.
I would like your R code to be able to accept the following inputs from me:
-observations in the format: Observation #, Y1…Yn, X1…Xm
-random seed value
-number of trees value (default is 500)
-number of predictors to be randomly sampled (default is the integer of the square root of m total predictors)
-number of rows at bottom of data list for holdout data (to be scored each round)
-number of rounds (which will be ~1,000 – 1,000,000)
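The inputs listed above could be collected along these lines. This is a sketch in Python rather than the requested R code, and `RunSettings`, `resolved_predictors`, and `split_holdout` are hypothetical names. It shows the two stated defaults (500 trees, and the integer part of the square root of m predictors) and the holdout rows being taken from the bottom of the data.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunSettings:
    seed: int
    n_holdout: int                      # rows at the bottom of the data kept as holdout
    n_rounds: int
    n_trees: int = 500                  # default stated in the brief
    n_predictors: Optional[int] = None  # None means "use floor(sqrt(m))"

    def resolved_predictors(self, m):
        """Number of predictors sampled per split, given m total predictors."""
        if self.n_predictors is not None:
            return self.n_predictors
        return math.isqrt(m)  # integer part of the square root


def split_holdout(rows, n_holdout):
    """Split rows into (training, holdout), taking the holdout from the bottom."""
    if not 0 <= n_holdout < len(rows):
        raise ValueError("n_holdout must be between 0 and len(rows) - 1")
    cut = len(rows) - n_holdout
    return rows[:cut], rows[cut:]


settings = RunSettings(seed=42, n_holdout=2, n_rounds=1000)
rows = list(range(10))  # stand-in for ten observations
train, holdout = split_holdout(rows, settings.n_holdout)
print(settings.resolved_predictors(30), train, holdout)
```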
I would like your R code to be able to supply the following outputs to me:
-CSV file with full original data plus the aggregated OOB prediction totals (for both training and testing data) for each observation for each Y (i.e. the number of times the OOB prediction was 0 for each observation for each Y and the number of times the OOB prediction was 1 for each observation for each Y)
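One possible shape for that output file is sketched below, again in Python for illustration only. The column suffixes `_oob_pred0` and `_oob_pred1`, the function name, and the sample values are all assumptions. The original columns are written unchanged, followed by two tally columns per Y.

```python
import csv
import io


def write_with_oob_counts(header, rows, y_names, counts, out):
    """Write the original rows plus two OOB tally columns per Y.

    counts[y_name][i] = (n_pred_0, n_pred_1) for observation i.
    """
    writer = csv.writer(out, lineterminator="\n")
    extra = [f"{name}_{suffix}"
             for name in y_names
             for suffix in ("oob_pred0", "oob_pred1")]
    writer.writerow(list(header) + extra)
    for i, row in enumerate(rows):
        tallies = [c for name in y_names for c in counts[name][i]]
        writer.writerow(list(row) + tallies)


# Made-up data: two observations, two Y columns, one X column
header = ["obs", "Y1", "Y2", "X1"]
rows = [[1, 0, 1, 12], [2, 1, 0, 3]]
counts = {"Y1": [(40, 2), (5, 33)], "Y2": [(1, 37), (30, 9)]}

buffer = io.StringIO()  # a real run would open a file instead
write_with_oob_counts(header, rows, ["Y1", "Y2"], counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```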
If you happen to be aware of an open source R GUI that will already do all of the above for me (and that I can understand and use), you can just help me install it and will not need to supply the R code. As long as it works for me, the project will be considered completed.
I have been a statistics tutor for the last 5 years. I have expertise in statistical analysis and can show you some of my previous analyses. I have a strong grasp of random variables, probability distributions, sampling and different tests. I have also taken a course on data engineering and artificial intelligence. I know all data mining techniques (prediction and classification) and data analysis techniques. I have worked on K-means, ID3, Bayes' theorem, confusion matrices, the Hungarian algorithm and so on. My research was on Rough Set Theory. The tools I use are SPSS, Excel, Minitab, Weka and R for programming. Thank you for considering my proposal.
Dear Sir,
I have more than 8 years of experience in R programming. I can provide you with the most suitable solution for your requirement. Looking forward to further discussion.
I am Herilalaina RASOLONJATOVO from Madagascar, and I am an expert in Data Analysis using R programming and I can help you do this project according to your specification. I have done several projects in this field in the past and locally in my country. I have gone through your requirements and I am ready to start work as soon as possible. Please respond to give a go ahead so work can start.
Thank you in advance,
Herilalaina RASOLONJATOVO.
Hi,
I have extensive experience in R and I am the author of muRandomForest (Product for leading Analytics Player Mu Sigma). I can complete in next 5 hours.
Thanks,
Atul
I have 3 years of experience in the same. Most of my previous work experience is in the field of healthcare, or more precisely, analyzing patient data to develop algorithms for automated disorder detection.
I have developed many novel algorithms for the detection of seizures, cancer, etc.
I am an expert in MATLAB and R programming only.