I am looking to have an existing solution recreated in Python. I will supply a general description of the model and the data for testing. I expect the Python version to reproduce the results obtained by the originator of the model.
The model is based on Bayesian inference and is implemented with Gibbs sampling.
I have MATLAB 7.12 code for the Gibbs sampler which I can provide.
Properly commented code would be appreciated.
We are attempting to build a model with accurate predictions on unseen data.
A necessary step in building models is to ensure that they have not overfit the training data, since overfitting leads to suboptimal predictions on new data.
To test this, we have created a simulated data set with 200 variables and 20,000 cases. An ‘equation’ based on this data was used to generate a Target to be predicted. Given all 20,000 cases, the problem is very easy to solve, but you are only given the Target value for 250 cases. The task is to build a model that gives the best predictions on the remaining 19,750 cases.
This is a classification problem. Predictions will be evaluated using Area Under the Curve (AUC).
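To make the evaluation concrete, here is a minimal sketch of how AUC could be computed from predicted scores. It assumes the curve in question is the ROC curve and that the Target is coded 0/1; the function name `auc_score` is illustrative and not part of any supplied code.

```python
from bisect import bisect_left, bisect_right


def auc_score(y_true, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half.

    Assumes y_true is coded 0/1 (an assumption, not stated in the brief).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = sorted(s for y, s in zip(y_true, scores) if y == 0)
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")

    total = 0.0
    for s in pos:
        below = bisect_left(neg, s)          # negatives scored strictly lower
        tied = bisect_right(neg, s) - below  # negatives with the same score
        total += below + 0.5 * tied
    return total / (len(pos) * len(neg))


# Small example: three of the four (positive, negative) pairs are ordered correctly.
print(auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of the 19,750 held-out cases.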
I am only interested in solutions that use the specified Bayesian inference technique and Gibbs sampler.
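For orientation only, here is a minimal sketch of what a Gibbs sampler for Bayesian classification can look like in Python. It assumes a probit model with a zero-mean Gaussian prior on the coefficients and the Albert and Chib data-augmentation scheme, which is not necessarily the model in the MATLAB code. All function names, parameters and defaults below are illustrative.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_cdf = np.vectorize(_STD_NORMAL.cdf)          # elementwise standard normal CDF
_inv_cdf = np.vectorize(_STD_NORMAL.inv_cdf)  # elementwise inverse CDF


def sample_latent(mu, y, rng):
    """Draw z_i ~ N(mu_i, 1), truncated to z_i > 0 if y_i == 1, else z_i <= 0."""
    p_neg = _cdf(-mu)  # P(z_i <= 0) before truncation
    u = rng.uniform(size=mu.shape)
    # Map u into the probability range of the allowed half-line.
    u = np.where(y == 1, p_neg + u * (1.0 - p_neg), u * p_neg)
    # Keep inv_cdf away from 0 and 1; an approximation in the extreme tails.
    u = np.clip(u, 1e-12, 1.0 - 1e-12)
    return mu + _inv_cdf(u)


def gibbs_probit(X, y, n_iter=500, burn_in=100, prior_var=1.0, seed=0):
    """Gibbs sampler for probit regression with prior beta ~ N(0, prior_var * I).

    Assumed model, for illustration only. Returns the retained draws of beta
    as an array of shape (n_iter - burn_in, n_features).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # The conditional covariance of beta given z does not depend on z.
    V = np.linalg.inv(X.T @ X + np.eye(p) / prior_var)
    L = np.linalg.cholesky(V)
    beta = np.zeros(p)
    draws = []
    for it in range(n_iter):
        z = sample_latent(X @ beta, y, rng)       # step 1: latent variables | beta
        mean = V @ (X.T @ z)
        beta = mean + L @ rng.standard_normal(p)  # step 2: beta | latent variables
        if it >= burn_in:
            draws.append(beta)
    return np.array(draws)


def predict_proba(X_new, draws):
    """Probability of class 1, averaged over the retained posterior draws."""
    return _cdf(X_new @ draws.T).mean(axis=1)
```

With the real data, `X` would hold the 250 labelled cases (250 x 200) and `predict_proba` would be applied to the remaining 19,750. The `np.vectorize` wrappers keep the sketch free of extra dependencies but are slow at that scale; a vectorised normal CDF such as `scipy.special.ndtr` would be a natural replacement.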
** I will provide the data necessary to complete this mini-project.