(MLC's aim is not to identify the machine learning algorithm with the best test performance. In this sense, we do not consider this "Challenge" a competition; rather, we view it as a collective experiment.)
- To download data, go here.
- There are two problems: binary and continuous. Thus, you need a machine-learning toolkit (algorithm) that can handle both types of variables: a regression model for predicting continuous variables and a classification model for binary variables.
- For the binary problem, we will use classification accuracy (ACC: the fraction of cases where the prediction is correct) and area under the ROC curve (AUC) as the performance metrics. To compute AUC values, we recommend the scikit-learn package in python, which we will use in our own evaluations, or Matlab©’s perfcurve function (http://www.mathworks.com/help/stats/perfcurve.html). For the continuous problem, we will use root mean square error (RMSE, also called root mean square deviation, RMSD; defined here) and Pearson's linear correlation (R). See validation details below.
- Using the training data (either summary or mri), the training labels and your prediction algorithm of choice, conduct:
- 5-fold cross-validation and (if you want)
- your favorite cross-validation procedure. For example, you can use (repeated) random leave-2-out cross-validation.
HINT: The classification problem is particularly challenging and a prediction accuracy over 0.6 is acceptable.
- Compute and save your estimates of the performance metrics (ACC and AUC for binary, RMSE and R for continuous) for both 5-fold and your favorite (additional) cross-validation. You need to submit these as files with the following names: Pred_Cont_Estimated_Accuracies.csv, Pred_Binary_Estimated_Accuracies.csv, Pred_Cont_AdditionalEstimates.csv, and Pred_Binary_AdditionalEstimates.csv. Each of these files should contain two numbers, separated by a comma. For classification, these numbers are ACC and AUC. For regression, they are RMSE and R. All are estimated via cross-validation (CV) on the training data. The first two files are for 5-fold CV. The latter two, for your favorite CV method, are not required. If you submit them, each should contain a line of text briefly describing the CV method, preceding the numeric values. For further information on the formatting of these files, please refer to the CodaLab details page.
- Next, train your algorithm on the entire training data (either summary or mri).
- Use the trained model from the previous step to compute predictions on the test data. Save these predictions as two files, Pred_ContTest_sbj_list.csv and Pred_BinaryTest_sbj_list.csv. Please use the same formatting as ContTest_sbj_list.csv and BinaryTest_sbj_list.csv, but replace the NaN’s (the test labels that are not provided) with real numbers. For the binary problem, the predictions should be between 0 and 1, e.g., reflecting class probabilities. We will threshold these values at 0.5 to compute binary predictions.
- Finally, please write up a brief description of your method to include in your submission and save it as Description.doc or Description.tex. This should be at most one page of text (it could be as short as a single paragraph) that includes all relevant information (and references) for someone else to replicate your analysis. If you use a novel method, please provide a reference to this method. We also recommend that you share a pointer to the executables of the method(s) you used. Your write-up should also contain some information on the "favorite cross-validation" method you used to produce Pred_Cont_AdditionalEstimates.csv and Pred_Binary_AdditionalEstimates.csv. Finally, you should also provide a list of team members: the names and affiliations of the persons contributing to this particular submission. Please provide a designated contact person, with their email address.
- Once you've completed all above steps, proceed to submission instructions below.
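The cross-validation and reporting steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the organizers' evaluation code: the helper names `k_fold_indices`, `cross_validate`, and `format_estimates` are hypothetical, the `fit`, `predict`, and `score` callables stand in for whatever toolkit you use, averaging the metric over folds is one common choice among several, and the four-decimal formatting is a guess (the CodaLab details page remains the authority on file formatting).

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the sample indices once and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, score, k=5, seed=0):
    """Return the mean of score(true, predicted) over k held-out folds.

    fit(train_features, train_labels) -> model
    predict(model, test_features)     -> list of predictions
    score(true_labels, predictions)   -> float (e.g. ACC, AUC, RMSE, or R)
    Averaging per-fold scores is an assumption of this sketch.
    """
    fold_scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held_out_set = set(held_out)
        train = [i for i in range(len(labels)) if i not in held_out_set]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predicted = predict(model, [features[i] for i in held_out])
        fold_scores.append(score([labels[i] for i in held_out], predicted))
    return sum(fold_scores) / len(fold_scores)


def format_estimates(first, second, description=None):
    """Build the file content: two comma-separated numbers, optionally
    preceded by one line of text describing the CV method."""
    lines = []
    if description is not None:
        lines.append(description)
    lines.append(f"{first:.4f},{second:.4f}")
    return "\n".join(lines) + "\n"


# Example use (placeholder numbers, not real results):
# with open("Pred_Binary_Estimated_Accuracies.csv", "w") as handle:
#     handle.write(format_estimates(acc, auc))
```

If you use scikit-learn, its `KFold` splitter in `sklearn.model_selection` can take the place of `k_fold_indices`.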
For binary classification, we will use ACC and AUC metrics.
ACC is defined based on the binarized predictions. Thus, it requires you to threshold your continuous predictions. When we are doing this on your predictions on the test data, we will use 0.5 as the threshold. I.e., values larger than 0.5 will be assigned a predicted class value of 1 and smaller values will be assigned to class 0. ACC is equal to the number of cases where the binarized prediction is the same as the ground truth label, divided by the total number of cases.
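As a minimal sketch of the ACC definition above, the thresholding and counting can be written in a few lines of Python. The function names are hypothetical, and the handling of a prediction exactly equal to 0.5 is an assumption here (it is assigned to class 0), since the text only specifies values larger and smaller than the threshold.

```python
def binarize(predictions, threshold=0.5):
    """Map continuous predictions to class labels.

    Values strictly larger than the threshold become 1, all others 0.
    Sending a value exactly at the threshold to class 0 is an assumption.
    """
    return [1 if p > threshold else 0 for p in predictions]


def accuracy(true_labels, predictions, threshold=0.5):
    """Fraction of cases where the binarized prediction equals the true label."""
    predicted = binarize(predictions, threshold)
    matches = sum(1 for t, p in zip(true_labels, predicted) if t == p)
    return matches / len(true_labels)
```

For example, `accuracy([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6])` binarizes the predictions to `[1, 0, 0, 1]`, so two of the four cases match and the result is 0.5.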
AUC is the area under the receiver operating characteristic curve, or ROC curve. For further details, please refer to the references here.
For continuous regression, we will compute RMSE and Pearson's R. We will assess the quality of predictions based on RMSE.
Both RMSE and Pearson's R should be computed between the continuous predictions and ground truth label values.
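For intuition, AUC equals the probability that a randomly chosen class-1 case receives a higher prediction than a randomly chosen class-0 case, with ties counted as one half. The sketch below computes it that way by comparing every positive-negative pair. It is an illustration with a hypothetical function name, not the organizers' evaluation code, and it is quadratic in the number of cases, which is fine for checking small examples.

```python
def roc_auc(true_labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    case has the higher score; tied scores contribute one half.
    """
    positives = [s for t, s in zip(true_labels, scores) if t == 1]
    negatives = [s for t, s in zip(true_labels, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case from each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Unlike ACC, this value does not depend on the 0.5 threshold, only on how the predictions rank the cases. In scikit-learn, `sklearn.metrics.roc_auc_score` computes the same quantity.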
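Both regression metrics follow directly from their standard definitions. The sketch below uses only the Python standard library; the function names are hypothetical, and the code is meant for checking your own numbers rather than as the organizers' implementation.

```python
import math


def rmse(true_values, predictions):
    """Root mean square error between predictions and ground truth."""
    squared_errors = [(t - p) ** 2 for t, p in zip(true_values, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pearson_r(true_values, predictions):
    """Pearson's linear correlation between predictions and ground truth.

    Raises ZeroDivisionError if either input is constant, because the
    correlation is undefined in that case.
    """
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_p = sum(predictions) / n
    covariance = sum((t - mean_t) * (p - mean_p)
                     for t, p in zip(true_values, predictions))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return covariance / math.sqrt(var_t * var_p)
```

Note that the two metrics capture different things: predictions that are a scaled or shifted copy of the true values still reach R = 1 while having a nonzero RMSE.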
(Submissions are open as of April 15, 2014)
Submissions can be made from the MLC Codalab website. [PLEASE READ THE FOLLOWING CAREFULLY BEFORE GOING TO THE CODALAB PAGE]
Once you have completed analyzing the challenge data and followed the steps described above, you are ready to submit your results.
Each team is allowed (and encouraged) to submit multiple entries. To achieve this in the CodaLab system, you will have to create multiple user accounts. This is because, although each user account can submit multiple entries, only one of these entries will be represented on the leaderboard. By default, this will be the user's latest submission before the deadline. However, we will allow users to revert to a previous submission to be represented on the leaderboard. To summarize: each user will be allowed to submit up to 5 entries, one of which will be represented on the leaderboard.
Also, by submitting your results, you agree to the terms and conditions below. These include that at least one member of each team agrees to officially register for the accompanying workshop at MICCAI 2014 and possibly give a presentation.
Before submitting your results, please make sure you have the following.
- Estimated ACC and AUC values for the binary classification problem and estimated RMSE and R values for the regression problem (both for 5-fold and, if you want, your favorite cross-validation procedure). Save these as the files Pred_Cont_Estimated_Accuracies.csv, Pred_Binary_Estimated_Accuracies.csv, Pred_Cont_AdditionalEstimates.csv, and Pred_Binary_AdditionalEstimates.csv. The latter two files are not required.
- Pred_ContTest_sbj_list.csv and Pred_BinaryTest_sbj_list.csv
- Brief description of method (call this file Description.doc or Description.tex). This description should contain a list of team members: the names and affiliations of the persons contributing to this particular submission. Please provide a designated contact person, with their email address.
- Prepare submission file to upload to the MLC Codalab website.
The submission file should be a zip file with the following structure:
|- Pred_Cont_Estimated_Accuracies.csv
|- Pred_Binary_Estimated_Accuracies.csv
|- Pred_ContTest_sbj_list.csv
|- Pred_BinaryTest_sbj_list.csv
|- Pred_Binary_AdditionalEstimates.csv [not required]
|- Pred_Cont_AdditionalEstimates.csv [not required]
|- Description.tex [or Description.doc]
Note: the zip file can have any name, but it has to contain all required files. The zip file must contain the csv files directly, NOT a folder that contains the csv files. To obtain this (e.g., on a Windows PC or Mac), simply select all the files you want to submit, right-click, and compress.
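If you prefer to build the archive from a script, the flat layout described above (files at the top level of the zip, no enclosing folder) can be produced with Python's standard `zipfile` module. This is a minimal sketch; the function name `build_submission` and the archive name in the usage comment are hypothetical.

```python
import os
import zipfile


def build_submission(zip_path, file_paths):
    """Write the given files into a zip archive with no folder structure.

    Each file is stored under its base name only, so the archive holds
    the files themselves rather than a folder that contains them.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in file_paths:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path


# Example use (the archive name is arbitrary):
# build_submission("my_submission.zip", [
#     "results/Pred_Cont_Estimated_Accuracies.csv",
#     "results/Pred_Binary_Estimated_Accuracies.csv",
#     "results/Pred_ContTest_sbj_list.csv",
#     "results/Pred_BinaryTest_sbj_list.csv",
#     "results/Description.tex",
# ])
```

Listing the archive afterwards with `zipfile.ZipFile(zip_path).namelist()` is a quick way to confirm that no entry contains a folder prefix.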
The deadline for submissions will be 11:59 pm UTC, June 8, 2014.
Challenge Terms and Conditions
All the data made available for the MICCAI 2014 Machine Learning Challenge (MLC) can only be used to generate a submission for this challenge.
Results submitted to MICCAI 2014 MLC can be published (as deemed appropriate by the organizers) through different media, including this website and journal publications.
By submitting an entry to MICCAI 2014 MLC, each team agrees to have at least one member register for the accompanying workshop (held on September 18, 2014 at MIT).