TVB-Pypeline - Work in Progress!
This project maps our current automated MRI processing pipeline (http://github.com/BrainModes/TVB-empirical-data-pipeline) to Python using Nipype, making the toolboxes used inside easily exchangeable.
For a general overview of the pipeline, see Schirner, Rothmeier et al. (2015).
Please note that this pipeline performs extensive analyses and is thus computationally heavy. Testing was carried out on a high-performance computing cluster using >100 CPU cores.
Installation:
The pipeline uses Nipype, which depends mainly on Python 2.7. The following list gives an overview of the Python toolboxes used in the current state of the pipeline. See the corresponding doc pages for installation and dependency resolution.
Since Nipype/Python also acts as a wrapper for toolboxes invoked through the shell interface, you also have to make sure that the toolboxes you want to use are installed on your system and that their binaries/libraries are included in the shell's search path.
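Before starting a run, a quick sanity check like the following sketch can confirm that the required binaries are reachable from the search path. The tool names in the snippet (recon-all, bet, streamtrack) are only examples and should be adapted to the toolboxes you actually use:

```python
# Minimal sketch (not part of the pipeline): check that external toolbox
# binaries are on the PATH. The tool names below are examples only.
from distutils.spawn import find_executable

required_tools = ['recon-all', 'bet', 'streamtrack']  # FREESURFER, FSL, MRTrix 0.2.x examples
for tool in required_tools:
    location = find_executable(tool)
    if location is None:
        print('MISSING: %s was not found on the PATH' % tool)
    else:
        print('Found %s at %s' % (tool, location))
```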
For preprocessing, the following toolboxes are used:
When it comes to fiber tractography, there is a vast number of available tools. Their usage also highly depends on how your dwMRI data was recorded. One of the main distinguishing factors is the number of different diffusion-gradient strengths applied during the measurement (i.e. the number of different b-values). If the dataset has only a single value greater than zero, one speaks of single-shell data. As soon as more than one value (>0) is involved, the data is called multi-shell data. A quick way to check which case applies to your data is sketched after the list below.
Currently, we have tested two toolboxes for tractography, one for each of the aforementioned scenarios:
- MRTrix 0.2.12: Single-Shell Tracking
- FSL: Multi-Shell Tracking (Not yet implemented in the Python-Pipeline!)
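If you are unsure which case applies to your data, a small sketch like the following (assuming an FSL-style bvals text file of whitespace-separated b-values; the file name is a placeholder) counts the distinct non-zero b-values:

```python
# Minimal sketch: decide whether a dataset is single- or multi-shell by
# counting the distinct non-zero b-values in an FSL-style bvals file.
# The file name 'dwi.bval' is a placeholder.
import numpy as np

bvals = np.loadtxt('dwi.bval')
# Round to the nearest hundred to absorb small scanner-side deviations (e.g. 995 vs. 1000)
shells = set(np.round(bvals[bvals > 0], -2))
if len(shells) == 1:
    print('Single-shell data (b = %d)' % shells.pop())
else:
    print('Multi-shell data (b-values: %s)' % sorted(shells))
```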
Install the Pipeline
Download the files from the GitHub repository and unpack them on your workstation/cluster. To run the pipeline on a specific cluster architecture, simply edit the plugin type in the master control script TVB_pipeline.py. Locate the following code block at the end of the file:
# ## Run the Workflow
# Keep exactly one of the following calls active:
#wf.run(plugin='MultiProc', plugin_args={'n_procs': cpu_count()})
wf.run(plugin='OAR', plugin_args={'oarsub_args': '-l walltime=04:00:00'})
#wf.run()
As you can see, plugins are used to handle the different environments in which the pipeline may be run, e.g. different job schedulers on a high-performance computing cluster or a local installation on a multicore workstation. For an overview of the available plugins, see the doc page about plugins. Since this page is sometimes a bit outdated (e.g. the OAR plugin is not yet listed), see also https://github.com/nipy/nipype/tree/master/nipype/pipeline/plugins
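As a hedged illustration (reusing the wf workflow object from TVB_pipeline.py; the walltime values and scheduler arguments are placeholders that must match your site's policy), switching to another scheduler only means swapping the plugin name and its arguments:

```python
# Sketches for other environments; keep exactly one wf.run call active
# and adapt the scheduler arguments to your cluster.

# SGE/Grid Engine cluster:
wf.run(plugin='SGE', plugin_args={'qsub_args': '-l h_rt=04:00:00'})

# SLURM cluster:
#wf.run(plugin='SLURM', plugin_args={'sbatch_args': '--time=04:00:00'})
```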
Preparing your raw data
Looking at the TODO list in the bottom section of this manual, you can see that the organization of the user's raw data is still a bit inflexible, since the pipeline requires a certain folder schema. Currently, you need to stick precisely to the following naming conventions:
/home/myUserName/pipeline/subjects/
|-- Sub1/
| |-- RAWDATA/
| | |-- MPRAGE/
| | | |-- Maybe/Some/SubFolders
| | | | |-- Arbitrary-Image-Names-001.dcm
| | | | |-- Arbitrary-Image-Names-002.dcm
| | | | |-- ...
| | |-- DTI/
| | |-- BOLD-EPI/
Inside the folders for the different imaging modalities, the number of subfolders does not matter. Note that the pipeline currently only supports DICOM data as input.
Using fMRI data is optional, i.e. if you do not include that data in your RAWDATA folder, you still get the structural and dwMRI data processed!
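Until the file sorting becomes more flexible (see the TODO list at the bottom), a small sketch like the following can verify that a subject folder follows the schema above; the .dcm suffix check is an assumption about your file naming and is not part of the pipeline:

```python
# Minimal sketch: check that a subject folder follows the RAWDATA schema.
import os
import fnmatch

def check_subject_folder(subject_dir):
    raw = os.path.join(subject_dir, 'RAWDATA')
    # BOLD-EPI is optional, MPRAGE and DTI are required
    for modality, required in [('MPRAGE', True), ('DTI', True), ('BOLD-EPI', False)]:
        folder = os.path.join(raw, modality)
        if not os.path.isdir(folder):
            print('%s: %s' % ('MISSING required folder' if required
                              else 'Optional folder not present', folder))
            continue
        # Count DICOM files in all (arbitrarily nested) subfolders
        n_dcm = sum(len(fnmatch.filter(files, '*.dcm'))
                    for _, _, files in os.walk(folder))
        print('%s: %d DICOM files found' % (modality, n_dcm))

check_subject_folder('/home/myUserName/pipeline/subjects/Sub1')
```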
Running the Pipeline
To finally run the pipeline, locate the TVB_pipeline.py script using your system's shell and pass the subject's ID and the absolute path to the folder holding your subject's RAWDATA folder (see above):
python /home/myUser/pipeline/TVB_pipeline.py --sub-id <SUBJECT-ID> --sub-dir <SUBJECT-DIR>
The log files are stored in a subfolder of your SUBJECT-DIR called TVB_pipeline.
The Results of the Pipeline
Among several intermediate results, like a full FREESURFER recon_all dataset, there are also several datasets which were developed in-house. Their generation is described in the aforementioned research article. The following tables can be seen as a reference linking the explanations in the paper to the file and variable names generated by the pipeline code.
Diffusion-MRI:
The results are stored by default in <SUBJECT-DIR>/tractography/tracks/<SUBJECT-ID>_SC.mat (a MATLAB/Octave file) and also in JSON format as <SUBJECT-DIR>/tractography/tracks/<SUBJECT-ID>_SC.json. These files include several matrices representing different metrics (a short loading sketch follows the table):
Variable-Name | Type of Data | Referred to in the paper as |
---|---|---|
SC_cap_agg_counts | Region-wise Capacity Matrix using the number of tracts found between different regions | Raw Counts |
SC_cap_agg_bwflav1 | Region-wise Capacity Matrix using the number of distinct connections found on single-voxel level | Distinct Connections |
SC_cap_agg_bwflav1_norm | Same data as above but normalized to the range between 0 and 1 | |
SC_cap_agg_bwflav2 | Region-wise Capacity Matrix using the number of distinct connections found on single-voxel level. Each strength entry is weighted by the total number of connections leaving the corresponding brain area | Weighted Distinct Connections |
SC_cap_agg_bwflav2_norm | Same data as above but normalized to the range between 0 and 1 | |
SC_dist_<mean/mode/median>_agg | The mean/mode/median distance between all distinct tracks connecting the individual brain regions | SC Distances |
SC_dist_var_agg | The variance of the distance between all distinct tracks connecting the individual brain regions | |
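As a hedged sketch, these matrices can be inspected e.g. with SciPy (the subject ID and path are placeholders following the default location above):

```python
# Minimal sketch: load the structural connectivity results and inspect a matrix.
import scipy.io as sio

sc = sio.loadmat('<SUBJECT-DIR>/tractography/tracks/<SUBJECT-ID>_SC.mat')
weights = sc['SC_cap_agg_bwflav2_norm']   # normalized weighted distinct connections
lengths = sc['SC_dist_mean_agg']          # mean track length between regions
print('Capacity matrix shape: %s' % str(weights.shape))
# The same matrices are also available in the JSON file via the standard json module.
```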
Functional-MRI:
By default, the resulting data will be stored in **<SUBJECT-DIR>/bold/**. Results feature a run of FSL's FEAT pipeline and also region-wise timeseries stored in the file **<SUBJECT-ID>_fMRI.mat**. As for the SC file described above, this MATLAB/Octave file stores various variables:
Variable-Name | Type of Data |
---|---|
ROI_ID_table | Various numbers from FREESURFER's mri_segstat. The headers have been removed. They can be found in the following file: <SUBJECT-DIR>/bold/segstat_summary.txt |
<SUBJECT-ID>_ROIts | A matrix with dimensions fmri-timepoints X parcellation-regions. This matrix holds the region-wise averaged BOLD time course |
FC_cc | The functional connectivity matrix. This matrix is computed by applying the corrcoef function to the parcellated BOLD timecourse, resulting in a matrix of dimensionality number-of-brainregions X number-of-brainregions |
Throughout our paper, we used the 68 cortical regions of the Desikan-Killiany atlas. To reproduce the FC based on this atlas from the parcellated timeseries obtained using the default parcellation in this pipeline (aparc+aseg), one can use the following code snippet:
# ROI_ID_table and fMRI are assumed to be taken from <SUBJECT-ID>_fMRI.mat,
# where fMRI corresponds to the region-wise timeseries matrix <SUBJECT-ID>_ROIts.
from numpy import shape

# Keep only the 68 cortical Desikan-Killiany regions: the last 69 rows of
# ROI_ID_table hold the 34 left- and 34 right-hemisphere cortical regions,
# with one row in between that is skipped here.
start1 = shape(ROI_ID_table)[0] - 69
stop1 = start1 + 34
start2 = stop1 + 1
stop2 = shape(ROI_ID_table)[0]
# In Python 2, range() returns a list, so the two index lists concatenate with +
fMRI_DK68 = fMRI[:, range(start1, stop1) + range(start2, stop2)]
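As a follow-up (a sketch, assuming fMRI_DK68 holds timepoints in rows and regions in columns, as above), the FC for this reduced region set can be recomputed with NumPy's corrcoef:

```python
import numpy as np

# Recompute the functional connectivity on the 68 Desikan-Killiany regions
# selected above. rowvar=0 treats every column (i.e. every region) as one variable.
FC_DK68 = np.corrcoef(fMRI_DK68, rowvar=0)
print(FC_DK68.shape)   # expected: (68, 68)
```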
TODO-List:
- Write a new documentation!
- Implement fMRI processing based on the code here: https://github.com/BrainModes/TVB-empirical-data-pipeline/blob/NSG/fmriFC.sh
- Implement tractography thresholding in the MRTrix module, possibly including the method described in Morris et al. (2008). Alternatively, one could dig up the old hard-threshold code, since the short-range tracking flares are rendered meaningless anyway by our aggregation method!
- Make the file sorting of the user data more sophisticated. This means that the pipeline should be able to recognize which kinds of datasets (e.g. fMRI, T1, dwMRI) are included in the user data and then route the particular folder paths onto the corresponding processing nodes inside the pipeline. This might be achieved by using nipype's SelectFiles interface (a rough sketch follows this list)
- Include some example workflows for different cluster scenarios, realized e.g. through control scripts written in BASH
- Re-implement multi-shell tracking using FSL's bedpostx as in https://github.com/BrainModes/TVB-empirical-data-pipeline/tree/multiShell
- Implement the formatting of the results into a TVB-ZIP-File as in https://github.com/BrainModes/TVB-empirical-data-pipeline/blob/NSG/matlab_scripts/connectivity2TVBFS.m
- Check if Non-DICOM data works as input
- Add a Doc-Section about the resulting data
- Support multiple runs of fMRI (e.g. bold1; bold2; bold3; ...)
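For the file-sorting item above, a rough sketch of what a SelectFiles-based grabber might look like for the RAWDATA schema described earlier (the template names and base directory are assumptions, not pipeline code):

```python
# Rough sketch: pick up the modality folders of one subject with SelectFiles.
from nipype.pipeline.engine import Node
from nipype.interfaces.io import SelectFiles

templates = {'t1_dir':   '{subject_id}/RAWDATA/MPRAGE',
             'dwi_dir':  '{subject_id}/RAWDATA/DTI',
             'bold_dir': '{subject_id}/RAWDATA/BOLD-EPI'}

# raise_on_empty=False tolerates the optional BOLD-EPI folder being absent
file_grabber = Node(SelectFiles(templates, raise_on_empty=False),
                    name='file_grabber')
file_grabber.inputs.base_directory = '/home/myUserName/pipeline/subjects'
file_grabber.inputs.subject_id = 'Sub1'
print(file_grabber.run().outputs)
```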