Restoration analysis
The restoration analysis takes a combinatorial approach: it enumerates every possible set of up to six parameters and permutes them together. Each such combination of permuted parameters is called a virtual intervention.
Default parameters are sampled from the OCD posteriors. Permuted parameters are sampled from the controls’ posteriors. For each virtual intervention, 1000 simulations are run in parallel, each with a new draw from the posterior distributions. Simulations are then grouped in sets of 50 to obtain 20 independent cohorts per virtual intervention. The distance between the virtual intervention and the reference healthy controls’ simulations is evaluated as the sum of Wasserstein distances across pathways in frontostriatal functional connectivity space.
The process is illustrated here:
Generation of virtual cohorts and evaluation of virtual interventions. A. Parameters are drawn from OCD (orange) and control (blue) posterior distributions to create 2000 reference virtual subjects, 1000 in each group. Virtual interventions are modelled by drawing from the reference control group distributions for the parameters targeted by the intervention and from the reference OCD group distributions for the parameters not targeted by the intervention. 1000 virtual subjects are generated to create the virtual intervention cohorts. B. For each group, the 1000 virtual subjects are separated into 20 cohorts of 50 subjects. Functional connectivity (FC) distances are computed between all control and OCD cohorts (\(d(A,B)\); i.e., reference) and between all control and virtual intervention cohorts (\(d(A,B’)\); i.e., intervention). C. The distributions of FC distances for reference (orange) and intervention (purple) are compared (a decrease in distance implies functional improvement). D. The efficacy of the intervention is statistically quantified using a Mann-Whitney U test between the reference and intervention distributions. E. Normalizing the U statistic by the number of samples yields the AUC, for which scores above 0.5 denote functional improvement.
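The sampling scheme of panels A and B can be sketched as follows. This is a minimal, dependency-free illustration and not the package's implementation: the function names `draw_virtual_subject` and `split_into_cohorts` are hypothetical, and the posteriors are replaced by lists of made-up samples rather than the KDEs used by the package.

```python
import random

def draw_virtual_subject(ocd_posterior, control_posterior, targets, rng):
    """Draw one parameter set: targeted parameters come from the control
    posterior, all other parameters from the OCD posterior."""
    subject = {}
    for name in ocd_posterior:
        source = control_posterior if name in targets else ocd_posterior
        subject[name] = rng.choice(source[name])
    return subject

def split_into_cohorts(subjects, cohort_size=50):
    """Group consecutive subjects into independent cohorts of fixed size."""
    return [subjects[i:i + cohort_size]
            for i in range(0, len(subjects), cohort_size)]

rng = random.Random(0)
# Hypothetical posterior samples (stand-ins for the KDE-based posteriors).
ocd_posterior = {"C_12": [rng.gauss(0.5, 0.1) for _ in range(500)],
                 "sigma": [rng.gauss(0.2, 0.05) for _ in range(500)]}
control_posterior = {"C_12": [rng.gauss(0.1, 0.1) for _ in range(500)],
                     "sigma": [rng.gauss(0.1, 0.05) for _ in range(500)]}

intervention = ("C_12",)  # a virtual intervention targeting one parameter
subjects = [draw_virtual_subject(ocd_posterior, control_posterior,
                                 intervention, rng)
            for _ in range(1000)]
cohorts = split_into_cohorts(subjects)
print(len(cohorts), len(cohorts[0]))  # 20 50
```

Each of the 1000 virtual subjects would then be simulated, and the 20 resulting cohorts compared to the control cohorts in FC space.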
Note
Subscript notation may differ from the publication: the original code uses numbered subscripts rather than letters. The mapping is:
\(\theta_{ij}\) refers to parameter \(\theta\) to population \(i\) from population \(j\).
Combinatorial permutations
We save the combinations of permuted parameters on the distributed filesystem so that all workers can access them.
import itertools
import os
import pickle

# `proj_dir` (the project root directory) is assumed to be defined elsewhere.
params = ['C_12', 'C_13', 'C_24', 'C_31', 'C_34', 'C_42', 'eta_C_13', 'eta_C_24', 'sigma', 'sigma_C_13', 'sigma_C_24']
combinations = []
# Enumerate every set of 1 to 6 parameters.
for i in range(0, 6):
    for p in itertools.combinations(params, r=i+1):
        combinations.append(p)
print("number of parameter combinations: "+str(len(combinations)))
with open(os.path.join(proj_dir, 'postprocessing', 'params_combinations.pkl'), 'wb') as f:
    pickle.dump(combinations, f)
>>> number of parameter combinations: 1485
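As a sanity check on the reported count, the number of combinations can be derived in closed form: with 11 parameters and sets of 1 to 6 targets, it is the sum of the binomial coefficients \(\binom{11}{r}\) for \(r = 1 \cdots 6\). The short sketch below (variable names are illustrative) computes this sum and cross-checks it against a direct enumeration.

```python
import itertools
import math

n_params = 11     # length of the `params` list above
max_targets = 6   # largest number of parameters permuted together

# Number of combinations for each number of targets r.
per_size = {r: math.comb(n_params, r) for r in range(1, max_targets + 1)}
total = sum(per_size.values())

# Cross-check by enumerating the combinations explicitly.
enumerated = sum(1 for r in range(1, max_targets + 1)
                 for _ in itertools.combinations(range(n_params), r))

print(per_size)  # {1: 11, 2: 55, 3: 165, 4: 330, 5: 462, 6: 462}
print(total)     # 1485
```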
Then we generate new synthetic data using the appropriate posteriors from either OCD or healthy controls. We run 1000 simulations (20 cohorts of 50 virtual subjects) for each virtual intervention.
- OCD_modeling.mcmc.launch_sims_parallel(kdes, cols, test_param, args=None)[source]
Run batched simulations from posterior inference in parallel:
- Parameters:
kdes (dict) – Kernel Density Estimates from optimization (structured as kdes[group][param])
cols (list) – Parameters (columns) which draw samples from KDEs (otherwise default values are used)
test_param (list) – Parameters for which the posterior is permuted for the virtual interventions.
args (argparse.Namespace) – Extra arguments with options.
- Return type:
None. Outputs of the simulations are written to the local SQLite database.
Intervention improvement measure
A Mann-Whitney U statistic is computed to quantify the functional improvement (measured as the distance to healthy controls’ FC) of each virtual intervention. A normalized measure of functional improvement (the Area Under the receiver operating characteristic Curve, AUC) is derived from the U statistic, and AUCs are sorted by number of intervention targets. This indicates how reliably each set of parameters contributes to restoring healthy functional dynamics from OCD.
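The relation between the U statistic and the AUC can be illustrated with a small pure-Python sketch. It assumes, consistently with the statement that scores above 0.5 denote improvement, that U counts the pairs in which the reference distance exceeds the intervention distance; the function name and the distance values are hypothetical. In practice, `scipy.stats.mannwhitneyu` provides the U statistic together with a p-value.

```python
def mann_whitney_auc(reference, intervention):
    """AUC = U / (n_ref * n_int), where U counts the pairs in which the
    reference distance exceeds the intervention distance (ties count 1/2)."""
    u = 0.0
    for r in reference:
        for v in intervention:
            if r > v:
                u += 1.0
            elif r == v:
                u += 0.5
    return u / (len(reference) * len(intervention))

# Hypothetical FC distances to healthy controls, one value per cohort pair.
reference_distances = [0.90, 0.85, 0.95, 0.80]     # d(A, B): controls vs OCD
intervention_distances = [0.60, 0.70, 0.88, 0.75]  # d(A, B'): controls vs intervention

auc = mann_whitney_auc(reference_distances, intervention_distances)
print(auc)  # 0.875
```

Here 14 of the 16 pairs have a smaller distance after the intervention, so the AUC is 14/16 = 0.875. An AUC of 0.5 would mean the intervention distances are indistinguishable from the reference ones.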
- OCD_modeling.mcmc.compute_efficacy(df_restore, args=None)[source]
Add new column to input DataFrame with treatment efficacy based on distance to healthy controls in functional connectivity space.
Note
Simulation metric is the Wasserstein distance and the data metric is the Euclidean distance. Comparison in FC space is automatically performed using the correct metric (default: Wasserstein).
- Parameters:
df_restore (pandas.DataFrame) – Virtual intervention simulation outputs with distance precomputed.
args (argparse.Namespace) – Extra arguments with options. The important option for this function is args.efficacy_base, which determines how treatment efficacy is computed (the value retained for the analysis was “ustat”).
- Returns:
df_restore – Input DataFrame with new column ‘efficacy’.
- Return type:
pandas.DataFrame
- OCD_modeling.mcmc.compute_distance_restore(df_sims, args)[source]
Compare base simulations (from OCD subjects and healthy controls) to simulated interventions by calculating distances in parallel.
- Parameters:
df_sims (pandas.DataFrame) – Simulation outputs loaded from the database.
args (argparse.Namespace) – Extra arguments with options.
- Returns:
df_restore – Copy of df_sims with efficacy measures (including the cohorts’ first-level statistics).
- Return type:
pandas.DataFrame
- OCD_modeling.mcmc.compute_distance_restore_sims(df_base, df_sims, args)[source]
Compute distance metric between batches of simulations in controls, patients and continuous restoration from patients to controls.
- Parameters:
df_base (pandas.DataFrame) – Simulated data from virtual OCD subjects (patients) and healthy controls. No virtual interventions were performed. Corresponds to simulation using OCD and healthy posteriors.
df_sims (pandas.DataFrame) – Simulation outputs loaded from the virtual interventions database.
args (argparse.Namespace) – Extra arguments with options.
- Returns:
outputs – Distances between simulated interventions and healthy control cohorts, with parameter values normalized w.r.t. the original OCD parameter posterior distribution.
- Return type:
list of dictionaries
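To make the distance concrete, the sketch below sums one-dimensional Wasserstein distances across pathways for two cohorts. It is an illustration under stated assumptions, not the package's code: it treats each pathway's FC values as a one-dimensional empirical sample, assumes equal cohort sizes (in which case the Wasserstein-1 distance reduces to the mean absolute difference between sorted values), and uses hypothetical pathway names and values. For the general case, `scipy.stats.wasserstein_distance` computes the same one-dimensional quantity.

```python
def wasserstein_1d(x, y):
    """1-D Wasserstein-1 distance between two equal-size samples:
    the mean absolute difference between their sorted values."""
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def cohort_distance(cohort_a, cohort_b):
    """Sum the per-pathway Wasserstein distances between two cohorts.
    Each cohort maps a pathway name to the FC values of its subjects."""
    return sum(wasserstein_1d(cohort_a[p], cohort_b[p]) for p in cohort_a)

# Hypothetical pathway names and FC values, for illustration only.
controls = {"pathway_1": [0.10, 0.20, 0.30], "pathway_2": [0.40, 0.50, 0.60]}
patients = {"pathway_1": [0.30, 0.40, 0.50], "pathway_2": [0.40, 0.50, 0.60]}

d = cohort_distance(controls, patients)
print(round(d, 6))  # 0.2
```

Only the first pathway differs between the two cohorts (shifted by 0.2), so the summed distance is 0.2.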
We plot the AUC of each virtual intervention (x-axis) against its targeted parameters (y-axis), color-coded by number of targets:
- OCD_modeling.mcmc.plot_distance_restore(df_restore, args, gs=None)[source]
Plot horizontal box plot of best virtual interventions outcomes, sorted by number of target points.
- Parameters:
df_restore (pandas.DataFrame) – Virtual intervention simulation outputs with distance and efficacy precomputed.
args (argparse.Namespace) – Extra arguments with options. An important option here is args.n_tops, which defines how many of the best interventions to display per number of target points.
gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.
AUC measures for each virtual intervention (via the plot_distance_restore function).
We also show the log-linear relationship between the number of targets and the efficacy of the five best interventions for each number of targets.
- OCD_modeling.mcmc.plot_efficacy_by_number_of_target(df_top, gs=None, args=None)[source]
Swarm plots of efficacy score (y-axis) by number of targets (x-axis), with means projected in log-linear scale.
- Parameters:
df_top (pandas.DataFrame) – Subset of df_restore with only the virtual interventions resulting in positive outcomes.
gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.
args (argparse.Namespace) – Extra arguments with options.
Log-linear relationship between virtual intervention AUC and number of intervention targets \(n_t\).
Parameters contribution measure
We want to relate the contribution of changes in each parameter \(\theta\) to the interventions’ efficacy (i.e. the AUC value) using only statistically significant AUC, i.e. \(p_{FWE}<0.05\).
First, to give an intuition for the final measure, let’s visualize the Pearson correlation between the change \(\Delta\) in model parameters (z-score normalized, \(\theta^z\)) and the change in frontostriatal functional connectivity \(\Delta FC\).
- OCD_modeling.mcmc.plot_five_top_params_distance_correlations(df_top, args)[source]
Plot intermediate visualization of parameter change correlation to change of distance in FC space.
- Parameters:
df_top (pandas.DataFrame) – Five best parameter combinations (i.e. virtual interventions) per number of targets (number of targets \(n_t=1 \cdots 6\)).
args (argparse.Namespace) – Optional arguments.
Associations between parameter values and intervention efficacies. Z-score normalized parameter values (\(\theta^z\), where \(\theta\) represents the parameter name; x-axis) against intervention efficacy (difference in distance to healthy controls in functional connectivity space, \(\Delta FC\); y-axis). Only the best intervention (highest AUC) is shown for each number of targets \(n_t\).
The metric actually used is the dot product between the two variables, which essentially summarizes the correlation measure across all significantly positive interventions (per number of targets \(n_t\)), not only the best one as above. Because the variables are not zero-centered before applying the dot product, the output of this measure carries the sign of the intervention for each target (increase vs. decrease).
- OCD_modeling.mcmc.compute_scaled_feature_score(df_top, params, kdes, scaling='dot_product_correlation', args=None)[source]
Compute feature scores (i.e. parameter contribution) as the dot-product between their normalized location on the KDEs distribution (using z-statistic) and their resulting efficacy (AUC).
- Parameters:
df_top (pandas.DataFrame) – Subset of df_restore with significantly positive virtual interventions
params (list) – Individual intervention targets (i.e. model parameters).
kdes (dict) – Kernel Density Estimates of posterior distributions of OCD subjects and healthy controls.
scaling (string) – How to scale the efficacy of the virtual intervention by the z-score normalized parameter. “dot_product_correlation” (default) multiplies the normalized parameter by the efficacy (AUC). Other values can be “pearson_correlation”, “spearman_correlation” and “covariance_correlation” but those measures distort the results and interpretation due to the mean-centering of variables. Other legacy values are “contribution” (same as dot-product) and “sensitivity” which divides the normalized parameter by the AUC efficacy (giving a sense of “sensitivity” of the parameter).
args (argparse.Namespace) – (optional) Extra arguments with options.
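The difference between the uncentered dot product and a mean-centered correlation can be seen on a toy example. The sketch below is illustrative only: the helper names are hypothetical, the z-score is assumed to be taken relative to the OCD posterior samples, and all values are made up. Every intervention here decreases the parameter relative to the OCD posterior, so the dot product is negative (a decrease), whereas the Pearson correlation only reflects how the parameter co-varies with efficacy around their means and loses that direction.

```python
import math
import statistics

def z_score(value, ocd_samples):
    """Normalize a parameter value w.r.t. the OCD posterior samples (assumed)."""
    return (value - statistics.mean(ocd_samples)) / statistics.stdev(ocd_samples)

def dot_product_score(theta_z, efficacy):
    """Uncentered dot product between normalized parameters and efficacies."""
    return sum(t * e for t, e in zip(theta_z, efficacy))

def pearson(x, y):
    """Pearson correlation (mean-centered), for comparison."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical OCD posterior samples for one parameter (mean 0.5, std 0.1).
ocd_samples = [0.4, 0.5, 0.6]
# Hypothetical parameter values in four significant interventions, with AUCs.
theta = [0.40, 0.35, 0.30, 0.25]
efficacy = [0.90, 0.80, 0.70, 0.60]

theta_z = [z_score(t, ocd_samples) for t in theta]
score = dot_product_score(theta_z, efficacy)
r = pearson(theta_z, efficacy)
print(score < 0, r > 0)  # True True
```

The dot product is negative because all interventions lowered the parameter, while the Pearson correlation is positive because, within this set, the less-decreased values happen to have higher AUC.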
- OCD_modeling.mcmc.plot_parameters_contribution(df_params_contribution, params, gs=None, args=None)[source]
Polar plots of parameter contributions across virtual interventions, color-coded by number of intervention targets \(n_t\). Each polar plot corresponds to one number of targets \(n_t\).
- Parameters:
df_params_contribution (pandas.DataFrame) – Contribution of parameters (z-score normalized parameters times efficacies of virtual interventions).
params (list) – Individual intervention targets (i.e. model parameters).
gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.
args (argparse.Namespace) – (optional) Extra arguments with options.
Contribution measures for each target across all significantly positive interventions, color-coded by number of targets \(n_t\).