OCD_modeling.mcmc

OCD_modeling.mcmc.compute_distance_restore(df_sims, args)[source]

Compare base simulations (from OCD subjects and healthy controls) to simulated interventions by calculating distances in parallel.

Parameters:
  • df_sims (pandas.DataFrame) – Simulation outputs loaded from the database.

  • args (argparse.Namespace) – Extra arguments with options.

Returns:

df_restore – Copy of df_sims with efficacy measures (including cohort first-level statistics).

Return type:

pandas.DataFrame

OCD_modeling.mcmc.compute_distance_restore_sims(df_base, df_sims, args)[source]

Compute distance metric between batches of simulations in controls, patients and continuous restoration from patients to controls.

Parameters:
  • df_base (pandas.DataFrame) – Simulated data from virtual OCD subjects (patients) and healthy controls. No virtual interventions were performed. Corresponds to simulation using OCD and healthy posteriors.

  • df_sims (pandas.DataFrame) – Simulation outputs loaded from the virtual interventions database.

  • args (argparse.Namespace) – Extra arguments with options.

Returns:

outputs – Distances between simulated interventions and healthy control cohorts, with parameter values normalized with respect to the original OCD parameter posterior distribution.

Return type:

list of dictionaries

OCD_modeling.mcmc.compute_distances(df_data, df_sims, ses, args)[source]

Compute Euclidean distances between single empirical functional connectivity (FC) and simulated FC.

Parameters:
  • df_data (pandas.DataFrame) – Empirical FC.

  • df_sims (pandas.DataFrame) – Simulated FC.

  • ses (string) – Session (i.e. point in time): initial (“ses-pre”) or follow-up (“ses-post”) appointment.

  • args (argparse.Namespace) – Extra arguments with options.

Returns:

assoc – Unique pairing between individual OCD subjects and simulated subjects (digital twins).

Return type:

dict
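
The pairing described above can be sketched in plain Python. This is only an illustration under stated assumptions: the greedy closest-first matching, the helper names `euclidean` and `pair_subjects`, and the toy FC vectors are all hypothetical, and the package may build its unique pairing differently.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length FC vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pair_subjects(empirical, simulated):
    """Greedily give each subject its closest simulation that is still unused.

    Both arguments map an identifier to an FC vector. Greedy matching is an
    assumption of this sketch, not necessarily the package's algorithm.
    """
    # Enumerate every (subject, simulation) pair, closest pairs first.
    candidates = sorted(
        (euclidean(fc, sim_fc), subj, sim_id)
        for subj, fc in empirical.items()
        for sim_id, sim_fc in simulated.items()
    )
    assoc, used = {}, set()
    for dist, subj, sim_id in candidates:
        if subj not in assoc and sim_id not in used:
            assoc[subj] = {"sim_id": sim_id, "distance": dist}
            used.add(sim_id)
    return assoc


# Toy data: two subjects, three candidate simulations (hypothetical values).
empirical = {"sub-01": [0.2, 0.5], "sub-02": [0.6, 0.1]}
simulated = {0: [0.6, 0.1], 1: [0.25, 0.5], 2: [0.9, 0.9]}
assoc = pair_subjects(empirical, simulated)
```

Each simulation is used at most once, so the resulting dictionary is a one-to-one pairing between subjects and simulated "twins".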

OCD_modeling.mcmc.compute_efficacy(df_restore, args=None)[source]

Add new column to input DataFrame with treatment efficacy based on distance to healthy controls in functional connectivity space.

Note

Simulation metric is the Wasserstein distance and the data metric is the Euclidean distance. Comparison in FC space is automatically performed using the correct metric (default: Wasserstein).

Parameters:
  • df_restore (pandas.DataFrame) – Virtual intervention simulation outputs with distance precomputed.

  • args (argparse.Namespace) – Extra arguments with options. The important option in this function is args.efficacy_base, which determines how treatment efficacy is computed (the option retained was “ustat”).

Returns:

df_restore – Input DataFrame with new column ‘efficacy’.

Return type:

pandas.DataFrame
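
To make the simulation metric in the note concrete, here is a minimal sketch of the one-dimensional Wasserstein distance between two equal-size samples. The function name `wasserstein_1d`, the toy values, and the `relative_gain` quantity are assumptions for illustration; the package's efficacy is governed by args.efficacy_base and is not reproduced here.

```python
def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size 1-D samples.

    For equal-size samples this is the mean absolute difference between
    the sorted values.
    """
    if len(x) != len(y):
        raise ValueError("this sketch only handles equal-size samples")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)


# Hypothetical FC values for one pathway in three simulated cohorts.
controls = [0.30, 0.35, 0.40, 0.45]
patients = [0.10, 0.15, 0.20, 0.25]
treated = [0.25, 0.30, 0.35, 0.40]

d_base = wasserstein_1d(patients, controls)
d_treated = wasserstein_1d(treated, controls)

# Illustrative only: fraction of the patient-to-control distance removed by
# the virtual intervention. This is NOT the "ustat" efficacy of the package.
relative_gain = (d_base - d_treated) / d_base
```

A smaller distance to the control cohort after the intervention translates into a larger gain, which is the intuition behind an efficacy defined in FC space.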

OCD_modeling.mcmc.compute_errors(df_sims, df_data, args)[source]

Compute root-mean-square errors between empirical and simulated data in frontostriatal pathways functional connectivity.

This error takes the form of a distribution: for example, 1000 simulations grouped into cohorts of 50 yield 20 error values.

Parameters:
  • df_sims (pandas.DataFrame) – Simulated dataset.

  • df_data (pandas.DataFrame) – Empirical dataset.

  • args (argparse.Namespace) – Optional arguments.

Returns:

Distribution of errors in functional connectivity space.

Return type:

pandas.DataFrame
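
The cohort grouping described above can be sketched as follows. This is a rough illustration, assuming the error compares each cohort's mean FC per pathway with the empirical mean; the function name `cohort_rmse` and the synthetic data are hypothetical.

```python
import math
import random


def cohort_rmse(sim_fc, data_fc, cohort_size=50):
    """Return one RMSE value per cohort of simulations.

    sim_fc:  list of per-simulation FC vectors (one value per pathway).
    data_fc: empirical mean FC per pathway.
    """
    n_paths = len(data_fc)
    errors = []
    # Walk through the simulations in consecutive, non-overlapping cohorts.
    for start in range(0, len(sim_fc) - cohort_size + 1, cohort_size):
        cohort = sim_fc[start:start + cohort_size]
        cohort_mean = [sum(s[p] for s in cohort) / len(cohort) for p in range(n_paths)]
        mse = sum((m - d) ** 2 for m, d in zip(cohort_mean, data_fc)) / n_paths
        errors.append(math.sqrt(mse))
    return errors


# Synthetic example: 1000 noisy simulations around a made-up empirical FC.
rng = random.Random(0)
data_fc = [0.3, 0.1, 0.4, 0.2]
sim_fc = [[d + rng.gauss(0, 0.1) for d in data_fc] for _ in range(1000)]
errors = cohort_rmse(sim_fc, data_fc)
```

With 1000 simulations and cohorts of 50, `errors` holds 20 values, which is the distribution the function description refers to.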

OCD_modeling.mcmc.compute_kdes(histories, n_pts=100, args=None)[source]

Computes Kernel Density Estimates (KDEs) of the posterior distributions of parameters.

Parameters:
  • histories (dict) – Nested dictionary of SQLAlchemy history objects.

  • n_pts (int) – Number of points used to estimate the probability density functions (PDFs).

  • args (argparse.Namespace) – Extra options.

Returns:

  • kdes (dict) – Nested dictionary of KDEs and associated PDFs.

  • cols (list) – List of parameters for which the KDEs were estimated.
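
As a rough picture of what a KDE evaluated on n_pts points looks like, the sketch below builds a Gaussian KDE by hand and stores it in a nested kdes[group][param] dictionary. The bandwidth rule, the helper name `gaussian_kde_pdf`, and the synthetic posterior samples are assumptions; the package may rely on a library KDE instead.

```python
import math
import random
import statistics


def gaussian_kde_pdf(samples, n_pts=100):
    """Evaluate a Gaussian KDE of `samples` on an evenly spaced grid."""
    n = len(samples)
    # Normal-reference rule-of-thumb bandwidth (an assumption of this sketch).
    bw = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    lo, hi = min(samples) - 3 * bw, max(samples) + 3 * bw
    step = (hi - lo) / (n_pts - 1)
    grid = [lo + i * step for i in range(n_pts)]
    norm = 1.0 / (n * bw * math.sqrt(2 * math.pi))
    pdf = [
        norm * sum(math.exp(-0.5 * ((x - s) / bw) ** 2) for s in samples)
        for x in grid
    ]
    return grid, pdf


# Synthetic posterior samples for one parameter in each group.
rng = random.Random(1)
posteriors = {
    "controls": {"C_12": [rng.gauss(0.2, 0.05) for _ in range(200)]},
    "patients": {"C_12": [rng.gauss(0.3, 0.05) for _ in range(200)]},
}
kdes = {
    group: {param: gaussian_kde_pdf(values) for param, values in params.items()}
    for group, params in posteriors.items()
}
cols = sorted(posteriors["controls"])
```

Each entry of `kdes` holds the grid and the estimated density, so the area under the curve (sum of the PDF times the grid step) is close to one.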

OCD_modeling.mcmc.compute_scaled_feature_score(df_top, params, kdes, scaling='dot_product_correlation', args=None)[source]

Compute feature scores (i.e. parameter contribution) as the dot-product between their normalized location on the KDEs distribution (using z-statistic) and their resulting efficacy (AUC).

Parameters:
  • df_top (pandas.DataFrame) – Subset of df_restore with significantly positive virtual interventions

  • params (list) – Individual intervention targets (i.e. model parameters).

  • kdes (dict) – Kernel Density Estimates of posterior distributions of OCD subjects and healthy controls.

  • scaling (string) – How to scale the efficacy of the virtual intervention by the z-score normalized parameter. “dot_product_correlation” (default) multiplies the normalized parameter by the efficacy (AUC). Other values can be “pearson_correlation”, “spearman_correlation” and “covariance_correlation” but those measures distort the results and interpretation due to the mean-centering of variables. Other legacy values are “contribution” (same as dot-product) and “sensitivity” which divides the normalized parameter by the AUC efficacy (giving a sense of “sensitivity” of the parameter).

  • args (argparse.Namespace) – (optional) Extra arguments with options.
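
The default "dot_product_correlation" scaling can be illustrated with a few lines of plain Python. This is a sketch under assumptions: the z-score is taken here with respect to a reference sample standing in for the OCD posterior, and the function name and numbers are hypothetical.

```python
import statistics


def dot_product_score(values, efficacies, ref_samples):
    """Dot product between z-scored parameter values and efficacies.

    values:      parameter value used in each virtual intervention.
    efficacies:  efficacy (AUC) of each virtual intervention.
    ref_samples: reference posterior samples used for the z-score
                 (assumed here to be the OCD posterior).
    """
    mu = statistics.mean(ref_samples)
    sigma = statistics.stdev(ref_samples)
    z = [(v - mu) / sigma for v in values]
    return sum(zi * e for zi, e in zip(z, efficacies))


# Toy numbers: z-scores are approximately [1, -1, 0].
ref_samples = [0.1, 0.2, 0.3]
values = [0.3, 0.1, 0.2]
efficacies = [0.5, 0.2, 0.9]
score = dot_product_score(values, efficacies, ref_samples)
```

Because nothing is mean-centered across interventions, an intervention with high efficacy but a parameter value at the reference mean contributes nothing to the score, which is the behavior the correlation-based alternatives would alter.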

OCD_modeling.mcmc.compute_stats(histories, args=None)[source]

Computes the statistics of the optimization outcome, i.e. tests the posterior distributions (parameters) between controls and patients.

Parameters:
  • histories (list) – Controls and patient pyABC history objects.

  • args (argparse.Namespace) – Extra options

Returns:

df_stats – Statistics.

Return type:

pandas.DataFrame

OCD_modeling.mcmc.create_df_null(multivar, multivar_null)[source]

Create a pandas DataFrame of regression coefficients with respect to the null distribution.

OCD_modeling.mcmc.create_params(kdes, cols, test_param, args)[source]

Create n_sims new parameters from the posterior distribution of the base cohort.

OCD_modeling.mcmc.cross_validation(model, X, y, args)[source]

Run cross validation using CV type given in argument

OCD_modeling.mcmc.evaluate_prediction(history, n_samples=1000)[source]

Re-run model for the best scored parameters (highest posteriors)

OCD_modeling.mcmc.fix_df_base(df_base)[source]

Fix duplicate entries in df_base.

OCD_modeling.mcmc.get_default_params(N=4)[source]

Create a structure with default model, simulation, control and BOLD parameters.

Parameters:

N (int) – Number of regions to model

Returns:

  • default_model_params (dict) – Model parameters.

  • default_sim_params (dict) – Simulation parameters.

  • default_control_params (dict) – Control parameters.

  • default_bold_params (dict) – BOLD parameters.

OCD_modeling.mcmc.get_df_base(args)[source]

Import simulations from inferred parameters for controls and patients, without restoration.

OCD_modeling.mcmc.get_history_parser()[source]

Script arguments when run as main.

OCD_modeling.mcmc.get_prior()[source]

Create uniform prior distributions of parameters to start Sequential Monte-Carlo.

Returns:

  • prior (pyABC.Distribution) – Distribution object of priors.

  • bounds (dict) – Lower and upper bounds of parameters, i.e. bounds[‘param_name’]=[min,max].
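
The bounds structure lends itself to a short illustration. The sketch below draws one particle from independent uniform priors using only the standard library; the bound values and the helper name `sample_uniform_prior` are hypothetical, and the real function returns a pyABC distribution object rather than raw samples.

```python
import random

# Hypothetical bounds, following the bounds['param_name'] = [min, max] layout.
bounds = {"C_12": [-1.0, 1.0], "C_13": [0.0, 0.5]}


def sample_uniform_prior(bounds, rng):
    """Draw one parameter set from independent uniform priors."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}


rng = random.Random(0)
particles = [sample_uniform_prior(bounds, rng) for _ in range(100)]
```

One point to keep in mind when translating such bounds to a SciPy-style uniform variable, which pyABC's random variables wrap: the second argument is a width (max minus min), not the upper bound.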

OCD_modeling.mcmc.import_results(args)[source]

Read optimization results from DB

OCD_modeling.mcmc.launch_sims_parallel(kdes, cols, test_param, args=None)[source]

Run batched simulations from posterior inference in parallel.

Parameters:
  • kdes (dict) – Kernel Density Estimates from optimization (structured as kdes[group][param])

  • cols (list) – Parameters (columns) which draw samples from KDEs (otherwise default values are used)

  • test_param (list) – Parameters for which the posterior is permuted for the virtual interventions.

  • args (argparse.Namespace) – Extra arguments with options.

Return type:

None. Outputs of the simulations are written to the local SQLite database.
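
The roles of cols and test_param can be sketched as below. This is one possible reading, stated as an assumption: parameters listed in test_param are drawn from the controls posterior while the remaining ones stay on the patients posterior. Lists of posterior samples stand in for the KDEs, and the function name and values are hypothetical.

```python
import random


def draw_intervention_params(posteriors, cols, test_param, n_sims, rng):
    """Draw n_sims parameter sets for one virtual intervention.

    posteriors[group][param] is a list of posterior samples, used here as a
    stand-in for sampling from kdes[group][param].
    """
    batch = []
    for _ in range(n_sims):
        draw = {}
        for col in cols:
            # Assumption: targeted parameters are swapped to the controls posterior.
            group = "controls" if col in test_param else "patients"
            draw[col] = rng.choice(posteriors[group][col])
        batch.append(draw)
    return batch


posteriors = {
    "controls": {"C_12": [0.1, 0.2], "C_13": [1.0, 1.1]},
    "patients": {"C_12": [0.8, 0.9], "C_13": [2.0, 2.1]},
}
rng = random.Random(0)
batch = draw_intervention_params(
    posteriors, cols=["C_12", "C_13"], test_param=["C_12"], n_sims=5, rng=rng
)
```

Each dictionary in `batch` would then be handed to one simulation; running them in parallel and writing to the database is left out of this sketch.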

OCD_modeling.mcmc.load_df_data(args)[source]

Load clinical FC data into a pandas DataFrame.

OCD_modeling.mcmc.load_empirical_data()[source]

Load empirical dataset that we optimized against.

Returns:

df_data – Empirical dataset with subjects and functional connectivity (Pearson’s correlation) in each frontostriatal pathway.

Return type:

pandas.DataFrame

OCD_modeling.mcmc.load_simulations(args)[source]

Load simulated inferences.

Parameters:

args (argparse.Namespace) – Dictionary-like data structure containing the args.db_names argument, which was entered on the command line as a list of simulated datasets.

Returns:

out – List of pandas.DataFrame, each containing n simulations with functional connectivity in each frontostriatal pathway. (default n=1000)

Return type:

list
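
A list-valued option such as args.db_names is usually declared with `nargs="+"` in argparse. The snippet below is a minimal sketch: the `--db_names` flag spelling and the database names are assumptions, and only the attribute name comes from the description above.

```python
import argparse

parser = argparse.ArgumentParser(description="Load simulated datasets (sketch).")
# nargs="+" collects one or more values into a list.
parser.add_argument("--db_names", nargs="+", default=[],
                    help="names of the simulated datasets to load")

# Equivalent to: python script.py --db_names sim_controls sim_patients
args = parser.parse_args(["--db_names", "sim_controls", "sim_patients"])

for db_name in args.db_names:
    print(f"would load simulations from {db_name}")
```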

OCD_modeling.mcmc.multivariate_analysis(df_sim_pat, params=['C_12', 'C_13', 'C_24', 'C_31', 'C_34', 'C_42'], behavs=['YBOCS_Total', 'OCIR_Total', 'OBQ_Total', 'MADRS_Total', 'HAMA_Total', 'Anx_total', 'Dep_Total'], models={'Ridge': Ridge(alpha=0.01)}, null=False, args=None)[source]

Multivariate regression of parameters to predict behaviors

OCD_modeling.mcmc.plot_cv_regression(multivar, df_sim_pat, args=None)[source]

Scatter plots of cross-validated regression

OCD_modeling.mcmc.plot_distance_restore(df_restore, args, gs=None)[source]

Plot horizontal box plot of best virtual interventions outcomes, sorted by number of target points.

Parameters:
  • df_restore (pandas.DataFrame) – Virtual intervention simulation outputs with distance and efficacy precomputed.

  • args (argparse.Namespace) – Extra arguments with options. An important option here is args.n_tops, which defines how many of the best interventions to display per number of target points.

  • gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.

OCD_modeling.mcmc.plot_efficacy_by_number_of_target(df_top, gs=None, args=None)[source]

Swarm plots of efficacy score (y-axis) by number of targets (x-axis), with means projected in log-linear scale.

Parameters:
  • df_top (pandas.DataFrame) – Subset of df_restore with only virtual interventions resulting in positive outcomes.

  • gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.

  • args (argparse.Namespace) – Extra arguments with options.

OCD_modeling.mcmc.plot_epsilons(histories, ax=None, args=None)[source]

Plot evolution of epsilons across generations

Parameters:
  • histories (list) – Optimization pyABC history objects (e.g. controls and patients)

  • ax (matplotlib.Axes) – (optional) Axis to draw the figure.

  • args (argparse.Namespace) – (optional) Extra arguments (for saving, etc).

OCD_modeling.mcmc.plot_errors(df_errors, args)[source]

Box plot of inference error on frontostriatal functional connectivity

Parameters:
  • df_errors (pandas.DataFrame) – Distributions of errors for each models to compare.

  • args (argparse.Namespace) – Optional arguments, including args.model_tags that contains labels of model names for saving.

OCD_modeling.mcmc.plot_fc_sim_vs_data(df_data, df_base, stats, axes=None, args=None)[source]

Plot functional connectivity (Pearson correlation) across the frontostriatal regions of interest in empirical and simulated data.

Parameters:
  • df_data (pandas.DataFrame) – Empirical data extracted from fMRI in OCD subjects and healthy controls.

  • df_base (pandas.DataFrame) – Simulated data using parameters inferred from posterior distributions (either only OCD parameters or only controls, no permutation to model virtual intervention).

  • stats (dict) – (deprecated) Statistics within and between cohorts for both empirical and simulated data (deprecated since now stats are directly computed within plotting function).

  • axes (list of matplotlib.Axes) – (optional) List of axes to plot data (if embedded in other figure, otherwise create new figure).

  • args (argparse.Namespace) – (optional) Extra arguments (e.g. for saving, etc).

OCD_modeling.mcmc.plot_five_top_params_distance_correlations(df_top, args)[source]

Plot intermediate visualization of parameter change correlation to change of distance in FC space.

Parameters:
  • df_top (pandas.DataFrame) – Five best parameter combinations (i.e. virtual interventions) per number of targets (number of targets \(n_t=1 \cdots 6\)).

  • args (argparse.Namespace) – Optional arguments.

OCD_modeling.mcmc.plot_improvement_pre_post_params_paper(df_summary, params, gs=None, args=None)[source]

Plot initial (pre) and follow-up (post) distributions of parameters from digital twin analysis (only relevant parameters are shown for the manuscript).

Parameters:
  • df_summary (pandas.DataFrame) – Summary measures of empirical analysis, including subject’s functional connectivity across frontostriatal circuits, behavioral measures (e.g. Y-BOCS score, IQ, etc.), digital pairing and values of the digital twin parameters.

  • params (list) – Individual intervention targets (i.e. model parameters).

  • gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.

  • args (argparse.Namespace) – Extra arguments with options.

OCD_modeling.mcmc.plot_improvement_windrose(df_improvement, params, gs=None, args=None)[source]

Make windrose visualization of mean improvement in parameter space.

Parameters:
  • df_improvement (pandas.DataFrame) – Z-score normalized differences between initial (pre) and follow-up (post) parameters of digital twins for number of targets in virtual interventions.

  • params (list) – Individual intervention targets (i.e. model parameters).

  • gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.

  • args (argparse.Namespace) – Extra arguments with options.

OCD_modeling.mcmc.plot_kdes(kdes, cols, df_stats, df_real=[], df_pred=[], plot_args={'col_offset': 3, 'figsize': [10, 7], 'hist_alpha': 0.3, 'kde_alpha': 1, 'ncols': 5, 'nrows': 4, 'row_offset': 2, 'show_stars': True}, args=None)[source]

Plot Kernel Density Estimates of posteriors (controls vs OCD)

Parameters:
  • kdes (dict) – Kernel Density Estimates of parameters

  • cols (list) – Model parameters

  • df_stats (pandas.DataFrame) – Statistics for each parameter (healthy controls vs OCD patients)

  • df_real (pandas.DataFrame) – (Optional) Synthetic data (observed)

  • df_pred (pandas.DataFrame) – (Optional) Synthetic data (predicted)

  • plot_args (dict) –

    Default options for plotting.

    nrows, ncols: number of rows and columns of the GridSpec object (grid of axes).
    row_offset, col_offset: shifts in rows and columns to leave space for another plot.
    figsize: figure size.
    show_stars: show stars for statistically significant differences between controls and OCD.
    hist_alpha, kde_alpha: opacity of the histograms and kernel density estimates.

  • args (argparse.Namespace) – (Optional) Extra arguments

OCD_modeling.mcmc.plot_multivariate_results(multivar, models=['Ridge'], args=None)[source]

Display results of the multivariate analysis

OCD_modeling.mcmc.plot_null_distrib(multivar, args=None)[source]

Plot null distributions of regression coefficients and stats.

OCD_modeling.mcmc.plot_param_behav(df_sim_pat, params=['C_12', 'C_13', 'C_24', 'C_31', 'C_34', 'C_42'], behavs=['YBOCS_Total', 'OCIR_Total', 'OBQ_Total', 'MADRS_Total', 'HAMA_Total'], args=None)[source]

Plot association between simulation parameters and behavioral/clinical measures

OCD_modeling.mcmc.plot_param_distrib(histories, args)[source]

Plot posterior distribution at the end of optimization.

OCD_modeling.mcmc.plot_parameters_contribution(df_params_contribution, params, gs=None, args=None)[source]

Polar plots of parameter contributions across virtual interventions, color-coded by number of intervention targets \(n_t\). Each polar plot corresponds to a number of targets \(n_t\).

Parameters:
  • df_params_contribution (pandas.DataFrame) – Contribution of parameters (z-score normalized parameters times efficacies of virtual interventions).

  • params (list) – Individual intervention targets (i.e. model parameters).

  • gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.

  • args (argparse.Namespace) – (optional) Extra arguments with options.

OCD_modeling.mcmc.plot_pre_post_dist_ybocs(df_summary, behav='YBOCS_Total', gs=None, args=None)[source]

Plot improvement in behavioral measure of symptoms severity (Y-BOCS) of subjects, and their association to functional improvement (via their distance to healthy functional connectivity).

Parameters:
  • df_summary (pandas.DataFrame) – Summary measures of empirical analysis, including subject’s functional connectivity across frontostriatal circuits, behavioral measures (e.g. Y-BOCS score, IQ, etc.), digital pairing and values of the digital twin parameters.

  • gs (matplotlib.GridSpec) – (optional) A GridSpec object that can be used to embed axes when this figure is a subplot of a larger figure.

  • args (argparse.Namespace) – Extra arguments with options.

OCD_modeling.mcmc.plot_single_contribution_windrose(df, params, theta, palette, ax)[source]

A single windrose of parameters’ contribution.

Parameters:
  • df (pandas.DataFrame) – Contribution data.

  • theta (list) – Angles at which to put polar bars.

  • palette (dict) – Color palette used by matplotlib.

  • ax (matplotlib.Axes) – Axes to plot the windrose (must be polar type).

OCD_modeling.mcmc.plot_weights(histories, gs=None, nrows=None, ncols=None, row_offset=None, col_offset=None, args=None)[source]

Plot evolution of weights across generations

Parameters:
  • histories (list) – Optimization pyABC history objects (e.g. controls and patients)

  • gs (matplotlib.GridSpec) – (optional) GridSpec object to draw the figure into.

  • nrows (int) – Number of rows of the grid.

  • ncols (int) – Number of columns of the grid.

  • row_offset (int) – Row offset, for when some (top-left) entries of the grid should be left empty.

  • col_offset (int) – Column offset, for when some (top-left) entries of the grid should be left empty.

  • args (argparse.Namespace) – (optional) Extra arguments (e.g. for saving, etc).

OCD_modeling.mcmc.print_ANOVA(df_sim_pat, behavs, params)[source]

Print stats for mixed and one-way ANOVAs

OCD_modeling.mcmc.run_abc(prior, cfg)[source]

Setup and run the Approximate Bayesian Computation.

Parameters:
  • prior (pyABC.Distribution) – Prior distributions of parameters.

  • cfg (argparse.Namespace) – Server configuration information.

Returns:

history – Output of the optimization, i.e. population parameters of accepted particles and summary statistics.

Return type:

pyABC.History
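
For readers unfamiliar with Approximate Bayesian Computation, the toy below shows the core accept/reject idea with the standard library only. It is plain rejection ABC on a made-up one-parameter model, a deliberately simpler technique than the sequential Monte Carlo scheme run through pyABC; the names `simulate` and `abc_rejection` and all values are hypothetical.

```python
import random


def simulate(theta, rng, n=50):
    """Toy simulator: mean of n noisy observations centered on theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def abc_rejection(observed, bounds, epsilon, n_accept, rng):
    """Keep prior draws whose simulated summary lies within epsilon of the data."""
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(*bounds)  # draw from a uniform prior
        if abs(simulate(theta, rng) - observed) < epsilon:
            accepted.append(theta)  # accepted particle
    return accepted


rng = random.Random(42)
posterior = abc_rejection(observed=2.0, bounds=(-5.0, 5.0),
                          epsilon=0.2, n_accept=100, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

The accepted particles approximate the posterior of the parameter. Sequential schemes refine this by shrinking epsilon over successive generations and reweighting particles.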

OCD_modeling.mcmc.score_improvement(df, params, kdes, behav='YBOCS_Total')[source]

Score parameters based on improvement they induce in functional connectivity (FC) space across virtual interventions.

Parameters:
  • df (pandas.DataFrame) – Summary data (empirical data paired with digital twins)

  • params (list) – Individual intervention targets (i.e. model parameters).

  • kdes (dict) – Kernel Density Estimates of posterior distributions of OCD subjects and healthy controls.

  • behav (string) – Behavioral measure. Default: Y-BOCS score.

Returns:

df_improvement – Normalized differences between initial (pre) and follow-up (post) parameters of digital twins for each number of targets in virtual interventions.

Return type:

pandas.DataFrame

OCD_modeling.mcmc.simulate_population_rww(params)[source]

Run a pool of simulations and score their outputs.

As a design choice, the number of simulations per pool and the number of processes used in parallel are hard-coded here in an argparse.Namespace object, which is propagated to the launcher.

Parameters:

params (dict) – Set of parameters used to instantiate the model.

Returns:

RMSE – Root Mean Square Error between the simulated pool (i.e. a “population cohort”) and the real data (either “controls” or “patients”).

Return type:

dict

OCD_modeling.mcmc.simulate_rww(params)[source]

Instantiate model and simulate trace.

OCD_modeling.mcmc.unpack_params(in_params)[source]

Unpack parameters to be given to model and simulation.

Any type of input parameter can be given; each is unpacked and attributed to the correct parameter dictionary. If a parameter is not recognized, a warning is displayed and the input parameter is ignored.

Parameters:

in_params (dict) – Input parameters.

Returns:

  • model_params (dict) – Model parameters.

  • sim_params (dict) – Simulations parameters.

  • control_params (dict) – Control parameters.

  • bold_params (dict) – BOLD parameters.
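
The routing behavior described above can be sketched as follows. The default values, most key names, and the helper name `unpack` are hypothetical stand-ins; only the four output dictionaries and the warn-and-ignore behavior come from the description.

```python
import warnings

# Hypothetical defaults; the real ones come from get_default_params().
DEFAULTS = {
    "model": {"C_12": 0.0, "C_13": 0.0},
    "sim": {"t_tot": 1000, "dt": 0.01},
    "control": {"t_start": 0},
    "bold": {"t_range": [500, 1000]},
}


def unpack(in_params, defaults=DEFAULTS):
    """Route each input parameter to the dictionary that defines it."""
    out = {group: dict(values) for group, values in defaults.items()}
    for name, value in in_params.items():
        for group in out:
            if name in defaults[group]:
                out[group][name] = value
                break
        else:
            # No dictionary knows this name: warn and ignore it.
            warnings.warn(f"Unrecognized parameter {name!r} ignored")
    return out["model"], out["sim"], out["control"], out["bold"]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    model_params, sim_params, control_params, bold_params = unpack(
        {"C_12": 0.4, "dt": 0.005, "typo": 1}
    )
```

Here `C_12` lands in the model parameters, `dt` in the simulation parameters, and `typo` only triggers a warning.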

OCD_modeling.mcmc.write_outputs_to_db(params, cols, test_param, outputs, paired_ids, args)[source]

Write output of simulation inference into SQLite database