KGBN.experiment_data

The ExperimentData class loads and validates experimental data from CSV files. It handles stimuli, inhibitors, and measured values for model optimization.

class KGBN.experiment_data.ExperimentData

Bases: object

Handle experimental data from CSV files

Methods

get_experiment_summary(experiments)

Get summary of experiments for debugging/inspection

load_from_csv(csv_file[, use_formula])

Load experiments from CSV file

validate_experiments(experiments, node_dict)

Validate experiments against PBN structure

static load_from_csv(csv_file, use_formula: bool = False)

Load experiments from CSV file

Only required columns are used. Extra columns in the CSV are ignored. If use_formula is True, the Measured_nodes column should contain the formula string, and Measured_values must have a single value per row.

CSV Format:

Experiments,Stimuli,Stimuli_efficacy,Inhibitors,Inhibitors_efficacy,Measured_nodes,Measured_values
1,TGFa,1,TNFa,1,"NFkB,ERK,C8,Akt","0.7,0.88,0,1"
2,TNFa,1,TGFa,1,"NFkB,ERK,C8,Akt","0.3,0.12,1,0"
3,"TGFa,TNFa","1,1",,,"NFkB,ERK,C8,Akt","1,1,1,1"
4,"TGFa,TNFa","1,1",PI3K,0.7,"NFkB,ERK,C8,Akt","0.3,0.12,1,0"
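To make the column semantics concrete, here is a minimal sketch of how rows in this format could be parsed with the standard csv module. The names parse_experiments, split_cell, and efficacies, and the dict layout they return, are hypothetical and are not the ExperimentData internals; the sketch only assumes what the docstring states (comma-separated cells, optional efficacy columns defaulting to 1.0, extra columns ignored).

```python
import csv
import io

# Example rows copied from the CSV format above.
CSV_TEXT = """Experiments,Stimuli,Stimuli_efficacy,Inhibitors,Inhibitors_efficacy,Measured_nodes,Measured_values
1,TGFa,1,TNFa,1,"NFkB,ERK,C8,Akt","0.7,0.88,0,1"
2,TNFa,1,TGFa,1,"NFkB,ERK,C8,Akt","0.3,0.12,1,0"
3,"TGFa,TNFa","1,1",,,"NFkB,ERK,C8,Akt","1,1,1,1"
4,"TGFa,TNFa","1,1",PI3K,0.7,"NFkB,ERK,C8,Akt","0.3,0.12,1,0"
"""


def split_cell(cell):
    """Split a comma-separated cell into stripped, non-empty items."""
    return [item.strip() for item in (cell or "").split(",") if item.strip()]


def efficacies(cell, nodes):
    """Parse an efficacy cell; an empty or missing cell means 1.0 for every node."""
    values = [float(v) for v in split_cell(cell)]
    return values if values else [1.0] * len(nodes)


def parse_experiments(csv_text):
    """Hypothetical parser: one dict per row; columns not read here are ignored."""
    experiments = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        stimuli = split_cell(row.get("Stimuli"))
        inhibitors = split_cell(row.get("Inhibitors"))
        nodes = split_cell(row.get("Measured_nodes"))
        values = [float(v) for v in split_cell(row.get("Measured_values"))]
        experiments.append({
            "id": row["Experiments"],
            "stimuli": dict(zip(stimuli, efficacies(row.get("Stimuli_efficacy"), stimuli))),
            "inhibitors": dict(zip(inhibitors, efficacies(row.get("Inhibitors_efficacy"), inhibitors))),
            "measured": dict(zip(nodes, values)),
        })
    return experiments


experiments = parse_experiments(CSV_TEXT)
print(experiments[3]["inhibitors"])  # {'PI3K': 0.7}
```

Note that zip silently truncates when a list of nodes and its list of values differ in length; a real loader would more likely report that as an error.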

Note:

  • Measured values are normalized to [0, 1] if they are not already, by dividing each measured value by the maximum measured value.

  • Stimuli_efficacy and Inhibitors_efficacy are optional columns.

  • If an efficacy is not specified, it defaults to 1.0 (full efficacy).

  • An efficacy below 1 reduces the probability that the node reaches its target state.
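The normalization rule can be sketched as below. The helper name normalize is hypothetical, and the docstring does not say whether the maximum is taken per row, per node, or over the whole file, so this sketch simply operates on whatever list of values it is given.

```python
def normalize(values):
    """Hypothetical helper: divide by the maximum only if values leave [0, 1]."""
    values = [float(v) for v in values]
    if all(0.0 <= v <= 1.0 for v in values):
        return values  # already normalized; left unchanged
    peak = max(values)
    if peak <= 0:
        raise ValueError("cannot normalize: maximum measured value is not positive")
    return [v / peak for v in values]


print(normalize([2.0, 1.0, 0.0]))  # [1.0, 0.5, 0.0]
```

Dividing by the maximum only rescales the upper end; it does not shift negative values into [0, 1].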

static validate_experiments(experiments, node_dict)

Validate experiments against PBN structure
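The checks performed by validate_experiments are not documented here. One plausible check, sketched below under that assumption, is that every stimulated, inhibited, or measured node exists in node_dict. The function validate_nodes and the experiment dict layout (id, stimuli, inhibitors, measured) are hypothetical.

```python
def validate_nodes(experiments, node_dict):
    """Hypothetical check: list every referenced node missing from node_dict."""
    errors = []
    for exp in experiments:
        for role in ("stimuli", "inhibitors", "measured"):
            for node in exp[role]:
                if node not in node_dict:
                    errors.append(f"Experiment {exp['id']}: unknown {role} node {node!r}")
    return errors


exps = [{"id": "1", "stimuli": {"TGFa": 1.0}, "inhibitors": {}, "measured": {"X": 0.5}}]
print(validate_nodes(exps, {"TGFa": 0}))  # ["Experiment 1: unknown measured node 'X'"]
```

Returning a list of messages rather than raising on the first problem makes it easier to fix several rows at once; the real method may behave differently.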

static get_experiment_summary(experiments)

Get summary of experiments for debugging/inspection

KGBN.experiment_data.extract_experiment_nodes(csv_file, use_formula: bool = False)

Extract measured and perturbed nodes from experimental CSV file.
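A minimal sketch of what "measured and perturbed nodes" means for the node-list CSV format: measured nodes come from Measured_nodes, perturbed nodes from Stimuli and Inhibitors. The function extract_nodes is hypothetical, reads CSV text instead of a file path, and ignores the use_formula case, where Measured_nodes would hold an expression instead of a node list.

```python
import csv
import io


def split_cell(cell):
    """Split a comma-separated cell into stripped, non-empty items."""
    return [item.strip() for item in (cell or "").split(",") if item.strip()]


def extract_nodes(csv_text):
    """Hypothetical sketch: (measured, perturbed) node sets from experiment CSV text."""
    measured, perturbed = set(), set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        measured.update(split_cell(row.get("Measured_nodes")))
        perturbed.update(split_cell(row.get("Stimuli")))
        perturbed.update(split_cell(row.get("Inhibitors")))
    return measured, perturbed


TEXT = (
    "Experiments,Stimuli,Inhibitors,Measured_nodes,Measured_values\n"
    '1,TGFa,PI3K,"NFkB,ERK","0.1,0.2"\n'
    '2,"TGFa,TNFa",,Akt,0.5\n'
)
measured, perturbed = extract_nodes(TEXT)
```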

KGBN.experiment_data.generate_experiments(pbn, experiment_csv: str, config: dict | None = None, output_csv: str | None = None, round_to: int = 4) -> DataFrame

Generate hypothesized experimental values using the current PBN parameters.

This function simulates the experiments defined in the CSV file using the current PBN parameters and writes the generated values to the Measured_values column, so the returned DataFrame (or the CSV saved to output_csv) can be used directly by the optimizer for evaluation.
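The write-back step can be pictured with the sketch below: each row keeps its perturbations, and its Measured_values cell is replaced by simulated values rounded to round_to digits. The simulate callable is a hypothetical stand-in for the PBN simulation, and the sketch returns CSV text via the standard csv module, whereas generate_experiments itself returns a pandas DataFrame.

```python
import csv
import io


def fill_measured_values(csv_text, simulate, round_to=4):
    """Hypothetical sketch: overwrite Measured_values with simulated, rounded values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        nodes = [n.strip() for n in row["Measured_nodes"].split(",") if n.strip()]
        values = simulate(row, nodes)  # one simulated value per measured node
        row["Measured_values"] = ",".join(str(round(v, round_to)) for v in values)
        writer.writerow(row)
    return out.getvalue()


TEXT = 'Experiments,Stimuli,Measured_nodes,Measured_values\n1,TGFa,"NFkB,ERK","0,0"\n'
result = fill_measured_values(TEXT, lambda row, nodes: [1 / 3] * len(nodes), round_to=2)
print(result.splitlines()[1])  # 1,TGFa,"NFkB,ERK","0.33,0.33"
```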

Data Format

CSV file format:

Experiments,Stimuli,Stimuli_efficacy,Inhibitors,Inhibitors_efficacy,Measured_nodes,Measured_values
1,TGFa,1,TNFa,1,"NFkB,ERK,C8,Akt","0.7,0.88,0,1"
2,TNFa,1,TGFa,1,"NFkB,ERK,C8,Akt","0.3,0.12,1,0"
3,"TGFa,TNFa","1,1",,,"NFkB,ERK,C8,Akt","1,1,1,1"
4,"TGFa,TNFa","1,1",PI3K,0.7,"NFkB,ERK,C8,Akt","0.3,0.12,1,0"

Column Descriptions

  • Experiments: Experiment identifier

  • Stimuli: Nodes fixed to 1 (comma-separated)

  • Stimuli_efficacy: Efficacy values 0-1 (optional, defaults to 1.0)

  • Inhibitors: Nodes fixed to 0 (comma-separated)

  • Inhibitors_efficacy: Efficacy values 0-1 (optional, defaults to 1.0)

  • Measured_nodes: Nodes with experimental measurements, or a formula expression when loading with use_formula=True

  • Measured_values: Corresponding values 0-1 (normalized). For formulas, provide a single value per row in the formula’s natural range
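The constraints in these column descriptions can be checked per row before loading. The function row_problems is hypothetical and takes lists that have already been split on commas; it covers node-list rows only, since formula rows use a single value in the formula's natural range instead of [0, 1].

```python
def row_problems(stimuli, stim_eff, inhibitors, inh_eff, nodes, values):
    """Hypothetical checks for one parsed row; returns a list of problem strings."""
    problems = []
    for label, names, eff in (("Stimuli", stimuli, stim_eff), ("Inhibitors", inhibitors, inh_eff)):
        # An empty efficacy list means "use the 1.0 default", so only check non-empty lists.
        if eff and len(eff) != len(names):
            problems.append(f"{label}_efficacy has {len(eff)} value(s) for {len(names)} node(s)")
        if any(not 0.0 <= e <= 1.0 for e in eff):
            problems.append(f"{label}_efficacy values must lie in [0, 1]")
    if len(values) != len(nodes):
        problems.append(f"Measured_values has {len(values)} value(s) for {len(nodes)} node(s)")
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("Measured_values must lie in [0, 1] (normalized)")
    return problems


print(row_problems(["TGFa", "TNFa"], [1.0], [], [], ["NFkB"], [1.5]))
```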

Efficacy Values

  • 1.0 (default): Full efficacy; the node is completely knocked out or stimulated

  • < 1.0: Partial efficacy; the perturbation becomes probabilistic

    • For inhibitors (target=0): P(node=0) = efficacy, P(node=1) = 1-efficacy

    • For stimuli (target=1): P(node=1) = efficacy, P(node=0) = 1-efficacy

  • Example: in row 4 of the CSV above, Inhibitors=PI3K with Inhibitors_efficacy=0.7 means the inhibition sets PI3K=0 with probability 0.7 and PI3K=1 with probability 0.3
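The probability rule above amounts to a single Bernoulli draw per perturbed node, as in this sketch. The function sample_perturbed_state is hypothetical; the docstring does not say whether KGBN draws once per simulation run or at every update step.

```python
import random


def sample_perturbed_state(target, efficacy, rng=random):
    """Hypothetical sketch: return `target` with probability `efficacy`, else the opposite state."""
    return target if rng.random() < efficacy else 1 - target


# PI3K inhibited with efficacy 0.7: roughly 70% of draws should give PI3K=0.
rng = random.Random(0)
draws = [sample_perturbed_state(0, 0.7, rng) for _ in range(10_000)]
share_zero = draws.count(0) / len(draws)
```

Because random() returns values in [0, 1), an efficacy of 1.0 always yields the target state, matching the full-efficacy default.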

Formula-based Measurements

When using formulas, call ExperimentData.load_from_csv(..., use_formula=True). For optimization and sensitivity workflows, you can also pass Measured_formula="N1 + N2 - N3" to override the CSV Measured_nodes column.

  • Measured_nodes: Contains a formula expression (e.g., N1 + N2 - N3)

  • Measured_values: Single value per row in the formula’s natural range

  • Supported operators: +, -, *, /, parentheses

  • Variables must be node names from the network
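A formula restricted to these operators can be evaluated safely by walking Python's ast instead of calling eval. The sketch below is not KGBN's parser: evaluate_formula is hypothetical, node names must be valid Python identifiers, and numeric constants are accepted as an assumption (for example, "(N1 + N2) / 2").

```python
import ast
import operator

# Only the operators listed above; parentheses are handled by the parser itself.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul, ast.Div: operator.truediv}


def evaluate_formula(formula, node_values):
    """Hypothetical sketch: evaluate e.g. "N1 + N2 - N3" from a {node: value} mapping."""
    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Name):
            if node.id not in node_values:
                raise ValueError(f"Unknown node in formula: {node.id}")
            return node_values[node.id]
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError(f"Unsupported syntax in formula: {ast.dump(node)}")

    return walk(ast.parse(formula, mode="eval").body)


print(evaluate_formula("N1 + N2 - N3", {"N1": 1.0, "N2": 0.5, "N3": 0.25}))  # 1.25
```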

Important: Because every node value lies in [0, 1], each formula has a theoretical range, and measured values must be scaled to match it:

  • N1 + N2 + N3: range [0, 3]

  • N1 + N2 - N3: range [-1, 2]

  • N1 - N2: range [-1, 1]

The optimizer will warn if measured values fall outside the expected range.
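The ranges listed above follow from interval arithmetic with each node in [0, 1]: adding intervals adds their bounds, and subtracting [c, d] from [a, b] gives [a - d, b - c]. The sketch below computes that range for + and - formulas and warns on out-of-range values. The names formula_range and check_measured are hypothetical, this is not necessarily how the optimizer computes its check, and repeated nodes are treated as independent (so N1 - N1 gives [-1, 1] rather than [0, 0]).

```python
import ast
import warnings


def formula_range(formula):
    """Hypothetical sketch: theoretical (low, high) of a +/- formula with nodes in [0, 1]."""
    def walk(node):
        if isinstance(node, ast.Name):
            return (0.0, 1.0)
        if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Sub)):
            (a, b), (c, d) = walk(node.left), walk(node.right)
            return (a + c, b + d) if isinstance(node.op, ast.Add) else (a - d, b - c)
        raise ValueError("only node names combined with + and - are handled here")

    return walk(ast.parse(formula, mode="eval").body)


def check_measured(formula, measured_values):
    """Warn about, and return, measured values outside the formula's theoretical range."""
    low, high = formula_range(formula)
    outside = [v for v in measured_values if not low <= v <= high]
    if outside:
        warnings.warn(f"{len(outside)} measured value(s) outside [{low}, {high}] for {formula!r}")
    return outside


print(formula_range("N1 + N2 - N3"))  # (-1.0, 2.0)
```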