# Operational Directive: Simulation Workflow and Data Handling Protocol

This directive establishes a mandatory protocol for managing simulation code, data, and analysis within the Information Dynamics (IO) framework, building upon the existing CEE OMF [[CEE-B-OMF-v1.1]] and addressing issues identified during recent simulation runs (particularly the use of placeholder data). The goal is to ensure reproducibility, efficiency, and data integrity in our computational explorations.

## 1. Core Principles

* **Code as Canonical Source:** The code node (e.g., [[releases/archive/Information Ontology 1/0116_IO_Simulation_v2.2_Code]]) is the definitive source for the simulation logic. Results nodes should never modify or duplicate the core simulation code.
* **Data Integrity:** Analysis must always be performed on *actual* simulation data. Placeholder data is strictly prohibited in results nodes.
* **Reproducibility:** Simulations must be reproducible. This requires specifying all parameters, the code version, and the random seed (if applicable).
* **Efficient Iteration:** Batched runs and clear success/failure criteria are essential for rapid progress.
* **Concise Results Presentation:** Results nodes should focus on presenting key findings and their interpretation, avoiding unnecessary code or output duplication.

## 2. Detailed Simulation Workflow

The following workflow must be followed for all simulation-based research within IO (a minimal execution-node sketch follows this list):

1. **Code Node (Canonical Source):**
    * A dedicated node (e.g., `####_Simulation_Code_vX.Y`) contains the complete, executable Python code for the simulation.
    * This node includes:
        * A clear description of the model, its assumptions, and the implemented algorithms.
        * Well-commented Python code (using NumPy, SciPy, etc.).
        * Functions for running the simulation and generating plots.
        * A clear statement of the required input parameters.
        * A clear description of the output data (format, units, meaning).
    * The code node should be as modular and reusable as possible.
    * The code node **must not be executed directly** within the compendium interface. It is a reference, not an active execution point.
2. **Parameter/Execution Node (Run Definition):**
    * A dedicated node (e.g., `####_Simulation_Run_Batch_N`) defines a specific set of simulation runs.
    * This node includes:
        * A clear statement of the **objective** of the simulation batch (e.g., "Explore the effect of varying parameter X on metric Y").
        * A clear reference to the **Code Node** being used (e.g., "Executing code from [[####_Simulation_Code_vX.Y]]").
        * A table or list clearly specifying the **parameter sets** for each run within the batch, including the random seed (if applicable).
        * The Python code to:
            * Import the simulation and plotting functions from the Code Node.
            * Define a loop or other structure to iterate through the parameter sets.
            * Execute the simulation function for each parameter set.
            * Generate plots (if appropriate).
            * Print key summary statistics for each run.
            * Store the results (summary statistics, plot data markers, or base64 strings) in a structured format (e.g., a dictionary or list of dictionaries).
3. **Results/Analysis Node (Interpretation):**
    * A dedicated node (e.g., `####_Simulation_Results_Batch_N`) presents the results and analysis of a specific simulation batch.
    * This node includes:
        * A clear reference to the **Parameter/Execution Node** (e.g., "Analyzing results from [[####_Simulation_Run_Batch_N]]").
        * A concise summary of the simulation setup (number of runs, parameter ranges, etc.).
        * A table or list presenting the **key summary statistics** for each run in the batch.
        * A selection of **plots** (or descriptions of plots) that are *essential* for understanding the results. Avoid including plots that are not actively discussed or interpreted.
        * A detailed **comparative analysis** of the results across the different parameter sets, identifying trends, sensitivities, and interesting behaviors.
        * A clear **interpretation** of the results in the context of the IO framework and the simulation goals.
        * A concise **conclusion** summarizing the key findings and outlining the **next steps** for the research.
    * **Crucially, this node must *never* include placeholder data. If the actual simulation data is not available, the node should describe the *intended* analysis process but must not present fabricated results.**
    * This node should focus on *interpreting* the results, not repeating the code used to generate them.
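The following is a minimal sketch of the Parameter/Execution Node pattern. The module name `io_simulation_v2_2`, the `run_simulation(params, seed)` entry point, and all parameter names and values are hypothetical placeholders, not the actual Code Node interface; it assumes `phi_history` is returned as a 2-D (time x space) NumPy array:

```python
import json
import numpy as np

# Hypothetical import: the Code Node is assumed to expose a module named
# io_simulation_v2_2 with a run_simulation(params, seed) entry point that
# returns phi_history as a 2-D (time x space) NumPy array. Adjust names
# to match the actual Code Node.
from io_simulation_v2_2 import run_simulation

# Parameter sets for the batch, each with an explicit random seed for
# reproducibility (Section 3, item 1). Names and values are illustrative.
parameter_sets = [
    {"run_id": "15a", "alpha": 0.10, "beta": 0.50, "seed": 42},
    {"run_id": "15b", "alpha": 0.15, "beta": 0.50, "seed": 43},
    {"run_id": "15c", "alpha": 0.20, "beta": 0.50, "seed": 44},
]

results = []
for pset in parameter_sets:
    phi_history = run_simulation(pset, seed=pset["seed"])

    # Persist the raw trajectory so it can be re-analyzed later without
    # re-running the simulation (Section 3, item 2).
    np.save(f"run_{pset['run_id']}_phi_history.npy", phi_history)

    # Collect key summary statistics in a structured form with the inputs.
    results.append({
        "params": pset,
        "mean_phi": float(phi_history.mean()),
        "std_phi": float(phi_history.std()),
    })

# Write the batch summary for the Results/Analysis node to reference.
with open("batch_N_summary.json", "w") as f:
    json.dump(results, f, indent=2)
```

Keeping parameters, seeds, and summary statistics together in one structured record is what allows the Results/Analysis node to reference runs unambiguously without duplicating any simulation code.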
## 3. Data Handling Protocol

To ensure data integrity and reproducibility:

1. **Seed Management:** If stochastic elements are involved, always use a fixed random seed for initial runs to ensure reproducibility, and document the seed value. For parameter sweeps, consider using a different seed for each run to explore stochastic variations.
2. **Data Storage:** Develop a consistent system for storing simulation results (e.g., using NumPy's `save` function, Pandas DataFrames, or a database). This allows later re-analysis without re-running the simulations.
3. **Clear Naming Conventions:** Use clear and consistent naming conventions for variables, functions, and output files.
4. **Version Control:** Use Git to track changes to the code and data files.

## 4. Conclusion: A Rigorous Path Forward

This operational directive provides a more structured and rigorous workflow for conducting simulations within the Information Dynamics framework. By separating code definition from execution, emphasizing data integrity, and promoting efficient iteration, we aim to accelerate progress, avoid past pitfalls, and build a more robust and reliable foundation for future research. This directive is a direct response to the challenges encountered in recent simulations and a commitment to the "Fail Fast" principle. The next step is to apply this protocol in practice, starting with the re-analysis of Run 15 and subsequent simulations.

The output below was printed by the Run 15 analysis code. Note that, despite the "(ACTUAL DATA)" label in the printed header, these values were produced from the `ResultsPlaceholder` data and are therefore meaningless (see Sections 5-7):

```
--- Run 15 Quantitative Analysis (ACTUAL DATA) ---
Average Domain Length: 0.0000
Average Boundary Density: 0.0000
Average Boundary Velocity: Not calculated (no boundaries)
Dominant Frequencies (Top 5):
  Frequency: 0.49, Power: 1.30e-01
  Frequency: 0.48, Power: 1.29e-01
  Frequency: 0.47, Power: 1.28e-01
  Frequency: 0.46, Power: 1.27e-01
  Frequency: 0.01, Power: 1.27e-01
Compactness (Final State): 0.0000
PSD Plot generated (base64-encoded PNG; data omitted)
```
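To make the intended analysis concrete, here is a minimal sketch of how metrics of this kind (domain length, boundary density, dominant PSD frequencies, compactness) might be computed from a `phi_history` array. This is not the actual analysis code from the Code Node: the function, the metric definitions, and the default thresholds are illustrative assumptions only, with the thresholds corresponding to the `domain_delta`, `grad_threshold`, and `amplitude_threshold` parameters discussed in the Next Steps below.

```python
import numpy as np
from scipy.signal import welch

def analyze_run(phi_history, domain_delta=0.1, grad_threshold=0.05,
                amplitude_threshold=0.2):
    """Illustrative quantitative metrics for a (time x space) phi_history.

    All thresholds and metric definitions are placeholder assumptions;
    they must be tuned against the actual data (see Next Steps).
    """
    final_state = phi_history[-1]
    amp = np.abs(final_state)

    # Domains: contiguous runs of sites whose amplitude exceeds domain_delta.
    in_domain = amp > domain_delta
    # Boundaries: sites where the spatial gradient is steep.
    boundaries = np.abs(np.gradient(final_state)) > grad_threshold

    # Average domain length: in-domain sites divided by the number of
    # rising edges (domain starts); zero if no domains form.
    starts = int(np.sum(np.diff(in_domain.astype(int)) == 1))
    avg_domain_length = float(in_domain.sum() / starts) if starts else 0.0

    boundary_density = float(boundaries.mean())

    # Dominant temporal frequencies of the spatial mean, via a Welch PSD.
    freqs, power = welch(phi_history.mean(axis=1))
    top = np.argsort(power)[::-1][:5]

    # Compactness: fraction of total amplitude concentrated in sites above
    # amplitude_threshold (an illustrative definition, not canonical).
    total = amp.sum()
    compactness = float(amp[amp > amplitude_threshold].sum() / total) if total else 0.0

    return {
        "avg_domain_length": avg_domain_length,
        "boundary_density": boundary_density,
        "dominant_freqs": [(float(f), float(p)) for f, p in zip(freqs[top], power[top])],
        "compactness": compactness,
    }
```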
## 5. Interpretation (Based on Placeholder Results)

*(This interpretation is based on the *placeholder* data and is therefore meaningless. It is included only to illustrate the intended analysis process once the actual data is loaded.)*

The placeholder results suggest:

* **Low Domain Length and Boundary Density:** The system, as represented by this placeholder, does not form well-defined domains.
* **No Boundary Velocity:** There are no clear propagating structures.
* **Dominant Frequencies:** The PSD shows some dominant frequencies, but their significance cannot be assessed without real data.
* **Low Compactness:** There are no localized structures that deviate significantly from the background.

This placeholder analysis would suggest a system lacking significant self-organization or stable structures.

## 6. Next Steps

1. **Crucially: Replace Placeholder Data:** The **immediate next step is to replace the placeholder data in the `ResultsPlaceholder` class with the *actual* `phi_history` and other relevant data from a genuine execution of Run 15's code (from [[releases/archive/Information Ontology 1/0116_IO_Simulation_v2.2_Code]])**. This is essential for any meaningful analysis. A minimal loading sketch appears at the end of this node.
2. **Tune Analysis Parameters:** The analysis parameters (e.g., `domain_delta`, `grad_threshold`, `amplitude_threshold`) will likely need to be adjusted based on visual inspection of the *actual* `phi_history` plot to ensure they effectively capture the relevant features.
3. **Re-run Analysis:** Execute the analysis code with the actual data and tuned parameters.
4. **Interpret Results:** Provide a detailed interpretation of the quantitative metrics in the context of the IO framework and the simulation goals.
5. **Plan Next Simulations:** Based on the actual Run 15 analysis, determine the next parameter adjustments or model refinements to explore.

## 7. Conclusion

This node provides the code for quantitatively analyzing the emergent structures and dynamics in the IO continuous-state network simulations. However, the current results are based on placeholder data and are therefore meaningless. The immediate priority is to replace the placeholder with the actual Run 15 data and then proceed with the analysis and interpretation. This highlights the importance of data integrity and the need to avoid drawing conclusions from incomplete or inaccurate information.
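As a concrete starting point for steps 1-3 above, here is a minimal loading sketch. It assumes the Execution Node saved the Run 15 trajectory with NumPy's `save` (the file name is hypothetical) and reuses the illustrative `analyze_run` function sketched after Section 4; the threshold values are starting guesses, not validated settings:

```python
import numpy as np

# Step 1: load the actual Run 15 trajectory in place of the
# ResultsPlaceholder class. The file name is hypothetical; it assumes
# the Execution Node saved phi_history with np.save.
phi_history = np.load("run_15_phi_history.npy")

# Steps 2-3: tune the analysis parameters against a visual inspection of
# the actual trajectory, then re-run the analysis. analyze_run is the
# illustrative function sketched after Section 4; these values are
# starting guesses to be adjusted against the real data.
metrics = analyze_run(
    phi_history,
    domain_delta=0.1,
    grad_threshold=0.05,
    amplitude_threshold=0.2,
)
print(metrics)
```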