# Defining Data Loading Protocol for IO Simulation Results
## 1. Objective
To ensure data integrity and reproducibility in the analysis of IO simulations, a clear protocol for loading simulation results from storage is essential. This node defines that protocol.
## 2. Assumed Data Storage Format
For consistency and efficiency, we will assume that simulation results are stored in one of the following formats:
1. **NumPy `.npy` files:** Individual NumPy arrays (e.g., `phi_history.npy`, `theta_history.npy`) are saved with `np.save()`. The parameters dictionary is saved separately, either as a pickled `.npy` file (which requires `allow_pickle=True` when loading) or as a JSON file.
2. **Pandas DataFrame:** The entire `results` dictionary (including parameters and data arrays) is converted into a Pandas DataFrame and saved to a file (e.g., a CSV file using `df.to_csv()` or a Parquet file for larger datasets).
The choice of format depends on the size and complexity of the data. For the current 1D simulations, NumPy arrays are likely sufficient. For more complex simulations with larger datasets or more structured data, Pandas DataFrames might be preferable.
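For concreteness, here is a minimal sketch of how a run might be written in each format. The filenames, parameter names, and array shapes are illustrative (not taken from an actual run), and the Parquet path assumes a Parquet engine such as `pyarrow` is installed:
```python
import json
import numpy as np
import pandas as pd

# Placeholder results standing in for a real simulation run (shapes are illustrative)
params = {"N": 100, "steps": 500, "dt": 0.01}
phi_history = np.zeros((500, 100))
theta_history = np.zeros((500, 100))

# --- Option 1: individual NumPy arrays plus a JSON parameter file ---
np.save("run15_phi_history.npy", phi_history)
np.save("run15_theta_history.npy", theta_history)
with open("run15_params.json", "w") as f:
    json.dump(params, f, indent=2)  # JSON keeps the parameters human-readable

# --- Option 2: a Pandas DataFrame saved as Parquet (or CSV) ---
# Scalar parameters can be stored as constant columns or written separately as above.
df = pd.DataFrame(theta_history,
                  columns=[f"site_{i}" for i in range(theta_history.shape[1])])
df.to_parquet("run15_theta_history.parquet")  # requires a Parquet engine, e.g. pyarrow
# df.to_csv("run15_theta_history.csv")        # simpler, but larger on disk
```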
## 3. Data Loading Protocol
The following steps must be followed when loading simulation results for analysis:
1. **Specify Data Source:** In the analysis node, clearly state the filename(s) and path(s) to the data files being loaded.
2. **Load Parameters:** Load the simulation parameters into a dictionary (e.g., from a `.npy` or JSON file if they were stored separately from the data arrays).
3. **Load Data Arrays:** Load the data arrays (e.g., `phi_history`, `theta_history`) using the appropriate NumPy or Pandas loading functions.
4. **Verify Data Integrity:** Perform basic checks to ensure the data has been loaded correctly:
* Print the shape and data type of the loaded arrays.
* Print a few sample values from the arrays (e.g., the first few rows of `phi_history`, the final value of `avg_theta_history`).
* Verify that the loaded parameters match the expected values for the simulation run.
5. **Create Results Dictionary:** Construct a `results` dictionary (as used in the simulation code) containing the loaded parameters and data arrays. This ensures consistency with the analysis functions.
## 4. Example Code Snippet (Illustrative)
```python
import numpy as np

# --- Specify Data Source ---
data_dir = "path/to/simulation/results/"
run15_phi_history_file = data_dir + "run15_phi_history.npy"
run15_theta_history_file = data_dir + "run15_theta_history.npy"
run15_params_file = data_dir + "run15_params.npy"  # Or run15_params.json

# --- Load Parameters ---
try:
    # Parameters were pickled into the .npy file, so allow_pickle is required
    params_run15 = np.load(run15_params_file, allow_pickle=True).item()
    # If saved as JSON instead:
    # import json
    # with open(run15_params_file, 'r') as f:
    #     params_run15 = json.load(f)
except FileNotFoundError:
    print(f"Error: Parameter file not found: {run15_params_file}")
    raise  # Analysis cannot proceed without parameters

# --- Load Data Arrays ---
try:
    phi_history = np.load(run15_phi_history_file)
    theta_history = np.load(run15_theta_history_file)
except FileNotFoundError as e:
    print(f"Error: Data file not found: {e.filename}")
    raise  # Analysis cannot proceed without the data arrays

# --- Verify Data Integrity ---
print("Loaded phi_history: Shape =", phi_history.shape, ", dtype =", phi_history.dtype)
print("Sample phi_history (first 5 rows):\n", phi_history[:5, :])
print("Loaded theta_history: Shape =", theta_history.shape, ", dtype =", theta_history.dtype)
print("Final Average Theta:", theta_history[-1].mean())
print("Loaded parameters:", params_run15)
# Add more checks as needed

# --- Create Results Dictionary ---
results_run15 = {
    "parameters": params_run15,
    "phi_history": phi_history,
    "theta_history": theta_history,
    # Add other data as needed
}
# Now the 'results_run15' dictionary can be used by the analysis functions
```
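The snippet above covers only the NumPy storage path. If a run was instead saved as a DataFrame (Section 2, option 2), loading is a single call; the filename below is illustrative, and the Parquet variant assumes a Parquet engine such as `pyarrow`:
```python
import pandas as pd

# Load a DataFrame-based run; the filename is illustrative
theta_df = pd.read_parquet(data_dir + "run15_theta_history.parquet")
# theta_df = pd.read_csv(data_dir + "run15_theta_history.csv")
theta_history = theta_df.to_numpy()  # back to a plain array for the analysis functions
```
The snippet also only prints the loaded parameters, while step 4 of the protocol calls for checking them against the values expected for the run. A minimal sketch of such a check, with illustrative parameter names and values:
```python
def verify_params(loaded: dict, expected: dict) -> None:
    """Raise if any expected parameter is missing from, or differs in, the loaded dict."""
    for key, expected_value in expected.items():
        if key not in loaded:
            raise KeyError(f"Parameter '{key}' missing from loaded parameters")
        if loaded[key] != expected_value:
            raise ValueError(
                f"Parameter '{key}': loaded {loaded[key]!r}, expected {expected_value!r}"
            )

# Illustrative expected values; params_run15 comes from the snippet above
verify_params(params_run15, {"N": 100, "steps": 500, "dt": 0.01})
```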
## 5. Conclusion: Ensuring Data Integrity
This protocol provides a clear and consistent procedure for loading simulation results, emphasizing data integrity and reproducibility. By explicitly specifying the data source, loading parameters and data arrays, and verifying the loaded data, we can minimize the risk of errors and ensure that subsequent analysis is based on accurate information. This protocol will be used in all future analysis nodes.
The next step is to use this protocol to load the actual data from Run 15 and then re-run the analysis from [[releases/archive/Information Ontology 1/0129_IO_Metrics_Implementation]].