Data Collection
Quick intro
DataCollector samples model- and agent-level columns over time and returns cleaned DataFrames suitable for analysis. Typical patterns:
- Provide model_reporters (callables producing scalars) and agent_reporters (column selectors or callables that operate on an AgentSet).
- Call collector.collect() inside the model step, or rely on built-in integration if the model calls the collector automatically.
Minimal example
from mesa_frames import DataCollector, Model, AgentSet
import polars as pl

class P(AgentSet):
    def __init__(self, model):
        super().__init__(model)
        # Add two agents with a single attribute column 'x'.
        self.add(pl.DataFrame({'x': [1, 2]}))

class M(Model):
    def __init__(self):
        super().__init__()
        self.sets += P(self)
        # The collector takes the model as its first argument (see the signature below).
        self.dc = DataCollector(
            self,
            model_reporters={'count': lambda m: len(m.sets['P'])},
            agent_reporters={'x': 'x'},
        )

m = M()
m.dc.collect()
API reference
Lifecycle / Core
__init__ | Initialize the DataCollector with configuration options.
collect | Trigger data collection.
conditional_collect | Trigger data collection if condition is met.
flush | Persist all collected data to configured backend.
data | Retrieve the collected data as eagerly evaluated Polars DataFrames.
Reporting / Internals
Function to get the model seed.
- class DataCollector(model: Model, model_reporters: dict[str, Callable] | None = None, agent_reporters: dict[str, str | Callable] | None = None, trigger: Callable[[Any], bool] | None = None, reset_memory: bool = True, storage: Literal['memory', 'csv', 'parquet', 'S3-csv', 'S3-parquet', 'postgresql'] = 'memory', storage_uri: str | None = None, schema: str = 'public', max_worker: int = 4)
Methods:
__init__ | Initialize the DataCollector with configuration options.
collect | Trigger data collection.
conditional_collect | Trigger data collection if condition is met.
flush | Persist all collected data to configured backend.
Attributes:
data | Retrieve the collected data as eagerly evaluated Polars DataFrames.
Function to get the model seed.
- __init__(model: Model, model_reporters: dict[str, Callable] | None = None, agent_reporters: dict[str, str | Callable] | None = None, trigger: Callable[[Any], bool] | None = None, reset_memory: bool = True, storage: Literal['memory', 'csv', 'parquet', 'S3-csv', 'S3-parquet', 'postgresql'] = 'memory', storage_uri: str | None = None, schema: str = 'public', max_worker: int = 4)
Initialize the DataCollector with configuration options.
- Parameters:
model (Model) – The model object from which data is collected.
model_reporters (dict[str, Callable] | None) – Functions to collect data at the model level.
agent_reporters (dict[str, str | Callable] | None) – Attributes or functions to collect data at the agent level.
trigger (Callable[[Any], bool] | None) – A function (model) -> bool that conditional_collect() uses to decide whether to collect data.
reset_memory (bool) – Whether to clear in-memory data after flushing. Default is True.
storage (Literal["memory", "csv", "parquet", "S3-csv", "S3-parquet", "postgresql"]) – Storage backend to use. Default is "memory".
storage_uri (str | None) – URI or path corresponding to the selected storage backend.
schema (str) – Schema name used for PostgreSQL storage. Default is "public".
max_worker (int) – Maximum number of worker threads used for flushing collected data asynchronously. Default is 4.
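As an illustration of the trigger parameter described above, the following is a minimal sketch of a factory that builds a function(model) -> bool firing on every n-th step. The helper name every_n_steps and the step-counter attribute name "steps" are assumptions made for this example, not part of the documented API, and a plain stand-in object is used in place of a real Model.

```python
from types import SimpleNamespace
from typing import Any, Callable


def every_n_steps(n: int, step_attr: str = "steps") -> Callable[[Any], bool]:
    """Build a trigger that returns True on every n-th step.

    `step_attr` names the model's step counter. The attribute name is an
    assumption for illustration; the reference above does not specify it.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")

    def trigger(model: Any) -> bool:
        # True only when the step counter is a multiple of n.
        return getattr(model, step_attr) % n == 0

    return trigger


# Stand-in objects with a step counter, used here instead of a real Model.
trigger = every_n_steps(5)
fires_at_10 = trigger(SimpleNamespace(steps=10))
fires_at_7 = trigger(SimpleNamespace(steps=7))
```

A function built this way could be passed as trigger=every_n_steps(5) when constructing the collector, provided the model exposes a matching counter.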
- property data: dict[str, DataFrame]
Retrieve the collected data as eagerly evaluated Polars DataFrames.
- collect() -> None
Trigger data collection.
This method calls _collect() to perform actual data collection.
Examples
>>> datacollector.collect()
- conditional_collect() -> None
Trigger data collection if condition is met.
This method calls _collect() to perform the actual data collection only if trigger returns True.
Examples
>>> datacollector.conditional_collect()
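The relationship between collect(), conditional_collect(), and _collect() described above can be sketched with a small stand-in class. This is not the mesa_frames implementation: SketchCollector, its rows attribute, and the choice to skip collection when no trigger is configured are all assumptions made for illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


class SketchCollector:
    """Hypothetical stand-in; not the mesa_frames DataCollector."""

    def __init__(
        self,
        model: Any,
        model_reporters: dict[str, Callable[[Any], Any]],
        trigger: Optional[Callable[[Any], bool]] = None,
    ) -> None:
        self._model = model
        self._model_reporters = model_reporters
        self._trigger = trigger
        self.rows: list[dict[str, Any]] = []

    def _collect(self) -> None:
        # Evaluate every model-level reporter and store one row.
        row = {name: fn(self._model) for name, fn in self._model_reporters.items()}
        self.rows.append(row)

    def collect(self) -> None:
        # Unconditional collection.
        self._collect()

    def conditional_collect(self) -> None:
        # Assumption: with no trigger configured, nothing is collected.
        if self._trigger is not None and self._trigger(self._model):
            self._collect()


model = SimpleNamespace(steps=0, population=3)
dc = SketchCollector(
    model,
    model_reporters={"step": lambda m: m.steps, "population": lambda m: m.population},
    trigger=lambda m: m.steps % 2 == 0,  # collect on even steps only
)
for step in range(1, 5):
    model.steps = step
    dc.conditional_collect()
dc.collect()  # always records, regardless of the trigger
collected_steps = [row["step"] for row in dc.rows]
```

In this sketch the loop records rows only at steps 2 and 4, and the final collect() call adds one more row for step 4.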
- flush() -> None
Persist all collected data to configured backend.
After flushing, the in-memory data buffer is cleared if reset_memory is True (the default). Use this method to save collected data.
Examples
>>> datacollector.flush()
>>> # Data is saved externally and in-memory buffers are cleared if configured
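The effect of reset_memory on flush() can be sketched with a stand-in buffer that writes rows as CSV to a file-like object. This is an illustration under stated assumptions only: SketchBuffer and its sink argument are hypothetical names, the write is synchronous, and none of it reflects how mesa_frames actually persists data or uses its worker threads.

```python
import csv
import io
from typing import Any, TextIO


class SketchBuffer:
    """Hypothetical stand-in for a collector's buffer; not mesa_frames code."""

    def __init__(self, sink: TextIO, reset_memory: bool = True) -> None:
        self._sink = sink  # any writable text file-like object
        self._reset_memory = reset_memory
        self._header_written = False
        self.rows: list[dict[str, Any]] = []

    def collect(self, row: dict[str, Any]) -> None:
        self.rows.append(row)

    def flush(self) -> None:
        if not self.rows:
            return
        writer = csv.DictWriter(self._sink, fieldnames=list(self.rows[0]))
        if not self._header_written:
            writer.writeheader()
            self._header_written = True
        writer.writerows(self.rows)
        # Clear the in-memory rows only when configured to do so.
        if self._reset_memory:
            self.rows.clear()


sink = io.StringIO()
buf = SketchBuffer(sink, reset_memory=True)
buf.collect({"step": 1, "count": 2})
buf.collect({"step": 2, "count": 2})
buf.flush()
written = sink.getvalue().splitlines()

# With reset_memory=False the rows stay in memory after flushing.
kept = SketchBuffer(io.StringIO(), reset_memory=False)
kept.collect({"step": 1, "count": 2})
kept.flush()
```

Clearing the buffer after each flush keeps memory use bounded during long runs, at the cost of having to read earlier data back from the storage backend.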