Data Collection

Quick intro

DataCollector samples model- and agent-level columns over time and returns cleaned DataFrames suitable for analysis. Typical patterns:

Provide model_reporters (callables producing scalars) and agent_reporters (column selectors or callables that operate on an AgentSet).
Call collector.collect() inside the model step, or rely on built-in integration if the model calls the collector automatically.

Minimal example

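from mesa_frames import DataCollector, Model, AgentSet
import polars as pl


class P(AgentSet):
    def __init__(self, model):
        super().__init__(model)
        # Two agents with a single attribute column 'x'.
        self.add(pl.DataFrame({'x': [1, 2]}))


class M(Model):
    def __init__(self):
        super().__init__()
        self.sets += P(self)
        # Per the signature below, the model is the first argument and
        # agent_reporters is a dict mapping output column names to sources.
        self.dc = DataCollector(
            self,
            model_reporters={'count': lambda m: len(m.sets['P'])},
            agent_reporters={'x': 'x'},
        )


m = M()
m.dc.collect()  # sample the model and agent reporters once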

API reference

Overview

Lifecycle / Core

`DataCollector.__init__`	Initialize the DataCollector with configuration options.
`DataCollector.collect`	Trigger data collection.
`DataCollector.conditional_collect`	Trigger data collection if condition is met.
`DataCollector.flush`	Persist all collected data to configured backend.
`DataCollector.data`	Retrieve the collected data as eagerly evaluated Polars DataFrames.

Reporting / Internals

`DataCollector.seed`	Function to get the model seed.

Full API

class DataCollector(model: Model, model_reporters: dict[str, Callable] | None = None, agent_reporters: dict[str, str | Callable] | None = None, trigger: Callable[[Any], bool] | None = None, reset_memory: bool = True, storage: Literal['memory', 'csv', 'parquet', 'S3-csv', 'S3-parquet', 'postgresql'] = 'memory', storage_uri: str | None = None, schema: str = 'public', max_worker: int = 4)

Methods:

`__init__`	Initialize the DataCollector with configuration options.
`collect`	Trigger data collection.
`conditional_collect`	Trigger data collection if condition is met.
`flush`	Persist all collected data to configured backend.

Attributes:

`data`	Retrieve the collected data as eagerly evaluated Polars DataFrames.
`seed`	Function to get the model seed.

__init__(model: Model, model_reporters: dict[str, Callable] | None = None, agent_reporters: dict[str, str | Callable] | None = None, trigger: Callable[[Any], bool] | None = None, reset_memory: bool = True, storage: Literal['memory', 'csv', 'parquet', 'S3-csv', 'S3-parquet', 'postgresql'] = 'memory', storage_uri: str | None = None, schema: str = 'public', max_worker: int = 4)

Initialize the DataCollector with configuration options.

Parameters:

model (Model) – The model object from which data is collected.
model_reporters (dict[str, Callable] | None) – Functions to collect data at the model level.
agent_reporters (dict[str, str | Callable] | None) – Attributes or functions to collect data at the agent level.
trigger (Callable[[Any], bool] | None) – A function(model) -> bool that determines whether to collect data.
reset_memory (bool) – Whether to reset in-memory data after flushing. Default is True.
storage (Literal["memory", "csv", "parquet", "S3-csv", "S3-parquet", "postgresql"]) – Storage backend to use, given as one of the listed names. Default is "memory".
storage_uri (str | None) – URI or path corresponding to the selected storage backend.
schema (str) – Schema name used for PostgreSQL storage.
max_worker (int) – Maximum number of worker threads used for flushing collected data asynchronously. Default is 4.
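To make the reporter shapes concrete, here is a minimal pure-Python sketch of how a model_reporters dict and an agent_reporters dict could be evaluated. It does not use mesa_frames and is not the library's implementation: ToyModel, evaluate_model_reporters, and evaluate_agent_reporters are hypothetical names, and the agent set is modelled as a plain dict of columns rather than an AgentSet.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyModel:
    """Hypothetical stand-in for a model; not a mesa_frames class."""

    steps: int = 0
    # Agent columns stored as plain lists (assumption for illustration only).
    agents: dict[str, list[Any]] = field(
        default_factory=lambda: {"x": [1, 2], "wealth": [10, 30]}
    )


def evaluate_model_reporters(
    model: ToyModel, reporters: dict[str, Callable[[ToyModel], Any]]
) -> dict[str, Any]:
    # Each reporter is called with the model and yields one scalar per column.
    return {name: fn(model) for name, fn in reporters.items()}


def evaluate_agent_reporters(
    model: ToyModel, reporters: dict[str, str | Callable[[dict], list]]
) -> dict[str, list[Any]]:
    out: dict[str, list[Any]] = {}
    for name, rep in reporters.items():
        # A string selects an existing column; a callable computes a new one.
        out[name] = model.agents[rep] if isinstance(rep, str) else rep(model.agents)
    return out


model = ToyModel(steps=3)
model_row = evaluate_model_reporters(
    model,
    {"count": lambda m: len(m.agents["x"]), "step": lambda m: m.steps},
)
agent_cols = evaluate_agent_reporters(
    model,
    {"x": "x", "double_wealth": lambda a: [2 * w for w in a["wealth"]]},
)
print(model_row)
print(agent_cols)
```

The point of the sketch is the shape of the two dicts: keys become output column names, model-level values are callables taking the model, and agent-level values are either a column name or a callable.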

property data: dict[str, DataFrame]

Retrieve the collected data as eagerly evaluated Polars DataFrames.

Returns: A dictionary with keys “model” and “agent” mapping to concatenated DataFrames of collected data.
Return type: dict[str, pl.DataFrame]

collect() → None

Trigger data collection.

This method calls _collect() to perform actual data collection.

Examples

>>> datacollector.collect()

conditional_collect() → None

Trigger data collection if condition is met.

This method calls _collect() to perform the actual data collection only if the trigger function returns True.

Examples

>>> datacollector.conditional_collect()
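The gating behaviour described above can be sketched without the library. The following toy is an illustration under stated assumptions, not the mesa_frames implementation: ToyCollector, ToyModel, the steps counter, and the collected_steps list are all hypothetical names, and "collection" is reduced to recording the current step.

```python
from typing import Any, Callable


class ToyCollector:
    """Hypothetical stand-in that mimics trigger-gated collection."""

    def __init__(self, model: Any, trigger: Callable[[Any], bool]):
        self._model = model
        self._trigger = trigger
        self.collected_steps: list[int] = []

    def _collect(self) -> None:
        # Stand-in for real collection: remember which step was sampled.
        self.collected_steps.append(self._model.steps)

    def collect(self) -> None:
        # Unconditional: always samples.
        self._collect()

    def conditional_collect(self) -> None:
        # Samples only when the trigger returns True for the model.
        if self._trigger(self._model):
            self._collect()


class ToyModel:
    """Hypothetical model that counts steps and owns a collector."""

    def __init__(self) -> None:
        self.steps = 0
        # Sample on every fifth step.
        self.dc = ToyCollector(self, trigger=lambda m: m.steps % 5 == 0)

    def step(self) -> None:
        self.steps += 1
        self.dc.conditional_collect()


model = ToyModel()
for _ in range(12):
    model.step()
print(model.dc.collected_steps)  # [5, 10]
```

Calling conditional_collect() on every step and letting the trigger decide keeps the sampling policy in one place, instead of scattering step checks through the model code.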

flush() → None

Persist all collected data to configured backend.

After flushing, the in-memory data buffer is cleared if reset_memory is True (the default).

Use this method to save collected data.

Examples

>>> datacollector.flush()
>>> # Data is saved externally and in-memory buffers are cleared if configured
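As a rough illustration of the flush and reset_memory interaction, the sketch below buffers rows in memory, appends them to a CSV file on flush, and clears the buffer when reset_memory is True. It is a synchronous toy built on the standard library only: ToyBufferedCollector, its collect(step, count) signature, and the file layout are assumptions for illustration and do not reflect how the real backends or worker threads operate.

```python
import csv
import tempfile
from pathlib import Path


class ToyBufferedCollector:
    """Hypothetical stand-in; real backends and file layout may differ."""

    def __init__(self, path: Path, reset_memory: bool = True):
        self._path = path
        self._reset_memory = reset_memory
        self.rows: list[dict] = []

    def collect(self, step: int, count: int) -> None:
        # Buffer one model-level row in memory.
        self.rows.append({"step": step, "count": count})

    def flush(self) -> None:
        if not self.rows:
            return
        # Write the header only when the file is created.
        write_header = not self._path.exists()
        with self._path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["step", "count"])
            if write_header:
                writer.writeheader()
            writer.writerows(self.rows)
        # Drop the buffered rows only if configured to do so.
        if self._reset_memory:
            self.rows.clear()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.csv"
    dc = ToyBufferedCollector(path, reset_memory=True)
    for step in range(1, 4):
        dc.collect(step, count=2)
    dc.flush()
    buffered_after_flush = len(dc.rows)
    persisted = path.read_text().splitlines()

print(buffered_after_flush)
print(persisted)
```

With reset_memory=True the buffer is empty after the flush, so long runs can flush periodically to keep memory use bounded.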

property seed: int

Function to get the model seed.

Examples

>>> seed = datacollector.seed