# Exporter Tutorial This tutorial walks through exporting a Reinforcement Learning environment and its policy to a self-contained ONNX file using `exploy.exporter.core`. Rather than exporting only the neural network, exploy captures the entire environment logic — observation computation, action processing, and any additional inputs and outputs — into a single ONNX graph. This ensures the exported model behaves identically in simulation and on hardware. By the end of this tutorial you will know how to: 1. Create an exportable adapter for an existing environment. 2. Register inputs, outputs, and memory with the context manager. 3. Create an actor (policy network). 4. Export the environment to ONNX. 5. Load and run the exported model. 6. Validate the export by comparing ONNX outputs against the original environment. > **Note:** A complete, runnable version of the code in this tutorial is available as a test in > [`python/exploy/exporter/core/tests/test_export_environment.py`](https://github.com/bdaiinstitute/exploy/blob/main/python/exploy/exporter/core/tests/test_export_environment.py). > This tutorial explains the details behind that test step by step. ## Prerequisites The tutorial uses the following imports: ```python import pathlib import tempfile import torch from exploy.exporter.core.actor import ExportableActor, add_actor_memory from exploy.exporter.core.context_manager import Group, Input, Memory, Output from exploy.exporter.core.evaluator import evaluate from exploy.exporter.core.exportable_environment import ExportableEnvironment from exploy.exporter.core.exporter import export_environment_as_onnx from exploy.exporter.core.session_wrapper import SessionWrapper ``` --- ## Step 1 — Define the Data Source and the Environment Before we can export anything, we need an environment to export. In a real project this would typically be a simulation environment (e.g., from IsaacLab or another RL framework). For this tutorial we define a minimal data source and environment from scratch. ### Data source An environment typically reads state from some external source (sensors, a simulator, etc.). Here we create a simple `DataSource` that holds a few tensors and can step and reset them: ```python class DataSource: """Provides state tensors to the environment.""" def __init__(self): self._init_foo = torch.Tensor([[1.0, 2.0, 3.0, 4.0]]) self._init_bar = torch.Tensor([[0.5, 0.6]]) self._init_baz = torch.Tensor([[-7.0, -8.0]]) self.foo = self._init_foo.clone() self.bar = self._init_bar.clone() self.baz = self._init_baz.clone() def reset(self): """Reset to initial state.""" self.foo[:] = self._init_foo self.bar[:] = self._init_bar self.baz[:] = self._init_baz def step(self): """Advance state to emulate a changing environment.""" self.foo[:] += 0.1 self.bar[:] += 0.2 self.baz[:] += 0.3 ``` The data source stores its initial values so it can be reset, and its `step()` method mutates the tensors in place — just like a physics simulator would update joint positions and velocities each tick. ### Environment The `Environment` class represents a standard RL environment. It owns a `DataSource`, computes observations from the state tensors, processes and applies actions, and implements a `step()` method that advances the simulation: ```python class Environment: """A simple Reinforcement Learning environment.""" def __init__(self, data_source: DataSource): self._data_source = data_source self._decimation = 4 self._actions = torch.zeros(1, 2) self._processed_action = torch.zeros_like(self._actions) self._output = torch.zeros_like(self._actions) self._reset_after_steps = 10 self._step_count = 0 @property def num_act(self) -> int: return self._actions.shape[-1] @property def num_obs(self) -> int: return self.compute_obs().shape[-1] @property def data_source(self): return self._data_source def compute_obs(self) -> torch.Tensor: return torch.cat( [ self.data_source.foo + 1.0, self.data_source.bar + 2.0 * self.data_source.baz, self.data_source.baz, self._actions, ], dim=-1, ) def process_actions(self, actions: torch.Tensor): self._actions[:] = actions self._processed_action = 3 * actions def apply_actions(self): self._output = self._processed_action + 2 def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, bool]: done = False self.process_actions(actions) for _ in range(self._decimation): self.apply_actions() self.data_source.step() self._step_count += 1 if self._step_count >= self._reset_after_steps: self._step_count = 0 self._data_source.reset() self._actions[:] = torch.zeros_like(self._actions) done = True return self.compute_obs(), done ``` Note that `compute_obs()` uses only torch operations — it concatenates tensors from the data source together with the current actions. This is important because these operations will be traced during ONNX export. At this point, `DataSource` and `Environment` are plain Python classes with no dependency on exploy. The next step is to create an adapter that makes the environment exportable. --- ## Step 2 — Implement the Exportable Environment Adapter The exporter needs a standardized interface to trace observation computation, action processing, and simulation stepping. This interface is defined by the {py:class}`ExportableEnvironment ` abstract base class. Rather than modifying the original `Environment` class, create a thin adapter — `ExportableEnv` — that wraps it and delegates each method: ```python class ExportableEnv(ExportableEnvironment): """Adapter that makes an Environment exportable to ONNX.""" def __init__(self, env: Environment): super().__init__() self._env = env @property def env(self) -> Environment: return self._env # -- Required abstract methods ------------------------------------------------- def compute_observations(self) -> torch.Tensor: return self.env.compute_obs() def process_actions(self, actions: torch.Tensor): return self.env.process_actions(actions) def apply_actions(self): return self.env.apply_actions() def step(self, actions: torch.Tensor) -> tuple[torch.Tensor, bool]: return self.env.step(actions) @property def decimation(self) -> int: return self.env._decimation def prepare_export(self): pass def empty_actor_observations(self) -> torch.Tensor: return torch.zeros_like(self.env.compute_obs()) def empty_actions(self) -> torch.Tensor: return torch.zeros_like(self.env._actions) def metadata(self) -> dict: return {"env_name": "Env", "version": "1.0"} def register_evaluation_hooks(self, update, reset, evaluate_substep): pass def get_observation_names(self) -> list[str]: obs1_names = [f"foo_{i}" for i in range(self.env.data_source.foo.shape[-1])] obs2_names = [f"bar_{i}" for i in range(self.env.data_source.bar.shape[-1])] obs3_names = [f"baz_{i}" for i in range(self.env.data_source.baz.shape[-1])] obs4_names = [f"actions_{i}" for i in range(self.env._actions.shape[-1])] return obs1_names + obs2_names + obs3_names + obs4_names def observations_reset(self) -> torch.Tensor: return self.env.compute_obs() ``` A few things to note about the methods: - **`compute_observations()`** — Must use only traceable torch operations so the computation graph can be captured during ONNX export. The adapter delegates to `env.compute_obs()`, which already satisfies this requirement. - **`process_actions()` / `apply_actions()`** — Split the action pipeline so the exporter can trace the path from raw actions to commanded outputs at the simulation time-step rate. - **`decimation`** — The number of simulation sub-steps per policy step (e.g., policy runs at 50 Hz while simulation runs at 200 Hz → decimation = 4). - **`step()`** — Called during evaluation to advance the environment. Returns updated observations and a flag indicating whether a reset occurred. - **`metadata()`** — Arbitrary key-value pairs that are embedded in the exported ONNX file. --- ## Step 3 — Register Inputs, Outputs, and Memory The exporter traces the torch operations inside `compute_observations()` to build the ONNX graph, but it also needs to know which tensors correspond to named inputs and outputs of the graph. This is where the context manager comes in. Looking at `compute_obs()` from Step 1, the observation is computed from three state tensors (`foo`, `bar`, `baz`) and the previous actions: ```python torch.cat([ self.data_source.foo + 1.0, self.data_source.bar + 2.0 * self.data_source.baz, self.data_source.baz, self._actions, ], dim=-1) ``` Each of these tensors needs to be declared so the exporter knows how to feed data into the ONNX graph at inference time. Additionally, the environment produces an output (`self._output`) that we want to read back after each inference call. We register all of these on the {py:class}`ContextManager ` using three component types: | Component | Direction | Purpose | |-----------|-----------|---------| | {py:class}`Input ` | Into the graph | Raw state data read from the robot/simulator | | {py:class}`Output ` | Out of the graph | Computed values to send to the robot/simulator | | {py:class}`Memory ` | Both | Values that persist across inference calls | `foo`, `bar`, and `baz` come from the data source and change every step — these are **inputs**. The processed action result `self._output` is something the controller needs to read back — this is an **output**. The previous actions (`self._actions`) are used in the observation computation *and* need to persist from one inference call to the next — this is **memory**. Register them on the exportable environment's context manager: ```python data_source = DataSource() env = Environment(data_source=data_source) exp_env = ExportableEnv(env=env) exp_env.context_manager().add_components( [ # foo is an input: it feeds into compute_observations() from the data source. Input( name="foo", get_from_env_cb=lambda: exp_env.env.data_source.foo, metadata={"description": "The foo tensor from the data source."}, ), # out is an output: it carries the result of apply_actions() back to the caller. Output( name="out", get_from_env_cb=lambda: exp_env.env._output, metadata={"description": "The output tensor computed from the actions."}, ), # actions is memory: it is used inside compute_observations() and must persist # across inference calls because the observation includes the previous actions. Memory( name="actions", get_from_env_cb=lambda: exp_env.env._actions, ), ] ) ``` Each component takes a `name` and a `get_from_env_cb` callback that returns the corresponding tensor from the environment. Optionally, you can attach `metadata` as a dictionary — these key-value pairs are embedded in the exported ONNX file and can be read at deployment time. ### Grouping Related Inputs Related inputs can be organized into a {py:class}`Group `. This is purely organizational and does not affect the export — it simply lets you add multiple inputs together with shared metadata: ```python exp_env.context_manager().add_group( Group( name="bar_baz_group", items=[ Input( name="bar", get_from_env_cb=lambda: exp_env.env.data_source.bar, ), Input( name="baz", get_from_env_cb=lambda: exp_env.env.data_source.baz, ), ], metadata={"description": "A group of related inputs."}, ) ) ``` --- ## Step 4 — Create an Actor The actor is the policy network. Subclass {py:class}`ExportableActor ` and implement `forward()`: ```python class Actor(ExportableActor): """A simple MLP policy network.""" def __init__(self, num_obs: int, num_act: int): super().__init__() self._net = torch.nn.Sequential( torch.nn.Linear(num_obs, 10), torch.nn.ELU(), torch.nn.Linear(10, 10), torch.nn.ReLU(), torch.nn.Linear(10, num_act), torch.nn.ELU(), ) def forward(self, x: torch.Tensor) -> torch.Tensor: return self._net(x) ``` Create the actor in evaluation mode: ```python actor = Actor(num_obs=env.num_obs, num_act=env.num_act).eval() ``` Calling `.eval()` is important — it disables dropout and batch-norm training behavior, ensuring deterministic inference during export and evaluation. --- ## Step 5 — Export to ONNX With the exportable environment, context manager, and actor ready, export everything to a single ONNX file using {py:func}`export_environment_as_onnx() `: ```python export_path = pathlib.Path("/tmp/exploy_tutorial") onnx_file_name = "policy.onnx" export_environment_as_onnx( env=exp_env, actor=actor, path=export_path, filename=onnx_file_name, verbose=False, ) ``` This traces all registered torch operations — observation computation, the actor network, action processing — and writes them into a single ONNX graph. The resulting file contains the full computation pipeline from raw inputs to commanded outputs, along with the metadata you registered. ### Inspecting and understanding the generated ONNX files When exporting an environment and its actor, Exploy produces `policy.onnx` (or whatever filename is passed to `export_environment_as_onnx()`). That exported file can then be used for evaluation (see [Putting It All Together](#putting-it-all-together)) or for deployment in a controller (see [Controller Tutorial](../controller/controller_tutorial.md)). The export contains two execution paths by design: **Default** and **ProcessActions**. The **Default** path runs at the policy rate and includes observation computation plus actor inference, while the **ProcessActions** path runs at the simulation rate and captures the computational graph for action post-processing and command application that occurs at every simulation sub-step. This separation preserves the original control timing (policy step vs. simulation sub-step) in the exported model. Additionally, Exploy will produce debug computational graphs. These are only to be used for debugging and visual inspection. - `debug/policy_default.onnx`: The `Default` path of the computational graph. - `debug/policy_process_actions.onnx`: The `ProcessActions` path of the computational graph. - `debug/policy_optimized.onnx`: The optimized version of the exported ONNX model. > **Note:** `debug/policy_optimized.onnx` is generated when `optimize=True` is passed to > {py:class}`SessionWrapper `. While fully > functional, it requires the same ONNX Runtime version to be used in both the exporter and the > controller. This constraint depends on the user's setup. > For deployment in a controller, only the `policy.onnx` file should be used. ### Visualizing the Exported ONNX Graph You can inspect the exported ONNX file using [Netron](https://github.com/lutzroeder/netron), an open-source viewer for neural network models. The screenshots below show the computational graphs produced by the export steps in this tutorial. The [unit test](https://github.com/bdaiinstitute/exploy/blob/main/python/exploy/exporter/core/tests/test_export_environment.py) runs on three different environments: - an environment that computes observations and uses an MLP actor - an environment that uses a torch module to compute parts of its observations and uses an MLP actor - an environment that uses a torch module to compute parts of its observations and uses an RNN-based actor Each exported ONNX file contains two subgraphs: a **Default** graph that computes observations and actions at the policy rate, and a **ProcessActions** graph that maps raw actions to commanded outputs at the simulation rate. #### Environment and MLP Actor This is the full default graph for the basic `Environment` from Step 1. You can see the named inputs (`foo`), the group inputs (`bar`, `baz`), and the memory inputs (`memory.actions.in`) flowing through the observation computation and into the actor network, which produces actions and the named outputs (`out`, `memory.actions.out`). The ProcessActions subgraph traces only `process_actions()` and `apply_actions()`. It runs at the simulation time-step rate (i.e., once per decimation sub-step) and maps actions to commanded outputs. ![Overview of the exported ONNX graph](exporter_tutorial_netron_env.svg) #### Environment with a torch module and MLP Actor When the environment includes a learnable `torch.nn.Module` (see the [Advanced: Using Torch Modules in Observations](#advanced-using-torch-modules-in-observations) section), the module's parameters and operations appear in the graph: ![Default graph for environment with torch module](exporter_tutorial_netron_env_with_module.svg) #### Environment with a torch module and an RNN Actor When using an RNN-based actor (see [Advanced: RNN-Based Actors](#advanced-rnn-based-actors)), the graph includes the LSTM hidden states as additional memory inputs and outputs: ![Default graph for environment with torch module and RNN actor](exporter_tutorial_netron_env_with_module_rnn_actor.svg) --- ## Step 6 — Load and Run the Exported Model Use {py:class}`SessionWrapper ` to load the ONNX file and run inference: ```python session_wrapper = SessionWrapper( onnx_folder=export_path, onnx_file_name=onnx_file_name, actor=actor, optimize=True, ) ``` Setting `optimize=True` enables ONNX Runtime graph optimizations for faster inference. The `actor` parameter is optional — it is stored for use during evaluation, but inference runs entirely through the ONNX graph. --- ## Step 7 — Evaluate the Export The {py:func}`evaluate() ` function runs the environment and the ONNX session side by side, comparing their outputs at every step: ```python with torch.inference_mode(): export_ok, observations = evaluate( env=exp_env, context_manager=exp_env.context_manager(), session_wrapper=session_wrapper, num_steps=20, verbose=True, pause_on_failure=False, ) ``` {py:func}`evaluate() ` returns: - **`export_ok`** (`bool`) — `True` if all outputs matched within tolerance across all steps. - **`observations`** (`torch.Tensor`) — The final observations after evaluation. If `export_ok` is `False`, the evaluator prints a detailed diagnostic showing which outputs diverged and at which step. --- ## Advanced: Using Torch Modules in Observations If your observation computation includes learnable torch modules (e.g., a normalization layer or a learned feature extractor), you need to register them so the exporter includes their parameters in the ONNX graph. Create an environment that uses a `torch.nn.Module` inside `compute_obs()`: ```python class EnvironmentWithTorchModule(Environment): """An environment that uses a torch Module in observation computation.""" def __init__(self, data_source: DataSource): module_dim_in = data_source.foo.shape[-1] module_dim_out = module_dim_in self._module = torch.nn.Linear(module_dim_in, module_dim_out) super().__init__(data_source=data_source) @property def module(self) -> torch.nn.Module: return self._module def compute_obs(self) -> torch.Tensor: return torch.cat( [ self._module(self.data_source.foo) + 1.0, self.data_source.bar + 2.0 * self.data_source.baz, self.data_source.baz, self._actions, ], dim=-1, ) ``` The `ExportableEnv` adapter from Step 2 works with this subclass without modification, since it delegates to `env.compute_obs()`. The key additional step is registering the module with the context manager **before** exporting: ```python data_source = DataSource() env = EnvironmentWithTorchModule(data_source=data_source) exp_env = ExportableEnv(env=env) actor = Actor(num_obs=env.num_obs, num_act=env.num_act).eval() # Register the module so its parameters are included in the ONNX graph. exp_env.context_manager().add_module(env.module) ``` After this, the standard export and evaluation workflow works unchanged — the module's learned weights will be embedded in the ONNX file. --- ## Advanced: RNN-Based Actors For policies that use recurrent networks (e.g., LSTM), the hidden state must persist across inference calls. Subclass {py:class}`ExportableActor ` and implement `reset()` and `get_state()`: ```python class RNNActor(ExportableActor): """An LSTM-based policy network with persistent hidden state.""" def __init__(self, num_obs: int, num_act: int): super().__init__() hidden_dim = 5 num_rnn_layers = 1 self._rnn = torch.nn.LSTM( input_size=num_obs, hidden_size=hidden_dim, num_layers=num_rnn_layers, batch_first=False, ) self._net = Actor(num_obs=hidden_dim, num_act=num_act) self._state = None def reset(self, dones: torch.Tensor): if self._state is None: return for state in self._state: state[:, dones, :] = 0.0 def forward(self, x: torch.Tensor) -> torch.Tensor: rnn_out, self._state = self._rnn(x.unsqueeze(dim=0), self._state) return self._net(rnn_out.squeeze(dim=0)) def get_state(self) -> tuple[torch.Tensor, ...] | None: return self._state ``` Two additional steps are required when exporting an RNN actor: 1. **Initialize the hidden state** by running a forward pass before export. 2. **Register the hidden state as memory** using {py:func}`add_actor_memory() `. ```python data_source = DataSource() env = EnvironmentWithTorchModule(data_source=data_source) exp_env = ExportableEnv(env=env) exp_env.context_manager().add_module(env.module) actor = RNNActor(num_obs=env.num_obs, num_act=env.num_act).eval() # Initialize the hidden state. actor(exp_env.empty_actor_observations()) # Register hidden states as memory in the context manager. add_actor_memory( context_manager=exp_env.context_manager(), get_hidden_states_func=actor.get_state, ) ``` {py:func}`add_actor_memory() ` reads the hidden state tensors from the actor and registers each one as a {py:class}`Memory ` component. This ensures the ONNX graph exposes them as both inputs and outputs, so the inference runtime can feed the previous hidden state back in at each step. After this setup, export and evaluate as usual. --- ## Putting It All Together Here is a complete helper function that ties all the steps together — registering components, exporting to ONNX, loading the model, and validating the export: ```python def export_and_evaluate( exp_env: ExportableEnv, actor: ExportableActor, onnx_file_name: str, num_eval_steps: int, ) -> bool: # Register inputs, outputs, and memory. exp_env.context_manager().add_components( [ Input( name="foo", get_from_env_cb=lambda: exp_env.env.data_source.foo, metadata={"description": "The foo tensor from the data source."}, ), Output( name="out", get_from_env_cb=lambda: exp_env.env._output, metadata={"description": "The output tensor computed from the actions."}, ), Memory( name="actions", get_from_env_cb=lambda: exp_env.env._actions, ), ] ) exp_env.context_manager().add_group( Group( name="bar_baz_group", items=[ Input( name="bar", get_from_env_cb=lambda: exp_env.env.data_source.bar, ), Input( name="baz", get_from_env_cb=lambda: exp_env.env.data_source.baz, ), ], metadata={"description": "A group of related inputs."}, ) ) # Export to ONNX. with tempfile.TemporaryDirectory() as tmpdir: onnx_path = pathlib.Path(tmpdir) / "exploy" export_environment_as_onnx( env=exp_env, actor=actor, path=onnx_path, filename=onnx_file_name, verbose=False, ) # Load the exported model. session_wrapper = SessionWrapper( onnx_folder=onnx_path, onnx_file_name=onnx_file_name, actor=actor, optimize=True, ) # Evaluate. with torch.inference_mode(): export_ok, _ = evaluate( env=exp_env, context_manager=exp_env.context_manager(), session_wrapper=session_wrapper, num_steps=num_eval_steps, verbose=False, pause_on_failure=False, ) return export_ok ``` ### Basic environment ```python data_source = DataSource() env = Environment(data_source=data_source) exp_env = ExportableEnv(env=env) actor = Actor(num_obs=env.num_obs, num_act=env.num_act).eval() assert export_and_evaluate(exp_env, actor, "policy.onnx", num_eval_steps=20) ``` ### Environment with a torch module ```python data_source = DataSource() env = EnvironmentWithTorchModule(data_source=data_source) exp_env = ExportableEnv(env=env) actor = Actor(num_obs=env.num_obs, num_act=env.num_act).eval() exp_env.context_manager().add_module(env.module) assert export_and_evaluate(exp_env, actor, "policy_with_module.onnx", num_eval_steps=20) ``` ### Environment with a torch module and an RNN actor ```python data_source = DataSource() env = EnvironmentWithTorchModule(data_source=data_source) exp_env = ExportableEnv(env=env) exp_env.context_manager().add_module(env.module) actor = RNNActor(num_obs=env.num_obs, num_act=env.num_act).eval() actor(exp_env.empty_actor_observations()) add_actor_memory( context_manager=exp_env.context_manager(), get_hidden_states_func=actor.get_state, ) assert export_and_evaluate(exp_env, actor, "policy_with_rnn.onnx", num_eval_steps=20) ```