Skip to content

Output Module

gustavo-toro edited this page Apr 16, 2024 · 1 revision

Output Module

The Output module is in charge of constructing the ECS and enumerating the results. The main components of this module are the ECS, the node manager, the enumerator and the mapping. This module is defined in src/rematch/output_enumeration.

ECS

ECS stands for enumerable compact set which is the structure used to store the mappings to be enumerated. The ECS is constructed from the algorithm class. In the evaluation of the automaton, as the document is read, the nodes are created in the ECS. It is defined in ecs.hpp.

There are 3 main functions to create nodes: create_bottom_node, create_extend_node and create_union_node. The bottom node is a node that represents the end of the result and it is created in the constructor of the algorithm class. The extend node is a node containing variable markers and a position in the document. The union node is a node that has 2 children, so that when enumerating from the union node, there are 2 results.

Each node has a reference count which denotes the number of objects that hold a pointer to it. An object pointing to a node might be an extended VA state or the algorithm which stores the final node.

Node manager

The node manager is in charge of allocating ECS nodes and recycling them when they are no longer in use. It is defined in node_manager.hpp. To store the nodes, it uses minipools which are containers for the nodes. The node manager has a linked list of minipools. When the minipool head of the list is full, then the node manager adds another minipool to the list and updates the head. Each minipool created is twice the size of the previous one.

The node manager has also a linked list of recyclable nodes. When the reference count of a node reaches 0, the node is added to the list. Everytime the ECS asks for a node, the node manager checks if there is a node in the recyclable list. If there is one, then it resets its data and passes it to the ECS.

Enumerator

The enumerator is in charge of enumerating the results by traversing the ECS. When an output is found in the algorithm, it calls the next function in the enumerator to get the next mapping. The enumerator is defined in enumerator.hpp.

The enumerator traverses the ECS using DFS starting from a node in its stack. As the enumerator traverses the ECS, it adds the markers and the document position from each output node to the Mapping object, which stores them as an annotation. When the enumerator reaches the bottom node, it returns the mapping created to the algorithm.

If a union node was processed in the previous output, then the annotations added after the union node are removed from the mapping.

Mapping

The mapping is the result of the enumerator. It contains a vector of annotations which represent the mapping. An annotation is a set of variable markers and a position in the document. The construct_mapping function constructs the mapping as a map that associates variable IDs to spans. It is defined in mapping.hpp. The mapping with multi-span capturing support is defined in extended_mapping.hpp.

Clone this wiki locally