# Pipeline

A pipeline is made up of [operators](/docs/operators/README.md). The pipeline defines how stanza should input, process, and output logs.

## Linear Pipelines

Many stanza pipelines are a linear sequence of operators. Logs flow from one operator to the next, in the order in which the operators are defined.

For example, the following pipeline will read logs from a file, parse them as `json`, and print them to `stdout`:

```yaml
pipeline:
  - type: file_input
    include:
      - my-log.json
  - type: json_parser
  - type: stdout
```

Notice that every operator has a `type` field. The `type` of an operator must always be specified.

## `id` and `output`

Linear pipelines are sufficient for many use cases, but stanza is also capable of processing non-linear pipelines. To use non-linear pipelines, the `id` and `output` fields must be understood. Let's take a close look at these.

Each operator in a pipeline has a unique `id`. By default, `id` will take the same value as `type`. Alternatively, you can specify an `id` for any operator. If your pipeline contains multiple operators of the same `type`, then the `id` field must be used.

All operators (except output operators) support an `output` field. By default, the `output` field takes the value of the next operator's `id`.

Let's look at how these default values work together by considering the linear pipeline shown above. The following pipeline would be exactly the same (although much more verbosely defined):

```yaml
pipeline:
  - type: file_input
    id: file_input
    include:
      - my-log.json
    output: json_parser
  - type: json_parser
    id: json_parser
    output: stdout
  - type: stdout
    id: stdout
```

Additionally, we could accomplish the same task using custom `id`s.

```yaml
pipeline:
  - type: file_input
    id: my_file
    include:
      - my-log.json
    output: my_parser
  - type: json_parser
    id: my_parser
    output: my_out
  - type: stdout
    id: my_out
```

We could even shuffle the order of operators, so long as we're explicitly declaring each output.
This is a little counterintuitive, so it isn't recommended. However, it is shown here to highlight the fact that operators in a pipeline are ultimately connected via `output`s and `id`s.

```yaml
pipeline:
  - type: stdout      # 3rd operator
    id: my_out
  - type: json_parser # 2nd operator
    id: my_parser
    output: my_out
  - type: file_input  # 1st operator
    id: my_file
    include:
      - my-log.json
    output: my_parser
```

Finally, we could even remove some of the `id`s and `output`s, and depend on the default values. This is even less readable, so again it is not recommended. However, it is provided here to demonstrate that default values can be depended upon.

```yaml
pipeline:
  - type: json_parser # 2nd operator
  - type: stdout      # 3rd operator
  - type: file_input  # 1st operator
    include:
      - my-log.json
    output: json_parser
```

## Non-Linear Pipelines

Now that we understand how `id` and `output` work together, we can configure stanza to run more complex pipelines. Technically, the structure of a stanza pipeline is limited only in that it must be a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph).

Let's consider a pipeline with two inputs and one output:

```yaml
pipeline:
  - type: file_input
    include:
      - my-log.json
    output: stdout # flow directly to stdout

  - type: windows_eventlog_input
    channel: security
    # implicitly flow to stdout

  - type: stdout
```

Here's another, where we read from two files that should be parsed differently:

```yaml
pipeline:
  # Read and parse a JSON file
  - type: file_input
    id: file_input_one
    include:
      - my-log.json
  - type: json_parser
    output: stdout # flow directly to stdout

  # Read and parse a text file
  - type: file_input
    id: file_input_two
    include:
      - my-other-log.txt
  - type: regex_parser
    regex: ... # regex appropriate to file format
    # implicitly flow to stdout

  # Print
  - type: stdout
```

Finally, in some cases, you might expect multiple log formats to come from a single input.
One solution uses the [router](/docs/operators/router.md) operator. The `router` operator lets you define multiple "routes", each of which has its own `output`.

```yaml
pipeline:
  # Read log file
  - type: file_input
    include:
      - my-log.txt

  # Route based on log type
  - type: router
    routes:
      - expr: '$record startsWith "ERROR"'
        output: error_parser
      - expr: '$record startsWith "INFO"'
        output: info_parser

  # Parse logs with format one
  - type: regex_parser
    id: error_parser
    regex: ... # regex appropriate to parsing error logs
    output: stdout # flow directly to stdout

  # Parse logs with format two
  - type: regex_parser
    id: info_parser
    regex: ... # regex appropriate to parsing info logs
    output: stdout # flow directly to stdout

  # Print
  - type: stdout
```
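
To make the defaulting rules from the `id` and `output` section concrete, here is a minimal Python sketch of how they could be resolved into a graph. It is an illustration only, not stanza's implementation: `resolve_pipeline`, `is_acyclic`, and the `OUTPUT_TYPES` set are hypothetical names, the sketch assumes `stdout` is the only output operator, and it ignores the `routes` of a `router`.

```python
# Hypothetical set of output operator types; `stdout` is the only one used here.
OUTPUT_TYPES = {"stdout"}


def resolve_pipeline(operators):
    """Return {operator id: [downstream ids]} after applying the defaults."""
    # `id` defaults to the operator's `type`.
    resolved = [{**op, "id": op.get("id", op["type"])} for op in operators]

    ids = [op["id"] for op in resolved]
    if len(ids) != len(set(ids)):
        raise ValueError("every operator needs a unique id")

    graph = {}
    for index, op in enumerate(resolved):
        if op["type"] in OUTPUT_TYPES:
            outputs = []  # output operators have no `output` field
        elif "output" in op:
            outputs = [op["output"]]
        elif index + 1 < len(resolved):
            # `output` defaults to the id of the next operator in the list.
            outputs = [resolved[index + 1]["id"]]
        else:
            # Assumption of this sketch: a trailing non-output operator is an error.
            raise ValueError(f"operator {op['id']!r} has no output")
        for target in outputs:
            if target not in ids:
                raise ValueError(f"unknown output {target!r}")
        graph[op["id"]] = outputs
    return graph


def is_acyclic(graph):
    """Kahn's algorithm: the graph is a DAG if every node can be removed."""
    indegree = {node: 0 for node in graph}
    for outputs in graph.values():
        for target in outputs:
            indegree[target] += 1
    ready = [node for node, degree in indegree.items() if degree == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for target in graph[node]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return removed == len(graph)


# The shuffled pipeline that relies on default values.
shuffled = [
    {"type": "json_parser"},
    {"type": "stdout"},
    {"type": "file_input", "include": ["my-log.json"], "output": "json_parser"},
]
graph = resolve_pipeline(shuffled)
print(graph)
print(is_acyclic(graph))
```

The resolved graph connects `file_input` to `json_parser` to `stdout` regardless of the order in which the operators are listed, which is why the shuffled examples behave the same as the linear one.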