# Pipeline

A pipeline is made up of [operators](/docs/operators/README.md). The pipeline defines how stanza should input, process, and output logs. 


## Linear Pipelines

Many stanza pipelines are a linear sequence of operators. Logs flow from one operator to the next, according to the order in which they are defined.

For example, the following pipeline will read logs from a file, parse them as `json`, and print them to `stdout`:
```yaml
pipeline:
  - type: file_input
    include: 
      - my-log.json
  - type: json_parser
  - type: stdout
```

Notice that every operator has a `type` field. An operator's `type` must always be specified.


## `id` and `output`

Linear pipelines are sufficient for many use cases, but stanza is also capable of processing non-linear pipelines. To use non-linear pipelines, you need to understand the `id` and `output` fields. Let's take a close look at these.

Each operator in a pipeline has a unique `id`. By default, `id` takes the same value as `type`. Alternatively, you can specify an `id` for any operator. If your pipeline contains multiple operators of the same `type`, then the `id` field must be used so that each `id` stays unique.

All operators (except output operators) support an `output` field. By default, the `output` field takes the value of the `id` of the next operator defined in the pipeline.

Let's look at how these default values work together by considering the linear pipeline shown above. The following pipeline is equivalent to it, although much more verbose:

```yaml
pipeline:
  - type: file_input 
    id: file_input
    include: 
      - my-log.json
    output: json_parser
  - type: json_parser
    id: json_parser
    output: stdout
  - type: stdout
    id: stdout
```

Additionally, we could accomplish the same task using custom `id`s:

```yaml
pipeline:
  - type: file_input
    id: my_file
    include: 
      - my-log.json
    output: my_parser
  - type: json_parser
    id: my_parser
    output: my_out
  - type: stdout
    id: my_out
```

We could even shuffle the order of operators, as long as we explicitly declare each `output`. This is a little counterintuitive, so it isn't recommended. However, it is shown here to highlight the fact that operators in a pipeline are ultimately connected by their `output` and `id` values, not by their position.

```yaml
pipeline:
  - type: stdout      # 3rd operator
    id: my_out
  - type: json_parser # 2nd operator
    id: my_parser
    output: my_out
  - type: file_input  # 1st operator
    id: my_file
    include: 
      - my-log.json
    output: my_parser
```

Finally, we could even remove some of the `id` and `output` fields and depend on the default values. This is even less readable, so again it is not recommended. However, it is provided here to demonstrate that default values can be depended upon.

```yaml
pipeline:
  - type: json_parser # 2nd operator
  - type: stdout      # 3rd operator
  - type: file_input  # 1st operator
    include: 
      - my-log.json
    output: json_parser
```

## Non-Linear Pipelines

Now that we understand how `id` and `output` work together, we can configure stanza to run more complex pipelines. The only structural constraint on a stanza pipeline is that it must be a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph).
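
To make the acyclic requirement concrete, here is a sketch of a configuration that violates it. The ids `parser_a` and `parser_b` are hypothetical, and how stanza reports the problem is not covered here.

```yaml
pipeline:
  - type: file_input
    include:
      - my-log.json
    output: parser_a

  - type: json_parser
    id: parser_a      # hypothetical id
    output: parser_b

  - type: json_parser
    id: parser_b      # hypothetical id
    output: parser_a  # invalid: flows back to parser_a, forming a cycle

  - type: stdout
```

Pointing `parser_b` at `stdout` instead removes the cycle.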

Let's consider a pipeline with two inputs and one output:
```yaml
pipeline:
  - type: file_input
    include: 
      - my-log.json
    output: stdout # flow directly to stdout

  - type: windows_eventlog_input
    channel: security
    # implicitly flow to stdout

  - type: stdout
```
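
The reverse shape, one input feeding two outputs, might be sketched in the same way. This sketch assumes that `output` accepts a list of ids and that a `file_output` operator with a `path` field is available. Neither is stated on this page, so check the operator documentation before relying on it. The id `my_file_out` and the path are hypothetical.

```yaml
pipeline:
  - type: file_input
    include:
      - my-log.json
    output:                  # assumption: a list of ids is accepted here
      - stdout
      - my_file_out

  - type: stdout

  - type: file_output        # assumption: operator type and `path` field
    id: my_file_out          # hypothetical id
    path: my-log-copy.json   # hypothetical path
```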

Here's another, where we read from two files that should be parsed differently:
```yaml
pipeline:
  # Read and parse a JSON file
  - type: file_input
    id: file_input_one
    include: 
      - my-log.json
  - type: json_parser
    output: stdout # flow directly to stdout
  
  # Read and parse a text file
  - type: file_input
    id: file_input_two
    include: 
      - my-other-log.txt
  - type: regex_parser
    regex: ... # regex appropriate to file format
    # implicitly flow to stdout

  # Print
  - type: stdout
```

Finally, in some cases, you might expect multiple log formats to come from a single input. In that case, use the [router](/docs/operators/router.md) operator, which lets you define multiple "routes", each with its own `output`.


```yaml
pipeline:
  # Read log file
  - type: file_input
    include: 
      - my-log.txt

  # Route based on log type
  - type: router
    routes:
      - expr: '$record startsWith "ERROR"'
        output: error_parser
      - expr: '$record startsWith "INFO"'
        output: info_parser

  # Parse logs with format one
  - type: regex_parser
    id: error_parser
    regex: ... # regex appropriate to parsing error logs
    output: stdout # flow directly to stdout

  # Parse logs with format two
  - type: regex_parser
    id: info_parser
    regex: ... # regex appropriate to parsing info logs
    output: stdout # flow directly to stdout

  # Print
  - type: stdout
```