A dynamic "Getting Started" Configuration file #269

@strawgate

Problem Description

If #268 merges, we could add a new getting-started.yml.example that the documentation can reference for the getting-started experience.

Proposed Solution

Introduce a configuration file, passed as --config getting-started.yml.example, with embedded environment variables that:

  1. Acts as an example for customers who want to use dynamic configuration for other purposes
  2. Allows configuring many common Crawler settings via environment variables (ENV=value bin/crawler ...):
    a. URL to be crawled: CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html"
    b. Output to local file: OUTPUT_DIR="/test"
    c. Output to Elasticsearch: ES_HOST="https://192.168.0.1"
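
The ENV=value bin/crawler ... form sets a variable for that single invocation only. A minimal shell sketch of that behavior follows. It uses sh -c with a stub command as a stand-in for bin/crawler, and it assumes a fallback of https://example.com when CRAWL_URL is unset (inferred from the "Crawl example.com" scenario). The stub and the fallback are illustrative assumptions; the real interpolation syntax inside the config file depends on what #268 introduces.

```shell
# Stand-in for bin/crawler (hypothetical): prints the URL it would crawl.
# The fallback to https://example.com is an assumption, not confirmed behavior.
crawler_stub='echo "crawling: ${CRAWL_URL:-https://example.com}"'

# No variable set: the stub falls back to the assumed default.
unset CRAWL_URL
sh -c "$crawler_stub"

# Variable set as a prefix: visible to this one command only.
CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" sh -c "$crawler_stub"

# The calling shell is unchanged afterwards.
echo "after: ${CRAWL_URL:-<unset>}"
```

Because the assignment is scoped to one command, quick-start users can try different URLs without exporting anything into their shell session.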

This would enable a number of new quick-start scenarios:

Crawl example.com and print the results to the console

bin/crawler --config getting-started.yml.example

Crawl your company's website and print the results to the console

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" bin/crawler --config getting-started.yml.example

Crawl your company's website and save it to a local directory

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" \
  OUTPUT_DIR=./local/dir \
  bin/crawler --config getting-started.yml.example

Crawl your company's website and save it to Elasticsearch

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" \
  ES_HOST="https://localhost:9200" \
  bin/crawler --config getting-started.yml.example

Crawl your company's website and save it to Elasticsearch with a custom index pipeline and a custom index

CRAWL_URL="https://www.elastic.co/guide/en/workplace-search/current/index.html" \
  ES_HOST="https://localhost:9200" \
  ES_INDEX="crawler-workplace-search" \
  ES_PIPELINE="my-ent-search-pipeline" \
  bin/crawler --config getting-started.yml.example
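
The scenarios above imply a simple rule for where results go: the console when nothing is set, a local directory when OUTPUT_DIR is set, and Elasticsearch when ES_HOST is set. A rough shell sketch of that rule is below. The function name select_output_sink is hypothetical, and giving ES_HOST precedence when both variables are set is an assumption, since this proposal does not say what should happen in that case.

```shell
# Hypothetical helper, not part of the crawler: reports which output
# the quick-start config would be expected to use for the current environment.
select_output_sink() {
  if [ -n "${ES_HOST:-}" ]; then
    # Assumption: ES_HOST wins if OUTPUT_DIR is also set.
    printf 'elasticsearch\n'
  elif [ -n "${OUTPUT_DIR:-}" ]; then
    printf 'file\n'
  else
    printf 'console\n'
  fi
}

# Try each scenario in a subshell so the variables do not leak.
( unset ES_HOST OUTPUT_DIR; select_output_sink )
( unset ES_HOST; OUTPUT_DIR=./local/dir; select_output_sink )
( unset OUTPUT_DIR; ES_HOST="https://localhost:9200"; select_output_sink )
```

Whatever rule is chosen, stating it in the example file's comments would spare users from guessing what happens when both variables are set.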

Alternatives

Additional Context

Metadata

    Labels

    enhancement: New feature or request
