pipe-segment


This repository contains the segment pipeline, a dataflow pipeline which divides vessel tracks into contiguous "segments", separating out noise and signals that may come from two or more vessels which are broadcasting using the same MMSI at the same time.
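As a toy illustration of the core idea (not the actual Dataflow implementation), a track can be split into segments wherever the gap between consecutive messages exceeds a threshold; the function name and threshold below are hypothetical:

```python
from datetime import datetime, timedelta

def split_track(messages, max_gap=timedelta(hours=24)):
    """Split a time-ordered list of (timestamp, lat, lon) messages into
    contiguous segments wherever the time gap exceeds max_gap.

    Toy sketch only: the real segmenter also weighs spatial consistency
    to separate vessels broadcasting with the same MMSI."""
    segments, current = [], []
    for msg in messages:
        # A gap larger than max_gap starts a new segment.
        if current and msg[0] - current[-1][0] > max_gap:
            segments.append(current)
            current = []
        current.append(msg)
    if current:
        segments.append(current)
    return segments

track = [
    (datetime(2023, 1, 1, 0, 0), 10.0, 20.0),
    (datetime(2023, 1, 1, 6, 0), 10.1, 20.1),
    (datetime(2023, 1, 5, 0, 0), 35.0, -40.0),  # long gap -> new segment
]
print(len(split_track(track)))  # 2
```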

Usage

System Dependencies

Install Docker Engine following the official Docker instructions (avoid snap packages), along with the Docker Compose plugin. No other system dependencies are required.

Installation

First, make sure you have git installed and an SSH key configured for GitHub.

Then, clone the repository:

git clone git@github.com:GlobalFishingWatch/pipe-segment.git

Create a virtual environment and activate it:

python -m venv .venv
. ./.venv/bin/activate

Install the dependencies:

make install

Make sure you can run the unit tests:

make test

Make sure you can build the docker image:

make docker-build

To connect to BigQuery, authenticate and configure the project:

make docker-gcp

Check the examples folder to see how to run the pipeline.

CLI

The pipeline includes a CLI that can be used to start both local test runs and remote full runs.

With docker compose run dev --help you can see the available processes:

$ docker compose run dev --help
Available Commands
  segment                     run the segmenter in dataflow
  segment_identity_daily      generate daily summary of identity messages
                              per segment
  segment_vessel_daily        generate daily vessel_ids per segment
  segment_info                create a segment_info table with one row
                              per segment
  vessel_info                 create a vessel_info table with one row
                              per vessel_id
  segment_vessel              Create a many-to-many table mapping between
                              segment_id, vessel_id and ssvid

To see the parameters of one of the processes, run, for example:

docker compose run dev segment --help

How to contribute

The Makefile should ease the development process.

Git Workflow

Please refer to our git workflow documentation to know how to manage branches in this repository.

Updating dependencies

The requirements.txt file pins all transitive dependencies to specific versions. It is compiled automatically with pip-tools from requirements/prod.in.

Declare high-level dependencies and their version restrictions in requirements/prod.in. Do not modify requirements.txt manually.

To re-compile dependencies, run:

make reqs

To upgrade all dependencies to the latest versions compatible with the declared restrictions, run:

make reqs-upgrade
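For reference, entries in requirements/prod.in are ordinary pip requirement specifiers; the package and version bound below are illustrative, not the repository's actual contents:

```
# requirements/prod.in (illustrative example, not the real file)
# Pin only the high-level constraint; pip-compile resolves the rest.
apache-beam[gcp]>=2.40,<3
```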

Schema

To get the schema of an existing BigQuery table, use something like this:

bq show --format=prettyjson world-fishing-827:pipeline_measures_p_p516_daily.20170923 | jq '.schema'
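The jq filter extracts the table's schema object, which is a JSON document with a fields array; the field names below are illustrative, not the actual table schema:

```json
{
  "fields": [
    {"name": "msgid", "type": "STRING", "mode": "NULLABLE"},
    {"name": "timestamp", "type": "TIMESTAMP", "mode": "REQUIRED"}
  ]
}
```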
