Merged
8 changes: 6 additions & 2 deletions mkdocs.yaml
@@ -5,10 +5,14 @@ repo_url: https://github.com/datajoint-company/documentation
repo_name: datajoint-company/documentation
nav:
- Welcome: welcome.md
- Concepts:
- Datatypes: concepts/datatypes.md
- Core: core.md
- Elements: elements.md
- Concepts:
- Mantra: concepts/mantra.md
- Query Language:
- Datatypes: concepts/query-lang/datatypes.md
- Referential Integrity:
- Query Backend: concepts/ref-integrity/query-backend.md
- Glossary: glossary.md
- Community:
- Contribution: community/contribution.md
100 changes: 99 additions & 1 deletion src/community/contribution.md
@@ -4,4 +4,102 @@ Thank you for your interest in contributing! 🤝

To help keep everyone in alignment and coordinated in the community effort, we’ve created this document. It serves as the contribution guideline that outlines how open-source software development is to be conducted. Any software development that makes reference to this document can be assumed to adopt the policies outlined below. We’ve structured the guideline in a FAQ (frequently asked questions) format to make it easier to digest. Feel free to review the questions below to determine any specific policy.

The principal maintainer of DataJoint and associated tools is the DataJoin company. The pronouns “we” and “us” in this guideline refer to the principal maintainers. We invite reviews and contributions of the open-source software. We compiled these guidelines to make this work clear and efficient.
The principal maintainer of DataJoint and associated tools is the DataJoint company. The pronouns “we” and “us” in this guideline refer to the principal maintainers. We invite reviews and contributions of the open-source software. We compiled these guidelines to make this work clear and efficient.

## 1) Which issue should I contribute towards?

There are three primary things to consider when looking to contribute.

**Availability:** An indication of whether anyone is currently working on a fix for the given issue. Availability is indicated by who is `assigned`. An issue that is `unassigned` has no one working on it yet and is available for someone to pick up. If an issue has been assigned, then any additional work on that issue should be coordinated with the assignee.

**Specification:** In order for issues to be properly addressed, the requirements for satisfying and closing the issue should be clear. If they are not, the issue will be labeled `unspecified`. This could be because more debug info is necessary, more details on intended behavior are needed, or further discussion is required to determine a good solution. Feel free to help us arrive at a proper specification.

**Priority:** As a community, we work in a concerted effort toward realizing the milestones. We use milestones as a planning tool to help focus a group of changes around a release. To determine the priority of an issue, look at its milestone: issues in the next milestone expected to arrive have the highest priority, and each later milestone is correspondingly lower in priority. Bear in mind that, much like a hurricane forecast, the execution plan is more likely to be accurate for milestones close to today’s date than for milestones further out. Extremely low priority issues are assigned to the `Backburner` milestone. Since `Backburner` does not have a target date, its issues may be deferred indefinitely. Occasionally the maintainers will move issues out of `Backburner` when it makes sense to address them within a release. Also, issues `unassigned` to a milestone can be understood as new issues which have not been triaged.

After considering the above, you may comment on the issue you’d like to help fix and a maintainer will assign it to you.

## 2) What is the proper etiquette for proposing changes as contribution?

The following is generally expected of new contributions:

Any proposed contributor changes should be introduced in the form of a pull request (PR) from their fork.

Proper branch target specified. The following are generally the available branches that can be targeted:

- `main` or `master`: Represents the single source of truth and the latest in completed development.

- `pre`: Represents the source at the point of the last stable release.

For larger, more involved changes, a maintainer may determine it best to create a feature-specific branch and adjust the PR accordingly.

A summary description that describes the overall intent behind the PR.

Proper links to the issue(s) that the PR serves to resolve.

Newly introduced changes must pass any required checks. Typically as it relates to tests, this means:

1. No syntax errors
1. No integration errors
1. No style errors e.g. PEP8, etc.
1. Similar or better code coverage

Additional documentation to reflect new feature or behavior introduced.

Necessary updates to the changelog following [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) convention.

A contributor should not approve or merge their own PR.

Reviewer suggestions or feedback should not be directly committed to a branch on a contributor’s fork. A less intrusive way to collaborate would be for the reviewer to PR to the contributor’s fork/branch that is associated with the main PR currently in review.

Maintainers will also ensure that PRs have the appropriate assignment for reviewer, milestone, and project.

## 3) How can I track the progress of an issue that has been assigned?

Whereas milestones represent the development plan, projects represent the actual execution. Projects are typically fixed-time sprints (1-2 weeks). A ‘workable’ number of issues that have been assigned to developers and assigned to the next milestone are selected and tracked in each project to provide greater granularity in the week-to-week progress. Automation follows the `Automated kanban with reviews` template. Maintainers will adjust the project assignment to reflect the order in which to resolve the milestone issues.

## 4) What is the release process? How do I know when my merged contribution will officially make it into a release?
Releases follow the standard definition of [semantic versioning](https://semver.org/spec/v2.0.0.html). Meaning:

`MAJOR` . `MINOR` . `PATCH`

- `MAJOR` version when you make incompatible API changes,

- `MINOR` version when you add functionality in a backwards compatible manner, and

- `PATCH` version when you make backwards compatible bug fixes.
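
As a rough illustration of the rules above, the following Python sketch bumps a version string according to the kind of change. The `bump` helper is hypothetical and is not part of any DataJoint tooling; it also assumes plain `MAJOR.MINOR.PATCH` strings without pre-release or build suffixes.

```python
def bump(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if change == "minor":  # backwards compatible functionality
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backwards compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.4.2", "patch"))  # 1.4.3
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "major"))  # 2.0.0
```

Note that a `MINOR` bump resets `PATCH` to zero, and a `MAJOR` bump resets both lower components.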

Each release requires tagging the commit appropriately and is then published through the usual release channel, e.g. PyPI, npm, Yarn, GitHub Releases, etc.

Minor releases are triggered when all the issues assigned to a milestone are resolved and closed. Patch releases are triggered periodically from `main` or `master` after a reasonable number of PR merges have come in.

## 5) I am not yet too comfortable contributing but would like to engage the community. What is the policy on community engagement?

In order to follow the appropriate process and setting, please reference the following flow for your desired mode of engagement:

### 5a) Generally, how do I perform __________?

If the documentation does not provide clear enough instruction, please see StackOverflow posts related to the [datajoint](https://stackoverflow.com/questions/tagged/datajoint) tag or ask a new question tagging it appropriately. You may refer to our [datajoint tag wiki](https://stackoverflow.com/tags/datajoint/info) for more details on its proper use.

### 5b) I just encountered this error, how can I resolve it?

Please see StackOverflow posts related to the [datajoint](https://stackoverflow.com/questions/tagged/datajoint) tag or ask a new question tagging it appropriately. You may refer to our [datajoint tag wiki](https://stackoverflow.com/tags/datajoint/info) for more details on its proper use.

### 5c) I just encountered this error and I am sure it is a bug, how do I report it?

Please file it under the issue tracker associated with the open-source software.

### 5d) I have an idea or new feature request, how do I submit it?

Please file it under the issue tracker associated with the open-source software.

### 5e) I am curious why the maintainers choose to __________? i.e. questions that are ‘opinionated’ in nature with answers that some might disagree with.

Please join the community on the [DataJoint Slack](https://join.slack.com/t/datajoint/shared_invite/enQtMjkwNjQxMjI5MDk0LTQ3ZjFiZmNmNGVkYWFkYjgwYjdhNTBlZTBmMWEyZDc2NzZlYTBjOTNmYzYwOWRmOGFmN2MyYzU0OWQ0MWZiYTE) and ask on the most relevant channel. There, you may engage directly with the maintainers for proper discourse.

### 5f) What is the timeline or roadmap for the release of certain supported features?

Please refer to milestones and projects associated with the open-source software.

### 5g) I need urgent help best suited for live debugging, how can I reach out directly?

Please join the community on the [DataJoint Slack](https://join.slack.com/t/datajoint/shared_invite/enQtMjkwNjQxMjI5MDk0LTQ3ZjFiZmNmNGVkYWFkYjgwYjdhNTBlZTBmMWEyZDc2NzZlYTBjOTNmYzYwOWRmOGFmN2MyYzU0OWQ0MWZiYTE) and ask on the most relevant channel. Please bear in mind that as open-source community software, availability of the maintainers might be limited.
56 changes: 56 additions & 0 deletions src/concepts/mantra.md
@@ -0,0 +1,56 @@
# Mantra

The *DataJoint Mantra* consists of three main objectives:

- Simplify your queries through an intuitive [query language](./#query-language).
- Make automated, [reproducible computation](./#reproducible-computation) by integrating computation with the data model.
- Ensure validity of your data through [referential integrity](./#referential-integrity).

## Query Language

Writing good, optimized [SQL](https://en.wikipedia.org/wiki/SQL) queries can be difficult and often becomes a barrier for individuals lacking experience in [computer science](https://en.wikipedia.org/wiki/Computer_science) and programming. That said, we don't feel this should discourage the use of databases. Databases help to structure our daily lives, which reduces the time required to glean insights and build robust applications from truth. [SQL](https://en.wikipedia.org/wiki/SQL) is powerful but requires practice, which we feel is the real fault in the language.

To address this, the DataJoint query language serves as a query builder and optimizer for [SQL](https://en.wikipedia.org/wiki/SQL). It leverages the stack's own operator precedence and combines it with both operator overloading and [SQL](https://en.wikipedia.org/wiki/SQL) algebra to achieve a more intuitive experience. Additionally, interoperability between Python and MATLAB is crucial due to the diversity of tools available to scientists, so much so that interoperability is a guiding principle in [FAIR](https://www.go-fair.org/fair-principles/).

Case in point, here is a comparison of equivalent queries:

*SQL*

```sql
SELECT *
FROM `shapes`.`rectangle`
NATURAL JOIN `shapes`.`area`
WHERE (
(`shape_area`=8) AND (`shape_height`=2)
);
```

*DataJoint (Python)*
```python
Rectangle * Area & dict(shape_height=2, shape_area=8)
```

*DataJoint (MATLAB)*
```matlab
shapes.Rectangle * shapes.Area & struct('shape_height', 2, 'shape_area', 8)
```
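
To give a feel for how operator overloading and the language's operator precedence can produce such compact expressions, here is a toy query builder in plain Python. It is only a sketch and not DataJoint's implementation: the `Query` class and its `to_sql` method are hypothetical names, and the builder only emits SQL text without ever connecting to a database.

```python
class Query:
    """Toy query builder; only emits SQL text, never touches a database."""

    def __init__(self, source: str, conditions: tuple = ()):
        self.source = source
        self.conditions = conditions

    def __mul__(self, other: "Query") -> "Query":
        # `*` stands for a natural join of the two sources.
        return Query(
            f"{self.source} NATURAL JOIN {other.source}",
            self.conditions + other.conditions,
        )

    def __and__(self, restriction: dict) -> "Query":
        # `&` restricts the result to rows matching every name/value pair.
        extra = tuple(f"(`{name}`={value!r})" for name, value in restriction.items())
        return Query(self.source, self.conditions + extra)

    def to_sql(self) -> str:
        sql = f"SELECT * FROM {self.source}"
        if self.conditions:
            sql += " WHERE (" + " AND ".join(self.conditions) + ")"
        return sql


Rectangle = Query("`shapes`.`rectangle`")
Area = Query("`shapes`.`area`")

# In Python `*` binds tighter than `&`, so the join is built before the restriction.
query = Rectangle * Area & dict(shape_height=2, shape_area=8)
print(query.to_sql())
```

Because Python evaluates `*` before `&`, no parentheses are needed to express "join first, then restrict".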

## Reproducible Computation

Reproducibility is a key concept within the scientific community since research is largely conducted, shared, and reviewed in the public domain. This is necessary to independently validate discoveries and have others support new findings. Such a practice is well advocated in the scientific community as [open science](https://en.wikipedia.org/wiki/Open_science).

Yet, reliably reproducing computed results of others has proven difficult since there are many factors that affect the determinism of a process e.g. hardware, software environment, scripts, input data, seeding, etc.

DataJoint pipelines address these challenges by allowing a computation to be defined such that it is associated *with* an entity. By drawing relationships between many entities, we can create a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) that describes a compute workflow as an [entity-relationship model](https://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model).

For instance, an entity such as `Area` could represent the computed value of a parent entity, `Rectangle`. We therefore feel it is reasonable, when defining `Area`, to include the specification of a computation that automates how `Area` is generated from its relation to `Rectangle`.
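
The idea can be sketched in plain Python without any database. This is an illustration under stated assumptions rather than DataJoint code: the dictionaries stand in for the `Rectangle` and `Area` entities, `shape_width` is an attribute assumed here for the sake of the example, and `make_area` and `populate` are hypothetical helper names.

```python
# Hypothetical in-memory stand-ins for the `Rectangle` and `Area` entities.
rectangles = {
    1: {"shape_height": 2, "shape_width": 4},  # `shape_width` is an assumed attribute
    2: {"shape_height": 3, "shape_width": 5},
}
areas = {}


def make_area(rectangle: dict) -> dict:
    """The computation kept alongside the definition of `Area`."""
    return {"shape_area": rectangle["shape_height"] * rectangle["shape_width"]}


def populate(parents: dict, children: dict, make) -> None:
    """Compute a child row for every parent row that does not have one yet."""
    for key, parent_row in parents.items():
        if key not in children:
            children[key] = make(parent_row)


populate(rectangles, areas, make_area)
print(areas)  # {1: {'shape_area': 8}, 2: {'shape_area': 15}}
```

Since the computation is tied to the entity, running `populate` again only fills in rows for parents that were added in the meantime.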

## Referential Integrity

Referential integrity is the concept of keeping all your data consistent and up-to-date. The goal is to ensure [data pipelines](../glossary#data-pipeline) always reflect the truth of how data was created.

In the realm of databases, entities can be related to one another through [foreign keys](https://en.wikipedia.org/wiki/Foreign_key). However, our opinionated view is that foreign keys on [primary keys](https://en.wikipedia.org/wiki/Primary_key) should enforce the constraint.

What this means is that our data model always reflects the truth. When a parent entity is removed, all child computed values will also be removed since they no longer have meaning without the subject. There is not a clear way to reproduce the results otherwise.

An important consequence to note is that deletes take longer, since they must be cascaded down to all the descendants. We believe this to be a feature, as it is the behavior most in line with typical expectations. Deletes should be done cautiously.
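
The cascading behavior can be observed with a small, self-contained Python sketch. It assumes SQLite (from the standard library) as a stand-in for the actual query backend and uses `ON DELETE CASCADE` to get the effect; this is not a claim about how DataJoint implements cascading deletes. The table and column names are hypothetical, modeled on the `Rectangle` and `Area` example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

conn.execute(
    "CREATE TABLE rectangle ("
    " shape_id INTEGER PRIMARY KEY,"
    " shape_height INTEGER,"
    " shape_width INTEGER)"
)
# The child's primary key is also a foreign key on the parent's primary key.
conn.execute(
    "CREATE TABLE area ("
    " shape_id INTEGER PRIMARY KEY"
    "  REFERENCES rectangle (shape_id) ON DELETE CASCADE,"
    " shape_area INTEGER)"
)

conn.execute("INSERT INTO rectangle VALUES (1, 2, 4)")
conn.execute("INSERT INTO area VALUES (1, 8)")

# Removing the parent also removes the computed child row.
conn.execute("DELETE FROM rectangle WHERE shape_id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM area").fetchone()[0]
print(remaining)  # 0
```

With enforcement on, inserting an `area` row for a `shape_id` that has no matching `rectangle` would be rejected as well.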
@@ -4,14 +4,14 @@ Throughout the DataJoint ecosystem, there are several datatypes that are used to

## Standard Types

These types are largely wrappers around existing types in MySQL since this is the backend to the DataJoint Engine.
These types are largely wrappers around existing types in the current [query backend](../../ref-integrity/query-backend) for [data pipelines](../../../glossary#data-pipeline).

| Datatype | Description | Size | Example |
| --- | --- | ---| --- |
| int | integer | 4 bytes | `8` |
| <span id="int">int</span> | integer | 4 bytes | `8` |

## Unique Types

| Datatype | Description | Size | Example |
| --- | --- | ---| --- |
| uuid | a unique GUID value | 16 bytes | `6ed5ed09-e69c-466f-8d06-a5afbf273e61` |
| <span id="uuid">uuid</span> | a unique GUID value | 16 bytes | `6ed5ed09-e69c-466f-8d06-a5afbf273e61` |
9 changes: 9 additions & 0 deletions src/concepts/ref-integrity/query-backend.md
@@ -0,0 +1,9 @@
# Query Backend

Currently, data pipelines use a [MySQL](https://www.mysql.com/why-mysql/) server as their query backend.

The following are some important topics to maintain a healthy system:

- Access Control
- Optimal Server Configuration
- Maintenance Guidelines
38 changes: 15 additions & 23 deletions src/core.md
@@ -1,25 +1,17 @@
# Core

## Relational Database
DataJoint Core projects are fully open-source and are built to develop, define, manage, and visualize [data pipelines](../glossary#data-pipeline). Below are the projects that make up the family of core open-source projects.

- MySQL usage
- Optimal configuration
- Maintenance
- Permission management and access control
## [API](https://en.wikipedia.org/wiki/API)'s

- **[DataJoint Python](https://datajoint.com/docs/core/datajoint-python/)**: A low-level client for managing [data pipelines](../glossary#data-pipeline).
- **[DataJoint MATLAB](https://datajoint.com/docs/core/datajoint-matlab/)**: A low-level client for managing [data pipelines](../glossary#data-pipeline).
- **[Pharus](https://datajoint.com/docs/core/pharus/)**: Expose [data pipelines](../glossary#data-pipeline) via a [REST](https://en.wikipedia.org/wiki/Representational_state_transfer) interface.

## Programming Interfaces
## Web [GUI](https://en.wikipedia.org/wiki/Graphical_user_interface)'s

Below are the projects that make up the family of core open-source projects:

- **[Python API](https://datajoint.com/docs/core/datajoint-python/)**: Relational framework that allows for intuitive queries and reproducible computation.
- **[MATLAB API](https://datajoint.com/docs/core/datajoint-matlab/)**: Relational framework that allows for intuitive queries and reproducible computation.
- **[Pharus](https://datajoint.com/docs/core/pharus/)**: REST interface for communicating with data pipelines.

## Web Interfaces

- **[LabBook](https://datajoint.com/docs/core/datajoint-labbook/)**: Data entry and data model browsing web GUI.
- **[SciViz](https://datajoint.com/docs/core/sci-viz/)**: Visualization framework for making low-code web apps.
- **[LabBook](https://datajoint.com/docs/core/datajoint-labbook/)**: Data entry and data model browsing for [data pipelines](../glossary#data-pipeline).
- **[SciViz](https://datajoint.com/docs/core/sci-viz/)**: A visualization framework for making [low-code](https://en.wikipedia.org/wiki/Low-code_development_platform) web apps for [data pipelines](../glossary#data-pipeline).

## Container Images

@@ -33,10 +25,10 @@ graph
datajoint/djlab --> datajoint/djlabhub;
```

- **[datajoint/mysql](https://datajoint.com/docs/core/mysql-docker/)**: Optimized MySQL image for use with DataJoint Engine.
- **[datajoint/miniconda3](https://datajoint.com/docs/core/miniconda3-docker/)**: Minimal Python image with `conda`.
- **[datajoint/djbase](https://datajoint.com/docs/core/djbase-docker/)**: DataJoint engine dependencies only.
- **[datajoint/djtest](https://datajoint.com/docs/core/djtest-docker/)**: Includes testing tools like `pytest`.
- **[datajoint/datajoint](https://datajoint.com/docs/core/datajoint-python/)**: Official DataJoint engine image.
- **[datajoint/djlab](https://datajoint.com/docs/core/djlab-docker/)**: Includes local Jupyter Lab environment.
- **[datajoint/djlabhub](https://datajoint.com/docs/core/djlabhub-docker/)**: Includes necessary dependencies for launching with Jupyter Hub.
- **[datajoint/mysql](https://datajoint.com/docs/core/mysql-docker/)**: An optimized MySQL backend for [data pipelines](../glossary#data-pipeline).
- **[datajoint/miniconda3](https://datajoint.com/docs/core/miniconda3-docker/)**: A minimal Python image with [conda](https://docs.conda.io/en/latest/).
- **[datajoint/djbase](https://datajoint.com/docs/core/djbase-docker/)**: Adds only dependencies for managing [data pipelines](../glossary#data-pipeline).
- **[datajoint/djtest](https://datajoint.com/docs/core/djtest-docker/)**: Adds testing tools like [pytest](https://docs.pytest.org/en/7.1.x/).
- **[datajoint/datajoint](https://datajoint.com/docs/core/datajoint-python/)**: Official image for managing [data pipelines](../glossary#data-pipeline).
- **[datajoint/djlab](https://datajoint.com/docs/core/djlab-docker/)**: Adds a local [Jupyter Lab](https://jupyterlab.readthedocs.io/en/stable/) environment.
- **[datajoint/djlabhub](https://datajoint.com/docs/core/djlabhub-docker/)**: Adds a client to allow hosting with [Jupyter Hub](https://jupyter.org/hub).
6 changes: 2 additions & 4 deletions src/glossary.md
@@ -1,9 +1,7 @@
# Glossary

There are many terms that are reused throughout the documentation that we feel it imporant to define together. We've taken careful consideration to be consistent so below you will find how we've understood and use these terms.
Many terms are reused throughout the documentation, and we feel it is important to define them together. We've taken careful consideration to be consistent. Below you will find how we understand and use these terms.

| Term | Definition |
| --- | --- |
| DAG | directed acyclic graph |
| task | an independent unit of processing |
| workflow | programs that are guaranteed to eventually reach a terminal state represented as DAGs of tasks |
| <span id="data-pipeline">data pipeline</span> | formal definition of a [DAG](https://en.wikipedia.org/wiki/Directed_acyclic_graph) of [processes](https://en.wikipedia.org/wiki/Process) that achieves the [DataJoint Mantra](../concepts/mantra) |