Commit a85dd4f

Restructure README.md (#9)
Move the database list, appropriate use cases, and quickstart to the top of the readme.
1 parent ce11ae4 commit a85dd4f

1 file changed, with 22 additions and 22 deletions

README.md

Lines changed: 22 additions & 22 deletions
@@ -9,6 +9,28 @@ OpenData is a collection of open source databases designed from ground up for ob
99

1010
Building a performant, cost-effective, and correct online database on object storage takes special care. Successful designs all have to solve the problems of write batching, multiple levels of caching, and snapshot isolation for correctness. OpenData databases build on a common foundation to solve these problems. This common foundation gives our databases a common set of operational tools, configuration systems, etc., that make our databases easier to operate in aggregate.
1111

12+
# Databases
13+
14+
OpenData ships two databases today, with more on the way:
15+
16+
* **TSDB**: An object-store-native time-series database that can serve as a backend for Prometheus. It's a great option for a low-cost, easy-to-operate Grafana backend. [Learn more](open-tsdb/rfcs/0001-tsdb-storage.md).
17+
* **Log**: Think of it as Kafka 2.0. An object-store-native event-streaming backend that supports millions of logs, so you can finally get a replayable log per key. [Learn more](open-log/rfcs/0001-storage.md).
18+
19+
20+
# Which use cases are OpenData databases suited for?
21+
22+
The key feature of OpenData databases is that object storage is the sole persistence layer, and readers and writers coordinate solely via manifest files in object storage. This results in several interesting properties:
23+
1. Object storage being the sole persistence layer means that each Database instance can be tuned to trade off between [Latency, Cost, and Durability](https://materializedview.io/p/cloud-storage-triad-latency-cost-durability). This flexibility allows new workloads which may not have been economical with traditional designs.
24+
2. Since readers and writers are stateless and decoupled, each can be scaled to 0 independently. This means workloads with massive skews between writes and reads can be served far more economically with OpenData databases.
25+
3. The architecture allows several deployment models. It's possible for OpenData database components to be fully embedded in the application process. Or they can be fully distributed, with each component running as a service in a k8s cluster. In either case, data is in S3 and always persistent. This makes per-app, per-agent, or other arrangements a natural fit.
26+
27+
The flip side of this decoupled architecture is higher end-to-end latency between when data is inserted into the system and when it is returned in a query. This means truly interactive use cases where users must read their writes as soon as possible are not good fits for OpenData databases. However, when some end-to-end latency is acceptable, the flexibility of the OpenData architecture makes OpenData databases the superior option in a cloud-native world.
28+
29+
# Quick Start
30+
31+
TODO.
32+
33+
1234
# Architecture
1335

1436
## 10,000ft view
@@ -105,28 +127,6 @@ In addition to solving core storage problems, SlateDB also solves this basic met
105127

106128
Different databases likely need different metadata. Making the manifest system extensible would allow us to use it across databases, which we think is a high leverage thing to do.
107129

108-
# Which use cases are OpenData databases suited for?
109-
110-
The key feature of OpenData databases is that object storage is the sole persistence layer, and readers and writers coordinate solely via manifest files in object storage. This results in several interesting properties:
111-
1. Object storage being the sole persistence layer means that each Database instance can be tuned to trade off between [Latency, Cost, and Durability](https://materializedview.io/p/cloud-storage-triad-latency-cost-durability). This flexibility allows new workloads which may not have been economical with traditional designs.
112-
2. Since Ingestors, Compactors, and Query Executors are completely stateless and decoupled, each can be scaled to 0 independently. This means workloads with massive skews between writes and reads can be served far more economically with OpenData databases.
113-
3. The architecture allows several deployment models. It's possible for OpenData databases to be fully embedded, with the Ingestors, Compactors, and Query Executors running within the application process. Or they can be fully distributed, with each component running as a service in a k8s cluster. In either case, data is in S3 and always persistent. This makes per-app, per-agent, or other arrangements a natural fit.
114-
115-
The flip side of this decoupled architecture is higher end-to-end latency between when data is inserted into the system and when it is returned in a query. This means truly interactive use cases where users must read their writes as soon as possible are not good fits for OpenData databases. However, when some end-to-end latency is acceptable, the flexibility of the OpenData architecture makes OpenData databases the superior option in a cloud-native world.
116-
117-
118-
# Databases
119-
120-
OpenData ships two databases today:
121-
122-
* TSDB: An object-store-native time-series database that can serve as a backend for Prometheus. It's a great option for a low-cost, easy-to-operate Grafana backend. Learn more about it [here](open-tsdb/README.md).
123-
* Log: Think of it as Kafka 2.0. An object-store-native event-streaming backend that supports millions of logs, so you can finally get a replayable log per key. Learn more about it [here](open-log/rfcs/0001-storage.md).
124-
125-
# Quick Start
126-
127-
TODO.
128-
129-
130130
# Why OpenData?
131131

132132
1. We believe that object storage is a fundamentally new ingredient in data systems: it provides highly durable, highly available, infinite storage with unique performance and cost structures. It solves one of the hardest problems in distributed data systems: consistent replication. At the same time, tremendous care must be taken to make object storage work correctly, performantly, and cost-effectively. When done right, systems built natively on object storage are far simpler and cheaper to operate in modern clouds than the alternatives. We want to bring the benefits of object storage to every database.
