README.md
22 additions & 22 deletions
@@ -9,6 +9,28 @@ OpenData is a collection of open source databases designed from ground up for ob

 Building a performant, cost-effective, and correct online database on object storage takes special care. Successful designs all have to solve the problems of write batching, multiple levels of caching, and snapshot isolation for correctness. OpenData databases build on a common foundation to solve these problems. This common foundation gives our databases a common set of operational tools, configuration systems, etc., that make our databases easier to operate in aggregate.

+# Databases
+
+OpenData ships two databases today, with more on the way:
+
+* **TSDB**: An object-store-native time series database that can serve as a backend for Prometheus. It's a great option for a low-cost, easy-to-operate Grafana backend. [Learn more](open-tsdb/rfcs/0001-tsdb-storage.md).
+* **Log**: Think of it as Kafka 2.0. An object-store-native event streaming backend that supports millions of logs, so you can finally get a replayable log per key. [Learn more](open-log/rfcs/0001-storage.md).
+
+
+# Which use cases are OpenData databases suited for?
+
+The key feature of OpenData databases is that object storage is the sole persistence layer, and readers and writers coordinate solely via manifest files in object storage. This results in several interesting properties:
+1. Object storage being the sole persistence layer means that each database instance can be tuned to trade off between [Latency, Cost, and Durability](https://materializedview.io/p/cloud-storage-triad-latency-cost-durability). This flexibility enables new workloads that may not have been economical with traditional designs.
+2. Since readers and writers are stateless and decoupled, each can be scaled to zero independently. This means workloads with massive skews between writes and reads can be served far more economically with OpenData databases.
+3. The architecture allows several deployment models. It's possible for OpenData database components to be fully embedded in the application process. Or they can be fully distributed, with each component running as a service in a Kubernetes cluster. In either case, data is in S3 and always persistent. This makes per-app, per-agent, or other arrangements a natural fit.
+
+The flip side of this decoupled architecture is that you have higher end-to-end latency between when data is inserted into the system and when it is returned in a query. This means truly interactive use cases where users must read their writes as soon as possible are not good fits for OpenData databases. However, when some end-to-end latency is acceptable, the flexibility of the OpenData architecture makes OpenData databases the superior option in a cloud-native world.
+
+# Quick Start
+
+TODO.
+
+
 # Architecture

 ## 10,000ft view
@@ -105,28 +127,6 @@ In addition to solving core storage problems, SlateDB also solves this basic met

 Different databases likely need different metadata. Making the manifest system extensible would allow us to use it across databases, which we think is a high-leverage thing to do.

-# Which use cases are OpenData databases suited for?
-
-The key feature of OpenData databases is that object storage is the sole persistence layer, and readers and writers coordinate solely via manifest files in object storage. This results in several interesting properties:
-1. Object storage being the sole persistence layer means that each database instance can be tuned to trade off between [Latency, Cost, and Durability](https://materializedview.io/p/cloud-storage-triad-latency-cost-durability). This flexibility enables new workloads that may not have been economical with traditional designs.
-2. Since Ingestors, Compactors, and Query Executors are completely stateless and decoupled, each can be scaled to zero independently. This means workloads with massive skews between writes and reads can be served far more economically with OpenData databases.
-3. The architecture allows several deployment models. It's possible for OpenData databases to be fully embedded, with the Ingestors, Compactors, and Query Executors running within the application process. Or they can be fully distributed, with each component running as a service in a Kubernetes cluster. In either case, data is in S3 and always persistent. This makes per-app, per-agent, or other arrangements a natural fit.
-
-The flip side of this decoupled architecture is that you have higher end-to-end latency between when data is inserted into the system and when it is returned in a query. This means truly interactive use cases where users must read their writes as soon as possible are not good fits for OpenData databases. However, when some end-to-end latency is acceptable, the flexibility of the OpenData architecture makes OpenData databases the superior option in a cloud-native world.
-
-
-# Databases
-
-OpenData ships two databases today:
-
-* TSDB: An object-store-native time series database that can serve as a backend for Prometheus. It's a great option for a low-cost, easy-to-operate Grafana backend. Learn more about it [here](open-tsdb/README.md).
-* Log: Think of it as Kafka 2.0. An object-store-native event streaming backend that supports millions of logs, so you can finally get a replayable log per key. Learn more about it [here](open-log/rfcs/0001-storage.md).
-
-# Quick Start
-
-TODO.
-
-
 # Why OpenData?

 1. We believe that object storage is a fundamentally new ingredient in data systems: it provides highly durable, highly available, infinite storage with unique performance and cost structures. It solves one of the hardest problems in distributed data systems: consistent replication. At the same time, tremendous care must be taken to make object storage work correctly, performantly, and cost-effectively. When done right, systems built natively on object storage are far simpler and cheaper to operate in modern clouds than the alternatives. We want to bring the benefits of object storage to every database.