|
| 1 | +# Pixels Retina |
| 2 | + |
| 3 | +Retina (<ins>re</ins>al-<ins>ti</ins>me a<ins>na</ins>lytics) is the real-time data synchronization framework in Pixels. |
| 4 | +It supports replaying data-change operations from a log-based CDC (change-data-capture) source as mirror transactions on the columnar table data. |
| 5 | +It proposes a lightweight MVCC mechanism and corresponding version storage to support parallel mirror transaction replay and |
| 6 | +concurrent analytical query processing, and a vectorized-filter-on-read (VFoR) approach for analytical queries to read consistent |
| 7 | +data snapshots. |
| 8 | + |
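To make the filter-on-read idea concrete, here is a minimal sketch of how a reader could select a consistent snapshot from versioned rows. It is illustrative only and not Retina's actual data structures: the `RowBatch` layout, the `create_ts`/`delete_ts` columns, the `INFINITY_TS` marker, and the rule that an update is modeled as a delete plus an insert are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker meaning "this row version has not been deleted".
INFINITY_TS = 2**63 - 1


@dataclass
class RowBatch:
    """A batch of rows with per-row version timestamps (assumed layout)."""
    values: List[int]
    create_ts: List[int]  # timestamp of the transaction that inserted the row
    delete_ts: List[int]  # timestamp of the transaction that deleted the row


def visibility_mask(batch: RowBatch, snapshot_ts: int) -> List[bool]:
    """Compute, for the whole batch at once, which rows a reader may see.

    A row is visible when it was created at or before the snapshot
    timestamp and not yet deleted at that timestamp.
    """
    return [c <= snapshot_ts < d
            for c, d in zip(batch.create_ts, batch.delete_ts)]


def read_snapshot(batch: RowBatch, snapshot_ts: int) -> List[int]:
    """Filter the batch on read instead of merging versions beforehand."""
    mask = visibility_mask(batch, snapshot_ts)
    return [v for v, keep in zip(batch.values, mask) if keep]


batch = RowBatch(
    values=[10, 20, 30, 40],
    create_ts=[1, 2, 5, 7],
    delete_ts=[INFINITY_TS, 4, INFINITY_TS, 9],
)

print(read_snapshot(batch, 3))  # rows created by ts 3 and not yet deleted
print(read_snapshot(batch, 8))  # a later snapshot sees a different row set
```

Because the filter is evaluated per batch at read time, concurrent replay only needs to append versions and set timestamps; readers at different snapshot timestamps see different, but each internally consistent, row sets.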
| 9 | +Compared to the merge-on-read (MoR) approach based on catalog snapshots in existing lakehouse systems, |
| 10 | +such as Apache Iceberg and Apache Paimon, Retina supports true row-granular (instead of batch-granular) transactional |
| 11 | +data-change replay without the expensive version merging and data compaction mechanisms. |
| 12 | +Evaluations show that Retina simultaneously provides 10-ms-level data freshness and over 3.2M rows/s scalable data-change |
| 13 | +replay throughput, without compromising query performance or resource cost-efficiency, |
| 14 | +significantly outperforming state-of-the-art lakehouses, Iceberg and Paimon, which provide minute-level data freshness |
| 15 | +and an order of magnitude lower data-change throughput. |
| 16 | + |
| 17 | +## Retina Components |
| 18 | + |
| 19 | +The components related to Retina are: |
| 20 | + |
| 21 | +- Sink: It connects to CDC streams from Debezium, reconstructs the data-change messages in the |
| 22 | +CDC stream into mirror transactions, and sends the data-change operations in each mirror transaction through streaming RPC to Pixels-Retina. |
| 23 | +[Source code](https://github.com/pixelsdb/pixels-sink). |
| 24 | +- Replayer: It receives the data-change operations from Pixels-Sink and replays them on the columnar data tables. |
| 25 | +Source code: |
| 26 | +[core data structures and operations](../cpp/pixels-retina), |
| 27 | +[top-level replay and garbage collection](.), |
| 28 | +[client](../pixels-common/src/main/java/io/pixelsdb/pixels/common/retina), |
| 29 | +and [server](../pixels-daemon/src/main/java/io/pixelsdb/pixels/daemon/retina). |
| 30 | +RPC handling is in this directory and . |
| 31 | +- Transaction Service: It allocates transaction timestamps for mirror transactions and analytical queries, and manages the timestamp watermarks |
| 32 | +for the MVCC protocol of Retina. |
| 33 | +Source code: |
| 34 | +[client](../pixels-common/src/main/java/io/pixelsdb/pixels/common/transaction) and |
| 35 | +[server](../pixels-daemon/src/main/java/io/pixelsdb/pixels/daemon/transaction). |
| 36 | +- Index Service: It is a multi-version index that maps the index key (e.g., primary key or secondary key) to the row location. |
| 37 | +The Replayer looks up and updates the primary index during data-change replay. |
| 38 | +Source code: |
| 39 | +[framework and interfaces](../pixels-common/src/main/java/io/pixelsdb/pixels/common/index), |
| 40 | +[pluggable implementations](../pixels-index), |
| 41 | +[clients](../pixels-common/src/main/java/io/pixelsdb/pixels/common/index/service), |
| 42 | +and [server](../pixels-daemon/src/main/java/io/pixelsdb/pixels/daemon/index). |
| 43 | +- Catalog Service (i.e., metadata service): It manages the schema, statistics, and data catalog of tables. |
| 44 | +Source code: [client](../pixels-common/src/main/java/io/pixelsdb/pixels/common/metadata) and |
| 45 | +[server](../pixels-daemon/src/main/java/io/pixelsdb/pixels/daemon/metadata). |
| 46 | +- Columnar file format: It provides the file format definition, reader, and writer of the Pixels file format. [Source code](../pixels-core). |
| 47 | +- Trino connector: It runs inside the Trino cluster to access the services of Retina/Pixels, and calls the file reader to read data. |
| 48 | +[Source code](https://github.com/pixelsdb/pixels-trino). |
| 49 | + |
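The interaction between the Sink, the Replayer, and the Index Service described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual Pixels code: the event fields `tx_id`, `key`, and `after` are hypothetical simplifications of a CDC message (only the `c`/`u`/`d` operation codes follow Debezium's convention), the index is a plain dictionary rather than a multi-version index, and the table is an append-only list.

```python
from collections import OrderedDict
from typing import Any, Dict, List, Tuple

Event = Dict[str, Any]


def group_into_transactions(events: List[Event]) -> List[Tuple[str, List[Event]]]:
    """Sink side: group change events into mirror transactions.

    Events are grouped by their (hypothetical) tx_id field; transactions
    keep the order in which they first appear in the stream.
    """
    txns: "OrderedDict[str, List[Event]]" = OrderedDict()
    for ev in events:
        txns.setdefault(ev["tx_id"], []).append(ev)
    return list(txns.items())


def replay(txns: List[Tuple[str, List[Event]]],
           index: Dict[int, int],
           table: List[dict]) -> None:
    """Replayer side: apply each operation and maintain the primary index.

    Old row versions are never overwritten; an update appends a new
    version and repoints the index entry to the new row location.
    """
    for _tx_id, ops in txns:
        for ev in ops:
            key = ev["key"]
            if ev["op"] in ("u", "d"):
                index.pop(key, None)         # old location is no longer current
            if ev["op"] in ("c", "u"):
                table.append(ev["after"])    # append the new row version
                index[key] = len(table) - 1  # key -> row location


events = [
    {"tx_id": "t1", "op": "c", "key": 1, "after": {"id": 1, "v": "a"}},
    {"tx_id": "t1", "op": "u", "key": 1, "after": {"id": 1, "v": "a2"}},
    {"tx_id": "t2", "op": "c", "key": 2, "after": {"id": 2, "v": "b"}},
    {"tx_id": "t2", "op": "d", "key": 2, "after": None},
]

index: Dict[int, int] = {}
table: List[dict] = []
txns = group_into_transactions(events)
replay(txns, index, table)
print(index, len(table))
```

In this sketch the superseded row versions remain in `table`; in a real system, timestamps from the Transaction Service would decide which versions a query sees, and garbage collection would later reclaim the obsolete ones.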
| 50 | +## Usage |