Merged
40 changes: 40 additions & 0 deletions .github/workflows/docs-check-links.yml
@@ -0,0 +1,40 @@
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#

name: Check Markdown docs links

on:
  push:
    paths:
      - docs/**
      - site/**
    branches:
      - 'main'
  pull_request:
  workflow_dispatch:

jobs:
  markdown-link-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: gaurav-nelson/github-action-markdown-link-check@v1
        with:
          config-file: 'site/link-checker-config.json'
          use-verbose-mode: yes
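
For relative links, the kind of check this workflow automates can be approximated locally. The sketch below is not the implementation of `github-action-markdown-link-check`: `broken_relative_links` is a hypothetical helper that only verifies that relative link targets exist on disk. It does not read `site/link-checker-config.json` and does not probe external URLs.

```python
import re
from pathlib import Path

# Matches inline Markdown links and captures the target, e.g. [text](target).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URI scheme such as https: or mailto:.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(root: Path) -> list[tuple[Path, str]]:
    """Return (file, target) pairs for relative links whose target is missing.

    Hypothetical helper for illustration; the name and behavior are assumptions.
    """
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip absolute URLs and same-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop the fragment, then resolve against the linking file's directory.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file, target))
    return broken
```

Under these assumptions, a target without a scheme such as `www.getdaft.io` is treated as a relative path and reported as missing, which is the class of mistake the `docs/docs/daft.md` change in this PR corrects.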
7 changes: 2 additions & 5 deletions README.md
@@ -17,7 +17,7 @@
- under the License.
-->

-![Iceberg](https://iceberg.apache.org/docs/latest/img/Iceberg-logo.png)
+![Iceberg](https://iceberg.apache.org/assets/images/Iceberg-logo.svg)
Contributor Author

Broken on the repository 😱


[![](https://github.com/apache/iceberg/actions/workflows/java-ci.yml/badge.svg)](https://github.com/apache/iceberg/actions/workflows/java-ci.yml)
[![Slack](https://img.shields.io/badge/chat-on%20Slack-brightgreen.svg)](https://apache-iceberg.slack.com/)
@@ -37,11 +37,8 @@ The core Java library is located in this repository and is the reference impleme

[Documentation][iceberg-docs] is available for all libraries and integrations.

-Current work is tracked in the [roadmap][roadmap].
-
[iceberg-docs]: https://iceberg.apache.org/docs/latest/
-[iceberg-spec]: https://iceberg.apache.org/spec
-[roadmap]: https://iceberg.apache.org/roadmap/
Contributor Author

Removed in #9941

+[iceberg-spec]: https://iceberg.apache.org/spec/

## Collaboration

8 changes: 4 additions & 4 deletions docs/docs/configuration.md
@@ -108,9 +108,9 @@ Iceberg tables support table properties to configure table behavior, like the de
Reserved table properties are only used to control behaviors when creating or updating a table.
The value of these properties are not persisted as a part of the table metadata.

-| Property | Default | Description |
-| -------------- | -------- | ------------------------------------------------------------- |
-| format-version | 2 | Table's format version (can be 1 or 2) as defined in the [Spec](../../../spec/#format-versioning). Defaults to 2 since version 1.4.0. |
+| Property | Default | Description |
+| -------------- | -------- |--------------------------------------------------------------------------------------------------------------------------------------|
+| format-version | 2 | Table's format version (can be 1 or 2) as defined in the [Spec](../../spec.md#format-versioning). Defaults to 2 since version 1.4.0. |

### Compatibility flags

@@ -131,7 +131,7 @@ Iceberg catalogs support using catalog properties to configure catalog behaviors
| clients | 2 | client pool size |
| cache-enabled | true | Whether to cache catalog entries |
| cache.expiration-interval-ms | 30000 | How long catalog entries are locally cached, in milliseconds; 0 disables caching, negative values disable expiration |
-| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](../metrics-reporting.md) section for additional details |
+| metrics-reporter-impl | org.apache.iceberg.metrics.LoggingMetricsReporter | Custom `MetricsReporter` implementation to use in a catalog. See the [Metrics reporting](metrics-reporting.md) section for additional details |

`HadoopCatalog` and `HiveCatalog` can access the properties in their constructors.
Any other custom catalog can access the properties by implementing `Catalog.initialize(catalogName, catalogProperties)`.
2 changes: 1 addition & 1 deletion docs/docs/daft.md
@@ -20,7 +20,7 @@ title: "Daft"

# Daft

-[Daft](www.getdaft.io) is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry.
+[Daft](https://www.getdaft.io/) is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry.

It exposes its flavor of the familiar [Python DataFrame API](https://www.getdaft.io/projects/docs/en/latest/api_docs/dataframe.html) which is a common abstraction over querying tables of data in the Python data ecosystem.

2 changes: 1 addition & 1 deletion docs/docs/flink-actions.md
@@ -20,7 +20,7 @@ title: "Flink Actions"

## Rewrite files action

-Iceberg provides API to rewrite small files into large files by submitting Flink batch jobs. The behavior of this Flink action is the same as Spark's [rewriteDataFiles](../maintenance.md#compact-data-files).
+Iceberg provides API to rewrite small files into large files by submitting Flink batch jobs. The behavior of this Flink action is the same as Spark's [rewriteDataFiles](maintenance.md#compact-data-files).

```java
import org.apache.iceberg.flink.actions.Actions;
6 changes: 3 additions & 3 deletions docs/docs/flink-connector.md
@@ -29,13 +29,13 @@ To create the table in Flink SQL by using SQL syntax `CREATE TABLE test (..) WIT
* `connector`: Use the constant `iceberg`.
* `catalog-name`: User-specified catalog name. It's required because the connector don't have any default value.
* `catalog-type`: `hive` or `hadoop` for built-in catalogs (defaults to `hive`), or left unset for custom catalog implementations using `catalog-impl`.
-* `catalog-impl`: The fully-qualified class name of a custom catalog implementation. Must be set if `catalog-type` is unset. See also [custom catalog](../flink.md#adding-catalogs) for more details.
+* `catalog-impl`: The fully-qualified class name of a custom catalog implementation. Must be set if `catalog-type` is unset. See also [custom catalog](flink.md#adding-catalogs) for more details.
* `catalog-database`: The iceberg database name in the backend catalog, use the current flink database name by default.
* `catalog-table`: The iceberg table name in the backend catalog. Default to use the table name in the flink `CREATE TABLE` sentence.

## Table managed in Hive catalog.

-Before executing the following SQL, please make sure you've configured the Flink SQL client correctly according to the [quick start documentation](../flink.md).
+Before executing the following SQL, please make sure you've configured the Flink SQL client correctly according to the [quick start documentation](flink.md).

The following SQL will create a Flink table in the current Flink catalog, which maps to the iceberg table `default_database.flink_table` managed in iceberg catalog.

@@ -138,4 +138,4 @@ SELECT * FROM flink_table;
3 rows in set
```

-For more details, please refer to the Iceberg [Flink documentation](../flink.md).
+For more details, please refer to the Iceberg [Flink documentation](flink.md).
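
Most of the documentation changes in this PR follow one pattern: a link to a page in the same directory written as `../page.md` becomes `page.md`. A minimal sketch of that rewrite as a regex substitution follows. It assumes, and the PR does not state this, that only a single leading `../` directly followed by a `.md` file name should be touched. `fix_sibling_links` is a hypothetical helper, not part of the Iceberg tooling.

```python
import re

# One leading "../" followed directly by a Markdown file name.
# The character class excludes "." and "/", so "../../spec.md" does not match.
PARENT_SIBLING_RE = re.compile(r"\]\(\.\./([A-Za-z0-9_-]+\.md)")


def fix_sibling_links(markdown: str) -> str:
    """Rewrite [text](../page.md...) to [text](page.md...).

    Hypothetical helper for illustration; anchors after the file name are kept.
    """
    return PARENT_SIBLING_RE.sub(r"](\1", markdown)


if __name__ == "__main__":
    line = "See the [quick start documentation](../flink.md)."
    print(fix_sibling_links(line))
```

Links that climb more than one level, such as `../../spec.md#identifier-field-ids`, are left unchanged by this sketch, as they are in the diff.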
2 changes: 1 addition & 1 deletion docs/docs/flink-ddl.md
@@ -150,7 +150,7 @@ Table create commands support the commonly used [Flink create clauses](https://n

* `PARTITION BY (column1, column2, ...)` to configure partitioning, Flink does not yet support hidden partitioning.
* `COMMENT 'table document'` to set a table description.
-* `WITH ('key'='value', ...)` to set [table configuration](../configuration.md) which will be stored in Iceberg table properties.
+* `WITH ('key'='value', ...)` to set [table configuration](configuration.md) which will be stored in Iceberg table properties.

Currently, it does not support computed column and watermark definition etc.

2 changes: 1 addition & 1 deletion docs/docs/flink-queries.md
@@ -75,7 +75,7 @@ SET table.exec.iceberg.use-flip27-source = true;

### Reading branches and tags with SQL
Branch and tags can be read via SQL by specifying options. For more details
-refer to [Flink Configuration](../flink-configuration.md#read-options)
+refer to [Flink Configuration](flink-configuration.md#read-options)

```sql
--- Read from branch b1
10 changes: 5 additions & 5 deletions docs/docs/flink-writes.md
@@ -67,7 +67,7 @@ Iceberg supports `UPSERT` based on the primary key when writing data into v2 tab
) with ('format-version'='2', 'write.upsert.enabled'='true');
```

-2. Enabling `UPSERT` mode using `upsert-enabled` in the [write options](#write-options) provides more flexibility than a table level config. Note that you still need to use v2 table format and specify the [primary key](../flink-ddl.md/#primary-key) or [identifier fields](../../spec.md#identifier-field-ids) when creating the table.
+2. Enabling `UPSERT` mode using `upsert-enabled` in the [write options](#write-options) provides more flexibility than a table level config. Note that you still need to use v2 table format and specify the [primary key](flink-ddl.md/#primary-key) or [identifier fields](../../spec.md#identifier-field-ids) when creating the table.

```sql
INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
@@ -185,7 +185,7 @@ FlinkSink.builderFor(

### Branch Writes
Writing to branches in Iceberg tables is also supported via the `toBranch` API in `FlinkSink`
-For more information on branches please refer to [branches](../branching.md).
+For more information on branches please refer to [branches](branching.md).
```java
FlinkSink.forRowData(input)
.tableLoader(tableLoader)
@@ -262,13 +262,13 @@ INSERT INTO tableName /*+ OPTIONS('upsert-enabled'='true') */
...
```

-Check out all the options here: [write-options](../flink-configuration.md#write-options)
+Check out all the options here: [write-options](flink-configuration.md#write-options)

## Notes

Flink streaming write jobs rely on snapshot summary to keep the last committed checkpoint ID, and
-store uncommitted data as temporary files. Therefore, [expiring snapshots](../maintenance.md#expire-snapshots)
-and [deleting orphan files](../maintenance.md#delete-orphan-files) could possibly corrupt
+store uncommitted data as temporary files. Therefore, [expiring snapshots](maintenance.md#expire-snapshots)
+and [deleting orphan files](maintenance.md#delete-orphan-files) could possibly corrupt
the state of the Flink job. To avoid that, make sure to keep the last snapshot created by the Flink
job (which can be identified by the `flink.job-id` property in the summary), and only delete
orphan files that are old enough.
35 changes: 18 additions & 17 deletions docs/docs/flink.md
@@ -22,22 +22,22 @@ title: "Flink Getting Started"

Apache Iceberg supports both [Apache Flink](https://flink.apache.org/)'s DataStream API and Table API. See the [Multi-Engine Support](../../multi-engine-support.md#apache-flink) page for the integration of Apache Flink.

-| Feature support | Flink | Notes |
-| ----------------------------------------------------------- |-------|----------------------------------------------------------------------------------------|
-| [SQL create catalog](../flink-ddl.md#create-catalog) | ✔️ | |
-| [SQL create database](../flink-ddl.md#create-database) | ✔️ | |
-| [SQL create table](../flink-ddl.md#create-table) | ✔️ | |
-| [SQL create table like](../flink-ddl.md#create-table-like) | ✔️ | |
-| [SQL alter table](../flink-ddl.md#alter-table) | ✔️ | Only support altering table properties, column and partition changes are not supported |
-| [SQL drop_table](../flink-ddl.md#drop-table) | ✔️ | |
-| [SQL select](../flink-queries.md#reading-with-sql) | ✔️ | Support both streaming and batch mode |
-| [SQL insert into](../flink-writes.md#insert-into) | ✔️ ️ | Support both streaming and batch mode |
-| [SQL insert overwrite](../flink-writes.md#insert-overwrite) | ✔️ ️ | |
-| [DataStream read](../flink-queries.md#reading-with-datastream) | ✔️ ️ | |
-| [DataStream append](../flink-writes.md#appending-data) | ✔️ ️ | |
-| [DataStream overwrite](../flink-writes.md#overwrite-data) | ✔️ ️ | |
-| [Metadata tables](../flink-queries.md#inspecting-tables) | ✔️ | |
-| [Rewrite files action](../flink-actions.md#rewrite-files-action) | ✔️ ️ | |
+| Feature support | Flink | Notes |
+| -------------------------------------------------------- |-------|----------------------------------------------------------------------------------------|
+| [SQL create catalog](flink-ddl.md#create-catalog) | ✔️ | |
+| [SQL create database](flink-ddl.md#create-database) | ✔️ | |
+| [SQL create table](flink-ddl.md#create-table) | ✔️ | |
+| [SQL create table like](flink-ddl.md#create-table-like) | ✔️ | |
+| [SQL alter table](flink-ddl.md#alter-table) | ✔️ | Only support altering table properties, column and partition changes are not supported |
+| [SQL drop_table](flink-ddl.md#drop-table) | ✔️ | |
+| [SQL select](flink-queries.md#reading-with-sql) | ✔️ | Support both streaming and batch mode |
+| [SQL insert into](flink-writes.md#insert-into) | ✔️ ️ | Support both streaming and batch mode |
+| [SQL insert overwrite](flink-writes.md#insert-overwrite) | ✔️ ️ | |
+| [DataStream read](flink-queries.md#reading-with-datastream) | ✔️ ️ | |
+| [DataStream append](flink-writes.md#appending-data) | ✔️ ️ | |
+| [DataStream overwrite](flink-writes.md#overwrite-data) | ✔️ ️ | |
+| [Metadata tables](flink-queries.md#inspecting-tables) | ✔️ | |
+| [Rewrite files action](flink-actions.md#rewrite-files-action) | ✔️ ️ | |

## Preparation when using Flink SQL Client

@@ -69,6 +69,7 @@ export HADOOP_CLASSPATH=`$HADOOP_HOME/bin/hadoop classpath`
./bin/start-cluster.sh
```

+<!-- markdown-link-check-disable-next-line -->
Start the Flink SQL client. There is a separate `flink-runtime` module in the Iceberg project to generate a bundled jar, which could be loaded by Flink SQL client directly. To build the `flink-runtime` bundled jar manually, build the `iceberg` project, and it will generate the jar under `<iceberg-root-dir>/flink-runtime/build/libs`. Or download the `flink-runtime` jar from the [Apache repository](https://repo.maven.apache.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.16/{{ icebergVersion }}/).

```bash
@@ -271,7 +272,7 @@ env.execute("Test Iceberg DataStream");

### Branch Writes
Writing to branches in Iceberg tables is also supported via the `toBranch` API in `FlinkSink`
-For more information on branches please refer to [branches](../branching.md).
+For more information on branches please refer to [branches](branching.md).
```java
FlinkSink.forRowData(input)
.tableLoader(tableLoader)