# Coverage Map for Korean Public Data
`pub-api-client` is not just a set of API wrappers.
It is agent infrastructure for mapping, classifying, and automating Korea's public data catalog.
The denominator is the full data.go.kr catalog, not only the APIs already implemented in this repo.
The target is the whole asset surface:
- `openapi`: assets with callable API endpoints
- `file_data`: datasets distributed as downloadable files
- `link_api`: assets that point to an external institution API
- `external_download`: assets that point to an institution-managed download page
- `unknown`: catalog entries not yet classified
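One way to pin these categories down is a small enum. This is a sketch, not the repo's actual code; `AssetType` and `classify` are hypothetical names:

```python
from enum import Enum

class AssetType(str, Enum):
    """Hypothetical classification of a data.go.kr catalog asset."""
    OPENAPI = "openapi"                       # callable API endpoints
    FILE_DATA = "file_data"                   # downloadable files
    LINK_API = "link_api"                     # external institution API
    EXTERNAL_DOWNLOAD = "external_download"   # institution-managed download page
    UNKNOWN = "unknown"                       # not yet classified

def classify(raw: str) -> AssetType:
    """Map a raw portal label to a category, defaulting to UNKNOWN."""
    try:
        return AssetType(raw)
    except ValueError:
        return AssetType.UNKNOWN
```

Defaulting to `unknown` keeps unrecognized catalog entries in the inventory instead of silently dropping them.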
The goal is to make coverage visible:
- what can be discovered automatically
- what can be scraped into specs
- what can be code-generated into clients
- what still requires manual handling
Public data on data.go.kr does not come in one shape.
Some entries expose OpenAPI pages with embedded Swagger.
Some expose AJAX-loaded operation details.
Some only provide downloadable files.
Some are just links to external institution systems.
A binary supported / unsupported view is too weak for this landscape.
This project treats coverage as a staged automation map.
Each catalog asset should eventually be tracked along three axes.
Asset type:

- `openapi`
- `file_data`
- `link_api`
- `external_download`
- `unknown`
Extraction path:

- `swagger`: spec can be extracted directly from embedded Swagger JSON
- `ajax`: operation details can be reconstructed from portal AJAX responses
- `download`: only downloadable artifacts are available
- `link`: portal points to an external service or institution URL
- `manual`: requires custom parsing or human intervention
Automation stage:

- `cataloged`: asset is indexed with metadata
- `spec_extracted`: machine-readable spec or structured descriptor exists
- `code_generated`: Python client or adapter has been generated
- `runtime_verified`: real request path has been validated
Coverage is therefore not a yes/no claim. It is the current automation stage of a public data asset.
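The three axes can be captured in a small per-asset record. This is a sketch under assumed names (`CoverageEntry` and `stage_index` are hypothetical, not part of the repo):

```python
from dataclasses import dataclass

@dataclass
class CoverageEntry:
    """Hypothetical record tracking one asset along the three axes."""
    asset_id: str
    asset_type: str        # openapi / file_data / link_api / external_download / unknown
    extraction_path: str   # swagger / ajax / download / link / manual
    stage: str             # cataloged / spec_extracted / code_generated / runtime_verified

    # Ordered stages: coverage is a position on this ladder, not a boolean.
    STAGES = ("cataloged", "spec_extracted", "code_generated", "runtime_verified")

    def stage_index(self) -> int:
        """Higher index means more of the pipeline is automated for this asset."""
        return self.STAGES.index(self.stage)
```

A coverage report can then sort or aggregate assets by `stage_index()` instead of splitting them into supported vs. unsupported.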
The repo already contains the first working pieces of this vision.
- `tools/scraper.py` extracts API specs from data.go.kr pages
- `tools/codegen.py` generates Python clients from normalized specs
- `src/pub_api_client/core/` provides shared request, response, error, and pagination logic
- `specs/*.json` stores normalized API descriptors
Today the implemented extraction paths are `swagger` and `ajax`.
The next expansion is to treat download-only and link-based assets as first-class inventory entries instead of out-of-scope exceptions.
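A minimal sketch of that routing step, assuming the categories above (the `DEFAULT_PATH` table and `extraction_path` helper are hypothetical, not existing repo code):

```python
# Hypothetical routing: every asset gets an extraction path, including
# download-only and link-based ones, instead of being dropped as unsupported.
DEFAULT_PATH = {
    "openapi": "swagger",           # prefer embedded Swagger; fall back to ajax
    "file_data": "download",
    "link_api": "link",
    "external_download": "link",
    "unknown": "manual",
}

def extraction_path(asset_type: str, has_swagger: bool = True) -> str:
    """Pick the extraction path for an asset; unknown types go to manual."""
    if asset_type == "openapi" and not has_swagger:
        return "ajax"
    return DEFAULT_PATH.get(asset_type, "manual")
```

The point of the table is that `download` and `link` are ordinary return values, so those assets stay in the inventory with a defined ingestion path.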
Korean public data is fragmented at the interface level. The problem is not only calling APIs. The real problem is knowing which assets are automatable, which are partially automatable, and which need a different ingestion path.
This repo exists to answer that question systematically.
- Build a catalog-wide inventory of data.go.kr assets.
- Classify each asset into `openapi`, `file_data`, `link_api`, `external_download`, or `unknown`.
- Extract specs for assets that expose automatable interfaces.
- Generate Python clients or adapters from extracted specs.
- Verify runtime behavior where direct calling is possible.
- Publish static coverage reports in JSON and Markdown.
Current examples in this repo show the basic pipeline:
- API detail page -> normalized spec JSON
- normalized spec JSON -> generated client
- generated client -> shared runtime for request/response handling
This is the seed of a broader coverage system, not the final product.
- Treat the portal catalog as the source of truth for coverage.
- Track partial automation explicitly instead of hiding it.
- Keep `dataset` and `derived API` as separate but linked assets.
- Track LINK-style APIs separately instead of pretending they are native OpenAPI support.
- Prefer static, inspectable artifacts before building dashboards.
- catalog inventory manifest
- coverage classification manifest
- static coverage report generator
- README-linked project status view
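A static coverage report generator could start as simply as this. It is a sketch: the entry dicts and the `coverage_report` function are hypothetical, and only the stage names come from the axes defined above:

```python
from collections import Counter

def coverage_report(entries: list[dict]) -> str:
    """Render a Markdown table of asset counts per automation stage."""
    counts = Counter(e["stage"] for e in entries)
    order = ("cataloged", "spec_extracted", "code_generated", "runtime_verified")
    lines = ["| stage | count |", "| --- | --- |"]
    for stage in order:
        lines.append(f"| {stage} | {counts.get(stage, 0)} |")
    return "\n".join(lines)
```

Because the output is plain Markdown, it can be committed to the repo and linked from the README with no dashboard infrastructure.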
Project metadata lives in `pyproject.toml`.
The package targets Python >=3.10.
Code generation:

```shell
python tools/codegen.py specs/15094808.json
python tools/codegen.py --all
```

Spec scraping:

```shell
python tools/scraper.py 15094808
python tools/scraper.py --list apis.txt
```