# Coverage Map for Korean Public Data
`pub-api-client` is not just a set of API wrappers.
It is agent infrastructure for mapping, classifying, and automating Korea's public data catalog.
The denominator is the full data.go.kr catalog, not only the APIs already implemented in this repo.
The target is the whole asset surface:
- `openapi`: assets with callable API endpoints
- `file_data`: datasets distributed as downloadable files
- `link_api`: assets that point to an external institution API
- `external_download`: assets that point to an institution-managed download page
- `unknown`: catalog entries not yet classified
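One way to pin these categories down is a small enum. This is a sketch, not the repo's actual code; `AssetType` and `classify` are hypothetical names:

```python
from enum import Enum

class AssetType(str, Enum):
    """Hypothetical classification of a data.go.kr catalog asset."""
    OPENAPI = "openapi"                       # callable API endpoints
    FILE_DATA = "file_data"                   # downloadable files
    LINK_API = "link_api"                     # external institution API
    EXTERNAL_DOWNLOAD = "external_download"   # institution-managed download page
    UNKNOWN = "unknown"                       # not yet classified

def classify(raw: str) -> AssetType:
    """Map a raw portal label to a category, defaulting to UNKNOWN."""
    try:
        return AssetType(raw)
    except ValueError:
        return AssetType.UNKNOWN
```

Defaulting to `unknown` keeps unrecognized catalog entries in the inventory instead of silently dropping them.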
The goal is to make coverage visible:
- what can be discovered automatically
- what can be scraped into specs
- what can be code-generated into clients
- what still requires manual handling
Public data on data.go.kr does not come in one shape.
Some entries expose OpenAPI pages with embedded Swagger.
Some expose AJAX-loaded operation details.
Some only provide downloadable files.
Some are just links to external institution systems.
A binary supported / unsupported view is too weak for this landscape.
This project treats coverage as a staged automation map.
Each catalog asset should eventually be tracked along three axes.
Asset type:

- `openapi`
- `file_data`
- `link_api`
- `external_download`
- `unknown`
Extraction path:

- `swagger`: spec can be extracted directly from embedded Swagger JSON
- `ajax`: operation details can be reconstructed from portal AJAX responses
- `download`: only downloadable artifacts are available
- `link`: portal points to an external service or institution URL
- `manual`: requires custom parsing or human intervention
Automation stage:

- `cataloged`: asset is indexed with metadata
- `spec_extracted`: machine-readable spec or structured descriptor exists
- `code_generated`: Python client or adapter has been generated
- `runtime_verified`: real request path has been validated
Coverage is therefore not a yes/no claim. It is the current automation stage of a public data asset.
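The three axes can be captured in a small per-asset record. This is a sketch under assumed names (`CoverageEntry` and `stage_index` are hypothetical, not part of the repo):

```python
from dataclasses import dataclass

@dataclass
class CoverageEntry:
    """Hypothetical record tracking one asset along the three axes."""
    asset_id: str
    asset_type: str        # openapi / file_data / link_api / external_download / unknown
    extraction_path: str   # swagger / ajax / download / link / manual
    stage: str             # cataloged / spec_extracted / code_generated / runtime_verified

    # Ordered stages: coverage is a position on this ladder, not a boolean.
    STAGES = ("cataloged", "spec_extracted", "code_generated", "runtime_verified")

    def stage_index(self) -> int:
        """Higher index means more of the pipeline is automated for this asset."""
        return self.STAGES.index(self.stage)
```

A coverage report can then sort or aggregate assets by `stage_index()` instead of splitting them into supported vs. unsupported.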
The repo already contains the first working pieces of this vision.
- `tools/scraper.py` extracts API specs from data.go.kr pages
- `tools/codegen.py` generates Python clients from normalized specs
- `src/pub_api_client/core/` provides shared request, response, error, and pagination logic
- `specs/*.json` stores normalized API descriptors
Today the implemented extraction paths are `swagger` and `ajax`.
The next expansion is to treat download-only and link-based assets as first-class inventory entries instead of out-of-scope exceptions.
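A minimal sketch of that routing step, assuming the categories above (the `DEFAULT_PATH` table and `extraction_path` helper are hypothetical, not existing repo code):

```python
# Hypothetical routing: every asset gets an extraction path, including
# download-only and link-based ones, instead of being dropped as unsupported.
DEFAULT_PATH = {
    "openapi": "swagger",           # prefer embedded Swagger; fall back to ajax
    "file_data": "download",
    "link_api": "link",
    "external_download": "link",
    "unknown": "manual",
}

def extraction_path(asset_type: str, has_swagger: bool = True) -> str:
    """Pick the extraction path for an asset; unknown types go to manual."""
    if asset_type == "openapi" and not has_swagger:
        return "ajax"
    return DEFAULT_PATH.get(asset_type, "manual")
```

The point of the table is that `download` and `link` are ordinary return values, so those assets stay in the inventory with a defined ingestion path.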
Korean public data is fragmented at the interface level. The problem is not only calling APIs. The real problem is knowing which assets are automatable, which are partially automatable, and which need a different ingestion path.
This repo exists to answer that question systematically.
- Build a catalog-wide inventory of data.go.kr assets.
- Classify each asset into `openapi`, `file_data`, `link_api`, `external_download`, or `unknown`.
- Extract specs for assets that expose automatable interfaces.
- Generate Python clients or adapters from extracted specs.
- Verify runtime behavior where direct calling is possible.
- Publish static coverage reports in JSON and Markdown.
Current examples in this repo show the basic pipeline:
- API detail page -> normalized spec JSON
- normalized spec JSON -> generated client
- generated client -> shared runtime for request/response handling
This is the seed of a broader coverage system, not the final product.
- Treat the portal catalog as the source of truth for coverage.
- Track partial automation explicitly instead of hiding it.
- Keep `dataset` and `derived API` as separate but linked assets.
- Track LINK-style APIs separately instead of pretending they are native OpenAPI support.
- Prefer static, inspectable artifacts before building dashboards.
- catalog inventory manifest
- coverage classification manifest
- static coverage report generator
- README-linked project status view
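A static coverage report generator could start as simply as this. It is a sketch: the entry dicts and the `coverage_report` function are hypothetical, and only the stage names come from the axes defined above:

```python
from collections import Counter

def coverage_report(entries: list[dict]) -> str:
    """Render a Markdown table of asset counts per automation stage."""
    counts = Counter(e["stage"] for e in entries)
    order = ("cataloged", "spec_extracted", "code_generated", "runtime_verified")
    lines = ["| stage | count |", "| --- | --- |"]
    for stage in order:
        lines.append(f"| {stage} | {counts.get(stage, 0)} |")
    return "\n".join(lines)
```

Because the output is plain Markdown, it can be committed to the repo and linked from the README with no dashboard infrastructure.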
Project metadata lives in `pyproject.toml`.
The package targets Python >=3.10.
Code generation:

```shell
python tools/codegen.py specs/15094808.json
python tools/codegen.py --all
```

Spec scraping:

```shell
python tools/scraper.py 15094808
python tools/scraper.py --list apis.txt
```