StatPan/pub-api-client

pub-api-client

Coverage Map for Korean Public Data

pub-api-client is not just a set of API wrappers. It is agent infrastructure for mapping, classifying, and automating Korea's public data catalog.

The denominator is the full data.go.kr catalog, not only the APIs already implemented in this repo. The target is the whole asset surface:

  • openapi: assets with callable API endpoints
  • file_data: datasets distributed as downloadable files
  • link_api: assets that point to an external institution API
  • external_download: assets that point to an institution-managed download page
  • unknown: catalog entries not yet classified

The goal is to make coverage visible:

  • what can be discovered automatically
  • what can be scraped into specs
  • what can be code-generated into clients
  • what still requires manual handling

Vision

Public data on data.go.kr does not come in one shape. Some entries expose OpenAPI pages with embedded Swagger. Some expose AJAX-loaded operation details. Some only provide downloadable files. Some are just links to external institution systems.

A binary supported/unsupported view is too coarse for this landscape. This project instead treats coverage as a staged automation map.

Coverage Model

Each catalog asset should eventually be tracked along three axes.

1. Asset Type

  • openapi
  • file_data
  • link_api
  • external_download
  • unknown

2. Discovery / Extraction Method

  • swagger: spec can be extracted directly from embedded Swagger JSON
  • ajax: operation details can be reconstructed from portal AJAX responses
  • download: only downloadable artifacts are available
  • link: portal points to an external service or institution URL
  • manual: requires custom parsing or human intervention

3. Automation Stage

  • cataloged: asset is indexed with metadata
  • spec_extracted: machine-readable spec or structured descriptor exists
  • code_generated: Python client or adapter has been generated
  • runtime_verified: real request path has been validated

Coverage is therefore not a yes/no claim. It is the current automation stage of a public data asset.
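
The three axes above can be sketched as a small data model. This is a minimal illustration, not code that exists in this repo: the names `AssetType`, `ExtractionMethod`, `AutomationStage`, `CoverageRecord`, and `has_reached` are hypothetical, and the assumption that stages are strictly ordered is inferred from the list above.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class AssetType(str, Enum):
    OPENAPI = "openapi"
    FILE_DATA = "file_data"
    LINK_API = "link_api"
    EXTERNAL_DOWNLOAD = "external_download"
    UNKNOWN = "unknown"


class ExtractionMethod(str, Enum):
    SWAGGER = "swagger"
    AJAX = "ajax"
    DOWNLOAD = "download"
    LINK = "link"
    MANUAL = "manual"


class AutomationStage(IntEnum):
    # Assumption: each stage implies every earlier one has been reached.
    CATALOGED = 1
    SPEC_EXTRACTED = 2
    CODE_GENERATED = 3
    RUNTIME_VERIFIED = 4


@dataclass(frozen=True)
class CoverageRecord:
    """One catalog asset positioned along the three coverage axes."""

    asset_id: str
    asset_type: AssetType
    method: ExtractionMethod
    stage: AutomationStage

    def has_reached(self, stage: AutomationStage) -> bool:
        # IntEnum members compare by their integer value.
        return self.stage >= stage


# Placeholder values for illustration only, not real catalog data.
record = CoverageRecord(
    asset_id="example-001",
    asset_type=AssetType.OPENAPI,
    method=ExtractionMethod.SWAGGER,
    stage=AutomationStage.SPEC_EXTRACTED,
)
```

With this shape, "is this asset covered?" becomes a question about a specific stage, for example `record.has_reached(AutomationStage.CODE_GENERATED)`.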

What Exists Today

The repo already contains the first working pieces of this vision.

  • tools/scraper.py extracts API specs from data.go.kr pages
  • tools/codegen.py generates Python clients from normalized specs
  • src/pub_api_client/core/ provides shared request, response, error, and pagination logic
  • specs/*.json stores normalized API descriptors

Today the implemented extraction paths are:

  • swagger
  • ajax

The next expansion is to treat download-only and link-based assets as first-class inventory entries instead of out-of-scope exceptions.
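
One way to make download-only and link-based assets first-class is to assign every catalog entry an asset type, including the ones that cannot be called directly. The sketch below is hypothetical: the field names `endpoint_url`, `file_url`, `external_url`, and `external_kind` are assumptions for illustration and do not reflect the actual data.go.kr metadata schema or this repo's code.

```python
def classify(entry: dict) -> str:
    """Map a catalog entry to one of the five asset types.

    All dictionary keys used here are hypothetical placeholders.
    """
    if entry.get("endpoint_url"):
        return "openapi"
    if entry.get("file_url"):
        return "file_data"
    if entry.get("external_url"):
        kind = entry.get("external_kind")
        if kind == "api":
            return "link_api"
        if kind == "download":
            return "external_download"
    # Anything that cannot be placed with confidence stays visible as unknown.
    return "unknown"


# Placeholder entries; example.invalid is a reserved, non-resolving domain.
entries = [
    {"endpoint_url": "https://example.invalid/api"},
    {"file_url": "https://example.invalid/data.csv"},
    {"external_url": "https://example.invalid/portal", "external_kind": "api"},
    {"external_url": "https://example.invalid/files", "external_kind": "download"},
    {},
]
asset_types = [classify(entry) for entry in entries]
```

Returning `unknown` instead of raising keeps unclassified entries in the inventory, which matches the idea that the denominator is the full catalog.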

Why This Matters

Korean public data is fragmented at the interface level. The problem is not only calling APIs. The real problem is knowing which assets are automatable, which are partially automatable, and which need a different ingestion path.

This repo exists to answer that question systematically.

Roadmap

  1. Build a catalog-wide inventory of data.go.kr assets.
  2. Classify each asset into openapi, file_data, link_api, external_download, or unknown.
  3. Extract specs for assets that expose automatable interfaces.
  4. Generate Python clients or adapters from extracted specs.
  5. Verify runtime behavior where direct calling is possible.
  6. Publish static coverage reports in JSON and Markdown.
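
As a rough illustration of step 6, a static report could be derived from a list of per-asset records. This is a sketch under assumptions: the record shape (a dict with a `stage` key) and the function names `summarize`, `to_json`, and `to_markdown` are hypothetical and are not taken from this repo.

```python
import json
from collections import Counter

STAGES = ["cataloged", "spec_extracted", "code_generated", "runtime_verified"]


def summarize(records: list) -> dict:
    """Count assets by their current automation stage.

    Records whose stage is not one of STAGES are left out of the summary.
    """
    counts = Counter(record["stage"] for record in records)
    return {stage: counts.get(stage, 0) for stage in STAGES}


def to_json(summary: dict) -> str:
    return json.dumps(summary, indent=2)


def to_markdown(summary: dict) -> str:
    lines = ["| stage | assets |", "| --- | --- |"]
    lines += [f"| {stage} | {count} |" for stage, count in summary.items()]
    return "\n".join(lines)


# Placeholder records for illustration only.
records = [
    {"asset_id": "a", "stage": "cataloged"},
    {"asset_id": "b", "stage": "cataloged"},
    {"asset_id": "c", "stage": "code_generated"},
]
summary = summarize(records)
report_json = to_json(summary)
report_md = to_markdown(summary)
```

Both outputs are plain strings, so they can be written to files and diffed between runs without any dashboard.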

Current Status

Current examples in this repo show the basic pipeline:

  • API detail page -> normalized spec JSON
  • normalized spec JSON -> generated client
  • generated client -> shared runtime for request/response handling

This is the seed of a broader coverage system, not the final product.

Design Principles

  • Treat the portal catalog as the source of truth for coverage.
  • Track partial automation explicitly instead of hiding it.
  • Keep a dataset and any API derived from it as separate but linked assets.
  • Track LINK-style APIs separately instead of pretending they are native OpenAPI support.
  • Prefer static, inspectable artifacts before building dashboards.
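
The "separate but linked" principle could be modeled by giving each asset its own record and pointing a derived API back at its source dataset. The names below (`Asset`, `derived_from`, `derived_apis`) are hypothetical and only illustrate the idea; the repo may represent this link differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Asset:
    asset_id: str
    asset_type: str
    # Hypothetical link field: ID of the dataset this asset was derived from.
    derived_from: Optional[str] = None


def derived_apis(assets: List[Asset], dataset_id: str) -> List[Asset]:
    """Return the assets that declare the given dataset as their source."""
    return [asset for asset in assets if asset.derived_from == dataset_id]


# Placeholder inventory for illustration only.
inventory = [
    Asset("dataset-1", "file_data"),
    Asset("api-1", "openapi", derived_from="dataset-1"),
    Asset("api-2", "link_api"),
]
linked = derived_apis(inventory, "dataset-1")
```

Because the dataset and the API are distinct records, each one can sit at its own automation stage while the link between them stays queryable.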

Near-Term Deliverables

  • catalog inventory manifest
  • coverage classification manifest
  • static coverage report generator
  • README-linked project status view

Development

Project metadata lives in pyproject.toml. The package targets Python 3.10 or later.

Code generation:

python tools/codegen.py specs/15094808.json
python tools/codegen.py --all

Spec scraping:

python tools/scraper.py 15094808
python tools/scraper.py --list apis.txt

About

Coverage mapping and automation system for the Public Data Portal (data.go.kr)
