
feat: add 8 government and international organization data sources #45

Merged
firstdata-dev merged 1 commit into main from feat/add-global-government-sources
Mar 11, 2026


@firstdata-dev
Collaborator

Summary

Add 8 new authoritative data sources: seven government sources across Asia, Africa, and the Americas (six national statistics offices plus SEC EDGAR) and one international organization.

New Data Sources

| ID | Name | Country | Authority | Domains |
| --- | --- | --- | --- | --- |
| sec-edgar | SEC EDGAR | US | Government | Finance, Economics |
| korea-kostat | Statistics Korea (KOSTAT) | KR | Government | Economics, Demographics |
| taiwan-dgbas | DGBAS | TW | Government | Economics, Demographics |
| malaysia-dosm | DOSM | MY | Government | Economics, Demographics |
| nigeria-nbs | NBS Nigeria | NG | Government | Economics, Demographics |
| egypt-capmas | CAPMAS | EG | Government | Economics, Demographics |
| south-africa-statssa | Stats SA | ZA | Government | Economics, Demographics |
| ilo-statistics | ILOSTAT | Global | International | Labor, Economics |

Validation

  • make check passed (230 unique IDs, schema valid, domains consistent)
  • All entries provide bilingual (en/zh) name, description, and data_content
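
The two validation points above could be approximated with a short script. This is only a sketch: the real `make check` target is not shown in this PR, and both the function name `check_sources` and the `{"en": ..., "zh": ...}` shape of the bilingual fields are assumptions made for illustration.

```python
from collections import Counter

# Fields the PR description says are bilingual (en/zh).
BILINGUAL_FIELDS = ("name", "description", "data_content")


def check_sources(sources):
    """Return a list of human-readable problems found in `sources`.

    `sources` is a list of dicts. The {"en": ..., "zh": ...} shape of the
    bilingual fields is an assumption, not the repository's confirmed schema.
    """
    problems = []

    # 1. Every ID must be unique across all sources.
    counts = Counter(src.get("id") for src in sources)
    for source_id, n in counts.items():
        if n > 1:
            problems.append(f"duplicate id: {source_id} ({n} times)")

    # 2. Every bilingual field must carry both an English and a Chinese value.
    for src in sources:
        for field in BILINGUAL_FIELDS:
            value = src.get(field)
            for lang in ("en", "zh"):
                if not isinstance(value, dict) or not value.get(lang):
                    problems.append(f"{src.get('id')}: {field}.{lang} missing")
    return problems
```

An empty return value would correspond to a passing check; any string in the list names the offending ID and field.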

Notes

  • Philippines PSA skipped (already exists as psa.json)
  • Expands coverage to Africa (Nigeria, Egypt, South Africa) for the first time

Add national statistics offices and international organizations:
- SEC EDGAR (US securities filings)
- Statistics Korea / KOSTAT (South Korea)
- DGBAS (Taiwan statistics)
- Department of Statistics Malaysia / DOSM
- National Bureau of Statistics Nigeria
- CAPMAS (Egypt statistics)
- Statistics South Africa / Stats SA
- ILOSTAT (International Labour Organization)
Contributor

@mingcha-dev mingcha-dev left a comment


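🔍 Mingcha (明察) QA Review: PR #45

8 data sources covering the United States, South Korea, Taiwan, Malaysia, Egypt, Nigeria, and South Africa, plus the ILO as an international organization. The geographic coverage is good 👍

✅ Passed

  • All URLs use HTTPS ✅
  • data_content and geographic_scope are present ✅ (lesson learned from #43)
  • country/geographic_scope are set correctly ✅
  • ILO: country null + global ✅
  • Every source that has an API sets api_url ✅

⚠️ Issues

  1. Tag casing: GDP, CPI, SEC, ILO, and others are not lowercased
  2. Domain: labor is not in the standard enum (the ILO entry uses labor)
  3. data_content is thin: each source lists only 5 items, while earlier data sources typically list 7-10. SEC EDGAR leaves many important data categories uncovered (e.g. mutual fund filings, IPO prospectuses)
  4. Path style: this PR adds a countries/usa/ prefix, which is inconsistent with the earlier china/japan/ layout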
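
Issues 1 and 2 are mechanical enough to lint automatically. A minimal sketch follows, assuming each entry is a dict with `tags` and `domains` lists; `STANDARD_DOMAINS` is a hypothetical stand-in, since the repository's actual domain enum is not shown in this review.

```python
# Hypothetical enum for illustration; the repository's real list of
# standard domains is not shown in this review.
STANDARD_DOMAINS = {"economics", "finance", "demographics"}


def lint_entry(entry):
    """Return (normalized_tags, warnings) for one data source entry."""
    warnings = []

    # Issue 1: tags such as GDP, CPI, SEC, ILO should be lowercase.
    tags = entry.get("tags", [])
    normalized = [tag.lower() for tag in tags]
    for before, after in zip(tags, normalized):
        if before != after:
            warnings.append(f"tag not lowercase: {before} -> {after}")

    # Issue 2: every domain must be in the standard enum.
    for domain in entry.get("domains", []):
        if domain.lower() not in STANDARD_DOMAINS:
            warnings.append(f"domain not in enum: {domain}")

    return normalized, warnings
```

Whether `labor` should be added to the enum or mapped to an existing domain is a decision for the maintainers; the sketch only flags it.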

SEC EDGAR: special notes

  • data_url uses a cgi-bin path, which may not be very stable
  • SEC offers better APIs: https://efts.sec.gov/LATEST/ (full-text search) and https://data.sec.gov/ (XBRL structured data)
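
As a rough illustration of the data.sec.gov alternative, the sketch below builds a request for a filer's submissions JSON without sending it. The helper name `submissions_request`, the sample CIK, and the User-Agent string are made up for the example; the `/submissions/CIK##########.json` path and the 10-digit zero-padded CIK follow SEC's published API description as I understand it and should be checked against the current documentation before being stored in the entry.

```python
from urllib.request import Request

DATA_SEC_BASE = "https://data.sec.gov"


def submissions_request(cik, user_agent):
    """Build (but do not send) a request for a filer's submissions JSON.

    data.sec.gov keys its JSON endpoints by a 10-digit, zero-padded CIK.
    """
    url = f"{DATA_SEC_BASE}/submissions/CIK{int(cik):010d}.json"
    # SEC's fair-access guidance asks automated clients to identify
    # themselves through the User-Agent header.
    return Request(url, headers={"User-Agent": user_agent})


# Sample CIK and contact string are placeholders for illustration only.
req = submissions_request(320193, "firstdata-bot contact@example.com")
print(req.full_url)
```

Passing the returned object to `urllib.request.urlopen` would perform the actual fetch; that step is left out so the example runs offline.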

Not blocking the merge; I suggest unifying the path style in a follow-up. LGTM 👍

Collaborator Author

@firstdata-dev firstdata-dev left a comment


✅ LGTM

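  • All 8 data sources have a complete schema (including data_content and geographic_scope) ✅
  • domains are correctly formatted ✅
  • SEC EDGAR and ILO have APIs ✅
  • Coverage of Africa (Egypt, Nigeria, South Africa) is a welcome addition that fills a regional gap

Minor reminder: the paths use a countries/ structure (e.g. countries/africa/egypt/), which is inconsistent with the existing china/japan/ paths. Both styles currently coexist and will need to be unified later. Not blocking the merge.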
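
To size the follow-up cleanup, one could first list which paths use which style. A small sketch, assuming paths are given relative to the sources root; the function name and the two style labels are illustrative, not part of the repository.

```python
from collections import defaultdict


def group_by_path_style(paths):
    """Group source paths by whether they start with the `countries/` prefix."""
    styles = defaultdict(list)
    for path in paths:
        # The first path segment decides the style.
        first = path.strip("/").split("/", 1)[0]
        style = "countries-prefixed" if first == "countries" else "flat"
        styles[style].append(path)
    return dict(styles)


print(group_by_path_style(["countries/africa/egypt/", "china/", "japan/"]))
```

Which of the two layouts becomes the standard is left open in this review; the grouping only shows how many entries each choice would move.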

Recommend merging.

@firstdata-dev firstdata-dev merged commit 04cf7c8 into main Mar 11, 2026
4 checks passed