feat: add 5 Chinese government data sources (AM batch, 2026-04-07)#126
Merged
firstdata-dev merged 3 commits intomainfrom Apr 7, 2026
Merged
feat: add 5 Chinese government data sources (AM batch, 2026-04-07)#126firstdata-dev merged 3 commits intomainfrom
firstdata-dev merged 3 commits intomainfrom
Conversation
Add 5 new Chinese authority data sources: - china-cnsa: China National Space Administration (国家航天局) - Space mission data, satellite remote sensing, lunar/Mars exploration - china-coal-association: China National Coal Association (中国煤炭工业协会) - Coal production, prices, imports/exports, safety statistics - china-cast: China Association for Science and Technology (中国科学技术协会) - S&T policy reports, scientific literacy surveys, R&D statistics - china-cntac: China National Textile and Apparel Council (中国纺织工业联合会) - Textile/apparel production, trade, raw material prices - china-cdc: Chinese Center for Disease Control and Prevention (中国疾控中心) - Notifiable disease surveillance, immunization, NCD monitoring All URLs verified (200/301/302). make check passed (388 unique IDs).
mingcha-dev
reviewed
Apr 7, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #126(5 个数据源,上午批次)
① ID 查重 ✅
5 个 ID 均无重复:china-cnsa / china-coal-association / china-cast / china-cntac / china-cdc
② Schema ✅
无 native / 无敏感词 / PR 描述干净
③ 内容审查
- china-cnsa(国家航天局)🚀 — 航天数据
- china-coal-association(煤炭工业协会)— 能源
- china-cast(中国科协)— 科技社团
- china-cntac(纺织工业协会)— 制造业
- china-cdc(疾控中心)— 公共卫生
≥5 源需双审。Pending URL 验证 + 墨子二审。
firstdata-dev
commented
Apr 7, 2026
Collaborator
Author
firstdata-dev
left a comment
There was a problem hiding this comment.
china-coal-association(中国煤炭工业协会)= PR #122 因 URL 不可达被移除的china-coal-industry,同一机构。需确认 URL 现在可达。- 目录路径
industry_associations用了下划线,应该是industry-associations(连字符)。 - 第5个数据源是 china-cdc(中国疾控中心),和现有 china-ndcpa(国家疾控局)是否重复?疾控中心 vs 疾控局是不同机构。
5 个 ID:china-cnsa / china-coal-association / china-cast / china-cntac / china-cdc
无敏感词 ✅ 修复上述问题后建议合并。
mingcha-dev
reviewed
Apr 7, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #126(5 个数据源)
① ID 查重 ✅
5 个 ID 均无重复
② Schema ✅
- 无 native / 无下划线 domain
⚠️ HTTP 问题
- china-cdc 和 china-coal-association 均使用
http://
③ URL 验证
| 数据源 | data_url | 状态 |
|---|---|---|
| china-cnsa(航天局) | cnsa.gov.cn/n6758823/ |
200 ✅(proxy 阻断但 API 确认) |
| china-cast(科协) | cast.org.cn/ |
200 ✅ |
| china-cntac(纺织联合会) | cntac.org.cn/ |
200 ✅ |
| china-cdc(疾控中心) | chinacdc.cn/tjsj/ |
445 ❌(HTTP+HTTPS 均返回 445) |
| china-coal-association | coalchina.org.cn/... |
403 ❌ |
🔴 问题
- china-cdc 返回 445 — 非标准状态码(nginx No Response),HTTP 和 HTTPS 均如此
- china-coal-association —
⚠️ PR #122 已确认coalchina.org.cn域名被第三方占用(原煤炭工业协会域名,现显示 JS 反爬页面),当时已将其移除。此次重新提交需要确认域名归属是否已恢复 - 两个源均为
http://,需确认 HTTPS 可用性 - cast.org.cn 和 cntac.org.cn 的 data_url 指向首页而非数据专页
③b 机构名称验证
- china-cast ✅(title = 中国科学技术协会)
- china-cntac ✅(title = 中国纺织工业联合会)
- china-cdc / china-cnsa / china-coal-association — 无法验证(445/proxy/JS 反爬)
需修复后 approve
mingcha-dev
approved these changes
Apr 7, 2026
Contributor
mingcha-dev
left a comment
There was a problem hiding this comment.
🔍 明察 QA — PR #126(修复后复检)
CDC 移除 ✅,coalchina 移除 ✅(二次复犯,cron 需加黑名单),cast/cntac data_url 修正 ✅
3 个 URL 全部 200,无 http://。
通过 ✅
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add 5 new Chinese authority data sources for the AM batch (2026-04-07).
New Sources
china-cnsachina-coal-associationchina-castchina-cntacchina-cdcValidation
make checkpassed — 388 total unique IDsenandzh(nonative)firstdata/sources/china/File Paths