feat: add ArkTS (.ets) language support with tree-sitter parser #791
Happy-26 wants to merge 9 commits
- Add CBM_LANG_ARKTS enum to cbm.h
- Add ArkTS node type definitions to lang_specs.c
- Add language spec entry for ArkTS
- Create grammar_arkts.c wrapper
- Vendor tree-sitter-arkts parser (MIT License)
- Map .ets extension to ArkTS language
- Add ArkTS display name to LANG_NAMES

Signed-off-by: Happy-26 <488127311@qq.com>
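The extension mapping listed above can be pictured with a small lookup table. This is only a sketch under assumptions: `CBM_LANG_ARKTS` is the one name taken from this PR, while `cbm_lang_t`, `CBM_LANG_UNKNOWN`, `CBM_LANG_TYPESCRIPT`, `EXT_TABLE`, and `lang_for_path` are hypothetical stand-ins, not the project's actual definitions.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of the language enum; only CBM_LANG_ARKTS is named in this PR. */
typedef enum {
    CBM_LANG_UNKNOWN = 0,
    CBM_LANG_TYPESCRIPT,
    CBM_LANG_ARKTS
} cbm_lang_t;

/* One row per file extension (hypothetical table layout). */
typedef struct {
    const char *ext;
    cbm_lang_t lang;
} ext_entry_t;

static const ext_entry_t EXT_TABLE[] = {
    {".ts", CBM_LANG_TYPESCRIPT},
    {".ets", CBM_LANG_ARKTS}, /* the mapping this PR adds */
};

/* Return the language for a path based on the text after its last dot. */
static cbm_lang_t lang_for_path(const char *path) {
    assert(path != NULL);
    const char *dot = strrchr(path, '.');
    if (dot == NULL) {
        return CBM_LANG_UNKNOWN;
    }
    for (size_t i = 0; i < sizeof(EXT_TABLE) / sizeof(EXT_TABLE[0]); i++) {
        if (strcmp(dot, EXT_TABLE[i].ext) == 0) {
            return EXT_TABLE[i].lang;
        }
    }
    return CBM_LANG_UNKNOWN;
}
```

With this table, a path such as `pages/Index.ets` resolves to `CBM_LANG_ARKTS`, and a path without a known extension falls through to `CBM_LANG_UNKNOWN`.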
The tree-sitter-arkts parser does not define a 'name' field in its AST node field map, causing ts_node_child_by_field_name(node, "name") to return null for all ArkTS nodes. This resulted in zero Function/Class/Method definitions being extracted from .ets files.

Changes in extract_defs.c:
- Add cbm_find_child_by_kind fallback in cbm_resolve_func_name for ArkTS function_declaration and decorated_function_declaration
- Synthesize "build" name for build_method nodes (no name child)
- Add ArkTS identifier fallback in extract_class_def and compute_class_qn
- Add component_body to class body traversal (find_class_body, push_class_body_children)
- Add ArkTS method name fallback in resolve_method_name
- Add build_method synthesis in extract_class_methods
- Include CBM_LANG_ARKTS in walk_defs descend_into_func check
- Add Component label for component_declaration and decorated_export_declaration in class_label_for_kind

Changes in lang_specs.c:
- Replace non-existent node types with actual tree-sitter-arkts output: add decorated_function_declaration, ui_builder_arrow_function, type_declaration, decorated_export_declaration
- Remove types not produced by the parser: generator_function_declaration, method_definition, class, abstract_class_declaration, type_alias_declaration, internal_module

Signed-off-by: Happy-26 <488127311@qq.com>
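The lookup order described in this commit (field lookup first, then a child-by-kind fallback, then a synthesized name for `build_method`) can be sketched against a mock tree. Everything here is an illustrative assumption: `mock_node`, `child_by_field`, `child_by_kind`, and `resolve_name` are hypothetical stand-ins for `TSNode`, `ts_node_child_by_field_name`, `cbm_find_child_by_kind`, and the name resolvers in extract_defs.c. The sketch does not call the tree-sitter library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock AST node: a stand-in for TSNode, not the tree-sitter API. */
typedef struct mock_node {
    const char *kind;  /* node type, e.g. "function_declaration" */
    const char *field; /* field name in the parent, NULL if the grammar defines none */
    const char *text;  /* source text for leaf nodes, NULL otherwise */
    const struct mock_node *children;
    size_t child_count;
} mock_node;

/* Analogue of a field lookup: NULL when no child carries that field name. */
static const mock_node *child_by_field(const mock_node *n, const char *field) {
    for (size_t i = 0; i < n->child_count; i++) {
        const mock_node *c = &n->children[i];
        if (c->field != NULL && strcmp(c->field, field) == 0) {
            return c;
        }
    }
    return NULL;
}

/* Analogue of the by-kind fallback: first direct child of the given kind. */
static const mock_node *child_by_kind(const mock_node *n, const char *kind) {
    for (size_t i = 0; i < n->child_count; i++) {
        if (strcmp(n->children[i].kind, kind) == 0) {
            return &n->children[i];
        }
    }
    return NULL;
}

/* Resolve a definition name: field first, identifier child second,
 * synthesized "build" for build_method last. Returns NULL if none apply. */
static const char *resolve_name(const mock_node *n) {
    assert(n != NULL);
    const mock_node *name = child_by_field(n, "name");
    if (name == NULL) {
        name = child_by_kind(n, "identifier");
    }
    if (name != NULL) {
        return name->text;
    }
    if (strcmp(n->kind, "build_method") == 0) {
        return "build";
    }
    return NULL;
}
```

Trying the field lookup first means a grammar that does define `name` fields still takes the direct path, and the by-kind scan only runs when that lookup comes back empty.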
46fe7fb to 1804659
- Add arkts entry to MANIFEST.md vendored grammar table with pinned commit and COMMUNITY verdict (not in nvim-treesitter/Helix registries)
- Add arkts to Custom extraction handling table in MANIFEST.md
- Update grammar count from 159 to 160
- Apply clang-format to extract_defs.c and lang_specs.c

Signed-off-by: Happy-26 <488127311@qq.com>
1804659 to 1f643d4
Upstream Million-mo/tree-sitter-arkts declares MIT in its grammar.js header but ships no LICENSE file, causing the audit script to return ERROR. Added arkts to SPECIAL_NOTICE with PROVENANCE-NOTICE verdict.

Signed-off-by: Happy-26 <488127311@qq.com>
20fbc18 to bb99141
…s label

The previous commit added a global `component_declaration` → "Component" mapping in `class_label_for_kind()`. This was intended for ArkTS (`@Component struct Foo`), but it also affected the `templ` grammar, whose `component_declaration` node represents a Go templating function and must keep the "Class" label. This caused the `grammar_label_goldens` test to fail:

    [LABEL] templ MISMATCH golden=[Class:1,Module:1] actual=[Component:1,Module:1]

Move the `component_declaration` / `decorated_export_declaration` → "Component" mapping out of the global `class_label_for_kind()` and into a CBM_LANG_ARKTS-scoped override block in `extract_class_def()`, next to the existing Sway/WGSL and Rust/Swift/D language-scoped label overrides. This restores templ's golden snapshot while keeping ArkTS components labeled as "Component".

Refs DeusData#760

Signed-off-by: Happy-26 <488127311@qq.com>
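The difference between a global mapping and a language-scoped override can be sketched as follows. This is a simplified illustration under assumptions: `cbm_lang_t`, `CBM_LANG_TEMPL`, and `class_label` are hypothetical names, and the global `class_label_for_kind` is reduced to a single default here. The real functions in extract_defs.c handle more kinds and languages.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical subset of the language enum; CBM_LANG_TEMPL is an assumed name. */
typedef enum {
    CBM_LANG_TEMPL,
    CBM_LANG_ARKTS
} cbm_lang_t;

/* Global, language-agnostic mapping (reduced to the default for this sketch).
 * Keeping component_declaration out of it preserves templ's "Class" label. */
static const char *class_label_for_kind(const char *kind) {
    (void)kind;
    return "Class";
}

/* Language-scoped override: only ArkTS relabels these kinds as "Component". */
static const char *class_label(cbm_lang_t lang, const char *kind) {
    assert(kind != NULL);
    const char *label = class_label_for_kind(kind);
    if (lang == CBM_LANG_ARKTS &&
        (strcmp(kind, "component_declaration") == 0 ||
         strcmp(kind, "decorated_export_declaration") == 0)) {
        label = "Component";
    }
    return label;
}
```

Because the override is gated on the language, two grammars that happen to share a node kind name can carry different labels without either one affecting the other's golden snapshot.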
Thanks for adding ArkTS support.

Triage: language-support PR for #760, but this needs the full vendored-grammar review path before it can move forward. Because this vendors a parser and changes provenance tooling, review must verify upstream source, exact pinned commit, license compatibility, generated parser provenance, and static security of vendored sources. Please make sure the PR description includes those details and that the vendored grammar can be audited without guessing.
Thanks for the triage. I've pushed a fix (commit 35d521b) addressing the vendored-grammar review requirements:
All details are documented in the updated PR description (Vendored Grammar Provenance section) and the MANIFEST.md entry. Also fixed an incorrect URL in …
- Fix copyright holder in LICENSE: "aspect-ux" → "million" (per grammar.js @author million and package.json author.name)
- Fix URL in lang_specs.c: aspect-ux/tree-sitter-arkts → Million-mo/tree-sitter-arkts
- Update MANIFEST.md arkts entry: correct copyright, add security review statement (no scanner.c vendored, no hooks/workflows vendored)
- Update audit-license-provenance.py SPECIAL_NOTICE to match

Addresses vendored-grammar review requirements from PR DeusData#791 feedback.

Signed-off-by: Happy-26 <488127311@qq.com>

35d521b to 6c52254
Summary
Add ArkTS (HarmonyOS) language support to codebase-memory-mcp, enabling full AST-based indexing of `.ets` files via the tree-sitter-arkts parser.

Resolves #760
Background
ArkTS is a TypeScript-like language used for HarmonyOS application development (file extension `.ets`). It extends TypeScript with decorators (`@Component`, `@Entry`, `@Prop`, etc.), component declarations, and UI builder methods (`build()`).

Changes

Language Registration

- `CBM_LANG_ARKTS` enum in `cbm.h`
- `.ets` → `CBM_LANG_ARKTS` mapping in `language.c`

Parser Integration

- `grammar_arkts.c` wrapping the vendored tree-sitter-arkts parser
- `vendored/grammars/arkts/parser.c` and `tree_sitter/parser.h`

Node Type Configuration (`lang_specs.c`)

Define ArkTS-specific node types matching the actual tree-sitter-arkts parser output:

- `function_declaration`, `function_expression`, `arrow_function`, `method_declaration`, `constructor_declaration`, `build_method`, `decorated_function_declaration`, `ui_builder_arrow_function`
- `class_declaration`, `enum_declaration`, `interface_declaration`, `type_declaration`, `component_declaration`, `decorated_export_declaration`

Name Extraction Fallback (`extract_defs.c`)

The tree-sitter-arkts parser does not define a `name` field in its AST node field map, causing `ts_node_child_by_field_name(node, "name")` to return null. To handle this:

- Add `cbm_find_child_by_kind` fallback in `cbm_resolve_func_name` for ArkTS `function_declaration` and `decorated_function_declaration`
- Synthesize `"build"` name for `build_method` nodes (these have no name child at all)
- Add ArkTS identifier fallback in `extract_class_def` and `compute_class_qn`
- Add `component_body` to class body traversal (`find_class_body`, `push_class_body_children`)
- Add ArkTS method name fallback in `resolve_method_name`
- Add `build_method` synthesis in `extract_class_methods`
- Include `CBM_LANG_ARKTS` in `walk_defs` `descend_into_func` check
- Add `Component` label for `component_declaration` and `decorated_export_declaration` in `class_label_for_kind`

Verification

Indexed a HarmonyOS project (~40 `.ets` files) with results:

- `build_method` inside `@Component` structs is recognized as a Method with synthesized name "build"

Also verified no regression on the codebase-memory-mcp project itself: 12,389 nodes, 58,847 edges.
Note on tree-sitter-arkts
The fallback logic is needed because tree-sitter-arkts lacks `field('name', ...)` mappings in its grammar. I plan to submit a PR to Million-mo/tree-sitter-arkts to add proper field definitions. If that PR is merged, the fallback logic in this PR will remain as a harmless safety net for older parser versions.

Vendored Grammar Provenance

Per the vendored-grammar review requirements, the following details are provided for audit:

- … `COMMUNITY`)
- `2fd0ad75e2d8` (upstream `master` HEAD as of 2025-10-27)
- `grammar.js` header (`@license MIT`, `@author million`) and `package.json` (`"license": "MIT"`, `"author": {"name": "million"}`). Upstream ships no standalone LICENSE file; the vendored `LICENSE` is reconstructed from this declaration, copyright (c) 2024 million
- `parser.c` generated by tree-sitter CLI v0.25.3 (per file header `/* Automatically generated by tree-sitter v0.25.3 */`), ABI 15
- `parser.c` (149,394 lines), `tree_sitter/parser.h`; no `scanner.c` (grammar has `EXTERNAL_TOKEN_COUNT 0`)
- … `parser.c` + `tree_sitter/parser.h`); no `system()`/`popen()`/`fork()`/network calls; no package manager hooks, workflow files, prompt/agent instruction files, or generated lockfiles were vendored
- … `COMMUNITY` grammar. The pinned commit is upstream `master` HEAD (not a registry-pinned SHA) because no registry tracks this grammar