
feat: add ArkTS (.ets) language support with tree-sitter parser #791

Open
Happy-26 wants to merge 9 commits into
DeusData:mainfrom
Happy-26:feature/arkts-support

Conversation

@Happy-26

@Happy-26 Happy-26 commented Jul 3, 2026


Summary

Add ArkTS (HarmonyOS) language support to codebase-memory-mcp, enabling full AST-based indexing of .ets files via the tree-sitter-arkts parser.

Resolves #760

Background

ArkTS is a TypeScript-like language used for HarmonyOS application development (file extension .ets). It extends TypeScript with decorators (@Component, @Entry, @Prop, etc.), component declarations, and UI builder methods (build()).

Changes

Language Registration

  • Add CBM_LANG_ARKTS enum in cbm.h
  • Add .ets → CBM_LANG_ARKTS mapping in language.c
  • Add ArkTS display name configuration
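
As a rough illustration of the registration steps above, the following sketch maps a file extension to a language enum and a display name. Only CBM_LANG_ARKTS, the .ets extension, and the "ArkTS" display name come from this PR; the trimmed-down enum, the EXT_TABLE layout, and the helper names lang_for_path and lang_display_name are assumptions for illustration and may not match cbm.h or language.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down enum; the real cbm.h defines many more languages. */
typedef enum {
    CBM_LANG_UNKNOWN = 0,
    CBM_LANG_TYPESCRIPT,
    CBM_LANG_ARKTS
} CBMLanguage;

/* Hypothetical extension table; the actual layout in language.c may differ. */
static const struct {
    const char *ext;
    CBMLanguage lang;
} EXT_TABLE[] = {
    {".ts", CBM_LANG_TYPESCRIPT},
    {".ets", CBM_LANG_ARKTS},
};

/* Map a file path to a language using the text after the last '.'. */
static CBMLanguage lang_for_path(const char *path) {
    const char *dot = strrchr(path, '.');
    if (dot == NULL) return CBM_LANG_UNKNOWN;
    for (size_t i = 0; i < sizeof(EXT_TABLE) / sizeof(EXT_TABLE[0]); i++) {
        if (strcmp(dot, EXT_TABLE[i].ext) == 0) return EXT_TABLE[i].lang;
    }
    return CBM_LANG_UNKNOWN;
}

/* Human-readable name for a language (assumed helper). */
static const char *lang_display_name(CBMLanguage lang) {
    switch (lang) {
    case CBM_LANG_TYPESCRIPT: return "TypeScript";
    case CBM_LANG_ARKTS:      return "ArkTS";
    default:                  return "Unknown";
    }
}
```

With this table, a path such as pages/Index.ets resolves to CBM_LANG_ARKTS, while a path without an extension falls through to CBM_LANG_UNKNOWN.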

Parser Integration

  • Add grammar_arkts.c wrapping the vendored tree-sitter-arkts parser
  • Add vendored/grammars/arkts/parser.c and tree_sitter/parser.h

Node Type Configuration (lang_specs.c)

Define ArkTS-specific node types matching the actual tree-sitter-arkts parser output:

  • Function types: function_declaration, function_expression, arrow_function, method_declaration, constructor_declaration, build_method, decorated_function_declaration, ui_builder_arrow_function
  • Class types: class_declaration, enum_declaration, interface_declaration, type_declaration, component_declaration, decorated_export_declaration
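
The node-type lists above could be expressed as NULL-terminated string arrays with a simple membership check. This is a sketch only: the node kind names are taken from this description, but the array names and the kind_in_list helper are hypothetical, and the real lang_specs.c structure may be organized differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Function-like node kinds listed in this PR (array name is an assumption). */
static const char *const arkts_func_types[] = {
    "function_declaration", "function_expression", "arrow_function",
    "method_declaration", "constructor_declaration", "build_method",
    "decorated_function_declaration", "ui_builder_arrow_function", NULL};

/* Class-like node kinds listed in this PR (array name is an assumption). */
static const char *const arkts_class_types[] = {
    "class_declaration", "enum_declaration", "interface_declaration",
    "type_declaration", "component_declaration",
    "decorated_export_declaration", NULL};

/* Return true if `kind` appears in the NULL-terminated `list`. */
static bool kind_in_list(const char *kind, const char *const *list) {
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(kind, list[i]) == 0) return true;
    }
    return false;
}
```

Kinds that the parser never emits, such as method_definition, are simply absent from the lists, so they never match.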

Name Extraction Fallback (extract_defs.c)

The tree-sitter-arkts parser does not define a name field in its AST node field map, causing ts_node_child_by_field_name(node, "name") to return null. To handle this:

  • Add cbm_find_child_by_kind fallback in cbm_resolve_func_name for ArkTS function_declaration and decorated_function_declaration
  • Synthesize "build" name for build_method nodes (these have no name child at all)
  • Add ArkTS identifier fallback in extract_class_def and compute_class_qn
  • Add component_body to class body traversal (find_class_body, push_class_body_children)
  • Add ArkTS method name fallback in resolve_method_name
  • Add build_method synthesis in extract_class_methods
  • Include CBM_LANG_ARKTS in walk_defs descend_into_func check
  • Add Component label for component_declaration and decorated_export_declaration in class_label_for_kind
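
The fallback order described above (field lookup first, then a synthesized name for build_method, then the first identifier child) can be sketched against a mock node type. MockNode and the three helpers below are hypothetical stand-ins so the example compiles without tree-sitter; they are not the tree-sitter API and not the actual code in extract_defs.c.

```c
#include <stddef.h>
#include <string.h>

/* Mock AST node standing in for TSNode; NOT the tree-sitter API. */
typedef struct MockNode {
    const char *kind;                /* node type, e.g. "identifier" */
    const char *field;               /* grammar field name, or NULL if none */
    const char *text;                /* source text for leaf nodes */
    const struct MockNode *children; /* child array, or NULL */
    size_t child_count;
} MockNode;

/* Analogue of a field lookup: first child tagged with `field`. */
static const MockNode *child_by_field(const MockNode *n, const char *field) {
    for (size_t i = 0; i < n->child_count; i++) {
        const MockNode *c = &n->children[i];
        if (c->field != NULL && strcmp(c->field, field) == 0) return c;
    }
    return NULL;
}

/* Fallback lookup: first child whose kind matches. */
static const MockNode *find_child_by_kind(const MockNode *n, const char *kind) {
    for (size_t i = 0; i < n->child_count; i++) {
        if (strcmp(n->children[i].kind, kind) == 0) return &n->children[i];
    }
    return NULL;
}

/* Resolve a definition name: field, then synthesized "build", then identifier. */
static const char *resolve_name(const MockNode *n) {
    const MockNode *name = child_by_field(n, "name");
    if (name != NULL) return name->text;
    if (strcmp(n->kind, "build_method") == 0) return "build";
    name = find_child_by_kind(n, "identifier");
    return name != NULL ? name->text : NULL;
}
```

Because the field lookup is tried first, a grammar that later gains a name field would take that path, and the kind-based fallback would only run for parsers that lack it.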

Verification

Indexed a HarmonyOS project (~40 .ets files) with results:

  • 888 nodes, 2,253 edges (previously 0 Function/Class/Method nodes without the fallback logic)
  • Correctly extracts Components, classes, functions, methods, enums, interfaces
  • build_method inside @Component structs is recognized as a Method with synthesized name "build"

Also verified no regression on the codebase-memory-mcp project itself: 12,389 nodes, 58,847 edges.

Note on tree-sitter-arkts

The fallback logic is needed because tree-sitter-arkts lacks field('name', ...) mappings in its grammar. I plan to submit a PR to Million-mo/tree-sitter-arkts to add proper field definitions. If that PR is merged, the fallback logic in this PR will remain as a harmless safety net for older parser versions.

Vendored Grammar Provenance

Per the vendored-grammar review requirements, the following details are provided for audit:

| Item | Detail |
| --- | --- |
| Upstream source | Million-mo/tree-sitter-arkts: community grammar, not tracked by nvim-treesitter or Helix registries (verdict: COMMUNITY) |
| Pinned commit | 2fd0ad75e2d8 (upstream master HEAD as of 2025-10-27) |
| License | MIT, declared in grammar.js header (@license MIT, @author million) and package.json ("license": "MIT", "author": {"name": "million"}). Upstream ships no standalone LICENSE file; the vendored LICENSE is reconstructed from this declaration, copyright (c) 2024 million |
| Generated parser | parser.c generated by tree-sitter CLI v0.25.3 (per file header /* Automatically generated by tree-sitter v0.25.3 */), ABI 15 |
| Vendored files | parser.c (149,394 lines), tree_sitter/parser.h; no scanner.c (grammar has EXTERNAL_TOKEN_COUNT 0) |
| Static security | Reviewed the vendored C surface only (parser.c + tree_sitter/parser.h); no system()/popen()/fork()/network calls; no package manager hooks, workflow files, prompt/agent instruction files, or generated lockfiles were vendored |
| Registry status | Not in nvim-treesitter or Helix; this is a COMMUNITY grammar. The pinned commit is upstream master HEAD (not a registry-pinned SHA) because no registry tracks this grammar |
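
For reviewers who want to re-check the static security row, a scan for forbidden call names could look roughly like the sketch below. It is not the audit script used in this PR: the FORBIDDEN_CALLS list (including socket as a stand-in for "network calls") and the find_forbidden_call helper are assumptions, and the check is purely textual, so it also matches names inside comments or string literals.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed list of call names to flag; "socket" stands in for network calls. */
static const char *const FORBIDDEN_CALLS[] = {"system", "popen", "fork", "socket", NULL};

/*
 * Return the first name from FORBIDDEN_CALLS (in list order) that occurs in
 * `src` as a call, i.e. as a whole identifier followed by '('. Returns NULL
 * when none is found.
 */
static const char *find_forbidden_call(const char *src) {
    for (size_t i = 0; FORBIDDEN_CALLS[i] != NULL; i++) {
        const char *name = FORBIDDEN_CALLS[i];
        size_t len = strlen(name);
        for (const char *p = strstr(src, name); p != NULL; p = strstr(p + 1, name)) {
            /* Skip matches that are the tail of a longer identifier. */
            bool ident_before =
                p > src && (isalnum((unsigned char)p[-1]) || p[-1] == '_');
            /* Allow optional blanks between the name and '('. */
            const char *q = p + len;
            while (*q == ' ' || *q == '\t') q++;
            if (!ident_before && *q == '(') return name;
        }
    }
    return NULL;
}
```

Reading parser.c into a buffer and passing it to find_forbidden_call would give a quick first pass; a NULL result is consistent with the review statement but does not replace a manual read of the vendored sources.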

Happy-26 added 2 commits July 3, 2026 08:52
- Add CBM_LANG_ARKTS enum to cbm.h
- Add ArkTS node type definitions to lang_specs.c
- Add language spec entry for ArkTS
- Create grammar_arkts.c wrapper
- Vendor tree-sitter-arkts parser (MIT License)
- Map .ets extension to ArkTS language
- Add ArkTS display name to LANG_NAMES

Signed-off-by: Happy-26 <488127311@qq.com>
The tree-sitter-arkts parser does not define a 'name' field in its AST
node field map, causing ts_node_child_by_field_name(node, "name") to
return null for all ArkTS nodes. This resulted in zero Function/Class/
Method definitions being extracted from .ets files.

Changes in extract_defs.c:
- Add cbm_find_child_by_kind fallback in cbm_resolve_func_name for
  ArkTS function_declaration and decorated_function_declaration
- Synthesize "build" name for build_method nodes (no name child)
- Add ArkTS identifier fallback in extract_class_def and compute_class_qn
- Add component_body to class body traversal (find_class_body,
  push_class_body_children)
- Add ArkTS method name fallback in resolve_method_name
- Add build_method synthesis in extract_class_methods
- Include CBM_LANG_ARKTS in walk_defs descend_into_func check
- Add Component label for component_declaration and
  decorated_export_declaration in class_label_for_kind

Changes in lang_specs.c:
- Replace non-existent node types with actual tree-sitter-arkts output:
  add decorated_function_declaration, ui_builder_arrow_function,
  type_declaration, decorated_export_declaration
- Remove types not produced by the parser: generator_function_declaration,
  method_definition, class, abstract_class_declaration,
  type_alias_declaration, internal_module

Signed-off-by: Happy-26 <488127311@qq.com>
@Happy-26 Happy-26 requested a review from DeusData as a code owner July 3, 2026 03:45
@Happy-26 Happy-26 force-pushed the feature/arkts-support branch 2 times, most recently from 46fe7fb to 1804659 Compare July 3, 2026 05:41
- Add arkts entry to MANIFEST.md vendored grammar table with pinned
  commit and COMMUNITY verdict (not in nvim-treesitter/Helix registries)
- Add arkts to Custom extraction handling table in MANIFEST.md
- Update grammar count from 159 to 160
- Apply clang-format to extract_defs.c and lang_specs.c

Signed-off-by: Happy-26 <488127311@qq.com>
@Happy-26 Happy-26 force-pushed the feature/arkts-support branch from 1804659 to 1f643d4 Compare July 3, 2026 05:43
Upstream Million-mo/tree-sitter-arkts declares MIT in grammar.js
header but ships no LICENSE file, causing the audit script to return
ERROR. Added arkts to SPECIAL_NOTICE with PROVENANCE-NOTICE verdict.

Signed-off-by: Happy-26 <488127311@qq.com>
@Happy-26 Happy-26 force-pushed the feature/arkts-support branch from 20fbc18 to bb99141 Compare July 3, 2026 06:00
Happy-26 and others added 2 commits July 3, 2026 14:00
…s label

The previous commit added a global `component_declaration` → "Component"
mapping in `class_label_for_kind()`. This was intended for ArkTS
(`@Component struct Foo`), but it also affected the `templ` grammar,
whose `component_declaration` node represents a Go templating function
and must keep the "Class" label. This caused the `grammar_label_goldens`
test to fail:

  [LABEL] templ MISMATCH golden=[Class:1,Module:1] actual=[Component:1,Module:1]

Move the `component_declaration` / `decorated_export_declaration` →
"Component" mapping out of the global `class_label_for_kind()` and into
a CBM_LANG_ARKTS-scoped override block in `extract_class_def()`, next to
the existing Sway/WGSL and Rust/Swift/D language-scoped label overrides.

This restores templ's golden snapshot while keeping ArkTS components
labeled as "Component".

Refs DeusData#760

Signed-off-by: Happy-26 <488127311@qq.com>
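
The scoping fix described in this commit message can be pictured as follows. The sketch keeps the global mapping language-agnostic and applies the "Component" label only when the language is ArkTS. The two-value enum, the simplified class_label_for_kind body, and the class_label wrapper are assumptions; the real override lives inside extract_class_def() and the real global function distinguishes more kinds.

```c
#include <string.h>

/* Hypothetical two-language enum, just enough to show the scoping. */
typedef enum { CBM_LANG_TEMPL, CBM_LANG_ARKTS } CBMLanguage;

/* Global mapping (simplified assumption): no language-specific entries here. */
static const char *class_label_for_kind(const char *kind) {
    (void)kind; /* the real function inspects the kind; omitted in this sketch */
    return "Class";
}

/* Language-scoped override applied before falling back to the global mapping. */
static const char *class_label(CBMLanguage lang, const char *kind) {
    if (lang == CBM_LANG_ARKTS &&
        (strcmp(kind, "component_declaration") == 0 ||
         strcmp(kind, "decorated_export_declaration") == 0)) {
        return "Component";
    }
    return class_label_for_kind(kind);
}
```

Under this arrangement templ's component_declaration keeps the "Class" label, matching its golden snapshot, while the same node kind in ArkTS is labeled "Component".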
@DeusData DeusData added enhancement New feature or request parsing/quality Graph extraction bugs, false positives, missing edges language-request Request for new language support priority/backlog Valuable contribution, lower scheduling urgency; review when maintainer capacity opens. labels Jul 3, 2026
@DeusData

DeusData commented Jul 3, 2026

Owner

Thanks for adding ArkTS support. Triage: language-support PR for #760, but this needs the full vendored-grammar review path before it can move forward.

Because this vendors a parser and changes provenance tooling, review must verify upstream source, exact pinned commit, license compatibility, generated parser provenance, and static security of vendored sources. Please make sure the PR description includes those details and that the vendored grammar can be audited without guessing.

Happy-26 added a commit to Happy-26/codebase-memory-mcp that referenced this pull request Jul 3, 2026
- Fix copyright holder in LICENSE: "aspect-ux" → "million"
  (per grammar.js @author million and package.json author.name)
- Fix URL in lang_specs.c: aspect-ux/tree-sitter-arkts → Million-mo/tree-sitter-arkts
- Update MANIFEST.md arkts entry: correct copyright, add security review
  statement (no scanner.c vendored, no hooks/workflows vendored)
- Update audit-license-provenance.py SPECIAL_NOTICE to match

Addresses vendored-grammar review requirements from PR DeusData#791 feedback.
@Happy-26

Happy-26 commented Jul 3, 2026

Author

> Thanks for adding ArkTS support. Triage: language-support PR for #760, but this needs the full vendored-grammar review path before it can move forward.
>
> Because this vendors a parser and changes provenance tooling, review must verify upstream source, exact pinned commit, license compatibility, generated parser provenance, and static security of vendored sources. Please make sure the PR description includes those details and that the vendored grammar can be audited without guessing.

Thanks for the triage. I've pushed a fix (commit 35d521b) addressing the vendored-grammar review requirements:

  1. Upstream source: Million-mo/tree-sitter-arkts — community grammar (not in nvim-treesitter/Helix registries, verdict: COMMUNITY)

  2. Pinned commit: 2fd0ad75e2d8 (upstream master HEAD, 2025-10-27)

  3. License: MIT — declared in grammar.js header (@license MIT, @author million) and package.json ("license": "MIT", "author": {"name": "million"}). Upstream ships no standalone LICENSE file; the vendored LICENSE is reconstructed from this declaration. Fixed: the copyright holder was incorrectly listed as "aspect-ux" — corrected to "million" per grammar.js @author million.

  4. Generated parser provenance: parser.c generated by tree-sitter CLI v0.25.3, ABI 15. No scanner.c (grammar has EXTERNAL_TOKEN_COUNT 0).

  5. Static security: Vendored C surface reviewed (parser.c + tree_sitter/parser.h); no system()/popen()/fork()/network calls; no package manager hooks or workflow files vendored. Security review statement added to MANIFEST.md.

All details are documented in the updated PR description (Vendored Grammar Provenance section) and the MANIFEST.md entry. Also fixed an incorrect URL in lang_specs.c that pointed to a non-existent aspect-ux/tree-sitter-arkts repo.

- Fix copyright holder in LICENSE: "aspect-ux" → "million"
  (per grammar.js @author million and package.json author.name)
- Fix URL in lang_specs.c: aspect-ux/tree-sitter-arkts → Million-mo/tree-sitter-arkts
- Update MANIFEST.md arkts entry: correct copyright, add security review
  statement (no scanner.c vendored, no hooks/workflows vendored)
- Update audit-license-provenance.py SPECIAL_NOTICE to match

Addresses vendored-grammar review requirements from PR DeusData#791 feedback.

Signed-off-by: Happy-26 <488127311@qq.com>
@Happy-26 Happy-26 force-pushed the feature/arkts-support branch from 35d521b to 6c52254 Compare July 4, 2026 01:35


Development

Successfully merging this pull request may close these issues.

Feature Request: Add support for ArkTS (.ets) / HarmonyOS application development

2 participants