feat(snowflake): T1.2 data types — TypeName struct + parseDataType #39

Open
h3n4l wants to merge 3 commits into main from feat/snowflake/datatypes


h3n4l commented Apr 10, 2026

Summary

Second Tier 1 node of the snowflake parser migration (BYT-9000): data type parsing that covers every form in the legacy data_type grammar rule. Adds a TypeName node struct with a TypeKind enum (22 values) to snowflake/ast, plus a parseDataType() parser helper to snowflake/parser.

Depends on T1.1 identifiers (PR #22, merged). Unblocks T1.3 (expressions — needs type parsing for CAST/TRY_CAST and :: operator).

Design highlights

  • Single TypeName struct with a Kind TypeKind enum for fast dispatch and a Name string for round-tripping. The 22 kinds span INT through VECTOR, so no per-type-family structs are needed.
  • Params []int for numeric parameters (NUMBER(38,0), VARCHAR(100), TIME(9)). A nil slice means no parameters were written; zero is a valid parameter value, as in TIME(0).
  • ElementType *TypeName for recursive ARRAY and VECTOR element types; the walker descends into it. VECTOR(type, dim) also records its dimension in VectorDim.
  • Multi-word types fused via parser lookahead: DOUBLE PRECISION, CHAR VARYING, NCHAR VARYING. F2 emits them as separate tokens; the parser peeks and combines.
  • F2 keyword additions: 3 new constants + 6 map entries for timestamp variants (TIMESTAMP_LTZ/NTZ/TZ with both underscore and no-underscore forms).
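The nil-versus-zero rule for Params can be illustrated with a minimal sketch. The TypeName shape here is trimmed to two of the fields named in this PR, and the render helper is hypothetical, not part of the change:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TypeName is a trimmed-down stand-in for the struct described in this PR.
type TypeName struct {
	Name   string
	Params []int // nil = no parameter list written; []int{0} = "(0)"
}

// render is a hypothetical helper that prints a type back as text.
// It checks for nil rather than len == 0 so that TIME and TIME(0) differ.
func render(t TypeName) string {
	if t.Params == nil {
		return t.Name
	}
	parts := make([]string, len(t.Params))
	for i, p := range t.Params {
		parts[i] = strconv.Itoa(p)
	}
	return t.Name + "(" + strings.Join(parts, ", ") + ")"
}

func main() {
	fmt.Println(render(TypeName{Name: "TIME"}))                          // TIME
	fmt.Println(render(TypeName{Name: "TIME", Params: []int{0}}))        // TIME(0)
	fmt.Println(render(TypeName{Name: "NUMBER", Params: []int{38, 0}})) // NUMBER(38, 0)
}
```

Using a slice rather than two int fields keeps "absent" distinguishable from "zero" without a separate flag.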

Test plan

  • CI green on snowflake/ast/ and snowflake/parser/
  • Walker regen produces 2 cases / 2 child fields (File + TypeName)
  • All 22 TypeKind values have String() coverage
  • Every data type from the legacy grammar parses correctly (60+ parser subtests)
  • TIMESTAMPLTZ (no underscore) recognized alongside TIMESTAMP_LTZ

🤖 Generated with Claude Code

h3n4l and others added 3 commits April 10, 2026 17:22
Single TypeName struct with TypeKind enum (22 values) covering every
Snowflake data type from the legacy grammar's data_type rule. Numeric
params via []int slice, recursive ElementType for ARRAY/VECTOR, and
VectorDim for VECTOR(type, dim).

Requires a small F2 scope creep: 3 new keyword constants
(kwTIMESTAMP_LTZ/NTZ/TZ) + 6 keyword map entries (with/without
underscore aliases). Multi-word types (DOUBLE PRECISION, CHAR VARYING,
NCHAR VARYING) handled via parser lookahead, not F2 changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

6-task plan: F2 timestamp variant keywords (3 constants + 6 map
entries), TypeKind enum (22 values) + TypeName struct in ast, AST
unit tests, parseDataType dispatch (all type forms from legacy
grammar), parser tests (16 categories), acceptance sweep.

Handles multi-word types (DOUBLE PRECISION, CHAR/NCHAR VARYING) via
parser lookahead, ARRAY(element_type) via recursive parse, and
VECTOR(type, dim) via restricted element-type rule. Walker regen
produces a real *TypeName case with ElementType child walk.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Second Tier 1 node. Adds data type parsing covering every form from the
legacy SnowflakeParser.g4 data_type rule.

AST types (snowflake/ast):

  type TypeKind int  // 22 values: TypeInt through TypeVector
  type TypeName struct {
      Kind        TypeKind
      Name        string      // source text for round-tripping
      Params      []int       // numeric params: [precision] or [precision, scale]
      ElementType *TypeName   // ARRAY element / VECTOR element type
      VectorDim   int         // VECTOR dimensions; -1 for non-VECTOR
      Loc         Loc
  }

TypeName is a Node (T_TypeName tag). The walker descends into
ElementType when non-nil (verified by TestTypeName_WalkerVisitsElementType).
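
A rough sketch of what descending into ElementType means for a nested type. The real walker is generated; the walk function and the trimmed TypeName below are assumptions made for illustration only:

```go
package main

import "fmt"

// TypeName keeps only the fields needed to show the traversal.
type TypeName struct {
	Name        string
	ElementType *TypeName // nil for non-container types
}

// walk is a hypothetical stand-in for the generated walker: it visits
// the node, then recurses into ElementType when it is non-nil.
func walk(t *TypeName, visit func(*TypeName)) {
	if t == nil {
		return
	}
	visit(t)
	walk(t.ElementType, visit)
}

func main() {
	// ARRAY(ARRAY(INT)) as a nested TypeName chain.
	nested := &TypeName{
		Name: "ARRAY",
		ElementType: &TypeName{
			Name:        "ARRAY",
			ElementType: &TypeName{Name: "INT"},
		},
	}
	var seen []string
	walk(nested, func(n *TypeName) { seen = append(seen, n.Name) })
	fmt.Println(seen) // [ARRAY ARRAY INT]
}
```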

Parser helpers (snowflake/parser/datatypes.go):

  parseDataType()          — full dispatch: 30+ keyword cases
  parseOptionalTypeParams()— optional (n) or (n, m)
  parseTimestampType()     — shared handler for TIMESTAMP variants
  parseVectorElementType() — restricted to INT/INTEGER/FLOAT/FLOAT4/FLOAT8
  ParseDataType(string)    — freestanding exported helper
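
As an illustration of the optional (n) or (n, m) form, here is a simplified sketch over pre-split string tokens. It is not the code in datatypes.go; the function name, token representation, and error messages are all assumptions:

```go
package main

import (
	"fmt"
	"strconv"
)

// parseOptionalParams consumes "(n)" or "(n, m)" from the front of toks.
// It returns nil params and the tokens untouched when there is no "(".
func parseOptionalParams(toks []string) ([]int, []string, error) {
	if len(toks) == 0 || toks[0] != "(" {
		return nil, toks, nil // no parameter list at all
	}
	toks = toks[1:]
	params := []int{}
	for {
		if len(toks) == 0 {
			return nil, nil, fmt.Errorf("unterminated type parameter list")
		}
		n, err := strconv.Atoi(toks[0])
		if err != nil {
			return nil, nil, fmt.Errorf("expected integer, got %q", toks[0])
		}
		params = append(params, n)
		toks = toks[1:]
		if len(toks) == 0 {
			return nil, nil, fmt.Errorf("unterminated type parameter list")
		}
		switch {
		case toks[0] == ")":
			return params, toks[1:], nil
		case toks[0] == "," && len(params) < 2:
			toks = toks[1:] // a second parameter follows
		default:
			return nil, nil, fmt.Errorf("expected ')', got %q", toks[0])
		}
	}
}

func main() {
	p, rest, err := parseOptionalParams([]string{"(", "38", ",", "0", ")", "NOT"})
	fmt.Println(p, rest, err) // [38 0] [NOT] <nil>

	p, rest, err = parseOptionalParams([]string{"NOT", "NULL"})
	fmt.Println(p == nil, rest, err) // true [NOT NULL] <nil>
}
```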

Multi-word types (DOUBLE PRECISION, CHAR VARYING, NCHAR VARYING) are
fused via parser lookahead — F2 emits them as separate tokens (kwDOUBLE
+ tokIdent("PRECISION")), the parser peeks and combines.
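
The peek-and-combine step might look roughly like the following. The token and parser types are simplified placeholders (the real lexer distinguishes keyword and identifier tokens), and the fuse table and method names are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// token is a simplified placeholder for an F2 lexer token.
type token struct{ text string }

type parser struct {
	toks []token
	pos  int
}

// peek returns the current token without consuming it.
func (p *parser) peek() (token, bool) {
	if p.pos < len(p.toks) {
		return p.toks[p.pos], true
	}
	return token{}, false
}

// fuse maps a first word to the second word that may follow it.
var fuse = map[string]string{
	"DOUBLE": "PRECISION",
	"CHAR":   "VARYING",
	"NCHAR":  "VARYING",
}

// parseTypeWord consumes one type word and, if the next token completes a
// known two-word type, consumes that too and returns the combined name.
func (p *parser) parseTypeWord() (string, error) {
	first, ok := p.peek()
	if !ok {
		return "", fmt.Errorf("expected data type, got end of input")
	}
	p.pos++
	name := strings.ToUpper(first.text)
	if second, found := fuse[name]; found {
		if next, ok := p.peek(); ok && strings.EqualFold(next.text, second) {
			p.pos++ // only consume the second word when it matches
			name += " " + second
		}
	}
	return name, nil
}

func main() {
	p := &parser{toks: []token{{"double"}, {"precision"}}}
	name, _ := p.parseTypeWord()
	fmt.Println(name) // DOUBLE PRECISION

	p = &parser{toks: []token{{"CHAR"}, {"("}}}
	name, _ = p.parseTypeWord()
	fmt.Println(name, p.pos) // CHAR 1
}
```

Because the second word is only consumed on a match, a bare CHAR followed by a parameter list is left intact for the parameter parser.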

F2 keyword additions (scope creep, 3 constants + 6 map entries):
  kwTIMESTAMP_LTZ, kwTIMESTAMP_NTZ, kwTIMESTAMP_TZ
  Both underscore and no-underscore forms in keywordMap
  (TIMESTAMP_LTZ and TIMESTAMPLTZ both accepted)
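
A sketch of how 3 constants can back 6 map entries. The constant names come from this commit; the keyword type, kwNone, the lookupKeyword helper, and the case-insensitive lookup are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// keyword is a hypothetical stand-in for the F2 keyword token type.
type keyword int

const (
	kwNone keyword = iota // zero value: not a keyword
	kwTIMESTAMP_LTZ
	kwTIMESTAMP_NTZ
	kwTIMESTAMP_TZ
)

// keywordMap holds both spellings of each timestamp variant.
var keywordMap = map[string]keyword{
	"TIMESTAMP_LTZ": kwTIMESTAMP_LTZ,
	"TIMESTAMPLTZ":  kwTIMESTAMP_LTZ,
	"TIMESTAMP_NTZ": kwTIMESTAMP_NTZ,
	"TIMESTAMPNTZ":  kwTIMESTAMP_NTZ,
	"TIMESTAMP_TZ":  kwTIMESTAMP_TZ,
	"TIMESTAMPTZ":   kwTIMESTAMP_TZ,
}

// lookupKeyword upper-cases the word before the map lookup; a missing
// key yields kwNone because that is the map's zero value.
func lookupKeyword(word string) keyword {
	return keywordMap[strings.ToUpper(word)]
}

func main() {
	fmt.Println(lookupKeyword("TIMESTAMP_LTZ") == lookupKeyword("timestampltz")) // true
	fmt.Println(lookupKeyword("TIMESTAMP_XYZ") == kwNone)                        // true
}
```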

Tests: 25 AST subtests (TypeKind.String, Tag, walker traversal) +
60+ parser subtests (all integer types, NUMBER with params, float
aliases, DOUBLE PRECISION, CHAR/NCHAR VARYING, all string types,
binary, timestamp variants incl. no-underscore forms, TIME/DATETIME
with precision, BOOLEAN, semi-structured, ARRAY untyped/typed/nested,
VECTOR, error cases, freestanding helper, Loc spanning).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>