
feat(clickhouse): add UUID, Decimal, Array/Tuple, UInt8/Int8, raw ORDER BY, rawColumn passthrough #10

Open
lohanidamodar wants to merge 5 commits into main from feat/clickhouse-schema-extras-2


@lohanidamodar
Contributor

Summary

Follow-up to #8 — adds the remaining ClickHouse schema features commonly needed in production OLAP workloads, plus a small compiler fix. Base-level features (uuid(), decimal(), tinyInteger(), smallInteger(), defaultRaw()) also map cleanly across MySQL, PostgreSQL, SQLite, and MongoDB.

What's new

UInt8 / Int8 via tinyInteger() and UInt16 / Int16 via smallInteger()

Small integer columns are a natural fit for bounded enumerations, percentages, and other fields whose value range fits within 8 or 16 bits. A UInt8 value takes 1 byte versus the 4 bytes of the UInt32 produced by integer()->unsigned(), a 75% reduction in uncompressed disk and memory footprint (UInt16 halves it). ClickHouse emits UInt8/Int8 and UInt16/Int16; MySQL maps to TINYINT/SMALLINT; PostgreSQL to SMALLINT (it has no TINYINT); SQLite to INTEGER.

$schema->table('events')
    ->bigInteger('id')->primary()
    ->tinyInteger('scroll_depth')->unsigned()
    ->smallInteger('year_offset')
    ->create();
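The per-dialect mapping above can be sketched as a small lookup. This is a hypothetical, self-contained illustration rather than the library's compiler: `compileSmallIntType()`, `bytesPerValue()`, and the lowercase dialect keys are assumptions, and the MySQL `UNSIGNED` suffix is a guess at how signedness might be rendered.

```php
<?php

// Hypothetical sketch: maps tinyInteger()/smallInteger() to a dialect's type name.
// Function names and dialect keys are illustrative, not the library's API.
function compileSmallIntType(string $dialect, int $bits, bool $unsigned): string
{
    if ($bits !== 8 && $bits !== 16) {
        throw new InvalidArgumentException("Expected 8 or 16 bits, got {$bits}");
    }

    return match ($dialect) {
        // ClickHouse encodes both width and signedness in the type name.
        'clickhouse' => ($unsigned ? 'UInt' : 'Int') . $bits,
        // MySQL has both widths natively; the UNSIGNED suffix is an assumption.
        'mysql' => ($bits === 8 ? 'TINYINT' : 'SMALLINT') . ($unsigned ? ' UNSIGNED' : ''),
        // PostgreSQL has no TINYINT and no unsigned integers, so both widths map to
        // SMALLINT. Note that an unsigned 16-bit range (0..65535) exceeds SMALLINT's
        // maximum of 32767.
        'postgresql' => 'SMALLINT',
        // SQLite has a single INTEGER storage class.
        'sqlite' => 'INTEGER',
        default => throw new InvalidArgumentException("Unknown dialect: {$dialect}"),
    };
}

// Uncompressed bytes per value for a fixed-width ClickHouse integer type name.
function bytesPerValue(string $clickHouseType): int
{
    // 'UInt8' -> 8 bits -> 1 byte, 'UInt32' -> 32 bits -> 4 bytes.
    return intdiv((int) preg_replace('/\D/', '', $clickHouseType), 8);
}

echo compileSmallIntType('clickhouse', 8, true), PHP_EOL; // UInt8
printf("%d%% smaller\n", (1 - bytesPerValue('UInt8') / bytesPerValue('UInt32')) * 100); // 75% smaller
```

The 1-byte versus 4-byte comparison is where the 75% figure comes from; the same arithmetic gives 50% for UInt16.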

Array(T) and Tuple(...) column types

Array(T) is the canonical ClickHouse type for multi-valued attributes — tags, labels, key/value pairs flattened into parallel arrays — and is the standard way to model nested records in the MergeTree family. Tuple(...) covers fixed-arity composites like geo points and key/value pairs.

use Utopia\Query\Schema\ColumnType;

$schema->table('events')
    ->bigInteger('id')->primary()
    ->array('meta.key', ColumnType::String)
    ->array('meta.value', ColumnType::String)
    ->array('user_ids', ColumnType::BigInteger)->unsigned()
    ->tuple('coords', [ColumnType::Float, ColumnType::Float])
    ->create();

Element types run back through the standard column-type compiler, so the parent column's unsigned() and precision/scale flags carry through to the inner type. LowCardinality(...) is rejected on these columns because ClickHouse only permits it on scalar types, and nullable() is rejected as well because ClickHouse does not allow Nullable(...) to wrap a composite type such as Array or Tuple. Both methods are ClickHouse-only: they exist only on the ClickHouse table builder, so calling ->array() or ->tuple() on a different dialect's builder fails at the type level.
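A minimal sketch of the recursive element-type compilation, assuming a simplified column model. `ColumnSpec`, `compileScalar()`, and `compileColumnType()` are hypothetical stand-ins rather than the library's classes, and the scalar mappings (for example `Float` to `Float64`) are assumptions. The sketch rejects `nullable()` on composites, following ClickHouse's rule that `Nullable(...)` cannot wrap `Array` or `Tuple` (the behaviour the review thread says the latest commit enforces), and on scalars it nests `Nullable` inside `LowCardinality`, which is the order ClickHouse accepts.

```php
<?php

// Hypothetical stand-in for a column definition; not the library's Column class.
final class ColumnSpec
{
    /** @param list<string> $tupleElements */
    public function __construct(
        public string $type,              // 'String', 'Integer', 'BigInteger', 'Float', 'Array', 'Tuple'
        public bool $unsigned = false,
        public bool $nullable = false,
        public bool $lowCardinality = false,
        public ?string $arrayElement = null,
        public array $tupleElements = [],
    ) {
    }
}

// Assumed scalar mappings; the parent's unsigned flag is passed down to elements.
function compileScalar(string $type, bool $unsigned): string
{
    return match ($type) {
        'String' => 'String',
        'Integer' => $unsigned ? 'UInt32' : 'Int32',
        'BigInteger' => $unsigned ? 'UInt64' : 'Int64',
        'Float' => 'Float64',
        default => throw new InvalidArgumentException("Unsupported element type: {$type}"),
    };
}

function compileColumnType(ColumnSpec $c): string
{
    if ($c->type === 'Array' || $c->type === 'Tuple') {
        // ClickHouse allows neither Nullable(...) nor LowCardinality(...) around composites.
        if ($c->nullable) {
            throw new DomainException('Nullable(...) cannot wrap Array/Tuple');
        }
        if ($c->lowCardinality) {
            throw new DomainException('LowCardinality(...) is only valid on scalar types');
        }
        if ($c->type === 'Array') {
            if ($c->arrayElement === null) {
                throw new InvalidArgumentException('Array needs an element type');
            }
            return 'Array(' . compileScalar($c->arrayElement, $c->unsigned) . ')';
        }
        if ($c->tupleElements === []) {
            throw new InvalidArgumentException('Tuple needs at least one element');
        }
        $inner = array_map(fn (string $t): string => compileScalar($t, $c->unsigned), $c->tupleElements);
        return 'Tuple(' . implode(', ', $inner) . ')';
    }

    $type = compileScalar($c->type, $c->unsigned);
    if ($c->nullable) {
        $type = "Nullable({$type})";
    }
    // LowCardinality must be the outer wrapper: LowCardinality(Nullable(String)).
    return $c->lowCardinality ? "LowCardinality({$type})" : $type;
}

echo compileColumnType(new ColumnSpec('Array', unsigned: true, arrayElement: 'BigInteger')), PHP_EOL; // Array(UInt64)
echo compileColumnType(new ColumnSpec('Tuple', tupleElements: ['Float', 'Float'])), PHP_EOL;         // Tuple(Float64, Float64)
```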

decimal(precision, scale)

Fixed-point numeric column type for monetary or precision-sensitive values where binary floating-point rounding error is unacceptable. ClickHouse emits Decimal(P, S); MySQL/PostgreSQL emit DECIMAL(P, S); SQLite emits NUMERIC(P, S); MongoDB maps to the decimal BSON type. Combines with nullable() exactly as scalar columns do.

$schema->table('orders')
    ->bigInteger('id')->primary()
    ->decimal('amount', precision: 18, scale: 3)
    ->decimal('rate', precision: 5, scale: 4)->nullable()
    ->create();
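The validation and per-dialect rendering for decimal() might look roughly like this. `compileDecimal()` and the dialect keys are hypothetical; the scale checks follow the rules stated in this PR, while the `precision >= 1` check is an added assumption.

```php
<?php

// Hypothetical sketch of decimal(precision, scale) validation and rendering.
function compileDecimal(string $dialect, int $precision, int $scale): string
{
    // Assumption: precision must be positive (not stated in the PR).
    if ($precision < 1) {
        throw new InvalidArgumentException('Precision must be at least 1');
    }
    // Rules stated in the PR: no negative scale, and scale may not exceed precision.
    if ($scale < 0 || $scale > $precision) {
        throw new InvalidArgumentException("Invalid scale {$scale} for precision {$precision}");
    }

    return match ($dialect) {
        'clickhouse' => "Decimal({$precision}, {$scale})",
        'mysql', 'postgresql' => "DECIMAL({$precision}, {$scale})",
        'sqlite' => "NUMERIC({$precision}, {$scale})",
        default => throw new InvalidArgumentException("Unknown dialect: {$dialect}"),
    };
}

echo compileDecimal('clickhouse', 18, 3), PHP_EOL; // Decimal(18, 3)

// Why fixed-point: binary floats cannot represent 0.1 exactly.
var_dump(0.1 + 0.2 === 0.3); // bool(false)
```

With precision 18 and scale 3, a value has up to 15 integer digits and 3 fractional digits, stored exactly rather than as a binary approximation.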

UUID column type with defaultRaw()

UUID is a native fixed-width (16-byte) type in ClickHouse and PostgreSQL and is stored as a 36-character string elsewhere; production schemas commonly use UUIDs as primary identifiers with server-generated defaults. Column::defaultRaw(string) emits the expression verbatim after DEFAULT, unlike default(), which quotes string literals, so callers can attach generateUUIDv4(), gen_random_uuid(), UUID(), now(), CURRENT_TIMESTAMP, and similar dialect-specific server-generated defaults.

$schema->table('events')
    ->uuid('event_id')->defaultRaw('generateUUIDv4()')->primary()
    ->datetime('ts', 3)
    ->create();

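uuid() compiles to UUID on ClickHouse and PostgreSQL, CHAR(36) on MySQL, TEXT on SQLite, and the string BSON type on MongoDB. defaultRaw() is defined on the base Column, so it works on every dialect; it takes precedence over default() when both are set, and it rejects empty strings and semicolons. Because the expression is emitted verbatim, any dialect-specific syntax is the caller's responsibility: MySQL 8.0.13+ and SQLite, for example, require most expression defaults to be wrapped in parentheses, as in defaultRaw('(UUID())') on MySQL.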
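The difference between default() and defaultRaw(), and the precedence rule, can be sketched as follows. `compileDefault()` and `validateRawExpression()` are hypothetical helpers, not the library's API, and the quoting style (single quotes with doubled embedded quotes) is an assumption about how a quoted literal could be rendered.

```php
<?php

// Hypothetical helpers contrasting default() (quoted literal) with defaultRaw() (verbatim).
function validateRawExpression(string $expr): string
{
    // Empty strings (here also whitespace-only, an assumption) and ';' are rejected.
    if (trim($expr) === '') {
        throw new InvalidArgumentException('Raw expression must not be empty');
    }
    if (str_contains($expr, ';')) {
        throw new InvalidArgumentException('Raw expression must not contain ";"');
    }
    return $expr;
}

function compileDefault(int|float|string|null $default, ?string $defaultRaw): string
{
    if ($defaultRaw !== null) {
        // defaultRaw() takes precedence and is emitted verbatim.
        return ' DEFAULT ' . validateRawExpression($defaultRaw);
    }
    if ($default === null) {
        return '';
    }
    if (is_string($default)) {
        // default() treats strings as literals: quote them and double embedded quotes.
        return " DEFAULT '" . str_replace("'", "''", $default) . "'";
    }
    return ' DEFAULT ' . $default;
}

echo compileDefault(null, 'generateUUIDv4()'), PHP_EOL; // verbatim function call
echo compileDefault('generateUUIDv4()', null), PHP_EOL; // quoted string literal
```

Sending `generateUUIDv4()` through the quoted path produces a string literal rather than a function call, which is the mistake defaultRaw() exists to avoid.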

Raw expressions in ORDER BY

MergeTree ORDER BY clauses routinely include scalar function calls — toDate(ts), cityHash64(...), intHash32(user_id) — to control sparse-index cardinality. orderBy(array) restricts each entry to a plain identifier; orderByRaw(string) accepts the full parenthesised tuple verbatim, mirroring the existing partitionBy(string) convention.

$schema->table('events')
    ->string('tenant')
    ->bigInteger('id')
    ->datetime('ts')
    ->orderByRaw('(`tenant`, toDate(`ts`), `id`)')
    ->create();

Takes precedence over orderBy() when both are set; rejects empty strings and semicolons. ClickHouse-only.
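A sketch of the precedence between orderByRaw() and orderBy(), under stated assumptions: `compileOrderBy()` and `quoteIdentifier()` are hypothetical, the identifier pattern is an assumed definition of a "plain identifier", and the `tuple()` fallback for an empty key is this sketch's own choice (ClickHouse accepts `ORDER BY tuple()` for a MergeTree table without a sorting key).

```php
<?php

// Hypothetical sketch of ORDER BY compilation for a MergeTree table.
function quoteIdentifier(string $name): string
{
    // Assumed definition of a "plain identifier" for orderBy().
    if (preg_match('/^[A-Za-z_][A-Za-z0-9_]*$/', $name) !== 1) {
        throw new InvalidArgumentException("Not a plain identifier: {$name}");
    }
    return '`' . $name . '`';
}

/** @param list<string> $orderBy */
function compileOrderBy(array $orderBy, ?string $orderByRaw): string
{
    if ($orderByRaw !== null) {
        // orderByRaw() wins and is emitted verbatim after validation.
        if (trim($orderByRaw) === '' || str_contains($orderByRaw, ';')) {
            throw new InvalidArgumentException('orderByRaw() rejects empty strings and semicolons');
        }
        return 'ORDER BY ' . $orderByRaw;
    }
    if ($orderBy === []) {
        // This sketch's fallback: ClickHouse accepts tuple() as an empty sorting key.
        return 'ORDER BY tuple()';
    }
    return 'ORDER BY (' . implode(', ', array_map('quoteIdentifier', $orderBy)) . ')';
}

echo compileOrderBy(['tenant', 'id'], '(`tenant`, toDate(`ts`), `id`)'), PHP_EOL;
```

Passing a function call such as `toDate(ts)` through the identifier-only path throws, which is the limitation orderByRaw() is meant to lift.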

rawColumn() passthrough fix on ClickHouse

Table::rawColumn(string $definition) is the documented escape hatch for column types the typed builder does not yet model. The base Schema::compileCreate() already iterates $table->rawColumnDefs, but the Schema\ClickHouse::compileCreate() override did not, so raw fragments registered through the same fluent builder silently disappeared from the generated DDL on ClickHouse only. The fix adds the same loop to the ClickHouse override.
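The shape of the fix can be illustrated with a toy CREATE TABLE compiler. `TableSpec` and `compileCreate()` below are hypothetical simplifications, not `Schema\ClickHouse`; the point is only that raw fragments must be appended to the column list after the typed columns, or they never reach the DDL.

```php
<?php

// Hypothetical, simplified table model; not the library's Table class.
final class TableSpec
{
    /**
     * @param array<string, string> $columns       column name => compiled type
     * @param list<string>          $rawColumnDefs verbatim column definitions
     */
    public function __construct(
        public string $name,
        public array $columns,
        public array $rawColumnDefs = [],
        public string $orderBy = '(`id`)',
    ) {
    }
}

function compileCreate(TableSpec $table): string
{
    $defs = [];
    foreach ($table->columns as $name => $type) {
        $defs[] = "`{$name}` {$type}";
    }
    // The loop the ClickHouse override was missing: without it, raw fragments vanish.
    foreach ($table->rawColumnDefs as $raw) {
        $defs[] = $raw;
    }

    return sprintf(
        'CREATE TABLE `%s` (%s) ENGINE = MergeTree() ORDER BY %s',
        $table->name,
        implode(', ', $defs),
        $table->orderBy,
    );
}

echo compileCreate(new TableSpec('events', ['id' => 'Int64'], ['`payload` Map(String, String)'])), PHP_EOL;
```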

Out of scope (planned follow-up)

  • Bulk insert formats on Builder\ClickHouse (FORMAT JSONEachRow, RowBinary, TabSeparated, Parquet) — broader surface that touches the builder rather than the schema compiler; deserves its own PR.

Tests

38 new assertions across:

  • ClickHouseTest: uuid() with and without defaultRaw(), nullable wrapping, defaultRaw() precedence and validation, tinyInteger()/smallInteger() (signed and unsigned), decimal() with nullable(), array(T) with String/UInt64/nullable wrapping, LowCardinality rejection on Array, tuple() with empty-list validation, orderByRaw() with mixed function calls, orderByRaw() precedence and validation, rawColumn() passthrough through compileCreate().
  • MySQLTest, PostgreSQLTest, SQLiteTest: tinyInteger/smallInteger/decimal/uuid cross-dialect mappings; defaultRaw() rendered correctly alongside NOT NULL/PRIMARY KEY; decimal() precision/scale validation.
  • MongoDBTest: decimal/tinyInteger/uuid BSON type mappings.

All gates green: composer test, composer lint, composer check (PHPStan level max).

`rawColumn()` is the documented escape hatch for emitting dialect-specific
column types the typed builder does not yet model. The base
`Schema::compileCreate()` already iterates `$table->rawColumnDefs`, but the
ClickHouse override loop did not — so raw fragments registered through the
same fluent builder silently disappeared from the generated DDL on
ClickHouse only. Mirror the loop in `Schema\ClickHouse::compileCreate()`.
…w(), plus ClickHouse Array/Tuple and raw ORDER BY

Adds the remaining production-OLAP-shaped schema features that callers
had to drop to `rawColumn()` for after the 0.3.x bump:

- `Table::uuid()` — UUID column type, native on ClickHouse (`UUID`) and
  PostgreSQL (`UUID`); `CHAR(36)` on MySQL; `TEXT` on SQLite; `string`
  BSON type on MongoDB. Server-generated UUIDs are common as primary
  identifiers and need a dialect-specific default expression rather
  than an application-supplied value.

- `Column::defaultRaw(string)` — raw default expression emitted
  verbatim after `DEFAULT`. Lets callers attach `generateUUIDv4()`,
  `gen_random_uuid()`, `UUID()`, `now()`, `CURRENT_TIMESTAMP`, etc.
  without the quoting `default()` applies to scalar values. Takes
  precedence over `default()` when both are set; rejects empty strings
  and semicolons.

- `Table::tinyInteger()` and `Table::smallInteger()` — small integer
  column types. On ClickHouse they map to `UInt8`/`Int8` and
  `UInt16`/`Int16` (75% and 50% smaller, respectively, than the default
  `UInt32` produced by `integer()->unsigned()`), to native
  `TINYINT`/`SMALLINT` on MySQL,
  to `SMALLINT` on PostgreSQL (which has no `TINYINT`), and to
  `INTEGER` on SQLite. Useful for bounded enumerations, percentage
  values, and other fields that fit well under 32 bits.

- `Table::decimal(name, precision, scale)` — fixed-point numeric
  column for monetary and precision-sensitive values where
  binary-floating-point error is unacceptable. ClickHouse emits
  `Decimal(P, S)`; MySQL/PostgreSQL emit `DECIMAL(P, S)`; SQLite
  emits `NUMERIC(P, S)`; MongoDB maps to the `decimal` BSON type.
  Rejects negative scale and scale greater than precision.

- `Table\ClickHouse::array(name, ColumnType $element)` and
  `Table\ClickHouse::tuple(name, list<ColumnType>)` — `Array(T)` and
  `Tuple(...)` nested column types. Core ClickHouse types for
  multi-valued attributes (tags, labels, parallel-array nested
  records) and fixed-arity composites (geo points, key/value pairs).
  Element types run back through the standard column-type compiler so
  `unsigned()` and `precision`/`scale` flags carry into the inner
  type. `Nullable(...)` wraps the whole `Array`/`Tuple`;
  `LowCardinality(...)` is rejected on these columns to match
  ClickHouse's documented constraints.

- `Table\ClickHouse::orderByRaw(string)` — raw `ORDER BY` expression
  emitted verbatim. MergeTree `ORDER BY` clauses routinely include
  scalar function calls (`toDate(ts)`, `cityHash64(...)`,
  `intHash32(user_id)`) to control sparse-index cardinality; the
  existing identifier-only `orderBy(array)` blocks this common shape.
  Mirrors the `partitionBy(string)` convention. Takes precedence over
  `orderBy()` when both are set; rejects empty strings and semicolons.

README updated under "Creating Tables" (new types and modifiers) and
"ClickHouse Schema" (per-feature subsections with generated DDL).

`Column::$scale` is added alongside the existing `$precision`/`$length`
constructor args, and dialect `Table::newColumn()` overrides forward
it through.
@github-actions

📊 Coverage

| Metric | PR | Baseline | Δ |
| --- | --- | --- | --- |
| Lines | 91.70% (7251/7907) | 91.85% | -0.15% |
| Methods | 84.41% (1083/1283) | 84.56% | -0.15% |
| Classes | 65.35% (132/202) | 65.84% | -0.50% |

Full per-file breakdown in the job summary.

@greptile-apps
Contributor

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR adds six new schema features to the utopia-php/query library: tinyInteger/smallInteger, decimal, uuid, Array(T)/Tuple(...) (ClickHouse-only), defaultRaw(), and orderByRaw(), plus a bug fix that restores rawColumn() passthrough in ClickHouse's compileCreate() override.

  • ClickHouse schema additions: Array(T) and Tuple(...) column types with correct LowCardinality guards; UInt8/Int8, UInt16/Int16, and Decimal(P,S) compiler paths; UUID type; verbatim DEFAULT expression via defaultRaw(); verbatim ORDER BY clause via orderByRaw().
  • Cross-dialect coverage: tinyInteger, smallInteger, decimal, uuid, and defaultRaw mapped cleanly across MySQL, PostgreSQL, SQLite, and MongoDB with validation and 38 new test assertions.
  • Bug fix: The compileCreate() override in Schema\ClickHouse now iterates $table->rawColumnDefs, so raw column fragments registered via Table::rawColumn() are no longer silently dropped from ClickHouse DDL.

Confidence Score: 5/5

Safe to merge; changes are additive schema compiler features with no impact on existing query-builder paths.

All changed paths are new, self-contained compiler features. The rawColumn fix is a one-loop addition mirroring the existing base-class loop. Validation guards are consistent with existing partitionBy/sampleBy conventions. The only known open items were flagged in a prior review round.

tests/Query/Schema/ClickHouseTest.php — testCreateTableArrayNullable expects Nullable(Array(String)) output but the compiler now throws for that case (tracked in a prior review comment)

Important Files Changed

| Filename | Overview |
| --- | --- |
| src/Query/Schema/ClickHouse.php | Adds Array/Tuple/TinyInt/SmallInt/Decimal/UUID/defaultRaw/orderByRaw compiler paths; rawColumn passthrough fix; nullable-Array and nullable-Tuple guards both correctly throw UnsupportedException |
| src/Query/Schema/Column.php | Adds scale constructor param, defaultRaw() fluent method with empty/semicolon validation, and forwarding methods for tinyInteger/smallInteger/decimal/uuid; well-formed |
| src/Query/Schema/Table/ClickHouse.php | Adds orderByRaw property and orderByRaw() / array() / tuple() builder methods; empty/semicolon validation mirrored from partitionBy; clean |
| src/Query/Schema/Table.php | Extends newColumn signature with scale, adds tinyInteger/smallInteger/decimal/uuid factory methods with input validation; scale threaded through all dialect overrides |
| tests/Query/Schema/ClickHouseTest.php | 38 new assertions covering uuid/defaultRaw/tinyInt/smallInt/decimal/array/tuple/orderByRaw/rawColumn; testCreateTableArrayNullable still expects Nullable(Array(String)) output which now throws UnsupportedException (flagged in prior review) |
| src/Query/Schema/ColumnType.php | Adds TinyInteger, SmallInteger, Decimal, Uuid, Array, Tuple enum cases; clean addition |

Reviews (3): Last reviewed commit: "Update src/Query/Schema/ClickHouse.php"

Comment thread src/Query/Schema/ClickHouse.php
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@abnegate
Member

@copilot Fix the unit test failure introduced by the last commit

Comment on lines +1329 to +1342
public function testCreateTableArrayNullable(): void
{
    $schema = new Schema();
    $result = $schema->table('events')
        ->bigInteger('id')->primary()
        ->array('tags', ColumnType::String)->nullable()
        ->create();
    $this->assertBindingCount($result);

    $this->assertSame(
        'CREATE TABLE `events` (`id` Int64, `tags` Nullable(Array(String))) ENGINE = MergeTree() ORDER BY (`id`)',
        $result->query,
    );
}
Contributor

P1 Broken test: code throws but test expects success

ClickHouse::compileColumnType() (line 56–58) now throws UnsupportedException when an Array column has isNullable = true. testCreateTableArrayNullable calls ->array('tags', ColumnType::String)->nullable()->create(), which means create() throws before a result is returned — so $result is never assigned and the assertSame is never reached. The test will fail with an unexpected exception.

The README example at line 2308 (// `scores` Nullable(Array(String))) also documents the old (now-invalid) behaviour. Both the test and that README snippet need to be updated: the test should use `$this->expectException(UnsupportedException::class)`, and the README should remove or correct the `->nullable()` example for `Array` columns.

Comment thread src/Query/Schema/ClickHouse.php
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>


2 participants