Commit f35e027

Define complete core type system with blob→longblob mapping

Core DataJoint types (fully supported, recorded in :type: comments):
- Numeric: float32, float64, int64, uint64, int32, uint32, int16, uint16, int8, uint8
- Boolean: bool
- UUID: uuid → binary(16)
- JSON: json
- Binary: blob → longblob
- Temporal: date, datetime
- String: char(n), varchar(n)
- Enumeration: enum(...)

Changes:
- declare.py: Define CORE_TYPES with (pattern, sql_mapping) pairs
- declare.py: Add warning for non-standard native type usage
- heading.py: Update to use CORE_TYPE_NAMES
- storage-types-spec.md: Update documentation to reflect core types

Native database types (text, mediumint, etc.) pass through with a warning
about non-standard usage.

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>

1 parent 2de222a commit f35e027
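
As a rough illustration of the `(pattern, sql_mapping)` design named in the commit message, the sketch below copies a small subset of `CORE_TYPES` from the diff and resolves a declared type to its MySQL type. The helper `resolve_core_type` is hypothetical and is not part of DataJoint; only the table entries and the two derived dictionaries come from the commit.

```python
import re

# Subset of the CORE_TYPES table from this commit:
# name -> (regex pattern, MySQL type, or None if the declared type is used as-is)
CORE_TYPES = {
    "float32": (r"float32$", "float"),
    "uuid": (r"uuid$", "binary(16)"),
    "json": (r"json$", None),
    "blob": (r"blob$", "longblob"),
    "datetime": (r"datetime$", None),
    "varchar": (r"varchar\s*\(\d+\)$", None),
}

# Derived lookups, built the same way as in the diff
CORE_TYPE_PATTERNS = {name: re.compile(p, re.I) for name, (p, _) in CORE_TYPES.items()}
CORE_TYPE_SQL = {name: sql for name, (_, sql) in CORE_TYPES.items()}


def resolve_core_type(declared):
    """Hypothetical helper: return the SQL type for a core type, or None if not core."""
    for name, pattern in CORE_TYPE_PATTERNS.items():
        # re.match anchors at the start, and each pattern ends with "$",
        # so the whole declared string has to match.
        if pattern.match(declared):
            sql_type = CORE_TYPE_SQL[name]
            return declared if sql_type is None else sql_type
    return None
```

Under these assumptions, `resolve_core_type("blob")` yields `"longblob"`, whereas `"longblob"` itself is not recognized as a core type because the `blob$` pattern must match from the first character.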

3 files changed

+118
-90
lines changed

docs/src/design/tables/storage-types-spec.md

Lines changed: 46 additions & 41 deletions
@@ -12,19 +12,20 @@ This document defines a three-layer type architecture:
1212
┌───────────────────────────────────────────────────────────────────┐
1313
│ AttributeTypes (Layer 3) │
1414
│ │
15-
│ Built-in: <object> <content> <filepath@s> <djblob> <xblob> │
15+
│ Built-in: <djblob> <object> <content> <filepath@s> <xblob> │
1616
│ User: <custom> <mytype> ... │
1717
├───────────────────────────────────────────────────────────────────┤
1818
│ Core DataJoint Types (Layer 2) │
1919
│ │
20-
│ int8 int16 int32 int64 float32 float64 bool decimal │
21-
│ uint8 uint16 uint32 uint64 varchar char uuid date │
22-
│ json longblob blob timestamp datetime enum │
20+
│ float32 float64 int64 uint64 int32 uint32 int16 uint16 │
21+
│ int8 uint8 bool uuid json blob date datetime │
22+
│ char(n) varchar(n) enum(...) │
2323
├───────────────────────────────────────────────────────────────────┤
2424
│ Native Database Types (Layer 1) │
2525
│ │
2626
│ MySQL: TINYINT SMALLINT INT BIGINT FLOAT DOUBLE ... │
2727
│ PostgreSQL: SMALLINT INTEGER BIGINT REAL DOUBLE PRECISION │
28+
│ (pass through with warning for non-standard types) │
2829
└───────────────────────────────────────────────────────────────────┘
2930
```
3031

@@ -49,61 +50,65 @@ For arbitrary URLs that don't need ObjectRef semantics, use `varchar` instead.
4950
Core types provide a standardized, scientist-friendly interface that works identically across
5051
MySQL and PostgreSQL backends. Users should prefer these over native database types.
5152

53+
**All core types are recorded in field comments using `:type:` syntax for reconstruction.**
54+
5255
### Numeric Types
5356

54-
| Core Type | Description | MySQL | PostgreSQL |
55-
|-----------|-------------|-------|------------|
56-
| `int8` | 8-bit signed | `TINYINT` | `SMALLINT` (clamped) |
57-
| `int16` | 16-bit signed | `SMALLINT` | `SMALLINT` |
58-
| `int32` | 32-bit signed | `INT` | `INTEGER` |
59-
| `int64` | 64-bit signed | `BIGINT` | `BIGINT` |
60-
| `uint8` | 8-bit unsigned | `TINYINT UNSIGNED` | `SMALLINT` (checked) |
61-
| `uint16` | 16-bit unsigned | `SMALLINT UNSIGNED` | `INTEGER` (checked) |
62-
| `uint32` | 32-bit unsigned | `INT UNSIGNED` | `BIGINT` (checked) |
63-
| `uint64` | 64-bit unsigned | `BIGINT UNSIGNED` | `NUMERIC(20)` |
64-
| `float32` | 32-bit float | `FLOAT` | `REAL` |
65-
| `float64` | 64-bit float | `DOUBLE` | `DOUBLE PRECISION` |
66-
| `decimal(p,s)` | Fixed precision | `DECIMAL(p,s)` | `NUMERIC(p,s)` |
57+
| Core Type | Description | MySQL |
58+
|-----------|-------------|-------|
59+
| `int8` | 8-bit signed | `TINYINT` |
60+
| `int16` | 16-bit signed | `SMALLINT` |
61+
| `int32` | 32-bit signed | `INT` |
62+
| `int64` | 64-bit signed | `BIGINT` |
63+
| `uint8` | 8-bit unsigned | `TINYINT UNSIGNED` |
64+
| `uint16` | 16-bit unsigned | `SMALLINT UNSIGNED` |
65+
| `uint32` | 32-bit unsigned | `INT UNSIGNED` |
66+
| `uint64` | 64-bit unsigned | `BIGINT UNSIGNED` |
67+
| `float32` | 32-bit float | `FLOAT` |
68+
| `float64` | 64-bit float | `DOUBLE` |
6769

6870
### String Types
6971

70-
| Core Type | Description | MySQL | PostgreSQL |
71-
|-----------|-------------|-------|------------|
72-
| `char(n)` | Fixed-length | `CHAR(n)` | `CHAR(n)` |
73-
| `varchar(n)` | Variable-length | `VARCHAR(n)` | `VARCHAR(n)` |
72+
| Core Type | Description | MySQL |
73+
|-----------|-------------|-------|
74+
| `char(n)` | Fixed-length | `CHAR(n)` |
75+
| `varchar(n)` | Variable-length | `VARCHAR(n)` |
7476

7577
### Boolean
7678

77-
| Core Type | Description | MySQL | PostgreSQL |
78-
|-----------|-------------|-------|------------|
79-
| `bool` | True/False | `TINYINT(1)` | `BOOLEAN` |
79+
| Core Type | Description | MySQL |
80+
|-----------|-------------|-------|
81+
| `bool` | True/False | `TINYINT` |
8082

8183
### Date/Time Types
8284

83-
| Core Type | Description | MySQL | PostgreSQL |
84-
|-----------|-------------|-------|------------|
85-
| `date` | Date only | `DATE` | `DATE` |
86-
| `datetime` | Date and time | `DATETIME(6)` | `TIMESTAMP` |
87-
| `timestamp` | Auto-updating | `TIMESTAMP` | `TIMESTAMP` |
88-
| `time` | Time only | `TIME` | `TIME` |
85+
| Core Type | Description | MySQL |
86+
|-----------|-------------|-------|
87+
| `date` | Date only | `DATE` |
88+
| `datetime` | Date and time | `DATETIME` |
8989

9090
### Binary Types
9191

92-
Core binary types store raw bytes without any serialization. Use `<djblob>` AttributeType
92+
The core `blob` type stores raw bytes without any serialization. Use `<djblob>` AttributeType
9393
for serialized Python objects.
9494

95-
| Core Type | Description | MySQL | PostgreSQL |
96-
|-----------|-------------|-------|------------|
97-
| `blob` | Raw bytes up to 64KB | `BLOB` | `BYTEA` |
98-
| `longblob` | Raw bytes up to 4GB | `LONGBLOB` | `BYTEA` |
95+
| Core Type | Description | MySQL |
96+
|-----------|-------------|-------|
97+
| `blob` | Raw bytes | `LONGBLOB` |
98+
99+
### Other Types
100+
101+
| Core Type | Description | MySQL |
102+
|-----------|-------------|-------|
103+
| `json` | JSON document | `JSON` |
104+
| `uuid` | UUID | `BINARY(16)` |
105+
| `enum(...)` | Enumeration | `ENUM(...)` |
99106

100-
### Special Types
107+
### Native Passthrough Types
101108

102-
| Core Type | Description | MySQL | PostgreSQL |
103-
|-----------|-------------|-------|------------|
104-
| `json` | JSON document | `JSON` | `JSONB` |
105-
| `uuid` | UUID | `CHAR(36)` | `UUID` |
106-
| `enum(...)` | Enumeration | `ENUM(...)` | `VARCHAR` + CHECK |
109+
Users may use native database types directly (e.g., `text`, `mediumint auto_increment`),
110+
but these will generate a warning about non-standard usage. Native types are not recorded
111+
in field comments and may have portability issues across database backends.
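
To make the distinction concrete: a core type leaves a `:type:` prefix in the column comment, while a native type leaves the comment untouched. The sketch below shows one way such a comment could be split back into the declared type and the user comment. The helper `split_type_comment` and the exact regular expression are assumptions for illustration; the only detail taken from this commit is the `":{type}:{comment}"` layout written by `declare.py`.

```python
import re

# Assumed layout, based on the ":{type}:{comment}" format string in declare.py.
# A declared type that itself contains a colon would need a more careful parser.
TYPE_PREFIX = re.compile(r"^:(?P<type>[^:]+):(?P<comment>.*)$", re.S)


def split_type_comment(raw_comment):
    """Hypothetical helper: return (declared_type, user_comment) for a column comment."""
    m = TYPE_PREFIX.match(raw_comment)
    if m is None:
        # No prefix: a native type, for which nothing was recorded
        return None, raw_comment
    return m.group("type"), m.group("comment")
```

Because `compile_attribute` rejects user comments that start with a colon, a leading colon can be read as the marker for a recorded type.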
107112

108113
## AttributeTypes (Layer 3)
109114

src/datajoint/declare.py

Lines changed: 70 additions & 47 deletions
@@ -14,73 +14,83 @@
1414
from .errors import DataJointError
1515
from .settings import config
1616

17-
# Core DataJoint type aliases - scientist-friendly names mapped to native SQL types
18-
# These types can be used without angle brackets in table definitions
19-
CORE_TYPE_ALIASES = {
20-
# Numeric types
21-
"FLOAT32": "float",
22-
"FLOAT64": "double",
23-
"INT64": "bigint",
24-
"UINT64": "bigint unsigned",
25-
"INT32": "int",
26-
"UINT32": "int unsigned",
27-
"INT16": "smallint",
28-
"UINT16": "smallint unsigned",
29-
"INT8": "tinyint",
30-
"UINT8": "tinyint unsigned",
31-
"BOOL": "tinyint",
32-
# UUID type
33-
"UUID": "binary(16)",
17+
# Core DataJoint types - scientist-friendly names that are fully supported
18+
# These are recorded in field comments using :type: syntax for reconstruction
19+
# Format: pattern_name -> (regex_pattern, mysql_type or None if same as matched)
20+
CORE_TYPES = {
21+
# Numeric types (aliased to native SQL)
22+
"float32": (r"float32$", "float"),
23+
"float64": (r"float64$", "double"),
24+
"int64": (r"int64$", "bigint"),
25+
"uint64": (r"uint64$", "bigint unsigned"),
26+
"int32": (r"int32$", "int"),
27+
"uint32": (r"uint32$", "int unsigned"),
28+
"int16": (r"int16$", "smallint"),
29+
"uint16": (r"uint16$", "smallint unsigned"),
30+
"int8": (r"int8$", "tinyint"),
31+
"uint8": (r"uint8$", "tinyint unsigned"),
32+
"bool": (r"bool$", "tinyint"),
33+
# UUID (stored as binary)
34+
"uuid": (r"uuid$", "binary(16)"),
35+
# JSON
36+
"json": (r"json$", None), # json passes through as-is
37+
# Binary (blob maps to longblob)
38+
"blob": (r"blob$", "longblob"),
39+
# Temporal
40+
"date": (r"date$", None),
41+
"datetime": (r"datetime$", None),
42+
# String types (with parameters)
43+
"char": (r"char\s*\(\d+\)$", None),
44+
"varchar": (r"varchar\s*\(\d+\)$", None),
45+
# Enumeration
46+
"enum": (r"enum\s*\(.+\)$", None),
3447
}
3548

49+
# Compile core type patterns
50+
CORE_TYPE_PATTERNS = {name: re.compile(pattern, re.I) for name, (pattern, _) in CORE_TYPES.items()}
51+
52+
# Get SQL mapping for core types
53+
CORE_TYPE_SQL = {name: sql_type for name, (_, sql_type) in CORE_TYPES.items()}
54+
3655
MAX_TABLE_NAME_LENGTH = 64
3756
CONSTANT_LITERALS = {
3857
"CURRENT_TIMESTAMP",
3958
"NULL",
4059
} # SQL literals to be used without quotes (case insensitive)
4160

4261
# Type patterns for declaration parsing
43-
# Two categories: core type aliases and native passthrough types
4462
TYPE_PATTERN = {
4563
k: re.compile(v, re.I)
4664
for k, v in dict(
47-
# Core DataJoint type aliases (scientist-friendly names)
48-
FLOAT32=r"float32$",
49-
FLOAT64=r"float64$",
50-
INT64=r"int64$",
51-
UINT64=r"uint64$",
52-
INT32=r"int32$",
53-
UINT32=r"uint32$",
54-
INT16=r"int16$",
55-
UINT16=r"uint16$",
56-
INT8=r"int8$",
57-
UINT8=r"uint8$",
58-
BOOL=r"bool$",
59-
UUID=r"uuid$",
60-
# Native SQL types (passthrough)
65+
# Core DataJoint types
66+
**{name.upper(): pattern for name, (pattern, _) in CORE_TYPES.items()},
67+
# Native SQL types (passthrough with warning for non-standard use)
6168
INTEGER=r"((tiny|small|medium|big|)int|integer)(\s*\(.+\))?(\s+unsigned)?(\s+auto_increment)?|serial$",
6269
DECIMAL=r"(decimal|numeric)(\s*\(.+\))?(\s+unsigned)?$",
6370
FLOAT=r"(double|float|real)(\s*\(.+\))?(\s+unsigned)?$",
64-
STRING=r"(var)?char\s*\(.+\)$",
65-
JSON=r"json$",
66-
ENUM=r"enum\s*\(.+\)$",
67-
TEMPORAL=r"(date|datetime|time|timestamp|year)(\s*\(.+\))?$",
68-
BLOB=r"(tiny|small|medium|long|)blob$",
71+
STRING=r"(var)?char\s*\(.+\)$", # Catches char/varchar not matched by core types
72+
TEMPORAL=r"(time|timestamp|year)(\s*\(.+\))?$", # time, timestamp, year (not date/datetime)
73+
NATIVE_BLOB=r"(tiny|small|medium|long)blob$", # Specific blob variants
74+
TEXT=r"(tiny|small|medium|long)?text$", # Text types
6975
# AttributeTypes use angle brackets
7076
ADAPTED=r"<.+>$",
7177
).items()
7278
}
7379

74-
# Types that require special handling (stored in attribute comment for reconstruction)
75-
SPECIAL_TYPES = {"ADAPTED"} | set(CORE_TYPE_ALIASES)
80+
# Core types are stored in attribute comment for reconstruction
81+
CORE_TYPE_NAMES = {name.upper() for name in CORE_TYPES}
82+
83+
# Special types that need comment storage (core types + adapted)
84+
SPECIAL_TYPES = CORE_TYPE_NAMES | {"ADAPTED"}
7685

77-
# Native SQL types that pass through without modification
86+
# Native SQL types that pass through (with optional warning)
7887
NATIVE_TYPES = set(TYPE_PATTERN) - SPECIAL_TYPES
7988

8089
assert SPECIAL_TYPES <= set(TYPE_PATTERN)
8190

8291

8392
def match_type(attribute_type):
93+
"""Match an attribute type string to a category."""
8494
try:
8595
return next(category for category, pattern in TYPE_PATTERN.items() if pattern.match(attribute_type))
8696
except StopIteration:
@@ -444,7 +454,7 @@ def substitute_special_type(match, category, foreign_key_sql, context):
444454
Substitute special types with their native SQL equivalents.
445455
446456
Special types are:
447-
- Core type aliases (float32 → float, uuid → binary(16), etc.)
457+
- Core DataJoint types (float32 → float, uuid → binary(16), blob → longblob, etc.)
448458
- ADAPTED types (AttributeTypes in angle brackets)
449459
450460
:param match: dict with keys "type" and "comment" -- will be modified in place
@@ -462,9 +472,13 @@ def substitute_special_type(match, category, foreign_key_sql, context):
462472
category = match_type(match["type"])
463473
if category in SPECIAL_TYPES:
464474
substitute_special_type(match, category, foreign_key_sql, context)
465-
elif category in CORE_TYPE_ALIASES:
466-
# Core type alias - substitute with native SQL type
467-
match["type"] = CORE_TYPE_ALIASES[category]
475+
elif category in CORE_TYPE_NAMES:
476+
# Core DataJoint type - substitute with native SQL type if mapping exists
477+
core_name = category.lower()
478+
sql_type = CORE_TYPE_SQL.get(core_name)
479+
if sql_type is not None:
480+
match["type"] = sql_type
481+
# else: type passes through as-is (json, date, datetime, char, varchar, enum)
468482
else:
469483
assert False, f"Unknown special type: {category}"
470484

@@ -510,13 +524,22 @@ def compile_attribute(line, in_key, foreign_key_sql, context):
510524
raise DataJointError('An attribute comment must not start with a colon in comment "{comment}"'.format(**match))
511525

512526
category = match_type(match["type"])
527+
513528
if category in SPECIAL_TYPES:
514-
match["comment"] = ":{type}:{comment}".format(**match) # insert custom type into comment
529+
# Core types and AttributeTypes are recorded in comment for reconstruction
530+
match["comment"] = ":{type}:{comment}".format(**match)
515531
substitute_special_type(match, category, foreign_key_sql, context)
532+
elif category in NATIVE_TYPES:
533+
# Non-standard native type - warn user
534+
logger.warning(
535+
f"Non-standard native type '{match['type']}' in attribute '{match['name']}'. "
536+
"Consider using a core DataJoint type for better portability."
537+
)
516538

517539
# Check for invalid default values on blob types (after type substitution)
518-
final_category = match_type(match["type"])
519-
if final_category == "BLOB" and match["default"] not in {"DEFAULT NULL", "NOT NULL"}:
540+
# Note: blob → longblob, so check for NATIVE_BLOB or longblob result
541+
final_type = match["type"].lower()
542+
if ("blob" in final_type) and match["default"] not in {"DEFAULT NULL", "NOT NULL"}:
520543
raise DataJointError("The default value for blob attributes can only be NULL in:\n{line}".format(line=line))
521544

522545
sql = ("`{name}` {type} {default}" + (' COMMENT "{comment}"' if match["comment"] else "")).format(**match)
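
The three steps this hunk adds to `compile_attribute` (record core types in the comment, warn on native types, and check blob defaults after substitution) can be sketched in isolation as follows. This is a simplified stand-in and not the DataJoint implementation: `compile_sketch` is a hypothetical name, only two core and two native patterns are included, and `ValueError` replaces `DataJointError` to keep the example self-contained.

```python
import logging
import re

logger = logging.getLogger("declare_sketch")  # hypothetical logger name

# Two core types with their MySQL mappings, copied from CORE_TYPES in the diff
CORE = {
    "BLOB": (re.compile(r"blob$", re.I), "longblob"),
    "INT32": (re.compile(r"int32$", re.I), "int"),
}
# Two native passthrough patterns, copied from TYPE_PATTERN in the diff
NATIVE = {
    "NATIVE_BLOB": re.compile(r"(tiny|small|medium|long)blob$", re.I),
    "TEXT": re.compile(r"(tiny|small|medium|long)?text$", re.I),
}


def compile_sketch(name, declared, default, comment=""):
    """Hypothetical, simplified stand-in for the type handling in compile_attribute."""
    match = {"name": name, "type": declared, "default": default, "comment": comment}
    core = next((k for k, (p, _) in CORE.items() if p.match(declared)), None)
    if core is not None:
        # Step 1: record the declared core type in the comment, then substitute
        match["comment"] = ":{type}:{comment}".format(**match)
        match["type"] = CORE[core][1]
    elif any(p.match(declared) for p in NATIVE.values()):
        # Step 2: native types pass through unchanged, with a warning
        logger.warning("Non-standard native type '%s' in attribute '%s'.", declared, name)
    # Step 3: after substitution, any blob variant may only default to NULL
    if "blob" in match["type"].lower() and default not in {"DEFAULT NULL", "NOT NULL"}:
        raise ValueError("The default value for blob attributes can only be NULL")
    return match
```

The substring test in step 3 mirrors the diff: since `blob` has already been rewritten to `longblob`, checking for `"blob"` in the final type covers both the core type and the native variants.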

src/datajoint/heading.py

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@
88
from .attribute_adapter import get_adapter
99
from .attribute_type import AttributeType
1010
from .declare import (
11-
CORE_TYPE_ALIASES,
11+
CORE_TYPE_NAMES,
1212
SPECIAL_TYPES,
1313
TYPE_PATTERN,
1414
)
@@ -348,7 +348,7 @@ def _init_from_database(self):
348348

349349
if category == "UUID":
350350
attr["uuid"] = True
351-
elif category in CORE_TYPE_ALIASES:
351+
elif category in CORE_TYPE_NAMES:
352352
# Core type alias - already resolved in DB
353353
pass
354354
