[Bug] Routine load in non-strict mode truncates non-English characters to the wrong length when loading into a VARCHAR target column #64334

@123shang60

Search before asking

  • I had searched in the issues and found no similar issues.

Version

doris-4.1.0-rc03-5960d4cea0e

What's Wrong?

I created the following table in Doris:

CREATE TABLE test_table
(
    `env` VARCHAR(32) NOT NULL DEFAULT ''
)
ENGINE = OLAP
DUPLICATE KEY(env)
DISTRIBUTED BY RANDOM BUCKETS 1
properties(
    "compression" = "zstd",
    "enable_single_replica_compaction" = "true",
    "replication_allocation" = "tag.location.default: 1",
    "storage_format" = "V3"
);

I also started a routine load job:

CREATE ROUTINE LOAD dwd.test_table ON test_table
COLUMNS(`env`)
PROPERTIES(
    "format"="json",
    "strict_mode"="false",
    "max_filter_ratio" = "1",
    "timezone" = "Asia/Shanghai",
    "max_error_number" = "5000000",
    "max_batch_interval" = "10",
    "max_batch_rows" = "5000000",
    "desired_concurrent_number" = "8",
    "load_to_single_tablet" = "true",
    "exec_mem_limit" = "8589934592"
)
FROM KAFKA(
    "kafka_broker_list" = "127.0.0.1:9092",
    "kafka_topic" = "test_load",
    "property.kafka_default_offsets" = "OFFSET_END",
    "property.group.id" = "test_load_doris"
);

Then I sent the following message to Kafka:

{"env":"${jnd${upper:ı}:ldap://test.comxxxxxx}"}

In this JSON, the text upper: is followed by a special character, U+0131 (ı, Latin small letter dotless i). The Doris routine load job then reports the following error:

Reason: column_name[env], the length of input is too long than schema. first 32 bytes of input str: [${jnd${upper:ı}:ldap://test.com] schema length: 32; actual length: 33; . src line []; 

Similarly, loading the following value, which contains a Chinese character, also fails:

{"env":"中123456789012345678901234567890"}

The error is as follows:

Reason: column_name[env], the length of input is too long than schema. first 32 bytes of input str: [中12345678901234567890123456789] schema length: 32; actual length: 33; . src line []; 
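
The numbers in both error messages can be checked with a short Python sketch. It compares the character count and the UTF-8 byte count of each payload, and also the byte count after a hypothetical character-based cut at 32. That character-based cut is an assumption made here to explain the reported "actual length: 33"; it has not been confirmed against the Doris source. The names `payloads`, `LIMIT`, and `lengths` are illustrative only.

```python
# Values copied from the two Kafka messages in this report.
payloads = [
    "${jnd${upper:\u0131}:ldap://test.comxxxxxx}",  # U+0131 takes 2 bytes in UTF-8
    "中123456789012345678901234567890",              # the CJK character takes 3 bytes
]

LIMIT = 32  # VARCHAR(32) from the table definition


def lengths(value: str, limit: int = LIMIT):
    """Return (chars, utf8_bytes, utf8_bytes_after_cutting_to_limit_chars)."""
    # Assumption: a truncation that counts characters instead of bytes.
    cut_by_chars = value[:limit]
    return (
        len(value),
        len(value.encode("utf-8")),
        len(cut_by_chars.encode("utf-8")),
    )


results = [lengths(p) for p in payloads]
for payload, result in zip(payloads, results):
    print(payload, result)
```

Under this assumption both payloads end up at 33 bytes after the cut, which matches the "actual length: 33" in the two error messages, while the schema check compares against 32 bytes.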

What You Expected?

The string should be truncated correctly to the length defined for the VARCHAR column, and the data should then load successfully.
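
As an illustration of the expected behavior, here is a minimal Python sketch of a byte-aware truncation that never splits a multi-byte UTF-8 character. It is not Doris code; `truncate_utf8` is a hypothetical helper, and it assumes the VARCHAR length is a byte limit, as the "first 32 bytes" wording in the error message suggests.

```python
def truncate_utf8(value: str, max_bytes: int) -> str:
    """Cut value to at most max_bytes of UTF-8 without splitting a character."""
    data = value.encode("utf-8")
    if len(data) <= max_bytes:
        return value
    # Slicing valid UTF-8 can only leave an incomplete sequence at the very end;
    # errors="ignore" drops that partial character instead of raising.
    return data[:max_bytes].decode("utf-8", errors="ignore")


# Values copied from the two Kafka messages in this report.
first = truncate_utf8("${jnd${upper:\u0131}:ldap://test.comxxxxxx}", 32)
second = truncate_utf8("中123456789012345678901234567890", 32)

# Hypothetical edge case: the 32nd byte falls inside a 3-byte character.
edge = truncate_utf8("a" * 31 + "中", 32)

for text in (first, second, edge):
    print(text, len(text.encode("utf-8")))
```

With this approach the two reported values fit in exactly 32 bytes, and the edge case shrinks to 31 bytes because the partially included character is dropped rather than stored as invalid UTF-8.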

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
