HBase backend: edges cannot be persisted and vertex ID mismatch #3032

@linmengmeng-1314

Issue Draft: HBase Backend - Edges Cannot Be Persisted

Environment

  • HugeGraph: 1.7.0 (server), pyhugegraph: 1.7.0 (client)
  • HBase: 2.1.2
  • ID Strategy: PRIMARY_KEY
  • Serializer: binary

Description

When HugeGraph uses HBase as the backend storage, edges cannot be persisted regardless of the creation method used (REST API or Gremlin). Vertex creation works, but all edge creation approaches either fail explicitly or silently discard the data.

Reproduction Steps

Schema Setup

// Property keys
schema.propertyKey('name').asText().ifNotExist().create()
schema.propertyKey('date').asText().ifNotExist().create()

// Vertex label with PRIMARY_KEY strategy
schema.vertexLabel('person').properties('name').usePrimaryKeyId().primaryKeys('name').ifNotExist().create()

// Edge label
schema.edgeLabel('roommate').sourceLabel('person').targetLabel('person').properties('date').nullableKeys('date').ifNotExist().create()

Create Vertices

// REST API - works fine
POST /graphs/{graph}/graph/vertices
{"label": "person", "properties": {"name": "Alice"}}
{"label": "person", "properties": {"name": "Bob"}}

Response IDs: "1:Alice", "1:Bob" (composite format from PRIMARY_KEY strategy).
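For anyone scripting against these responses, here is a minimal Python sketch that splits such an ID into its two parts. It assumes the layout is `<numeric label id>:<primary key value>`, which is only inferred from the two IDs shown above; `split_composite_id` is a hypothetical helper, not part of HugeGraph or pyhugegraph.

```python
def split_composite_id(vertex_id: str) -> tuple[int, str]:
    """Split an ID such as "1:Alice" into (label_id, primary_key_value).

    Assumption: the text before the first ':' is a numeric label ID and
    everything after it is the primary key value.
    """
    label_id, sep, primary_key = vertex_id.partition(":")
    if not sep or not label_id.isdigit():
        raise ValueError(f"not a composite vertex ID: {vertex_id!r}")
    return int(label_id), primary_key
```

Calling it on a plain integer ID rendered as a string (no `:`) raises `ValueError`, which makes it easy to tell the two ID shapes apart in a test script.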

Attempt Edge Creation

Method 1: REST API with returned composite IDs — fails with error

POST /graphs/{graph}/graph/edges
{"label": "roommate", "outV": "1:Alice", "inV": "1:Bob", "properties": {"date": "2024"}}
→ 400 Bad Request: IllegalArgumentException: Invalid vertex id '1:Alice'

Method 2: REST API with actual stored IDs (from g.V().id()) — also fails

g.V().id() returns integers (e.g., 9988829793) instead of the composite IDs returned by addVertex. Using these:

POST /graphs/{graph}/graph/edges
{"label": "roommate", "outV": 9988829793, "inV": 9837833573, "properties": {"date": "2024"}}
→ 400 Bad Request: IllegalArgumentException: Invalid vertex id '9988829793'

Note: The vertex CAN be retrieved via REST API GET /vertices/9988829793 (returns 200), but edge creation rejects the same ID.
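Methods 1 and 2 can be replayed from a script with only the Python standard library. The sketch below builds the same POST request as above without sending it; `BASE_URL` and `GRAPH` are placeholder assumptions for a local setup, and `build_edge_request` is a hypothetical helper name.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8080"  # assumption: adjust to your server
GRAPH = "hugegraph"                 # assumption: adjust to your graph name


def build_edge_request(out_v, in_v, label="roommate", properties=None):
    """Build (but do not send) the edge-creation request used in Methods 1 and 2."""
    body = {
        "label": label,
        "outV": out_v,   # "1:Alice" in Method 1, 9988829793 in Method 2
        "inV": in_v,
        "properties": properties or {},
    }
    return Request(
        f"{BASE_URL}/graphs/{GRAPH}/graph/edges",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; a 400 response surfaces as `urllib.error.HTTPError`, whose body can be read for the server's error message.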

Method 3: Gremlin addE — appears to succeed but data is not persisted

g.V().hasLabel('person').has('name','Alice').addE('roommate').to(__.V().hasLabel('person').has('name','Bob')).property('date','2024')

Response: Returns the edge object with ID and properties (appears successful).

Verification:

g.E().count()       → returns a stale/incorrect count (e.g., 3)
g.E().toList()      → returns [] (empty, consistent with nothing being persisted)
GET /graph/edges    → {"edges": []} (empty)

The edge appears to be created in memory but is never persisted to HBase.
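The three verification results can be compared mechanically once they have been fetched. The following Python sketch takes the values already returned by the three queries and reports them only when they disagree; `edge_view_mismatch` is a hypothetical helper and makes no calls to HugeGraph itself.

```python
def edge_view_mismatch(count, listed_edges, rest_edges):
    """Return the edge totals seen by each query if they disagree, else {}.

    count        -- the number returned by g.E().count()
    listed_edges -- the list returned by g.E().toList()
    rest_edges   -- the "edges" array from GET /graph/edges
    """
    views = {
        "g.E().count()": count,
        "g.E().toList()": len(listed_edges),
        "GET /graph/edges": len(rest_edges),
    }
    # All three views should report the same total on a consistent backend.
    return views if len(set(views.values())) > 1 else {}
```

With the values reported here (a count of 3 and two empty lists), the function returns all three totals, which makes the inconsistency easy to log or assert on.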

Additional Observations

1. Vertex ID Mismatch

With PRIMARY_KEY strategy and HBase backend:

  • addVertex REST API returns composite IDs: "1:Alice"
  • g.V().id() returns auto-generated integers: 9988829793
  • g.V().valueMap(true) returns the integer ID
  • The vertex is accessible via GET /vertices/{integer_id} but NOT via GET /vertices/1:Alice
  • g.V({integer_id}) lookup also returns empty (cannot find vertex by its own ID)

2. Vertex ID Collision Across Labels

Vertices with different labels and different primary key values may share the same internal ID:

person:James (pk="James")           → stored as 9837833573
webpage:James的个人网站 (pk="James的个人网站") → also stored as 9837833573

This suggests the HBase backend generates IDs based on a hash that ignores the vertex label, causing cross-label collisions.
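To check how widespread such collisions are in a given graph, the internal IDs can be grouped and inspected. Below is a minimal Python sketch that works on `(internal_id, label, primary_key)` tuples collected by any means (for example, from `g.V().valueMap(true)` output); `find_id_collisions` is a hypothetical helper, and it only detects collisions without saying anything about their cause.

```python
from collections import defaultdict


def find_id_collisions(vertices):
    """Map each internal ID shared by several vertices to its (label, pk) owners.

    vertices -- iterable of (internal_id, label, primary_key) tuples
    """
    owners_by_id = defaultdict(set)
    for internal_id, label, primary_key in vertices:
        owners_by_id[internal_id].add((label, primary_key))
    # Keep only IDs claimed by more than one distinct (label, pk) pair.
    return {
        internal_id: sorted(owners)
        for internal_id, owners in owners_by_id.items()
        if len(owners) > 1
    }
```

An empty result means every internal ID in the sample maps to exactly one vertex.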

3. g.E().count() Returns Incorrect Results

After Gremlin addE calls that don't persist:

  • g.E().count() may return a stale, non-zero count
  • g.E().toList() returns an empty list (consistent with nothing being persisted)
  • REST API /edges returns an empty list (consistent with nothing being persisted)

Expected Behavior

  • Edge creation via REST API should find vertices by the IDs returned by addVertex
  • Edge creation via Gremlin addE should persist edges to the backend
  • g.V().id() should return IDs that can be used for subsequent lookups and edge creation
  • Vertex IDs should be unique across different vertex labels

Actual Behavior

  • None of the three edge creation methods works with the HBase backend
  • Vertex IDs are inconsistent between addVertex response and g.V().id()
  • Vertex IDs collide across labels
  • g.E().count() returns stale/incorrect counts


Labels: bug, hbase
