
repository: ChunkIndex-based pack routing with range-load support #9744

Open
mr-raj12 wants to merge 9 commits into borgbackup:master from mr-raj12:pack-files-step6-range-load


mr-raj12 (Contributor) commented Jun 9, 2026

Description

get() now resolves the pack location (pack_id, obj_offset, obj_size) from the ChunkIndex instead of relying on the N=1 assumption that pack_id == chunk_id.

set_chunk_index(chunks) gives the cache layer a way to pass a borrowed reference to the full index. get() reads only from that index: if an entry is missing, or is still UNKNOWN_BYTES32 because the chunk is buffered in PackWriter and its pack file has not been written yet, it raises ObjectNotFound.
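A minimal sketch of that lookup rule, assuming a plain dict in place of the borrowed ChunkIndex and a hypothetical `PackLocation` tuple for its entries. The sentinel bytes used for `UNKNOWN_BYTES32` and the `ObjectNotFound` stand-in are assumptions for illustration, not borg's actual definitions:

```python
from typing import NamedTuple

# Hypothetical placeholder pack_id for chunks still buffered in the pack writer.
# The real UNKNOWN_BYTES32 value in borg may differ.
UNKNOWN_BYTES32 = b"\xff" * 32


class ObjectNotFound(Exception):
    """Stand-in for Repository.ObjectNotFound."""


class PackLocation(NamedTuple):
    pack_id: bytes
    obj_offset: int
    obj_size: int


def resolve_pack_location(chunk_index: dict, chunk_id: bytes) -> PackLocation:
    """Return where chunk_id lives, reading only from the borrowed index."""
    entry = chunk_index.get(chunk_id)
    if entry is None:
        # Chunk is not indexed at all.
        raise ObjectNotFound(chunk_id.hex())
    if entry.pack_id == UNKNOWN_BYTES32:
        # Chunk is indexed, but its pack file has not been written yet.
        raise ObjectNotFound(chunk_id.hex())
    return entry
```

Treating "buffered, not yet written" the same as "missing" keeps get() from issuing a store load for a pack that does not exist yet.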

put() calls _chunks.add() immediately and update_pack_info() once pack_results are available, so the index stays current within a session and dedup via seen_chunk() works. flush() also calls update_pack_info() internally, so cache.py does not need to handle pack_results directly.
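The put()/flush() bookkeeping described above can be sketched as a toy model. Everything here is hypothetical: `SketchRepository`, `Entry`, the shape of `pack_results` as `(chunk_id, obj_offset, obj_size)` tuples, and passing `pack_id` into `flush()`. A dict stands in for ChunkIndex and a list for the PackWriter buffer:

```python
from dataclasses import dataclass, field

UNKNOWN_BYTES32 = b"\xff" * 32  # hypothetical placeholder pack_id


@dataclass
class Entry:
    pack_id: bytes = UNKNOWN_BYTES32
    obj_offset: int = 0
    obj_size: int = 0


@dataclass
class SketchRepository:
    """Toy model of the put()/flush() bookkeeping; not borg's Repository."""

    chunks: dict = field(default_factory=dict)  # stands in for the borrowed ChunkIndex
    buffer: list = field(default_factory=list)  # stands in for PackWriter's buffer

    def seen_chunk(self, chunk_id: bytes) -> bool:
        return chunk_id in self.chunks

    def put(self, chunk_id: bytes, data: bytes) -> None:
        if self.seen_chunk(chunk_id):
            return  # dedup: already indexed, even if not yet written to a pack
        # Register immediately so later put()/seen_chunk() calls see the chunk,
        # although its pack location is still unknown.
        self.chunks[chunk_id] = Entry()
        self.buffer.append((chunk_id, data))

    def flush(self, pack_id: bytes) -> None:
        # Pretend the buffer is written as one pack and compute (offset, size) per chunk.
        pack_results, offset = [], 0
        for chunk_id, data in self.buffer:
            pack_results.append((chunk_id, offset, len(data)))
            offset += len(data)
        self.buffer.clear()
        self.update_pack_info(pack_id, pack_results)

    def update_pack_info(self, pack_id: bytes, pack_results) -> None:
        # Replace each placeholder with the real pack location.
        for chunk_id, obj_offset, obj_size in pack_results:
            entry = self.chunks[chunk_id]
            entry.pack_id, entry.obj_offset, entry.obj_size = pack_id, obj_offset, obj_size
```

Registering the placeholder in put() is what lets seen_chunk() dedup a repeated chunk before its pack is written, while the real location only appears once update_pack_info() runs.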

The read_data=False path caps its store load to obj_size when set, avoiding over-reads on multi-chunk packs. The retry load is capped too, as a guard against a corrupted meta_size in the header.
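A sketch of a capped metadata-only read, under stated assumptions: a hypothetical object layout of an 8-byte `<II` header (meta_size, data_size) followed by the metadata and then the data, a stand-in `load_range()` instead of the real store call, and an assumed 1 KiB first-read size:

```python
import struct
from typing import Optional

# Hypothetical object header: meta_size and data_size as little-endian uint32.
HEADER = struct.Struct("<II")
FIRST_READ = 1024  # assumed speculative first-read size


def load_range(pack: bytes, offset: int, size: int) -> bytes:
    """Stand-in for a ranged store load: at most `size` bytes starting at `offset`."""
    return pack[offset:offset + size]


def load_meta_only(pack: bytes, obj_offset: int, obj_size: Optional[int]) -> bytes:
    """Return the metadata bytes of the object at obj_offset (read_data=False path)."""
    # First load: cap to obj_size when it is known, so we stay inside this object.
    want = FIRST_READ if obj_size is None else min(FIRST_READ, obj_size)
    buf = load_range(pack, obj_offset, want)
    meta_size, _data_size = HEADER.unpack_from(buf)
    needed = HEADER.size + meta_size
    if needed > len(buf):
        # Retry with the exact size; the cap guards against a corrupted meta_size.
        if obj_size is not None:
            needed = min(needed, obj_size)
        buf = load_range(pack, obj_offset, needed)
    # Any trailing bytes beyond the metadata are ignored.
    return buf[HEADER.size:needed]
```

With obj_size unknown, the first read may run into the following object; the result is still correct because only `meta_size` bytes after the header are returned.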

Changes:

  • repository.py: set_chunk_index(), updated get(), put(), flush(), close()
  • testsuite/repository_test.py: tests for ChunkIndex routing, range-load, put dedup

refs #8572

Checklist

  • PR is against master
  • New code has tests and docs where appropriate
  • Tests pass
  • Commit messages are clean and reference related issues

codecov Bot commented Jun 9, 2026

❌ 6 Tests Failed:

Tests completed: 1900, failed: 6, passed: 1894, skipped: 370
src.borg.testsuite.archiver.debug_cmds_test::test_debug_put_get_delete_obj[archiver]
Stack Traces | 0.608s run time
self = <borg.archiver.Archiver object at 0x7f30a4ba1150>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='3f2edfe5d8f6d61b463b9acea6327b784f6f9f039d408f587c8850236ca...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a4ba1150>>)
repository = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
>           data = repository.get(id)
                   ^^^^^^^^^^^^^^^^^^

.../borg/archiver/debug_cmd.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'>
id = b'?.\xdf\xe5\xd8\xf6\xd6\x1bF;\x9a\xce\xa62{xOo\x9f\x03\x9d@\x8fX|\x88P#l\xa8\xe7K'
read_data = True, raise_missing = True

    def get(self, id, read_data=True, raise_missing=True):
        self._lock_refresh()
        entry = self.chunks.get(id)
        if entry is None:
            if raise_missing:
>               raise self.ObjectNotFound(id, str(self._location))
E               borg.repository.Repository.ObjectNotFound: Object with key 3f2edfe5d8f6d61b463b9acea6327b784f6f9f039d408f587c8850236ca8e74b not found in repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'.

.../src/borg/repository.py:644: ObjectNotFound

During handling of the above exception, another exception occurred:

archivers = 'archiver'
request = <FixtureRequest for <Function test_debug_put_get_delete_obj[archiver]>>

    def test_debug_put_get_delete_obj(archivers, request):
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        data = b"some data"
        create_regular_file(archiver.input_path, "file", contents=data)
    
        output = cmd(archiver, "debug", "id-hash", "input/file")
        id_hash = output.strip()
    
        output = cmd(archiver, "debug", "put-obj", id_hash, "input/file")
        assert id_hash in output
    
>       output = cmd(archiver, "debug", "get-obj", id_hash, "output/file")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.../testsuite/archiver/debug_cmds_test.py:64: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../testsuite/archiver/__init__.py:148: in cmd
    ret, output = exec_cmd(
.../testsuite/archiver/__init__.py:88: in exec_cmd
    ret = archiver.run(args)  # calls setup_logging internally
          ^^^^^^^^^^^^^^^^^^
.../borg/archiver/__init__.py:513: in run
    rc = func(args)
         ^^^^^^^^^^
.../borg/archiver/_common.py:173: in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <borg.archiver.Archiver object at 0x7f30a4ba1150>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='3f2edfe5d8f6d61b463b9acea6327b784f6f9f039d408f587c8850236ca...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a4ba1150>>)
repository = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
            data = repository.get(id)
        except Repository.ObjectNotFound:
>           raise RTError("object %s not found." % hex_id)
E           borg.helpers.errors.RTError: Runtime error: object 3f2edfe5d8f6d61b463b9acea6327b784f6f9f039d408f587c8850236ca8e74b not found.

.../borg/archiver/debug_cmd.py:211: RTError
src.borg.testsuite.archiver.debug_cmds_test::test_debug_format_obj_respects_type[archiver]
Stack Traces | 0.725s run time
self = <borg.archiver.Archiver object at 0x7f30a49528d0>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='af0a0c05d9429bccc3cdc7bdd771add773221af668bf5d9a818c4e1e129...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a49528d0>>)
repository = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
>           data = repository.get(id)
                   ^^^^^^^^^^^^^^^^^^

.../borg/archiver/debug_cmd.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'>
id = b'\xaf\n\x0c\x05\xd9B\x9b\xcc\xc3\xcd\xc7\xbd\xd7q\xad\xd7s"\x1a\xf6h\xbf]\x9a\x81\x8cN\x1e\x12\x92Yf'
read_data = True, raise_missing = True

    def get(self, id, read_data=True, raise_missing=True):
        self._lock_refresh()
        entry = self.chunks.get(id)
        if entry is None:
            if raise_missing:
>               raise self.ObjectNotFound(id, str(self._location))
E               borg.repository.Repository.ObjectNotFound: Object with key af0a0c05d9429bccc3cdc7bdd771add773221af668bf5d9a818c4e1e12925966 not found in repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'.

.../src/borg/repository.py:644: ObjectNotFound

During handling of the above exception, another exception occurred:

archivers = 'archiver'
request = <FixtureRequest for <Function test_debug_format_obj_respects_type[archiver]>>

    def test_debug_format_obj_respects_type(archivers, request):
        """Test format-obj uses the type from metadata JSON, not just ROBJ_FILE_STREAM."""
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        data = b"some data" * 100
        meta_dict = {"some": "property", "type": ROBJ_ARCHIVE_STREAM}
        meta = json.dumps(meta_dict).encode()
        create_regular_file(archiver.input_path, "data.bin", contents=data)
        create_regular_file(archiver.input_path, "meta.json", contents=meta)
        output = cmd(archiver, "debug", "id-hash", "input/data.bin")
        id_hash = output.strip()
        cmd(archiver, "debug", "format-obj", id_hash, "input/data.bin", "input/meta.json", "input/repoobj.bin")
        output = cmd(archiver, "debug", "put-obj", id_hash, "input/repoobj.bin")
        assert id_hash in output
>       output = cmd(archiver, "debug", "get-obj", id_hash, "output/object.bin")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.../testsuite/archiver/debug_cmds_test.py:140: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../testsuite/archiver/__init__.py:148: in cmd
    ret, output = exec_cmd(
.../testsuite/archiver/__init__.py:88: in exec_cmd
    ret = archiver.run(args)  # calls setup_logging internally
          ^^^^^^^^^^^^^^^^^^
.../borg/archiver/__init__.py:513: in run
    rc = func(args)
         ^^^^^^^^^^
.../borg/archiver/_common.py:173: in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <borg.archiver.Archiver object at 0x7f30a49528d0>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='af0a0c05d9429bccc3cdc7bdd771add773221af668bf5d9a818c4e1e129...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a49528d0>>)
repository = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
            data = repository.get(id)
        except Repository.ObjectNotFound:
>           raise RTError("object %s not found." % hex_id)
E           borg.helpers.errors.RTError: Runtime error: object af0a0c05d9429bccc3cdc7bdd771add773221af668bf5d9a818c4e1e12925966 not found.

.../borg/archiver/debug_cmd.py:211: RTError
src.borg.testsuite.archiver.debug_cmds_test::test_debug_id_hash_format_put_get_parse_obj[archiver]
Stack Traces | 0.741s run time
self = <borg.archiver.Archiver object at 0x7f30a4df1b90>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='7c5b52f0532e88d73f1ec8414ae86ea4e2a5accef622c51425bc81b1c5d...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a4df1b90>>)
repository = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
>           data = repository.get(id)
                   ^^^^^^^^^^^^^^^^^^

.../borg/archiver/debug_cmd.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'>
id = b'|[R\xf0S.\x88\xd7?\x1e\xc8AJ\xe8n\xa4\xe2\xa5\xac\xce\xf6"\xc5\x14%\xbc\x81\xb1\xc5\xdeV\xa1'
read_data = True, raise_missing = True

    def get(self, id, read_data=True, raise_missing=True):
        self._lock_refresh()
        entry = self.chunks.get(id)
        if entry is None:
            if raise_missing:
>               raise self.ObjectNotFound(id, str(self._location))
E               borg.repository.Repository.ObjectNotFound: Object with key 7c5b52f0532e88d73f1ec8414ae86ea4e2a5accef622c51425bc81b1c5de56a1 not found in repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'.

.../src/borg/repository.py:644: ObjectNotFound

During handling of the above exception, another exception occurred:

archivers = 'archiver'
request = <FixtureRequest for <Function test_debug_id_hash_format_put_get_parse_obj[archiver]>>

    def test_debug_id_hash_format_put_get_parse_obj(archivers, request):
        """Test format-obj and parse-obj commands."""
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        data = b"some data" * 100
        meta_dict = {"some": "property"}
        meta = json.dumps(meta_dict).encode()
        create_regular_file(archiver.input_path, "plain.bin", contents=data)
        create_regular_file(archiver.input_path, "meta.json", contents=meta)
        output = cmd(archiver, "debug", "id-hash", "input/plain.bin")
        id_hash = output.strip()
        cmd(
            archiver,
            "debug",
            "format-obj",
            id_hash,
            "input/plain.bin",
            "input/meta.json",
            "output/data.bin",
            "--compression=zstd,2",
        )
        output = cmd(archiver, "debug", "put-obj", id_hash, "output/data.bin")
        assert id_hash in output
    
>       output = cmd(archiver, "debug", "get-obj", id_hash, "output/object.bin")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.../testsuite/archiver/debug_cmds_test.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../testsuite/archiver/__init__.py:148: in cmd
    ret, output = exec_cmd(
.../testsuite/archiver/__init__.py:88: in exec_cmd
    ret = archiver.run(args)  # calls setup_logging internally
          ^^^^^^^^^^^^^^^^^^
.../borg/archiver/__init__.py:513: in run
    rc = func(args)
         ^^^^^^^^^^
.../borg/archiver/_common.py:173: in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <borg.archiver.Archiver object at 0x7f30a4df1b90>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='7c5b52f0532e88d73f1ec8414ae86ea4e2a5accef622c51425bc81b1c5d...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a4df1b90>>)
repository = <Repository proto='file', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
            data = repository.get(id)
        except Repository.ObjectNotFound:
>           raise RTError("object %s not found." % hex_id)
E           borg.helpers.errors.RTError: Runtime error: object 7c5b52f0532e88d73f1ec8414ae86ea4e2a5accef622c51425bc81b1c5de56a1 not found.

.../borg/archiver/debug_cmd.py:211: RTError
src.borg.testsuite.archiver.debug_cmds_test::test_debug_put_get_delete_obj[remote_archiver]
Stack Traces | 10.1s run time
self = <borg.archiver.Archiver object at 0x7f30a2aad910>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='3e542b384d283a517a441104459d026fc048a71fb8acde2737f8c9e361f...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a2aad910>>)
repository = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
>           data = repository.get(id)
                   ^^^^^^^^^^^^^^^^^^

.../borg/archiver/debug_cmd.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'>
id = b">T+8M(:QzD\x11\x04E\x9d\x02o\xc0H\xa7\x1f\xb8\xac\xde'7\xf8\xc9\xe3a\xf7\xc6\x89"
read_data = True, raise_missing = True

    def get(self, id, read_data=True, raise_missing=True):
        self._lock_refresh()
        entry = self.chunks.get(id)
        if entry is None:
            if raise_missing:
>               raise self.ObjectNotFound(id, str(self._location))
E               borg.repository.Repository.ObjectNotFound: Object with key 3e542b384d283a517a441104459d026fc048a71fb8acde2737f8c9e361f7c689 not found in repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'.

.../src/borg/repository.py:644: ObjectNotFound

During handling of the above exception, another exception occurred:

archivers = 'remote_archiver'
request = <FixtureRequest for <Function test_debug_put_get_delete_obj[remote_archiver]>>

    def test_debug_put_get_delete_obj(archivers, request):
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        data = b"some data"
        create_regular_file(archiver.input_path, "file", contents=data)
    
        output = cmd(archiver, "debug", "id-hash", "input/file")
        id_hash = output.strip()
    
        output = cmd(archiver, "debug", "put-obj", id_hash, "input/file")
        assert id_hash in output
    
>       output = cmd(archiver, "debug", "get-obj", id_hash, "output/file")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.../testsuite/archiver/debug_cmds_test.py:64: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../testsuite/archiver/__init__.py:148: in cmd
    ret, output = exec_cmd(
.../testsuite/archiver/__init__.py:88: in exec_cmd
    ret = archiver.run(args)  # calls setup_logging internally
          ^^^^^^^^^^^^^^^^^^
.../borg/archiver/__init__.py:513: in run
    rc = func(args)
         ^^^^^^^^^^
.../borg/archiver/_common.py:173: in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <borg.archiver.Archiver object at 0x7f30a2aad910>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='3e542b384d283a517a441104459d026fc048a71fb8acde2737f8c9e361f...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a2aad910>>)
repository = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_put_get_delete_obj_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
            data = repository.get(id)
        except Repository.ObjectNotFound:
>           raise RTError("object %s not found." % hex_id)
E           borg.helpers.errors.RTError: Runtime error: object 3e542b384d283a517a441104459d026fc048a71fb8acde2737f8c9e361f7c689 not found.

.../borg/archiver/debug_cmd.py:211: RTError
src.borg.testsuite.archiver.debug_cmds_test::test_debug_id_hash_format_put_get_parse_obj[remote_archiver]
Stack Traces | 11.8s run time
self = <borg.archiver.Archiver object at 0x7f30a2c6ef90>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='1b7eff779ae251a9cca0bf3783a678235fbb2ab6436d7aab009e7aea4c1...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a2c6ef90>>)
repository = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
>           data = repository.get(id)
                   ^^^^^^^^^^^^^^^^^^

.../borg/archiver/debug_cmd.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'>
id = b'\x1b~\xffw\x9a\xe2Q\xa9\xcc\xa0\xbf7\x83\xa6x#_\xbb*\xb6Cmz\xab\x00\x9ez\xeaL\x1f\xd7\xeb'
read_data = True, raise_missing = True

    def get(self, id, read_data=True, raise_missing=True):
        self._lock_refresh()
        entry = self.chunks.get(id)
        if entry is None:
            if raise_missing:
>               raise self.ObjectNotFound(id, str(self._location))
E               borg.repository.Repository.ObjectNotFound: Object with key 1b7eff779ae251a9cca0bf3783a678235fbb2ab6436d7aab009e7aea4c1fd7eb not found in repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'.

.../src/borg/repository.py:644: ObjectNotFound

During handling of the above exception, another exception occurred:

archivers = 'remote_archiver'
request = <FixtureRequest for <Function test_debug_id_hash_format_put_get_parse_obj[remote_archiver]>>

    def test_debug_id_hash_format_put_get_parse_obj(archivers, request):
        """Test format-obj and parse-obj commands."""
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        data = b"some data" * 100
        meta_dict = {"some": "property"}
        meta = json.dumps(meta_dict).encode()
        create_regular_file(archiver.input_path, "plain.bin", contents=data)
        create_regular_file(archiver.input_path, "meta.json", contents=meta)
        output = cmd(archiver, "debug", "id-hash", "input/plain.bin")
        id_hash = output.strip()
        cmd(
            archiver,
            "debug",
            "format-obj",
            id_hash,
            "input/plain.bin",
            "input/meta.json",
            "output/data.bin",
            "--compression=zstd,2",
        )
        output = cmd(archiver, "debug", "put-obj", id_hash, "output/data.bin")
        assert id_hash in output
    
>       output = cmd(archiver, "debug", "get-obj", id_hash, "output/object.bin")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.../testsuite/archiver/debug_cmds_test.py:105: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../testsuite/archiver/__init__.py:148: in cmd
    ret, output = exec_cmd(
.../testsuite/archiver/__init__.py:88: in exec_cmd
    ret = archiver.run(args)  # calls setup_logging internally
          ^^^^^^^^^^^^^^^^^^
.../borg/archiver/__init__.py:513: in run
    rc = func(args)
         ^^^^^^^^^^
.../borg/archiver/_common.py:173: in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <borg.archiver.Archiver object at 0x7f30a2c6ef90>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='1b7eff779ae251a9cca0bf3783a678235fbb2ab6436d7aab009e7aea4c1...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a2c6ef90>>)
repository = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_id_hash_format_put_0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
            data = repository.get(id)
        except Repository.ObjectNotFound:
>           raise RTError("object %s not found." % hex_id)
E           borg.helpers.errors.RTError: Runtime error: object 1b7eff779ae251a9cca0bf3783a678235fbb2ab6436d7aab009e7aea4c1fd7eb not found.

.../borg/archiver/debug_cmd.py:211: RTError
src.borg.testsuite.archiver.debug_cmds_test::test_debug_format_obj_respects_type[remote_archiver]
Stack Traces | 11.8s run time
self = <borg.archiver.Archiver object at 0x7f30a2c4e7d0>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='ffe037c963ea419b08df56c6ce979cb3842ead72ec700218a67f59b6099...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a2c4e7d0>>)
repository = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
>           data = repository.get(id)
                   ^^^^^^^^^^^^^^^^^^

.../borg/archiver/debug_cmd.py:209: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'>
id = b'\xff\xe07\xc9c\xeaA\x9b\x08\xdfV\xc6\xce\x97\x9c\xb3\x84.\xadr\xecp\x02\x18\xa6\x7fY\xb6\t\x9bJn'
read_data = True, raise_missing = True

    def get(self, id, read_data=True, raise_missing=True):
        self._lock_refresh()
        entry = self.chunks.get(id)
        if entry is None:
            if raise_missing:
>               raise self.ObjectNotFound(id, str(self._location))
E               borg.repository.Repository.ObjectNotFound: Object with key ffe037c963ea419b08df56c6ce979cb3842ead72ec700218a67f59b6099b4a6e not found in repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'.

.../src/borg/repository.py:644: ObjectNotFound

During handling of the above exception, another exception occurred:

archivers = 'remote_archiver'
request = <FixtureRequest for <Function test_debug_format_obj_respects_type[remote_archiver]>>

    def test_debug_format_obj_respects_type(archivers, request):
        """Test format-obj uses the type from metadata JSON, not just ROBJ_FILE_STREAM."""
        archiver = request.getfixturevalue(archivers)
        cmd(archiver, "repo-create", RK_ENCRYPTION)
        data = b"some data" * 100
        meta_dict = {"some": "property", "type": ROBJ_ARCHIVE_STREAM}
        meta = json.dumps(meta_dict).encode()
        create_regular_file(archiver.input_path, "data.bin", contents=data)
        create_regular_file(archiver.input_path, "meta.json", contents=meta)
        output = cmd(archiver, "debug", "id-hash", "input/data.bin")
        id_hash = output.strip()
        cmd(archiver, "debug", "format-obj", id_hash, "input/data.bin", "input/meta.json", "input/repoobj.bin")
        output = cmd(archiver, "debug", "put-obj", id_hash, "input/repoobj.bin")
        assert id_hash in output
>       output = cmd(archiver, "debug", "get-obj", id_hash, "output/object.bin")
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.../testsuite/archiver/debug_cmds_test.py:140: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
.../testsuite/archiver/__init__.py:148: in cmd
    ret, output = exec_cmd(
.../testsuite/archiver/__init__.py:88: in exec_cmd
    ret = archiver.run(args)  # calls setup_logging internally
          ^^^^^^^^^^^^^^^^^^
.../borg/archiver/__init__.py:513: in run
    rc = func(args)
         ^^^^^^^^^^
.../borg/archiver/_common.py:173: in wrapper
    return method(self, args, repository=repository, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <borg.archiver.Archiver object at 0x7f30a2c4e7d0>
args = Namespace(subcommand='debug get-obj', debug_topics=[], id='ffe037c963ea419b08df56c6ce979cb3842ead72ec700218a67f59b6099...pattern_roots=[], func=<bound method DebugMixIn.do_debug_get_obj of <borg.archiver.Archiver object at 0x7f30a2c4e7d0>>)
repository = <Repository proto='rest', user=None, pass=None, host=None, port=None, path='.../popen-gw2/test_debug_format_obj_respects0/repository'>

    @with_repository(manifest=False)
    def do_debug_get_obj(self, args, repository):
        """Gets object contents from the repository and writes them to a file."""
        hex_id = args.id
        try:
            id = hex_to_bin(hex_id, length=32)
        except ValueError as err:
            raise CommandError(f"object id {hex_id} is invalid [{str(err)}].")
        try:
            data = repository.get(id)
        except Repository.ObjectNotFound:
>           raise RTError("object %s not found." % hex_id)
E           borg.helpers.errors.RTError: Runtime error: object ffe037c963ea419b08df56c6ce979cb3842ead72ec700218a67f59b6099b4a6e not found.

.../borg/archiver/debug_cmd.py:211: RTError


@mr-raj12 mr-raj12 force-pushed the pack-files-step6-range-load branch from 0e81162 to 2393263 Compare June 9, 2026 17:22
@ThomasWaldmann

Member

From the PR comment:

The read_data=False path clamps both load sizes to obj_size when set. Right now with N=1 packs this changes nothing (one chunk per file), but once N>1 packs land an unclamped size would overshoot into the next chunk, so the clamp goes in now.

This is rather confusing.

For the read_data=False path, it first reads 1KB (assuming that this usually contains the header and all the metadata). An overshoot into the next object is no problem, as the parse_meta function only reads the metadata using the correct length from the header and ignores the trailing bytes.

If meta_size in the header indicates that we did not read enough data, we do a second attempt, this time with exactly the correct size for what we need.

The whole point of doing it like this is to avoid reading just the header (a few bytes) and then having to do another read for just the few bytes of metadata, which would mean 2x latency.
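The two-step read described above can be sketched as follows. This is a minimal illustration, not borg's code: the header layout (two little-endian uint32 values, meta_size and data_size) and the load(name, offset, size) range-read callable are assumptions made for the sketch. Capping the first read at obj_size is optional, since trailing bytes are ignored anyway; capping the retry guards against a corrupted meta_size.

```python
import struct

# Hypothetical header layout, for this sketch only: two little-endian uint32
# values (meta_size, data_size). The real borg object header may differ.
HEADER = struct.Struct("<II")
FIRST_READ = 1024  # speculative first read, expected to cover header + metadata


def load_meta(load, name, obj_offset=0, obj_size=None):
    """Return one object's metadata bytes using at most two range loads.

    load(name, offset, size) is a hypothetical stand-in for a store range read;
    it returns up to size bytes starting at offset (fewer at end of file).
    """
    first = FIRST_READ if obj_size is None else min(FIRST_READ, obj_size)
    buf = load(name, obj_offset, first)
    meta_size, _data_size = HEADER.unpack_from(buf)
    needed = HEADER.size + meta_size
    if len(buf) < needed:
        # Second and last attempt: read exactly what the header says we need,
        # capped at obj_size to guard against a corrupted meta_size.
        retry = needed if obj_size is None else min(needed, obj_size)
        buf = load(name, obj_offset, retry)
        if len(buf) < needed:
            raise ValueError("object truncated or meta_size corrupted")
    # Bytes past the metadata (object data, or the next object in a pack) are ignored.
    return buf[HEADER.size:needed]
```

In the common case a single load suffices; only objects whose metadata does not fit into the first read need a second, exactly-sized load.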

@mr-raj12 mr-raj12 changed the title repository: add obj_offset/obj_size range-load params to get() repository: add obj_offset/obj_size range-load params to Store.load calls Jun 9, 2026
@mr-raj12 mr-raj12 force-pushed the pack-files-step6-range-load branch 2 times, most recently from e4b4f47 to a0ceb9c Compare June 10, 2026 06:07

@ThomasWaldmann ThomasWaldmann left a comment

Member


That's not the way to go.

  • It uses a ton of additional memory by using less memory-efficient Python data structures instead of the highly efficient ChunkIndex.
  • The _pack_info dict does not know anything about chunks written in previous sessions, so if you make a backup and later try a restore, the lookups will fail.
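The second point can be illustrated with a minimal sketch, assuming a plain dict as a stand-in for ChunkIndex (the real ChunkIndex is a compact hash table maintained by the cache layer, not a dict). Repo, record, locate, and backup_then_restore are hypothetical names: with no index passed in, each repository object gets its own empty dict, like the session-scoped _pack_info; with a borrowed index, a later session can still resolve chunks written by an earlier one.

```python
class ObjectNotFound(Exception):
    pass


class Repo:
    """Hypothetical repository that resolves pack locations from an index."""

    def __init__(self, chunks=None):
        # No index passed in: locations are session-scoped, like _pack_info.
        # Index passed in: a borrowed reference that outlives this object.
        self._chunks = {} if chunks is None else chunks

    def record(self, chunk_id, location):
        self._chunks[chunk_id] = location  # location = (pack_id, offset, size)

    def locate(self, chunk_id):
        try:
            return self._chunks[chunk_id]
        except KeyError:
            raise ObjectNotFound(chunk_id.hex()) from None


def backup_then_restore(shared_index):
    """Write in one session, then read in a fresh one (e.g. backup, then restore)."""
    Repo(shared_index).record(b"c1", (b"p1", 0, 42))  # session 1
    return Repo(shared_index).locate(b"c1")           # session 2
```

Here backup_then_restore({}) returns (b"p1", 0, 42), while backup_then_restore(None) raises ObjectNotFound because the second session starts with an empty dict.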

@mr-raj12 mr-raj12 force-pushed the pack-files-step6-range-load branch from a0ceb9c to 5cca46e Compare June 10, 2026 10:46

@ThomasWaldmann ThomasWaldmann left a comment

Member


Always have and use self._chunks.

If you used it in .put() and updated it immediately with (UNKNOWN_BYTES32, size, offset), you could improve deduplication.

Just imagine a file with long runs of zeros, resulting in many same-size all-zero chunks. If you update the chunks index immediately in each .put() call, it won't store all these chunks, only the first one; after that, it knows that it already has that chunk. At the end, just update the pack IDs.
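The suggested put()/flush() split can be sketched like this. It is a minimal illustration under stated assumptions: a dict stands in for ChunkIndex, the entry field order and the UNKNOWN_BYTES32 value are placeholders, the pack id is assumed to be the SHA-256 of the pack contents, and PackWriterSketch is a hypothetical class, not borg's PackWriter. Chunk ids in the usage part are simply SHA-256 digests of the data for the demo.

```python
import hashlib

UNKNOWN_BYTES32 = b"\xff" * 32  # placeholder value for this sketch only


class PackWriterSketch:
    """Hypothetical buffered writer: chunks are indexed in put(), pack ids set in flush()."""

    def __init__(self):
        # chunk_id -> [pack_id, offset, size]; a dict stands in for ChunkIndex,
        # and the field order here is illustrative only.
        self.chunks = {}
        self.buffer = []   # (chunk_id, data) pairs waiting to be written as one pack
        self.buffered = 0  # byte length of the buffered pack so far
        self.stored = 0    # number of chunks actually buffered (not deduplicated)

    def put(self, chunk_id, data):
        if chunk_id in self.chunks:
            return  # known already, either in an earlier pack or still buffered
        # Index the chunk right away: offset and size are known, the pack id is not.
        self.chunks[chunk_id] = [UNKNOWN_BYTES32, self.buffered, len(data)]
        self.buffer.append((chunk_id, data))
        self.buffered += len(data)
        self.stored += 1

    def flush(self):
        pack = b"".join(data for _, data in self.buffer)
        pack_id = hashlib.sha256(pack).digest()  # assumed: pack id = sha256 of contents
        for chunk_id, _ in self.buffer:
            self.chunks[chunk_id][0] = pack_id  # only the pack id needs updating
        self.buffer.clear()
        self.buffered = 0
        return pack_id


# Usage: 100 identical all-zero chunks followed by one other chunk.
w = PackWriterSketch()
zero = bytes(4096)
zero_id = hashlib.sha256(zero).digest()
for _ in range(100):
    w.put(zero_id, zero)
w.put(hashlib.sha256(b"x").digest(), b"x")
pack_id = w.flush()
```

After this run, w.stored is 2: the all-zero chunk is buffered once, and every later put() of the same id is a no-op because the index already knows it.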

Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
@mr-raj12 mr-raj12 changed the title repository: add obj_offset/obj_size range-load params to Store.load calls repository: ChunkIndex-based pack routing with range-load support Jun 10, 2026
@mr-raj12 mr-raj12 requested a review from ThomasWaldmann June 10, 2026 17:37
Comment thread src/borg/testsuite/repository_test.py Outdated

@ThomasWaldmann ThomasWaldmann left a comment

Member


ok, that looks much better already. some ideas.

Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py Outdated
@ThomasWaldmann

Member

OK, as expected, quite a few tests break when using sha256 pack ids from the index (but using them from the index is the end goal).

Maybe add that env var and extra CI job as I mentioned in one of my previous reviews.

Comment thread src/borg/repository.py
@mr-raj12 mr-raj12 requested a review from ThomasWaldmann June 11, 2026 04:43
mr-raj12 added 6 commits June 11, 2026 11:10
…alls

retry_size min() guards against corrupted meta_size, no-op for healthy objects.
…refs borgbackup#8572

Replace _pack_info (session-scoped dict) with a borrowed ChunkIndex reference.
Cache passes its index via set_chunk_index(); get() routes correctly for all sessions.
…dles update_pack_info

Remove obj_offset/obj_size params from get(); always initialize _chunks to an
empty ChunkIndex so callers never need to guard for None.
… put()

get() raises ObjectNotFound when entry is missing or UNKNOWN_BYTES32; put()
marks the id in _chunks immediately so the index is live after each write.
On add(), marks chunk with UNKNOWN_BYTES32; on flush(), replaces with real
pack_id. put(), flush(), and set_chunk_index() simplified accordingly.
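The get() behaviour described in these commit messages can be sketched as follows, assuming plain dicts in place of ChunkIndex and the store. The get function, the packs mapping, and the UNKNOWN_BYTES32 value are hypothetical stand-ins, not borg's actual code.

```python
class ObjectNotFound(Exception):
    pass


UNKNOWN_BYTES32 = b"\xff" * 32  # placeholder value for this sketch only


def get(chunks, packs, chunk_id):
    """Hypothetical get(): resolve (pack_id, offset, size) from the index, then range-read.

    chunks maps chunk_id -> (pack_id, offset, size) and stands in for ChunkIndex;
    packs maps pack_id -> pack bytes and stands in for the store.
    """
    entry = chunks.get(chunk_id)
    if entry is None or entry[0] == UNKNOWN_BYTES32:
        # Missing, or only buffered in the writer: no pack file to read from yet.
        raise ObjectNotFound(chunk_id.hex())
    pack_id, offset, size = entry
    # Range read: only this object's bytes, not the whole pack.
    return packs[pack_id][offset:offset + size]
```

Until flush() replaces the placeholder with the real pack id, the chunk counts as present for deduplication but cannot be read back.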
… at N=1

Adds tox env and informational CI job (continue-on-error) to track progress toward full sha256 pack_id adoption, refs borgbackup#8572
@mr-raj12 mr-raj12 force-pushed the pack-files-step6-range-load branch from 6e81d2e to 6fbc8cb Compare June 11, 2026 05:47

@ThomasWaldmann ThomasWaldmann left a comment

Member


some feedback

Comment thread src/borg/repository.py Outdated
Comment thread src/borg/repository.py
Comment thread src/borg/repository.py
Comment thread src/borg/repository.py Outdated
…() through ChunkIndex

PackWriter now always owns a ChunkIndex; the N=1 fallback in get() is removed.
Comment thread src/borg/repository.py Outdated
Remove overlay loop: put() accesses self.chunks first so PackWriter.chunks is updated before any write.
@ThomasWaldmann

ThomasWaldmann commented Jun 11, 2026

Member

You could change that informational sha256 pack ids CI job so it does not get cancelled when another CI job fails.

Comment thread pyproject.toml
…currency to sha256 job

env var was wiped before every test; sha256 job now gets its own concurrency group so it is not cancelled mid-run
@mr-raj12 mr-raj12 force-pushed the pack-files-step6-range-load branch from 3663d3c to 4837278 Compare June 11, 2026 13:53
@mr-raj12 mr-raj12 requested a review from ThomasWaldmann June 11, 2026 13:53