
DB v4 progress#2385

Closed
ljeub-pometry wants to merge 447 commits into master from db_v4

@ljeub-pometry
Collaborator

What changes were proposed in this pull request?

Progress towards the new version of the underlying storage

Why are the changes needed?

Does this PR introduce any user-facing change? If yes is this documented?

How was this patch tested?

Are there any further changes required?

fabubaker and others added 30 commits October 22, 2025 15:43
ljeub-pometry and others added 28 commits May 6, 2026 16:53
* make sure we cancel all tasks when the running server is dropped

* update optd

* add domain for NodeOp

* avoid unnecessarily re-filtering the domain when it is correct

* changes to better support Bn edge sized graphs

fixing last compile error

track count of temporal edges

* remove accidental pyo3 import

* small import updates

* should call list_filtered in nodes

* const_value_in_domain should be the same as const_value by default

* possible improvements to UI for very large graphs

* still need to check that the edge exists in the layer, even if we have the edge ref already

* no optimisation in with_debug as they make debugging more annoying

* filtering by node is really bad for window so change this back

* fix materialize double-adding temporal edges

* for a persistent graph the update history and properties for exploded edges are not the same

* need to look at explode() for history on persistent graphs

* attempt at faster node_valid

* include updates from static graph in node_valid check for layers

* cleanup

* fix search feature

* make component test easier to debug on failure

* add our own union find implementation based on the old connected components algorithm (maybe can be optimised but at least it seems correct)

* clean up dependencies

* storage dependency is definitely used

* avoid compiling the vectors feature in benchmarks unless it is actually needed

* implement has_layer_inner directly

* optimise last for filtered additions

* add fast path for getting edge ref out again

* attempt to optimise SVM

* use optimised active check

* some inlines

* minimise the size of the MemEdgeRef while still including src/dst information

* add src/dst to MemEdgeEntry as well

* remove sorted_vector_map dependency and clean up

* no real reason to capture src/dst on the MemEdgeRef/MemEdgeEntry as these should be cheap to look up

* fix subgraph filtering

* chore: apply tidy-public auto-fixes

* more optimisations for windowing

* cleanup

* remove dbg

* when working with disk storage, in-memory references don't always exist

* minor cleanup

* bring num_nodes up to speed

* more fixes for layered graphs

* replace some kmerge with fast_merge

* more optimisations for windowing

* add check for filtering that excludes layer

* make list properties always return numpy arrays

---------

Co-authored-by: Ben Steer <b.a.steer@qmul.ac.uk>
Co-authored-by: Fabian Murariu <murariu.fabian@gmail.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
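The commit above mentions adding an in-house union-find implementation based on the old connected components algorithm. The sketch below is a generic disjoint-set structure with path halving and union by size, plus a hypothetical `connected_components` helper over a plain edge list. It illustrates the technique only; it is not the implementation in this PR.

```rust
/// Generic disjoint-set (union-find) with path halving and union by size.
/// Illustrative sketch, not Raphtory's implementation.
pub struct UnionFind {
    parent: Vec<usize>,
    size: Vec<usize>,
}

impl UnionFind {
    /// Creates `n` singleton sets, one per element `0..n`.
    pub fn new(n: usize) -> Self {
        Self { parent: (0..n).collect(), size: vec![1; n] }
    }

    /// Returns the representative of `x`'s set, halving the path as it walks up.
    pub fn find(&mut self, mut x: usize) -> usize {
        while self.parent[x] != x {
            self.parent[x] = self.parent[self.parent[x]];
            x = self.parent[x];
        }
        x
    }

    /// Merges the sets containing `a` and `b`; returns false if they were already merged.
    pub fn union(&mut self, a: usize, b: usize) -> bool {
        let (mut ra, mut rb) = (self.find(a), self.find(b));
        if ra == rb {
            return false;
        }
        // Attach the smaller tree under the larger one to keep trees shallow.
        if self.size[ra] < self.size[rb] {
            std::mem::swap(&mut ra, &mut rb);
        }
        self.parent[rb] = ra;
        self.size[ra] += self.size[rb];
        true
    }
}

/// Hypothetical helper: labels each node with its component representative.
pub fn connected_components(num_nodes: usize, edges: &[(usize, usize)]) -> Vec<usize> {
    let mut uf = UnionFind::new(num_nodes);
    for &(src, dst) in edges {
        uf.union(src, dst);
    }
    (0..num_nodes).map(|v| uf.find(v)).collect()
}
```

Two nodes are in the same weakly connected component exactly when they end up with the same label.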
* Test node types are served correctly by the server

* Run fmt

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add read only version of graph to allow python access
Add explicit flush for graph
Add fix for metadata in namespace

* tidy

* tidy

* Read only graph

* Test metadata

* chore: apply tidy-public auto-fixes

* Patch the cache

* read only index

* Adding tests for metadata segments

* added new tests

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add read only version of graph to allow python access
Add explicit flush for graph
Add fix for metadata in namespace

* tidy

* tidy

* Read only graph

* Test metadata

* chore: apply tidy-public auto-fixes

* Patch the cache

* read only index

* Adding tests for metadata segments

* added new tests

* chore: apply tidy-public auto-fixes

* Fixes for check metadata

* Function names

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* move deletion flag to edge id

* clean up more python tests that were node-order dependent

* clean up a lot of warnings

* add num_updates computation

* clean up repr implementations and add repr for OptionalEventTime

* try and fix the slow materialise on small graphs

* surface the error in materialize

* we need to track graph property updates as well

* more fixes for slow materialise

* add num_updates for graph props

* fix the doc tests

* update more repr tests

---------

Co-authored-by: Fabian Murariu <murariu.fabian@gmail.com>
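The first item above ("move deletion flag to edge id") does not say how the flag is encoded. One common technique, assumed here, is to reserve the top bit of a 64-bit id for the flag so that the id stays the same size. `FlaggedEid` is a hypothetical name used only for this sketch.

```rust
/// Hypothetical edge id that carries a deletion flag in its top bit.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct FlaggedEid(u64);

impl FlaggedEid {
    const DELETION_BIT: u64 = 1 << 63;

    /// Id for an addition (flag clear). Panics if `eid` already uses the top bit.
    pub fn addition(eid: u64) -> Self {
        assert!((eid & Self::DELETION_BIT) == 0, "edge id too large to tag");
        Self(eid)
    }

    /// Id for a deletion (flag set).
    pub fn deletion(eid: u64) -> Self {
        Self(Self::addition(eid).0 | Self::DELETION_BIT)
    }

    /// The underlying edge id with the flag masked off.
    pub fn eid(self) -> u64 {
        self.0 & !Self::DELETION_BIT
    }

    /// Whether this id refers to a deletion event.
    pub fn is_deletion(self) -> bool {
        (self.0 & Self::DELETION_BIT) != 0
    }
}
```

The trade-off is one bit of id space in exchange for not storing a separate boolean per update.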
* impl prop redaction

* add graph prop redaction, ref

* bool optimisation, ref

* add test

* review suggestions

* fmt

* chore: apply tidy-public auto-fixes

* Make sure the or implementation for the graphql filters uses the underlying or in raphtory

* move permissions test to pometry-storage

* ref

* Fix temporal property redaction in materialization and expose exclude_*_properties/metadata on GraphViewOps

* ref

* expose get_graph_with_permissions on data

* PropertyRedaction ref

* fix schema redaction, simplify redaction API, and filter node rows() at source

* Push node temporal prop visibility filtering down to storage level in temp_prop_rows

* get prop_ids once

* restore #[graphql(desc)] annotations lost during db_v4 merge

All graphql(desc) annotations from commit a17131e (Ben Steer) were
dropped when resolving the merge conflict in raphtory-graphql/src/model/mod.rs
during the db_v4 merge (9c715d0). This restores them exactly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* move prop_ids collect outside loops in db_tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fmt

* logging independently

* ref

* intro PermissionError

* enforce graph access through typed permission methods and replace string errors with PermissionError

* get graphs with read permissions for copy and create subgraph

* add comment

* fmt

* merge from db_v4

* make get_graph private, fix permissions leak

* fix test

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Lucas Jeub <lucas.jeub@pometry.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
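The commits above introduce a `PermissionError` and route graph access through typed permission methods instead of string errors. A minimal sketch of that pattern follows; the variants, the `Access` levels and the `require_read`/`require_write` functions are hypothetical and do not reflect the actual raphtory-graphql API.

```rust
use std::fmt;

/// Hypothetical typed permission error replacing ad-hoc string errors.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PermissionError {
    /// The caller has no access to the graph at `path`.
    NoAccess { path: String },
    /// The caller may read the graph but attempted to write it.
    ReadOnly { path: String },
}

impl fmt::Display for PermissionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PermissionError::NoAccess { path } => write!(f, "no access to graph '{path}'"),
            PermissionError::ReadOnly { path } => write!(f, "graph '{path}' is read-only for this user"),
        }
    }
}

impl std::error::Error for PermissionError {}

/// Hypothetical access levels, ordered so that `Write` implies `Read`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Access {
    Denied,
    Read,
    Write,
}

/// Typed read check: callers must handle a `PermissionError`, not a string.
pub fn require_read(path: &str, access: Access) -> Result<(), PermissionError> {
    if access >= Access::Read {
        Ok(())
    } else {
        Err(PermissionError::NoAccess { path: path.to_string() })
    }
}

/// Typed write check that distinguishes "read-only" from "no access at all".
pub fn require_write(path: &str, access: Access) -> Result<(), PermissionError> {
    match access {
        Access::Write => Ok(()),
        Access::Read => Err(PermissionError::ReadOnly { path: path.to_string() }),
        Access::Denied => Err(PermissionError::NoAccess { path: path.to_string() }),
    }
}
```

Keeping the raw graph getter private and exposing only checks like these is what prevents a caller from skipping the permission check by accident.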
…ownload (#2605)

* pass arguments through to pytest from tox and disable capture for debugging

* make fetch_file atomic

* make the atomic swap work on Windows, where it can fail if the file already exists and is in use

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
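Making a download atomic usually means writing to a temporary file in the same directory and renaming it over the destination, so readers never observe a partial file. The sketch below shows that pattern with `std::fs`. The helper name `write_atomic` is hypothetical, and the Windows branch (keep an existing destination if the rename fails because the file is in use) is an assumption about how such a fallback could look, not the code in this PR.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: writes `bytes` to `dest` so that readers see either the
/// old file or the complete new one, never a partially written file.
pub fn write_atomic(dest: &Path, bytes: &[u8]) -> io::Result<()> {
    // Stage next to the destination so the rename stays on one filesystem.
    let dir = dest.parent().unwrap_or_else(|| Path::new("."));
    let name = dest
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "dest has no file name"))?;
    let tmp = dir.join(format!(".{}.{}.tmp", name.to_string_lossy(), std::process::id()));
    {
        let mut f = fs::File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // flush data to disk before the swap
    }
    match fs::rename(&tmp, dest) {
        Ok(()) => Ok(()),
        // Assumption: on Windows the rename can fail while another process has
        // `dest` open. Since every writer goes through this function, an
        // existing `dest` is complete, so keep it and discard our copy.
        Err(_) if cfg!(windows) && dest.exists() => {
            let _ = fs::remove_file(&tmp);
            Ok(())
        }
        Err(e) => {
            let _ = fs::remove_file(&tmp);
            Err(e)
        }
    }
}
```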
* Add namespace creation/deletion to graphql

Add TestSetup struct, setup_with_graphs, run_mutation, and assert_is_namespace_dir
helpers to mod graphql_test in raphtory-graphql/src/lib.rs for use by namespace tests
in later tasks.
createNamespace previously called validate_path_for_insert, which created a graph-folder
skeleton and dirty marker on disk and leaked them, so the new namespace
appeared as a MetaGraph. It now uses validate_path_for_namespace_create
plus fs::create_dir_all.
test: createNamespace creates nested directories
test: createNamespace rejects path of existing graph
test: createNamespace rejects path of existing namespace
test: createNamespace rejects invalid paths
test: tighten createNamespace existing-namespace error check
test: add FakePolicy and setup_with_policy helpers
test: createNamespace denied without parent write
test: tighten FakePolicy docs and silence dead-code warning
test: deleteNamespace removes empty namespace
test: deleteNamespace removes namespace with children
test: deleteNamespace rejects empty path
test: deleteNamespace rejects non-existent path
test: deleteNamespace denied when descendant graph unwritable
test: deleteNamespace invalidates cached graphs
test: clarify deleteNamespace denied-test comments
feat(graphql): deleteNamespace infrastructure
- auth.rs: add is_exclusive_write so deleteNamespace acquires the
  exclusive write lock alongside updateGraph
- namespace.rs: expose current_dir() and relative_path() accessors used
  by Mut::delete_namespace and the data layer

* Mark paths dirty before cache eviction and in create_namespace

* chore: apply tidy-public auto-fixes

* Fix race condition in create_namespace

* Add tests asserting failure due to lack of permissions

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Shivam <4599890+shivamka1@users.noreply.github.com>
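The commit list above includes "Fix race condition in create_namespace" without describing the fix. A common way to avoid a check-then-create race is to let the filesystem perform the existence check: create the parent namespaces with `fs::create_dir_all` (which is idempotent) and the leaf with `fs::create_dir`, which fails atomically with `AlreadyExists` if a graph or namespace is already there. The function below is a hypothetical sketch of that idea, not the server's implementation.

```rust
use std::fs;
use std::io;
use std::path::{Component, Path};

/// Hypothetical race-free namespace creation under `root`.
pub fn create_namespace(root: &Path, relative: &str) -> io::Result<()> {
    let rel = Path::new(relative);
    // Reject empty paths and anything that is not a plain relative path
    // (`..`, `.`, absolute roots, drive prefixes), so we never escape `root`.
    if relative.is_empty() || rel.components().any(|c| !matches!(c, Component::Normal(_))) {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid namespace path"));
    }
    let full = root.join(rel);
    if let Some(parent) = full.parent() {
        fs::create_dir_all(parent)?; // intermediate namespaces may already exist
    }
    // Single atomic step: errors with `AlreadyExists` if anything got there first,
    // so two concurrent requests cannot both "succeed".
    fs::create_dir(&full)
}
```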
* explicit permissions

* fix silent failure in on_graph_created

* skip auto-grant on graph create for admin users

* fmt

* on_graph_created: pass ctx instead of role, delegate identity extraction to policy

* grant_namespace_recursive: accept iterator directly, remove intermediate vec allocations

* fix CI

* simplify on_graph_created error propagation and fix FakePolicy for Option<NamespacePermission>

* remove role extraction from server layer; role logic belongs in auth policy

* expose Namespace/MetaGraph/NamespacedItem as pub; replace enumerate_namespace_descendants with get_namespace

* fix comment
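One item above changes `grant_namespace_recursive` to accept an iterator directly and drop intermediate `Vec` allocations. The idiomatic Rust shape for that is a generic `impl IntoIterator` parameter, sketched below on a hypothetical in-memory `Grants` store; the type, method names and role strings are illustrative only.

```rust
use std::collections::HashMap;

/// Hypothetical permission store keyed by graph path.
#[derive(Default)]
pub struct Grants {
    by_path: HashMap<String, Vec<String>>,
}

impl Grants {
    /// Grants `role` on every path yielded by `paths` and returns how many new
    /// grants were added. Accepting `IntoIterator` lets callers pass a lazy
    /// iterator (e.g. a filtered walk over namespace descendants) without
    /// collecting it into a `Vec` first.
    pub fn grant_recursive<I, S>(&mut self, role: &str, paths: I) -> usize
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        let mut added = 0;
        for path in paths {
            let roles = self.by_path.entry(path.into()).or_default();
            if !roles.iter().any(|r| r == role) {
                roles.push(role.to_string());
                added += 1;
            }
        }
        added
    }

    /// Whether `role` has been granted on `path`.
    pub fn has(&self, path: &str, role: &str) -> bool {
        self.by_path.get(path).map_or(false, |roles| roles.iter().any(|r| r == role))
    }
}
```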
Update UI to v0.3.0

Co-authored-by: Fabian Murariu <2404621+fabianmurariu@users.noreply.github.com>
* remove self dependency in raphtory

* trying to fix the features issues with test-utils

* move most tests outside of raphtory

* is search broken?

* move search tests

* fmt

* fix testutils in graphql

# Conflicts:
#	raphtory-graphql/src/lib.rs

* fix graphql testutils

* rename raphtory-test-utils to raphtory-tests

* remove some useless cfg

* chore: apply tidy-public auto-fixes

* Remove duplicate members from workspace

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…2617)

* remove proto completely as we are no longer planning to support it

* use build-fast for python tests

* cargo.lock

* remove prost from workspace

* make sure tests compile

* clean up the rest of the merge issues
* cache wip

* attempt to make this work with dashmap but it is getting complicated

* implement dirty graph handling using pinning only

* fix python

* remove test of tti-based eviction as it is no longer a thing

* chore: apply tidy-public auto-fixes

* make the server startup work with port=0 and add fallback when the server is started without giving a specific port

* some cleanup

* add function to look up port on server

* make the cli port behaviour consistent

* enable panic on drop errors for tests

* make sure graph is dropped before replacing it

* update more tests so they work with arbitrary ports

* make the python tests work even if there is a raphtory server running

* Need to actually use the newly materialized graph and not the old graph when inserting with disk storage enabled

* moving a graph to the same name should be a no-op, not delete the graph

* remove dbg

* make sure the timeout test isn't flaky if the server happens to start quickly

* chore: apply tidy-public auto-fixes

* delete empty tests file

* make sure we don't return the unfiltered graph when only filtered access is available

* get list of nodes and edges by querying the graph on the server to avoid any potential ordering issues in the test

* explicitly control the drop order in test to make sure graph and data are dropped before the directory is deleted

* fix drop order problem in tests that would cause the directory to be deleted before the graph is dropped and enable panic on drop for graphql tests by default

* port=0 doesn't work for embedding server

* refactor replacement and invalidation logic to make it easier to understand and make sure vectorisation is working correctly

* cleanup

* wait for unique reference when dropping graph

* make sure the TempDir is dropped last

* add explicit scopes for temporary directories in tests to avoid errors due to directory being cleaned up too early

* enable panic-on-drop in tests

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
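Several items above deal with starting the server on `port=0`, looking up the port afterwards, and falling back when no port is given. With `std::net`, binding to port 0 asks the OS for a free port, and `local_addr()` reports which one was chosen. The function below is a hypothetical sketch of the fallback policy described in the commit title (an explicit port is honoured exactly; with no port, try a default and fall back to an OS-assigned one); it is not the raphtory-graphql startup code.

```rust
use std::io;
use std::net::TcpListener;

/// Hypothetical bind policy. Returns the listener and the port actually bound.
pub fn bind_server(host: &str, port: Option<u16>, default_port: u16) -> io::Result<(TcpListener, u16)> {
    let listener = match port {
        // An explicit port (including 0) is used as given; failure is an error.
        Some(p) => TcpListener::bind((host, p))?,
        // No port given: try the default, fall back to any free port if taken.
        None => match TcpListener::bind((host, default_port)) {
            Ok(l) => l,
            Err(e) if e.kind() == io::ErrorKind::AddrInUse => TcpListener::bind((host, 0))?,
            Err(e) => return Err(e),
        },
    };
    // Look up the port the OS actually assigned, so it can be reported to clients.
    let actual = listener.local_addr()?.port();
    Ok((listener, actual))
}
```

Tests that start servers this way can run in parallel, and alongside an already running server, because each one gets its own port.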
* updates to deal with dependabot

* rustls

* New tantivy api

---------

Co-authored-by: miratepuffin <b.a.steer@qmul.ac.uk>
* Add row_size to WriteLockedPropMapper

* Add some docs

* Add some docs

* Setup print debugging

* Some cleanup

* Add test for estimated size on unified types

* Update row_size after unifying types

* Add tests for WriteLockedPropMapper

* Run fmt

---------

Co-authored-by: Lucas Jeub <lucas.jeub@pometry.com>
* Added an update of the .meta file on flush. It should update the node and edge counts in this file

* Updated the refresh of the .meta file so it does not recompute anything other than node and edge counts. The .meta file is also refreshed when the graph goes out of scope

* Move .meta file update logic from Raphtory crate into pometry-storage crate (db4-disk-storage). Move tests over as well.

* Move GRAPH_META_PATH constant from Raphtory to storage (both db4-storage and db4-disk-storage).
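The `.meta` commits above refresh only the node and edge counts on flush and again when the graph goes out of scope. In Rust the "out of scope" part maps naturally onto a `Drop` impl. The sketch below uses a hypothetical `DiskGraph` type; the `key=value` file format is an assumption for illustration and not the real `.meta` layout.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Hypothetical disk-backed graph that keeps its counts in a small metadata file.
pub struct DiskGraph {
    meta_path: PathBuf,
    num_nodes: usize,
    num_edges: usize,
}

impl DiskGraph {
    pub fn new(dir: PathBuf) -> Self {
        Self { meta_path: dir.join(".meta"), num_nodes: 0, num_edges: 0 }
    }

    pub fn add_node(&mut self) {
        self.num_nodes += 1;
    }

    pub fn add_edge(&mut self) {
        self.num_edges += 1;
    }

    /// Rewrites only the counts; nothing else is recomputed.
    pub fn flush(&self) -> io::Result<()> {
        let contents = format!("num_nodes={}\nnum_edges={}\n", self.num_nodes, self.num_edges);
        fs::write(&self.meta_path, contents)
    }
}

impl Drop for DiskGraph {
    /// Refresh the counts when the graph goes out of scope. `drop` cannot
    /// return an error, so a failure is only reported.
    fn drop(&mut self) {
        if let Err(e) = self.flush() {
            eprintln!("failed to update {}: {}", self.meta_path.display(), e);
        }
    }
}
```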
* don't set port in tests

* more ports to remove

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* added a cache across record batches when loading edges

# Conflicts:
#	Cargo.toml

* added global node cache on edge loading

* update node cache on edge loading

* update clam-core

* remove cache stats prints

* chore: apply tidy-public auto-fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
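The commit above adds a node cache that persists across record batches when loading edges. The point is that the cache lives on the loader rather than being rebuilt per batch, so a node seen in an earlier batch resolves with a single hash lookup. In the hypothetical sketch below, a cache miss simply assigns the next dense id; a real loader would instead consult the graph's own node mapping on a miss. `EdgeLoader` and its methods are illustrative names only.

```rust
use std::collections::HashMap;

/// Hypothetical loader mapping external node names to dense internal ids.
#[derive(Default)]
pub struct EdgeLoader {
    node_cache: HashMap<String, u64>, // persists across batches
    next_id: u64,
    pub edges: Vec<(u64, u64)>,
}

impl EdgeLoader {
    /// Resolves a node name, hitting the cache first.
    fn resolve(&mut self, name: &str) -> u64 {
        if let Some(&id) = self.node_cache.get(name) {
            return id; // cache hit: no key allocation, no further lookup
        }
        // Cache miss: stand-in for looking the node up in (or adding it to) the graph.
        let id = self.next_id;
        self.next_id += 1;
        self.node_cache.insert(name.to_owned(), id);
        id
    }

    /// Loads one record batch of (src, dst) name pairs.
    pub fn load_batch(&mut self, batch: &[(&str, &str)]) {
        for &(src, dst) in batch {
            let s = self.resolve(src);
            let d = self.resolve(dst);
            self.edges.push((s, d));
        }
    }

    pub fn num_nodes(&self) -> usize {
        self.node_cache.len()
    }
}
```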
@fabianmurariu
Collaborator

Not yet
