feat!: UTA access class separation of concerns, postgres lib fixes by jsstevenson · Pull Request #465 · GenomicMedLab/cool-seq-tool

jsstevenson · 2026-04-02T00:29:54Z

Basically a lot of tech debt fixes

Migrate from asyncpg to psycopg. Honestly, asyncpg might be faster but we are generally using psycopg elsewhere so might as well stick with one thing.
(breaking) break the DB access class into 2 things: 1) a repository pattern-style lookup class that just lasts as long as a DB connection, and 2) a connection pool-holder class for easily launching repository instances.
- This means that applications which want to have more direct control over the lifespan of the connection pool can do so directly, and separates the connection-management logistics from the actual query/data transformation logic.
- Mark the "get AWS secrets" function as deprecated. In the back of my head, I felt like we were trying to move away from secrets manager for DB connection params, but maybe I made that up. Either way, it's not ideal to be writing code specific to our AWS deployment into a library, so I threw down a deprecation warning and maybe we can figure out something better in 10 years
- Previously we were basically doing a connection pool status check every time we ran a query, and (re)creating the pool if it was gone. It's a little funky to be delaying pool setup/config until runtime like this. I included a child class of the new UtaDatabase class that basically retains this behavior so we can minimize needed changes in the short term (+ less work for me to update tests)
(breaking) set schema in the way you are supposed to per postgres docs: ie postgresql://uta_admin@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public rather than /uta/uta_20241220
- the postgresql://stuff/(db_name)/(schema_name) thing is nonstandard (I think it was basically invented for the hgvs library) and not in keeping with more typical postgres access patterns. In general, there are two standards for describing postgres connections -- as a key-value string and as a URI. So, I think if we're gonna have users go with the URI way, we should adhere to the standard.
- env var name stays the same, but the old value won't work any more (it'll give you the wrong search path) and the default value is new
- edit: I thought about it and it would probably be easy to retain backwards compatibility with the old URL format and just declare it deprecated or something, might do that instead
Don't hardcode schema into queries anymore. Instead, set the search path when creating the connection. (Still assume uta_20241220 by default, but this is set at connection pool creation via the connection string)
Don't hard-code variables into queries anymore, but pass them as params using the psycopg %(argname)s syntax
Except for 1-2 exceptions that were too hard, make all queries static instead of dynamically constructing them
Allow the public UTA query execution function to accept params as an additional arg
Change one UTA DB function that returned a (result, failure_description) tuple where the result and the failure values were mutually exclusive into a function that returned the result value, and raised an exception in case of failure.
Refactor the genomic table initializer method, and also make it optional for the pool factory function to attempt creation of the genomic table (I think this addresses UTA database recreates schema on startup #430)

basically I think the ideal FastAPI usage for just UTA should look like

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncGenerator:
    uta_pool = await create_uta_connection_pool()
    app.state.uta_pool = uta_pool
    yield
    await uta_pool.close()

async def get_uta(request: Request) -> Generator(UtaRepository, None, None):
    async with request.app.state.uta_pool.connection() as conn:
        yield UtaRepository(conn)

@app.get("/check_exists")
async def check_exists(gene: str, uta: Annotated[UtaRepository, Depends(get_uta)]):
    return await uta.gene_exists(gene)

and within coolseqtool itself, or in other contexts where you might want to pass around the entire coolseqtool god class instance, you can do something like

uta_db: UtaDatabase  # assume this exists
async with uta_db.repository() as uta:
    print(await uta.gene_exists("BRAF"))

A few misc engineering notes I thought about while working on this

Yes, we want to be using connection pools vs plain connections in basically all contexts. The pool more efficiently recycles connections, so even in contexts where everything is sequential, there's still a speedup
We probably want to stay async. It would be a pain to move back. Maybe in the future we can move the rest of the library to async/dial down all the blocking file open calls.

korikuzma · 2026-04-02T12:10:44Z

Will review this tomorrow

korikuzma

Great work, minor things

korikuzma · 2026-04-02T19:45:00Z

src/cool_seq_tool/sources/uta_database.py

+    return f"postgresql://{username}{':' + password if password else ''}@{host}:{port}/{database}?options=-csearch_path%3D{schema},public"
+
+
+DEFAULT_UTA_DB_URL = "postgresql://uta_admin@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public"


Trying this out with docker install, needed to set to this. I think since we are expecting people to use our docker instructions, we should update

Suggested change

DEFAULT_UTA_DB_URL = "postgresql://uta_admin@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public"

DEFAULT_UTA_DB_URL = "postgresql://anonymous@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public"

Looks like usage was updated correctly, but forgot to update here

korikuzma · 2026-04-02T19:49:54Z

src/cool_seq_tool/sources/uta_database.py

+    if initialize_genomic_table:
+        try:
+            async with pool.connection() as conn:
+                await UtaRepository(conn).create_genomic_table()
+        # catch all exceptions -- this is probably a critical error, it's good to
+        # close the pool first
+        except:
+            await pool.close()
+            raise


So I think we can actually remove this since Docker compose handles this. Or we can have the default set to False

Suggested change

if initialize_genomic_table:

try:

async with pool.connection() as conn:

await UtaRepository(conn).create_genomic_table()

# catch all exceptions -- this is probably a critical error, it's good to

# close the pool first

except:

await pool.close()

raise

korikuzma · 2026-04-02T19:52:59Z

tests/sources/test_uta_database.py

+    resp = await uta_repo.get_alt_ac_start_or_end("NM_152263.3", 822, 892, None)
+    assert resp == tpm3_1_8_end_genomic
+
+    with pytest.raises(NoMatchingAlignmentError):


Consistency?

Suggested change

with pytest.raises(NoMatchingAlignmentError):

with pytest.raises(

NoMatchingAlignmentError,

match=re.escape(

"Unable to find a result where NM_152263.63 has transcript coordinates (tx_exon_start=822, tx_exon_end=892) between an exon's start and end coordinates on gene=None"

),

):

korikuzma · 2026-04-02T19:54:38Z

tests/sources/test_uta_database.py

-async def test_data_from_result(test_db, tx_exon_aln_data, data_from_result):
-    """Test that data_from_result works correctly."""
-    resp = test_db.data_from_result(tx_exon_aln_data)
-    assert resp == data_from_result


Should this be added back? If not, remove the data_from_result test fixture

src/cool_seq_tool/sources/uta_database.py

Co-authored-by: Kori Kuzma <korikuzma@gmail.com>

jsstevenson added 8 commits March 30, 2026 09:59

stash

c308acc

update usage elsewhere in the library

fef0ccc

minor tweaks to factory method

4154d16

add todo

f386b30

fix up query api issues

b0a8445

cleanup eventual error msg

d7606e9

ope

bbc4dcd

update deps

5272243

jsstevenson added the priority:medium Medium priority label Apr 2, 2026

jsstevenson added 3 commits April 1, 2026 20:55

refactor genomic table initializer + make it optional

3f899ec

Merge branch 'main' into uta-refactor

4cf205e

this doesnt need to be function-scoped

14c4d12

jsstevenson marked this pull request as ready for review April 2, 2026 01:14

jsstevenson requested a review from a team as a code owner April 2, 2026 01:14

korikuzma mentioned this pull request Apr 2, 2026

feat: add gene-based search biocommons/anyvar#416

Draft

korikuzma requested changes Apr 2, 2026

View reviewed changes

Update src/cool_seq_tool/sources/uta_database.py

075b775

Co-authored-by: Kori Kuzma <korikuzma@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: UTA access class separation of concerns, postgres lib fixes#465

feat!: UTA access class separation of concerns, postgres lib fixes#465
jsstevenson wants to merge 12 commits intomainfrom
uta-refactor

jsstevenson commented Apr 2, 2026 •

edited

Loading

Uh oh!

korikuzma commented Apr 2, 2026

Uh oh!

korikuzma left a comment

Uh oh!

korikuzma Apr 2, 2026

Uh oh!

korikuzma Apr 2, 2026

Uh oh!

korikuzma Apr 2, 2026

Uh oh!

korikuzma Apr 2, 2026

Uh oh!

korikuzma Apr 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		return f"postgresql://{username}{':' + password if password else ''}@{host}:{port}/{database}?options=-csearch_path%3D{schema},public"


		DEFAULT_UTA_DB_URL = "postgresql://uta_admin@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public"

	DEFAULT_UTA_DB_URL = "postgresql://uta_admin@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public"
	DEFAULT_UTA_DB_URL = "postgresql://anonymous@localhost:5432/uta?options=-csearch_path%3Duta_20241220,public"

-    with pytest.raises(NoMatchingAlignmentError):
+    with pytest.raises(
+        NoMatchingAlignmentError,
+        match=re.escape(
+            "Unable to find a result where NM_152263.63 has transcript coordinates (tx_exon_start=822, tx_exon_end=892) between an exon's start and end coordinates on gene=None"
+        ),
+    ):

Conversation

jsstevenson commented Apr 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

korikuzma commented Apr 2, 2026

Uh oh!

korikuzma left a comment

Choose a reason for hiding this comment

Uh oh!

korikuzma Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

korikuzma Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

korikuzma Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

korikuzma Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

korikuzma Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jsstevenson commented Apr 2, 2026 •

edited

Loading