
refactor: use jubilant for main integration tests#1507

Open
imanenami wants to merge 16 commits into 16/edge from refactor/16/jubilant

Conversation

@imanenami

@imanenami imanenami commented Mar 5, 2026

Issue

Migration of tests to jubilant. All test files in the tests/integration root folder are migrated. HA & new_relations tests are skipped for now, to see how comfortable we are with this approach...

Solution

Experimented with an Adapter-based approach. Quick and easy. Each test file took 5-10 min. to migrate once the adapters module was mature.

Findings:

  • The model.block_until method of python-libjuju is hard to adapt; the current implementation tries to leverage jubilant.Juju.wait, but this comes with a performance penalty.
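For context, the adapter idea can be sketched roughly like this. This is a hypothetical, self-contained illustration: `wait` below is a stand-in polling loop, not the real `jubilant.Juju.wait`, and `ModelAdapter` only mimics the shape of python-libjuju's `Model.block_until`.

```python
import time

def wait(predicate, timeout=10.0, delay=0.5):
    """Stand-in polling loop: call `predicate` until it returns True
    or the timeout expires (the real jubilant.Juju.wait passes a Status)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate(None):  # None stands in for the Status argument
            return
        time.sleep(delay)
    raise TimeoutError("conditions not met within timeout")

class ModelAdapter:
    """Mimics the shape of python-libjuju's Model.block_until."""

    def block_until(self, *conditions, timeout=10.0, wait_period=0.5):
        # Each condition is a zero-argument callable; it must be CALLED
        # on every poll (bare all(conditions) would always be truthy).
        wait(lambda status: all(c() for c in conditions),
             timeout=timeout, delay=wait_period)
```

The performance penalty mentioned above comes from re-evaluating every condition on each poll of the status loop rather than reacting to model deltas as python-libjuju does.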

Checklist

  • I have added or updated any relevant documentation.
  • I have cleaned any remaining cloud resources from my accounts.

@github-actions github-actions bot added the Libraries: OK The charm libs used are OK and in-sync label Mar 5, 2026
@codecov

codecov bot commented Mar 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.43%. Comparing base (cda19e7) to head (4193ae9).

Additional details and impacted files
@@           Coverage Diff            @@
##           16/edge    #1507   +/-   ##
========================================
  Coverage    70.43%   70.43%           
========================================
  Files           15       15           
  Lines         4282     4282           
  Branches       694      694           
========================================
  Hits          3016     3016           
  Misses        1057     1057           
  Partials       209      209           

☔ View full report in Codecov by Sentry.

@imanenami imanenami added the enhancement New feature, UI change, or workload upgrade label Mar 5, 2026
@imanenami imanenami force-pushed the refactor/16/jubilant branch 6 times, most recently from 5a2b318 to 8ee09c4 Compare March 9, 2026 08:49
@imanenami imanenami changed the title refactor: use jubilant for backup tests refactor: use jubilant for main integration tests Mar 9, 2026
@imanenami imanenami force-pushed the refactor/16/jubilant branch 5 times, most recently from c764515 to cce8959 Compare March 9, 2026 15:12
@imanenami imanenami marked this pull request as ready for review March 10, 2026 09:58
@imanenami imanenami requested a review from a team as a code owner March 10, 2026 09:58
@imanenami imanenami requested review from carlcsaposs-canonical, dragomirp, juju-charm-bot, marceloneppel and taurus-forever and removed request for a team March 10, 2026 09:58
Contributor

@taurus-forever taurus-forever left a comment


Wow, this is massive, but promising.
LGTM from my side, but some questions first.
The decision here is on @marceloneppel

@taurus-forever
Contributor

taurus-forever commented Mar 10, 2026

Also, I have noticed this in backups: Restore failed: Unit cannot restore backup as it was not elected the leader unit yet

It looks like we are restoring a backup while having two units, /0 and /1... once /1 is the leader, the test fails.
The test must scale down to one unit before restoring the backup... is it a bug in the test or in Jubilant/fixtures/scale-down? @imanenami, Tnx!

P.S. I do not recall such an error in the non-Jubilant tests

@marceloneppel
Member

I suspect the issue @taurus-forever commented about (Unit cannot restore backup as it was not elected the leader unit yet) may be due to a bug in ModelAdapter.block_until (adapters.py:251-257):

def block_until(self, *conditions, timeout=None, wait_period=0.5):
    self._juju.wait(
        lambda status: all(conditions),
        ...
    )

conditions is a tuple of callable objects (due to *conditions packing — see Python docs). The built-in all() checks the truthiness of each element (Python docs), and since function objects are always truthy (Python docs — "By default, an object is considered true"), all(conditions) is always True regardless of what the conditions would actually return.

This would cause juju.wait() to return immediately without ever polling, so the scale-down wait in backup_helpers.py would complete instantly and the restore would run before the second unit is actually removed. The test may still pass sometimes due to incidental latency from Juju API calls and action dispatch giving Patroni enough time to converge, which would explain the flaky behavior.

If that's the case, the fix would be to actually call each condition:

lambda status: all(c() for c in conditions),
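The failure mode can be reproduced in isolation; this is a minimal sketch, with illustrative lambdas standing in for the real conditions:

```python
# Two zero-argument conditions: one satisfied, one not.
conditions = (lambda: True, lambda: False)

# Buggy check: function objects are always truthy, so this is True
# even though one condition returns False when called.
buggy = all(conditions)

# Fixed check: call each condition and test its return value.
fixed = all(c() for c in conditions)

print(buggy, fixed)  # prints "True False"
```

With the buggy form, the predicate passed to juju.wait() is satisfied on the very first status check, which matches the "returns immediately without ever polling" behavior described above.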

@imanenami
Author

@taurus-forever @marceloneppel

Thanks Alex for spotting that, and special thanks to Marcelo for finding the root cause, which I also think is the issue. I'll fix that 👍

@imanenami imanenami force-pushed the refactor/16/jubilant branch from c3cf05b to 1b927b2 Compare March 11, 2026 06:43
@imanenami imanenami marked this pull request as draft March 11, 2026 07:16
@imanenami imanenami marked this pull request as ready for review March 11, 2026 12:40
Comment on lines +1711 to +1715
output = juju.cli("machines")

for line in output.splitlines():
    if instance in line:
        return line.split()[2]


This may be fragile, consider:

> juju machines
Machine  State    Address        Inst id        Base          AZ  Message
0        started  10.58.203.143  juju-cb4589-0  ubuntu@25.10  bb  Running
1        down                    pending        ubuntu@26.04      no matching image found
2        pending                 juju-cb4589-2  ubuntu@24.04  bb  Running
3        pending                 juju-cb4589-3  ubuntu@24.04  bb  Container started

If the state is weird like machine "1", the function returns None instead of a string.
If the state is weird like machine "2", the function returns the instance id instead of the IP address.

The "instance in line" check relies on instance ids being sufficiently unique and/or your instance appearing first: if you look for "juju-cb4589-2" you could find "juju-cb4589-20" instead.

What we do in jubilant is juju machines --format json and coerce the output into dataclasses.

In this case, though, why not use juju.status()?
The ip addresses are visible there:

In [11]: sum((m.ip_addresses for m in juju.status().machines.values() if m.hostname == "juju-cb4589-3"), [])
Out[11]: ['10.58.203.230']

If you need ip addresses; or .dns_name if you just need something to connect to (it's actually an ip address for lxd, dunno about other substrates).
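The status-based lookup suggested above can be sketched as a small, self-contained helper. `MachineStatus` here is a hypothetical stand-in for jubilant's status dataclass (the real one exposes `hostname` and `ip_addresses`, as the `In [11]` example shows); with a real `jubilant.Juju` you would iterate `juju.status().machines.values()` instead of a plain dict.

```python
from dataclasses import dataclass, field

@dataclass
class MachineStatus:
    """Stand-in for jubilant's machine status dataclass."""
    hostname: str
    ip_addresses: list = field(default_factory=list)

def instance_ip(machines: dict, instance: str):
    """Return the first IP of the machine whose hostname matches exactly.

    Exact hostname comparison avoids the substring collision where
    "juju-cb4589-2" would also match "juju-cb4589-20".
    """
    for machine in machines.values():
        if machine.hostname == instance and machine.ip_addresses:
            return machine.ip_addresses[0]
    return None

machines = {
    "2": MachineStatus("juju-cb4589-2", ["10.58.203.101"]),
    "20": MachineStatus("juju-cb4589-20", ["10.58.203.102"]),
}
print(instance_ip(machines, "juju-cb4589-2"))  # prints "10.58.203.101"
```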

Author


Well, I kept the syntax from the ha_tests/helpers.py module here:

async def instance_ip(ops_test: OpsTest, instance: str) -> str:
    """Translate juju instance name to IP.

    Args:
        ops_test: pytest ops test helper
        instance: The name of the instance

    Returns:
        The (str) IP address of the instance
    """
    _, output, _ = await ops_test.juju("machines")
    for line in output.splitlines():
        if instance in line:
            return line.split()[2]

I think it's ok to keep it this way since we only use that in active|idle state...



In our team, we'd open a separate issue to clean this up later, add a comment with the link in the code and merge this PR in the current state. I'm not sure about your team's rules.



P.S. When you see await ops_test.juju("machines")...

that means the developer had to use the escape hatch (the Juju CLI was not the norm), probably because the juju-rpc-based API didn't provide the field that was needed. It's a good hint that this code needs to be refactored, because the Juju CLI (wrapped by Jubilant) is now the norm.

I would check these cop-outs carefully and refactor.

@dimaqq

dimaqq commented Mar 12, 2026

At a high level

It's an interesting take. I enjoyed reading the PR.

I think it could also be useful if the idea is to follow up with a series of PRs to dramatically simplify things.

In our experience, Jubilant integration tests should be slightly more concise than pytest-operator-based ones. I'd expect the extra 2 KLOC this PR adds to eventually get removed.

@benhoyt

benhoyt commented Mar 12, 2026

This is an interesting idea, and may pay off in the short term, especially if you can reuse adapters.py for multiple charms. However, I'm not a big fan of it, as it's doing a bunch of work only to leave legacy code in place -- now you have to maintain legacy pylibjuju-style code plus the adapters library.

@tonyandrewmeyer has had good success using recent AI models to migrate pytest-operator tests to Jubilant + pytest-jubilant: here is his write-up on Charmhub Discourse. I'd recommend you at least try to reproduce those results to go fully Jubilant-native, without leaving legacy code around. Be sure to use the same approach Tony recommends here, as he tried quite a lot of approaches to get to this result.

@imanenami
Author

Thanks @benhoyt and @dimaqq for your valuable insights.

Personally, I see this method alongside the AI-driven approach (which could be explored on this repo by interested developers) as possible alternatives for the jubilant migration.

I could imagine different scenarios where the Adapter approach or the AI approach might be the right choice. For instance, we have around 15 maintenance-mode charm repos in Kafka alone, which we want to be testable with Juju 4 but don't want to put a lot of time into migrating... This adapter-based approach works best in that scenario. But for actively developed repos, the choice might differ depending on product manager priorities and the developer time available :)

@benhoyt

benhoyt commented Mar 12, 2026

I could imagine different scenarios where the Adapter approach or the AI approach might be the right choice. For instance, we have around 15 maintenance-mode charm repos in Kafka alone, which we want to be testable with Juju 4 but don't want to put a lot of time into migrating... This adapter-based approach works best in that scenario. But for actively developed repos, the choice might differ depending on product manager priorities and the developer time available :)

Yep, that's fair, thanks.

@imanenami imanenami dismissed stale reviews from dragomirp and marceloneppel via a51998f March 13, 2026 05:51
@imanenami imanenami force-pushed the refactor/16/jubilant branch from fb579ea to a51998f Compare March 13, 2026 05:51
@imanenami
Author

imanenami commented Mar 13, 2026

@marceloneppel @dragomirp

Sorry for making the reviews stale; I had to rebase and sign the commits, and did some final, small tidy-ups in this commit: 4193ae9

I'd also like to thank all the reviewers who provided valuable input.

No need to mention that the decision to merge this or not is completely up to the repo's maintainers, especially @marceloneppel. I have plans to make the migration complete and remove the pytest-operator dependency altogether in a follow-up PR (the same way I did in this Kafka PR), so I'd highly appreciate it if we keep the branch and do not delete it ❤️

Member

@marceloneppel marceloneppel left a comment


Thanks for the work on this, @imanenami! I've re-reviewed after the rebase, and the changes look good. Approving. Looking forward to the follow-up PR to go fully Jubilant-native, like the Kafka one! If the branch gets auto-deleted after the merge, we can restore it.

Contributor

@taurus-forever taurus-forever left a comment


Let's give it a try, as 1) it works, 2) it builds more experience and confidence in Jubilant, and 3) it can help us merge the PG14+PG16 tests.

We are open to an AI rewrite of the tests to native Jubilant. @tonyandrewmeyer, can you help us make it a success story?

@tonyandrewmeyer
Contributor

We are open to an AI rewrite of the tests to native Jubilant. @tonyandrewmeyer, can you help us make it a success story?

Happy to help. I'll take a closer look tomorrow.

