Make async functions mappable #493
Conversation
```
@gen_test()
def test_map_async_tornado():
    @gen.coroutine
    def add_tor(x=0, y=0):
        return x + y

    async def add_native(x=0, y=0):
        return x + y

    source = Stream(asynchronous=True)
    L = source.map_async(add_tor, y=1).map_async(add_native, y=2).sink_to_list()

    yield source.emit(0)

    yield gen.moment  # Must yield to the event loop to ensure it finished
    assert L == [3]


@pytest.mark.asyncio
async def test_map_async():
```
I'm not a Tornado user, so I still do not really understand the implications of using it vs native asyncio; I wrote the test twice to show that either harness will run coroutines from the other.
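Based on the visible fragment above, the asyncio twin presumably continues along these lines (a sketch, assuming `pytest-asyncio` is available; whether a single `sleep(0)` is enough to flush both nodes may differ in the real test):

```
import asyncio

import pytest
from streamz import Stream


@pytest.mark.asyncio
async def test_map_async():
    async def add_native(x=0, y=0):
        return x + y

    source = Stream(asynchronous=True)
    L = source.map_async(add_native, y=1).map_async(add_native, y=2).sink_to_list()

    await source.emit(0)
    await asyncio.sleep(0)  # yield so the worker callbacks can drain their queues
    assert L == [3]
```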
```
    assert_eq(pd.concat(L), expected)


@flaky(max_runs=3, min_passes=1)
```
I noticed this one leaving an upstream around on occasion. It might be related to GC changes in 3.13+.
I believe there are other places in the code where `iscoroutinefunction` (or `isawaitable`?) is used to decide what to do. It could be argued, from the current state of the package, that all nodes should be asynchronous...
Things I learned about Tornado and asyncio:
I ended up figuring out a way to mimic Akka Streams's `parallelism` parameter for `mapAsync`, which warrants keeping. I have definitely run into problems in stream processing when the mapping function is heavy-weight and the system was creating a strong happens-before relationship between mapping successive elements, so I wanted to give an option to make that a weaker condition than forcing a total ordering on each await.
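For illustration, usage might look like the following sketch, assuming the node exposes the parallelism factor discussed below as a `parallelism` keyword; the `enrich` coroutine is hypothetical:

```
import asyncio
from streamz import Stream

async def enrich(record):
    # hypothetical heavy-weight mapping function, e.g. a network call
    await asyncio.sleep(0.1)
    return {**record, "enriched": True}

source = Stream(asynchronous=True)
# up to `parallelism` enrichments may be in flight at once, while results
# are still emitted downstream in arrival order
source.map_async(enrich, parallelism=4).sink(print)
```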
You mean the use of `gen.coroutine`? I am wondering whether we should be adding any further tornado-specific functionality at this point. If people make their async functions with `async def` ...
In Akka Streams, the API is designed such that `mapAsync` is a separate operator from `map`. In a prior job, we had a problem where one of our enrichment functions used ...
In this case, I was able to support Tornado coroutines without making them a primary design decision. I made the docstring examples for `map_async` ...
I tried map with a coroutine and it failed spectacularly:
```
@gen_test()
def test_map_async_tornado():
@gen.coroutine
def add_tor(x=0, y=0):
return x + y
source = Stream(asynchronous=True)
L = source.map(add_tor, y=1).map(add_tor, y=2).sink_to_list()
yield source.emit(0)
yield gen.moment # yield to the event loop to ensure it finished
> assert L == [3]
E assert [<Future finished exception=TypeError("unsupported operand type(s) for +: '_asyncio.Future' and 'int'")>] == [3]
E
E At index 0 diff: <Future finished exception=TypeError("unsupported operand type(s) for +: '_asyncio.Future' and 'int'")> != 3
E
E Full diff:
E [
E - 3,
E + <Future finished exception=TypeError("unsupported operand type(s) for +: '_asyncio.Future' and 'int'")>,
E ]
```
So I made a new `map_async` that uses native asyncio plumbing to await
the coroutine before feeding it downstream.
The background task can't return, obviously, if we want the stream to continue operating.
Use an `asyncio.Queue` of the tasks to ensure that the arrival and departure order of elements match. Assert back pressure when a new value arrives via `update` but the work queue is full. Because `asyncio.Queue` cannot peek, the parallelism factor is not precise: the worker callback can have either zero or one task in hand, but it must free up a slot in the queue to do so. Under pressure, the effective parallelism will generally be `(parallelism + 1)` instead of the `parallelism` given in `__init__`, as one Future will be awaited in the worker callback while the queue fills up from `update` calls.
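Roughly, the design described above amounts to this simplified sketch (not the actual streamz node; the names here are invented for illustration):

```
import asyncio

class MapAsyncSketch:
    """Simplified sketch of the queue-plus-worker design described above;
    not the real node. Must be created inside a running event loop."""

    def __init__(self, func, parallelism=1):
        self.func = func
        # the bounded queue's capacity is the parallelism factor
        self.work_queue = asyncio.Queue(maxsize=parallelism)
        # worker drains the queue in FIFO order, matching arrival order
        self.worker = asyncio.create_task(self._work())

    async def _wait_for_work_slot(self):
        # back pressure: yield the loop until the worker frees a slot
        while self.work_queue.full():
            await asyncio.sleep(0)

    async def update(self, x):
        await self._wait_for_work_slot()
        task = asyncio.create_task(self.func(x))  # starts running immediately
        await self.work_queue.put(task)

    async def _work(self):
        while True:
            task = await self.work_queue.get()
            # the worker holds one task outside the queue, hence the
            # effective (parallelism + 1) under pressure
            result = await task
            print("downstream:", result)  # stand-in for emitting downstream
```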
martindurant left a comment
After (too much) thinking about this, I don't see an obvious way to simplify things. It ends up more complicated than I would have thought, because of passing the queue around - but the user doesn't need to know anything about that.
```
try:
    result = await task
except Exception as e:
    logger.exception(e)
    raise
```
This is the only way to exit the loop. There should probably be a stop() method, no?
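For instance, a sentinel-based `stop()` might look like this hypothetical sketch built on the same queue design (not code from this PR):

```
import asyncio

_STOP = object()  # sentinel; hypothetical, not part of this PR

class WorkerSketch:
    """Hypothetical stop() built on the queue-plus-worker design."""

    def __init__(self):
        self.work_queue = asyncio.Queue()
        self._worker = None

    def start(self):
        # must be called inside a running event loop
        self._worker = asyncio.create_task(self._work())

    async def stop(self):
        # enqueue the sentinel so the worker drains pending work, then exits
        await self.work_queue.put(_STOP)
        await self._worker

    async def _work(self):
        while True:
            item = await self.work_queue.get()
            if item is _STOP:
                break  # the one clean exit from the loop
            ...  # await the task and emit downstream
```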
`timed_window`, `timed_window_unique`, `delay`, `buffer`, and `latest` all use the same `while True: ...` construct for their work callback.
Fun Fact: the event loop itself only holds a weak reference to any task so when the enclosing node is GCed, the underlying task can be swept away as long as it is not currently running. Once the queue starves it will get stuck waiting on an item that will never come and never schedule back in.
https://docs.python.org/3.14/library/asyncio-task.html#asyncio.create_task
I'm not actually sure that raising the exception is correct, as it will kill the worker task and clog the stream. `map` raises the exception from `update`, which should blow up the entire stream directly, right?
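To illustrate both points, here is a minimal, hypothetical node showing the shared `while True:` worker construct and why storing the task on `self` matters (the strong reference keeps the GC from sweeping a worker that is parked on `get()`):

```
import asyncio

class NodeSketch:
    """Hypothetical node; must be created inside a running event loop."""

    def __init__(self):
        self.queue = asyncio.Queue()
        # storing the task on self keeps a strong reference; the event
        # loop alone only holds a weak one, so an otherwise-unreferenced
        # task can be garbage collected while it waits on queue.get()
        self._worker = asyncio.create_task(self._work())

    async def _work(self):
        while True:  # same construct as timed_window, delay, buffer, ...
            item = await self.queue.get()
            ...  # process item and emit downstream
```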
```
        self._release_refs(metadata)

    async def _wait_for_work_slot(self):
        while self.work_queue.full():
```
I was worried this would end up a busy loop eating the CPU - but if the queue is full, there must be coroutines waiting, so the sleep below will always yield the loop to something else, right? I think, then, that this is fine.
Yes, `await asyncio.sleep(0)` is the defined way to yield the loop:
https://docs.python.org/3.14/library/asyncio-task.html#asyncio.sleep
Sleep always yields, so any other tasks on the loop have priority for the next schedule slot, not just the ones held in the work queue here, but we are guaranteed to have at least one of those since the queue is full and we have the work callback. If the work items are all long-running and blocked (say, on IO) and the queue is full, then back pressure will propagate upstream via the Task enclosing `_insert_job`, which cannot progress until this loop exits. Eventually the only task on the loop that can make progress is technically this one, but it will always immediately yield the loop, so as soon as any other task can make progress, that task will take the loop.
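A tiny self-contained demonstration of that yielding behavior (illustrative only, not from this PR):

```
import asyncio

async def spinner(flag):
    # mimics _wait_for_work_slot: loops until the condition clears,
    # yielding the loop on every iteration instead of busy-waiting
    while not flag["done"]:
        await asyncio.sleep(0)

async def main():
    flag = {"done": False}
    spin = asyncio.create_task(spinner(flag))

    async def other_work():
        # still gets scheduled while the spinner is "looping"
        flag["done"] = True

    await asyncio.create_task(other_work())
    await spin  # completes because sleep(0) kept handing control back

asyncio.run(main())
```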
```
await self._wait_for_work_slot()
coro = self.func(x, *self.args, **self.kwargs)
task = self._create_task(coro)
await self.work_queue.put((task, metadata))
```
Is a race possible with the await here?
Not within the semantics of the traditional interpreter. In free-threaded mode, maybe? The asyncio `Queue` is not thread safe, but within an event loop (which must run entirely within a single thread) the get/put pair will not yield the event loop until the internal state of the `Queue` is consistent and they have achieved the requested action. If they could not complete the action in the current state, they block themselves on a Future that can only complete once the complementary action resolves entirely, and once that Future comes back with a result, they do not yield the loop until they are done modifying the internal deque.
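This can be demonstrated directly; the sketch below (illustrative, not from this PR) shows a blocked `put()` parking on an internal Future and only completing after the complementary `get()`:

```
import asyncio

async def main():
    q = asyncio.Queue(maxsize=1)
    await q.put("a")          # fills the queue

    async def producer():
        await q.put("b")      # parks on an internal Future until a slot opens
        return "put done"

    prod = asyncio.create_task(producer())
    await asyncio.sleep(0)       # let the producer run up to its blocked put()
    assert q.qsize() == 1        # "b" is not in the queue yet; no torn state
    assert await q.get() == "a"  # get() wakes the parked put()
    assert await prod == "put done"
    assert await q.get() == "b"

asyncio.run(main())
```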
Creates a new `map_async` API on `Stream` and updates the documentation around async work to note it.
The distinguishing of `map` and `map_async` was inspired by prior work with `mapAsync` from Akka Streams, as the use of an `asyncio.Queue` and a callback running on the loop to drain it seems like a decent amount of overhead to avoid when we do not need it.
I'm still trying to figure out if `inspect` would work to adapt dynamically given either a native `async def` or a Tornado `gen.coroutine`. If so, diverging the API would not be needed.
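For what it's worth, a sketch of that dynamic detection might combine `inspect` with Tornado's own predicate (an untested idea, not part of this PR):

```
import inspect

from tornado import gen

def is_async_callable(func):
    """Sketch of the dynamic detection floated above; hypothetical helper."""
    if inspect.iscoroutinefunction(func):    # native `async def`
        return True
    return gen.is_coroutine_function(func)   # Tornado `@gen.coroutine`
```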