Merged
44 changes: 23 additions & 21 deletions episodes/optimisation-latency.md
@@ -129,30 +129,32 @@ downloaded_files = []
 def sequentialDownload():
     for mass in range(10, 20):
         url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
-        f = download_file(url, f"seq_{mass}.h5")
-        downloaded_files.append(f)
+        local_filename = f"seq_{mass}.h5"
+        try:
+            f = download_file(url, local_filename)
+            downloaded_files.append(f)
+        except Exception:
+            print(f"Downloading {local_filename} failed.")
 
 def parallelDownload():
-    # Initialise a pool of 6 threads to share the workload
-    pool = ThreadPoolExecutor(max_workers=6)
-    jobs = []
-    # Submit each download to be executed by the thread pool
-    for mass in range(10, 20):
-        url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
-        local_filename = f"par_{mass}.h5"
-        jobs.append(pool.submit(download_file, url, local_filename))
-
-    # Collect the results (and errors) as the jobs are completed
-    for result in as_completed(jobs):
-        if result.exception() is None:
-            # handle return values of the parallelised function
-            f = result.result()
-            downloaded_files.append(f)
-        else:
-            # handle errors
-            print(result.exception())
-
-    pool.shutdown(wait=False)
+    with ThreadPoolExecutor(max_workers=6) as pool:
+        jobs = {}
+        for mass in range(10, 20):
+            url = f"https://github.com/SNEWS2/snewpy-models-ccsn/raw/refs/heads/main/models/Warren_2020/stir_a1.23/stir_multimessenger_a1.23_m{mass}.0.h5"
+            local_filename = f"par_{mass}.h5"
+            # Submit each download to be executed by the thread pool
+            job = pool.submit(download_file, url, local_filename)
+            jobs[job] = local_filename
+
+        # Collect the results (and errors) as the jobs are completed
+        for job in as_completed(jobs):
+            if job.exception() is None:
+                # return value of the executed function is available as job.result()
+                downloaded_files.append(job.result())
+            else:
+                # handle errors
+                print(f"Downloading {jobs[job]} failed.")
 
 
 print(f"sequentialDownload: {timeit(sequentialDownload, globals=globals(), number=1):.3f} s")
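The futures-as-dict-keys pattern that the rewritten `parallelDownload()` adopts can be tried standalone. The sketch below substitutes a hypothetical `fake_download()` and made-up URLs for the real `download_file`, so it runs without network access; one input is made to fail deliberately to exercise the error branch:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_download(url, filename):
    # Stand-in for download_file(): deliberately fail for one URL
    if "m13" in url:
        raise IOError(f"could not fetch {url}")
    return filename

results, errors = [], []
with ThreadPoolExecutor(max_workers=6) as pool:
    # Map each Future back to its filename so failures can be reported by name
    jobs = {}
    for mass in range(10, 20):
        url = f"https://example.invalid/stir_m{mass}.h5"  # hypothetical URL
        local_filename = f"par_{mass}.h5"
        jobs[pool.submit(fake_download, url, local_filename)] = local_filename

    # Collect results (and errors) as the jobs complete, in any order
    for job in as_completed(jobs):
        if job.exception() is None:
            results.append(job.result())
        else:
            errors.append(jobs[job])

print(sorted(results))  # nine successful "downloads"
print(errors)           # ['par_13.h5']
```

Note that leaving the `with` block implicitly calls `pool.shutdown(wait=True)`, which is why the explicit `shutdown(wait=False)` from the old version is no longer needed.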
47 changes: 27 additions & 20 deletions episodes/optimisation-numpy.md
@@ -383,37 +383,37 @@ Pandas allows its own functions to be applied to rows in many cases by passing `

 ```python
 from timeit import timeit
-import pandas
-import numpy
+import pandas as pd
+import numpy as np
 
 N = 100_000 # Number of rows in DataFrame
 
 def genDataFrame():
-    numpy.random.seed(12) # Ensure each dataframe is identical
-    return pandas.DataFrame(
+    np.random.seed(12) # Ensure each dataframe is identical
+    return pd.DataFrame(
         {
-            "f_vertical": numpy.random.random(size=N),
-            "f_horizontal": numpy.random.random(size=N),
+            "length": np.random.random(size=N),
+            "width": np.random.random(size=N),
             # todo some spurious columns
         })
 
 def pythagoras(row):
-    return (row["f_vertical"]**2 + row["f_horizontal"]**2)**0.5
+    return (row["length"]**2 + row["width"]**2)**0.5
 
 def for_range():
     rtn = []
     df = genDataFrame()
     for row_idx in range(df.shape[0]):
         row = df.iloc[row_idx]
         rtn.append(pythagoras(row))
-    return pandas.Series(rtn)
+    return pd.Series(rtn)
 
 def for_iterrows():
     rtn = []
     df = genDataFrame()
     for row_idx, row in df.iterrows():
         rtn.append(pythagoras(row))
-    return pandas.Series(rtn)
+    return pd.Series(rtn)
 
 def pandas_apply():
     df = genDataFrame()
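For readers who want to verify that the row-wise variants in this hunk agree with each other, here is a small self-contained sketch. It uses a tiny hand-made dataframe instead of `genDataFrame()`, and assumes the renamed `length`/`width` columns:

```python
import pandas as pd

df = pd.DataFrame({"length": [3.0, 5.0], "width": [4.0, 12.0]})

def pythagoras(row):
    return (row["length"]**2 + row["width"]**2)**0.5

# Explicit row-wise iteration, as in for_iterrows()
via_iterrows = pd.Series([pythagoras(row) for _, row in df.iterrows()])

# Pandas' own row-wise application, as in pandas_apply()
via_apply = df.apply(pythagoras, axis=1)

print(via_apply.tolist())  # [5.0, 13.0]
```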
@@ -439,18 +439,18 @@ However, rows don't exist in memory as arrays (columns do!), so `apply()` does n

 ::::::::::::::::::::::::::::::::::::: challenge
 
-We can extract the individual columns of the data frame. These are of the type `pandas.Series`, which supports array broadcasting, just like a NumPy array.
+We can extract the individual columns of the data frame. These are of the type `pd.Series`, which supports array broadcasting, just like a NumPy array.
 Instead of using the `pythagoras(row)` function, can you write a vectorised version of this calculation?
 
 ```python
 def vectorize():
     df = genDataFrame()
-    vertical = df["f_vertical"]
-    horizontal = df["f_horizontal"]
+    length = df["length"]
+    width = df["width"]
 
     result = ... # Your code goes here
 
-    return pandas.Series(result)
+    return pd.Series(result)
 ```
 
 Once you’ve done that, measure your performance by running
@@ -475,15 +475,22 @@ print(ar + ar) # array([2, 4, 6])

 :::::::::::::::::::::::: solution
 
+We start with the original implementation of the `pythagoras()` function:
+```python
+(row["length"]**2 + row["width"]**2)**0.5
+```
+Instead of `row["length"]` and `row["width"]`, which are individual entries in the dataframe, we use the `length` and `width` columns.
+Additionally, we can use NumPy’s `np.sqrt()` function instead of Python’s builtin `**` operator. (This is not strictly necessary, but avoids a bit of performance overhead from mixing the two worlds, as discussed at the beginning of this episode.)
+
 ```python
 def vectorize():
     df = genDataFrame()
-    vertical = df["f_vertical"]
-    horizontal = df["f_horizontal"]
+    length = df["length"]
+    width = df["width"]
 
-    result = numpy.sqrt(vertical**2 + horizontal**2)
+    result = np.sqrt(length**2 + width**2)
 
-    return pandas.Series(result)
+    return pd.Series(result)
 
 print(f"vectorize: {timeit(vectorize, number=repeats)-gentime:.3f} s")
 ```
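The payoff of the vectorised solution can be checked on a toy dataframe (hand-made here so the expected values are obvious; column names follow the renamed `length`/`width`):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"length": [3.0, 5.0, 8.0], "width": [4.0, 12.0, 15.0]})

# One Python-level call per row (slow)
row_wise = df.apply(lambda row: (row["length"]**2 + row["width"]**2)**0.5, axis=1)

# One NumPy call over whole columns (fast): broadcasting runs the loop in C
vectorised = np.sqrt(df["length"]**2 + df["width"]**2)

print(vectorised.tolist())  # [5.0, 13.0, 17.0]
```

Both versions produce the same `Series`; only the vectorised one avoids a Python function call per row.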
@@ -507,7 +514,7 @@ An alternate approach is converting your DataFrame to a Python dictionary using
 def to_dict():
     df = genDataFrame()
     df_as_dict = df.to_dict(orient='index')
-    return pandas.Series([(r['f_vertical']**2 + r['f_horizontal']**2)**0.5 for r in df_as_dict.values()])
+    return pd.Series([(r['length']**2 + r['width']**2)**0.5 for r in df_as_dict.values()])
 
 print(f"to_dict: {timeit(to_dict, number=repeats)-gentime:.2f} s")
 ```
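On a tiny hand-made dataframe, the structure that `to_dict(orient='index')` produces is easy to inspect, which makes the comprehension in the hunk above easier to follow:

```python
import pandas as pd

df = pd.DataFrame({"length": [3.0, 5.0], "width": [4.0, 12.0]})

# One plain dict per row, keyed by the row index
df_as_dict = df.to_dict(orient="index")
print(df_as_dict)  # {0: {'length': 3.0, 'width': 4.0}, 1: {'length': 5.0, 'width': 12.0}}

hyp = [(r["length"]**2 + r["width"]**2)**0.5 for r in df_as_dict.values()]
print(hyp)  # [5.0, 13.0]
```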
@@ -522,12 +529,12 @@ This is because indexing into Pandas' `Series` (rows) is significantly slower th

 ```python
 from timeit import timeit
-import pandas as pandas
+import pandas as pd
 
 N = 100_000 # Number of rows in DataFrame
 
 def genInput():
-    s = pandas.Series({'a' : 1, 'b' : 2})
+    s = pd.Series({'a' : 1, 'b' : 2})
     d = {'a' : 1, 'b' : 2}
     return s, d
 
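The lookup-cost gap that this hunk's benchmark measures can be sketched directly (exact timings depend on your machine, so only the relative ordering is meaningful):

```python
from timeit import timeit
import pandas as pd

s = pd.Series({'a': 1, 'b': 2})
d = {'a': 1, 'b': 2}

# Both containers give the same answer for a key lookup...
assert s['a'] == d['a'] == 1

# ...but the plain dict resolves it with far less machinery
t_series = timeit(lambda: s['a'], number=100_000)
t_dict = timeit(lambda: d['a'], number=100_000)
print(f"Series lookup: {t_series:.3f} s, dict lookup: {t_dict:.3f} s")
```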
5 changes: 3 additions & 2 deletions episodes/profiling-functions.md
@@ -436,9 +436,10 @@ Download and profile <a href="files/pred-prey/predprey.py" download>the Python p

 *This exercise uses the packages `numpy` and `matplotlib`, they can be installed via `pip install numpy matplotlib`.*
 
-> The predator prey model is a simple agent-based model of population dynamics. Predators and prey co-exist in a common environment and compete over finite resources.
+> The predator prey model is a simple model of population dynamics, where a number of predators, prey and grass exist in a two dimensional grid.
+> Predators eat prey, prey eat grass; predators and prey can each reproduce, while grass can regrow. Accordingly, the size of each population changes over time. Depending on the parameters of the model, the populations may oscillate, grow or collapse due to the availability of their food source.
 >
-> The three agents; predators, prey and grass exist in a two dimensional grid. Predators eat prey, prey eat grass. The size of each population changes over time. Depending on the parameters of the model, the populations may oscillate, grow or collapse due to the availability of their food source.
+> Since the behaviour of each individual predator/prey/grass is modeled, this is called an agent-based model. Computational models like this are used in many areas of research, ranging from population dynamics to epidemiology (e.g., to simulate the effect of public health interventions during the COVID-19 pandemic), urban planning (e.g., simulating pedestrian flows) or economics (e.g., simulating financial markets).
 
 The program can be executed via `python predprey.py <steps>`.
 The value of `steps` for a full run is 400, which may take a few minutes. However, using 100–200 steps should be sufficient to find the bottlenecks.
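The profiling workflow the exercise asks for can also be driven programmatically from the standard library. Below is a minimal sketch with a made-up `grow()` function (a toy logistic-style update, not the actual predprey model) standing in for the code under study:

```python
import cProfile
import io
import pstats

def grow(pop, rate, steps):
    # Toy population update standing in for one model step
    for _ in range(steps):
        pop = pop + rate * pop * (1 - pop)
    return pop

profiler = cProfile.Profile()
profiler.enable()
result = grow(0.1, 0.5, 400)
profiler.disable()

# Report the five most expensive entries by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```

For the exercise itself, the same report can be obtained without touching the script via `python -m cProfile predprey.py 100`.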