Estimating BinderHub cluster size #60

@TomasBeuzen

I'm trying to get a good estimate of the required cluster size using this guide in Z2JH (Zero to JupyterHub with Kubernetes).

Here are my assumptions:

Memory

  • Max users = 50
  • Max expected concurrent users = 60% × max users = 30 (it is unlikely that everyone will use it at the same time)
  • Expected memory usage per user:
    • I used nbresuse to estimate a user's memory usage in the notebook.
    • A notebook by itself uses about 120 MB. Taking it to the extreme, executing all the code in multiple chapters and loading plenty of datasets, I was pushing ~300 MB of memory usage.
    • A single chapter more commonly used 100-200 MB (including data and plots).
    • Let's be conservative and assume 300 MB (we can lower this in future).
    • If a user exceeds the memory available to them, their notebook kernel will restart and its memory will be flushed.

Memory = max concurrent users × memory per user + 128 MB (JupyterHub overhead) = 30 × 300 MB + 128 MB = 9,128 MB ≈ 9 GB
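
The memory formula above can be sketched as a small helper so the assumptions are easy to tweak later. This is a minimal sketch, not part of the Z2JH guide: the function name `estimate_memory_mb` and the `concurrency_pct` parameter are hypothetical, and it treats 1 GB as 1000 MB.

```python
import math


def estimate_memory_mb(max_users: int, concurrency_pct: int,
                       mem_per_user_mb: int, overhead_mb: int = 128) -> int:
    """Cluster memory estimate: concurrent users * memory per user + hub overhead."""
    # Round up so a fractional user still counts as a whole user.
    concurrent_users = math.ceil(max_users * concurrency_pct / 100)
    return concurrent_users * mem_per_user_mb + overhead_mb


total_mb = estimate_memory_mb(max_users=50, concurrency_pct=60, mem_per_user_mb=300)
# Hypothetical convention: 1 GB = 1000 MB.
print(f"{total_mb} MB ~ {total_mb / 1000:.1f} GB")  # 9128 MB ~ 9.1 GB
```

If the 300 MB per-user figure turns out to be too high, only `mem_per_user_mb` needs to change.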

CPU

  • This is harder to estimate but also less of an issue: if we're running low on CPU, things will just run slower, but nothing will break.
  • I looked at the JupyterHub Tiffany set up for MDS, and it has had a peak CPU usage of just 5% since MDS started, so it is clearly a very conservatively sized instance.
  • That JupyterHub runs on an m5.12xlarge.
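
To put the 5% figure in concrete terms, the arithmetic below converts it into an equivalent number of busy vCPUs and compares it with the 16 vCPUs of the planned 2 x m5.2xlarge nodes (vCPU counts as in the instance comparison table). It is a rough sketch under a loud assumption: that the 5% peak is measured across all 48 vCPUs of the m5.12xlarge. The MDS hub also serves a different user base, so this is an indication, not a sizing rule. The helper name `busy_vcpus` is hypothetical.

```python
def busy_vcpus(total_vcpus: int, peak_utilisation_pct: int) -> float:
    """Convert a peak CPU-utilisation percentage into an equivalent number of busy vCPUs."""
    return total_vcpus * peak_utilisation_pct / 100


# Assumption: the 5% peak is averaged over all 48 vCPUs of the MDS m5.12xlarge.
mds_peak = busy_vcpus(total_vcpus=48, peak_utilisation_pct=5)
planned = 2 * 8  # 2 x m5.2xlarge, 8 vCPUs each
print(f"MDS peak ~{mds_peak:.1f} vCPUs vs {planned} vCPUs planned")
```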

Summary

To meet the memory and CPU requirements, I'm going to start with 2 x m5.2xlarge instances (the cluster can scale up to 4 if needed). I think this is conservative, but we'll see. I'll report back.

Here's a comparison of the two instances I mentioned:

| Instance | vCPUs | ECUs | Memory (GB) |
|---|---|---|---|
| m5.2xlarge | 8 | 37 | 32 |
| m5.12xlarge | 48 | 168 | 192 |
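
As a sanity check on the summary, the sketch below estimates how many m5.2xlarge nodes the ~9 GB estimate needs and roughly how many 300 MB users two such nodes could hold. It rests on a hypothetical assumption: that about 90% of each node's RAM is allocatable to user pods (the rest going to the OS, kubelet and system pods). The real allocatable amount depends on the cluster, and the helper names are illustrative only.

```python
import math

# Hypothetical allowance: ~90% of each node's RAM is allocatable to user pods.
USABLE_PCT = 90


def usable_mb(nodes: int, node_memory_gb: int) -> int:
    """Memory (MB, 1 GB = 1000 MB) assumed available to pods across `nodes` nodes."""
    return nodes * node_memory_gb * 1000 * USABLE_PCT // 100


def nodes_needed(required_mb: int, node_memory_gb: int) -> int:
    """Smallest number of nodes whose usable memory covers required_mb."""
    return math.ceil(required_mb / usable_mb(1, node_memory_gb))


def max_concurrent_users(nodes: int, node_memory_gb: int,
                         mem_per_user_mb: int, overhead_mb: int = 128) -> int:
    """How many users at mem_per_user_mb fit once the hub overhead is reserved."""
    return (usable_mb(nodes, node_memory_gb) - overhead_mb) // mem_per_user_mb


print(nodes_needed(9128, 32))            # 1
print(max_concurrent_users(2, 32, 300))  # 191
```

Under these assumptions, a single m5.2xlarge already covers the memory estimate, so starting with two leaves plenty of headroom. The sketch ignores per-node packing effects, which should be small for 300 MB pods.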
