MCPcopy
hub / github.com/nadermx/backgroundremover / max_workers

Function max_workers

backgroundremover/bg.py:50–85  ·  view source on GitHub ↗

Estimate max safe worker processes based on available GPU/system memory. Each worker spawns a separate process that loads its own copy of the model plus a CUDA context. This estimates how many can fit in VRAM.

(model_name="u2net", gpu_batchsize=2)

Source from the content-addressed store, hash-verified

48
49
50def max_workers(model_name="u2net", gpu_batchsize=2):
51 """Estimate max safe worker processes based on available GPU/system memory.
52
53 Each worker spawns a separate process that loads its own copy of the model
54 plus a CUDA context. This estimates how many can fit in VRAM.
55 """
56 if torch.cuda.is_available():
57 try:
58 total_mem = torch.cuda.get_device_properties(0).total_memory
59 except Exception:
60 return 1
61
62 # Per-worker VRAM estimate:
63 # CUDA context per process: ~400MB
64 # Model weights (float32): ~175MB (u2net/human_seg), ~5MB (u2netp)
65 # JIT traced copy: same as model weights
66 # Batch inference tensors: ~30MB per frame in batch
67 if model_name == "u2netp":
68 model_bytes = 5 * 1024 * 1024
69 else:
70 model_bytes = 175 * 1024 * 1024
71
72 per_worker = (
73 400 * 1024 * 1024 # CUDA context overhead
74 + model_bytes * 2 # model + JIT trace
75 + gpu_batchsize * 30 * 1024 * 1024 # inference tensors
76 )
77
78 # Reserve 512MB for OS/driver/display
79 usable = total_mem - 512 * 1024 * 1024
80 calculated = max(1, int(usable // per_worker))
81 return calculated
82
83 # CPU/MPS: limit by CPU cores (inference is compute-bound)
84 cpu_count = os.cpu_count() or 2
85 return max(1, cpu_count // 2)
86
87class Net(torch.nn.Module):
88 def __init__(self, model_name):

Callers 1

mainFunction · 0.90

Calls

no outgoing calls

Tested by

no test coverage detected