Kubernetes reports two healthy pods sharing one GPU via CUDA time-slicing, but p99 latency for a small, latency-sensitive agent worsened by 66%, while medians and throughput barely changed. Pod status checks do not surface the memory contention and queue starvation that time-slicing through the NVIDIA device plugin can introduce.
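To see how a median can stay flat while the tail degrades, here is a minimal Python sketch that compares p50 and p99 across two latency samples. The helper names (`percentile`, `compare`) and the latency values are hypothetical and invented purely for illustration; they are not measurements from the incident described above.

```python
def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of values at or below it."""
    ordered = sorted(samples)
    n = len(ordered)
    rank = max(1, (q * n + 99) // 100)  # integer ceil(q * n / 100)
    return ordered[rank - 1]


def compare(baseline, shared, quantiles=(50, 99)):
    """Return {"pQ": (baseline_ms, shared_ms, percent_change)} for each quantile."""
    report = {}
    for q in quantiles:
        before = percentile(baseline, q)
        after = percentile(shared, q)
        report[f"p{q}"] = (before, after, (after - before) / before * 100)
    return report


# Synthetic request latencies in milliseconds (illustrative assumption, not real data):
# most requests are unaffected, but the slowest few get slower once the GPU is shared.
baseline = [10.0] * 98 + [30.0] * 2
shared = [10.0] * 98 + [50.0] * 2

report = compare(baseline, shared)
for name, (before, after, change) in report.items():
    print(f"{name}: {before:.1f} ms -> {after:.1f} ms ({change:+.1f}%)")
```

With this synthetic input the median is unchanged while p99 rises by roughly two thirds, which is the kind of regression a pod readiness or liveness check would not report. Tracking tail percentiles per workload, rather than pod status alone, is one way to catch it.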
Summary by ByteBrief