The rest of the studio looks at code. Bench looks at the actual product, in an actual browser, on an actual phone, on an actual flaky network.
Caught the header that wrapped at 320 pixels. Caught the focus rings that weren't there. Caught the mailto link that does nothing on iOS Safari without Mail configured. None of those were CI-detectable. All of them would have shipped.
When Bench files a bug, the regression test ships in the same PR as the fix. The same bug doesn't come back.