Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
