Human performance on SuperCLUE (96.5%!!!) was unreasonably high compared to similar benchmarks. The SuperGLUE paper estimates human performance at 88%, and even Winograd Schemas only reach ~92%. The SuperCLUE GitHub page explains that the figure comes from 3 college/grad students who had access to the internet. Still, they must have been very talented and motivated individuals, because every single one of them scored 100% on the Classical Chinese section, a notoriously difficult subject. Do Chinese colleges even teach that to non-majors?

Expand full comment