[D] What does it mean for AI/ML to outperform human benchmarks?
This post argues that beating human benchmarks on tests does not mean the AI is actually better at the task. This fits well with recent controversies around BERT but is more fundamental than that.
Core argument (details and evidence in post):
- Reports that AI beat humans on certain benchmarks or very specialised tasks don’t mean that AI is actually better at those tasks than an individual human.
- They certainly don’t mean that AI approaches the task with any of the understanding of the world that people have.
- People actually perform at 100% on these tasks when tested individually under ideal conditions (no distractions, typical cognitive development, enough time, etc.). They start making errors only when given too many tasks in too short a time.
- This means that simply accumulating more of these results will NOT cumulatively approach general human cognition.
- But it may mean that AI can replace people on certain tasks that were previously mistakenly thought to require general human intelligence.
- All tests of artificial intelligence suffer from Goodhart’s law.
- A test more closely resembling an internship or an apprenticeship than a gameshow may be a more effective version of the Imitation Game.
- Worries about ‘superintelligence’ are very likely irrelevant, because they rest on an unproven notion that intelligence is arbitrarily scalable and ignore limits on computability.