Ceiling and Floor Effects
Well-designed experiments can still go wrong
What if all our algorithms do particularly well (or they all do badly)?
We’ve got little evidence to choose between them
Ceiling effects arise when test problems are insufficiently challenging
- Floor effects are the opposite: they arise when problems are too challenging
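To make the two definitions concrete, here is a minimal sketch, assuming each algorithm is summarised by a success rate in [0, 1]. The function name `classify_effect`, the 0.05 margin and the example scores are all hypothetical. The function only checks whether every algorithm sits close to the best or worst achievable score. Such clustering is a warning sign rather than proof, because the algorithms might genuinely perform equally well.

```python
def classify_effect(scores, low=0.0, high=1.0, margin=0.05):
    """Flag a possible ceiling or floor effect (hypothetical heuristic).

    scores: mapping from algorithm name to its mean score on the benchmark.
    low, high: worst and best achievable scores.
    margin: assumed tolerance; a score within `margin` of a bound counts as at it.
    """
    if not scores:
        raise ValueError("need at least one algorithm's score")
    values = list(scores.values())
    # Every algorithm is near the best possible score: problems may be too easy.
    if all(v >= high - margin for v in values):
        return "ceiling"
    # Every algorithm is near the worst possible score: problems may be too hard.
    if all(v <= low + margin for v in values):
        return "floor"
    return "none"


# Invented scores, for illustration only.
easy = {"A": 0.99, "B": 0.98, "C": 1.00}
hard = {"A": 0.01, "B": 0.00, "C": 0.03}
mixed = {"A": 0.90, "B": 0.55, "C": 0.30}

print(classify_effect(easy))   # ceiling
print(classify_effect(hard))   # floor
print(classify_effect(mixed))  # none
```

In the "easy" and "hard" cases the scores leave almost no room to separate the algorithms, which is exactly the situation where we have little evidence to choose between them.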
A particular problem in AI, because we often evaluate on standard benchmark sets
But how do we detect the effect?