Experimental research conducted in Machine Perception (but, for that matter, easily generalized to a wide range of other experimental sciences) should integrate the following goals:
- develop algorithms that are robust and approach human levels of performance for specific tasks of interest.
- invent new methods that are better than known techniques.
- generate experimental results that are well-documented, understandable in context, and reproducible by others.
- build on past knowledge to yield new knowledge which moves us toward solutions for problems of vital importance.
These goals, however, seem to be met only rarely.
Despite tremendous advances in computer and communications technologies, the way we conduct research and disseminate results is essentially unchanged over the past 50 years.
Identify problem → conceive solution → find data → test method → publish results (→ field system)
Modest steps forward include more stringent peer review processes (comparisons to prior work), attempts to create and share common datasets, and “competitions” at international conferences. These steps, however, do not suffice to achieve the goals set out above.
Develop Algorithms that are Robust and Approach Human Levels of Performance for Specific Tasks of Interest.
- We want algorithms to be general, but too often they are tested on small, overused datasets far removed from the real world.
- Nearly all experimental results reported in the literature are biased by the algorithm developer’s intimate knowledge of the data.
- Current practices lack convincing evidence of generality.
- What does “human levels of performance” mean? Even experts can disagree on all but the most trivial of cases.
Following A. Torralba's keynote at ICPR 2010, it would seem that even defining human performance, let alone trying to achieve it, can be a real problem. In certain cases matching human behaviour may indeed be what we want, but is that always the case? Perhaps in some circumstances we actually *don't* want our system to behave like a human ...
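The developer-bias point above is commonly mitigated by sealing off a held-out test set that is evaluated exactly once, after all development is finished. A minimal sketch of that practice, using synthetic data (the function name and split fraction are illustrative, not prescribed by the text):

```python
import random

def sealed_split(examples, test_fraction=0.2, seed=0):
    """Deterministically shuffle the data and set aside a test portion
    that the developer never inspects until the final evaluation."""
    rng = random.Random(seed)
    items = list(examples)
    rng.shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]  # (development set, sealed test set)

# Hypothetical usage: tune only on dev_set; touch test_set exactly once.
data = list(range(100))
dev_set, test_set = sealed_split(data)
assert not set(dev_set) & set(test_set)  # the two sets never overlap
```

Fixing the seed makes the split itself reproducible, so the sealed set is the same for anyone re-running the protocol.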
Invent New Methods that are Better than Known Techniques.
- How do we know when we have succeeded?
- The need to compare against previously published results creates over-reliance on standard datasets (which is counter-productive).
- Attempts to re-implement a published algorithm are problematic (incomplete descriptions, inherent conflict of interest).
- Competitions can be useful, but are infrequent.
- Most papers do not even bother to make such comparisons.
Generate Experimental Results that are Well-Documented, Understandable in Context, and Reproducible by Others.
- Are published descriptions sufficient to reproduce experiments? How often is this even attempted?
- Explicit and/or implicit bias in selecting and using data (e.g., discarding hard cases) makes context difficult to recover.
- “Publish or perish” mindset leads to overstated claims, poor understanding of generalizability of results.
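One concrete habit that addresses the reproducibility concerns above is recording every run's full configuration and random seed alongside its result, so the experiment can be replayed from the record alone. A minimal sketch with a toy "experiment" standing in for a real one (the field names are illustrative, not a standard):

```python
import json
import random

def run_experiment(config):
    """Run a (toy) experiment deterministically from a recorded config."""
    rng = random.Random(config["seed"])
    # Stand-in for a real experiment: a seed-determined pseudo-score.
    score = round(rng.random(), 6)
    return {"config": config, "score": score}

config = {"seed": 42, "dataset": "toy", "method": "baseline"}
first = run_experiment(config)
# Re-running from the serialized record must give the identical result.
second = run_experiment(json.loads(json.dumps(config)))
assert first["score"] == second["score"]
```

The point is not the toy score but the discipline: anything that affects the outcome lives in the recorded config, and nothing else does.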
Build on Past Knowledge to Yield New Knowledge which Moves Us Toward Solutions for Problems of Vital Importance.
- Effort is wasted developing methods that do not improve on existing techniques, or are ill-suited for the task at hand.
- Much time spent “reinventing the wheel.”
- Like trying to build a pyramid out of shifting sand without first forming it into blocks.
- Impossible to fix without a paradigm shift across the community.