Would I Lie to You?
Diederik Stapel, a social psychologist and Dean of Social and Behavioural Sciences at Tilburg University, has been identified as the perpetrator of large-scale scientific fraud. He has, apparently, admitted fabricating data, from experiments that were in fact never conducted, in support of hypotheses advanced by collaborators. A penitent letter written by him has appeared in a Dutch newspaper. It is interesting because it reveals Stapel's struggle to understand his own actions, and his incomprehension in the face of evidence that he cannot deny. On a human level it is difficult not to be moved, despite understanding the damage that has been done to colleagues and students, as well as to his subject.
Diederik Stapel is, of course, not the only such fraudster. Scientific fraud comes to light with disturbing regularity, though not always on such an extensive scale as in this case, and only rarely involving a scientist of comparable distinction and established reputation. All scientists can cite instances. Scientific journals have moved rapidly to withdraw papers that rely upon Stapel's data, a process that is now well practised. Data show a rising number of such withdrawals; it is unclear, however, whether this reflects more fraud driven by the greater pressures on scientists to publish and succeed, greater awareness, or better schemes for detection and whistle-blowing.
Computer science appears to have been immune to the rising tide of fraud and the associated paper withdrawals. This is difficult to account for. After all, human nature is universal, the pressure to publish and to succeed professionally is no less than in the natural and social sciences, and the social structures are little different. Mathematical analysis, proofs and the like are admittedly difficult to fake, but computing experiments are exposed to exactly the same risks as experiments in other sciences. Simple matters such as performance and scalability results can readily be invented or embellished. Code can be withheld (sometimes for legitimate reasons, but also as a cover) and demos can be rigged by, for example, hard-coding the expected answers. In broad terms, the difficult boundary between 'demonstrating a principle' and fixing results can be crossed too easily.
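To see how little effort such embellishment takes, consider a purely hypothetical sketch (invented names, no real system or paper): rather than running any experiment at all, 'scalability measurements' are simply generated from an idealised formula with a little jitter added to make them look like real data.

```python
import random

# Hypothetical sketch only: how fake 'scalability results' might be
# manufactured. No real system is measured; the numbers are generated
# from an idealised linear-scaling formula plus plausible noise.

def fabricated_timings(sizes, base=0.01):
    """Return fake 'measured' run times that follow near-linear
    scaling, with just enough jitter (+/- 3%) to look like real data."""
    random.seed(42)  # a perfectly reproducible 'experiment'
    return [base * n * random.uniform(0.97, 1.03) for n in sizes]

sizes = [1_000, 2_000, 4_000, 8_000]
for n, t in zip(sizes, fabricated_timings(sizes)):
    print(f"n={n:>5}  time={t:.2f}s")
```

The resulting table would plot as a clean, convincing scaling curve, and nothing in the published numbers alone would distinguish it from an honest benchmark; only access to the raw data and code could.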
I suspect that fraud is at least as common in computer science as elsewhere in the sciences - which is to say present, though not widespread - and that our failure to identify it reflects a problem for the discipline. We rarely replicate the results obtained by others, largely because such replications would not be regarded as novel. We too rarely confront techniques that do not work, preferring to steer round them. If we encounter oddities or 'too perfect' results, we are individually and collectively more inclined to sweep the matter under the carpet than to confront our unpleasant suspicions. And because fraud can emerge within trusted relationships - between student and supervisor, or between colleagues - it is difficult even to acknowledge the possibility, let alone confront it.
I would propose that the computer science community start to take the possibility of fraud seriously as a risk to the integrity of our discipline. Not only should we build a stronger culture of replication but we should also commit to examining some papers in detail. Perhaps a selection of papers from IEEE or ACM sponsored conferences could be identified each year and a team of scientists would then work with the authors to examine the raw data, code and analysis.
We can cross our fingers and hope that computer science is less prone to fraud than other disciplines, and that the critical culture we have developed offers some protection. Alertness to the possibility, however, can itself act as an important deterrent, and will be more protection than wishful thinking.