(Let me begin by confessing to a little white lie: It wasn't
fclose() whose failure went undetected, but the POSIX close()
function; this part of the application used POSIX I/O. The lie
is harmless, though, because the C I/O facilities would have
failed in exactly the same way, and an undetected failure would
have had the same consequences. I'll describe what happened in
terms of C's I/O to avoid dwelling on POSIX too much.)
The situation was very much as Richard Tobin described.
The application was a document management system that loaded a
document file into memory, applied the user's edits to the in-
memory copy, and then wrote everything to a new file when told
to save the edits. It also maintained a one-level "old version"
backup for safety's sake: the Save operation wrote to a temp
file, and then if that was successful it deleted the old backup,
renamed the old document file to the backup name, and renamed the
temp file to the document. bak -> trash, doc -> bak, tmp -> doc.
The write-to-temp-file step checked almost everything. The
fopen(), obviously, but also all the fwrite()s and even a final
fflush() were checked for error indications -- but the fclose()
was not. And on one system it happened that the last few disk
blocks weren't actually allocated until fclose() -- the I/O
system sat atop VMS' lower-level file access machinery, and a
little bit of asynchrony was inherent in the arrangement.
The customer's system had disk quotas enabled, and the
victim was right up close to his limit. He opened a document,
edited for a while, saved his work thus far, and exceeded his
quota -- which went undetected because the error didn't appear
until the unchecked fclose(). Thinking that the save succeeded,
the application discarded the old backup, renamed the original
document to become the backup, and renamed the truncated temp
file to be the new document. The user worked a little longer
and saved again -- same thing, except you'll note that this time
the only surviving complete file got deleted, and both the
backup and the master document file are truncated. Result: the
whole document file became trash, not just the latest session
of work but everything that had gone before.
As Murphy would have it, the victim was the boss of the
department that had purchased several hundred licenses for our
software, and I got the privilege of flying to St. Louis to be
thrown to the lions.
[...]
In this case, the failure of fclose() would (if detected) have
stopped the delete-and-rename sequence. The user would have been
told "Hey, there was a problem saving the document; do something
about it and try again. Meanwhile, nothing has changed on disk."
Even if he'd been unable to save his latest batch of work, he would
at least not have lost everything that went before.