
He's missing the most important part: ensuring that your application code is treating the text as text, not as an octet stream. This varies by language, but typically the code is something like "text = decode('utf8', binary)" when your application first sees data from the wire (or files, or a URI string, etc.), and "binary = encode('utf8', text)" when the data leaves your program, like to a log file or the terminal or a socket.
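The boundary pattern described above can be sketched in Python (an illustrative sketch; the function names are made up for the example):

```python
# Sketch of the decode-at-the-boundary pattern: octets in, octets out,
# real text everywhere in between.

def read_message(wire_bytes: bytes) -> str:
    # Data arriving from a socket/file is an octet stream; decode it
    # exactly once, at the edge of the program.
    return wire_bytes.decode("utf-8")

def write_message(text: str) -> bytes:
    # Encode exactly once, when the text leaves the program
    # (log file, terminal, socket).
    return text.encode("utf-8")

wire = "héllo".encode("utf-8")   # pretend this came off the wire
text = read_message(wire)        # inside the program it is always str
assert text == "héllo"
assert write_message(text) == wire
```

The point of the pattern is that encoding decisions happen only at the edges, so the rest of the code never has to ask "is this bytes or text?".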

I say "binary" and "text" because the Internet cannot transmit text, it can only transmit "binary" octet streams. (Similarly, UNIX files can only store octets, and UNIX file names can only store octets other than / and NUL.) But, your programming language supports both text manipulation and binary manipulation, so you have to tell it how you want to treat the data. Each language is different; Perl treats everything as Latin-1 text by default (which happens to work nicely for binary, as well, but not so nicely for UTF-8-encoded text).

Often, libraries will handle this for you, since they have access to out-of-band information. If your locale is en_US.UTF-8, filenames can be assumed to be UTF-8-encoded. If the HTTP response's content-type says "charset=utf-8", your HTTP library will know to decode the octet stream into text for you. But it's important that you both test this and find the code that does it for you, because sometimes library authors forget or libraries have bugs, and one bug will ruin your whole operation.
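As a small sketch of what such a library does under the hood, here is one way to pull the charset out of a Content-Type header using the stdlib (the UTF-8 fallback is an assumption for the example; real HTTP default-charset rules are more subtle and vary by media type):

```python
from email.message import Message

def charset_from_content_type(header: str) -> str:
    # Reuse the stdlib's MIME header parser to extract the charset
    # parameter from a Content-Type value.
    msg = Message()
    msg["Content-Type"] = header
    # get_content_charset() returns the charset lowercased, or None
    # if the parameter is absent.
    return msg.get_content_charset() or "utf-8"

assert charset_from_content_type("text/html; charset=ISO-8859-1") == "iso-8859-1"
assert charset_from_content_type("text/html") == "utf-8"
```

Testing this yourself, as the comment suggests, is cheap insurance against a library silently handing you undecoded bytes.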

Handling Unicode text is hard because it's a rare case where you have to get everything right or the results of your program will be undefined. And, there are no "reasonable defaults", so you have to be explicit about everything. Finally, you can't guess about what encoding your data is; all binary data must come with an encoding out-of-band, or your program will break horribly. Proper text manipulation is the ultimate test of "can I write correct software", and it isn't easy.



I agree with most of your points, but disagree that guessing the encoding should never be done. I think that conflicts with the basic robustness principle: "be conservative in what you do, be liberal in what you accept from others".


I personally think being liberal in what you accept from others is the second worst evil in computer science. The worst being null, of course.


I agree. It allows sloppy developers to be liberal in what they do, and leads to increasingly complex (and incompatible) implementations necessary to be compatible with all the edge cases.

HTML is a good example. Browsers are very tolerant of malformed HTML, which is nice for beginners who don't want to worry too much about perfect syntax.

The problem is each browser handles the unspecified cases differently, which leads to differences in the way pages are rendered, security issues like XSS, etc.

Robustness should just be built into the protocol/format/spec, if necessary. HTML5 gets this right by specifying an algorithm that all parsers should use to get consistent behavior, while still being tolerant of imperfect syntax: http://en.wikipedia.org/wiki/Tag_soup#HTML5
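That tolerance is easy to see with Python's stdlib `html.parser` (which is lenient, though it does not implement the full HTML5 tree-construction algorithm); this is just an illustration of a parser accepting tag soup without complaint:

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    # Collect start tags from (possibly malformed) HTML input.
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

p = TagCollector()
p.feed("<b><i>tag soup, never closed")  # malformed, but parsed anyway
assert p.tags == ["b", "i"]
```

Nothing raises, no error is reported; the only question is whether every parser recovers the same way, which is exactly what the HTML5 spec pins down.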


Hey now. If software started validating its input, what would virus writers do for a living?


Then you will also personally produce programs that are broken for the 5/6ths of the world population who happen to use letters outside Latin-1.

There's no way to avoid it unless you wrap it up and add some explicit checks and guesses.
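One way to "wrap it up" with an explicit check and a labeled guess (a sketch; the fallback choice of Latin-1 is an assumption for the example, convenient because it accepts any byte sequence):

```python
def decode_with_fallback(data: bytes) -> str:
    # Try strict UTF-8 first; if that fails, fall back to Latin-1.
    # The fallback is an explicit, documented guess, not silent magic.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")

assert decode_with_fallback("é".encode("utf-8")) == "é"
assert decode_with_fallback(b"\xe9") == "é"  # not valid UTF-8; Latin-1 guess
```

The guess is isolated in one function, so a caller who needs strictness can skip it, and a caller who needs leniency knows exactly what trade-off was made.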


Won't all modern browsers include the encoding in the Content-Type header?

They should. If so there's no need to guess.


It's not just browsers. Browsers are pretty sane when it comes to charsets, because they had the time to get it right and the pressure to do so. (It wasn't like that in the days of NN4/IE4, which would interpret your text as whatever they wanted and wouldn't even let you override it.)

When facing something less rigorously specified (like the dreaded ID3 tags), no such luck.



