
It shows us why just using C/C++ is not automatically a wise performance decision. You need to spend more time developing the product and spend MORE time improving its performance.


I think it more shows that just picking C/C++ over Python doesn't mean you automatically get awesome performance. You still need to know what your doing.


like using `gettimeofday()` and not using cat(1), for starters.


What's bad about cat(1)?


Nothing, except it's entirely unnecessary for this task.

See "Useless use of cat awards" from days of yore: http://partmaps.org/era/unix/award.html#uucaletter


I still prefer cat, since a single typo of > for < and the file is gone.


OP intends to measure the time to read data from a file and maybe process it into an internal representation.

cat(1) almost certainly buffers data internally (32KB here), so context switches occur. The shell creates a pipe, which is buffered inside the kernel.

All of this muffles the measurements.

What was one or several read(2) calls plus processing is now one or several calls of possibly smaller sizes, plus whatever scheduling differences, plus, in one of the examples, OP used /usr/bin/time on the whole thing too.

This is of course not visible because the data were dumbed down by using time(), which has horrible granularity, but with a finer-grained timer it'd be visible, I'm sure.


you're


That's quite a lot to read into a tiny example like this.


I agree, though I think it could fairly be taken as one small bit of evidence in favor of "C++ has a lot of gotchas". In this case it looks like the culprit is C++'s C-compatibility-driven decision to sync with stdio by default, and therefore to avoid buffering input. Of course, if they made the opposite decision on defaults, "C++ doesn't sync with stdio by default" would be a different, probably also common, variety of "gotcha".


Why isn't cin implemented on top of C's stdin and FILE? That way you get both buffering and compatibility.


I've not implemented the C++ std library, but my guess is it's because iostreams need to implement their own buffering anyway, so it would just add complexity and unpredictability to buffer atop an already-buffering library.


I'm not sure this quite makes sense. The buffering can already be disabled, clearly, since that's what's being discussed. The non-buffering implementation could be easily placed atop FILE (I don't know the details, but I can't imagine a FILE-based iostream implementation being at all complex) at which point you have a buffered implementation that also cooperates with pure C stdio. iostream would need buffering for other operations, but could just leave it off permanently for stdio, and the switch already exists.



