Let's make TCP faster (googlecode.blogspot.com)
287 points by flardinois on Jan 23, 2012 | 36 comments


Hmmmm....the article prelude, and points 1 and 3, and the rationale document linked for point 2, all seem to be about optimising TCP for HTTP/the Web.

The thing is, a heck of a lot more runs over the Internet/TCP than just HTTP/the web. It can also be argued that many of the "end-user" perceived problems they are trying to fix (e.g. total HTTP request-response round-trip latency) are actually problems with HTTP rather than TCP - notably the fact that, for "small" web requests, all HTTP effectively does is re-implement a datagram protocol (albeit with larger packets than UDP) on top of TCP, with all the consequent overhead of setting up and tearing down a TCP connection.

It's an interesting set of fixes. But are they the right fixes, at the right level? Would moving to SPDY instead of HTTP fix the problems better, at a more appropriate level? With less chance of impacting all the other protocols that run (and are yet to run) over TCP?


Agreed. 1, 2 and 4 seem uncontroversial to non-TCP experts like me. But I do think we'd need a lot more reassurance from old-hand TCP crafters as to whether 3 (TCP Fast Open) makes much sense, let alone is worth the huge deployment effort.

Changing the foundation of the Internet purely for the sake of a higher-level protocol in the stack (albeit an important one) seems dangerous, if for no other reason than that it sets a precedent for future changes. Changes at each layer should always be as stack-agnostic as possible. This is by design.


Furthermore, 1, 2 and 3 seem to make an impact only when establishing the initial connection, which is not that much time - even a 40% latency decrease would be less than half a second's gain in most situations. And if I understood it correctly, the article states in point 3 that only 33% of HTTP traffic is preceded by a new connection establishment.
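To put rough numbers on that (assumed for illustration, not from the article): with a ~100 ms RTT, a small HTTP fetch over a fresh TCP connection pays one round trip for the handshake and one for the request/response, so eliminating the handshake round trip saves on the order of 100 ms:

```python
# Back-of-the-envelope numbers (assumed, not from the article):
RTT_MS = 100          # assumed round-trip time to the server
HANDSHAKE_RTTS = 1    # SYN / SYN-ACK before any data can flow
REQUEST_RTTS = 1      # request out, response back

total_ms = (HANDSHAKE_RTTS + REQUEST_RTTS) * RTT_MS
saved_ms = HANDSHAKE_RTTS * RTT_MS  # what removing the handshake RTT buys

print(total_ms, saved_ms)  # 200 100
```

So the win per connection is real but bounded by one RTT, which is consistent with "less than half a second in most situations."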


I'm guessing googlebot creates a lot of initial connections and if this was implemented widely it would speed up crawling.


I found this part to be the really great news:

All our work on TCP is open-source and publicly available. We disseminate our innovations through the Linux kernel, IETF standards proposals, and research publications.


OK, dumb question which I'm too lazy to look up for myself: what is TCP Fast Open, and how is it different from T/TCP? My vague memory is that the latter was dropped because allocating port numbers without requiring an explicit round trip simply could not be made robust against DDoS attacks. What tricks is TFO using that T/TCP didn't?

(edit: Not so lazy after all I guess. The draft RFC here: http://tools.ietf.org/html/draft-cheng-tcpm-fastopen-00 and after a very quick perusal I don't see an attempt to solve the DOS problem either. It seems like it just requires apps to handle the transactions really fast and then close the connection?)


The "Fast Open Cookies" are designed to protect against DDoS attacks. They are acquired by the client on the first TFO connection; subsequent SYN packets reuse the cookie, and the number of outstanding TFO cookies is limited. The paper on TCP Fast Open explains this in more detail than the RFC (http://research.google.com/pubs/pub37517.html)
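For the curious, here's roughly what using TFO looks like from a client on Linux (a sketch, not from the paper; it requires a TFO-capable kernel with the net.ipv4.tcp_fastopen sysctl enabled, and `tfo_send` is just an illustrative name):

```python
import socket

def tfo_send(host, port, payload):
    """Illustrative sketch: send `payload` in the SYN via TCP Fast Open.

    On the first connection the kernel falls back to a normal three-way
    handshake and caches the server's TFO cookie; later connections to
    the same server can carry the data in the SYN itself.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # MSG_FASTOPEN (Linux-only) makes sendto() act as connect()+send().
    s.sendto(payload, socket.MSG_FASTOPEN, (host, port))
    return s
```

Server side, the application opts in with setsockopt(IPPROTO_TCP, TCP_FASTOPEN, qlen) on the listening socket; the cookie generation and validation described above happens in the kernel, not in the application.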


Lots of things about TCP and its relatives are stale and don't work well on a modern network. That paper covers connection establishment; other issues include network address establishment, device discovery, and LAN broadcast.

In my last job creating mobile wireless drivers, we had a problem with wireless roaming. TCP/DHCP are set up assuming IP address establishment is a very infrequent operation. Typically it could take several seconds, which is fine if it only happens at boot or when a human trips over a cable and plugs it back in.

But wireless devices 'plug back in' each time they roam to a new AP. In an industrial environment (warehouse, 60 APs installed over several acres, forklift driving 20MPH) you may need to roam every second or so.

It's time to examine every aspect of TCP for large (huge) installations, very frequent device discovery (power-save in handheld devices), rapidly changing network topologies, and so on.


Apparently Mac OS has a somewhat non-standard way to join networks with DHCP address assignment very fast. Otherwise, I agree.


I don't understand why LAN DHCP address assignment is so slow. RTT to the DHCP server is almost always in the single-digit millisecond range, so why does DHCP often take multiple seconds? Can someone explain this?


It doesn't have to be slow. It can be (especially in larger networks) due to spanning tree on the switches. Check out: http://serverfault.com/questions/102346/dhcp-server-slow-to-...


Because the RFC calls for collecting DHCP responses over several seconds, then choosing the 'best' response.

If you select the 'best' as the fastest, then you can simply take the 1st response and run with it. Then it takes only milliseconds as you observed. That's what we did.


HN discussion of OS X dhcp fast resolution:

http://news.ycombinator.com/item?id=2755461

While it works, and works fast, many people raised concerns about the implications.

A broad summary of the trade-offs are here: http://news.ycombinator.com/item?id=2758576


It appears that Windows 8 has copied those optimizations as well. http://blogs.msdn.com/b/b8/archive/2012/01/20/engineering-wi...


I hope that this really does help everyone. SPDY has been in Chrome and on Google Maps and such for a long time, but not elsewhere: it's disabled in Firefox, unavailable in Safari and the like, and not implemented much beyond that - node-spdy is getting awesome but has taken a while to get there. Working for a place that could really benefit from something like SPDY, it seemed a bummer that only a duo of competitors' products would work with an open protocol, for lack of documentation, interest, or what-have-you.


SPDY seems poised for widespread adoption. It's only a matter of time before Firefox enables it, and the combined share of Chrome and Firefox is now over 50%. That should spur server adoption, and once it starts affecting benchmark scores the other browsers will be scrambling to implement it.


This post is talking about changing TCP, not SPDY. The changes mentioned here would probably benefit SPDY as well, but to a lesser degree than HTTP, since SPDY was designed to handle multiple concurrent HTTP requests on a single TCP session.


Of course I don't know much about this, but I find the first call to action a bit surprising:

1. Increase TCP initial congestion window to 10 (IW10).

It seems to contradict the general observation that too much buffering harms latency and may actually aggravate congestion: http://queue.acm.org/detail.cfm?id=2071893


Bufferbloat is caused by buffering hundreds of packets, not 10. Bandwidth-delay product has increased so much that the initial congestion window also needs to increase so that TCP can ramp up in a reasonable time.
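As a rough illustration (an idealized model, not from the article): under classic slow start the congestion window doubles every RTT, so a larger initial window shaves whole round trips off short transfers:

```python
def rtts_to_send(segments, initial_window):
    """Round trips needed to deliver `segments` under an idealized
    slow start: the congestion window doubles every RTT, with no loss
    and delayed ACKs ignored (a deliberate simplification)."""
    cwnd, sent, rtts = initial_window, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# A ~60 KB response is about 40 segments at a 1460-byte MSS.
print(rtts_to_send(40, 3), rtts_to_send(40, 10))  # 4 3
```

One RTT saved on every short flow adds up, while 10 segments is still far below the hundreds of queued packets associated with bufferbloat.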


Their own paper shows that it's still a controversial issue and after a certain point decreases performance.


This paper seems unreservedly in favor of larger windows. http://research.google.com/pubs/pub36640.html

Based on our large scale experiments, we are pursuing efforts in the IETF to standardize TCP’s initial congestion window to at least ten segments. Preliminary experiments with even higher initial windows show indications of benefiting latency further while keeping any costs to a modest level. Future work should focus on eliminating the initial congestion window as a manifest constant to scale to even larger network speeds and Web page sizes.


I could be reading it wrong, but I think I see some issues:

https://docs.google.com/gview?url=http://www.cs.helsinki.fi/...

   IW10, while improving elapsed times, imposes higher queuing delay than IW3
   However, if self-congesting, IW3 is more aggressive in terms of queuing delay
   AQM (RED) failed to control the increase in the queuing delay


Two years ago we were discussing a few of the direct advantages of this in a comment here: http://news.ycombinator.com/item?id=1143317 - including tcp_slow_start_after_idle, which also interacts with the initial cwnd.

Also, it's much easier these days to get the benefit of a larger initial cwnd. Back then you needed to recompile the kernel with source tweaks; now you can just use a backport, or, depending on your distro version, you may already have the benefit, as kernel 2.6.39 includes the change... http://kernelnewbies.org/Linux_2_6_39
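On such a kernel it's a per-route attribute you can set with iproute2, no recompile needed. A sketch (the gateway and device names are placeholders; requires root):

```shell
# Show the current default route, then raise its initial congestion
# window to 10 segments (placeholder gateway/device for illustration).
ip route show default
ip route change default via 192.0.2.1 dev eth0 initcwnd 10
```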


If you need it, IW10 can now be enabled on Windows Server 2008 R2 as well - see http://www.andysnotebook.com/2011/11/increasing-the-tcp-init...

Jim Gettys' draft "IW10 Considered Harmful" is worth a read too - http://tools.ietf.org/html/draft-gettys-iw10-considered-harm...


TCP fast open (TFO) effectively fires data in the blind in the establishment phase and then handles the timeout gracefully. That sounds like vanilla UDP (or your favorite best-effort protocol) to me.


Except that it is handled by the kernel, rather than by the program itself.


Do you mean that being ring-0 makes TCP faster, or insulates existing code?


I think grandparent means that the kernel's TCP implementation handles subsequent retransmits, etc, whereas with UDP that's all up to the application. Maybe a TFO SYN is somewhat equivalent to a single UDP packet, but every packet after that gets to take advantage of TCP's reliability, which is obviously not handled by UDP.


I guess my point is: doesn't demoting the front end of TCP constitute an admission that it should have been carried over UDP in the first place?

TFO basically says the handshake is really just UDP, and the TCP connection doesn't really exist except as a byproduct of an ongoing UDP-based exchange. The 3-way handshake is just the first 3 messages in that chain, and the TCP channel doesn't exist until that many have occurred, but the unreliable "phantom" UDP channel doesn't go away once reliability is established. The head/outstanding link in the chain is always unreliable.

I think TCP is a strange mental error: nobody ever needed to make TCP a real transport protocol next to ICMP and UDP, etc. It didn't need an IP protocol number of its own. TCP is just the idea of "reliability" and can exist entirely in software (and for that reason should, since it's one less thing to maintain in the kernel). UDP is enough. (And ICMP, for example, addresses a different problem: out-of-band network feedback.)

Existing code would work the same. I could still ask for a "TCP" connection, and start sending with the real data carried by UDP and benefit from 1 round trip if I don't need to send more.

TFO does that too -- allows some of the unreliability to creep in in the hope that the system is reliable enough that it's worth it -- but it also adds complexity to the existing name "TCP", and I'm not convinced that's good or worth it. TFO solves the right problem in the wrong place IMHO.


I don't see how it adds complexity. Instead of sending just 3 packets we now send 10. Now we can fit more data into the initial startup window.

There is no additional complexity. This is baked into the kernel.

TCP being in userland in software would be absolutely terrible. There would be many different implementations, it wouldn't be standardised, and the fact of the matter is that I, as an application developer, don't want to have to create TCP on top of UDP. I want to be able to say "connect here," establish a connection, and be sure my data makes it.


Sorry if I wasn't clear. I'm not talking about the window size, just TFO.

I agree, you as an application developer shouldn't have to recreate TCP. The code already exists, I'm just suggesting that it shouldn't live in the kernel/OS. There's no difference to users or developers at the application layer. (I think evolutionary pressure is a good thing, but there's no reason not to preserve the interfaces for compatibility.)

<?ego_rant("on")?> That said, since we build towers up -- and TCP has already been working for a long time -- it may be against the grain to redirect growth towards the perimeter. It feels retrograde and less snazzy. But if we don't take advantage of the land below us too, the building topples/the goal suffers. Examples would include redundant encapsulation of frames, unnecessary round trips, etc. Start imagining tunneling TCP over TCP (if you've ever forwarded X11 connections over SSH over a 56k modem, you probably know what that would be like). It begins to feel like we're base64-encoding everything.

I think there's an even more important example to think about though.

People jump through major hoops to make their webservers incredibly fast and able to handle hundreds of thousands of connections per second. Worker thread pools, I/O completion ports, you name it. Unfortunately, webservers are serving up TCP connections, and TCP needs state to be reliable (otherwise it's just UDP). And since TCP is being used to transfer HTTP, which is supposed to be stateless, these goals work against each other.

Imagine how fast a webserver might be if it didn't have to hold onto connection data at all... TFO alone doesn't get you there, it just gets you back to 1 round trip. <?ego_rant("off")?>

I am saying that we wouldn't need to "invent" TFO at this late date if we had started from there (no time like the present). TFO is like digging up though. :)


I can't seem to find kernel patches for #2 or #3. Anyone else have better luck?

Also, I would like to see more emphasis given to research on mobile networks, which is my area of interest. Performance for large stable networks is not the same as for choppy 3G-ish mobile networks.


Ninja commenting here...

After more digging, I was able to find this. If you scroll all the way to the end, there is some verbiage about just setting TCP_RTO_MIN to 1. However, the author claims this causes issues with delayed ACK unless another (missing) patch is applied.

https://github.com/vrv/linux-microsecondrto http://www.pdl.cmu.edu/PDL-FTP/Storage/sigcomm147-vasudevan....


Will this affect other uses of TCP besides HTTP, like IRC or SSH?


Most of these changes seem focused on improving small, short-lived connections. IRC and SSH connections are mostly long-lived, and I don't think it will have a noticeable impact on them. For large bulk data transfers (like scp or ftp), Proportional Rate Reduction for TCP (PRR) should help.


Why do we small business owners care about optimizing TCP?

Why does Google? Because web search is behind billions of dollars of revenue. Micro-optimizations matter to them.




