Let's make TCP faster (googlecode.blogspot.com)
287 points by flardinois on Jan 23, 2012 | 36 comments


Hmmmm....the article prelude, and points 1 and 3, and the rationale document linked for point 2, all seem to be about optimising TCP for HTTP/the Web.

The thing is, a heck of a lot more runs over the Internet/TCP than just HTTP/the web. It can also be argued that many of the "end-user" perceived problems they are trying to fix (e.g. total HTTP request-response round-trip latency) are actually problems with HTTP rather than TCP - notably the fact that, for "small" web requests, all HTTP effectively does is re-implement a datagram protocol (albeit with larger packets than UDP) on top of TCP, with all the consequent overhead of setting up and tearing down a TCP connection.

It's an interesting set of fixes. But are they the right fixes, at the right level? Would moving to SPDY instead of HTTP fix the problems better, at a more appropriate level? With less chance of impacting all the other protocols that run (and are yet to run) over TCP?


Agreed. 1, 2 and 4 seem uncontroversial to non-TCP experts like me. But I do think we'd need a lot more reassurance from old-hand TCP crafters as to whether 3 (TCP Fast Open) makes much sense, let alone is worth the huge deployment effort.

Changing the foundation of the Internet purely for the sake of a higher-level protocol in the stack (albeit an important one) seems dangerous, if for no other reason than that it sets a precedent for future changes. Changes at each layer should always be as stack-agnostic as possible. This is by design.


Furthermore, 1, 2 and 3 seem to make an impact only when establishing the initial connection, which is not that much time - even a 40% latency decrease would be less than half a second's gain in most situations. And if I understood it correctly, the article states in point 3 that only 33% of HTTP traffic is preceded by a new connection establishment.
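To put rough numbers on that (assumed for illustration, not from the article): with a ~100 ms RTT, a small HTTP fetch over a fresh TCP connection pays one round trip for the handshake and one for the request/response, so eliminating the handshake round trip saves on the order of 100 ms:

```python
# Back-of-the-envelope numbers (assumed, not from the article):
RTT_MS = 100          # assumed round-trip time to the server
HANDSHAKE_RTTS = 1    # SYN / SYN-ACK before any data can flow
REQUEST_RTTS = 1      # request out, response back

total_ms = (HANDSHAKE_RTTS + REQUEST_RTTS) * RTT_MS
saved_ms = HANDSHAKE_RTTS * RTT_MS  # what removing the handshake RTT buys

print(total_ms, saved_ms)  # 200 100
```

So the win per connection is real but bounded by one RTT, which is consistent with "less than half a second in most situations."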


I'm guessing googlebot creates a lot of initial connections and if this was implemented widely it would speed up crawling.


I found this part to be the really great news:

All our work on TCP is open-source and publicly available. We disseminate our innovations through the Linux kernel, IETF standards proposals, and research publications.


OK, dumb question which I'm too lazy to look up for myself: what is TCP Fast Open, and how is it different from T/TCP? My vague memory is that the latter was dropped because allocating port numbers without requiring an explicit round trip simply could not be made robust against DDoS attacks. What tricks is TFO using that T/TCP didn't?

(edit: Not so lazy after all I guess. The draft RFC here: http://tools.ietf.org/html/draft-cheng-tcpm-fastopen-00 and after a very quick perusal I don't see an attempt to solve the DOS problem either. It seems like it just requires apps to handle the transactions really fast and then close the connection?)


The "Fast Open Cookies" are designed to protect against DDoS attacks. They are acquired by the client on the first TFO connection; subsequent SYN packets reuse the cookie, and the number of outstanding TFO cookies is limited. The paper on TCP Fast Open explains this in more detail than the RFC (http://research.google.com/pubs/pub37517.html)
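For the curious, here's roughly what using TFO looks like from a client on Linux (a sketch, not from the paper; it requires a TFO-capable kernel with the net.ipv4.tcp_fastopen sysctl enabled, and `tfo_send` is just an illustrative name):

```python
import socket

def tfo_send(host, port, payload):
    """Illustrative sketch: send `payload` in the SYN via TCP Fast Open.

    On the first connection the kernel falls back to a normal three-way
    handshake and caches the server's TFO cookie; later connections to
    the same server can carry the data in the SYN itself.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # MSG_FASTOPEN (Linux-only) makes sendto() act as connect()+send().
    s.sendto(payload, socket.MSG_FASTOPEN, (host, port))
    return s
```

Server side, the application opts in with setsockopt(IPPROTO_TCP, TCP_FASTOPEN, qlen) on the listening socket; the cookie generation and validation described above happens in the kernel, not in the application.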


Lots of things about TCP and its relatives are stale and don't work well on a modern network. That paper covers connection establishment; other issues include network address establishment, device discovery, and LAN broadcast.

In my last job creating mobile wireless drivers, we had a problem with wireless roaming. TCP/DHCP are set up assuming IP address establishment is a very infrequent operation. Typically it could take several seconds, which is fine if it only happens at boot or when a human trips over a cable and plugs it back in.

But wireless devices 'plug back in' each time they roam to a new AP. In an industrial environment (warehouse, 60 APs installed over several acres, forklift driving 20MPH) you may need to roam every second or so.

It's time to examine every aspect of TCP for large (huge) installations, very frequent device discovery (power-save in handheld devices), rapidly changing network topologies, and so on.


Apparently Mac OS has a somewhat non-standard way to join networks with DHCP address assignment very fast. Otherwise, I agree.


I don't understand why LAN DHCP address assignment is so slow. RTT to the DHCP server is almost always in the single-digit millisecond range, so why does DHCP often take multiple seconds? Can someone explain this?


It doesn't have to be slow. It can be (especially in larger networks) due to spanning tree on the switches. Check out: http://serverfault.com/questions/102346/dhcp-server-slow-to-...


Because the RFC calls for collecting DHCP responses over several seconds, then choosing the 'best' response.

If you select the 'best' as the fastest, then you can simply take the 1st response and run with it. Then it takes only milliseconds as you observed. That's what we did.


HN discussion of OS X dhcp fast resolution:

http://news.ycombinator.com/item?id=2755461

While it works, and works fast, many people raised concerns about the implications.

A broad summary of the trade-offs are here: http://news.ycombinator.com/item?id=2758576


It appears that Windows 8 has copied those optimizations as well. http://blogs.msdn.com/b/b8/archive/2012/01/20/engineering-wi...


I hope that this really does help everyone. SPDY has been in Chrome and on Google Maps and such for a long time, but not elsewhere: it's disabled in Firefox, unavailable in Safari and the like, and not implemented much beyond that - node-spdy is getting awesome but has taken a while to get there. Working for a place that could really benefit from something like SPDY, it seemed a bummer that only a duo of competitors' products would work with an open protocol, for lack of documentation, interest, or what-have-you.


SPDY seems poised for widespread adoption. It's only a matter of time before Firefox enables it, and the combined share of Chrome and Firefox is now over 50%. That should spur server adoption, and once it starts affecting benchmark scores the other browsers will be scrambling to implement it.


This post is talking about changing TCP, not SPDY. The changes mentioned here would probably benefit SPDY as well, but to a lesser degree than HTTP, since SPDY was designed to handle multiple concurrent HTTP requests on a single TCP session.


Of course I don't know much about this, but I find the first call to action a bit surprising:

1. Increase TCP initial congestion window to 10 (IW10).

It seems to contradict the general observation that too much buffering harms latency and may actually aggravate congestion: http://queue.acm.org/detail.cfm?id=2071893


Bufferbloat is caused by buffering hundreds of packets, not 10. Bandwidth-delay product has increased so much that the initial congestion window also needs to increase so that TCP can ramp up in a reasonable time.
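As a rough illustration (an idealized model, not from the article): under classic slow start the congestion window doubles every RTT, so a larger initial window shaves whole round trips off short transfers:

```python
def rtts_to_send(segments, initial_window):
    """Round trips needed to deliver `segments` under an idealized
    slow start: the congestion window doubles every RTT, with no loss
    and delayed ACKs ignored (a deliberate simplification)."""
    cwnd, sent, rtts = initial_window, 0, 0
    while sent < segments:
        sent += cwnd
        cwnd *= 2
        rtts += 1
    return rtts

# A ~60 KB response is about 40 segments at a 1460-byte MSS.
print(rtts_to_send(40, 3), rtts_to_send(40, 10))  # 4 3
```

One RTT saved on every short flow adds up, while 10 segments is still far below the hundreds of queued packets associated with bufferbloat.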


Their own paper shows that it's still a controversial issue and after a certain point decreases performance.


This paper seems unreservedly in favor of larger windows. http://research.google.com/pubs/pub36640.html

Based on our large scale experiments, we are pursuing efforts in the IETF to standardize TCP’s initial congestion window to at least ten segments. Preliminary experiments with even higher initial windows show indications of benefiting latency further while keeping any costs to a modest level. Future work should focus on eliminating the initial congestion window as a manifest constant to scale to even larger network speeds and Web page sizes.


I could be reading it wrong, but I think I see some issues:

https://docs.google.com/gview?url=http://www.cs.helsinki.fi/...

   IW10, while improving elapsed times, imposes higher queuing delay than IW3
   However, if self-congesting, IW3 is more aggressive in terms of queuing delay
   AQM (RED) failed to control the increase in the queuing delay


Two years ago we were discussing a few of the direct advantages of this in a comment here: http://news.ycombinator.com/item?id=1143317 - including tcp_slow_start_after_idle, which also interacts with the initial cwnd.

Also, it's much easier these days to get the benefit of a larger initial cwnd. Back then you needed to recompile the kernel with source tweaks; now you can just use a backport, or, depending on your distro version, you may already have the benefit, as kernel 2.6.39 includes the change... http://kernelnewbies.org/Linux_2_6_39
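On such a kernel it's a per-route attribute you can set with iproute2, no recompile needed. A sketch (the gateway and device names are placeholders; requires root):

```shell
# Show the current default route, then raise its initial congestion
# window to 10 segments (placeholder gateway/device for illustration).
ip route show default
ip route change default via 192.0.2.1 dev eth0 initcwnd 10
```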


If you need it, IW10 can now be enabled on Windows Server 2008 R2 as well - see http://www.andysnotebook.com/2011/11/increasing-the-tcp-init...

Jim Gettys' draft "IW10 Considered Harmful" is worth a read too - http://tools.ietf.org/html/draft-gettys-iw10-considered-harm...


TCP fast open (TFO) effectively fires data in the blind in the establishment phase and then handles the timeout gracefully. That sounds like vanilla UDP (or your favorite best-effort protocol) to me.


Except that it is handled by the kernel, rather than by the program itself.


Do you mean that being ring-0 makes TCP faster, or insulates existing code?


I think grandparent means that the kernel's TCP implementation handles subsequent retransmits, etc, whereas with UDP that's all up to the application. Maybe a TFO SYN is somewhat equivalent to a single UDP packet, but every packet after that gets to take advantage of TCP's reliability, which is obviously not handled by UDP.


I guess my point is: doesn't demoting the front end of TCP constitute an admission that it should have been carried over UDP in the first place?

TFO basically says the handshake is really just UDP, and the TCP connection doesn't really exist except as a byproduct of an ongoing UDP-based exchange. The 3-way handshake is just the first 3 messages in that chain, and the TCP channel doesn't exist until that many have occurred, but the unreliable "phantom" UDP channel doesn't go away once reliability is established. The head/outstanding link in the chain is always unreliable.

I think TCP is a strange mental error: nobody ever needed to make TCP a real transport protocol next to ICMP and UDP, etc. It didn't need an IP protocol number of its own. TCP is just the idea of "reliability" and can exist entirely in software (and for that reason should, since it's one less thing to maintain in the kernel). UDP is enough. (And ICMP, for example, addresses a different problem: out-of-band network feedback.)

Existing code would work the same. I could still ask for a "TCP" connection, and start sending with the real data carried by UDP and benefit from 1 round trip if I don't need to send more.

TFO does that too -- allows some of the unreliability to creep in in the hope that the system is reliable enough that it's worth it -- but it also adds complexity to the existing name "TCP", and I'm not convinced that's good or worth it. TFO solves the right problem in the wrong place IMHO.


I don't see how it adds complexity. Instead of sending just 3 packets we now send 10. Now we can fit more data into the initial startup window.

There is no additional complexity. This is baked into the kernel.

TCP being in userland in software would be absolutely terrible. There would be many different implementations, it wouldn't be standardised, and the fact of the matter is that I, as an application developer, don't want to have to create TCP on top of UDP. I want to be able to say "connect here," establish a connection, and be sure my data makes it.


Sorry if I wasn't clear. I'm not talking about the window size, just TFO.

I agree, you as an application developer shouldn't have to recreate TCP. The code already exists, I'm just suggesting that it shouldn't live in the kernel/OS. There's no difference to users or developers at the application layer. (I think evolutionary pressure is a good thing, but there's no reason not to preserve the interfaces for compatibility.)

<?ego_rant("on")?> That said, since we build towers up -- and TCP has already been working for a long time -- it may be against the grain to redirect growth towards the perimeter. It feels retrograde and less snazzy. But if we don't take advantage of the land below us too, the building topples/the goal suffers. Examples would include redundant encapsulation of frames, unnecessary round trips, etc. Start imagining tunneling TCP over TCP (if you've ever forwarded X11 connections over SSH over a 56k modem, you probably know what that would be like). It begins to feel like we're base64-encoding everything.

I think there's an even more important example to think about though.

People jump through major hoops to make their webservers incredibly fast and able to handle hundreds of thousands of connections per second. Worker thread pools, I/O completion ports, you name it. Unfortunately, webservers are serving up TCP connections, and TCP needs state to be reliable (otherwise it's just UDP). And since TCP is being used to transfer HTTP, which is supposed to be stateless, these goals work against each other.

Imagine how fast a webserver might be if it didn't have to hold onto connection data at all... TFO alone doesn't get you there, it just gets you back to 1 round trip. <?ego_rant("off")?>

I am saying that we wouldn't need to "invent" TFO at this late date if we had started from there (no time like the present). TFO is like digging up though. :)


I can't seem to find kernel patches for #2 or #3. Anyone else have better luck?

Also, I would like to see more emphasis given to research on mobile networks, which is my area of interest. Performance for large stable networks is not the same as for choppy 3G-ish mobile networks.


Ninja commenting here...

After more digging, I was able to find this. If you scroll all the way to the end, there is some verbiage about just setting TCP_RTO_MIN to 1. However, the author claims this causes issues with delayed ACK unless another (missing) patch is applied.

https://github.com/vrv/linux-microsecondrto http://www.pdl.cmu.edu/PDL-FTP/Storage/sigcomm147-vasudevan....


Will this affect other uses of TCP besides HTTP, like IRC or SSH?


Most of these changes seem focused on improving small, short-lived connections. IRC and SSH connections are mostly long-lived, and I don't think it will have a noticeable impact on them. For large bulk data transfers (like scp or ftp), Proportional Rate Reduction for TCP (PRR) should help.


Why do we small business owners care about optimizing TCP?

Why does Google? Because web search is behind billions of dollars of revenue. Micro-optimizations matter to them.




