An interesting feature of the way the WebRTC data channel (DC) works is that it completely hides the underlying network from the application. There is no way for a developer to know whether the underlying network is ready for reading or writing, or is busy doing something else. When the application sends a message over the DC, the message is not transmitted over the network immediately. It is put in an output buffer. Then, when the network is ready, the DC automatically takes the message from the buffer and sends it. In the other direction, when a message is received from the network, it is put in an input buffer and the application is notified of the new incoming message.
If a lot of messages are sent together, they cannot all be transmitted at once. They are put in the output buffer. But the buffer cannot grow infinitely; it usually has some predetermined limit. If the limit is exceeded, then in Chrome >= 37 the data channel is closed, all messages are discarded, and an exception is thrown. In Chrome <= 36, however, only an exception is thrown. So this is an API-breaking change that was made in Chrome 37. Developers should be careful with this change if they are dealing with old browsers.
Controlling the bufferedAmount
An application should not send a large volume of messages very quickly or the buffer will overflow - and overflowing of buffers is always bad news. But then how can one know when a message could be sent and when there would be a buffer overflow? Well, an application has to monitor (a.k.a. poll at some interval) the bufferedAmount property of the DC and try to keep it in some optimal range. This is easier said than done. Polling a value is always an ugly piece of code. It is a violation of the event-driven development principles and developers simply do not like it.
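The polling approach described above can be sketched roughly as follows. This is a minimal TypeScript sketch, not a definitive implementation: the `BufferedChannel` interface, the `sendThrottled` function, and the threshold and interval values are all assumptions made for illustration. It also assumes that `bufferedAmount` is measured in bytes, as in Chrome >= 37.

```typescript
// Minimal shape of the channel that this sketch relies on. A real
// RTCDataChannel has these members; a fake object can be used for testing.
interface BufferedChannel {
  readonly bufferedAmount: number;
  send(data: string): void;
}

// Hypothetical tuning values, not taken from any specification.
const HIGH_WATER_MARK = 1024 * 1024; // pause sending above 1 MiB of buffered data
const POLL_INTERVAL_MS = 50;         // how often to re-check bufferedAmount

function sendThrottled(
  channel: BufferedChannel,
  messages: string[],
  highWaterMark: number = HIGH_WATER_MARK,
  pollIntervalMs: number = POLL_INTERVAL_MS,
): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    let next = 0;
    const pump = (): void => {
      try {
        // Keep sending while the output buffer is below the threshold.
        while (next < messages.length && channel.bufferedAmount < highWaterMark) {
          channel.send(messages[next]);
          next++;
        }
      } catch (err) {
        // send() threw, for example because the channel was closed.
        reject(err);
        return;
      }
      if (next >= messages.length) {
        resolve();
      } else {
        // The buffer is above the threshold: poll again after a short delay.
        setTimeout(pump, pollIntervalMs);
      }
    };
    pump();
  });
}
```

Because the check happens before each send, the buffer can exceed the threshold by up to one message, so the threshold should sit well below the browser's hard limit.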
But what does bufferedAmount actually represent, and what is the optimal range that it should be kept within? In Chrome <= 36, bufferedAmount represented the number of buffered messages, and the limit was around 50 messages. In Chrome >= 37, the meaning of bufferedAmount was changed to represent the total size of all messages in the output buffer, and the limit was set to 16 MB. Notice that the meaning of bufferedAmount changed but its name and type (namely, integer) remained the same. This led to hard-to-find bugs. That is, if you skipped reading the Chrome release notes.
There is yet another problem with bufferedAmount: it was broken in Windows versions of Chrome prior to v37, where it was always 0. So what developers did back then was to send messages in a tight loop until the send method threw an exception. This was used as an indicator that the buffer was full, and the app then slept for a while before it tried to send data again. In Chrome 37, this bug was fixed, but the behavior when the buffer overflowed was also changed. Not only was an exception thrown, but the DC was also closed and all unsent messages were discarded. This, too, led to hard-to-find bugs and long debugging sessions.
In view of the multitude of changes that were made in Chrome 37, I think it may be a good idea for a new application to support only Chrome 37 or later. Support for older versions can be added to the app later if required. But one should keep in mind that supporting all the different browser combinations and compensating for all browser bugs is tricky.
Verifying the transfer
OK. Now that we have split our big blob of data into smaller messages of optimal size and sent them in an efficient manner that does not cause the output buffer to overflow, we have come to the point where we want to know whether the messages we sent have been delivered to and received by the other side. Well, the DC does not give us a way of knowing this. There is no way of knowing when, or even whether, a message has actually been sent over the network, or whether it has been received by the other side. If such functionality is required, then it should be implemented in the application layer by the use of acknowledgment messages.
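As an illustration of application-layer acknowledgments, here is a minimal TypeScript sketch. The JSON envelope format, the `AckingSender` class, and the `receiveAndAck` function are all hypothetical; the data channel defines no such protocol. The `sendRaw` callbacks stand in for the `send` method of a data channel on each side.

```typescript
// Hypothetical envelope formats used only by this sketch.
type DataEnvelope = { kind: "data"; id: number; payload: string };
type AckEnvelope = { kind: "ack"; id: number };
type Envelope = DataEnvelope | AckEnvelope;

// Sender side: numbers every message and remembers it until it is acknowledged.
class AckingSender {
  private nextId = 1;
  private readonly pending = new Map<number, string>();
  private readonly sendRaw: (text: string) => void;

  constructor(sendRaw: (text: string) => void) {
    this.sendRaw = sendRaw;
  }

  send(payload: string): number {
    const id = this.nextId++;
    this.pending.set(id, payload);
    const envelope: DataEnvelope = { kind: "data", id, payload };
    this.sendRaw(JSON.stringify(envelope));
    return id;
  }

  // Feed every message that arrives on the sender's side through this method.
  handleIncoming(text: string): void {
    const envelope = JSON.parse(text) as Envelope;
    if (envelope.kind === "ack") {
      this.pending.delete(envelope.id);
    }
  }

  // Ids of messages that were sent but not yet acknowledged.
  unacknowledged(): number[] {
    return Array.from(this.pending.keys());
  }
}

// Receiver side: hands the payload to the app and replies with an ack.
function receiveAndAck(
  text: string,
  sendRaw: (text: string) => void,
  onPayload: (payload: string) => void,
): void {
  const envelope = JSON.parse(text) as Envelope;
  if (envelope.kind !== "data") return;
  onPayload(envelope.payload);
  const ack: AckEnvelope = { kind: "ack", id: envelope.id };
  sendRaw(JSON.stringify(ack));
}
```

In a real application, `handleIncoming` and `receiveAndAck` would be called from each side's message handler. The sketch does not handle lost acknowledgments or retries.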
Keep in mind
But what if the messages were not delivered due to some unknown network error? Then the DC will retransmit them again and again until it finally comes to the conclusion that it is hopeless and notifies the application of an error. However, this error detection takes time, so the moment when the app is notified and the moment when the error actually occurred are usually quite distant in time. Let’s take an example with an application that has to send 3 messages. It sends the first: No error. Then the second: No error again. Then after the third, it immediately gets an error. Were the first two messages delivered? When did the error occur? During the transmission of the first, second or third message? There is no way of knowing. The DC simply does not provide such information. If it is required, then it should be implemented in the application layer.
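To make the three-message example concrete, here is a small TypeScript sketch of what an application could work out for itself when the error arrives. It assumes the application numbers its own messages and records which ones the other side has acknowledged. The `DeliveryReport` type and the `reportOnError` function are hypothetical names, not part of the data channel API.

```typescript
interface DeliveryReport {
  confirmed: number[]; // acknowledged before the error was reported
  unknown: number[];   // sent but never acknowledged: may or may not have arrived
}

// Splits the ids of sent messages by whether an acknowledgment was seen.
function reportOnError(sentIds: number[], ackedIds: number[]): DeliveryReport {
  const acked = new Set<number>(ackedIds);
  return {
    confirmed: sentIds.filter((id) => acked.has(id)),
    unknown: sentIds.filter((id) => !acked.has(id)),
  };
}

// Three messages were sent; only message 1 was acknowledged before the error.
const report = reportOnError([1, 2, 3], [1]);
console.log(report.confirmed); // contains only 1
console.log(report.unknown);   // contains 2 and 3
```

The second group is labeled "unknown" rather than "lost" on purpose: a missing acknowledgment does not prove that the message itself failed to arrive.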
By the way, detecting which messages were delivered is no easy task. Sending acknowledgments is a good way to do it, but what if an acknowledgment message gets lost? Network protocols are not the subject of this article, so we will not go in that direction, but one should always be careful with networks and network protocols. In real-world networks, pretty much anything can happen, no matter how improbable or far-fetched it may sound.
This is the sixth article in a series on the topic of the WebRTC data channel. All posts are based on a talk our Chief Architect, Svetlin Mladenov, delivered during the WebRTC conference expo Paris 2014. The accompanying slides are available here and the entire lecture itself can be seen on our YouTube channel.
[Go to Post #1: WebRTC Data Channel: An Overview]