Subject: RE: Packet drop on full socket problem

From: Thomas Rauscher <trauscher_at_loytec.com>
Date: Thu, 7 Oct 2010 11:34:30 +0200

I just had to recall some details. The core problem is not the SSH window,
but the socket buffer. If the socket is non-blocking and send() returns
with a short write, then the problem is triggered.
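
To make the short-write case concrete, here is a minimal sketch. It is not
libssh2 code: fake_socket, fake_send() and send_some() are hypothetical names,
and the 16-byte buffer stands in for a real socket send buffer. It only shows
that a non-blocking send can accept fewer bytes than requested, and that the
caller then has to keep the remainder.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a non-blocking socket: a fixed-size send
 * buffer that accepts only as much as still fits. */
struct fake_socket {
    char buf[16];
    size_t used;
};

/* Mimics a non-blocking send(): returns the number of bytes accepted,
 * or -1 with errno = EAGAIN when the buffer is already full. */
static long fake_send(struct fake_socket *s, const char *data, size_t len)
{
    size_t room;
    assert(s->used <= sizeof(s->buf));
    room = sizeof(s->buf) - s->used;
    if (room == 0) {
        errno = EAGAIN;
        return -1;
    }
    if (len > room)
        len = room;               /* short write */
    memcpy(s->buf + s->used, data, len);
    s->used += len;
    return (long)len;
}

/* Sends as much as possible. Returns the number of bytes accepted and
 * stores in *left how many bytes the caller must keep and retry later. */
static size_t send_some(struct fake_socket *s, const char *data, size_t len,
                        size_t *left)
{
    size_t sent = 0;
    while (sent < len) {
        long rc = fake_send(s, data + sent, len - sent);
        if (rc < 0)
            break;                /* EAGAIN: the buffer is full */
        sent += (size_t)rc;
    }
    *left = len - sent;
    return sent;
}
```

Whatever is counted in *left must survive until the next attempt. If it is
dropped, the peer never sees those bytes.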

>
> The function will/should then make sure that it doesn't try to send any more
> data than the remote has a window for. In this case, it should further
> decrease the amount of data this function will attempt to send.
>
> > * In _libssh2_transport_write()
> >
> > _libssh2_send returns -1 (EAGAIN) and the current packet is saved to
> > p->odata, p->olen ...
>
> You mean that it returns EAGAIN immediately or after having sent the first 12K
> of data? I assume you mean that it first sends some data and then when it
> loops it gets EAGAIN back.
>

The problem here seems to be not the window size, but the socket buffer. The socket
send() returned less than 12k, so the code decided to save the entire packet into
the p->odata buffer.

> > * _libssh2_transport_write() returns LIBSSH2_ERROR_EAGAIN to
> > _libssh2_channel_write() which executes
> >
> > if(wrote) {
> > _libssh2_transport_drain(session);
> > goto _channel_write_done;
> > }
>
> ... as it would only execute that if 'wrote' actually wasn't zero.

Yes, but wrote is not zero, as this happens in a later iteration of the
while(buflen>0) loop.

>
> > _libssh2_transport_drain() frees p->outbuf and sets it to NULL.
> >
> > * _libssh2_transport_write then returns "wrote" (12k) to the application.
>
> Right, as it did in fact successfully send away 12K.

I think that this is the core problem. The code has sent away 12k, but failed
to send the rest. It saves the unsent packet to odata, and the upper-level function
_libssh2_channel_write throws it away again.
This means that this packet is missing from the SSH stream.
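
The sequence described above can be reproduced in a toy model. This is a
sketch under assumptions, not the libssh2 source: the toy_* names, the 4-byte
packet size and the plain byte copy (no encryption, no SSH framing) are all
made up for illustration. It only models the three steps: a short write saves
the packet, the channel loop sees a non-zero "wrote" and drains, and the
caller is told that only the earlier data went out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PKT 4             /* toy packet size in bytes (assumption) */
#define WOULD_BLOCK (-1)  /* stands in for LIBSSH2_ERROR_EAGAIN */

struct toy_transport {
    char wire[64];        /* bytes the peer actually receives */
    size_t wire_len;
    size_t room;          /* free space in the socket buffer */
    char pending[PKT];    /* packet saved after a short write */
    size_t pending_len;   /* 0 when nothing is saved */
    size_t pending_sent;  /* how much of it already went out */
};

/* Writes one packet. On a short write the packet is saved together with
 * the number of bytes already sent, and WOULD_BLOCK is returned. */
static int toy_transport_write(struct toy_transport *t, const char *pkt)
{
    size_t n = t->room < PKT ? t->room : PKT;
    assert(t->wire_len + n <= sizeof(t->wire));
    memcpy(t->wire + t->wire_len, pkt, n);
    t->wire_len += n;
    t->room -= n;
    if (n < PKT) {
        memcpy(t->pending, pkt, PKT);
        t->pending_len = PKT;
        t->pending_sent = n;
        return WOULD_BLOCK;
    }
    return 0;
}

/* Models the drain step: the saved packet is simply discarded. */
static void toy_drain(struct toy_transport *t)
{
    t->pending_len = 0;
    t->pending_sent = 0;
}

/* Models the channel write loop: packets are written until one would
 * block. If earlier packets went out, the saved one is drained and only
 * the earlier byte count is reported to the caller. */
static long toy_channel_write(struct toy_transport *t, const char *buf,
                              size_t buflen)
{
    size_t wrote = 0;
    while (buflen >= PKT) {
        if (toy_transport_write(t, buf + wrote) == WOULD_BLOCK) {
            if (wrote) {
                toy_drain(t);
                return (long)wrote;
            }
            return WOULD_BLOCK;
        }
        wrote += PKT;
        buflen -= PKT;
    }
    return (long)wrote;
}
```

With room for 6 bytes and the data "AAAAbcdeCCCC", the first call returns 4
and leaves "AAAAbc" on the wire. When the caller retries from offset 4 after
the socket has room again, the wire ends up as "AAAAbcbcdeCCCC": the tail
"de" of the interrupted packet never followed its first half.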

>
> > 2) Application calls _libssh2_channel_write(..., 128*1024) again.
>
> Right, but that buffer should now be pointing 12K further into the data as 12K
> was in fact sent in the previous invoke.
>
> > _libssh2_transport_write() now calls send_existing() first which
> > immediately returns because p->outbuf is NULL.
> >
> > if (!p->outbuf) {
> > *ret = 0;
> > return LIBSSH2_ERROR_NONE;
> > }
>
> Right, there's nothing saved there. What do you think it
> should have saved there?

The buffer which did not fit into the socket buffer. The former call to
_libssh2_transport_write has stored the "not-sent" packet there.
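
For comparison, this is roughly what I would expect from a writer that keeps
the "not-sent" data. It is a generic buffered-writer sketch and an assumption
on my side, not libssh2's actual send_existing() (which has different return
conventions); toy_writer, toy_send(), toy_send_existing() and
toy_write_packet() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_writer {
    char wire[64];     /* bytes the peer actually receives */
    size_t wire_len;
    size_t room;       /* free space in the socket buffer */
    char saved[16];    /* bytes that did not fit on an earlier write */
    size_t saved_len;
};

/* Copies up to 'len' bytes to the wire, limited by 'room'. */
static size_t toy_send(struct toy_writer *w, const char *data, size_t len)
{
    size_t n = len < w->room ? len : w->room;
    assert(w->wire_len + n <= sizeof(w->wire));
    memcpy(w->wire + w->wire_len, data, n);
    w->wire_len += n;
    w->room -= n;
    return n;
}

/* Flushes saved bytes first. Returns 1 when nothing is left over. */
static int toy_send_existing(struct toy_writer *w)
{
    size_t n = toy_send(w, w->saved, w->saved_len);
    memmove(w->saved, w->saved + n, w->saved_len - n);
    w->saved_len -= n;
    return w->saved_len == 0;
}

/* Writes one packet. Returns 0 if the packet was accepted (sent, or
 * partly sent with the tail saved), -1 if older data is still waiting
 * and the caller must retry the same packet later. */
static int toy_write_packet(struct toy_writer *w, const char *pkt, size_t len)
{
    size_t n;
    assert(len <= sizeof(w->saved));
    if (!toy_send_existing(w))
        return -1;
    n = toy_send(w, pkt, len);
    memcpy(w->saved, pkt + n, len - n);   /* keep the unsent tail */
    w->saved_len = len - n;
    return 0;
}
```

The point is only that the saved bytes stay in place until they have been
sent, and that they go out before any new packet does, so the byte order on
the wire is preserved.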

> ... as you can see I didn't follow how it ended up like this!
> I'll get myself a dropbear install and see if I can repeat this. Is uploading
> data with a 128K buffer enough to trigger it? Like with the
> sftp_write_nonblock.c example?

I'm not sure if it is that easy. It only occurred on XP PCs here,
but not on Windows 7. Also, the target file was a Unix pipe on an embedded device,
which could delay the SSH stream for some time. This is probably why the
Windows socket buffer filled up.

Thomas.
Received on 2010-10-07