Subject: Re: Debugging SCP failure: "Timed out waiting for SCP response"

Re: Debugging SCP failure: "Timed out waiting for SCP response"

From: Tim Martin <timm_at_africonnect.co.zm>
Date: Mon, 11 Oct 2010 16:01:09 +0200 (CAT)

----- "Daniel Stenberg" <daniel_at_haxx.se> wrote:

> On Mon, 11 Oct 2010, Tim Martin wrote:
>
> >> Try changing the error codition to (rc < 0 and add a
> >>
> >> else if(rc == 0)
> >> continue;
>
> > If I do that, I get what looks like an infinite loop (I gave up
> after 60
> > seconds).
>
> Hm, it shouldn't get into an infinite loop. _libssh2_channel_read()
> should
> return EAGAIN if called again without there being any data to return.
>
> Oh, it deliberately returns 0 when there's an EOF... see below:
>
> > All I've been able to figure out so far is that within
> _libssh2_channel_read
> > it gets a NULL pointer from the call to _libssh2_list_first on line
> ~1708.
> > Changing the condition as described above just causes it to hit that
> line
> > again and again, getting NULL each time. I'm assuming it's to do
> with the
> > call to _libssh2_transport_read() above not getting any data.
>
> Right, that code reads data off the socket, puts packets in queues for
> each
> channel and the _libssh2_list_first() then returns the first packet
> for your
> channel and when that is NULL there's nothing there to take care of.

Just to summarise what I've found, the chain of return codes is:

    _libssh2_recv() returns EAGAIN
  _libssh2_transport_read() returns LIBSSH2_ERROR_EAGAIN
_libssh2_channel_read() returns 0

> In your log you posted there's an interesting EOF just before the
> -28:
>
> [libssh2] 1.001004 Transport: Packet type 96 received, length=5
> [libssh2] 1.001055 Conn: EOF received for channel 0/0
> [libssh2] 1.001111 Failure Event: -28 - Timed out waiting for SCP
> response
> [libssh2] 1.001164 Conn: Freeing channel 0/0 resources
>
> ... something makes the remote end close the channel!

FWIW, the backtrace to the point where it logs "EOF received" is

#0 _libssh2_packet_add (session=0x804b008, data=0x8054288 "`", datalen=5, macstate=0) at packet.c:720
#1 0x0014f3ae in fullpacket (session=0x804b008, encrypted=1) at transport.c:253
#2 0x0014fc2c in _libssh2_transport_read (session=0x804b008) at transport.c:587
#3 0x00134fda in _libssh2_channel_read (channel=0x8052fd8, stream_id=0, buf=0x804f664 "", buflen=1) at channel.c:1693
#4 0x00141df1 in scp_recv (session=0x804b008, path=0x8048e18 "firewall.rsc", sb=0xbffff38c) at scp.c:385
#5 0x00142b03 in libssh2_scp_recv (session=0x804b008, path=0x8048e18 "firewall.rsc", sb=0xbffff38c) at scp.c:777
#6 0x08048b3b in main (argc=2, argv=0xbffff4c4) at test.c:74

Are we likely to be able to get much further with this without being able to peer inside the server code? The RouterOS guys are not unhelpful, but it's closed source and since this bug has been fixed in the (yet to be released) new version, I can't imagine I'd get a lot of support from them.

If it seems to be a server error with no simple workaround I'll probably have to give up on it. At least I know where I stand.

Tim
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
Received on 2010-10-11