Subject: SFTP upload speed

From: Daniel Stenberg <daniel_at_haxx.se>
Date: Sun, 5 Dec 2010 00:12:04 +0100 (CET)

Hi friends,

Thanks to Mark Riordan for allowing me to use an account on his machine, I
have the following to report:

I've now done libssh2 SFTP uploads just as fast as openssh over my test
connection! Perhaps the most interesting point of all: I didn't change the
library code to make this happen!

Test connection: 140ms latency, 100mbit upload link, unknown maximum
bandwidth at the receiving end. No compression was used.

Test system: Linux 2.6.30

Test file: 5MB (5 * 1024 * 1024 bytes).

OpenSSH: takes between 1 and 3 seconds to upload

libssh2: takes 2 seconds to upload (for some reason it seems more consistent
than OpenSSH in my timings): 5242880 bytes in 2 seconds makes 2621440.0
bytes/sec

(Due to limited disk space on the target server I've only worked with this
small file. I guess I'll learn more once I go to really large files.)

The secret

It struck me almost immediately once I started to analyze why the stock
sftp_write_nonblock.c program doesn't run fast.

It is sort of an API limitation. If the API is used just the way it
traditionally is, we very easily get "boundary blockers" where libssh2 waits
until all packets are received. When the buffer passed to
libssh2_sftp_write() is not very large, like the 100K or so the example uses,
I hardly see any speeds higher than 300K/sec here (the example code originally
took 16 seconds).
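
As a quick check of the figures quoted in this mail, here is a minimal sketch
of the throughput arithmetic. The helper name throughput() is made up for
illustration and is not part of libssh2.

```c
#include <assert.h>

/* Hypothetical helper: average throughput in bytes per second. */
double throughput(double bytes, double seconds)
{
    assert(seconds > 0.0);
    return bytes / seconds;
}
```

For the 5242880-byte test file, throughput(5242880, 2) gives 2621440.0
bytes/sec, and throughput(5242880, 16) gives 327680.0 bytes/sec, which is
320K/sec and in line with the roughly 300K/sec seen with the original example.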

There are but two subtle changes I had to make to the application code to make
it really fly:

  1 - the most important: increase the buffer size A LOT. I upped it from 100K
      to 1000K, 1 megabyte. This alone gives a significant boost, simply
      because the "boundary blockers" become much less frequent, so libssh2
      can send quite a few packets completely pipelined before they happen.

  2 - make the code use the buffer in a "sliding" manner so that as soon as
      libssh2 reports that data has been sent, the app moves the unsent data
      to the start of the buffer and reads new data into the gap that opens
      up at the end of it. This makes the app avoid the blocking situation
      where it otherwise waits until the buffer is completely drained before
      it refills and continues.

I've committed my modified example as sftp_write_sliding.c

Conclusion

The current library code works perfectly fine for this kind of pipelining,
and so in fact does the API.

We should perhaps still consider adding another API that lets the
application achieve this more easily, without the massive amount of
memmove()s that I do now. At the very least this approach needs to be
documented so that apps can make use of libssh2's true SFTP powers.

What's next for me

1 - read your comments and feedback on this mail

2 - work on implementing a similar scheme for SFTP

3 - consider if another API is what we want/need and work out with you guys
     what the best API/approach for that might be

-- 
  / daniel.haxx.se
Received on 2010-12-05