Subject: Re: Why isn't scp 8-bit clean?

From: Uli Zappe
Date: Tue, 31 Aug 2010 19:40:19 +0200

On 31.08.2010 at 16:50, Daniel Stenberg wrote:

> I guess a problem is that SCP is not a standard thing. It would be worthwhile to check how OpenSSH actually provides file names with non-ascii letters.

I've been working extensively with OpenSSH on Mac OS X, which uses UTF-8 file names. I've never experienced any problems with characters in file names (including Cyrillic and Asian ones), so I'm quite sure OpenSSH (including its implementation of scp) is fully UTF-8 compatible.

> I mean, if it truly is 8bit then surely the < 32 check is wrong as well?

I don't think so. Values < 0x20 (32) are always control characters, AFAIK. That is the case in ASCII, and therefore also in all 8-bit extensions of ASCII, which only add characters in the 0x80-0xFF range and leave the 0x00-0x7F range unchanged. UTF-8 is similar: one-byte characters are identical to ASCII, and every byte of a multibyte character must be > 0x7F (127) by specification. I'm unsure about UTF-16, but I've never seen a file system that uses that for file names.
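The byte-range argument above can be checked with a few lines of Python. This is only a minimal sketch: the sample file names and the helper `multibyte_bytes` are made up for illustration and are not taken from any scp implementation.

```python
# Sketch: bytes of multibyte UTF-8 characters always have the high bit set.
# The sample file names are hypothetical.
samples = ["r\u00e9sum\u00e9.txt", "Привет.doc", "日本語.pdf"]

def multibyte_bytes(name):
    """Return the bytes that belong to multibyte (non-ASCII) characters."""
    out = []
    for ch in name:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:
            out.extend(encoded)
    return out

for name in samples:
    # Lead bytes and continuation bytes are all > 0x7F.
    assert all(b > 0x7F for b in multibyte_bytes(name))
    # No byte < 0x20 appears, since these names contain no control characters.
    assert all(b >= 0x20 for b in name.encode("utf-8"))

# UTF-16 is different: plain ASCII letters already produce a 0x00 byte.
assert "A".encode("utf-16-le") == b"A\x00"
```

The last assertion shows why the same reasoning would not carry over to UTF-16: there, a byte below 0x20 does not imply a control character.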

Of course, another question is what this if clause was intended to achieve in the first place. (I have no idea.)

> Uhm, UTF-8 file names can surely have bytes below 32, right?

Hm, these would be control characters, just like in ASCII. I don't know why a file system would forbid these characters in ASCII file names, but allow them in UTF-8 file names. In any case, I would think that this solely depends on the file system implementation and is nothing that's specific to UTF-8 file names.

> In fact, UTF-8 can even contain the bytes 0x0a and 0x0d so the checks for the end of line is then not good enough.

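I'm not sure what you mean by that. Of course UTF-8 characters include 0x0a and 0x0d, as does ASCII, and they mean exactly the same thing in both cases. Since every byte of a multibyte UTF-8 character is > 0x7F, these two values can never appear inside one, so a byte-wise check for the end of line still works.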
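To illustrate why a byte-wise end-of-line scan is unaffected by UTF-8 file names, here is a small sketch. The function `find_line_end` and the sample name are hypothetical; the real check in the code under discussion may look different.

```python
# Sketch: scanning raw bytes for CR/LF is safe on UTF-8 input, because
# 0x0A and 0x0D never occur inside a multibyte sequence.
def find_line_end(buf):
    """Return the index of the first CR or LF byte, or -1 if there is none."""
    for i, b in enumerate(buf):
        if b in (0x0A, 0x0D):
            return i
    return -1

name = "Привет-名前.txt"  # hypothetical file name
line = name.encode("utf-8") + b"\n"

# The only line-end byte found is the terminator that was appended.
assert find_line_end(line) == len(line) - 1
```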

> Now we only need to figure out the right fix...

If the purpose of this if clause is as unclear to everyone else as it is to me, you could probably simply remove it completely. Other than that, I can see no harm if the < 32 check remains and only the > 127 check is removed.
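The second option (keep the < 32 check, drop the > 127 check) could look roughly like the sketch below. It is written in Python for brevity, and `has_control_byte` is a hypothetical name; it is not the actual if clause from the source code being discussed.

```python
# Sketch of the relaxed check: reject only bytes below 0x20 and accept
# everything else, including bytes above 0x7F.
def has_control_byte(name_bytes):
    """Return True if any byte in the file name is below 0x20."""
    return any(b < 0x20 for b in name_bytes)

# A UTF-8 file name with non-ASCII characters now passes ...
assert not has_control_byte("Привет.txt".encode("utf-8"))
# ... while a name with an embedded control character is still rejected.
assert has_control_byte(b"bad\x07name")
```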


  Uli Zappe, Solmsstraße 5, D-65189 Wiesbaden, Germany
  Fon: +49-700-ULIZAPPE
  Fax: +49-700-ZAPPEFAX

Received on 2010-08-31