SOME MINOR ADDITIONS TO THE KERMIT PROTOCOL

Begun: November 1997
D R A F T  # 11  Thu Oct 17 16:33:57 2002

ROUGH DRAFT - TO BE FLESHED OUT AFTER DISCUSSION

For next book... prefixing. Suppose you have 96 Ctrl-C's in a row, and
Ctrl-C is unprefixed. Then you get ~~xxx (where x is Ctrl-C): the first 94
are collapsed into ~~x, but the 95th and 96th are only two in a row and
don't get collapsed. This leaves three of them in a row in the packet, which
can kill the transfer. Ditto for plus signs, or any other character. There
has to be a rule to prevent this. (Maybe Ctrl-C should always be prefixed,
since C-Kermit accepts ^C^C^C to break out of packet mode by default.)

The same thing can happen (with very low probability) at the end of the
data field, just before the block check. For that matter, it's possible to
have a Type-3 block check of "+++", which could disconnect the modem. Not
that I've ever heard of such a thing happening in 20 years...

Items that still need to be addressed:

 . Do something about Error packets -- block check type, frame number?
 . Add something about sending the file's original name -- a new attribute
   maybe.
 . A generic way to represent file permissions on the wire, and then REMOTE
   SET FILE PERMISSION, etc. Very cumbersome if done portably... Also, how
   to represent permissions in "universal directory listings"?

0. REMOTE SET TRANSFER MODE (DONE)

Allows the client to set the server's transfer mode. Code: 410. Value: 0
for automatic, 1 for manual.

How about REMOTE SET FILE PATTERNS { ON, OFF }? Not needed, since REMOTE SET
TRANSFER MODE takes care of this. If it were needed, however, then we'd also
need a way for the client to modify the server's pattern list.

0.0.1. REMOTE SET MATCH { FIFO, DOTFILE } { ON, OFF }

  DOTFILE = 330
  FIFO    = 331
  332-339 saved for other wildcard/match settings.
  (Note: 320 is FILE CHARACTER-SET.)

0.1. WHATAMI2

For greater client/server coupling, however, we're going to need a second
WHATAMI field.

0.1.1. Transfer Mode

REMOTE SET TRANSFER MODE isn't good enough.
GET /BINARY or GET /TEXT really has to force the transfer mode to the one
indicated, no matter what, just as SEND /TEXT (or /BINARY) does. But REMOTE
SET TRANSFER MODE only sets the server's *prevailing* mode. Internally,
GET /BINARY and GET /TEXT also set the client's TRANSFER MODE to MANUAL for
the duration of the command, but there is presently no protocol for the
client to tell the server its transfer mode. So if the server has TRANSFER
MODE AUTOMATIC and PATTERNS ON, its patterns still take precedence on a
per-file basis. In the worst case, "get /text foo.com" will result in a
binary transfer if "*.com" is in the server's binary pattern list.

Thus we need a way for GET /TEXT or GET /BINARY to *temporarily* set the
server's transfer mode to MANUAL. One way to do this is already sketched
out for Extended GET (Section 1.4), but it requires the client to send an O
packet, which few servers understand. This should be implemented
eventually anyway, but in the meantime we can do the same thing by adding a
new WHATAMI bit. Unfortunately, the WHATAMI field is full, so now we need a
second one, which directly follows the (variable-length) system ID field.
The new field is called WHATAMI2, and it has the same format as WHATAMI:

     Bit 5    Bit 4      Bit 3      Bit 2      Bit 1       Bit 0
  +--------+----------+----------+----------+-----------+----------+
  |   1    | reserved | reserved | CHARSETS | RECURSIVE | XFERMODE |
  +--------+----------+----------+----------+-----------+----------+

  Bit 5      Is set to 1 to indicate that the other bits are to be
             believed. This allows the field to be skipped over in the
             event it is not implemented but a subsequent field is added.

  XFERMODE   0 = Automatic
             1 = Manual

  RECURSIVE  (See Section 0.1.2)
             0 = This will not be a recursive transfer
             1 = This is to be a recursive transfer

  CHARSETS   (See Section 0.1.3)
             0 = My TRANSFER CHARACTER-SET is TRANSPARENT
             1 = My TRANSFER CHARACTER-SET is not TRANSPARENT

The binary bit-encoded value is made printable by tochar() and then decoded
by the receiver using unchar(). This mechanism lets the client control the
server's transfer mode.
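As a rough illustration only (not C-Kermit's actual code), the C sketch below packs the WHATAMI2 bits into a single 6-bit value, always setting Bit 5, and makes it printable with tochar(). The receiver reverses this with unchar() and treats a value without Bit 5 as "field not present". tochar() and unchar() are written out in their usual Kermit form (add or subtract 32). The WA2_ constants and the whatami2_encode()/whatami2_decode() names are hypothetical.

```c
#include <assert.h>

/* Usual Kermit printable-encoding macros. */
#define tochar(x) ((x) + 32)
#define unchar(x) ((x) - 32)

/* Hypothetical names for the WHATAMI2 bits shown in the diagram. */
#define WA2_XFERMODE  0x01   /* 0 = Automatic, 1 = Manual */
#define WA2_RECURSIVE 0x02   /* 1 = this is to be a recursive transfer */
#define WA2_CHARSETS  0x04   /* 1 = TRANSFER CHARACTER-SET not TRANSPARENT */
#define WA2_VALID     0x20   /* Bit 5: the other bits are to be believed */

/* Sender side: build the printable WHATAMI2 character.
   Reserved bits 3 and 4 are always sent as 0. */
char whatami2_encode(int manual, int recursive, int charsets)
{
    int bits = WA2_VALID;

    if (manual)    bits |= WA2_XFERMODE;
    if (recursive) bits |= WA2_RECURSIVE;
    if (charsets)  bits |= WA2_CHARSETS;
    assert(bits >= 0 && bits < 64);       /* must fit in 6 bits */
    return (char) tochar(bits);
}

/* Receiver side: return the bit mask, or -1 if the character is out of
   range or Bit 5 is not set (field absent or not to be believed). */
int whatami2_decode(char c)
{
    int bits = unchar((unsigned char) c);

    if (bits < 0 || bits > 63 || !(bits & WA2_VALID))
        return -1;
    return bits;
}
```

For example, suppose a client executes SEND /BINARY /RECURSIVE with a TRANSPARENT transfer character set. It would send whatami2_encode(1, 1, 0), which is the character "C".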
The client's transfer mode is controlled by:

  a. The most recent SET TRANSFER MODE { AUTOMATIC, MANUAL }, which sets
     the prevailing mode (the default is AUTOMATIC); and

  b. Any mode switch (/BINARY, /TEXT, etc) included with a SEND, GET, or
     other file-transfer command, which sets the transfer mode to MANUAL
     for the duration of the command; the prevailing mode is restored
     afterwards.

The WHATAMI2 XFERMODE bit lets the client impose its mode on the server.
This is automatically "temporary", since each client SEND or GET command
sets it appropriately.

0.1.2. Pathnames

Next problem: GET /RECURSIVE works without having to tell the server to SET
SEND PATHNAMES RELATIVE first, because GET /RECURSIVE has its own packet
type. But SEND /RECURSIVE still requires you to SET RECEIVE PATHNAMES
RELATIVE on the server first, since there is nothing special in the I or S
packet to tell the server to expect a recursive transfer.

The WHATAMI2 RECURSIVE bit lets the client tell the server that incoming
files will have pathnames attached and that the server should
automatically switch to RECEIVE PATHNAMES RELATIVE. But what if the user
does not want the server to do this? We need SET RECEIVE PATHNAMES AUTO to
be the default, which means OFF normally, but RELATIVE if I'm a server and
the client sets the WHATAMI2 RECURSIVE bit. If the user has set RECEIVE
PATHNAMES to anything other than AUTO, then that value is used instead.
The RECEIVE PATHNAMES value is saved and restored around the transfer, so
if it was AUTO it goes back to AUTO.

The reserved bits in the WHATAMI2 field must be set to 0. At such time as
they are defined, their definitions must be such that the 0 value
corresponds to previous and/or default behavior. That way their use will
not cause interoperability problems with Kermit versions in which they are
still reserved.

0.1.3. Character Sets

Now, assuming all the aforementioned rules are in effect, we still have a
problem.
Suppose the client and server are on "like platforms" and therefore would
slip into FILE TYPE BINARY and TRANSFER MODE MANUAL automatically. This is
fine as long as character-set translation is not desired. Therefore the
WHATAMI2 field also includes a CHARSETS bit, which is 0 for Kermits whose
TRANSFER CHARACTER-SET is TRANSPARENT, and 1 if an actual transfer
character set has been selected. Now the rule is: do not switch to TRANSFER
MODE MANUAL automatically if the WHATAMI2 CHARSETS bit is 1. But this
potentially makes it harder to recover broken transfers between like
systems.

1. DIRECTORY OPERATIONS

The aim of these changes is to allow the exchange of directory trees or
file systems. It is assumed that all file systems are either
tree-structured or flat. Hardly any protocol changes are needed, mainly
just agreements on data formats. Most of the features are implemented
outside the protocol: recursive SEND commands, automatic directory creation
during RECEIVE commands, etc.

1.0. Directory Name Format Selection (DONE)

(This is simplified considerably in Draft 2 after I implemented it in
C-K...)

  SET FILE NAMES { CONVERTED, LITERAL }

now applies to pathnames too. For pathnames, CONVERTED means that the
native directory notation is converted to standard format when sending,
and the standard format is assumed when receiving. The related command:

  SET { SEND, RECEIVE } PATHNAMES { OFF, ABSOLUTE, RELATIVE }

then applies as usual. PATHNAMES are OFF by default, in which case nothing
is different. When SEND PATHNAMES is ABSOLUTE or RELATIVE, the FILE NAMES
setting is applied to the pathname just as it is to the rest of the
filename.

When receiving files, a Kermit program should be expected to understand its
own native format and the standard one; it cannot be expected to
understand a foreign directory notation. Thus SET FILE NAMES CONVERTED
should be used between unlike systems.

Notes:

  1.
     There is no reason why there can't be separate SET FILE NAMES commands
     and settings for each direction.

  2. We haven't said anything that affects the protocol yet; that comes in
     the next section.

1.1. Kermit Protocol Directory Name Representation (DONE)

UNIX notation shall be used for directories when FILE NAMES are CONVERTED.
Forward slash (/) is the directory separator. If a / appears as a literal
character in a directory name, then it should be written as //. A file or
directory specification beginning with / is absolute; otherwise it is
relative. This is more or less the same scheme used by Info-ZIP, so it is
widely proven in the real world.

As always, the rule regarding letters when FILE NAMES are CONVERTED is to
uppercase when sending. The receiver handles letters according to its own
convention.

Symbolic names like "." and ".." should be expanded before transmission.

For the time being, we should use the rule that device names are always
discarded (e.g. DOS disk letters, VMS disk names, etc).

Note: I have this working now in VMS as well as UNIX:

  FILE NAMES  SEND PATHNAMES  UNIX Result                VMS Result
  ----------  --------------  -------------------------  ------------------
  CONVERTED   OFF             OOFA.TXT                   OOFA.TXT
  CONVERTED   RELATIVE        BLAH/OOFA.TXT              BLAH/OOFA.TXT
  CONVERTED   ABSOLUTE        /W/FDC/TMP/BLAH/OOFA.TXT   /FDC/BLAH/OOFA.TXT
  LITERAL     OFF             oofa.txt                   OOFA.TXT
  LITERAL     RELATIVE        blah/oofa.txt              [.BLAH]OOFA.TXT
  LITERAL     ABSOLUTE        /w/fdc/tmp/blah/oofa.txt   [FDC.BLAH]OOFA.TXT

1.2. Client/Server Directory Operations

REMOTE MKDIR
  G packet function code "m" (yes, lowercase). Creates the specified
  directory. Names are as in 1.1 (absolute or relative). (DONE)

REMOTE RMDIR
  G packet function code "d". Removes the specified directory. Name can be
  wild. (DONE)

REMOTE RMDIR /RECURSIVE
  G packet function code "t". Removes the specified directory tree and all
  its contents, like rm -Rf in UNIX. Name can be wild. (NOT DONE)

1.3. GET /RECURSIVE (DONE)

New packet types:

  V  for GET /RECURSIVE.
     Tells the server to send all files that match the given specification
     in the current or given directory tree. Otherwise just like R for GET.
     (DONE)

  W  for GET /DELETE /RECURSIVE. Like V, but the server should delete each
     file after it is sent successfully. (DONE)

That should do it.

1.4. EXTENDED GET

July 1998: No, it shouldn't. What about /RECOVER, etc? We are nearly out of
(uppercase) packet types, and can't afford to add a new one for every
combination of GET switches. Even if we could, doing so would needlessly
overcomplicate the FSA that implements the protocol.

Definition: Simple GET-Class Packet -- any of the following:

  R - Original GET packet
  H - GET /DELETE (= RETRIEVE)
  V - GET /RECURSIVE
  W - GET /DELETE /RECURSIVE

We should not have used up those packet types, but it's too late now. From
now on, all new GET options go through a new Extended GET (XGET) packet,
type "O". This packet is (a) capable of expressing all combinations of GET
options (including those already expressed in the existing Simple
GET-Class packets), and (b) extensible:

  O - (New) Extended GET

With this addition, GET-Class Packets include the Simple GET-Class Packets
plus the O packet.

Note that many GET commands are "ambiguous" in the sense that they could
result in either a Simple GET-Class packet or an Extended GET packet.
Suppose the client picks one form, but the server only implements the
other? To resolve this situation in a user-friendly manner, the rule must
be:

  Any Kermit client that implements Extended GET must also implement all
  of the Simple GET-Class packets (R, H, V, and W). Any GET command that
  can be expressed in a Simple GET-Class packet must be expressed that way.
  An Extended GET packet should be used only for combinations that are not
  expressible in a Simple GET-Class packet.

The server, of course, should accept either form.

Negotiation: None. If a server receives an O packet and does not understand
it, it returns an Error packet in the normal fashion.

Format:

  Packet type: O.
  Data field:  Options and selectors in Modified PLV (Parameter, Length,
  Value) format.

Modified PLV format is just like PLV format, except that a special escape
character may be placed in the Length field. This escape indicates that the
Value field begins with a 2-character length, which allows for values
longer than 94 bytes, and in fact allows values up to 8836 bytes long.
Since any printable character is allowed in the regular PLV length field,
this escape character must be either a control character or an 8-bit
character. That is OK, since it will be encoded according to normal rules
(see below). The escape mechanism should be used only when a value is
longer than 94 bytes, which should happen only with filenames, and then
only rarely. The escape character is Ctrl-V (SYN, ASCII 22).

Parameters:

  O: Options (bit values ORed together, result converted to a decimal
     string):
       1 = Delete each source file after it is sent successfully (/DELETE)
       2 = Recursive (/RECURSIVE)
       4 = Recover (/RECOVER)
       8 = Filename is a command (/COMMAND)
      16 = Reserved
      32 = Reserved
     *** Is there some reason this is limited to a single 6-bit byte? ***
     To be added (to first or second byte):
       xx = Use filemode given in yy, no matter what -- overrides all else
       yy = 0 = text, 1 = binary
     This is to allow GET /TEXT and GET /BINARY in the client to override
     any other kind of automatic transfer-mode determination in the server.
     If the user says /TEXT or /BINARY, they mean it.

  o: Reserved as a second Options byte.
  M: Local Transfer Mode (sets the server's mode for this transaction only):
       0 = Text
       1 = Binary or Image
       2 = Auto (default)
       3 = Labeled

  P: Pathnames:
       0 = Server should send with pathnames stripped
       1 = Server should send with relative pathnames
       2 = Server should send with absolute pathnames

  N: Name Conversion:
       0 = Server should send with literal names
       1 = Server should send with converted names

  X: Transfer character set (client tells server which transfer character
     set to use; server picks the corresponding file character set
     automatically by association).

  E: Exception name or pattern (was X). There can be more than one of
     these. The entire exception list applies to all filespecs.

  F: Filespec: Name or wildcard for requested file(s). There can be more
     than one of these.

  L: Larger than (size in bytes).

  S: Smaller than (size in bytes).

  A: After. File date-time, yyyymmdd hh:mm:ss, client's local time. Only
     send files modified AFTER the given date-time.

  a: After2. File date-time, yyyymmdd hh:mm:ss, client's local time. Only
     send files modified ON OR AFTER the given date-time.

  B: Before. File date-time, yyyymmdd hh:mm:ss, client's local time. Only
     send files modified BEFORE the given date-time.

  b: Before2. File date-time, yyyymmdd hh:mm:ss, client's local time. Only
     send files modified ON OR BEFORE the given date-time.

  C: After. File date-time, yyyymmdd hh:mm:ss, GMT. Only send files
     modified AFTER the given date-time.

  c: After2. File date-time, yyyymmdd hh:mm:ss, GMT. Only send files
     modified ON OR AFTER the given date-time.

  D: Before. File date-time, yyyymmdd hh:mm:ss, GMT. Only send files
     modified BEFORE the given date-time.

  d: Before2. File date-time, yyyymmdd hh:mm:ss, GMT. Only send files
     modified ON OR BEFORE the given date-time.

  @: End of Parameters

Note that the P and N parameters raise a tricky question for the command
language, since these parameters can apply separately at each end.
For example, does GET /FILENAMES:LITERAL mean the server should send
filenames literally, the client should store them literally, or both?
Ditto for GET /COMMAND: currently this means the incoming file is to be fed
to a command. It does not tell the server that it should be sending from a
command (for which purpose we presently use "!" notation in the filename).

O packets must be encoded -- unlike S, I, and A packets -- because
parameters and/or length fields might have any value at all. Thus PLV
processing by the server must take place AFTER decoding.

Examples: Here are some sample O packets:

  1. ^A0 OO!7F&blah.x@ 0        ; GET /DEL /RECURS /RECOV blah.x
  2. ^A3 OO!7M!1F&blah.x@ T     ; GET /DEL /RECURS /RECOV /BIN blah.x
  3. ^A. OO!7F##abc@ 7          ; GET /DEL /RECURS /RECOV abc
  4. ^A. OO!7F##~#a@ R          ; GET /DEL /RECURS /RECOV aaa
  5. ^A  O!3FO!7F#V!)abcdefghij...(lots more)...qrstuvwxyz@ 3

(1) shows that the M field is omitted when /TEXT or /BINARY is not given.
(2) shows that the M field is included when /TEXT or /BINARY is given.
(3) shows how a length field of 3 is encoded as ##.
(4) shows the filename "aaa" sent with repeat-count compression (~#a).
(5) shows what happens when a filename is more than 94 characters long.
    The O packet is a long packet, with extended header "!3F". The F
    parameter length field is a Control-V character, indicating that the
    first two characters of the value field ("!)") are a 2-byte length. On
    clear channels or when Ctrl-V is unprefixed, it will be inserted
    literally rather than encoded as "#V".

Protocol: If the client sends an option not understood by the server, the
server MUST send an Error packet and return to server command wait.
Otherwise, the resulting transfer could be incorrect (wrong mode, wrong
file, wrong destination, etc). Thus the client should not send options
that were not specified by the user (e.g. it should not supply default
values for options that were not given explicitly).
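The sketch below shows one way a client might assemble the unencoded data field of a single O packet in Modified PLV format and then apply ordinary Kermit control-prefix encoding. It reproduces examples 1, 2, 3, and 5 above. Several assumptions are mine rather than the draft's:

- The two-character long length is taken as two base-94 digits, each passed through tochar(). This matches the 8836 = 94 x 94 figure and is consistent with the lengths in example 5.
- Only the "#" control prefix is applied, with no 8th-bit or repeat-count prefixing.
- plv_add(), xget_build(), and kermit_encode() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define tochar(x) ((x) + 32)     /* make a small number printable */
#define ctl(x)    ((x) ^ 64)     /* map a control char to its printable twin */
#define PLV_ESC   22             /* Ctrl-V: Value begins with a 2-char length */

/* Append one Modified PLV triplet (Parameter, Length, Value) to buf. */
static void plv_add(char *buf, size_t *n, char param, const char *val)
{
    size_t len = strlen(val);

    assert(len < 94 * 94);                      /* two base-94 digits max */
    buf[(*n)++] = param;
    if (len <= 94) {                            /* ordinary PLV */
        buf[(*n)++] = (char) tochar(len);
    } else {                                    /* escaped long value */
        buf[(*n)++] = PLV_ESC;
        buf[(*n)++] = (char) tochar(len / 94);
        buf[(*n)++] = (char) tochar(len % 94);
    }
    memcpy(buf + *n, val, len);
    *n += len;
}

/* Build the unencoded data field of one O packet: O (options as a decimal
   string), M (only if mode >= 0), F (one filespec), then @. */
size_t xget_build(char *buf, int options, int mode, const char *filespec)
{
    char num[16];
    size_t n = 0;

    snprintf(num, sizeof num, "%d", options);
    plv_add(buf, &n, 'O', num);
    if (mode >= 0) {                 /* only when /TEXT or /BINARY was given */
        snprintf(num, sizeof num, "%d", mode);
        plv_add(buf, &n, 'M', num);
    }
    plv_add(buf, &n, 'F', filespec);
    buf[n++] = '@';                  /* End of Parameters */
    buf[n] = '\0';
    return n;
}

/* Minimal Kermit encoding with '#' as the control prefix; assumes no
   8th-bit or repeat-count prefixing was negotiated. */
size_t kermit_encode(const char *in, size_t n, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char) in[i];

        if (c < 32 || c == 127) {        /* control character: #X */
            out[o++] = '#';
            out[o++] = (char) ctl(c);
        } else if (c == '#') {           /* the prefix itself: ## */
            out[o++] = '#';
            out[o++] = '#';
        } else {
            out[o++] = (char) c;
        }
    }
    out[o] = '\0';
    return o;
}
```

A real client would also compare each encoded parameter against the negotiated packet length before sending, starting a new O packet when the next whole parameter would not fit.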
Since filenames can be quite long, and any number of them can be included
in an XGET command, the resulting parameter list could easily exceed the
negotiated packet length. Therefore we must allow for a series of O
packets, as we do with A packets. We do, however, require that each
parameter be totally contained within a packet, just as we do for A
packets. It might be desirable to allow filenames, etc, to span packets,
but there is no pressing need for this: it is not allowed in F or R
packets, nor with A-packet parameters, and nobody is complaining. It would
also add considerable complication to the implementation. Therefore the
restriction that a filename must fit within the negotiated packet length is
not changed by this protocol addition.

Note that Simple GET-Class packets are not acknowledged; instead the server
reverses the direction of the protocol by sending an S packet. O packets,
on the other hand, must be numbered and, except for the last (or only)
one, acknowledged individually. The server must know which O packet is the
last one, so that it answers with S instead of an ACK. Therefore the final
O packet MUST contain an End Of Parameters marker (@) as its last
parameter. (Of course the final O packet can be NAK'd, in which case the
client must retransmit it.) As with any other GET operation, the server
responds by resetting the sequence number to 0 and sending S(0), except
that in this case it does so only after the final O packet.

A potential problem occurs if the S(0) sent in response to a final O packet
whose sequence number was not 0 is lost. In this case, the client might
time out and retransmit O(x). But x is no longer a valid sequence number,
so the server's transport layer will reject it with an Error packet. This
is an unnecessary error, since all the server really needs to do is
retransmit S(0). To avoid this situation, the following rule should be
added: When the window size is 1 and a packet arrives, save it (or, if
memory is at a premium, save its control fields).
Whenever a new packet arrives, compare it with the previous one, and if it
is a duplicate, ignore it. Or... if this causes too much overhead, put
another ugly heuristic into the transport layer similar to the one for E
packets...

Wildcards and Patterns: Unlike regular GET, XGET should define a standard
format for filenames and patterns, so clients need not know the special
syntax of the server's underlying platform. Thus the following characters
in filenames and patterns are reserved:

  * = matches any sequence of 0 or more characters
  ? = matches any single character
  / = directory separator (portable filenames are in UNIX format)

But how do we quote these characters when they are to be taken literally?
First note that we also want to accept platform-specific syntax. In a very
common case this includes DOS-format pathnames, which rules out backslash
as a quote character. The same goes for any other printable ASCII
character. Therefore the quote character should be a control character:

  ^V (Control-V)

This is natural for UNIX and TOPS-10/20 users and is very unlikely to
appear in a filename (in case it does, it can quote itself). On the wire,
the Ctrl-V is encoded with the SET SEND CONTROL-PREFIX character unless it
is included in the SET CONTROL UNPREFIX set.

Possible conflicts occur on platforms that use wildcards differently, e.g.
AOS/VS, where "*" matches any string of characters up to a period, and "+"
matches any string of characters. If incoming "*" is translated to "+",
then how would the client get the AOS/VS functionality? (With "*." -- so
let's not worry about it.)

2. 32-BIT CRC

We might as well, why not. The code for the CHKT field in the init string
is "4". 32-bit CRC must not be implemented in the absence of 16-bit CRC. A
special rule applies here, namely if one Kermit says "4" and the other says
"3", then fall back to "3" instead of "1".
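To make the encoding choices concrete, here is a minimal C sketch with four parts:

- A bitwise CRC-32 using the reflected form (0xEDB88320) of the polynomial given in the next paragraph.
- The six-character excess-33 encoding.
- The five-character base-94 encoding with "!" as the zero digit.
- The "4"-versus-"3" negotiation rule just stated, with the usual fallback to type 1 on any other disagreement.

It assumes the conventional 0xFFFFFFFF preset and final complement used in Gary Brown's table-driven code; this draft does not specify them. The bitwise loop is for brevity and yields the same values as a table-driven version. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reflected ("backwards") form of the generating polynomial:
   X^31 term in the LSB, X^0 term in the MSB, X^32 implied. */
#define CRC32_POLY 0xEDB88320u

/* Bitwise CRC-32.  Assumes the usual 0xFFFFFFFF preset and final
   complement; gives the same values as the table-driven version. */
uint32_t crc32_calc(const unsigned char *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ CRC32_POLY : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Option 1: six 6-bit groups, most significant first, excess-33,
   so every character is in the range '!' (33) to '`' (96). */
void crc32_enc6(uint32_t crc, char out[7])
{
    for (int i = 0; i < 6; i++)
        out[i] = (char) (((crc >> (30 - 6 * i)) & 0x3Fu) + 33);
    out[6] = '\0';
}

/* Option 2: five base-94 digits, most significant first, with '!' as
   the zero digit, so every character is in the range '!' to '~'. */
void crc32_enc94(uint32_t crc, char out[6])
{
    for (int i = 4; i >= 0; i--) {
        out[i] = (char) (crc % 94u + '!');
        crc /= 94u;
    }
    out[5] = '\0';
}

/* Inverse of crc32_enc94(). */
uint32_t crc32_dec94(const char *s)
{
    uint32_t v = 0;

    for (int i = 0; i < 5; i++) {
        assert(s[i] >= '!' && s[i] <= '~');
        v = v * 94u + (uint32_t) (s[i] - '!');
    }
    return v;
}

/* Block-check negotiation: identical proposals stand; "4" against "3"
   falls back to "3"; any other disagreement falls back to "1". */
char bct_negotiate(char mine, char theirs)
{
    if (mine == theirs)
        return mine;
    if ((mine == '4' && theirs == '3') || (mine == '3' && theirs == '4'))
        return '3';
    return '1';
}
```

Either encoding avoids blanks by construction, since the smallest character produced is "!". The base-94 form saves one byte per packet at the cost of a division loop.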
The generating polynomial is:

  X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0

taken "backwards", with the highest-order term in the lowest-order bit.
The X^32 term is "implied"; the LSB is the X^31 term, etc. The X^0 term
(usually shown as "+1") results in the MSB being 1. Code will be based on
the well-known and open Gary Brown code that everybody else uses.

Unlike the type 1, 2, and 3 block checks, the 32-bit one should be encoded
so that it never contains a blank. We can either use the same encoding as
for the 16-bit CRC but excess-33 instead of excess-32 (resulting in 6
bytes), or we can write it more compactly as a base-94 number whose zero
digit is "!". (How many bytes is that? Five, since 94^4 = 78,074,896 is
less than 2^32 and 94^5 = 7,339,040,224 is greater.)

(Joe notes that there might not be much value here. But we have learned
that filling blackboards full of math to persuade the masses why we don't
have such-and-such a feature that the others (read "Zmodem") have never
works -- better to just go along... Anyway, this is just for the protocol
definition, not necessarily to be implemented anywhere, and certainly not
*required* anywhere.)

3. EX-POST-FACTO PER-FILE CRC CHECKING

MS-DOS Kermit and C-Kermit can accumulate a 16-bit CRC of an entire
transaction. They also include a rather cumbersome, script-based process
for comparing the CRCs afterward, which works only in a client/server
setting:

  if fail remote query kermit crc16
  if not = \v(query) \v(crc16)

Obviously this can be expected to succeed only for binary-mode transfers,
and so scripts that use this technique will break in text mode.

A more general mechanism can be added to the protocol itself as follows:

  a. Add a new S/I packet parameter, after the last one that is defined,
     whatever that is (don't worry, I'll look it up). This single byte
     has the same values as the Block Check parameter, except that only
     "3" or "4" should be allowed.

  b. Add SET commands to turn the feature ON and OFF.
     It should be OFF by default, to avoid the extra overhead.

  c. When ON, it should be operative only for binary-mode transfers.

  d. At the end of file, the file sender puts the following in the Z-packet
     data field: the letter C followed by the file's CRC, of the negotiated
     type, expressed as a decimal string.

  e. If the CRC from (d) does not agree with the receiver's CRC, the
     receiver ACKs the Z packet with a data field of N, optionally followed
     by its own CRC. Otherwise it ACKs with either an empty data field or
     the letter C followed by the CRC (exactly as in the Z packet). It is up
     to the receiver how to dispose of the file when the CRCs don't match.

  f. When the sender receives a CRC mismatch indication, the SEND command
     must fail. But what does this mean when a file group is being sent?
     Should it stop and send an Error packet or go on to the next file?
     This must be a user choice, so there will need to be some SET
     commands... In any case, if it is a SEND /DELETE (aka MOVE) operation,
     then the source file must not be deleted. Appropriate notations must
     be made in the transaction log, if any, etc.

The per-file CRC mechanism operates independently of the \v(crc16)
variable. That variable accumulates a CRC over the entire transfer, so it
could obviously become bollixed if a mixture of text and binary files were
transferred in the same transaction, as can occur with VMS C-Kermit.

4. The Capabilities Mask

We're out of bits, except for the "continued" bit. But if we use the
continuation mechanism, we'll no doubt break every non-Kermit-Project
Kermit implementation on earth, and probably also many of the old ones in
our own collection. So to add more capability bits, we'll need to leave the
"continued" bit clear and add the second capabilities mask at the end. But
the next available field is after a PLV field (system ID), and so it's
also not in a fixed place...
Solution: Recycle the three Checkpoint bytes, since Checkpointing has never
been implemented and nobody has seen the spec. Currently we have (counting
from 0):

  S[13] = '0';   <-- '0' means WONT CHECKPOINT.
  S[14] = '_';
  S[15] = '_';
  S[16] = '_';

S[13] (according to the checkpoint proposal) can have the following
values:

  0 = WONT  I won't do it (SET CHECKPOINT DISABLED)
  1 = WILL  I will do it if asked (SET CHECKPOINT ENABLED)
  2 = DO    Please do it (SET CHECKPOINT ON)

Now we give it a new one:

  9 = XCAPAS (extended capability field)

This clearly identifies the following bytes as capability words and not
some vestige of checkpointing. The XCAPAS bytes are filled right to left in
normal 6bit+32 format. Unused XCAPAS bytes are set to accent grave (`).
That character is outside the 6bit+32 range and therefore would not be
mistaken for a capability word. New S[16] capability bits:

  1 = UTF8 Filenames (UTF8NAMES)
  2 = GMT (UTC) file timestamps (GMTSTAMPS)
  4 = (free...)

5. Info Exchange (NOT IMPLEMENTED YET)

The idea is for the two Kermits to exchange information with each other
that applies to the transaction as a whole, but is beyond the scope of (too
voluminous for) the S/Y or I/Y exchange.

  a. Add a new capability bit for this.
  b. The file sender sets this bit in its S packet.
  c. The file receiver agrees by setting the same bit in its ACK(S).

At this point, if the two Kermits have agreed, the sender may (but need
not) send an "L" packet. This packet contains an unencoded
parameter-length-value (PLV) sequence (just like an "A" packet) of
information applying to the connection and the entire transfer.

Parameters (all are optional):

  F = (Sender only) Number of files (expressed as a decimal string).
  L = (Sender only) Total length, decimal string. Obviously iffy for
      text-mode transfers, but we've always had that problem.
  E = Encoding: Kermit transfer character-set designation for text used in
      any of these fields that can contain arbitrary text. Default =
      ASCII. Syntax: exactly as in A packet.
  H = Hostname (e.g. so the local Kermit can show the remote host's name
      on the file transfer display).
  D = Current directory, syntax according to SET FILE NAMES.
  O = Organization name. Arbitrary text, encoding specified in E.
  C = Country code (ISO 3166).
  T = Connection type (to allow automatic choices of various things based
      on whether the connection is known to be reliable -- e.g. TCP/IP at
      *both* ends). A number: 0 = unknown (usually the case when in remote
      mode); 1 = serial port; 2 = ISDN; 3 = TCP; 4 = UDP; 5 = CTERM;
      6 = LAT; etc etc.
  A = Address. Interpreted according to connection type. This can be the
      IP hostname, IP address, or other address specific to the network
      type, or a telephone number in +1(212)7654321 format. It could be
      used for display on the other Kermit's screen, for logging, for
      callback, or for any other desired reason. All sorts of uses for
      this one can be imagined.
  X = Encryption identifier (this needs spelling out).
  K = Public key for X, when applicable (more thought needed).
  N = (Receiver only): No. Refuses the transaction. Optionally one or more
      parameter letters are given as data, to indicate the reason for
      refusal.

Also add specific platform identifier, OS name and version, Kermit
software name and version, endianness, ...

The order doesn't matter, except that if E is given, it must precede any
arbitrary-text fields. We can have up to 94 parameters, one for each 7-bit
graphic character. One must be reserved as an escape for when we run out.

NOTE: "L" was our last unused uppercase letter for packet types.
Additional packet types will be lowercase letters or other graphic
characters. At least one must be reserved as an escape for when we run
out.

Notes on encryption (from Jeff): Now that the PGP style of public key
encryption is no longer covered by patent and it looks like the IETF is
going to accept PGP encryption as their standard, Kermit public key
encryption could work like this:

 . The sender and receiver would negotiate the type of encryption to use.
 . The receiver would then deliver its public key to the sender.
 . The sender would then encrypt all data for the transaction using that
   public key, which only the receiver would be able to decrypt.

This would allow Kermit to generate keys completely on the fly without any
need for local files or user intervention.

6. Extended Sequence Numbers and Window Size

A window size of 32 just isn't big enough, e.g. for interplanetary
transfers, not to mention the Internet some days. But we can't increase it
beyond 32 because it is limited to half the sequence-number range, which is
64. So for larger windows we must increase the sequence-number space. But
we can't do this in the regular sequence number field, at least not
significantly, because it is restricted to a 64-character code set. (In
theory it could be 94, but that too would require a change in the protocol,
and as long as we're changing it, let's shoot higher.)

(*** This is not so important any more because of streaming ***)

6.1. Negotiation

  a. Add a new capability bit for this.
  b. The file sender sets this bit in its S packet.
  c. The file receiver agrees by setting the same bit in its ACK(S).
  d. Add another 2-byte field to the init string, XWINDO. This works
     exactly like long-packet negotiation. If the bit is set, then we fetch
     the actual window size from the two XWINDO bytes, which are in
     excess-32 base-95 notation, just like the extended packet length. A
     receiver that doesn't understand this option, of course, fetches the
     window size from the regular WINDO field.

When this option is negotiated, the maximum sequence number is thus
95^2 - 1 = 9024, and the maximum window size is half that, or 4512. A
4512-packet window of 9024-byte packets (the theoretical maximum) would
require about 40MB (4512 x 9024 = 40,716,288 bytes) of packet buffers.
Obviously a smaller actual maximum can be imposed by the implementation.

6.2.
Packet Format

When an extended window size is negotiated, the packet sequence number
field contains ` (backquote, ASCII 96). This indicates that the full 2-byte
base-95 packet number is included in the extended header. For long
packets, this goes between the length and the header checksum. For short
packets, it forms the extended header by itself (with the header checksum,
of course).

6.3. Improved Packet Framing

There is nothing in a basic Kermit packet to indicate where the data ends
and the block check begins. But we have the opportunity in
extended-sequence packets to use a better format. In these packets, the
packet length indicates the beginning of a PLV-format block check:

 . The parameter is the block-check code (1, 2, 3, B, or 4).
 . The length is the number of bytes in the block check.
 . Then comes the block check itself.

In addition to preventing foulups, this allows the block check type to be
varied dynamically throughout the transaction. It also allows a graphic
character to be placed after the block check in case it ends with a blank.

Thus "Kermit-II" packets add 6 bytes of overhead to short packets:

 . The wasted SEQ byte
 . The 3-byte extended header
 . 2 extra bytes for the packet block check

and 5 bytes for long packets:

 . The wasted SEQ byte
 . 2 bytes in the extended header that is already there
 . 2 extra bytes for the packet block check

7. Supervisory Packets

These can be used for "out of band" functions. Supervisory packets must be
numbered, just like regular ones, because otherwise there is no way for the
receiver to indicate whether or not it was received. Let's call this a "u"
packet. It can be sent only by the file sender, and it can be sent at any
time during a transaction if negotiated:

  a. Add a new capability bit for this.
  b. The file sender sets this bit in its S packet.
  c. The file receiver agrees by setting the same bit in its ACK(S).

Contents are, again, the familiar PLV sequences. Some possible parameters:

  M = Message. To be logged or shown in the display.
 W = Change window size
 P = Change packet length
 R = Reset to defaults
 S = Sync
 D = Drain
 B = Buffer credit

(I'm not really sure yet whether any of these make sense, what they would do, how they would work, or what else we can do here, so this is mainly just a placeholder.)

The file receiver ACKs the "u" packet with the normal indications (Y or N, length, list of tags). If the file receiver wants to send a supervisory message, it can place it in the data field of any D-packet ACK: the letter "u" followed by PLV sequences. (We can't put these in *any* ACK, because some ACKs are already allowed to contain arbitrary string data, e.g. ACK(F), tsk tsk.) The file sender "acknowledges" by sending a "u" packet, which must then be ACK'd by the receiver with an empty ACK.

8. Compression

(Note: much of this discussion also applies to per-file encryption...)

Compression is indicated in the A packet. The book says attribute * (Encoding) is the place to do this, and lists Huffman Encoding (Q) as an example of compression. So we can add something like "Z" for ZIP/Zlib compression. So far so good.

Then there is the " (Type) attribute, which gives the file type: A (text) or B (binary). Unfortunately, this has become synonymous with "transfer mode", which has not been a problem until now. What if we want to send a text file with compression? We must do all the character-set and record-format conversion first, then compress it, and the transfer must occur in binary mode; yet the receiver must know to apply its normal text-mode conversions to the file after decompressing it.

Questions:

1. Should we define a capability bit for compression?

 . Yes, so the two Kermits can negotiate about it in the normal way.
 . No, because there might be many compression methods.

Maybe it's best to skip the capability bit, simply lump this in with Attribute capability, and let the Attribute refusal mechanism take care of negotiation.
But then there's no way for the sender to bid for compression but fall back to no compression if the receiver fails to agree. UNLESS... the receiver explicitly "ACKs" the compression in its ACK(A). If it does, the file will be compressed; otherwise it won't be.

2. How do we specify that we are sending a compressed text file?

 . Should the *Z attribute override the "A attribute? No, because old Kermits would not know to do this and so would corrupt the file.
 . Always send in binary mode ("B), but notify the receiver in some other way that once uncompressed, it's a text file. This would work with old Kermits (the received compressed file would be stored as sent, in binary, and could be decompressed afterwards). But where is the other info? How about this: *ZA means compressed text, *ZB means compressed binary. When compression was selected, the SET FILE TYPE value would move to the *Z? field, and the "file type" would be binary.

9. Format of System-Dependent File Permissions in A-Packets (DONE)

The format of this field (the "," attribute) is interpreted according to the System ID ("." attribute).

For UNIX (System ID = U1), it's the low-order 9 bits of the file mode (Owner, Group, World), recorded as the familiar 3-digit octal string, e.g. 660 = read/write access for owner and group, none for world.

For VMS (System ID = D7), it's a 4-digit hex string representing the 16-bit file protection, with the fields in WGOS order (World, Group, Owner, System), which is the reverse of how they're shown in a directory listing. In each field, Bit 0 = Read, 1 = Write, 2 = Execute, 3 = Delete. A bit value of 0 means the permission is granted; 1 means it is denied. Sample:

  r-01-00-^A/!FWERMIT.EXE'"
  s-01-00-^AE!Y/amd/watsun/w/fdc/new/wermit.exe.DV
  r-02-01-^A]"A."D7""B8#119980101 18:14:05!#8531&872960,$A20B-!7(#512@ #.Y
  s-02-01-^A%"Y.5!
In the A packet above (the r-02-01 line), the protection attribute is the field ",$A20B": "," is the attribute tag, "$" is the length (tochar(4)), and A20B is the value.

A VMS directory listing shows this file's protection as (E,RWED,RED,RE), which really means (S=E,O=RWED,G=RED,W=RE). Internal storage uses the reverse order, so (RE,RED,RWED,E).

Now translate each field to bits, written Delete, Execute, Write, Read (bit 3 down to bit 0):

  RE=0101, RED=1101, RWED=1111, E=0100

Now invert the bits, since 0 means granted and 1 means denied:

  RE=1010, RED=0010, RWED=0000, E=1011

Concatenating the fields in WGOS order gives the 16-bit quantity:

  1010 0010 0000 1011

This is the internal representation of the VMS file protection; in hex it is A20B, as shown in the sample packet above.

The VMS format probably would also apply to RSX or any other FILES-11 system.

10. Handling of Generic Protection

To be used when the two systems are different (and/or do not recognize or understand each other's local protection codes). First of all, the book is wrong: this should not be the World protection, but the Owner protection. The other fields should be set according to system defaults (e.g. UNIX umask, VMS default protection, etc.), except that no non-Owner field should give more permissions than the Owner field.

11. Dates and Times in Attribute Packets

In keeping with good protocol design, conversions of dates and times between two Kermit partners, if they are to be done at all, require a standard date/time on the wire, so each Kermit program needs to know only how to convert between its local time and the standard, and does NOT need to know anything about the other Kermit's timezone. The standard time is GMT. The date-time attribute in the A packet should be clearly described as LOCAL TIME, not to be converted. The use of GMT can be negotiated via capability bits; see Section 4. Ditto for the Extended GET packet described above...

12. Tight Coupling of Client and Server via TELNET Protocol

Described in the IKSD and TELNET KERMIT OPTION RFCs.

13. REMOTE EXIT (DONE)

BYE logs out the server's job. FINISH returns to either the command prompt or the shell, depending on how the server was started. But the client does not necessarily know how the server was started.
REMOTE EXIT addresses this (partially) by telling the server to exit to the shell, no matter how it was started.

Format: Generic Server Command X.

Protocol: If EXIT is disabled, the server sends an Error packet and does not exit; otherwise, it sends an ACK and exits. The classic problem of the ACK being lost can occur here, just as it can with BYE or with the B packet.

14. REMOTE STATUS

This one is in the Kermit book but was never described or implemented. Let's define it as a short string, in PLV notation, that indicates the server's capabilities. The string is returned in the Data field of the ACK to the REMOTE STATUS command, and thus may not exceed the negotiated packet length. The data field is encoded in the normal fashion. The parameters returned are:

 0 - Login status (3 bytes)
     0 = Not logged in but login is required
     1 = Logged in as a user
     2 = Logged in anonymously

 1 - IKS status (3 bytes)
     0 = IKS not available
     1 = IKS available but not negotiated
     2 = IKS negotiated; indicates tight coupling of client and server

 2 - Acceptable Client Packet Type List (up to about 24 bytes):
     A string containing the "top level" commands that are available for execution (i.e. that are both implemented and enabled). The string is composed of the packet types that may be sent to the server when it is in server command wait state, e.g. "CGHIJORSVW". There is no need to include standard types such as BDEFXNY, etc.; if they are included, they are ignored.

 3 - Acceptable REMOTE Command List (up to about 30 bytes):
     A string containing the REMOTE commands that are available for execution (implemented and enabled). The string is composed of the Generic Server Command subtypes, e.g. "ACDEFHIJKLQRSTUVWXdm".

The client parses the response, sets local variables accordingly, and may also display an appropriate message and set up detailed information to be displayed by a subsequent SHOW SERVER command. It might also disable, remove, or mark client commands that are unavailable in the server.
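As an illustration of the REMOTE STATUS reply format described above, here is a minimal Python sketch of how a server might build the Data field and how a client might parse it. It assumes each PLV triple is laid out as in attribute packets: a one-character parameter tag ("0" through "3"), the value's length encoded with tochar(), then the value itself, which is consistent with the "3 bytes" given for parameters 0 and 1. It also assumes the Data field has already been decoded in the normal fashion (prefixing removed). The helper names parse_plv, parse_status, and build_plv are hypothetical.

```python
def tochar(n: int) -> str:
    """Kermit tochar(): map 0..94 to a printable character (add 32)."""
    return chr(n + 32)


def unchar(c: str) -> int:
    """Kermit unchar(): inverse of tochar() (subtract 32)."""
    return ord(c) - 32


def parse_plv(data: str) -> dict:
    """Split an already-decoded data field into {parameter: value}.

    Assumed layout of each triple: 1-char parameter, tochar(length), value.
    Raises ValueError if a triple is truncated or its length overruns the field.
    """
    fields = {}
    i = 0
    while i < len(data):
        if i + 2 > len(data):
            raise ValueError(f"truncated PLV triple at offset {i}")
        param, length = data[i], unchar(data[i + 1])
        if length < 0 or i + 2 + length > len(data):
            raise ValueError(f"bad length for parameter {param!r}")
        fields[param] = data[i + 2:i + 2 + length]
        i += 2 + length
    return fields


def parse_status(data: str) -> dict:
    """Interpret the four REMOTE STATUS parameters; unknown tags are ignored."""
    f = parse_plv(data)
    return {
        "login": int(f["0"]) if "0" in f else None,  # 0, 1 or 2
        "iks": int(f["1"]) if "1" in f else None,    # 0, 1 or 2
        "packet_types": set(f.get("2", "")),         # e.g. "CGHIJORSVW"
        "remote_cmds": set(f.get("3", "")),          # e.g. "ACDEFHIJKLQRSTUVWXdm"
    }


def build_plv(pairs) -> str:
    """Server side: encode (parameter, value) pairs as PLV triples."""
    return "".join(p + tochar(len(v)) + v for p, v in pairs)


# A server where the user is logged in and IKS has been negotiated:
reply = build_plv([("0", "1"), ("1", "2"),
                   ("2", "CGHIJORSVW"), ("3", "ACDEFHIJKLQRSTUVWXdm")])
status = parse_status(reply)
# Is GET (R packet) accepted? Is REMOTE EXIT (generic X) enabled?
print(status["login"], "R" in status["packet_types"], "X" in status["remote_cmds"])  # 1 True True
```

Keeping the parameters in a dictionary keyed by tag means parse_status simply ignores tags it does not recognize, so a newer server could add parameters without confusing an older client.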
NOTE: In case this feature should grow beyond the capacity of a single Data field, it can become a long-form reply, but a new packet would be needed to distinguish it from other server long-form replies.

15. REMOTE LOGIN (DONE)

The book doesn't say what should happen if it fails. The server should send an Error packet with the text "Access denied." Also, the book says nothing about the authentication method, which is fine; it depends entirely on the implementation.

16. UNICODE (DONE)

  UCS-2 Level 1, Group 00, Plane 00: FCS and TCS
  UTF-8 Level 1, Group 00, Plane 00: FCS and TCS

TCS Kermit designators:

  UCS-2: I162 (= Level 1)
  UTF-8: I190 (= Level 1, but accept I196 incoming = Level unspecified)

There is no restriction regarding breaking of UCS-2 or UTF-8 sequences across packets.

17. FILE SIZE (OPEN)

When sending a file, we put the file size into the A packet. But this is not terribly useful when the FCS is single-byte and the TCS is multibyte (or vice versa). Still, the protocol definition says the file size must be used, not the estimated "transfer size", and the receiver has no way of knowing about any expansion or compaction of the original file, since it can only see the transfer encoding. So we need another attribute: the estimated expansion factor (as a percent).

18. CANCELLATION (DONE)

The protocol allows X or Z in the ACK to a D packet. It must be (has been) extended to also allow this in the ACK to the Z packet, because of streaming, to catch the edge case where an entire file's contents fit in a single data packet, and also for empty files.

19. UTF-8 FILENAMES

The coding of filenames has never been specified because the A packet, in which the charset is given, comes after the F packet. So when a filename comes in, we have no idea what its character set is (we can *guess* that it is the current TRANSFER character set, but that's far from certain). Now that there is a universal character set and a standard Internet representation for it, i.e.
UTF-8, we can use that for all filenames, regardless of the file or transfer character set, as long as the two Kermits agree beforehand. Negotiation is done simply by setting the UTF8NAMES capability bit. If both Kermits set it, names are encoded in UTF-8. If not, the (unspecified and unpredictable) previous method is used. When UTF-8 names are used:

. When receiving files, we convert all incoming filenames from UTF-8 to the current file character set (after deciding what it is, based on the incoming transfer charset, via file associations). The tricky part is that we don't know what the transfer charset is until the A-packet comes; therefore we have to defer opening the output file until we get the Z or first D packet, but we already do that. But when the F-packet comes, we have to put the local name in the ACK, and what character set do we use for that? UTF-8 is the only one that makes sense, but since we generally return the full pathname and possibly do other conversions (or we have an as-name), this requires that we (a) convert the incoming UTF-8 name to whatever character set is used locally for filenames, then (b) construct the new filename, and (c) convert the result back to UTF-8.

  Unfortunately, this whole area is a minefield. We can assume neither that all local filenames have the same encoding, nor that every filename has the same encoding as the file's contents. In any case, binary file contents have no encoding at all. Also:

   - What about filename collisions? Collision handling will work only if we GUESS right about the encoding of the local filename before we have the information we need in order to guess right (the transfer character set).
   - What about "set file names converted"? Case folding in Unicode requires lookups in two big databases.
   - Do we allow filenames to include combining characters? If we do, that's MORE database lookups (character properties) AND a sort, e.g. to convert to Normalization Form C.
   - What if there is no mapping to the current character set?
   - etc., etc., etc.

  No doubt other problems would surface in the course of implementation.

. When sending files, we get the local file's character set with filescan or whatever, then ASSUME the filename is in the same character set, and convert it to UTF-8 for the F-packet. Obviously this assumption might be wrong. Perhaps we can check it by filescanning the name itself, but names are generally not long enough to give a reliable result. Even if this works:

   - The ACK comes back in UTF-8. How do we display it? Convert it to the local FILE character set? How do we record it in the transaction log? Do we need to let the user specify not only the file and transfer character sets, but also the console and log character sets?
   - etc., etc.

This is not as simple as it seemed at first!
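The two halves of the UTF-8 filename problem discussed above can be sketched in Python, under loudly stated assumptions: the fallback "local" character set is hard-coded as Latin-1, standing in for whatever the FILE character set really is, and the function names are hypothetical. The sender-side guess is only the filescan-the-name heuristic from the text, so it can be wrong: an all-ASCII name is inconclusive, and a Latin-1 name can by coincidence also be well-formed UTF-8. Python's unicodedata.normalize() performs the Normalization Form C step, which is where the character-database lookups come in.

```python
import unicodedata
from typing import Optional


def guess_name_charset(raw: bytes, fallback: str = "latin-1") -> str:
    """Guess a local filename's encoding from its bytes alone (heuristic)."""
    if all(b < 0x80 for b in raw):
        return "ascii"  # decodes the same in every ASCII-based charset: inconclusive
    try:
        raw.decode("utf-8")
        return "utf-8"  # well-formed UTF-8, though this could be a coincidence
    except UnicodeDecodeError:
        return fallback  # assumption: the name uses the local FILE character set


def name_for_f_packet(raw: bytes, fallback: str = "latin-1") -> bytes:
    """Sender side: convert a local filename to NFC-normalized UTF-8."""
    text = raw.decode(guess_name_charset(raw, fallback))
    # Compose base letters with combining marks, e.g. "e" + U+0301 -> U+00E9.
    return unicodedata.normalize("NFC", text).encode("utf-8")


def local_name_from_utf8(name: bytes, local_charset: str) -> Optional[bytes]:
    """Receiver side: convert an incoming UTF-8 name to the local charset.

    Returns None when the name is not valid UTF-8 or some character has
    no mapping in local_charset; what to do then is left to the caller.
    """
    try:
        text = unicodedata.normalize("NFC", name.decode("utf-8"))
        return text.encode(local_charset)
    except (UnicodeDecodeError, UnicodeEncodeError):
        return None


print(name_for_f_packet(b"caf\xe9.txt"))       # Latin-1 name -> b'caf\xc3\xa9.txt'
print(name_for_f_packet(b"cafe\xcc\x81.txt"))  # decomposed UTF-8 -> b'caf\xc3\xa9.txt'
print(local_name_from_utf8(b"\xe2\x82\xac.txt", "latin-1"))  # euro sign: None
```

Returning None rather than substituting a replacement character leaves the policy question (refuse the file, rename it, or something else) to the caller, which is where the text leaves it too.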