INTERNET-DRAFT                                         Charles H. Lindsey
Usenet Format Working Group                      University of Manchester
                                                                June 2003


                          Usenet Best Practice


Status of this Memo

   This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC 2026.

   Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

Abstract

   This Draft is intended as an informational document. Its purpose is to set out how software should behave and conventions which users should observe, in order that Netnews in general, and Usenet in particular, should provide the most effective service to its users.

[Remarks enclosed in square brackets and aligned with the left margin, such as this one, are not part of this draft, but are editorial notes to explain matters amongst ourselves, or to point out alternatives, or to assist the RFC Editor.]

[In this draft, references to [NNTP] are to be replaced by [RFC 977], or else by references to the RFC arising from the series of drafts draft-ietf-nntpext-base-*.txt, in the event that such RFC has been accepted at the time this document is published. Likewise, it may be possible to replace references to [RFC 2279] by references to [RFC 2279bis].]

[This is a very preliminary draft of the "USEAGE" document intended to accompany [USEFOR].
Currently, it is composed largely of paragraphs extracted verbatim from earlier [USEFOR] drafts, and consequently it follows the section numbering of [USEFOR]. A heading appears for every section, though most of them are empty (indicating that no text was taken over). Naturally, this situation will change as soon as serious editing work on this document commences, and the layout and structure of the final text is likely to be very different. The purpose of this preliminary draft is simply to establish what material has been taken over.]

1. Introduction

1.1. Basic Concepts

   "Netnews" is a set of protocols for generating, storing and retrieving news "articles" (which resemble email messages) and for exchanging them amongst a readership which is potentially widely distributed. It is organized around "newsgroups", with the expectation that each reader will be able to see all articles posted to each newsgroup in which he participates. These protocols are defined in [USEFOR].

   "Usenet" is a particular worldwide open network based upon the Netnews protocols, with the newsgroups being organized into recognized "hierarchies". Anybody can join (it is simply necessary to negotiate an exchange of articles with one or more other participating hosts).

   Usenet "belongs" to those who administer the hosts that comprise it. There is no Cabal with overall authority to direct what is to be allowed. Nevertheless, there do exist agencies within Usenet that have authority to establish policies and to perform administrative functions, but such authority derives solely from the consent of those sites which choose to recognize it (and who can decline to exchange articles with sites which choose not to recognize it). Usually, the authority of such an agency is restricted to a particular hierarchy, or group of hierarchies.

   A "policy" is a rule intended to facilitate the smooth operation of a network by establishing parameters which restrict behaviour that, whilst technically unexceptionable, would nevertheless contravene some accepted standard of "Good Netkeeping". Since the ultimate beneficiaries of a network are its human readers, who will be less tolerant of poorly designed interfaces than mere computers, articles in breach of established policy can cause considerable annoyance to their recipients.

1.2. Objectives

   The purpose of this document is to set out how software should behave and conventions which users should observe, in order that Netnews in general, and Usenet in particular, should provide the most effective service to its users.

   Often, these conventions are a matter of policy which may vary from network to network, from hierarchy to hierarchy within one network, and even between individual newsgroups within one hierarchy. It is assumed, for the purposes of this document, that agencies with varying degrees of authority to establish such policies will exist, and that where they do not, policy will be established by mutual agreement.

   However, it is NOT the purpose of this document to define how the authority of various agencies to exercise control or oversight of the various parts of Usenet is established (that is itself a matter of policy).

   For the benefit of networks and hierarchies without such established agencies, and to provide a basis upon which all agencies can build, this present document often provides default policy parameters, usually introducing them by a phrase such as "As a matter of policy ...".

1.3. Historical Outline

1.4. Transport

2. Definitions, Notations and Conventions

2.1. Definitions

   All the technical terms defined in [USEFOR] are considered to be defined in this document also.

2.2. Textual Notations

   This document contains explanatory NOTEs using the following format.
   These may be skipped by persons interested solely in the content of the specification. The purpose of the notes is to explain why choices were made, to place them in context, or to suggest possible implementation techniques.

      NOTE: While such explanatory notes may seem superfluous in principle, they often help the less-than-omniscient reader understand the true intent of the specification in cases where the wording is not entirely clear.

   Certain words, when capitalized, are used to define the significance of individual requirements. The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", "MAY" and "OPTIONAL", and any of those words associated with the word "NOT", are to be interpreted as described in [RFC 2119]. However, as provided in that RFC, the force of these words is lower here than would have been the case in a standards-track document. In particular, violation of a MUST or SHOULD does not necessarily imply a failure of interoperability, but rather that established policy or accepted best practice would be breached, to the detriment of the good order of Usenet.

      NOTE: The extreme irritation caused to other readers by such violations is not to be underestimated; however, enforcement of such rules is more a matter of sensible design or of social pressure (whose effectiveness should not be underestimated, even though it cannot be prescribed).

[That is the proposed new wording. However, for the moment, I have preserved the distinction between "Ought" and "SHOULD", so that you can see where each came from. Ultimately, all the "Oughts" will become "SHOULD"s, or be otherwise done away with. In the meantime, here follow the present definitions.]

   Certain words, when capitalized, are used to define the significance of individual requirements. The key words "MUST", "REQUIRED", "SHOULD", "RECOMMENDED", "MAY" and "OPTIONAL", and any of those words associated with the word "NOT", are to be interpreted as described in [RFC 2119].
   In addition, the word "Ought", when applied to a poster, or to actions of posting and similar agents which a poster may easily override, indicates a recommendation whose violation would do no more than breach established policy, or accepted best practice.

      NOTE: The use of "MUST" or "SHOULD" implies a requirement that would or could lead to interoperability problems if not followed. Although not following an "Ought" recommendation might do no worse than cause extreme irritation to other readers, particularly in the case of the publicly distributed Usenet, that is no reason not to take it seriously. The essential distinction is that enforcement of a "MUST" or "SHOULD" is a matter of ensuring correct implementation, whereas enforcement of an "Ought" is more a matter of sensible design or of social pressure (whose effectiveness should not be underestimated, even though it cannot be prescribed by this standard).

[End of old text, to be removed.]

      NOTE: A requirement imposed on a relaying or serving agent regarding some particular article should be understood as applying only if that article is actually accepted for processing (since any agent may always reject any article entirely, for reasons of site policy).

[That NOTE can probably be removed, or severely rewritten, once we have a better idea of the requirements/recommendations we are going to make in this document.]

   Wherever the context permits, use of the masculine includes the feminine and use of the singular includes the plural, and vice versa.

   Throughout this document we will give various examples. In order to prevent possible conflict with "Real World" entities and people, the top level domain ".example" is used in all sample domains and addresses. The hierarchy "example.*" is also used as a sample hierarchy. Information on the ".example" top level domain is in [RFC 2606].

2.3. Relation To Email and MIME

2.4. Syntax

2.4.1. Syntax adapted from Email and MIME

2.4.2. Syntax copied from other standards

2.5. Language

3. Changes to the existing protocols

3.1. Principal Changes

3.2. Transitional Arrangements

4. Basic Format

4.1. Syntax of News Articles

4.2. Headers

4.2.1. Naming of Headers

   There is a preferred case convention, which posters and posting agents Ought to use: each hyphen-separated "word" has its initial letter (if any) in uppercase and the rest in lowercase, except that some abbreviations have all letters uppercase (e.g. "Message-ID" and "MIME-Version"). The forms given in the various rules defining headers in [USEFOR] are the preferred forms for them, but relaying and reading agents are expected to tolerate articles not obeying this convention.

4.2.2. MIME-style Parameters

4.2.3. White Space and Continuations

   Although header-contents are defined in such a way that folding can take place between many of the lexical tokens (and even within some of them), folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks, and SHOULD also avoid leaving trailing WSP on the preceding line. For instance, if a header-content is defined as comma-separated values, it is RECOMMENDED that folding occur after the comma separating the values, even if it is allowed elsewhere.

4.2.4. Comments

   A comment is normally used to provide some human readable informational text, except at the end of a mailbox which contains no phrase, as in

      fred@foo.bar.example (Fred Bloggs)

   as opposed to

      "Fred Bloggs" <fred@foo.bar.example>

   The former is a deprecated, but commonly encountered, usage and reading agents SHOULD take special note of such comments as indicating the name of the person whose mailbox it is. In all other situations a comment is semantically interpreted as a single SP.

4.2.5. Header Properties

4.2.5.1. Experimental Headers

4.2.5.2. Inheritable Headers

4.2.5.3. Variant Headers

4.2.6. Undesirable Headers

   Headers that merely state defaults explicitly (e.g., a Followup-To-header with the same content as the Newsgroups-header, or a MIME Content-Type-header with contents "text/plain; charset=us-ascii") or state information that reading agents can typically determine easily themselves (e.g. the length of the body in octets) are redundant and posters and posting agents Ought Not to include them.

4.3. Body

4.3.1. Body Format Issues

   Posters SHOULD avoid using control characters and escape sequences except for tab (US-ASCII 9), formfeed (US-ASCII 12) and, possibly, backspace (US-ASCII 8). Tab signifies sufficient horizontal white space to reach the next of a set of fixed positions; posters are warned that there is no standard set of positions, so tabs should be avoided if precise spacing is essential. Formfeed (which is sometimes referred to as the "spoiler character") signifies a point at which a reading agent Ought to pause and await reader interaction before displaying further text.

      NOTE: Passing other control characters or escape sequences unaltered to a display or printing device is likely to have unpredictable results, except in the case of a device adapted to the special needs of some particular character set.

      NOTE: Backspace was historically used for underlining, done by an underscore (US-ASCII 95), a backspace, and a character, repeated for each character that should be underlined. Posters are warned that underlining is not available on all output devices or supported by all reading agents and is best not relied on for essential meaning.

4.3.2. Body Conventions

   The following conventions for quotations, attributions and signatures, although not mandated by any standard, describe widely used practices, and it is RECOMMENDED that all posting, reading and followup agents should adhere to them. Since much software will attempt to recognize and act upon them, questions of interoperability can arise, and so the use of the words "MUST", "SHOULD", etc.
   is to be understood in that context.

   It is conventional for followup agents to enable the incorporation of the followed-up article (the "precursor") as a quotation. This SHOULD be done by prefacing each line of the quoted text (even if it is empty) with the character ">" (or perhaps with "> " in the case of a previously unquoted line). This will result in multiple levels of ">" when quoted content itself contains quoted content, and it will also facilitate the automatic analysis of articles.

      NOTE: Posters should edit quoted content to trim it down to the minimum necessary. However, followup agents Ought Not to attempt to enforce this beyond issuing a warning (past attempts to do so have been found to be notably counter-productive).

   The followup agent SHOULD also precede the quoted content by an "attribution line" (however, readers are warned not to assume that such lines are accurate, especially within multiply nested quotations). The following convention for such lines is intended to facilitate their automatic recognition and processing by sophisticated reading agents.

   The attribution SHOULD contain the name and/or the email address of the precursor's poster, as in

      Joe D. Bloggs wrote:

   or

      Helmut Schmidt schrieb:

   The attribution MAY also contain a single newsgroup-name (the one from which the followup is being made), the precursor's Message-ID and/or the precursor's Date and Time. Any of these that are present SHOULD precede the name and/or email address. However, the inclusion or not of such fields Ought always to be under the control of the poster.
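   The quoting convention above can be sketched as follows. This is a minimal illustration, assuming the precursor's body is already available as a list of decoded text lines (signature already removed) and that the attribution carries only the poster's name; the function name quote_precursor is hypothetical and not taken from any existing news software.

```python
def quote_precursor(poster_name: str, body_lines: list[str]) -> list[str]:
    """Build an attribution line followed by the quoted precursor.

    Hypothetical helper. Every line, even an empty one, gets a ">" prefix:
    lines that are empty or already quoted get a bare ">", so nested quotes
    become ">>", ">>>" and so on, while previously unquoted lines get "> ".
    No trailing white space is introduced.
    """
    quoted = [f"{poster_name} wrote:"]
    for line in body_lines:
        if line == "" or line.startswith(">"):
            quoted.append(">" + line)
        else:
            quoted.append("> " + line)
    return quoted


# The lines would be joined with CRLF when the followup is transmitted.
print("\r\n".join(quote_precursor("Joe D. Bloggs", ["Hello", "> earlier", ""])))
```

   Using a bare ">" for empty and already-quoted lines keeps nested levels compact and avoids leaving trailing white space, while "> " on fresh lines keeps the quoted text readable.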

   To enable this line, and the Message-ID and the email address within it, to be recognized (for example to enable suitable reading agents to retrieve the precursor or email its poster by clicking on them), the following conventions SHOULD be observed:

   o  The precursor's Message-ID SHOULD be enclosed within <...> or

   o  The precursor's poster's email address SHOULD be enclosed within <...>

   o  The various fields may be separated by arbitrary text and they may be folded in the same way as headers, but attributions SHOULD always be terminated by a ":" followed by CRLF.

   Further examples:

      On comp.foo in <1234@bar.example> on 24 Dec 2001 16:40:20 +0000, "Joe D. Bloggs" wrote:

      Am 24. Dez 2001 schrieb Helmut Schmidt :

   A "personal signature" is a short closing text automatically added to the end of articles by posting agents, identifying the poster and giving his network addresses, etc. Whenever a poster or posting agent appends such a signature to an article, it MUST be preceded by a delimiter line containing (only) two hyphens (US-ASCII 45) followed by one SP (US-ASCII 32). The signature is considered to extend from the last occurrence of that delimiter up to the end of the article (or up to the end of the part in the case of a multipart MIME body).

   Followup agents, when incorporating quoted text from a precursor, Ought Not to include the signature in the quotation. Posting agents Ought to discourage (at least with a warning) signatures of excessive length (4 lines is a commonly accepted limit).

4.4. Characters and Character Sets

4.4.1. Character Sets within Article Headers

4.4.2. Character Sets within Article Bodies

   It is not expected that reading agents will necessarily be able to present characters in all possible character sets. For example, a reading agent might be able to present only the ISO-8859-1 (Latin 1) characters [ISO 8859], in which case it Ought to present undisplayable characters using some distinctive glyph, or by exhibiting a suitable warning.

4.5. Size Limits

   Posting agents SHOULD endeavour to keep all header lines, so far as is possible, within 79 characters by folding them at suitable places. However, posting agents MUST permit the poster to include longer headers if he so insists. Likewise, injecting agents SHOULD fold any headers generated automatically by themselves.

   In plain-text messages (those with no MIME headers, or those with a MIME Content-Type of text/plain) posting agents Ought to endeavour to keep the length of body lines within some reasonable limit. The size of this limit is a matter of policy, the default being to keep within 79 characters at most, and preferably within 72 characters (to allow room for quoting in followups). Exceptionally, posting agents Ought Not to adjust the length of quoted lines in followups unless they are able to reformat them in a consistent manner. Moreover, posting agents MUST permit the poster to include longer lines if he so insists.

      NOTE: Plain-text messages are intended to be displayed "as-is" without any special action (such as automatic line splitting) on the part of the recipient. The policy limit (e.g. 72 or 79) should be expressed as a number of characters (as they will be displayed by a reading agent) rather than as the number of octets used to encode them.

5. Mandatory Headers

5.1. Date

      NOTE: A convention that is sometimes followed is to add a comment, after the date-time, containing the time zone in human-readable form, but many of the abbreviations commonly used for this purpose are ambiguous. The numeric zone offset given in the date-time is the only definitive form.

5.1.1. Examples

5.2. From

   Each mailbox in the From-content SHOULD be a valid address, belonging to the poster(s) of the article, or the person or agent on whose behalf the article is posted. When, for whatever reason, a poster does not wish to use a valid address, the mailbox concerned SHOULD, to comply with [USEFOR], end in the top level domain ".invalid" [RFC 2606].
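   A replying agent can detect such deliberately undeliverable mailboxes with a simple test on the domain part. The sketch below is illustrative only: it assumes the mailbox has already been reduced to a bare addr-spec (no display name, comment or angle brackets), and the function name is_invalid_address is hypothetical.

```python
def is_invalid_address(addr_spec: str) -> bool:
    """Return True if the addr-spec's domain lies in the ".invalid" TLD.

    Assumes addr_spec is a bare local-part@domain string. Domain names are
    compared case-insensitively, and a trailing root dot is tolerated.
    """
    _, sep, domain = addr_spec.rpartition("@")
    if not sep:
        return False  # no domain part at all; not handled by this sketch
    labels = domain.rstrip(".").lower().split(".")
    return labels[-1] == "invalid"
```

   A replying agent would run this check before composing an email reply, and warn the user rather than attempt delivery, as the NOTE that follows recommends.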

      NOTE: Since such addresses ending in ".invalid" are undeliverable, user agents Ought to warn any user attempting to reply to them and Ought Not, in any case, to attempt to deliver to them (since that would be pointless anyway). Whether or not a valid address can subsequently be extracted from such an address falls outside the scope of this document (obviously, posters wishing to disguise their address need to do more than just add ".invalid" to it). Be warned, however, that some injecting agents which are unable to detect that a mailbox belongs to the poster may choose to insert a Sender-header or some entry in an Injector-Info-header which discloses some valid address for the poster.

5.2.1. Examples

5.3. Message-ID

[This might be a good place to discuss sensible means of generating message identifiers to satisfy the "NEVER" requirement. Recall that we have a draft on www.landfield.com/usefor that was written in the early days of Usefor.]

5.4. Subject

   Although the addition of the back-reference "Re: " is not required, it is the normal practice, and followup agents SHOULD do it. Followup agents MAY remove strings that are known to be used erroneously as back-references (such as "Re(2): ", "Re:", "RE: ", or "Sv: ") from the Subject-content when composing the subject of a followup, and add a correct back-reference in front of the result. Agents SHOULD NOT depend on, or enforce, the use of back-references by followup agents.

   For compatibility with legacy news software, the Subject-content of a control message (i.e. an article that also contains a Control-header) MAY start with the string "cmsg ", and the Subject-content of a non-control message MUST NOT start with the string "cmsg ". See also section 6.13.

[Is that MUST NOT a little too strong nowadays? Do there really still exist servers or other agents that will recognize and act upon "cmsg" in a Subject-header? And if so, maybe that MUST NOT should be moved back into [USEFOR].]
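   The optional clean-up of erroneous back-references described in 5.4 might be sketched as below. The set of prefixes recognized (case-insensitive "Re" or "Sv", optionally followed by a count such as "(2)" or "[3]", then a colon) is an assumption generalized from the examples in the text; a real followup agent would choose its own list. The function name followup_subject is hypothetical.

```python
import re

# Assumed back-reference forms: "Re", "RE", "Re(2)", "Re[3]", "Sv", each
# followed by a colon and optional spaces or tabs, at the start of the subject.
_BACKREF = re.compile(r"^(?:re|sv)(?:\(\d+\)|\[\d+\])?:[ \t]*", re.IGNORECASE)


def followup_subject(precursor_subject: str) -> str:
    """Compose a followup Subject-content carrying a single "Re: " prefix."""
    subject = precursor_subject.strip()
    # Strip any run of (possibly erroneous) back-references at the start.
    while True:
        stripped = _BACKREF.sub("", subject, count=1)
        if stripped == subject:
            break
        subject = stripped
    return "Re: " + subject
```

   Because the pattern is anchored at the start of the subject, a trailing "(was: ...)" annotation is left untouched, and a correct "Re: " is simply removed and restored.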

[Schmuel asked for "I'd like to see text that discourages relying on re: for threading."]

5.4.1. Examples

   In the following examples, please note that only "Re: " is mandated by [USEFOR]. "was: " is a convention used by many English-speaking posters to signal a change in subject matter. Software can always recognize such changes from the References-header.

      Subject: Film at 11
      Subject: Re: Film at 11
      Subject: Godwin's law considered harmful (was: Film at 11)
      Subject: Godwin's law (was: Film at 11)
      Subject: Re: Godwin's law (was: Film at 11)
      Subject: Re: Godwin's law

5.5. Newsgroups

   Agencies responsible for the administration of particular hierarchies SHOULD place restrictions on the newsgroup-names they allow within those hierarchies. [USEFOR] provides the following default restrictions upon which hierarchy administrators can build, and which SHOULD otherwise be applied in hierarchies not subject to such management.

      NOTE: These restrictions are intended to reflect existing practice and are intended both to avoid certain technical difficulties and to avoid unnecessary confusion. They may well change over time in the light of future experience.

   1. Uppercase letters are forbidden.

         NOTE: Traditionally, newsgroup-names have been written in lowercase. However, posting agents Ought Not to convert uppercase characters to the corresponding lowercase forms except under the explicit instructions of the poster.

   2. A component name is forbidden to consist entirely of digits.

         NOTE: This requirement was in [RFC 1036] but nevertheless several such groups have appeared in practice and implementors should be prepared for them. A common implementation technique uses each component as the name of a directory and uses numeric filenames for each article within a group. Such an implementation needs to be careful when this could cause a clash (e.g. between article 123 of group xxx.yyy and the directory for group xxx.yyy.123).

   3. A component is limited to 30 component-graphemes and a newsgroup-name to 71 component-graphemes (counting also the '.'s separating the components).

         NOTE: Whilst there is no longer any technical reason to limit the length of a component (formerly, it was limited to 14 octets) nor of a newsgroup-name, it should be noted that these names are also used in the newsgroups-line where an overall policy limit applies and, moreover, excessively long names can be exceedingly inconvenient in practical use.

   Nevertheless, [USEFOR] requires serving and relaying agents to accept any syntactically correct newsgroup-name, even if it would violate one or more of these policy restrictions. Posting and injecting agents MAY attempt to enforce them but, because of the possibility that hierarchy policies or future standards may relax them, it SHOULD be possible for posters to override such checks, and software MUST be so written that they can be disabled altogether.

[But here we should add that hierarchy administrators need to go much further than these three rules. They need to ensure that newsgroup-names make good sense in the languages used in their hierarchies, that frivolous names are avoided, and that sensible hierarchical principles are applied (David Wright has a FAQ on hierarchical naming which might give us some help). Moreover, we should mention the possibility that there will in future be internationalized newsgroup-names, in which case there are lots more issues to consider in order to avoid unsuitable characters (see draft-ietf-usefor-article-08.txt for some of these).]

   Posting agents MAY, and followup agents SHOULD, accept articles crossposted to newsgroups which do not exist on their local hosts, though posting agents Ought at least to alert the poster to the situation and request confirmation.

5.5.1. Forbidden newsgroup-names

5.6. Path

5.6.1. Format

5.6.2. Adding a path-identity to the Path-header

5.6.3. The tail-entry

5.6.4. Path-Delimiter Summary

5.6.5. Suggested Verification Methods

   It is preferable to verify the claimed path-identity against the source than to make routine use of the '?' path-delimiter, with consequential wasteful double-entry Path additions.

   If the incoming article arrives through some TCP/IP protocol such as NNTP, the IP address of the source will be known, and will likely already have been checked against a list of known FQDNs, IP addresses, or other registered aliases that the receiving site has agreed to peer with. Since the source host may have several IP addresses, checking the claimed FQDN or IP address against the source IP, or finding a suitable FQDN to report with a '?' path-delimiter, may involve several DNS lookups, following CNAME chains as required. Note that any reverse DNS lookup that is involved needs to be confirmed by a forward one.

   If the incoming article arrives through some other protocol, such as UUCP, that protocol MUST include a means of verifying the source site. In UUCP implementations, commonly each incoming connection has a unique login name and password, and that login name (or some alias registered for it) would be expected as the path-identity.

6. Optional Headers

6.1. Reply-To

   In the absence of Reply-To, the reply address(es) is the address(es) in the From-header. For this reason a Reply-To SHOULD NOT be included if it just duplicates the From-header.

      NOTE: Use of a Reply-To-header is preferable to including a similar request in the article body, because replying agents can take account of Reply-To automatically.

6.2. Sender

6.3. Organization

      NOTE: Posting and injecting agents are discouraged from providing a default value for this header unless it is acceptable to all posters using those agents. Unless this header contains useful information (including some indication of the poster's physical location) posters are discouraged from including it.

6.4. Keywords

6.5. Summary

   The summary should be terse.
   Authors Ought to avoid trying to cram their entire article into the headers; even the simplest query usually benefits from a sentence or two of elaboration and context, and not all reading agents display all headers. On the other hand, the summary should give more detail than the Subject.

6.6. Distribution

   Posting agents Ought Not to provide a default Distribution-header without giving the poster an opportunity to override it.

6.7. Followup-To

   A Followup-To-header SHOULD NOT be included if it just duplicates the Newsgroups-header.

6.8. Mail-Copies-To

   A followup agent Ought, when the Mail-Copies-To-header is absent, and especially when it is present and contains an explicit "nobody", to issue a warning and ask for confirmation if the user attempts to send email as well as a followup.

      NOTE: This header is only relevant when posting followups to Netnews articles, and is to be ignored when sending pure email replies to the poster, which are handled as prescribed under the Reply-To-header.

      NOTE: In addition to the Posted-And-Mailed-header, some followup agents also include within the body a mention that the article is both posted and mailed, for the benefit of reading agents that do not normally show that header.

6.9. Posted-And-Mailed

6.10. References

   Followup agents SHOULD NOT trim message identifiers out of a References-header unless the number of message identifiers exceeds 21, at which time trimming SHOULD be done by removing sufficient identifiers starting with the second so as to bring the total down to 21 (but the first message identifier MUST NOT be trimmed). However, it would be wrong to assume that References-headers containing more than 21 message identifiers will not occur.

6.11. Expires

   An Expires-header should only be used in an article if the requested expiry time is earlier or later than the time typically to be expected for such articles.
   Local policy for each serving agent will dictate whether and when this header is obeyed and posters SHOULD NOT depend on it being completely followed.

6.12. Archive

6.13. Control

6.14. Approved

6.15. Supersedes

6.16. Xref

6.17. Lines

6.18. User-Agent

      NOTE: Comments in User-Agent-headers should be restricted to information regarding the product named to their left, such as its full name or platform information, and should be concise. Use as an advertising medium (in the mundane sense) is discouraged.

6.19. Injector-Info

6.19.1. Usage of Injector-Info-parameters

   The purpose of these parameters is to enable the injecting agent to make assertions about the origin of the article, in fulfilment of its responsibilities towards the rest of the network. These assertions can then be utilized as follows:

   1. To enable the administrator of the injecting agent to respond to complaints and queries concerning the article. For this purpose, the parameters included SHOULD be sufficient to enable the administrator to identify its true origin (which parameters are best suited to this purpose will vary with the nature of the injecting site and of its relationship to the posters who use it - there is no benefit in including parameters which contribute nothing to this aim). An administrator MAY, with those parameters where the syntax so allows, use cryptic notations interpretable only by himself if he considers it appropriate to protect the privacy of that origin.

   2. To enable relaying, serving and reading agents to recognize articles from origins which they might wish to reject, divert, or otherwise handle specially, for reasons of site policy.

   3. To enable the timely identification of spews of articles arising from a common origin.

      NOTE: Administrators of injecting agents can choose which selection of the various parameters best enables them to fulfil their responsibilities.
      Some of these parameters identify the source of the article explicitly whereas others do so indirectly, thus affording more privacy to posters who value their anonymity, but also making the tracking of malicious disruption of the network harder, especially if the administrators choose not to cooperate. There is thus a balance to be struck between the needs of privacy on the one hand and the good order of Usenet on the other, and administrators need to be aware of this when formulating their policies.

6.19.1.1. The posting-host-parameter

6.19.1.2. The posting-account-parameter

6.19.1.3. The posting-sender-parameter

6.19.1.4. The posting-logging-parameter

6.19.1.5. The posting-date-parameter

6.20. Complaints-To

6.21. MIME headers

6.21.1. Syntax

6.21.2. Content-Type

   When the Content-Type is "text/plain", the recommendations and limits on line lengths set out in section 4.5 Ought to be observed.

   The acceptability of other subtypes of Content-Type "text" (such as "text/html") is a matter of policy (see 1.1), and posters Ought Not to use them unless established policy or custom in the particular hierarchies or groups involved so allows. Moreover, even in those cases, for the benefit of readers who see it only in its transmitted form, the material SHOULD be "pretty-printed" (for example by restricting its line length as above and by keeping sequences which control its layout or style separate from the meaningful text).

   In the same way, Content-Types requiring special processing for their display, such as "application", "image", "audio", "video" and "multipart/related" are discouraged except in groups specifically intended (by policy or custom) to include them. Exceptionally, those application types defined in [RFC 1847] and [RFC 3156] for use within "multipart/signed" articles, and the type "application/pgp-keys" (or other similar types containing digital certificates) may be used freely.
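   As an illustration of how a posting agent might apply the default policy just described before warning a poster, the following sketch classifies a Content-Type value. The four-way outcome, the function name classify_content_type, and the exact set of exempt types are assumptions made for this example: the exempt set lists only the OpenPGP types defined in [RFC 3156], and hierarchy policy may well differ.

```python
# Assumption: only the OpenPGP types of RFC 3156 are treated as exempt here.
_EXEMPT = {
    "application/pgp-signature",
    "application/pgp-encrypted",
    "application/pgp-keys",
}
_SPECIAL_TOP_LEVEL = {"application", "image", "audio", "video"}


def classify_content_type(content_type: str) -> str:
    """Return "ok", "policy", "discouraged" or "unclassified".

    Parameters such as "; charset=utf-8" are ignored, and matching is
    case-insensitive, as MIME type and subtype names are.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    top, _, _ = media_type.partition("/")
    if media_type == "text/plain" or media_type in _EXEMPT:
        return "ok"
    if top == "text":
        return "policy"  # e.g. text/html: only where policy or custom allows
    if top in _SPECIAL_TOP_LEVEL or media_type == "multipart/related":
        return "discouraged"  # unless the group is intended to carry them
    return "unclassified"  # e.g. other multipart/* types: left to the caller
```

   A posting agent would typically turn "policy" and "discouraged" into a warning that the poster can override, rather than a refusal.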
NOTE: The Content-Type "message/partial" is not recommended for textual articles because the Content-Type, and in particular the charset, of the complete article cannot be determined by examination of the second and subsequent parts, and hence it is not possible to read them as separate articles (except when they are written in pure US-ASCII). Moreover, for full compliance with [RFC 2046] it would be necessary to use the "quoted-printable" encoding to ensure the material was 7bit-safe. In any case, breaking such long texts into several parts is usually unnecessary, since modern transport agents should have no difficulty in handling articles of arbitrary length. On the other hand, "message/partial" may be useful for binaries of excessive length, since reading of the individual parts on their own is not required and they would likely be encoded in a manner that was 7bit-safe.

Where a news article encapsulated with the Content-Type "message/rfc822" is to be transported by email and has Content-Transfer-Encoding "8bit", the Content-Transfer-Encoding may need to be changed, although there should be no problem if the email transport supports 8BITMIME [RFC 2821].

The Content-Type "message/external-body" could be appropriate for texts which it would be uneconomic (in view of the likely readership) to distribute to the entire network.

The Content-Types "multipart/mixed", "multipart/parallel" and "multipart/signed" may be used freely in news articles. However, except where policy or custom so allows, the Content-Type: "multipart/alternative" SHOULD NOT be used, on account of the extra bandwidth consumed and the difficulty of quoting in followups, but reading agents MUST accept it.

The Content-Type: "multipart/digest" is recommended for any article composed of multiple messages more conveniently viewed as separate entities, thus enabling reading agents to move rapidly between them.
The "boundary" should be composed of 28 hyphens (US-ASCII 45), which makes each boundary delimiter 30 hyphens (32 for the final one), so as to enable reading agents which currently support the digest usage described in [RFC 1153] to continue to operate correctly.

NOTE: The various recommendations given above regarding the usage of particular Content-Types apply also to the individual parts of these multiparts.

6.21.3. Content-Transfer-Encoding

The following are examples of situations where a Content-Transfer-Encoding other than 8bit may be necessary.

1. The content type implies that the content is (or may be) "8bit-unsafe"; i.e. it may contain octets equivalent to the US-ASCII characters CR or LF (other than in the combination CRLF) or NUL. In that case one of the Content-Transfer-Encodings "base64" or "quoted-printable" MUST be used, and reading agents MUST be able to handle both of them.

   NOTE: If a future extension to the MIME standards were to provide a more compact encoding of binary suited to transport over an 8bit channel, it could be considered as an alternative to base64 once it had gained widespread acceptance.

2. It is often the case that "application" Content-Types are textual in nature, and intelligible to humans as well as to machines. Where this can be recognized by the posting agent (either through knowledge of the particular application type or by testing), the material SHOULD NOT be treated as 8bit-unsafe; this has the added benefit, where the posting agent uses other than CRLF for line endings internally, of automatically ensuring that line endings are processed correctly during transport. If, on the other hand, the posting agent recognizes that the material is not textual, or cannot reasonably determine it to be so, then the material MUST be encoded as for 8bit-unsafe (however, in that case, it is the responsibility of the agent generating the material to ensure that line endings, if any, are represented correctly).
   NOTE: All the application types defined by [USEFOR], namely "application/news-transmission", "application/news-groupinfo" and "application/news-checkgroups", are textual, and indeed designed for human reading.

3. Although the "text" Content-Types should normally be encoded as 8bit (or 7bit), if the character set specified by the "charset=" parameter can include any of the three disallowed octets (CR or LF other than in the combination CRLF, and NUL), then the material MUST be encoded as for 8bit-unsafe. This is most likely to arise in the case of 16-bit character sets such as UTF-16 ([UNICODE 3.2] or [ISO/IEC 10646]). In addition, where it is known that the material is subsequently to be gatewayed from Netnews to Email (8.8), the encoding "quoted-printable" MAY be used (otherwise the gateway might have to re-encode it itself).

4. Some protocols require the use of a particular Content-Transfer-Encoding. In particular, the authentication protocol based on OpenPGP defined in [RFC 3156] mandates the use of one of the encodings "quoted-printable" or "base64". Whilst posters might be tempted to risk the use of "8bit" or "7bit" encodings (and indeed the referenced standard recommends that signed messages using those encodings be accepted and interpreted), they should be warned that differences in the treatment of trailing whitespace between OpenPGP [RFC 2440] and earlier versions of PGP may render signatures written with the one unverifiable by the other; moreover, Usenet articles are very likely to include trailing whitespace in the form of a personal signature (4.3.2).

5. The Content-Type "message/partial" [RFC 2046] is required to use encoding "7bit" (the encapsulated complete message may itself use encoding "quoted-printable" or "base64", but that information is conveyed only along with the first of the partial parts).
   NOTE: Although there would actually be no problem using encoding "8bit" in a pure Netnews (as opposed to Email) environment, this document discourages the use of "message/partial" except for binary material, which will likely be encoded to pass through "7bit" in any case.

6.21.4. Character Sets

In principle, any character set may be specified in the "charset=" parameter of a content type. However, only those character sets should be used (together with the corresponding parts of character sets based on [UNICODE 3.2], such as UTF-8) which are appropriate for the customary language(s) of the hierarchy or newsgroup concerned, whose readers could be expected to possess agents capable of displaying them.

6.21.5. Content Disposition

Reading agents Ought to honour any Content-Disposition-header that is provided (in particular, they Ought to display any part of a multipart for which the disposition is "inline", possibly distinguished from adjacent parts by some suitable separator). In the absence of such a header, the body of an article or any part of a multipart with Content-Type "text" Ought to be displayed inline. Followup agents which quote parts of a precursor (see 4.3.2) Ought initially to include all parts of the precursor that were displayed inline, as if they were a single part.

6.21.6. Definition of some new Content-Types

6.21.6.1. Application/news-transmission

6.21.6.2. Message/news obsoleted

6.22. Obsolete Headers

7. Control Messages

Sites Ought to reject control messages not issued by the appropriate administrative agencies, and therefore SHOULD take such steps as are reasonably practicable to establish their authenticity, for example by verifying digital signatures where these are provided.

7.1. Digital Signature of Headers

7.2. Group Control Messages

In those hierarchies where appropriate administrative agencies exist (see 1.1), group control messages Ought Not to be issued except as authorized by those agencies.

7.2.1. The 'newgroup' Control Message

The newsgroup-name Ought to conform to whatever policies have been established by the administrative agency, if any, for that hierarchy. Serving agents SHOULD, insofar as they are conveniently able to detect them, reject all newgroup messages not meeting those requirements.

7.2.1.1. The Body of the 'newgroup' Control Message

7.2.1.2. Application/news-groupinfo

Although, in accordance with [RFC 2822] and [USEFOR], a newsgroups-line could have a maximum length of 998 octets, as a matter of policy a far lower limit, expressed in characters, Ought to be set. The current convention is to limit its length so that the newsgroup-name, the HTAB(s) (interpreted as 8-column tabs, enough of them being used to reach at least column 24) and the newsgroup-description (excluding any moderation-flag) fit into 79 characters. However, this document does not seek to enforce any such rule, and reading agents SHOULD therefore enable a newsgroups-line of any length to be displayed, e.g. by wrapping it as required.

7.2.1.3. Initial Articles

7.2.1.4. Example

7.2.2. The 'rmgroup' Control Message

7.2.2.1. Example

7.2.3. The 'mvgroup' Control Message

The second (new-)newsgroup-name Ought to conform to any established policies of the hierarchy.

7.2.3.1. Example

7.2.4. The 'checkgroups' Control Message

7.2.4.1. Application/news-checkgroups

7.3. Cancel

A cancel message may be issued in the following circumstances.

1. The poster of an article (or, more specifically, any entity mentioned in the From-header or the Sender-header, whether or not that entity was the actual poster) is always entitled to issue a cancel message for that article, and serving agents SHOULD honour such requests. Posting agents SHOULD facilitate the issuing of cancel messages by posters fulfilling these criteria.

2. The agent which injected the article onto the network (more specifically, the entity identified by the path-identity in front of the leftmost '%' delimiter in the Path-header or in the Injector-Info-header) and, where appropriate, the moderator (more specifically, any entity mentioned in the Approved-header) are always entitled to issue a cancel message for that article, and serving agents SHOULD honour such requests.

3. Other entities MAY be entitled to issue a cancel message for that article, in circumstances where established policy for any hierarchy or group in the Newsgroups-header, or established custom within Usenet, so allows (such policies and customs are not defined by this document). Such cancel messages MUST include an Approved-header identifying the responsible entity. Serving agents MAY honour such requests, but SHOULD first take steps to verify their appropriateness.

7.4. Ihave, sendme

7.5. Obsolete control messages

8. Duties of Various Agents

8.1. General principles to be followed

8.2. Duties of an Injecting Agent

An injecting agent MAY take account of the policies of any newsgroups or hierarchies that the article is posted to. As part of their responsibility for the actions of their posters, injecting agents MAY cancel articles which they have previously injected (see 7.3).

[That paragraph will move back to USEFOR if the rules governing who may issue cancels are moved back.]

8.2.1. Proto-articles

8.2.2. Procedure to be followed by Injecting Agents

An injecting agent MAY add other headers not already provided by the poster, but SHOULD NOT alter, delete, or reorder any existing header. However, the addition of non-mandatory headers by the injecting agent may alter the posting agent's preferred presentation of information.
In particular, adding a Sender-header that exposes a sender's mailbox has privacy implications; where the main or only purpose for doing so is to provide tracing information, it is preferable to use instead one of the options provided for the Injector-Info-header.

8.3. Duties of a Relaying Agent

8.4. Duties of a Serving Agent

8.5. Duties of a Posting Agent

8.6. Duties of a Followup Agent

Followup agents Ought to observe appropriate quoting conventions in the body (see 4.3.2).

8.7. Duties of a Moderator

A moderator MAY inform the poster if the article is accepted, and he Ought to inform the poster if it is rejected.

A moderator Ought Not (absent any established and widely promulgated policy to the contrary) to remove any newsgroup-name from the Newsgroups-header, nor split an article into two versions with disjoint Newsgroups-headers. These are matters more usually within the prerogative of the poster; moreover, splitting can lead to fragmentation of threads.

8.8. Duties of a Gateway

8.8.1. Duties of an Outgoing Gateway

8.8.2. Duties of an Incoming Gateway

8.8.3. Example

9. Security and Related Considerations

9.1. Leakage

9.2. Attacks

9.2.1. Denial of Service

9.2.2. Compromise of System Integrity

9.3. Liability

10. IANA Considerations

11. References

[ISO 8859] International Standard - Information Processing - 8-bit Single-Byte Coded Graphic Character Sets.
   Part 1: Latin alphabet No. 1, ISO 8859-1, 1987.
   Part 2: Latin alphabet No. 2, ISO 8859-2, 1987.
   Part 3: Latin alphabet No. 3, ISO 8859-3, 1988.
   Part 4: Latin alphabet No. 4, ISO 8859-4, 1988.
   Part 5: Latin/Cyrillic alphabet, ISO 8859-5, 1988.
   Part 6: Latin/Arabic alphabet, ISO 8859-6, 1987.
   Part 7: Latin/Greek alphabet, ISO 8859-7, 1987.
   Part 8: Latin/Hebrew alphabet, ISO 8859-8, 1988.

[ISO/IEC 10646] "International Standard - Information technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane", ISO/IEC 10646-1:2000, 2000.

[RFC 1036] M. Horton and R. Adams, "Standard for Interchange of USENET Messages", RFC 1036, December 1987.

[RFC 1153] F. Wancho, "Digest Message Format", RFC 1153, April 1990.

[RFC 1847] J. Galvin, S. Murphy, S. Crocker, and N. Freed, "Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted", RFC 1847, October 1995.

[RFC 2046] N. Freed and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types", RFC 2046, November 1996.

[RFC 2119] S. Bradner, "Key words for use in RFCs to Indicate Requirement Levels", RFC 2119, March 1997.

[RFC 2279] F. Yergeau, "UTF-8, a transformation format of ISO 10646", RFC 2279, January 1998.

[RFC 2279bis] F. Yergeau, "UTF-8, a transformation format of ISO 10646", draft-yergeau-rfc2279bis-00.txt, April 2002.

[RFC 2440] J. Callas, L. Donnerhacke, H. Finney, and R. Thayer, "OpenPGP Message Format", RFC 2440, November 1998.

[RFC 2606] D. Eastlake and A. Panitz, "Reserved Top Level DNS Names", RFC 2606, June 1999.

[RFC 2821] John C. Klensin and Dawn P. Mann, "Simple Mail Transfer Protocol", RFC 2821, April 2001.

[RFC 2822] P. Resnick, "Internet Message Format", RFC 2822, April 2001.

[RFC 3156] M. Elkins, D. Del Torto, R. Levien, and T. Roessler, "MIME Security with OpenPGP", RFC 3156, August 2001.

[UNICODE 3.2] The Unicode Consortium, "The Unicode Standard - Version 3.2, being an amendment to [UNICODE 3.1]", Unicode Standard Annex #28, 2002.

[USEFOR] Charles H. Lindsey, "News Article Format", draft-ietf-usefor-article-format-*.txt.

12. Acknowledgements

13. Contact Address

Editor
Charles H. Lindsey
5 Clerewood Avenue
Heald Green
Cheadle
Cheshire SK8 3JU
United Kingdom
Phone: +44 161 436 6131
Email: chl@clw.cs.man.ac.uk

[Working group chairs
Andrew Gierth
Pete Resnick]

Comments on this draft should preferably be sent to the mailing list of the Usenet Format Working Group at usenet-format@landfield.com.

This draft expires six months after the date of publication (see Page 1) (i.e. in Oct 2003).
Appendix C - Notices

Intellectual Property

The IETF takes no position regarding the validity or scope of any intellectual property or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; neither does it represent that it has made any effort to identify any such rights. Information on the IETF's procedures with respect to rights in standards-track and standards-related documentation can be found in BCP-11. Copies of claims of rights made available for publication and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementors or users of this specification can be obtained from the IETF Secretariat.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights which may cover technology that may be required to practice this standard. Please address the information to the IETF Executive Director.

Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.