Module MrMime_content

module MrMime_content: sig .. end
Module Content

This module covers the MIME header specification and the mechanisms it defines.

For anyone wondering why email is so hard to parse, the following note from RFC2045 explains a lot:

Several of the mechanisms described in this set of documents may seem somewhat strange or even baroque at first reading. It is important to note that compatibility with existing standards AND robustness across existing practice were two of the highest priorities of the working group that developed this set of documents. In particular, compatibility was always favored over elegance.
See also RFC2045 § 1


module Map: module type of Map.Make(String)
Map with type key = string.
type raw = Rfc2047.raw = 
| QuotedPrintable of string
| Base64 of MrMime_base64.Decoder.result
It's an encoded-word from RFC2047. An encoded-word is a sequence of printable ASCII characters that begins with "=?", ends with "?=", and has two "?"s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.

MrMime recognizes encoded-words when they appear in certain portions of the message header. Instead of displaying the encoded-word "as is", it will reverse the encoding and display the original text.

NOTE: the client needs to translate the original text into the designated character set (such as UTF-8); this feature is still on the TODO list.
See also RFC2047
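
To make the "=?charset?encoding?encoded-text?=" layout concrete, here is a minimal sketch that splits an encoded-word into its three components using only the standard library. The function split_encoded_word is hypothetical and is not part of MrMime; it only separates the components and does not decode the text.

```ocaml
(* Hypothetical helper, not part of MrMime: split an RFC2047 encoded-word
   "=?charset?encoding?encoded-text?=" into (charset, encoding, text). *)
let split_encoded_word (s : string) : (string * char * string) option =
  let len = String.length s in
  if len < 4 || String.sub s 0 2 <> "=?" || String.sub s (len - 2) 2 <> "?="
  then None
  else
    (* Drop the leading "=?" and the trailing "?=", then split on "?". *)
    match String.split_on_char '?' (String.sub s 2 (len - 4)) with
    | [ charset; encoding; text ] when String.length encoding = 1 ->
      Some (charset, Char.uppercase_ascii encoding.[0], text)
    | _ -> None
```

The encoding letter is normalized to upper case, so a caller only has to match on 'Q' or 'B'.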

type unstructured = [ `CR of int
| `CRLF
| `Encoded of string * raw
| `LF of int
| `Text of string
| `WSP ] list
Some field bodies in RFC5322 specification are defined simply as unstructured with no further restrictions. These are referred to as unstructured field bodies. Semantically, unstructured field bodies are simply to be treated as a single line of characters with no further processing (except for folding and unfolding white space).
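
As an illustration of treating an unstructured body as "a single line of characters", the sketch below flattens such a value into a string. It redefines the types locally so that it compiles without MrMime. Two assumptions are made: the Base64 payload is simplified to a plain string, and the string in `Encoded is taken to be the charset name. The helper to_line is hypothetical.

```ocaml
(* Local mirror of the types above so the sketch compiles without MrMime.
   Assumption: the Base64 payload is simplified to a plain string here; in
   MrMime it is a MrMime_base64.Decoder.result. *)
type raw = QuotedPrintable of string | Base64 of string

type unstructured =
  [ `CR of int | `CRLF | `Encoded of string * raw | `LF of int
  | `Text of string | `WSP ] list

(* Hypothetical helper: render a field body as a single line of text.
   Line-break tokens are dropped and `WSP becomes one space. *)
let to_line (u : unstructured) : string =
  let buf = Buffer.create 64 in
  List.iter
    (function
      | `Text s -> Buffer.add_string buf s
      | `WSP -> Buffer.add_char buf ' '
      | `Encoded (_charset, (QuotedPrintable s | Base64 s)) ->
        (* Assumption: the payload is already decoded text. *)
        Buffer.add_string buf s
      | `CR _ | `LF _ | `CRLF -> ())
    u;
  Buffer.contents buf
```

This sketch ignores the charset, so it is only faithful when the payload already uses the encoding the caller expects.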

MrMime keeps some information:


type field = [ `Content of string * Rfc5322.unstructured
| `ContentDescription of Rfc5322.unstructured
| `ContentEncoding of Rfc2045.mechanism
| `ContentID of Rfc822.msg_id
| `ContentType of Rfc2045.content
| `MimeVersion of Rfc2045.version
| `Skip of string ]
MIME defines a number of new RFC822 header fields that are used to describe the content of a MIME entity. These header fields occur in at least two contexts:

As with MrMime_header.field, MrMime can skip some lines with the `Skip constructor. This means that MrMime could not apply any RFC2045 rule to the line and keeps its data without any processing.

Likewise, as with MrMime_header.field, if MrMime recognizes a header field described by RFC2045 but cannot apply its formal definition, it returns `Unsafe with the MrMime_content.unstructured value; the client can then process that value however it sees fit.

type t = {
   ty : MrMime_contentType.content; (*
The "Content-Type" field.
*)
   encoding : MrMime_contentEncoding.mechanism; (*
The "Content-Transfer-Encoding" field.
*)
   version : MrMime_mimeVersion.version; (*
The "MIME-Version" field.
*)
   id : MrMime_msgID.msg_id option; (*
The "Content-ID" field. In constructing a high-level user-agent, it may be desirable to allow one body to make reference to another. Accordingly, bodies may be labelled using the "Content-ID" header field, which is syntactically identical to the "Message-ID" header field.

MrMime does not process the "Content-ID" as a reference, nor as a common representation shared between bodies of a "multipart/alternative" media type (which gives this field a different semantic). MrMime only parses this information and does nothing else (no extra check). The semantics of this field are nevertheless explained below.

The "Content-ID" value may be used for uniquely identifying MIME entities in several contexts, particularly for caching data referenced by the "message/external-body" mechanism. Although the "Content-ID" header is generally optional, its use is mandatory in implementations which generate data of the optional MIME media type "message/external-body" (but MrMime does not check that).

It is also worth noting that the "Content-ID" value has special semantics in the case of the "multipart/alternative" media type.

Each part of a "multipart/alternative" entity represents the same data, but the mappings between the two are not necessarily without information loss. For example, information is lost when translating ODA to PostScript or plain text. It is recommended that each part should have a different "Content-ID" value in the case where the information content of the two parts is not identical. And when the information content is identical (for example, where several parts of type "message/external-body" specify alternate ways to access the identical data) the same "Content-ID" field value should be used, to optimize any caching mechanisms that might be present on the recipient's end. However, the "Content-ID" values used by the parts should not be the same "Content-ID" value that describes the "multipart/alternative" as a whole, if there is any such "Content-ID" field. That is, one "Content-ID" value will refer to the "multipart/alternative" entity, while one or more other "Content-ID" values will refer to the parts inside it.
*)
   description : unstructured option; (*
The "Content-Description" field. The ability to associate some descriptive information with a given body is often desirable. For example, it may be useful to mark an "image" body as "a picture of the Space Shuttle Endeavor.". Such text may be placed in the "Content-Description" header field. This header is always optional.
See also RFC2045 § 8
*)
   content : unstructured list Map.t; (*
Future documents may elect to define additional MIME header fields for various purposes. Any new header field that further describes the content of a message should begin with the string "Content-" to allow such fields which appear in a message header to be distinguished from ordinary RFC822 message header fields.

MrMime treats every "Content-*" field as a MrMime_content.unstructured value.
*)
   unsafe : unstructured list Map.t; (*
As explained for the `Unsafe MrMime_content.field, when MrMime cannot apply the formal definition of a field described by RFC2045, it treats the field as a MrMime_content.unstructured value and leaves any further processing to the client.
*)
   skip : string list; (*
As explained for the `Skip MrMime_content.field, when MrMime cannot apply any rule to a line, it stores that line in the skip field. MrMime preserves the order in which these lines appear in the email.

Generally, this field is empty.

*)
}
A convenience record to deal with any "Content-*" fields.
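
The catch-all fields of this record are maps from a field name to the list of bodies seen for that name. The sketch below shows how such a map can be filled and queried. It uses a hypothetical, simplified mirror of the record in which field bodies are plain strings instead of unstructured values; add_content and empty are illustrative names, not MrMime functions.

```ocaml
module Map = Map.Make (String)

(* Hypothetical, simplified mirror of the record: only the catch-all
   fields, with field bodies reduced to plain strings. *)
type t = { content : string list Map.t; skip : string list }

let empty = { content = Map.empty; skip = [] }

(* Record one more body for a "Content-*" field, keeping earlier ones
   in order of appearance. *)
let add_content (name : string) (body : string) (t : t) : t =
  let previous = Option.value ~default:[] (Map.find_opt name t.content) in
  { t with content = Map.add name (previous @ [ body ]) t.content }
```

The keys are compared as plain strings here, so "Content-Language" and "content-language" are distinct entries; whether MrMime normalizes field names is not stated in this document.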
val pp_raw : Format.formatter -> Rfc2047.raw -> unit
pp_raw raw prints a human-readable representation of MrMime_content.raw.
val pp_unstructured : Format.formatter -> unstructured -> unit
pp_unstructured v prints a human-readable representation of MrMime_content.unstructured.
val pp_field : Format.formatter -> field -> unit
pp_field field prints a human-readable representation of MrMime_content.field.
val pp : Format.formatter -> t -> unit
pp content prints a human-readable representation of MrMime_content.t.
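
Printers of type Format.formatter -> 'a -> unit, like the four above, plug into the "%a" directive of the Format module. The sketch below shows the pattern with a hypothetical stand-in for pp_raw defined on a simplified local type, so that it runs without MrMime; the output format of the real printers is not specified here.

```ocaml
(* Simplified local type: the Base64 payload is assumed to be a string. *)
type raw = QuotedPrintable of string | Base64 of string

(* Hypothetical printer with the same shape as pp_raw. *)
let pp_raw (fmt : Format.formatter) (raw : raw) : unit =
  match raw with
  | QuotedPrintable s -> Format.fprintf fmt "QuotedPrintable %S" s
  | Base64 s -> Format.fprintf fmt "Base64 %S" s

(* Any such printer can be turned into a string conversion with "%a". *)
let to_string (raw : raw) : string = Format.asprintf "%a" pp_raw raw
```

The same printer can be passed to Format.printf or Format.eprintf to write to standard output or standard error.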
val default : t
The default value of a content header, built from MrMime_contentType.default, MrMime_contentEncoding.default and MrMime_mimeVersion.default.
module Encoder: sig .. end
module Decoder: sig .. end