Split email archives downloaded from the Google Takeout (Download Your Data) service into individual emails. Based on experimentation, Google appears to use the mboxrd dialect of the mbox format with CRLF line endings, as discussed in the Wikipedia mbox article.
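As background on the mboxrd dialect mentioned above: message boundaries are lines starting with `From `, and body lines that would look like a boundary are quoted with a leading `>`. The sketch below illustrates that convention only. The helper names `unquoteFromLine` and `isSeparator` are hypothetical and are not part of this package, and the sample separator line is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// unquoteFromLine reverses mboxrd quoting for a single body line:
// a line made of one or more '>' followed by "From " loses one '>'.
// All other lines are returned unchanged.
func unquoteFromLine(line string) string {
	rest := strings.TrimLeft(line, ">")
	if len(rest) < len(line) && strings.HasPrefix(rest, "From ") {
		return line[1:]
	}
	return line
}

// isSeparator reports whether a line starts a new message in the archive.
func isSeparator(line string) bool {
	return strings.HasPrefix(line, "From ")
}

func main() {
	// Made-up sample lines, not taken from a real archive.
	for _, l := range []string{
		"From sender@example.com Mon Jan  2 15:04:05 2006",
		">From here on the text was quoted once",
		">>From a quoted reply",
		"> From is not quoted",
	} {
		fmt.Printf("separator=%v unquoted=%q\n", isSeparator(l), unquoteFromLine(l))
	}
}
```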
The project is licensed under the BSD 3-Clause License - see the
LICENSE.txt file included with the package.
The package provides both libraries and a buildable executable. See the code documentation on using the libraries.
The executable takes the following parameters:
- `-dir <name>` : A directory to put the resulting messages in. The directory must exist before running the program.
- `-mbox <name>` : An mbox file to process and split into messages.
- `-email <address>` : An email address whose correspondence is to be captured. Only the actual address should be provided.
The program does not preserve an unfinished last line of the last message in the archive. After processing, all message lines in the resulting files end with CRLF.
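The line handling described above can be pictured with the following minimal sketch. It is an illustration of the stated behaviour, not the program's actual code: the functions `normalizeLines` and `normalizeString` are hypothetical names introduced here.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// normalizeLines reads text from r and returns its complete lines, each
// terminated with CRLF. A trailing fragment that has no line terminator
// is dropped.
func normalizeLines(r io.Reader) ([]string, error) {
	br := bufio.NewReader(r)
	var out []string
	for {
		line, err := br.ReadString('\n')
		if err == io.EOF {
			// line holds an unterminated fragment (possibly empty): discard it.
			return out, nil
		}
		if err != nil {
			return out, err
		}
		line = strings.TrimSuffix(line, "\n")
		line = strings.TrimSuffix(line, "\r")
		out = append(out, line+"\r\n")
	}
}

// normalizeString is a convenience wrapper for in-memory text.
func normalizeString(s string) ([]string, error) {
	return normalizeLines(strings.NewReader(s))
}

func main() {
	lines, err := normalizeString("first\nsecond\r\nunfinished")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range lines {
		fmt.Printf("%q\n", l)
	}
}
```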
During processing it creates temporary message files and then moves each of them into a UTC-timestamped
.eml file. If the destination filename is already taken by another message, the later message does not overwrite it. It is left in the temporary file and the error is printed to
A message also stays in a temporary file if the program fails to construct a name for the message file. Some forwarded messages, for example, lack the
- func Extract(mboxrd io.Reader, messages chan chan string, errors chan error)
- func TimeFromLine(line string, lc *time.Location) (string, error)
- func TimeNorm(line string, loc *time.Location) (string, error)
- func UnpackMessage(eml string, errors chan error, wg *sync.WaitGroup)
- func WriteOriginal(message chan string, emlName chan string, errors chan error, dir string, ...)
- type ByLineAdmit
- type ByLineName
- type Criterion
- type MboxError
- type MessageError
Extract processes all lines from the mboxrd reader and puts each resulting message, as its own channel, into the provided messages channel.
It stops only if it runs into non-empty lines before a message header. Otherwise it continues processing lines on the assumption that the message archive format is correct.
Each message's channel and the parent messages' channel are closed after the mbox data is exhausted.
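The channel-of-channels pattern described above can be consumed with two nested `range` loops. The sketch below does not import this package. `split` is a hypothetical stand-in for `Extract` that works on a slice of lines, and it assumes that each inner channel delivers one message line by line and that the separator line itself is not forwarded. Unlike `Extract`, the stand-in silently ignores lines before the first separator.

```go
package main

import (
	"fmt"
	"strings"
)

// split is a hypothetical stand-in for Extract: it opens a new inner channel
// at every "From " separator line and closes all channels when input ends.
func split(lines []string, messages chan chan string) {
	var current chan string
	for _, line := range lines {
		if strings.HasPrefix(line, "From ") {
			if current != nil {
				close(current)
			}
			current = make(chan string)
			messages <- current
			continue
		}
		if current != nil {
			current <- line
		}
	}
	if current != nil {
		close(current)
	}
	close(messages)
}

// collect drains the outer channel one message at a time.
func collect(messages chan chan string) [][]string {
	var all [][]string
	for message := range messages {
		var body []string
		for line := range message {
			body = append(body, line)
		}
		all = append(all, body)
	}
	return all
}

// run wires producer and consumer together.
func run(lines []string) [][]string {
	messages := make(chan chan string)
	go split(lines, messages)
	return collect(messages)
}

func main() {
	archive := []string{
		"From a@example.com", "Subject: one", "",
		"From b@example.com", "Subject: two",
	}
	for i, m := range run(archive) {
		fmt.Printf("message %d: %q\n", i, m)
	}
}
```

Because the channels are unbuffered, the producer blocks until the consumer reads each line, so the inner channel must be drained before the next message can arrive.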
func WriteOriginal
func WriteOriginal( message chan string, emlName chan string, errors chan error, dir string, admit ByLineAdmit, name ByLineName, wg *sync.WaitGroup)
WriteOriginal receives a message text from the `message` channel and writes it into a file in the destination `dir` directory.
All errors are posted to the `errors` channel parameter.
The `admit` parameter determines whether the message is left in the target directory. The function is called for each line in the message, including headers. The value returned by the `admit` function determines if the message is kept in the directory.
The message file name is constructed by the `name` parameter function. The function is called for each line in the message, including headers, until it returns a non-empty string. If the `name` parameter is `nil`, messages stay in randomly named temporary files starting with the `_msg_` prefix.
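To make the per-line callbacks concrete, here is a minimal sketch. The types `lineAdmit` and `lineName` are hypothetical stand-ins for `ByLineAdmit` and `ByLineName`, whose real signatures are not shown here and may differ. The sketch also assumes that the admit callback is a stateful closure whose most recent return value decides whether the message is kept.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for ByLineAdmit and ByLineName.
type lineAdmit func(line string) bool
type lineName func(line string) string

// admitAny returns a closure that reports true once any line seen so far
// contained one of the patterns.
func admitAny(patterns ...string) lineAdmit {
	matched := false
	return func(line string) bool {
		for _, p := range patterns {
			if strings.Contains(line, p) {
				matched = true
			}
		}
		return matched
	}
}

// nameFromSubject yields a file name only for the Subject header line.
func nameFromSubject(line string) string {
	if !strings.HasPrefix(line, "Subject: ") {
		return ""
	}
	subject := strings.TrimPrefix(line, "Subject: ")
	return strings.ReplaceAll(subject, " ", "_") + ".eml"
}

// scan calls admit on every line and calls name until it returns a
// non-empty string. A nil name leaves the file name empty.
func scan(lines []string, admit lineAdmit, name lineName) (bool, string) {
	keep, fileName := false, ""
	for _, line := range lines {
		keep = admit(line)
		if name != nil && fileName == "" {
			fileName = name(line)
		}
	}
	return keep, fileName
}

func main() {
	msg := []string{"From: alice@example.com", "Subject: hello world", "", "body"}
	keep, file := scan(msg, admitAny("alice@example.com"), nameFromSubject)
	fmt.Println(keep, file)
}
```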
The `WaitGroup` parameter must be properly initialised and incremented prior to calling this function, or be supplied as `nil` if not needed.
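The usual way to satisfy this requirement is to call `Add` before starting the goroutine and to let the callee call `Done`. The sketch below shows that pattern with an optional wait group. `worker` and `runAll` are hypothetical functions, not part of this package.

```go
package main

import (
	"fmt"
	"sync"
)

// worker is a hypothetical function that accepts an optional *sync.WaitGroup
// and signals completion only when one was supplied.
func worker(id int, results chan<- int, wg *sync.WaitGroup) {
	if wg != nil {
		defer wg.Done()
	}
	results <- id * id
}

// runAll starts n workers and waits for all of them before summing results.
func runAll(n int) int {
	var wg sync.WaitGroup
	results := make(chan int, n)
	for i := 1; i <= n; i++ {
		wg.Add(1) // increment before the goroutine starts
		go worker(i, results, &wg)
	}
	wg.Wait()
	close(results)
	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(runAll(3))

	// With a nil wait group the caller synchronises in some other way,
	// here simply by reading the result.
	results := make(chan int, 1)
	worker(4, results, nil)
	fmt.Println(<-results)
}
```

Calling `Add` inside the goroutine instead would race with `Wait`, which is why the increment happens in the caller.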
type ByLineAdmit
func AdmitAnyPattern
type ByLineName
func NameFromTimeUser
NameFromTimeUser returns a closure used to extract a message file name based on the message timestamp and the username part of the sender's email address.
It is an example of how to construct the file name from multiple headers.
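A file name of this kind could be assembled with the standard `net/mail` package, as in the sketch below. This is not the implementation of `NameFromTimeUser`. The function `fileName`, the timestamp layout, and the `_` separator are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// fileName combines the UTC timestamp from a Date header value with the
// username part of a From header value. The name layout is an assumption.
func fileName(dateHeader, fromHeader string) (string, error) {
	t, err := mail.ParseDate(dateHeader)
	if err != nil {
		return "", err
	}
	addr, err := mail.ParseAddress(fromHeader)
	if err != nil {
		return "", err
	}
	user := addr.Address
	if i := strings.LastIndex(user, "@"); i >= 0 {
		user = user[:i]
	}
	return t.UTC().Format("20060102T150405Z") + "_" + user + ".eml", nil
}

func main() {
	name, err := fileName(
		"Tue, 02 Jan 2024 17:04:05 +0200",
		"Alice Example <alice@example.com>",
	)
	fmt.Println(name, err)
}
```

Converting to UTC before formatting keeps names comparable across messages sent from different time zones.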
type MboxError string
MboxError is returned when an error occurs while reading or splitting an mboxrd archive.