Version: v0.0.0-...-ce19cbc

Published: Mar 29, 2016 License: BSD-3-Clause Imports: 15 Imported by: 0



Splits email archives downloaded from the Google Takeout (Download Your Data) service into individual emails. Based on experimentation, Google appears to use the mboxrd dialect of the mbox format with CRLF line endings, as discussed in the Wikipedia mbox article.
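In the mboxrd dialect, any body line that starts with zero or more `>` characters followed by `From ` gets one extra `>` prepended when the message is stored, so reading a message back means removing exactly one `>` from such lines. The following is a minimal sketch of that unquoting step. The name `unquoteFromLine` is hypothetical and is not part of this package's API.

```go
package main

import (
	"fmt"
	"regexp"
)

// quotedFrom matches a line that mboxrd has quoted: one or more '>'
// characters immediately followed by "From ".
var quotedFrom = regexp.MustCompile(`^>+From `)

// unquoteFromLine (hypothetical helper) reverses mboxrd quoting by
// dropping a single leading '>' from a quoted line. Other lines are
// returned unchanged.
func unquoteFromLine(line string) string {
	if quotedFrom.MatchString(line) {
		return line[1:]
	}
	return line
}

func main() {
	samples := []string{
		">From here on it gets easier",
		">>From a quoted reply",
		"> From is not quoted",
	}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, unquoteFromLine(s))
	}
}
```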


The project is licensed under the BSD 3-Clause License - see the LICENSE.txt file included with the package.

Using the mboxrd package

The package provides both libraries and a buildable executable. See the code documentation for details on using the libraries.


Using the mboxrd_split executable

The executable takes the following parameters:

-dir  <name>     : The directory to write the resulting messages to.
                   The directory must exist before running the program.

-mbox <name>     : The mbox file to process and split into messages.

-email <address> : An email address whose correspondence is to be
                   captured. Only the bare address should be provided.

The program does not preserve an unfinished last line of the last message in the archive. In the resulting files, all message lines end with CRLF.

During processing, the program writes each message to a temporary file and then moves it to a UTC-timestamped .eml file. If the destination file name is already taken by another message, the later message does not overwrite it; it is left in its temporary file and an error is printed to stderr.

A message also stays in its temporary file if the program fails to construct a name for the message file. Some forwarded messages, for example, lack the Date: header.






func Extract

func Extract(mboxrd io.Reader, messages chan chan string, errors chan error)

Extract processes all lines from the mboxrd reader and puts each resulting message, as its own channel, into the provided messages channel.

It stops early only if it runs into non-empty lines before the first message header. Otherwise it continues processing lines on the assumption that the message archive format is correct.

Each message's channel and the parent messages channel are closed after the mbox data is exhausted.

func TimeFromLine

func TimeFromLine(line string, lc *time.Location) (string, error)

func TimeNorm

func TimeNorm(line string, loc *time.Location) (string, error)

func UnpackMessage

func UnpackMessage(eml string, errors chan error, wg *sync.WaitGroup)

func WriteOriginal

func WriteOriginal(
	message chan string,
	emlName chan string,
	errors chan error,
	dir string,
	admit ByLineAdmit,
	name ByLineName,
	wg *sync.WaitGroup)

WriteOriginal receives a message text from the `message` channel and writes it into a file in the destination `dir` directory.

All errors are posted to the `errors` parameter channel.

The `admit` parameter function decides whether the message is kept in the target directory. It is called for each line in the message, including headers, and the value it returns determines whether the message stays in the directory.

The message file name is constructed by the `name` parameter function. The function is called for each line in the message, including headers, until it returns a non-empty string. If the `name` parameter is `nil`, messages stay in randomly named temporary files with the `_msg_` prefix.

The `WaitGroup` parameter must be properly initialised and incremented prior to calling this function, or be supplied as `nil` if not needed.
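As an illustration of the `name` callback contract (called per line, returns an empty string until a name can be produced), here is a minimal sketch of a naming function driven by the Date: header. `ByLineName` is redeclared locally so the example compiles on its own. `nameFromDate` and the timestamp layout are hypothetical and are not what `NameFromTimeUser` does.

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// ByLineName mirrors the type documented by this package.
type ByLineName func(string, chan error) string

// nameFromDate (hypothetical) returns a ByLineName that yields a
// UTC-timestamped file name once it sees a parsable Date: header.
func nameFromDate() ByLineName {
	return func(line string, errs chan error) string {
		const prefix = "date:"
		if len(line) < len(prefix) || !strings.EqualFold(line[:len(prefix)], prefix) {
			return "" // not the header we are looking for; keep scanning
		}
		t, err := mail.ParseDate(strings.TrimSpace(line[len(prefix):]))
		if err != nil {
			errs <- err
			return ""
		}
		// The layout is an assumption chosen for this example.
		return t.UTC().Format("20060102T150405Z") + ".eml"
	}
}

func main() {
	errs := make(chan error, 1)
	name := nameFromDate()
	for _, line := range []string{
		"From: someone@example.com",
		"Date: Tue, 29 Mar 2016 10:00:00 +0200",
	} {
		if n := name(line, errs); n != "" {
			fmt.Println(n)
			break
		}
	}
}
```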


type ByLineAdmit

type ByLineAdmit func(string, chan error) bool
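A function of this type can be written as a closure that remembers whether any earlier line matched. The sketch below assumes that the value returned for the last line is the one that counts, which the documentation does not spell out, so the result is made sticky. `ByLineAdmit` is redeclared locally and `admitAddress` is a hypothetical name, unrelated to how `AllWith` is implemented.

```go
package main

import (
	"fmt"
	"strings"
)

// ByLineAdmit mirrors the type documented by this package.
type ByLineAdmit func(string, chan error) bool

// admitAddress (hypothetical) admits a message once any line mentions addr,
// compared case-insensitively. The closure keeps state, so a fresh one is
// needed for every message.
func admitAddress(addr string) ByLineAdmit {
	addr = strings.ToLower(addr)
	seen := false
	return func(line string, _ chan error) bool {
		if !seen && strings.Contains(strings.ToLower(line), addr) {
			seen = true
		}
		return seen
	}
}

func main() {
	admit := admitAddress("alice@example.com")
	keep := false
	for _, line := range []string{
		"From: Bob <bob@example.com>",
		"To: Alice <Alice@Example.com>",
		"",
		"Hello!",
	} {
		keep = admit(line, nil)
	}
	fmt.Println("keep message:", keep)
}
```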

func AdmitAnyPattern

func AdmitAnyPattern(criteria []Criterion, vetos []Criterion, errors chan error) ByLineAdmit

func AllWith

func AllWith(addrs []string, errors chan error) ByLineAdmit

type ByLineName

type ByLineName func(string, chan error) string

func NameFromTimeUser

func NameFromTimeUser(format string, errors chan error) ByLineName

NameFromTimeUser returns a closure that derives a message file name from the message timestamp and the username part of the sender's email address.

It is an example of how to construct the file name from multiple headers.

type Criterion

type Criterion struct {
	OnlyHeaders bool
	RE          *regexp.Regexp
}

type MboxError

type MboxError string

MboxError is returned when an error occurs while reading or splitting an mboxrd archive.

func (MboxError) Error

func (mbe MboxError) Error() string

type MessageError

type MessageError string

MessageError is returned when an error occurs while writing a message to the filesystem.

func (MessageError) Error

func (msge MessageError) Error() string

