timeliner

Version: v0.0.0-...-cf15516
Published: Jan 26, 2021 License: AGPL-3.0 Imports: 23 Imported by: 0

README

Timeliner

Timeliner is a personal data aggregation utility. It collects all your digital things from pretty much anywhere and stores them on your own computer, indexes them, and projects them onto a single, unified timeline.

The intended purpose of this tool is to help preserve personal and family history.

Things that are stored by Timeliner are easily accessible in a SQLite database or, for media items like photos and videos, are simply plopped onto your disk as regular files, organized in folders by date and data source.

WIP Notice: This project works as documented for the most part, but is still very young. Please consider this experimental until stable releases. The documentation needs a lot of work too, I know... so feel free to contribute!

About

In general, Timeliner obtains items from data sources and stores them in a timeline.

  • Items are anything that has content: text, image, video, etc. For example: photos, tweets, social media posts, even locations.
  • Data sources are anything that can provide a list of items. For example: social media sites, online services, archive files, etc.
  • Timelines are repositories that store the data. Typically, you will have one timeline that is your own, but timelines can support multiple people and multiple accounts per person if you desire to share it.

Technically speaking:

  • An Item implements this interface and provides access to its content and metadata.
  • A DataSource is defined by this struct which configures a Client to access it (by its NewClient field). Clients are the types that do the actual work of listing items.
  • A Timeline is opened when being used. It consists of an underlying SQLite database and an adjacent data folder where larger/media items are stored as files. Timelines are essentially the folder that contains them. They are portable, so you can move them around without breaking anything. However, don't change the contents of the folder directly! Don't add, remove, or modify items in the folder; you will break something. This does not mean timelines are read-only: they just have to be modified through the program in order to stay consistent.

Timeliner can pull data in from local or remote sources. It provides integrated support for OAuth2 and rate limiting where that is needed. It can also import data from local files. For example, some online services let you download all your data as an archive file. Timeliner can read those and index your data.

Timeliner data sources are strictly read-only, meaning that no write permissions are needed and Timeliner will never change or delete anything at the source.

Features

  • Supported data sources
  • Checkpointing (resume interrupted downloads)
  • Pruning
  • Integrity checks
  • Deduplication
  • Timeframing
  • Differential reprocessing (only re-process items that have changed on the source)
  • Construct graph-like relationships between items and people
  • Memory-efficient for high-volume data processing
  • Built-in rate limiting for API clients
  • Built-in OAuth2 facilities for API clients
  • Configurable data merging behavior for similar/identical items
  • Ability to get and organize data from... almost anything, really, including export files

Some features are dependent upon the actual implementation of each data source. For example, differential reprocessing requires that the data source provide some sort of checksum or "ETag" for the item, but if that is not available, there's no way to know if an item has changed remotely without downloading the whole thing and reprocessing it.
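As a rough illustration of the checksum comparison described above, here is a minimal sketch. The function name and parameters are hypothetical and are not part of Timeliner's API; it only shows why a missing checksum leaves nothing to compare.

```go
package main

import (
	"bytes"
	"fmt"
)

// needsReprocess reports whether an item looks changed on the source,
// given the checksum stored locally and the one reported by the source.
// Both parameters are hypothetical and not Timeliner's real fields.
func needsReprocess(storedHash, sourceHash []byte) bool {
	// Without a checksum from the source, a change cannot be detected
	// cheaply, so this sketch leaves the item alone.
	if len(sourceHash) == 0 {
		return false
	}
	return !bytes.Equal(storedHash, sourceHash)
}

func main() {
	fmt.Println(needsReprocess([]byte("abc"), []byte("abc"))) // false
	fmt.Println(needsReprocess([]byte("abc"), []byte("xyz"))) // true
	fmt.Println(needsReprocess([]byte("abc"), nil))           // false
}
```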

Install

Minimum Go version required: Go 1.13

Clone this repository, then from the project folder, run:

$ cd cmd/timeliner
$ go build

Then move the resulting executable into your PATH.

Command line interface

This is a quick reference only. Be sure to read the tutorial below to learn how to use the program!

$ timeliner [<flags...>] <command> <args...>

Use timeliner -h to see available flags.

Commands
  • add-account adds a new account to the timeline and, if relevant, authenticates with the data source so that items can be obtained from an API. This only has to be done once per account per data source:
    $ timeliner add-account <data_source>/<username>...
    
     If the data source requires authentication (for example with OAuth), be sure the config file is properly created first.
    
  • reauth re-authenticates with a data source. This is only necessary on some data sources that expire auth leases after some time:
    $ timeliner reauth <data_source>/<username>...
    
  • import adds items from a local file:
    $ timeliner import <filename> <data_source>/<username>
    
  • get-all adds items from the service's API:
    $ timeliner get-all <data_source>/<username>...
    
  • get-latest adds only the latest items from the service's API (since the last checkpoint):
    $ timeliner get-latest <data_source>/<username>...
    
    
    

Flags can be used to constrain or customize the behavior of commands (timeliner -h to list flags).

See the wiki page for each data source to learn how to use it.

Tutorial

After you've read this tutorial, the Timeliner wiki has all the information you'll need for using each data source.

These are the basic steps for getting set up:

  1. Create a timeliner.toml config file (if any data sources require authentication)
  2. Add your data source accounts
  3. Fill your timeline

All items are associated with an account from whence they come. Even if a data source doesn't have the concept of accounts, Timeliner still has to think there is one.

Accounts are designated in the form <data source ID>/<user ID>, for example: twitter/mholt6. The data source ID is shown on each data source's wiki page. With some data sources (like the Twitter API), the user ID matters; so where possible, give the actual username or email address you use with that service. For data sources that don't have the concept of accounts or a login, choose a user ID you will recognize such that the data source ID + user ID are unique.
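To make the account format concrete, the following sketch splits a designation into its two parts. The parseAccount helper is hypothetical and is not taken from Timeliner's source; it assumes the first slash separates the data source ID from the user ID.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAccount splits "<data source ID>/<user ID>" at the first slash.
// It is an illustrative helper, not Timeliner's own parser.
func parseAccount(s string) (dataSourceID, userID string, err error) {
	parts := strings.SplitN(s, "/", 2)
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("invalid account %q: want <data source ID>/<user ID>", s)
	}
	return parts[0], parts[1], nil
}

func main() {
	ds, user, err := parseAccount("google_photos/you@gmail.com")
	fmt.Println(ds, user, err) // google_photos you@gmail.com <nil>
}
```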

If we want to use accounts that require OAuth2, we need to configure Timeliner with OAuth2 app credentials. You can learn which data sources need OAuth2 and what their configuration looks like by reading their wiki page. By default, Timeliner will try to load timeliner.toml from the current directory, but you can use the -config flag to change that. Here's a sample timeliner.toml file for authenticating with Google:

[oauth2.providers.google]
client_id = "YOUR_APP_ID"
client_secret = "YOUR_APP_SECRET"
auth_url = "https://accounts.google.com/o/oauth2/auth"
token_url = "https://accounts.google.com/o/oauth2/token"

With that file in place, let's create an account to store our Google Photos:

$ timeliner add-account google_photos/you@gmail.com

This will open your browser window to authenticate with OAuth2.

You will notice that a folder called timeliner_repo was created in the current directory. This is your timeline. You can move it around if you want, and then use the -repo flag to work with that timeline.

Now let's get all our stuff from Google Photos. And I mean, all of it. It's ours, after all:

$ timeliner get-all google_photos/you@gmail.com

(You can list multiple accounts on a single command, except import commands.)

This process can take weeks if you have a large library. Even if you have a fast Internet connection, the client is carefully rate-limited to be a good API citizen, so the process will be slow.

If you open your timeline folder in a file browser, you will see it start to fill up with your photos from Google Photos. To see more verbose logging, use the -v flag (NOTE: this will drastically slow down processing that isn't bottlenecked by the network).

Data sources may create checkpoints as they go. If so, get-all or get-latest will automatically resume the last listing if it was interrupted, but only if the same command is repeated (you can't resume a get-latest with get-all, for example, or with different timeframe parameters). In the case of Google Photos, each page of API results is checkpointed. Checkpoints are not intended for long-term pauses. In other words, a resume should happen fairly shortly after being interrupted, and should be resumed using the same command as before. (A checkpoint will be automatically resumed only if the command parameters are identical.)

Item processing is idempotent, so as long as items have faithfully-unique IDs from their account, items that already exist in the timeline will be skipped and/or processed much faster.

Constraining within a timeframe

You can use the -start and -end flags to specify either absolute dates within which to constrain data collection, or with duration values to specify a date relative to the current timestamp. These flags appear before the subcommand.

To get all the items newer than a certain date:

$ timeliner -start=2019/07/01 get-all ...

This will get all items dated July 1, 2019 or newer.

To get all items older than a certain date:

$ timeliner -end=2020/02/29 get-all ...

This processes all items before February 29, 2020.

To create a bounded window, use both:

$ timeliner -start=2019/07/01 -end=2020/02/29 get-all ...

Durations can be used for relative dates. To get all items that are at least 30 days old:

$ timeliner -end=-720h get-all ...

Notice how the duration value is negative; this is because you want the end date to be 720 hours (30 days) in the past, not in the future.
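The two kinds of values can be told apart mechanically. The sketch below is one way to do it, not Timeliner's actual flag parsing: the parseBound and demoBound names are hypothetical, and the YYYY/MM/DD layout is an assumption inferred from the examples above.

```go
package main

import (
	"fmt"
	"time"
)

// dateLayout is an assumed layout matching the examples above (YYYY/MM/DD).
const dateLayout = "2006/01/02"

// parseBound interprets a flag value either as a duration relative to now
// (for example "-720h") or as an absolute date.
func parseBound(value string, now time.Time) (time.Time, error) {
	if d, err := time.ParseDuration(value); err == nil {
		return now.Add(d), nil
	}
	return time.Parse(dateLayout, value)
}

// demoBound resolves value against a fixed "now" (2020-03-30 UTC) so the
// result is reproducible, and formats it with the same layout.
func demoBound(value string) (string, error) {
	now := time.Date(2020, time.March, 30, 0, 0, 0, 0, time.UTC)
	t, err := parseBound(value, now)
	if err != nil {
		return "", err
	}
	return t.Format(dateLayout), nil
}

func main() {
	for _, v := range []string{"2019/07/01", "-720h"} {
		s, err := demoBound(v)
		fmt.Println(v, "=>", s, err)
	}
}
```

With the fixed clock used here, -720h resolves to 2020/02/29, which is 30 days before 2020/03/30.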

Pulling the latest

Once your initial download completes, you can run Timeliner so that only the latest items are retrieved:

$ timeliner get-latest google_photos/you@gmail.com

This will get only the items timestamped newer than the newest item in your timeline (from the last successful run).

If get-latest is interrupted after adding some newer items to the timeline, the next run of get-latest will not stop at the first new item added last time; it is smart enough to know that it was interrupted and needs to keep getting items all the way until the beginning of the last successful run, as long as the command's parameters are the same. For example, re-running the last command will automatically resume where it left off; but changing the -end flag, for example, won't be able to resume.

This subcommand supports the -end flag, but not the -start flag (since the start is determined from the last downloaded item). One thing I like to do is use -end=-720h with my Google Photos to only download the latest photos that are at least 30 days old. This gives me a month to delete unwanted/duplicate photos from my cloud library before I store them on my computer permanently.

Duplicate items

Timeliner often encounters the same items multiple times. By default, it skips items with the same ID as one already stored in the timeline because it is faster and more efficient, but you can also configure it to "reprocess" or "merge" duplicate items. These two concepts are distinct and important.

Reprocessing is when Timeliner completely replaces an existing item with a new one.

Merging is when Timeliner combines a new item's data with an existing item.

Neither happens by default because they can be less efficient or cause undesired results. In other words: by default, Timeliner will only download and process an item once. This makes its get-all, get-latest, and import commands idempotent.
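The default skip-by-ID behavior can be pictured with a toy in-memory store. This is only a sketch of the idea: the store type is hypothetical, and the real timeline is backed by SQLite rather than a map.

```go
package main

import "fmt"

// store is a toy in-memory stand-in for a timeline, keyed by item ID.
type store map[string]string

// add stores content under id only if the ID is new, and reports
// whether anything was stored.
func (s store) add(id, content string) bool {
	if _, exists := s[id]; exists {
		return false // same ID already stored: skip
	}
	s[id] = content
	return true
}

func main() {
	s := store{}
	fmt.Println(s.add("photo-1", "first copy"))  // true
	fmt.Println(s.add("photo-1", "second copy")) // false
	fmt.Println(s["photo-1"])                    // first copy
}
```

Running the same additions again leaves the store unchanged, which is what makes repeated runs idempotent.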

Reprocessing

Reprocessing replaces items with the same ID. This happens if one of the following conditions is met:

  • You run with the -integrity flag which enables integrity checks, and an item's data file fails the integrity check. In that case, the item will be reprocessed to restore its correct data.

  • The item has changed on the data source and the data source indicates this change somehow. However, very few (if any?) data sources actually provide a hash or ETag to help us compare whether a resource has changed.

  • You run with the -reprocess flag. This does a "full reprocess" (or "forced reprocess") which indiscriminately reprocesses every item, just in case it changed. In other words, a forced reprocess will update your local copy with the source's latest for every item. This is often used because a data source might not provide enough information to automatically determine whether an item has changed. If you know you have changed your items on the data source, you could specify this flag to force Timeliner to update everything.

Merging

Merging combines two items without completely replacing the old item. Merges are additive: they'll never replace a field with a null value. By default, merges only add data that was missing and will not overwrite existing data (but this is configurable).
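A minimal sketch of these additive semantics for a single optional field follows. The mergeField helper and its preferNew parameter are hypothetical illustrations of the rules just described, not Timeliner's implementation.

```go
package main

import "fmt"

// mergeField combines one optional field from an existing item and an
// incoming item. A nil incoming value never replaces anything; an existing
// value is kept unless preferNew is set.
func mergeField(existing, incoming *string, preferNew bool) *string {
	if incoming == nil {
		return existing // never replace a value with null
	}
	if existing == nil || preferNew {
		return incoming // fill in missing data, or overwrite when configured
	}
	return existing
}

// str returns a pointer to s, for building optional values.
func str(s string) *string { return &s }

// show renders an optional value for printing.
func show(p *string) string {
	if p == nil {
		return "<nil>"
	}
	return *p
}

func main() {
	fmt.Println(show(mergeField(nil, str("new"), false)))        // new
	fmt.Println(show(mergeField(str("old"), str("new"), false))) // old
	fmt.Println(show(mergeField(str("old"), str("new"), true)))  // new
	fmt.Println(show(mergeField(str("old"), nil, true)))         // old
}
```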

In theory, any two items can be merged, even if they don't have the same ID. Currently, the only way to trigger a merge is to enable "soft merging" which allows Timeliner to treat two items with different IDs as identical if ALL of these are true:

  • They have the same account (same data source)
  • They have the same timestamp
  • They have either the same text data OR the same data file name
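The three conditions above can be written as a single predicate. In this sketch the item struct and softMatch function are hypothetical, and treating empty text or an empty file name as "no match" is an assumption rather than documented behavior.

```go
package main

import (
	"fmt"
	"time"
)

// item holds only the fields relevant to soft merging; it is a
// hypothetical stand-in, not Timeliner's stored row.
type item struct {
	AccountID int64
	Timestamp time.Time
	Text      string
	FileName  string
}

// softMatch reports whether two items with different IDs would be treated
// as identical: same account, same timestamp, and either the same text or
// the same data file name.
func softMatch(a, b item) bool {
	if a.AccountID != b.AccountID || !a.Timestamp.Equal(b.Timestamp) {
		return false
	}
	// Assumption: empty values do not count as matching.
	sameText := a.Text != "" && a.Text == b.Text
	sameFile := a.FileName != "" && a.FileName == b.FileName
	return sameText || sameFile
}

func main() {
	ts := time.Date(2019, time.July, 1, 12, 0, 0, 0, time.UTC)
	fromAPI := item{AccountID: 1, Timestamp: ts, FileName: "IMG_0001.jpg"}
	fromArchive := item{AccountID: 1, Timestamp: ts, FileName: "IMG_0001.jpg"}
	fmt.Println(softMatch(fromAPI, fromArchive)) // true
}
```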

Merging can be enabled and customized with the -merge flag. This flag accepts a comma-separated list of merge options:

  • soft (required): Enables soft merging. Currently, this is the only way to enable merging at all.
  • id: Prefer new item's ID
  • text: Prefer new item's text data
  • file: Prefer new item's data file
  • meta: Prefer new item's metadata

Soft merging simply updates the ID of either the existing, stored item or the new, incoming item to be the same as the other. (As with other fields, the ID of the existing item will be preferred by default, meaning the ID of the new item will be adjusted to match it.)

Example: I often use soft merging with Google Photos. Because the Google Photos API strips location data (grrr), I also use Google Takeout to import an archive of my photos. This adds the location data. However, although the archive has coordinate data, it does NOT have IDs like the Google Photos API provides. Thus, soft merging prevents a duplication of my photo library in my timeline.

To illustrate, I schedule this command to run regularly:

$ timeliner -merge=soft,id,meta -end=-720h get-latest google_photos/me

This uses the API to pull the latest photos that are at least 30 days old, so I have time to delete unwanted photos from my library first. Notably, I enable soft merging and prefer the IDs and metadata given by the Google Photos API because they are richer and more precise.

Occasionally I will use Takeout to download an archive to add location data to my timeline, which I import like this:

$ timeliner -merge=soft import takeout.tgz google_photos/me

Note that soft merging is still enabled, but I always prefer existing data when doing this because all I want to do is fill in the missing location data.

This pattern takes advantage of soft merging and allows me to completely back up my Photos library locally, complete with location data, using both the API and Google Takeout.

Pruning your timeline

Suppose you downloaded a bunch of photos with Timeliner that you later deleted from Google Photos. Timeliner can remove those items from your local timeline, too, to save disk space and keep things clean.

To schedule a prune, just run with the -prune flag:

$ timeliner -prune get-all ...

However, this involves doing a complete listing of all the items. Pruning happens at the end. Any items not seen in the listing will be deleted. This also means that a full, uninterrupted listing is required, since resuming from a checkpoint yields an incomplete file listing. Pruning after a resumed listing will result in an error. (There's a TODO to improve this situation -- feel free to contribute! We just need to preserve the item listing along with the checkpoint.)
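Conceptually, pruning is a set difference between what is stored and what a complete listing saw. The sketch below uses a hypothetical itemsToPrune helper to show why an incomplete listing is dangerous: anything missing from the seen set is marked for deletion.

```go
package main

import (
	"fmt"
	"sort"
)

// itemsToPrune returns the stored IDs that did not appear in the listing.
// If the listing is incomplete, items that still exist on the source would
// wrongly end up in the result.
func itemsToPrune(stored []string, seen map[string]bool) []string {
	var prune []string
	for _, id := range stored {
		if !seen[id] {
			prune = append(prune, id)
		}
	}
	sort.Strings(prune) // stable output for display
	return prune
}

func main() {
	stored := []string{"a", "b", "c"}
	seen := map[string]bool{"a": true, "c": true}
	fmt.Println(itemsToPrune(stored, seen)) // [b]
}
```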

Beware! If your timeline has extra items added from auxiliary sources (for example, using import with an archive file in addition to the regular API pulls), the prune operation may not see those extra items and thus delete them. Always back up your timeline before doing a prune.

Reauthenticating with a data source

Some data sources (Facebook) expire tokens that don't have recent user interactions. Every 2-3 months, you may need to reauthenticate:

$ timeliner reauth facebook/you

See the wiki for each data source to know if you need to reauthenticate and how to do so. Sometimes you have to go to the data source itself and authorize a reauthentication first.

More information about each data source

Congratulations, you've graduated to the wiki pages to learn more about how to set up and use each data source.

Motivation and long-term vision

The motivation for this project is two-fold. Both press upon me with a sense of urgency, which is why I dedicated some nights and weekends to work on this.

  1. Connecting with my family -- both living and deceased -- is important to me and my close relatives. But I wish we had more insights into the lives and values of those who came before us. What better time than right now to start collecting personal histories from all available sources and develop a rich timeline of our life for our family, and maybe even for our own reference or nostalgia?

  2. Our lives are better-documented than any before us, but the documentation is more ephemeral than any before us, too. We lose control of our data by relying on centralized, proprietary cloud services which are useful today, and gone tomorrow. I wrote Timeliner because now is the time to liberate my data from corporations who don't own it, yet who have the only copy of it. This reality has made me feel uneasy for years, and it's not going away soon. Timeliner makes it bearable.

Imagine being able to pull up a single screen with your data from any and all of your online accounts and services -- while offline. And there you see so many aspects of your life at a glance: your photos and videos, social media posts, locations on a map and how you got there, emails and letters, documents, health and physical activities, and even your GitHub projects (if you're like me), for any given day. You can "zoom out" and get the big picture. Machine learning algorithms could suggest major clusters based on your content to summarize your days, months, or years, and from that, even recommend printing physical memorabilia. It's like a highly-detailed, automated journal, fully in your control, which you can add to in the app: augment it with your own thoughts like a regular journal.

Then cross-reference your own timeline with a global public timeline: see how locations you went to changed over time, or what major news events may have affected you, or what the political/social climate was like at the time.

Or translate the projection sideways, and instead of looking at time cross-sections, look at cross-sections of your timeline by media type: photos, posts, location, sentiment. Look at plots, charts, graphs, of your physical activity.

And all of this runs on your own computer: no one else has access to it, no one else owns it, but you.

Viewing your Timeline

There is not yet a viewer for the timeline. For now, I've just been using Table Plus to browse the SQLite database, and my file browser to look at the files in it. The important thing is that you have them, at least.

However, a viewer would be really cool. It's something I've been wanting to do but don't have time for right now. Contributions are welcomed along these lines, but this feature must be thoroughly discussed before any pull requests will be accepted to implement a timeline viewer. Thanks!

Notes

Yeah, I know this is very similar to what Perkeep does. Perkeep is a way cooler project in my opinion. However, Perkeep is more about storage and sync, whereas Timeliner is more focused on constructing relationships between items and projecting your digital life onto a single timeline. If Perkeep is my unified personal data storage, then Timeliner is my automatic journal. (Believe me, my heart sank after I realized that I was almost rewriting parts of Perkeep, until I decided that the two are different enough to warrant a separate project.)

License

This project is licensed with AGPL. I chose this license because I do not want others to make proprietary software using this package. The point of this project is liberation of and control over one's own, personal data, and I want to ensure that this project won't be used in anything that would perpetuate the walled garden dilemma we already face today.

Documentation

Index

Constants

This section is empty.

Variables

var (
	RelReplyTo  = Relation{Label: "reply_to", Bidirectional: false}      // "<from> is in reply to <to>"
	RelAttached = Relation{Label: "attached", Bidirectional: true}       // "<to|from> is attached to <from|to>"
	RelQuotes   = Relation{Label: "quotes", Bidirectional: false}        // "<from> quotes <to>"
	RelCCed     = Relation{Label: "carbon_copied", Bidirectional: false} // "<from_item> is carbon-copied to <to_person>"
)

These are the standard relationships that Timeliner recognizes. Using these known relationships is not required, but it makes it easier to translate them to human-friendly phrases when visualizing the timeline.

var OAuth2AppSource func(providerID string, scopes []string) (oauth2client.App, error)

OAuth2AppSource returns an oauth2client.App for the OAuth2 provider with the given ID. Programs using data sources that authenticate with OAuth2 MUST set this variable, or the program will panic.

Functions

func Checkpoint

func Checkpoint(ctx context.Context, checkpoint []byte)

Checkpoint saves a checkpoint for the processing associated with the provided context. It overwrites any previous checkpoint. Any errors are logged.

func FakeCloser

func FakeCloser(r io.Reader) io.ReadCloser

FakeCloser turns an io.Reader into an io.ReadCloser where the Close() method does nothing.
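For readers unfamiliar with the pattern, a wrapper like this can be built by embedding the reader and adding a no-op Close method. The sketch below is a stand-alone illustration with hypothetical names (nopCloser, fakeCloser, readAll), not the package's source; the standard library's io.NopCloser (Go 1.16 and later) behaves similarly.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// nopCloser embeds an io.Reader and adds a Close method that does nothing.
type nopCloser struct{ io.Reader }

// Close satisfies io.Closer without releasing anything.
func (nopCloser) Close() error { return nil }

// fakeCloser wraps r so it can be passed where an io.ReadCloser is required.
func fakeCloser(r io.Reader) io.ReadCloser { return nopCloser{r} }

// readAll reads a string back through the wrapper, closing it afterwards.
func readAll(s string) (string, error) {
	rc := fakeCloser(strings.NewReader(s))
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

func main() {
	got, err := readAll("hello")
	fmt.Println(got, err) // hello <nil>
}
```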

func MarshalGob

func MarshalGob(v interface{}) ([]byte, error)

MarshalGob is a convenient way to gob-encode v.

func RegisterDataSource

func RegisterDataSource(ds DataSource) error

RegisterDataSource registers ds as a data source.

func UnmarshalGob

func UnmarshalGob(data []byte, v interface{}) error

UnmarshalGob is a convenient way to gob-decode data into v.
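A gob round trip of a checkpoint-like value might look like the sketch below. It uses only the standard encoding/gob package; the checkpoint struct and the lowercase marshalGob and unmarshalGob helpers are hypothetical stand-ins, and the package's own functions may be implemented differently.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// checkpoint is a hypothetical state a client might save in order to resume.
type checkpoint struct {
	NextPageToken string
	ItemsSeen     int
}

// marshalGob gob-encodes v into a byte slice.
func marshalGob(v interface{}) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(v); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// unmarshalGob gob-decodes data into v, which must be a pointer.
func unmarshalGob(data []byte, v interface{}) error {
	return gob.NewDecoder(bytes.NewReader(data)).Decode(v)
}

// roundTrip encodes and then decodes a checkpoint.
func roundTrip(in checkpoint) (checkpoint, error) {
	data, err := marshalGob(in)
	if err != nil {
		return checkpoint{}, err
	}
	var out checkpoint
	err = unmarshalGob(data, &out)
	return out, err
}

func main() {
	out, err := roundTrip(checkpoint{NextPageToken: "abc", ItemsSeen: 42})
	fmt.Println(out.NextPageToken, out.ItemsSeen, err) // abc 42 <nil>
}
```

The resulting byte slice is the kind of value that could be passed to Checkpoint or returned from an AuthenticateFn.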

Types

type Account

type Account struct {
	ID           int64
	DataSourceID string
	UserID       string
	// contains filtered or unexported fields
}

Account represents an account with a service.

func (Account) NewHTTPClient

func (acc Account) NewHTTPClient() (*http.Client, error)

NewHTTPClient returns an HTTP client that is suitable for use with an API associated with the account's data source. If OAuth2 is configured for the data source, the client has OAuth2 credentials. If a rate limit is configured, this client is rate limited. A sane default timeout is set, and any fields on the returned Client value can be modified as needed.

func (Account) NewOAuth2HTTPClient

func (acc Account) NewOAuth2HTTPClient() (*http.Client, error)

NewOAuth2HTTPClient returns a new HTTP client which performs HTTP requests that are authenticated with an oauth2.Token stored with the account acc.

func (Account) NewRateLimitedRoundTripper

func (acc Account) NewRateLimitedRoundTripper(rt http.RoundTripper) http.RoundTripper

NewRateLimitedRoundTripper adds rate limiting to rt based on the rate limiting policy registered by the data source associated with acc.

func (Account) String

func (acc Account) String() string

type AuthenticateFn

type AuthenticateFn func(userID string) ([]byte, error)

AuthenticateFn is a function that authenticates userID with a service. It returns the authorization or credentials needed to operate. The return value should be byte-encoded so it can be stored in the DB to be reused. To store arbitrary types, encode the value as a gob, for example.

type CheckpointFn

type CheckpointFn func(checkpoint []byte) error

CheckpointFn is a function that saves a checkpoint.

type Client

type Client interface {
	// ListItems lists the items on the account. Items should be
	// sent on itemChan as they are discovered, but related items
	// should be combined onto a single ItemGraph so that their
	// relationships can be stored. If the relationships are not
	// discovered until later, that's OK: item processing is
	// idempotent, so repeating an item from earlier will have no
	// adverse effects (this is possible because a unique ID is
	// required for each item).
	//
	// Implementations must honor the context's cancellation. If
	// ctx.Done() is closed, the function should return. Typically,
	// this is done by having an outer loop select over ctx.Done()
	// and default, where the next page or set of items is handled
	// in the default case.
	//
	// ListItems MUST close itemChan when returning. A
	// `defer close(itemChan)` will usually suffice. Closing
	// this channel signals to the processing goroutine that
	// no more items are coming.
	//
	// Further options for listing items may be passed in opt.
	//
	// If opt.Filename is specified, the implementation is expected
	// to open and list items from that file. If this is not
	// supported, an error should be returned. Conversely, if a
	// filename is not specified but required, an error should be
	// returned.
	//
	// opt.Timeframe consists of two optional timestamp and/or item
	// ID values. If set, item listings should be bounded in the
	// respective direction by that timestamp / item ID. (Items
	// are assumed to be part of a chronology; both timestamp and
	// item ID *may be* provided, when possible, to accommodate
	// data sources which do not constrain by timestamp but which
	// do by item ID instead.) The respective time and item ID
	// fields, if set, will not be in conflict, so either may be
	// used if both are present. While it should be documented if
	// timeframes are not supported, an error need not be returned
	// if they cannot be honored.
	//
	// opt.Checkpoint consists of the last checkpoint for this
	// account if the last call to ListItems did not finish and
	// if a checkpoint was saved. If not nil, the checkpoint
	// should be used to resume the listing instead of starting
	// over from the beginning. Checkpoint values usually consist
	// of page tokens or whatever state is required to resume. Call
	// timeliner.Checkpoint to set a checkpoint. Checkpoints are not
	// required, but if the implementation sets checkpoints, it
	// should be able to resume from one, too.
	ListItems(ctx context.Context, itemChan chan<- *ItemGraph, opt ListingOptions) error
}

Client is a type that can interact with a data source.
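The control flow that the ListItems comments ask for (close the channel on return, check the context between pages) can be sketched without the real package. Everything here is a stand-in: itemGraph replaces timeliner.ItemGraph, the pages argument replaces API calls, and listing options are omitted.

```go
package main

import (
	"context"
	"fmt"
)

// itemGraph is a hypothetical stand-in for timeliner.ItemGraph.
type itemGraph struct{ ID string }

// listItems sends one itemGraph per ID, page by page, and stops early if
// the context is cancelled. It always closes itemChan before returning.
func listItems(ctx context.Context, itemChan chan<- *itemGraph, pages [][]string) error {
	defer close(itemChan)

	for _, page := range pages {
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
			// Handle the next page of items.
			for _, id := range page {
				select {
				case itemChan <- &itemGraph{ID: id}:
				case <-ctx.Done():
					return ctx.Err()
				}
			}
		}
	}
	return nil
}

// collectAll plays the role of the processing goroutine's counterpart:
// it runs listItems and drains the channel until it is closed.
func collectAll(pages [][]string) ([]string, error) {
	itemChan := make(chan *itemGraph)
	errChan := make(chan error, 1)
	go func() { errChan <- listItems(context.Background(), itemChan, pages) }()

	var ids []string
	for ig := range itemChan {
		ids = append(ids, ig.ID)
	}
	return ids, <-errChan
}

func main() {
	ids, err := collectAll([][]string{{"a", "b"}, {"c"}})
	fmt.Println(ids, err) // [a b c] <nil>
}
```

The inner select guards each send so that a cancelled context cannot leave the listing blocked on a channel nobody is reading.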

type Collection

type Collection struct {
	// The ID of the collection as given
	// by the service; for example, the
	// album ID. If the service does not
	// provide an ID for the collection,
	// invent one such that the next time
	// the collection is encountered and
	// processed, its ID will be the same.
	// An ID is necessary here to ensure
	// uniqueness.
	//
	// REQUIRED.
	OriginalID string

	// The name of the collection as
	// given by the service; for example,
	// the album title.
	//
	// Optional.
	Name *string

	// The description, caption, or any
	// other relevant text describing
	// the collection.
	//
	// Optional.
	Description *string

	// The items for the collection;
	// if ordering is significant,
	// specify each item's Position
	// field; the order of elements
	// of this slice will not be
	// considered important.
	Items []CollectionItem
}

Collection represents a group of items, like an album.

type CollectionItem

type CollectionItem struct {
	// The item to add to the collection.
	Item Item

	// Specify if ordering is important.
	Position int
	// contains filtered or unexported fields
}

CollectionItem represents an item stored in a collection.

type DataSource

type DataSource struct {
	// A snake_cased name of the service
	// that uniquely identifies it from
	// all others.
	ID string

	// The human-readable or brand name of
	// the service.
	Name string

	// If the service authenticates with
	// OAuth2, fill out this field.
	OAuth2 OAuth2

	// Otherwise, if the service uses some
	// other form of authentication,
	// Authenticate is a function which
	// returns the credentials needed to
	// access an account on the service.
	Authenticate AuthenticateFn

	// If the service enforces a rate limit,
	// specify it here. You can abide by it by
	// getting an http.Client from the
	// Account passed into NewClient.
	RateLimit RateLimit

	// NewClient is a function which takes
	// information about the account and
	// returns a type which can facilitate
	// transactions with the service.
	NewClient NewClientFn
}

DataSource has information about a data source that can be registered.

type Item

type Item interface {
	// The unique ID of the item assigned by the service.
	// If the service does not assign one, then invent
	// one such that the ID is unique to the content or
	// substance of the item (for example, an ID derived
	// from timestamp or from the actual content of the
	// item -- whatever makes it unique). The ID need
	// only be unique for the account it is associated
	// with, although more unique is, of course, acceptable.
	//
	// REQUIRED.
	ID() string

	// The originating timestamp of the item, which
	// may be different from when the item was posted
	// or created. For example, a photo may be taken
	// one day but uploaded a week later. Prefer the
	// time when the original item content was captured.
	//
	// REQUIRED.
	Timestamp() time.Time

	// A classification of the item's kind.
	//
	// REQUIRED.
	Class() ItemClass

	// The user/account ID of the owner or
	// originator of the content, along with their
	// username or real name. The ID is used to
	// relate the item with the person behind it;
	// the name is used to make the person
	// recognizable to the human reader. If the
	// ID is nil, the current account owner will
	// be assumed. (Use the ID as given by the
	// data source.) If the data source only
	// provides a name but no ID, you may return
	// the name as the ID with the understanding
	// that a different name will be counted as a
	// different person. You may also return the
	// name as the name and leave the ID nil and
	// have correct results if it is safe to assume
	// the name belongs to the current account owner.
	Owner() (id *string, name *string)

	// Returns the text of the item, if any.
	// This field is indexed in the DB, so don't
	// use for unimportant metadata or huge
	// swaths of text; if there is a large
	// amount of text, use an item file instead.
	DataText() (*string, error)

	// For primary content which is not text or
	// which is too large to be stored well in a
	// database, the content can be downloaded
	// into a file. If so, the following methods
	// should return the necessary information,
	// if available from the service, so that a
	// data file can be obtained, stored, and
	// later read successfully.
	//
	// DataFileName returns the filename (NOT full
	// path or URL) of the file; prefer the original
	// filename if it originated as a file. If the
	// filename is not unique on disk when downloaded,
	// it will be made unique by modifying it. If
	// this value is nil/empty, a filename will be
	// generated from the item's other data.
	//
	// DataFileReader returns a way to read the data.
	// It will be closed when the read is completed.
	//
	// DataFileHash returns the checksum of the
	// content as provided by the service. If the
	// service (or data source) does not provide a
	// hash, leave this field empty, but note that
	// later it will be impossible to efficiently
	// know whether the content has changed on the
	// service from what is stored locally.
	//
	// DataFileMIMEType returns the MIME type of
	// the data file, if known.
	DataFileName() *string
	DataFileReader() (io.ReadCloser, error)
	DataFileHash() []byte
	DataFileMIMEType() *string

	// Metadata returns any optional metadata.
	// Feel free to leave as many fields empty
	// as you'd like: the fewer fields that are
	// filled out, the smaller the storage size.
	// Metadata is not indexed by the DB but is
	// rendered in projections and queries
	// according to the item's classification.
	Metadata() (*Metadata, error)

	// Location returns an item's location,
	// if known. For now, only Earth
	// coordinates are accepted, but we can
	// improve this later.
	Location() (*Location, error)
}

Item is the central concept of a piece of content from a service or data source. Take note of which methods are required to return non-empty values.

The actual content of an item is stored either in the database or on disk as a file. Generally, content that is text-encoded can and should be stored in the database, where it will be indexed. However, if the item's content is not text (for example, the bytes of a photo or video) or if the text is too large to store well in a database (for example, an entire novel), it should be stored on disk. This interface has methods to accommodate both. Note that an item may have both text and non-text content, too: for example, photos and videos may have descriptions that are as much "content" as the media itself. No part of an item is mutually exclusive with any other.

type ItemClass

type ItemClass int

ItemClass classifies an item.

const (
	ClassUnknown ItemClass = iota
	ClassImage
	ClassVideo
	ClassAudio
	ClassPost
	ClassLocation
	ClassEmail
	ClassPrivateMessage
	ClassMessage
)

Various classes of items.

type ItemGraph

type ItemGraph struct {
	// The node item. This can be nil, but note that
	// Edges will not be traversed if Node is nil,
	// because there must be a node on both ends of
	// an edge.
	//
	// Optional.
	Node Item

	// Edges are represented as 1:many relations
	// to other "graphs" (nodes in the graph).
	// Fill this out to add multiple items to the
	// timeline at once, while drawing the
	// designated relationships between them.
	// Useful when processing related items in
	// batches.
	//
	// Directional relationships go from Node to
	// the map key.
	//
	// If the items involved in a relationship are
	// not efficiently available at the same time
	// (i.e. if loading both items involved in the
	// relationship would take a non-trivial amount
	// of time or API calls), you can use the
	// Relations field instead, but only after the
	// items have been added to the timeline.
	//
	// Optional.
	Edges map[*ItemGraph][]Relation

	// If items in the graph belong to a collection,
	// specify them here. If the collection does not
	// exist (by row ID or AccountID+OriginalID), it
	// will be created. If it already exists, the
	// collection in the DB will be unioned with the
	// collection specified here. Collections are
	// processed regardless of Node and Edges.
	//
	// Optional.
	Collections []Collection

	// Relationships between existing items in the
	// timeline can be represented here in a list
	// of item IDs that are connected by a label.
	// This field is useful when relationships and
	// the items involved in them are not discovered
	// at the same time. Relations in this list will
	// be added to the timeline, joined by the item
	// IDs described in the RawRelations, only if
	// the items having those IDs (as provided by
	// the data source; we're not talking about DB
	// row IDs here) already exist in the timeline.
	// In other words, this is a best-effort field;
	// useful for forming relationships of existing
	// items, but without access to the actual items
	// themselves. If you have the items involved in
	// the relationships, use Edges instead.
	//
	// Optional.
	Relations []RawRelation
}

ItemGraph is an item with optional connections to other items. All ItemGraph values should be pointers to ensure consistency. The usual weird/fun thing about representing graph data structures in memory is that a graph is a node, and a node is a graph. 🤓

func NewItemGraph

func NewItemGraph(node Item) *ItemGraph

NewItemGraph returns a new node/graph.

func (*ItemGraph) Add

func (ig *ItemGraph) Add(item Item, rel Relation)

Add adds item to the graph ig by making an edge described by rel from the node ig to a new node for item.

This method is for simple inserts, where the only thing to add to the graph at this moment is a single item, since the graph it inserts contains only a single node populated by item. To add a full graph with multiple items (i.e. a graph with edges), call ig.Connect directly.

func (*ItemGraph) Connect

func (ig *ItemGraph) Connect(node *ItemGraph, rel Relation)

Connect is a simple convenience function that adds a graph (node) to ig by an edge described by rel.
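The difference between Add and Connect can be sketched with simplified local stand-ins. The `item`, `relation`, and `itemGraph` types below only imitate the shapes documented above, the method bodies are an assumption about how such a graph could be built rather than this package's implementation, and the relation labels are hypothetical.

```go
package main

import "fmt"

// item, relation, and itemGraph are simplified local stand-ins for the
// package's Item, Relation, and ItemGraph types.
type item string

type relation struct {
	Label         string
	Bidirectional bool
}

type itemGraph struct {
	Node  item
	Edges map[*itemGraph][]relation
}

func newItemGraph(node item) *itemGraph {
	return &itemGraph{Node: node}
}

// connect adds an edge described by rel from ig to an existing graph.
func (ig *itemGraph) connect(node *itemGraph, rel relation) {
	if ig.Edges == nil {
		ig.Edges = make(map[*itemGraph][]relation)
	}
	ig.Edges[node] = append(ig.Edges[node], rel)
}

// add wraps a single item in a new node and connects it to ig.
func (ig *itemGraph) add(it item, rel relation) {
	ig.connect(newItemGraph(it), rel)
}

func main() {
	attached := relation{Label: "attached"}  // hypothetical label
	replied := relation{Label: "replied_by"} // hypothetical label

	// A single related item: add is enough.
	post := newItemGraph("post")
	post.add("photo", attached)

	// A related item that has edges of its own is built as a graph
	// first and then connected as a whole.
	comment := newItemGraph("comment")
	comment.add("gif", attached)
	post.connect(comment, replied)

	fmt.Println(len(post.Edges), len(comment.Edges))
}
```

Because the map is keyed by pointer, each node must be created once and reused, which is consistent with the advice that all ItemGraph values be pointers.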

type ItemRow

type ItemRow struct {
	ID         int64
	AccountID  int64
	OriginalID string
	PersonID   int64
	Timestamp  time.Time
	Stored     time.Time
	Modified   *time.Time
	Class      ItemClass
	MIMEType   *string
	DataText   *string
	DataFile   *string
	DataHash   *string // base64-encoded SHA-256
	Metadata   *Metadata
	Location
	// contains filtered or unexported fields
}

ItemRow has the structure of an item's row in our DB.
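The DataHash field is commented as a base64-encoded SHA-256. A value of that shape could be computed as in the sketch below; the `dataHash` helper is hypothetical, and the use of standard padded base64 (rather than the URL-safe or unpadded variants) is an assumption.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// dataHash returns the base64-encoded SHA-256 digest of data.
// Standard (padded) base64 is assumed here.
func dataHash(data []byte) string {
	sum := sha256.Sum256(data)
	return base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	fmt.Println(dataHash([]byte("contents of a data file")))
}
```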

type ListingOptions

type ListingOptions struct {
	// A file from which to read the data.
	Filename string

	// Time bounds on which data to retrieve.
	// The respective time and item ID fields
	// which are set must never conflict.
	Timeframe Timeframe

	// A checkpoint from which to resume
	// item retrieval.
	Checkpoint []byte

	// Enable verbose output (logs).
	Verbose bool
}

ListingOptions specifies parameters for listing items from a data source. Some data sources might not be able to honor all fields.

type Location

type Location struct {
	Latitude  *float64
	Longitude *float64
}

Location contains location information.

type MergeOptions

type MergeOptions struct {
	// Enables "soft" merging.
	//
	// If true, an item may be merged if it is likely
	// to be the same as an existing item, even if the
	// item IDs are different. For example, if a
	// service has multiple ways of listing items, but
	// does not provide a consistent ID for the same
	// item across listings, a soft merge will allow the
	// processing to treat them as the same as long as
	// other fields match: timestamp, and either data text
	// or data filename.
	SoftMerge bool

	// Overwrite existing (old) item's ID with the ID
	// provided by the current (new) item.
	PreferNewID bool

	// Overwrite existing item's text data.
	PreferNewDataText bool

	// Overwrite existing item's data file.
	PreferNewDataFile bool

	// Overwrite existing item's metadata.
	PreferNewMetadata bool
}

MergeOptions configures how items are merged. By default, items are not merged; if an item with a duplicate ID is encountered, it will be replaced with the new item (see the "reprocess" flag). Merging has to be explicitly enabled.

Currently, the only way to perform a merge is to enable "soft" merging: finding an item with the same timestamp and either text data or filename. Then, one of the item's IDs is updated to match the other. These merge options configure how the items are then combined.

As it is possible and likely for both items to have non-empty values for the same fields, these "conflicts" must be resolved non-interactively. By default, a merge conflict prefers existing values (old item's field) over the new one, and the new one only fills in missing values. (This seems safest.) However, these merge options allow you to customize that behavior and overwrite existing values with the new item's fields (only happens if new item's field is non-empty, i.e. a merge will never delete existing data).
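The conflict rule described above can be sketched for a single text field. The `mergeText` helper is hypothetical and only illustrates the stated behavior: existing values win by default, new values fill gaps, a PreferNew option lets the new value overwrite, and an empty new value never deletes existing data.

```go
package main

import "fmt"

// mergeText resolves a conflict for one optional text field.
// preferNew plays the role of an option such as PreferNewDataText.
func mergeText(existing, incoming *string, preferNew bool) *string {
	if incoming == nil || *incoming == "" {
		return existing // a merge never deletes existing data
	}
	if existing == nil || *existing == "" {
		return incoming // the new value fills in a missing one
	}
	if preferNew {
		return incoming // explicit opt-in to overwrite
	}
	return existing // default: keep the old value
}

func main() {
	old, fresh := "old caption", "new caption"
	fmt.Println(*mergeText(&old, &fresh, false))
	fmt.Println(*mergeText(&old, &fresh, true))
	fmt.Println(*mergeText(&old, nil, true))
}
```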

type Metadata

type Metadata struct {
	// A hash or etag provided by the service to
	// make it easy to know if it has changed
	ServiceHash []byte

	// Locations
	LocationAccuracy int
	Altitude         int // meters
	AltitudeAccuracy int
	Heading          int // degrees
	Velocity         int

	GeneralArea string // natural language description of a location

	// Photos and videos
	EXIF map[string]interface{}

	Width  int
	Height int

	// TODO: Google Photos (how many of these belong in EXIF?)
	CameraMake      string
	CameraModel     string
	FocalLength     float64
	ApertureFNumber float64
	ISOEquivalent   int
	ExposureTime    time.Duration

	FPS float64 // Frames Per Second

	// Posts (Facebook so far)
	Link        string
	Description string
	Name        string
	ParentID    string
	StatusType  string
	Type        string

	Shares int // aka "Retweets" or "Reshares"
	Likes  int
}

Metadata is a unified structure for storing item metadata in the DB.

type NewClientFn

type NewClientFn func(acc Account) (Client, error)

NewClientFn is a function that returns a client which, given the account passed in, can interact with a service provider.

type OAuth2

type OAuth2 struct {
	// The ID of the service must be recognized
	// by the OAuth2 app configuration.
	ProviderID string

	// The list of scopes to ask for during auth.
	Scopes []string
}

OAuth2 defines which OAuth2 provider a service uses and which scopes it requires.

type Person

type Person struct {
	ID         int64
	Name       string
	Identities []PersonIdentity
}

Person represents a person.

type PersonIdentity

type PersonIdentity struct {
	ID           int64
	PersonID     string
	DataSourceID string
	UserID       string
}

PersonIdentity is a way to map a user ID on a service to a person.

type ProcessingOptions

type ProcessingOptions struct {
	Reprocess bool
	Prune     bool
	Integrity bool
	Timeframe Timeframe
	Merge     MergeOptions
	Verbose   bool
}

ProcessingOptions configures how item processing is carried out.

type RateLimit

type RateLimit struct {
	RequestsPerHour int
	BurstSize       int
	// contains filtered or unexported fields
}

RateLimit describes a rate limit.

type RawRelation

type RawRelation struct {
	FromItemID       string
	ToItemID         string
	FromPersonUserID string
	ToPersonUserID   string
	Relation
}

RawRelation represents a relationship between two items or people (or both) from the same data source (but not necessarily the same accounts; we assume that a data source's item IDs are globally unique across accounts). The item IDs should be those which are assigned/provided by the data source, NOT a database row ID. Likewise, the persons' user IDs should be the IDs of the user as associated with the data source, NOT their row IDs.

type Relation

type Relation struct {
	Label         string
	Bidirectional bool
}

Relation describes how two nodes in a graph are related. It's essentially an edge on a graph.

type Timeframe

type Timeframe struct {
	Since, Until             *time.Time
	SinceItemID, UntilItemID *string
}

Timeframe represents a start and end time and/or a start and end item, where any value may be nil, which means unbounded in that direction. When items are used as the timeframe boundaries, the ItemID fields will be populated. It is not guaranteed that any particular field will be set or unset just because other fields are set or unset. However, if both Since fields or both Until fields are set, the timestamp and item are correlated; i.e. the Since timestamp is (approx.) that of the Since item ID. Put another way: there will never be conflicts among the fields which are non-nil.

func (Timeframe) String

func (tf Timeframe) String() string

type Timeline

type Timeline struct {
	// contains filtered or unexported fields
}

Timeline represents an opened timeline repository. The zero value is NOT valid; use Open() to obtain a valid value.

func Open

func Open(repo string) (*Timeline, error)

Open creates/opens a timeline at the given repository directory. Timelines should always be Close()'d for a clean shutdown when done.

func (*Timeline) AddAccount

func (t *Timeline) AddAccount(dataSourceID, userID string) error

AddAccount authenticates userID with the service identified within the application by dataSourceID, and then stores it in the database. The account must not yet exist.

func (*Timeline) Authenticate

func (t *Timeline) Authenticate(dataSourceID, userID string) error

Authenticate gets authentication for userID with dataSourceID. If the account already exists in the database, it will be updated with the latest authorization.

func (*Timeline) Close

func (t *Timeline) Close() error

Close frees up resources allocated from Open.

func (*Timeline) NewClient

func (t *Timeline) NewClient(dataSourceID, userID string) (WrappedClient, error)

NewClient returns a new Client that is ready to interact with the data source for the account uniquely specified by the data source ID and the user ID for that data source. The Client is actually wrapped by a type with unexported fields that are necessary for internal use.

type WrappedClient

type WrappedClient struct {
	Client
	// contains filtered or unexported fields
}

WrappedClient wraps a Client instance with unexported fields that contain necessary state for performing data collection operations. Do not craft this type manually; use Timeline.NewClient() to obtain one.

func (*WrappedClient) DataSourceID

func (wc *WrappedClient) DataSourceID() string

DataSourceID returns the ID of the data source wc was created from.

func (*WrappedClient) DataSourceName

func (wc *WrappedClient) DataSourceName() string

DataSourceName returns the name of the data source wc was created from.

func (*WrappedClient) GetAll

func (wc *WrappedClient) GetAll(ctx context.Context, procOpt ProcessingOptions) error

GetAll gets all the items using wc. If procOpt.Reprocess is true, items that are already in the timeline will be re-processed. If procOpt.Prune is true, items that are not listed on the data source by wc will be removed from the timeline at the end of the listing. If procOpt.Integrity is true, all items that are listed by wc that exist in the timeline and which consist of a data file will be opened and checked for integrity; if the file has changed, it will be reprocessed.

func (*WrappedClient) GetLatest

func (wc *WrappedClient) GetLatest(ctx context.Context, procOpt ProcessingOptions) error

GetLatest gets the most recent items from wc. It does not prune or reprocess; only meant for a quick pull (error will be returned if procOpt is not compatible). If there are no items pulled yet, all items will be pulled. If procOpt.Timeframe.Until is not nil, the latest only up to that timestamp will be pulled, and if until is after the latest item, no items will be pulled.
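The compatibility rule mentioned above (GetLatest does not prune or reprocess) could be checked up front, as in this sketch. The `processingOptions` type copies only two flags from ProcessingOptions, and `checkLatestOptions` is a hypothetical helper; the package's own validation may reject other options as well.

```go
package main

import (
	"errors"
	"fmt"
)

// processingOptions copies only the flags relevant to this sketch.
type processingOptions struct {
	Reprocess bool
	Prune     bool
}

// checkLatestOptions rejects options that a quick, latest-only pull
// is documented as not supporting.
func checkLatestOptions(opt processingOptions) error {
	if opt.Reprocess || opt.Prune {
		return errors.New("a latest-only pull cannot reprocess or prune")
	}
	return nil
}

func main() {
	fmt.Println(checkLatestOptions(processingOptions{}))
	fmt.Println(checkLatestOptions(processingOptions{Prune: true}))
}
```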

func (*WrappedClient) Import

func (wc *WrappedClient) Import(ctx context.Context, filename string, procOpt ProcessingOptions) error

Import is like GetAll but for a locally-stored archive or export file that can simply be opened and processed, rather than needing to run over a network. See the godoc for GetAll. This is only for data sources that support Import.

func (*WrappedClient) UserID

func (wc *WrappedClient) UserID() string

UserID returns the ID of the user associated with this client.

Directories

Path Synopsis
cmd
datasources
facebook
Package facebook implements the Facebook service using the Graph API: https://developers.facebook.com/docs/graph-api
googlelocation
Package googlelocation implements a Timeliner data source for importing data from the Google Location History (aka Google Maps Timeline).
googlephotos
Package googlephotos implements the Google Photos service using its API, documented at https://developers.google.com/photos/.
instagram
Package instagram implements a Timeliner data source for importing data from Instagram archive files.
smsbackuprestore
Package smsbackuprestore implements a Timeliner data source for the Android SMS Backup & Restore app by SyncTech: https://synctech.com.au/sms-backup-restore/
twitter
Package twitter implements a Timeliner service for importing and downloading data from Twitter.
