git

package module
v3.0.4
Published: Aug 1, 2025 License: BSD-3-Clause Imports: 17 Imported by: 6

README

go-whosonfirst-iterate-git

Go package implementing whosonfirst/go-whosonfirst-iterate/v3.Iterator functionality for Git repositories.

Example

Version 3.x of this package introduces major, backward-incompatible changes from earlier releases. That said, migrating from version 2.x to 3.x should be relatively straightforward as the basic concepts are still the same but (hopefully) simplified. Where version 2.x relied on defining a custom callback for looping over records, version 3.x uses Go's iter.Seq2 iterator construct to yield records as they are encountered.

package main

import (
	"context"
	"flag"
	"log"

	// Imported for its side effect of registering the git:// scheme.
	_ "github.com/whosonfirst/go-whosonfirst-iterate-git/v3"
	"github.com/whosonfirst/go-whosonfirst-iterate/v3"
)

func main() {

	var iterator_uri string

	flag.StringVar(&iterator_uri, "iterator-uri", "git:///tmp", "A registered whosonfirst/go-whosonfirst-iterate/v3.Iterator URI.")
	flag.Parse()

	ctx := context.Background()

	iter, _ := iterate.NewIterator(ctx, iterator_uri)

	paths := flag.Args()

	for rec, _ := range iter.Iterate(ctx, paths...) {
		log.Printf("Indexing %s\n", rec.Path)
		// Close each body inside the loop rather than deferring until main returns.
		rec.Body.Close()
	}
}

Error handling removed for the sake of brevity.

Version 2.x (the old way)

This is how you would do the same thing using the older version 2.x code:

package main

import (
	"context"
	"flag"
	"io"
	"log"

	_ "github.com/whosonfirst/go-whosonfirst-iterate-git/v2"

	"github.com/whosonfirst/go-whosonfirst-iterate/v2/iterator"
)

func main() {

	emitter_uri := flag.String("emitter-uri", "git://", "A valid whosonfirst/go-whosonfirst-iterate/v2 emitter URI")

	flag.Parse()

	ctx := context.Background()

	// The callback is invoked once for every record encountered.
	emitter_cb := func(ctx context.Context, path string, fh io.ReadSeeker, args ...interface{}) error {
		log.Printf("Indexing %s\n", path)
		return nil
	}

	iter, _ := iterator.NewIterator(ctx, *emitter_uri, emitter_cb)

	uris := flag.Args()
	iter.IterateURIs(ctx, uris...)
}

Error handling omitted for the sake of brevity.

Iterators

This package exports the following iterators:

git://
git://{PATH}?{PARAMETERS}

Where {PATH} is an optional path on disk where a repository will be cloned to (default is to clone the repository in memory) and {PARAMETERS} may be:

| Name | Type | Required | Notes |
| --- | --- | --- | --- |
| include | string | no | Zero or more aaronland/go-json-query query strings containing rules that must match for a document to be considered for further processing. |
| exclude | string | no | Zero or more aaronland/go-json-query query strings containing rules that if matched will prevent a document from being considered for further processing. |
| include_mode | string | no | A valid aaronland/go-json-query query mode string for testing inclusion rules. Default is "AND". |
| exclude_mode | string | no | A valid aaronland/go-json-query query mode string for testing exclusion rules. Default is "AND". |
| preserve | boolean | no | A boolean value indicating whether a Git repository (cloned to disk) should not be removed after processing. Default is false. |
| depth | int | no | An integer value indicating the number of commits to fetch. Default is 1. |
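
As a rough illustration of how these parameters combine into a single iterator URI, here is a sketch that assembles a git:// URI with the standard library. buildGitURI is a hypothetical helper, not part of this package, and it assumes the package decodes percent-encoded query values in the usual way and that the clone path is a plain absolute path.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// buildGitURI is a hypothetical helper that assembles a git:// iterator
// URI from an optional clone path and a few of the documented parameters.
func buildGitURI(path string, includes []string, preserve bool, depth int) string {
	q := url.Values{}
	for _, inc := range includes {
		// Repeating "include" expresses multiple rules.
		q.Add("include", inc)
	}
	if preserve {
		q.Set("preserve", "1")
	}
	if depth > 0 {
		q.Set("depth", strconv.Itoa(depth))
	}
	// An empty path yields "git://", i.e. clone in memory.
	uri := "git://" + path
	if len(q) > 0 {
		// Encode percent-escapes reserved characters such as ':' and '='.
		uri += "?" + q.Encode()
	}
	return uri
}

func main() {
	fmt.Println(buildGitURI("/tmp", []string{"properties.wof:placetype=region"}, true, 0))
}
```

Building the query with url.Values avoids hand-escaping regular expressions that contain characters such as & or +.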
githubapi://
githubapi://{GITHUB_ORGANIZATION}/{GITHUB_REPO}?{PARAMETERS}

Iterate through all the files in a specific GitHub repository using the GitHub API.

Query parameters

In addition to the default go-whosonfirst-iterate query parameters the following query parameters are supported:

| Name | Value | Required | Notes |
| --- | --- | --- | --- |
| access_token | String | Yes | A valid GitHub API access token |
| branch | String | No | The branch to use when iterating the repository contents |
| concurrent | Bool | No | If true iterate through documents concurrently. There is still a throttle on the number of API requests per second but this can speed things up significantly with the risk that you will still trigger GitHub API limits. |

This iterator requires importing github.com/whosonfirst/go-whosonfirst-iterate/v3/github.

githuborganization://
githuborganization://?{PARAMETERS}

Iterate through all the files in one or more GitHub repositories for an organization. The list of repositories is derived using the GitHub API but, as of this writing, fetching the repository files to iterate over is done using the whosonfirst/go-whosonfirst-iterate-git/v3 package and plain-vanilla git clone operations rather than (or in addition to) the GitHub API.

In addition to the default go-whosonfirst-iterate query parameters the following query parameters are supported:

| Name | Value | Required | Notes |
| --- | --- | --- | --- |
| dedupe | Boolean | No | Skip Who's On First records (IDs) which may occur in multiple repositories. |

The URIs passed to this iterator's Iterate method should be recognized whosonfirst/go-whosonfirst-iterate-git URIs.

This iterator requires importing github.com/whosonfirst/go-whosonfirst-iterate/v3/github.

Filters

QueryFilters

You can also specify inline queries by appending one or more include or exclude parameters to an iterator URI, where the value is a string in the format of:

{PATH}={REGULAR EXPRESSION}

Paths follow the dot notation syntax used by the tidwall/gjson package and regular expressions are any valid Go language regular expression. Successful path lookups will be treated as a list of candidates and each candidate's string value will be tested against the regular expression's MatchString method.

For example:

repo://?include=properties.wof:placetype=region

You can pass multiple query parameters. For example:

repo://?include=properties.wof:placetype=region&include=properties.wof:name=(?i)new.*

The default query mode is to ensure that all queries match, but you can also specify that only one or more queries need to match by appending an include_mode or exclude_mode parameter where the value is either "ANY" or "ALL".
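
To make the matching rules concrete, here is a simplified sketch of how a {PATH}={REGULAR EXPRESSION} string could be split and tested under the "ALL" and "ANY" modes. It is not the aaronland/go-json-query implementation: filter, parseFilter and matches are hypothetical names, and the gjson path lookup is replaced by a plain map from path to candidate string values.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// filter is a hypothetical parsed form of "{PATH}={REGULAR EXPRESSION}".
type filter struct {
	Path string
	Re   *regexp.Regexp
}

// parseFilter splits on the first '=' only, so the expression itself
// may contain further '=' characters.
func parseFilter(raw string) (*filter, error) {
	path, expr, ok := strings.Cut(raw, "=")
	if !ok || path == "" {
		return nil, fmt.Errorf("invalid filter %q", raw)
	}
	re, err := regexp.Compile(expr)
	if err != nil {
		return nil, err
	}
	return &filter{Path: path, Re: re}, nil
}

// matches reports whether the candidates satisfy the filters. A filter
// passes if any candidate value for its path matches. In "ANY" mode one
// passing filter is enough; otherwise every filter must pass.
func matches(candidates map[string][]string, filters []*filter, mode string) bool {
	hits := 0
	for _, f := range filters {
		for _, v := range candidates[f.Path] {
			if f.Re.MatchString(v) {
				hits++
				break
			}
		}
	}
	if mode == "ANY" {
		return hits > 0
	}
	return hits == len(filters)
}

func main() {
	placetype, _ := parseFilter("properties.wof:placetype=region")
	name, _ := parseFilter("properties.wof:name=(?i)new.*")

	doc := map[string][]string{
		"properties.wof:placetype": {"region"},
		"properties.wof:name":      {"Quebec"},
	}

	fmt.Println(matches(doc, []*filter{placetype, name}, "ALL"))
	fmt.Println(matches(doc, []*filter{placetype, name}, "ANY"))
}
```

Note that MatchString is unanchored, so a pattern such as region also matches macroregion unless it is written as ^region$.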

Tools

$> make cli
go build -mod vendor -o bin/count cmd/count/main.go
go build -mod vendor -o bin/emit cmd/emit/main.go
count

Count files in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources.

$> ./bin/count -h
Count files in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources.
Usage:
	 ./bin/count [options] uri(N) uri(N)
Valid options are:

  -iterator-uri string
    	A valid whosonfirst/go-whosonfirst-iterate/v3.Iterator URI. Supported iterator URI schemes are: cwd://,directory://,featurecollection://,file://,filelist://,geojsonl://,git://,githubapi://,githuborg://,null://,repo:// (default "repo://")
  -verbose
    	Enable verbose (debug) logging.

For example:

$> ./bin/count -iterator-uri git:///tmp https://github.com/sfomuseum-data/sfomuseum-data-architecture.git
2025/06/24 05:27:43 INFO Counted records count=2019 time=7.248527917s

By default go-whosonfirst-iterate-git clones Git repositories in to memory. If your iterator URI contains a path, as git:///tmp does in the example above, then repositories will be cloned in to that path.

By default repositories cloned in to a path are removed. If you want to preserve the cloned repository include a ?preserve=1 query parameter in your URI string:

$> ./bin/count -iterator-uri 'git:///tmp?preserve=1' https://github.com/sfomuseum-data/sfomuseum-data-architecture.git
2025/06/24 05:29:22 INFO Counted records count=2019 time=4.728772811s

$> ls -al /tmp/sfomuseum-data-architecture.git 
total 48
drwxr-xr-x   9 asc   wheel    288 Jun 24 05:29 .
drwxrwxrwt   5 root  wheel    160 Jun 24 05:29 ..
drwxr-xr-x   8 asc   wheel    256 Jun 24 05:29 .git
-rw-r--r--   1 asc   wheel     14 Jun 24 05:29 .gitignore
drwxr-xr-x  18 asc   wheel    576 Jun 24 05:29 data
-rw-r--r--   1 asc   wheel  10462 Jun 24 05:29 LICENSE
-rw-r--r--   1 asc   wheel   1141 Jun 24 05:29 Makefile
drwxr-xr-x   3 asc   wheel     96 Jun 24 05:29 qgis
-rw-r--r--   1 asc   wheel    771 Jun 24 05:29 README.md
emit

Emit records in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources as structured data.

$> ./bin/emit -h
Emit records in one or more whosonfirst/go-whosonfirst-iterate/v3.Iterator sources as structured data.
Usage:
	 ./bin/emit [options] uri(N) uri(N)
Valid options are:

  -geojson
    	Emit features as a well-formed GeoJSON FeatureCollection record.
  -iterator-uri string
    	A valid whosonfirst/go-whosonfirst-iterate/v3.Iterator URI. Supported iterator URI schemes are: cwd://,directory://,featurecollection://,file://,filelist://,geojsonl://,git://,githubapi://,githuborg://,null://,repo:// (default "repo://")
  -json
    	Emit features as a well-formed JSON array.
  -null
    	Publish features to /dev/null
  -stdout
    	Publish features to STDOUT. (default true)
  -verbose
    	Enable verbose (debug) logging.

For example:

$> ./bin/emit \
	-geojson \
	-iterator-uri 'git://?include=properties.mz:is_current=1&include=properties.sfomuseum:placetype=gate' \
	https://github.com/sfomuseum-data/sfomuseum-data-architecture.git \
	| jq '.features[]["properties"]["wof:label"]'

"B1 (2021-05-25)"
"C10R (2021-05-25)"
"A13R (2021-05-25)"
"G12S (2021-05-25)"
"G12V (2021-05-25)"
"F15M (2021-05-25)"
"C4U (2021-05-25)"
...and so on

See also

Documentation

Overview

Package git implements the `whosonfirst/go-whosonfirst-iterate/v3` interfaces for iterating (Who's On First) documents stored in a Git repository.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func NewGitIterator

func NewGitIterator(ctx context.Context, uri string) (iterate.Iterator, error)

NewGitIterator() returns a new `GitIterator` instance configured by 'uri' in the form of:

git://{PATH}?{PARAMETERS}

Where {PATH} is an optional path on disk where a repository will be cloned to (default is to clone the repository in memory) and {PARAMETERS} may be:

* `?include=` Zero or more `aaronland/go-json-query` query strings containing rules that must match for a document to be considered for further processing.
* `?exclude=` Zero or more `aaronland/go-json-query` query strings containing rules that if matched will prevent a document from being considered for further processing.
* `?include_mode=` A valid `aaronland/go-json-query` query mode string for testing inclusion rules.
* `?exclude_mode=` A valid `aaronland/go-json-query` query mode string for testing exclusion rules.
* `?preserve=` A boolean value indicating whether a Git repository (cloned to disk) should not be removed after processing.
* `?depth=` An integer value indicating the number of commits to fetch. Default is 1.
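
To show how a URI of this form breaks down, the sketch below parses one with net/url and applies the documented defaults (in-memory clone, preserve false, depth 1). It is an illustration only and not how NewGitIterator is necessarily implemented; gitOptions and parseGitURI are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// gitOptions is a hypothetical container for the documented settings.
type gitOptions struct {
	Path     string // empty means clone in memory
	Preserve bool
	Depth    int
	Include  []string
	Exclude  []string
}

// parseGitURI is a hypothetical parser for git://{PATH}?{PARAMETERS}.
func parseGitURI(raw string) (*gitOptions, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "git" {
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	q := u.Query()
	opts := &gitOptions{
		Path:    u.Path,
		Depth:   1, // documented default
		Include: q["include"],
		Exclude: q["exclude"],
	}
	if v := q.Get("preserve"); v != "" {
		b, err := strconv.ParseBool(v)
		if err != nil {
			return nil, fmt.Errorf("invalid preserve value: %w", err)
		}
		opts.Preserve = b
	}
	if v := q.Get("depth"); v != "" {
		d, err := strconv.Atoi(v)
		if err != nil {
			return nil, fmt.Errorf("invalid depth value: %w", err)
		}
		opts.Depth = d
	}
	return opts, nil
}

func main() {
	opts, err := parseGitURI("git:///tmp?preserve=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *opts)
}
```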

Types

type GitIterator

type GitIterator struct {
	iterate.Iterator
	// contains filtered or unexported fields
}

GitIterator implements the `Iterator` interface for crawling records in a Git repository.

func (*GitIterator) Close

func (it *GitIterator) Close() error

Close performs any implementation specific tasks before terminating the iterator.

func (*GitIterator) IsIterating

func (it *GitIterator) IsIterating() bool

IsIterating() returns a boolean value indicating whether 'it' is still processing documents.

func (*GitIterator) Iterate

func (it *GitIterator) Iterate(ctx context.Context, uris ...string) iter.Seq2[*iterate.Record, error]

Iterate returns an `iter.Seq2[*iterate.Record, error]` that yields each record encountered in 'uris'.

func (*GitIterator) Seen

func (it *GitIterator) Seen() int64

Seen() returns the total number of records processed so far.

Directories

Path Synopsis
cmd
count command
emit command
