go-mod-licenses

command module
v0.0.0-...-a5334b5
Published: Mar 9, 2021 License: Apache-2.0 Imports: 1 Imported by: 0


A tool that automates the license management workflow for a Go module project's direct and transitive dependencies.

DISCLAIMER

This tool does not provide legal advice. Please check the license results yourself.

Install

Install the go binary:

go get github.com/Bobgy/go-mod-licenses

Output Example

The NOTICES folder is an example of the NOTICES generated for the go-mod-licenses tool itself.

Usage

One-off License Update
  1. Create a GitHub personal access token and place it in the file ~/.github_access_token or in the environment variable GITHUB_ACCESS_TOKEN.

    The token doesn't need any user permissions. It is only used to raise GitHub's rate limit on API requests.

  2. Check out the version of the repo you need license info for:

    git clone <go-mod-repo-you-need-license-info>
    cd <go-mod-repo-you-need-license-info>
    git checkout <version>
    
  3. Get dependencies from go modules and generate a license_info.csv file of their licenses:

    go-mod-licenses csv
    

    The CSV file has three columns: dependency, license download URL and inferred license type.

    Note, the format is consistent with google/go-licenses.

  4. The tool may fail to identify:

    • The location of a license: such dependencies are left out of the CSV.
    • The SPDX ID of a license: such licenses are named Unknown in the CSV.

    Please check these manually and update your license_dict.csv, using the example as a reference.

  5. Download notices, licenses and source folders that should be distributed along with the built binary:

    go-mod-licenses save
    

    Notices and licenses will be concatenated to a single file called NOTICES/license.txt. Source code folders will be copied to NOTICES/<module/import/path>.

    Some licenses will be rejected based on https://github.com/google/licenseclassifier/blob/df6aa8a2788bdf5ac382148c2453a407a29819b8/license_type.go#L341.

Integrating in CI

An early idea is to add a simple script that:

  1. Checks whether go.mod has been updated without the license files being updated.
  2. If so, fails and tells you to update the license files.

The check could be stricter if we actually ran this tool in CI, but that risks flakiness, because various dependencies could be down temporarily.
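
A minimal sketch of such a check is shown here. It is only one possible shape for the idea, not an existing part of the tool. The function name is hypothetical, and it assumes that go.mod sits at the repository root and that "the license files" means license_info.csv plus anything under NOTICES/.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkLicensesUpdated returns an error when go.mod changed but no license file did.
// changed is the list of paths touched by the commit or pull request.
// Assumption: the license files are license_info.csv and everything under NOTICES/.
func checkLicensesUpdated(changed []string) error {
	goModChanged, licensesChanged := false, false
	for _, path := range changed {
		switch {
		case path == "go.mod":
			goModChanged = true
		case path == "license_info.csv" || strings.HasPrefix(path, "NOTICES/"):
			licensesChanged = true
		}
	}
	if goModChanged && !licensesChanged {
		return errors.New("go.mod changed but the license files did not; please update the license files")
	}
	return nil
}

func main() {
	// Invented change list; in CI it might come from `git diff --name-only`.
	if err := checkLicensesUpdated([]string{"go.mod", "main.go"}); err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Println("OK")
}
```

Because this check only looks at file names, it needs no network access and so avoids the flakiness concern, at the cost of not verifying that the license files are actually correct.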

Implementation Details

A rough outline of the steps in the two commands.

go-mod-licenses csv does the following to generate the license_info.csv:

  1. Load license_dict.csv if it exists, and use its entries as a lookup table.
  2. All dependencies and transitive dependencies are listed by go list -m all.
  3. Get a dependency's GitHub repo by fetching its go-get meta info, e.g. curl 'https://k8s.io/client-go?go-get=1'.
  4. Use GitHub repo license API to get its license file download URL and type.
  5. GitHub may not be able to identify some licenses, so use <github.com/google/licenseclassifier> to identify the remaining ones.
  6. The tool rejects identifications with less than 85% confidence.
  7. The tool gives a warning for identifications with less than 95% confidence.
  8. Generate CSV output as described above.
  9. Report dependencies the tool failed to deal with during the process.
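
Steps 3, 6 and 7 can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual implementation. The function and constant names are hypothetical, the response body is invented, and the sketch assumes the repo is resolved from the standard go-import meta tag, whose content has the form "import-prefix vcs repo-root". A regular expression stands in for a real HTML parser and only matches tags where name comes before content.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goImportRe matches a go-import meta tag with the name attribute first.
// Simplification: a robust version would parse the HTML properly.
var goImportRe = regexp.MustCompile(`<meta\s+name="go-import"\s+content="([^"]+)"`)

// repoRootFromMeta extracts the repository root from a ?go-get=1 response body.
func repoRootFromMeta(html string) (string, bool) {
	m := goImportRe.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	// The content attribute holds: import prefix, VCS, repository root.
	fields := strings.Fields(m[1])
	if len(fields) != 3 {
		return "", false
	}
	return fields[2], true
}

// Thresholds taken from steps 6 and 7 above, expressed as fractions.
const (
	rejectBelow = 0.85
	warnBelow   = 0.95
)

// verdict maps a classifier confidence to reject, warn or accept.
func verdict(confidence float64) string {
	switch {
	case confidence < rejectBelow:
		return "reject"
	case confidence < warnBelow:
		return "warn"
	default:
		return "accept"
	}
}

func main() {
	// Invented response body for illustration.
	body := `<html><head><meta name="go-import" content="example.org/pkg git https://github.com/example/pkg"></head></html>`
	if root, ok := repoRootFromMeta(body); ok {
		fmt.Println("repo root:", root)
	}
	for _, c := range []float64{0.80, 0.90, 0.99} {
		fmt.Println(c, verdict(c))
	}
}
```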

go-mod-licenses save does the following:

  1. Read from the license_info.csv generated by go-mod-licenses csv.
  2. Call github.com/google/licenseclassifier to get license type.
  3. Three types of reactions to license type:

Known Caveats

  • This tool assumes one Go module can only have one license.
  • This tool pulls all dependencies in a Go module. It does not know which exact dependencies were used when building multiple binaries, so it may pull in, for example, dev dependencies. I am assuming that keeping extra notices and licenses does no harm (except for container image size).

Comparison with similar tools

  • go-mod-licenses is a rewrite of kubeflow/testing/go-license-tools in Go, with many improvements:
    • a better and more robust GitHub repo resolution ratio
    • a better license classification rate using google/licenseclassifier (in particular, it handles BSD-2-Clause and BSD-3-Clause significantly better than the GitHub license API)
    • automated handling of licenses that require distributing source code with the binary (copied from the local module source cache)
    • a simpler end-to-end process (instead of many intermediate steps and config files)
    • written in Go, so the binary is easier to redistribute than a Python tool
  • go-mod-licenses is heavily influenced by github.com/google/go-licenses, with these differences:
    • go-mod-licenses works with Go modules, while go-licenses works with GOPATH.
    • go-mod-licenses gets initial license info from the GitHub license API, while go-licenses detects licenses by heuristics in local source folders.
    • go-mod-licenses supports a manually maintained lookup table, license_dict.csv, so recurring license updates during releases can reuse existing information.
  • go-mod-licenses was mostly written before I learned that github.com/github/licensed exists. I have never tried github/licensed because I'm not familiar with its tool chain (Ruby), but from reading its documentation I can see that it has become a fairly mature ecosystem that supports many languages and a robust workflow for managing licenses.

Roadmap

Ideas of ways to improve this tool:

  • Find better default locations of generated files.
  • Make some default options configurable.
  • Improve logging format & consistency.
  • Examples for integrating it in CI/CD.
