go-mod-licenses
A tool that automates the license management workflow for a Go module project's direct and transitive dependencies.
DISCLAIMER
This tool does not provide legal advice. Please verify the license results yourself.
Install
Install the go binary:
go get github.com/Bobgy/go-mod-licenses
Output Example
The NOTICES folder is an example of the NOTICES generated for the go-mod-licenses tool itself.
Usage
One-off License Update
- Create a GitHub personal access token and place it in the file ~/.github_access_token or in the environment variable GITHUB_ACCESS_TOKEN.
The token doesn't need any user permissions. It is only used to raise GitHub's rate limit on API requests.
- Check out the version of the repo you need license info for:
git clone <go-mod-repo-you-need-license-info>
cd <go-mod-repo-you-need-license-info>
git checkout <version>
- Get dependencies from go modules and generate a license_info.csv file of their licenses:
go-mod-licenses csv
The csv file has three columns: dependency, license download URL and inferred license type.
Note that the format is consistent with google/go-licenses.
- The tool may fail to identify:
  - The location of a license: such dependencies are left out of the csv.
  - The SPDX ID of a license: such licenses are named Unknown in the csv.
  Please check them manually and update your license_dict.csv; refer to the example.
- Download notices, licenses and source folders that should be distributed along with the built binary:
go-mod-licenses save
Notices and licenses will be concatenated into a single file called NOTICES/license.txt.
Source code folders will be copied to NOTICES/<module/import/path>.
Some licenses will be rejected based on https://github.com/google/licenseclassifier/blob/df6aa8a2788bdf5ac382148c2453a407a29819b8/license_type.go#L341.
Integrating in CI
An early idea is to add a simple script that:
- Checks whether go.mod has been updated, but not the license files.
- If so, fails and says you should update the license files.
The check could be stricter if we actually ran this tool in CI, but then we might worry
about flakiness, because various dependencies could be down temporarily.
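The simple check described above could look roughly like the sketch below. It takes a list of changed file paths (for example, the output of `git diff --name-only`) and reports an error when go.mod changed but no license file did. The function name `checkLicensesUpdated` is hypothetical, and treating license_info.csv plus anything under NOTICES/ as "the license files" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// checkLicensesUpdated returns an error when go.mod is among the changed
// paths but none of the license files are.
// Assumption: "license files" means license_info.csv and the NOTICES/ folder.
func checkLicensesUpdated(changed []string) error {
	goModChanged, licensesChanged := false, false
	for _, p := range changed {
		switch {
		case p == "go.mod":
			goModChanged = true
		case p == "license_info.csv" || strings.HasPrefix(p, "NOTICES/"):
			licensesChanged = true
		}
	}
	if goModChanged && !licensesChanged {
		return errors.New("go.mod changed but the license files did not; " +
			"run `go-mod-licenses csv` and `go-mod-licenses save`")
	}
	return nil
}

func main() {
	// In CI this list would come from the diff of the change under review.
	changed := []string{"go.mod", "go.sum", "main.go"}
	if err := checkLicensesUpdated(changed); err != nil {
		// A real CI script would exit with a non-zero status here.
		fmt.Println("license check failed:", err)
		return
	}
	fmt.Println("license check passed")
}
```

Because the check only compares file names, it needs no network access and avoids the flakiness concern, at the cost of not verifying that the license files are actually correct.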
Implementation Details
A rough outline of the steps in the two commands.
go-mod-licenses csv does the following to generate the license_info.csv:
- Load license_dict.csv if it exists, and use its entries as a lookup table.
- List all dependencies and transitive dependencies with go list -m all.
- Get a dependency's GitHub repo by fetching meta info, like curl 'https://k8s.io/client-go?go-get=1'.
- Use the GitHub repo license API to get its license file download URL and type.
- GitHub may not be able to identify some licenses, so use github.com/google/licenseclassifier to identify the remaining ones.
  - The tool rejects identifications with less than 85% confidence.
  - The tool gives a warning for identifications with less than 95% confidence.
- Generate CSV output as described above.
- Report dependencies the tool failed to deal with during the process.
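Two of the steps above can be sketched in isolation: reading the repository URL out of the `go-import` meta tag that a `?go-get=1` page returns, and applying the 85% and 95% confidence thresholds. The helper names `repoRootFromGoImport` and `judgeConfidence`, the sample page, and the use of a regular expression instead of a real HTML parser are assumptions for illustration; the tool's actual implementation may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// goImportRe matches a go-import meta tag. A regular expression is a shortcut
// for this sketch: it assumes the name attribute comes before content and
// that both attributes use double quotes.
var goImportRe = regexp.MustCompile(`(?i)<meta\s+name="go-import"\s+content="([^"]*)"`)

// repoRootFromGoImport extracts the repository URL from a go-get HTML page.
// The content attribute has the form "<import-prefix> <vcs> <repo-root>".
func repoRootFromGoImport(html string) (string, bool) {
	m := goImportRe.FindStringSubmatch(html)
	if m == nil {
		return "", false
	}
	fields := strings.Fields(m[1])
	if len(fields) != 3 {
		return "", false
	}
	return fields[2], true
}

// judgeConfidence applies the thresholds from the list above to a classifier
// confidence in the range [0, 1].
func judgeConfidence(confidence float64) string {
	switch {
	case confidence < 0.85:
		return "reject"
	case confidence < 0.95:
		return "warn"
	default:
		return "accept"
	}
}

func main() {
	// Illustrative page body, not a captured response.
	page := `<html><head><meta name="go-import" content="example.com/mod git https://github.com/example/mod"></head></html>`
	if root, ok := repoRootFromGoImport(page); ok {
		fmt.Println("repo root:", root)
	}
	for _, c := range []float64{0.80, 0.90, 0.99} {
		fmt.Printf("%.2f -> %s\n", c, judgeConfidence(c))
	}
}
```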
go-mod-licenses save does the following:
- Read from the license_info.csv generated by go-mod-licenses csv.
- Call github.com/google/licenseclassifier to get license type.
- Three types of reactions to license type:
Known Caveats
- This tool assumes one Go module can only have one license.
- This tool pulls all dependencies of a Go module. It does not know which exact dependencies were used when building multiple binaries, so it may pull in, for example, dev dependencies. I am assuming that keeping extra notices and licenses does no harm (except for container image size).
- go-mod-licenses is a rewrite of kubeflow/testing/go-license-tools in Go, with many improvements:
  - better and more robust GitHub repo resolution rate
  - better license classification rate using google/licenseclassifier (it especially handles BSD-2-Clause and BSD-3-Clause significantly better than the GitHub license API)
  - automates handling of licenses that require distributing source code (copied from the local module source cache)
  - simpler end-to-end process (instead of too many intermediate steps and config files)
  - rewritten in Go, so the binary is easier to redistribute than the Python tool
- go-mod-licenses is heavily influenced by github.com/google/go-licenses, with these differences:
  - go-mod-licenses works with go modules, while go-licenses works with GOPATH.
  - go-mod-licenses gets initial license info from the GitHub license API, while go-licenses detects licenses by heuristics in local source folders.
  - go-mod-licenses supports using a manually maintained lookup table, license_dict.csv, so recurring license updates during releases can reuse existing information.
- go-mod-licenses was mostly written before I learned that github.com/github/licensed exists. I have never tried github/licensed, because I'm not familiar with its Ruby tool chain, but from reading its documentation I can see that it has become a fairly mature ecosystem that supports many languages and a robust workflow for managing licenses.
Roadmap
Ideas for improving this tool:
- Find better default locations of generated files.
- Make some default options configurable.
- Improve logging format & consistency.
- Examples for integrating it in CI/CD.