cr-audit-commits
What is this app?
cr-audit-commits is a GAE Go app that verifies that changes landed in the
monitored git repositories comply with certain policies, e.g. timely code
review approvals, automated rolls only modifying allowed files, automated
reverts always identifying a valid CI failure, release branch merges having
correct approvals, etc.
What can it do?
It can continuously monitor a ref in a git repo, apply custom rules to relevant
commits to decide whether they comply with policies, and issue notifications
(bug filing, email sending) if a violation is detected.
At the moment, in order to decide whether a policy has been broken, the
application has access to the commit's information as exposed by gitiles, the
originating changelist information as exposed by gerrit and information from
the continuous integration system, i.e. chromium's main waterfall.
How does it work?
Scheduler
A Scheduler cron job runs periodically and iterates over the configured
repositories defined in the RuleMap. It resolves any dynamic refs and creates
datastore entries for any monitored repos that do not have one yet (i.e. on the
first run after a repo has been added to the configuration, or when a dynamic
ref changes). It then schedules audit tasks for each monitored repository.
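As a rough illustration of the scheduler pass, here is a minimal sketch in Go. Everything in it is an assumption made for the example: RepoConfig, RepoState, the ResolveRef field and the schedule function are hypothetical names, the datastore is replaced by an in-memory map, and the choice to start a fresh entry when a dynamic ref changes is only one possible reading of the description.

```go
package main

import "sort"

// RepoConfig is a hypothetical, simplified stand-in for one RuleMap entry.
type RepoConfig struct {
	BaseRepoURL string
	Ref         string                 // static ref, used when ResolveRef is nil
	ResolveRef  func() (string, error) // optional resolver for a dynamic ref
}

// RepoState stands in for the datastore entry kept per monitored repo.
type RepoState struct {
	Ref             string
	LastKnownCommit string
}

// schedule resolves refs, creates missing state entries in store, and returns
// the RuleMap keys for which an audit task should be enqueued.
func schedule(ruleMap map[string]RepoConfig, store map[string]*RepoState) []string {
	keys := make([]string, 0, len(ruleMap))
	for key := range ruleMap {
		keys = append(keys, key)
	}
	sort.Strings(keys) // deterministic order; map iteration order is random

	var tasks []string
	for _, key := range keys {
		cfg := ruleMap[key]
		ref := cfg.Ref
		if cfg.ResolveRef != nil {
			resolved, err := cfg.ResolveRef()
			if err != nil {
				continue // assumption: skip for now and let the next cron run retry
			}
			ref = resolved
		}
		// First run for this repo, or the dynamic ref moved: start a fresh entry.
		if state, ok := store[key]; !ok || state.Ref != ref {
			store[key] = &RepoState{Ref: ref}
		}
		tasks = append(tasks, key)
	}
	return tasks
}
```

Calling schedule from a cron handler would yield the list of repos to enqueue audit tasks for; a real implementation would read and write datastore entities instead of the map used here.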
Audit Task
The audit task is a TaskQueue task that does the following:
- Requests a log from gitiles since the last known commit.
- Checks each commit with a function defined by the ruleset to see if the
commit is relevant and needs to be audited.
- Creates a datastore entry for each new commit that needs
to be audited.
- Updates the datastore entry for the repo with the newest LastKnownCommit.
- Scans the datastore to get the commits that have
not yet been audited.
- Starts a pool of worker goroutines and sends a job for
auditing each commit to this pool.
- Each worker will then take one job, and execute each of the
rules for that commit, determine the status of the
commit based on the result of the rules and send it back to the main
goroutine.
- If any of the rules cannot be decided, e.g. due to a failure in another
service, the current approach is for the rule to panic and for the worker
to recover, increment the number of retries attempted for the commit and
move on. There is a bug to change this to
regular Go error handling.
- The main goroutine then saves the statuses of all the audited commits in
a single batch to the datastore.
- After this, the task scans the datastore for all the commits
that need a notification to be sent, sends the notification, and saves a
notification state to the datastore (this avoids repeated notifications
for the same reason if, for example, the task issuing the notification times
out).
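The worker-pool part of the audit task (fan out one job per commit, recover from a panicking rule, collect results in the main goroutine) can be sketched as follows. This is only an illustration under stated assumptions: Commit, Status, Rule, auditOne and auditAll are hypothetical names, rules are reduced to a function returning a bool, and the batch save to the datastore is left out.

```go
package main

import "sync"

// Status is a hypothetical, simplified audit status for a commit.
type Status int

const (
	Pending Status = iota // not audited yet, or a rule could not be decided
	Passed
	Failed
)

// Commit stands in for the datastore entry of a commit to audit.
type Commit struct {
	Hash    string
	Status  Status
	Retries int
}

// Rule reports whether a commit complies with a policy. Following the
// approach the README describes, it panics when it cannot decide.
type Rule func(c Commit) bool

// auditOne runs every rule on one commit. A panic in a rule is recovered and
// leaves the status untouched while counting one more retry.
func auditOne(c Commit, rules []Rule) (out Commit) {
	defer func() {
		if r := recover(); r != nil {
			out = c
			out.Retries++
		}
	}()
	out = c
	out.Status = Passed
	for _, rule := range rules {
		if !rule(c) {
			out.Status = Failed
		}
	}
	return out
}

type result struct {
	index  int
	commit Commit
}

// auditAll fans the commits out to a pool of workers and gathers the results
// in the calling goroutine, ready to be saved as one batch.
func auditAll(commits []Commit, rules []Rule, workers int) []Commit {
	if workers < 1 {
		workers = 1
	}
	// Both channels are buffered for every commit, so no send can block.
	jobs := make(chan int, len(commits))
	results := make(chan result, len(commits))

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results <- result{index: i, commit: auditOne(commits[i], rules)}
			}
		}()
	}
	for i := range commits {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	close(results)

	audited := make([]Commit, len(commits))
	for r := range results {
		audited[r.index] = r.commit
	}
	return audited
}
```

The recover lives in auditOne rather than in the worker loop so that one undecidable commit cannot take down the worker that is processing it; the commit simply stays pending with its retry count bumped.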
How Rules Work
Rules are functions (wrapped as a method of an empty struct) that
decide whether a given commit complies with a given policy.
Rules receive some information about the repo being audited and about the
commit to audit, as well as a set of clients initialized and ready to
talk to external services (such as monorail) that may be needed to determine
whether the commit complies with policy.
Rules are expected to return a RuleResult.
For ownership and organization, it is expected that related rules live together
in a separate file, e.g. tbr_rules.go
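To make the shape of a rule concrete, here is a minimal sketch of one written as a method of an empty struct. The types and the method signature are assumptions made for the example, not the app's actual definitions: RepoInfo, CommitInfo, Clients, the Run method and the fields of RuleResult shown here are all hypothetical. The policy checked (a commit may only touch files under allowed prefixes) is modelled on the "automated rolls only modify allowed files" example.

```go
package main

import "strings"

// RuleStatus and RuleResult are simplified, hypothetical versions of what a
// rule returns; the app's real RuleResult may carry different fields.
type RuleStatus int

const (
	RulePassed RuleStatus = iota
	RuleFailed
)

type RuleResult struct {
	RuleName string
	Status   RuleStatus
	Message  string
}

// RepoInfo and CommitInfo stand in for the repo and commit information that
// a rule receives.
type RepoInfo struct {
	AllowedPrefixes []string // paths an automated roll may touch
}

type CommitInfo struct {
	Hash         string
	ChangedFiles []string
}

// Clients is a placeholder for the initialized external-service clients.
type Clients struct{}

// OnlyAllowedFiles is a hypothetical rule wrapped as a method of an empty struct.
type OnlyAllowedFiles struct{}

// Run fails the commit if it touches a file outside the allowed prefixes.
func (OnlyAllowedFiles) Run(repo RepoInfo, commit CommitInfo, cs *Clients) *RuleResult {
	result := &RuleResult{RuleName: "OnlyAllowedFiles", Status: RulePassed}
	for _, file := range commit.ChangedFiles {
		allowed := false
		for _, prefix := range repo.AllowedPrefixes {
			if strings.HasPrefix(file, prefix) {
				allowed = true
				break
			}
		}
		if !allowed {
			result.Status = RuleFailed
			result.Message = "unexpected file modified: " + file
			return result
		}
	}
	return result
}
```

This rule never needs the clients, so the cs parameter goes unused; a rule that has to consult an external service would use it instead of creating its own connections.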
Notifications
Each RuleSet is responsible for providing a notification function
that will be called with each commit that has failed an audit (or that has
been determined to need a notification for some other reason, which may not
be a policy violation).
Details can be seen at notification.go
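The sketch below illustrates how a notification function and a saved notification state can work together to avoid repeated notifications. It is not taken from notification.go: AuditedCommit, NotifyFunc, notifyAll and fileBugOnce are hypothetical names, the state is a plain string, and actual bug filing or email sending is replaced by appending to a slice.

```go
package main

// AuditedCommit is a hypothetical, simplified record of an audited commit.
type AuditedCommit struct {
	Hash              string
	NeedsNotification bool
	NotificationState string // persisted after each successful notification
}

// NotifyFunc receives a commit and its saved state and returns the new state.
// Returning the state unchanged means nothing new had to be sent.
type NotifyFunc func(c AuditedCommit, state string) (string, error)

// notifyAll calls notify for every commit that needs a notification and
// records the returned state. It keeps going after an error and returns the
// first one seen.
func notifyAll(commits []AuditedCommit, notify NotifyFunc) error {
	var firstErr error
	for i := range commits {
		c := &commits[i]
		if !c.NeedsNotification {
			continue
		}
		state, err := notify(*c, c.NotificationState)
		if err != nil {
			if firstErr == nil {
				firstErr = err
			}
			continue // keep the old state so a later run retries
		}
		c.NotificationState = state
	}
	return firstErr
}

// fileBugOnce returns an example NotifyFunc that "files a bug" at most once
// per commit, recording each send in sent.
func fileBugOnce(sent *[]string) NotifyFunc {
	return func(c AuditedCommit, state string) (string, error) {
		if state == "bug filed" {
			return state, nil // already notified for this reason
		}
		*sent = append(*sent, c.Hash) // stand-in for filing a bug or sending email
		return "bug filed", nil
	}
}
```

Running notifyAll twice over the same commits with fileBugOnce sends only one notification per commit, because the second run sees the saved state and returns early.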
Extending the app
This is an example CL that adds support for a repository.
Deployment procedure
- Check out the revision to deploy.
- Sync dependencies with gclient sync
- Verify unit-tests run successfully with go test from the app directory.
- Use this script, gae.py upload, to deploy a new version.
- Use gae.py switch to make the new version default. Or use the web console.
- Wait for the next run of the cron job (or manually trigger it) and examine
the logs for any unexpected failures.
Known Issues
See bug queue