README

ARCHIVED

Gryffin (beta)

Gryffin is a large scale web security scanning platform. It is not yet another scanner. It was written to solve two specific problems with existing scanners: coverage and scale.

Better coverage translates to fewer false negatives. Inherent scalability translates to the capability of scanning and supporting a large, elastic application infrastructure. Simply put, Gryffin can go from scanning 1,000 applications today to 100,000 applications tomorrow by straightforward horizontal scaling.

Coverage

Coverage has two dimensions - one during crawling and the other during fuzzing. In the crawl phase, coverage means discovering as much of the application footprint as possible. In the scan phase, while fuzzing, it means testing each part of the application in depth for the applied set of vulnerabilities.

Crawl Coverage

Today a large number of web applications are template-driven, meaning the same code or path generates millions of URLs. A security scanner needs only one representative URL out of the millions generated by the same code or path. Gryffin's crawler does just that.

Page Deduplication

At the heart of Gryffin is a deduplication engine that compares a new page with already seen pages. If the HTML structure of the new page is similar to those already seen, it is classified as a duplicate and not crawled further.
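
The comparison is based on a fingerprint of the page's HTML structure (see the html-distance package in this repository and the simhash-related TODO below). The following is a minimal, self-contained sketch of the idea rather than Gryffin's actual implementation: it folds the sequence of element tag names into a 64-bit simhash and treats pages whose fingerprints are within a small Hamming distance as duplicates. The feature choice and threshold are illustrative.

    package main

    import (
    	"fmt"
    	"hash/fnv"
    	"math/bits"
    	"strings"

    	"golang.org/x/net/html"
    )

    // structuralSimhash computes a 64-bit simhash over the sequence of element
    // tag names - a rough stand-in for an HTML structural fingerprint.
    func structuralSimhash(doc string) uint64 {
    	var vector [64]int
    	z := html.NewTokenizer(strings.NewReader(doc))
    	for {
    		tt := z.Next()
    		if tt == html.ErrorToken {
    			break // end of document (or parse error)
    		}
    		if tt != html.StartTagToken && tt != html.SelfClosingTagToken {
    			continue
    		}
    		name, _ := z.TagName()
    		h := fnv.New64a()
    		h.Write(name)
    		f := h.Sum64()
    		// Accumulate each feature hash bit-wise into the simhash vector.
    		for i := 0; i < 64; i++ {
    			if f&(1<<uint(i)) != 0 {
    				vector[i]++
    			} else {
    				vector[i]--
    			}
    		}
    	}
    	var fp uint64
    	for i := 0; i < 64; i++ {
    		if vector[i] > 0 {
    			fp |= 1 << uint(i)
    		}
    	}
    	return fp
    }

    // isDuplicate treats two pages as the same "template" when their
    // fingerprints differ in only a few bits (the threshold is illustrative).
    func isDuplicate(a, b uint64) bool {
    	return bits.OnesCount64(a^b) <= 2
    }

    func main() {
    	page1 := `<html><body><ul><li><a href="/item/1">one</a></li></ul></body></html>`
    	page2 := `<html><body><ul><li><a href="/item/2">two</a></li></ul></body></html>`
    	fmt.Println(isDuplicate(structuralSimhash(page1), structuralSimhash(page2))) // true
    }

Pages generated by the same template share the same tag structure even when their text and URLs differ, so only the first representative needs to be crawled further.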

DOM Rendering and Navigation

A large number of applications today are rich applications. They are heavily driven by client-side JavaScript. In order to discover links and code paths in such applications, Gryffin's crawler uses PhantomJS for DOM rendering and navigation.
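
A minimal sketch of driving a crawl through a DOM-capable renderer, loosely modeled on the standalone command under cmd/. The PhantomJSRenderer type and its Timeout field are assumptions for illustration; the Renderer interface, CrawlAsync, GetLinks, IsScanAllowed and ShouldCrawl are part of the package API documented below.

    package main

    import (
    	"github.com/yahoo/gryffin"
    	"github.com/yahoo/gryffin/renderer"
    )

    func main() {
    	scan := gryffin.NewScan("GET", "http://example.com/", "")

    	// Assumed PhantomJS-backed implementation of gryffin.Renderer;
    	// the type and field names here are illustrative.
    	r := &renderer.PhantomJSRenderer{Timeout: 10}

    	// Render the DOM and walk it asynchronously.
    	scan.CrawlAsync(r)

    	// Links discovered during DOM navigation arrive on a channel.
    	for link := range r.GetLinks() {
    		if link.IsScanAllowed() && link.ShouldCrawl() {
    			// Queue `link` for the next crawl iteration here.
    		}
    	}
    }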

Scan Coverage

As Gryffin is a scanning platform, not a scanner, it does not have its own fuzzer modules, even for fuzzing common web vulnerabilities like XSS and SQL Injection.

It is not wise to reinvent the wheel where you do not have to. At production scale at Yahoo, Gryffin uses open source and custom fuzzers. Some of these custom fuzzers may be open sourced in the future, and may or may not be part of the Gryffin repository.

For demonstration purposes, Gryffin comes integrated with sqlmap and arachni. It does not endorse them or any other scanner in particular.

The philosophy is to improve scan coverage by being able to fuzz for just what you need.
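
As a sketch of how a fuzzer plugs in: anything satisfying the Fuzzer interface (documented below) can be passed to Scan.Fuzz. The sqlmap.Fuzzer and arachni.Fuzzer type names below are assumptions based on the fuzzer/ directory layout; check those packages for the exact API.

    package main

    import (
    	"log"

    	"github.com/yahoo/gryffin"
    	"github.com/yahoo/gryffin/fuzzer/arachni"
    	"github.com/yahoo/gryffin/fuzzer/sqlmap"
    )

    func main() {
    	s := gryffin.NewScan("GET", "http://example.com/?id=1", "")

    	// Each wrapper shells out to the external tool and reports issues back.
    	for _, f := range []gryffin.Fuzzer{&sqlmap.Fuzzer{}, &arachni.Fuzzer{}} {
    		count, err := s.Fuzz(f) // returns the number of issues found
    		if err != nil {
    			log.Printf("fuzzer failed: %v", err)
    			continue
    		}
    		log.Printf("found %d issue(s)", count)
    	}
    }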

Scale

While Gryffin is available as a standalone package, it's primarily built for scale.

Gryffin is built on the publisher-subscriber model. Each component is either a publisher, or a subscriber, or both. This allows Gryffin to scale horizontally by simply adding more subscriber or publisher nodes.
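
For illustration, a publisher node and a subscriber node could exchange serialized scans over NSQ roughly as follows. This sketch uses the github.com/nsqio/go-nsq client with made-up topic and channel names; it is not Gryffin's actual internal wiring.

    package main

    import (
    	"log"

    	nsq "github.com/nsqio/go-nsq"
    	"github.com/yahoo/gryffin"
    )

    func main() {
    	cfg := nsq.NewConfig()

    	// Publisher side: serialize a scan and hand it to nsqd.
    	producer, err := nsq.NewProducer("127.0.0.1:4150", cfg)
    	if err != nil {
    		log.Fatal(err)
    	}
    	scan := gryffin.NewScan("GET", "http://example.com/", "")
    	if err := producer.Publish("seed", scan.Json()); err != nil {
    		log.Fatal(err)
    	}

    	// Subscriber side: another node picks the scan up and processes it.
    	consumer, err := nsq.NewConsumer("seed", "crawl", cfg)
    	if err != nil {
    		log.Fatal(err)
    	}
    	consumer.AddHandler(nsq.HandlerFunc(func(m *nsq.Message) error {
    		s := gryffin.NewScanFromJson(m.Body)
    		s.Logm("crawl", "received scan") // placeholder for real crawl work
    		return nil
    	}))
    	if err := consumer.ConnectToNSQLookupd("127.0.0.1:4161"); err != nil {
    		log.Fatal(err)
    	}
    	select {} // block forever; a real worker would handle shutdown
    }

Adding capacity is then a matter of starting more subscriber processes on the same topic.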

Operating Gryffin

Pre-requisites
  1. Go - go1.13 or later
  2. PhantomJS, v2
  3. Sqlmap (for fuzzing SQLi)
  4. Arachni (for fuzzing XSS and web vulnerabilities)
  5. NSQ:
    • nsqlookupd running on ports 4160 and 4161
    • nsqd running on ports 4150 and 4151
    • nsqd started with --max-msg-size=5000000
  6. Kibana and Elasticsearch, for dashboarding
Installation
go get -u github.com/yahoo/gryffin/...
Run

(WIP)

TODO

  1. Mobile browser user agent
  2. Preconfigured Docker images
  3. Redis for sharing state across machines
  4. Instructions to run Gryffin (distributed or standalone)
  5. Documentation for html-distance
  6. Implement a JSON-serializable cookie jar
  7. Identify duplicate URL patterns based on simhash results

Talks and Slides

Credits

License

Code licensed under the BSD-style license. See LICENSE file for terms.

Documentation

Overview

Package gryffin is an application scanning infrastructure.


Functions

func GenRandomID

    func GenRandomID() string

GenRandomID generates a random ID.

func SetLogWriter

    func SetLogWriter(w io.Writer)

SetLogWriter sets the log writer.

func SetMemoryStore

    func SetMemoryStore(m *GryffinStore)

SetMemoryStore sets the package internal global variable for the memory store.

Types

type Fingerprint

    type Fingerprint struct {
    	Origin             uint64 // origin
    	URL                uint64 // origin + path
    	Request            uint64 // method, url, body
    	RequestFull        uint64 // request + header
    	ResponseSimilarity uint64
    }

Fingerprint contains all the different types of hash for the Scan (Request & Response).

type Fuzzer

    type Fuzzer interface {
    	Fuzz(*Scan) (int, error)
    }

Fuzzer runs the fuzzing.

type GryffinStore

    type GryffinStore struct {
    	Oracles map[string]*distance.Oracle
    	Hashes  map[string]bool
    	Hits    map[string]int
    	Mu      sync.RWMutex
    	// contains filtered or unexported fields
    }

GryffinStore includes data and handles for Gryffin message processing.

func NewGryffinStore

    func NewGryffinStore() *GryffinStore

func NewSharedGryffinStore

    func NewSharedGryffinStore() *GryffinStore

func (*GryffinStore) GetRcvChan

    func (s *GryffinStore) GetRcvChan() chan []byte

func (*GryffinStore) GetSndChan

    func (s *GryffinStore) GetSndChan() chan []byte

func (*GryffinStore) Hit

    func (s *GryffinStore) Hit(prefix string) bool

func (*GryffinStore) See

    func (s *GryffinStore) See(prefix string, kind string, v uint64)

func (*GryffinStore) Seen

    func (s *GryffinStore) Seen(prefix string, kind string, v uint64, r uint8) bool

type HTTPDoer

    type HTTPDoer interface {
    	Do(*http.Request) (*http.Response, error)
    }

HTTPDoer is an interface implemented by http.Client.
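
Since *http.Client satisfies HTTPDoer, a health check before crawling can look like the following sketch (the URL and timeout are arbitrary):

    package main

    import (
    	"net/http"
    	"time"

    	"github.com/yahoo/gryffin"
    )

    func main() {
    	// *http.Client implements Do(*http.Request) (*http.Response, error),
    	// so it can be passed to Poke directly.
    	client := &http.Client{Timeout: 10 * time.Second}

    	scan := gryffin.NewScan("GET", "http://example.com/", "")
    	if err := scan.Poke(client); err != nil {
    		scan.Error("poke", err) // target is not reachable; skip this scan
    		return
    	}
    	scan.Logm("poke", "target is up")
    }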

type Job

    type Job struct {
    	ID             string
    	DomainsAllowed []string // Domains that we would crawl
    }

Job stores the job ID and config (if any).

type LogMessage

    type LogMessage struct {
    	Service string
    	Msg     string
    	Method  string
    	Url     string
    	JobID   string
    }

LogMessage contains the data fields to be marshalled as JSON for forwarding to the log processor.

type PublishMessage

    type PublishMessage struct {
    	F string // function, i.e. See or Seen
    	T string // type (kind), i.e. oracle or hash
    	K string // key
    	V string // value
    }

PublishMessage is the data in the messages handled by Gryffin.

type Renderer

    type Renderer interface {
    	Do(*Scan)
    	GetRequestBody() <-chan *Scan
    	GetLinks() <-chan *Scan
    }

Renderer is an interface for implementing an HTML DOM renderer and obtaining the response body and links. Since DOM construction is very likely to be asynchronous, channels are returned for receiving the responses and links.

type Scan

    type Scan struct {
    	// ID is a random ID to identify this particular scan.
    	// If ID is empty, this scan should not be performed (but is recorded for rate limiting).
    	ID           string
    	Job          *Job
    	Request      *http.Request
    	RequestBody  string
    	Response     *http.Response
    	ResponseBody string
    	Cookies      []*http.Cookie
    	Fingerprint  Fingerprint
    	HitCount     int
    }

A Scan consists of the job, target, request and response.

func NewScan

    func NewScan(method, url, post string) *Scan

NewScan creates a scan.

func NewScanFromJson

    func NewScanFromJson(b []byte) *Scan

NewScanFromJson creates a Scan from the passed JSON blob.

func (*Scan) CrawlAsync

    func (s *Scan) CrawlAsync(r Renderer)

CrawlAsync runs the crawling asynchronously.

func (*Scan) Error

    func (s *Scan) Error(service string, err error)

Error logs the error for the given service. TODO: LogFmt(fmt string), LogI(interface).

func (*Scan) Fuzz

    func (s *Scan) Fuzz(fuzzer Fuzzer) (int, error)

Fuzz runs the vulnerability fuzzer and returns the issue count.

func (*Scan) IsDuplicatedPage

    func (s *Scan) IsDuplicatedPage() bool

IsDuplicatedPage checks, based on the Response, whether the page has already been seen and whether we should proceed.

func (*Scan) IsScanAllowed

    func (s *Scan) IsScanAllowed() bool

IsScanAllowed checks if the request URL is allowed per Job.DomainsAllowed.

func (*Scan) Json

    func (s *Scan) Json() []byte

Json serializes the Scan as JSON.

func (*Scan) Log

    func (s *Scan) Log(v interface{})

Log encodes the given argument as JSON and writes it to the log writer.

func (*Scan) Logf

    func (s *Scan) Logf(format string, a ...interface{})

Logf logs using the given format string.

func (*Scan) Logm

    func (s *Scan) Logm(service, msg string)

Logm sends a LogMessage to the log processor.

func (*Scan) Logmf

    func (s *Scan) Logmf(service, format string, a ...interface{})

Logmf logs the formatted message for the given service.

func (*Scan) MergeRequest

    func (s *Scan) MergeRequest(req *http.Request)

MergeRequest merges the given request into the scan's existing request field.

func (*Scan) Poke

    func (s *Scan) Poke(client HTTPDoer) (err error)

Poke checks if the target is up.

func (*Scan) RateLimit

    func (s *Scan) RateLimit() int

RateLimit checks whether we are under the allowed rate for crawling the site. It returns a delay time to wait before checking again.

func (*Scan) ReadResponseBody

    func (s *Scan) ReadResponseBody()

ReadResponseBody reads Response.Body into ResponseBody. It also reconstructs the io.ReadCloser stream so the body can be read again.

func (*Scan) ShouldCrawl

    func (s *Scan) ShouldCrawl() bool

ShouldCrawl checks if the link should be queued for the next crawl.

func (*Scan) Spawn

    func (s *Scan) Spawn() *Scan

Spawn spawns a new scan object with a different ID.

func (*Scan) UpdateFingerprint

    func (s *Scan) UpdateFingerprint()

UpdateFingerprint updates the fingerprint field.

type SerializableRequest

    type SerializableRequest struct {
    	*http.Request
    	Cancel string
    }

SerializableRequest wraps http.Request with a serializable Cancel field.

type SerializableResponse

    type SerializableResponse struct {
    	*http.Response
    	Request *SerializableRequest
    }

SerializableResponse wraps http.Response with a serializable Request field.

type SerializableScan

    type SerializableScan struct {
    	*Scan
    	Request  *SerializableRequest
    	Response *SerializableResponse
    }

SerializableScan is a Scan extended with serializable request and response fields.

Directories

Path             Synopsis
cmd
data             Package data provides an interface for common data store operations.
fuzzer
html-distance    Package distance is a go library for computing the proximity of the HTML pages.