phishdetect

Version: v1.13.0
Published: Nov 23, 2020 License: AGPL-3.0 Imports: 39 Imported by: 1

README

PhishDetect

NOTE: This project is experimental. It is not to be used yet, particularly with at-risk users.

PhishDetect is a library and a platform to detect potential phishing pages. It attempts to do so by identifying suspicious and malicious properties both in the domain name and URL provided and in the HTML content of the opened page.

PhishDetect can take HTML strings as input, but it can also be given just a URL, which is then opened in a dedicated Docker container that automatically instruments a Google Chrome browser and monitors its behavior while it navigates to the suspicious link.

Building

Install Docker Community Edition for Windows, Mac or Linux.

Particularly when using this with PhishDetect Node, you should look into installing Docker in Rootless Mode. You can find more information about this in the Node's documentation.

Download the Docker image from Docker Hub using:

$ docker pull phishdetect/phishdetect

You will also need to install Yara and its library. In order to do so, please follow the instructions provided by the official Yara Project documentation.

Now you can download the PhishDetect library:

$ go get -u github.com/phishdetect/phishdetect

For ease of versioning, you should consider using Go 1.11+ Modules in your own project.

Using PhishDetect as a library

You can then use it to analyze a URL or a domain like so:

package main

import (
    "fmt"
    "github.com/phishdetect/phishdetect"
)

func main() {
    // Instantiate an Analysis. The second argument is
    // an HTML string.
    a := phishdetect.NewAnalysis("example.com", "")
    // Perform the analysis of the URL/domain.
    a.AnalyzeURL()
    // Retrieve the name of the impersonated brand.
    brand := a.Brands.GetBrand()

    // If the domain is recognized as safelisted, this
    // will show as true, otherwise as false.
    fmt.Println(a.Safelisted)
    // This is a total numeric value that is the sum of
    // all the score values of the warnings that were
    // matched during the analysis.
    fmt.Println(a.Score)
    // Print the brand. It will be an empty string if
    // no brand was identified.
    fmt.Println(brand)

    // Print all the matched warnings from the analysis.
    for _, warning := range a.Warnings {
        fmt.Println(warning.Description)
    }
}

If you want to analyze a URL by launching the dockerized Google Chrome:

package main

import (
    "fmt"
    "github.com/phishdetect/phishdetect"
)

func main() {
    url := "example.com"
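    // Instantiate a new Browser.
    // The first argument is the URL to analyze.
    // The second argument is the path to the file where to save the screenshot.
    // The third argument is a boolean value to enable or disable routing through Tor.
    // The fourth argument (logEvents) is a boolean value to enable or disable event logging.
    // The fifth argument (imageName) is the name of the Docker image to use.
    b := phishdetect.NewBrowser(url, "/path/to/screen.png", false, false, "")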
    // Run the browser.
    b.Run()

    // Now we analyze the results.
    a := phishdetect.NewAnalysis(url, b.HTML)
    a.AnalyzeURL()
    // Analyze the HTML string.
    a.AnalyzeHTML()
    brand := a.Brands.GetBrand()

    // In addition to the results explained in the previous example, we have
    // some additional information provided by the browser execution.
    // FinalURL will show the last visited URL by the browser. This might differ
    // from the original URL if the browser was redirected.
    fmt.Println(b.FinalURL)

    // Visits contains a list of all the URLs visited by the browser.
    // Normally 302 redirects or JavaScript redirects should appear (although in
    // the latter case, some might not appear if they took too long to load).
    for _, visit := range b.Visits {
        fmt.Println(visit)
    }

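    // In addition to the URL analysis warnings, we should also have any matched
    // HTML analysis warnings. Printing the brand also keeps the variable in
    // use, which Go requires for the program to compile.
    fmt.Println(brand)
    for _, warning := range a.Warnings {
        fmt.Println(warning.Description)
    }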
}

For more information, please refer to the Godoc.

Adding new Brands to the existing list

PhishDetect comes pre-compiled with a fixed set of brands. You might want to load custom ones from external sources. You can easily do so when creating a new Analysis.

package main

import (
    "github.com/phishdetect/phishdetect"
    "github.com/phishdetect/phishdetect/brand"
)

func main() {
    // We create a new Brand.
    myBrand := brand.Brand{
        Name:       "MyBrand",
        Original:   []string{"MyBrand", "MyBrandProduct"},
        Safelist:   []string{"mybrand.com", "mybrand.net", "mybrand.org"},
        Suspicious: []string{"mybland.com", "mybrend.com", "mgbrand.com"},
    }

    // We instantiate a new analysis.
    a := phishdetect.NewAnalysis("example.com", "")
    // We access the list of brands from the current analysis and add a new one.
    // AddBrand takes a *brand.Brand, so we pass a pointer.
    a.Brands.AddBrand(&myBrand)
    // Finally, we analyze the domain.
    a.AnalyzeURL()
}

Adding Yara rules to the HTML classifier

If you want to scan the visited page's HTML with Yara rules of your own, initialize PhishDetect's scanner by calling phishdetect.InitializeYara() with the path (as a string) to either a Yara rule file or a folder containing Yara rule files (with .yar or .yara extensions).

For example:

err := phishdetect.InitializeYara(rulesPath)
if err != nil {
    log.Error("I failed to initialize the Yara scanner: ", err.Error())
}

This needs to be done only once (perhaps in your program's init() function). All subsequent analyses will use the same initialized scanner.
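One way to guarantee that the initialization runs a single time, even when it is requested from several places, is sync.Once. The sketch below is only an illustration: loadRules is a hypothetical stand-in for the call to phishdetect.InitializeYara (so that the example compiles without the library), and ensureScanner is a made-up helper name.

```go
package main

import (
	"fmt"
	"sync"
)

var (
	initOnce sync.Once
	initErr  error
	loads    int // counts how many times the loader actually ran
)

// loadRules is a hypothetical stand-in for phishdetect.InitializeYara.
func loadRules(path string) error {
	loads++
	if path == "" {
		return fmt.Errorf("no rules path provided")
	}
	return nil
}

// ensureScanner runs loadRules at most once, however many times it is
// called, and always returns the result of that single run.
func ensureScanner(path string) error {
	initOnce.Do(func() { initErr = loadRules(path) })
	return initErr
}

func main() {
	for i := 0; i < 3; i++ {
		if err := ensureScanner("/path/to/rules"); err != nil {
			fmt.Println("failed to initialize the scanner:", err)
			return
		}
	}
	fmt.Println("loader ran", loads, "time(s)")
}
```

Note that with this pattern a failed first attempt is not retried; later calls return the same error.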

Using PhishDetect CLI

Firstly, make sure you have Go 1.11+ installed. We require Go 1.11 or later versions because of the native support for Go Modules, which we use to manage dependencies. If it isn't available for your operating system of choice, we recommend trying gvm.

Now you can either install PhishDetect's command-line interface by simply launching:

$ go get github.com/phishdetect/phishdetect/cli

Or build the binary from the source code. To do so, start by cloning the Git repository:

$ git clone https://github.com/phishdetect/phishdetect.git

Move to the directory you just cloned and download the dependencies:

$ make deps

In order to build binaries for GNU/Linux:

$ make

Once the compilation is completed, you will find the command-line interface in the build/ folder.

Launch phishdetect-cli -h to view the help message:

Usage of phishdetect-cli:
      --api-version string    Specify which Docker API version to use (default "1.37")
      --brands string         Specify a folder containing YAML files with Brand specifications
      --container string      Specify a name for a docker image to use (default "phishdetect/phishdetect")
      --debug                 Enable debug logging
      --html string           Specify a path to save the HTML from the visited page
      --safebrowsing string   Specify a file path containing your Google SafeBrowsing API key
      --screen string         Specify the file path to store the screenshot
      --tor                   Route connection through the Tor network
      --url-only              Only perform URL analysis
      --yara string           Specify a path to a file or folder contaning Yara rules

Specify a URL and the preferred options and wait for the results to appear:

$ build/linux/phishdetect-cli --screen /tmp/screen.png --tor http://[REDACTED].com/Login
INFO[0000] Analyzing URL http://[REDACTED].com/Login
INFO[0000] Using User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.89 Safar$
INFO[0000] Using debug port: 9564
INFO[0000] Enabled route through the Tor network
INFO[0000] Started container with ID e43f6df4ab0fb8e29453df3ebaede0fe6a4bcbafa4fabaaa1da95573a28552ff
INFO[0000] Attempting to connect to debug port...
INFO[0001] Connection to debug port established!
INFO[0013] Saved screenshot at /tmp/screen.png
INFO[0013] Killed container with ID e43f6df4ab0fb8e29453df3ebaede0fe6a4bcbafa4fabaaa1da95573a28552ff
INFO[0013] Starting to analyze HTML...
INFO[0013] Matched password-input
INFO[0013] Matched suspicious-title
INFO[0014] Starting to analyze the URL...
INFO[0014] Matched suspicious-hostname
INFO[0014] Matched no-tls
INFO[0014] Visits:
INFO[0014]      - http://[REDACTED].com/Login
INFO[0014]      - http://[REDACTED].com/Login/
INFO[0014] Final URL: http://[REDACTED].com/Login/
INFO[0014] Safelisted: false
INFO[0014] Final score: 90
INFO[0014] Brand: tutanota
INFO[0014] Warnings:
INFO[0014]      - The page contains a password input         name=password-input score=10
INFO[0014]      - The page has a suspicious title            name=suspicious-title score=30
INFO[0014]      - The domain contains suspicious words       name=suspicious-hostname score=30
INFO[0014]      - The website is not using a secure transport layer (HTTPS)  name=no-tls score=20
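The final score in the run above is the sum of the scores of the matched warnings, as described earlier for Analysis.Score. A minimal sketch of that arithmetic, using the four warnings from the sample output; the warning type and totalScore helper are simplified, hypothetical stand-ins rather than the library's own code:

```go
package main

import "fmt"

// warning is a simplified stand-in for the library's Warning type.
type warning struct {
	Score int
	Name  string
}

// totalScore adds up the score of every matched warning.
func totalScore(warnings []warning) int {
	total := 0
	for _, w := range warnings {
		total += w.Score
	}
	return total
}

func main() {
	// The four warnings matched in the sample run above.
	matched := []warning{
		{Score: 10, Name: "password-input"},
		{Score: 30, Name: "suspicious-title"},
		{Score: 30, Name: "suspicious-hostname"},
		{Score: 20, Name: "no-tls"},
	}
	fmt.Println("Final score:", totalScore(matched))
}
```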

License

PhishDetect is released under the GNU Affero General Public License 3.0 and is copyrighted to Claudio Guarnieri.

Documentation

Constants

const BrowserEventWaitTime time.Duration = 15

BrowserEventWaitTime is the number of seconds we wait while attempting to fetch some events from DevTools, before failing.

const BrowserTimeout time.Duration = 1

BrowserTimeout is the number of minutes we wait before declaring that the connection to our debugged browser, or to the URL, has failed.

const BrowserWaitTime time.Duration = 5

BrowserWaitTime is the number of seconds we wait before fetching navigation results.
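Because these constants are declared as bare time.Duration values, the number on its own would mean nanoseconds; the unit named in each description only applies once the constant is multiplied by time.Second or time.Minute. A minimal sketch of that conversion, assuming this is how the values are meant to be scaled (the lowercase constants are local copies for illustration and the helper names are hypothetical):

```go
package main

import (
	"fmt"
	"time"
)

// Local copies of the documented constants, for illustration only.
const (
	browserEventWaitTime time.Duration = 15
	browserTimeout       time.Duration = 1
	browserWaitTime      time.Duration = 5
)

// eventWait scales the DevTools event wait to seconds.
func eventWait() time.Duration { return browserEventWaitTime * time.Second }

// navigationTimeout scales the overall browser timeout to minutes.
func navigationTimeout() time.Duration { return browserTimeout * time.Minute }

// settleWait scales the pre-fetch wait to seconds.
func settleWait() time.Duration { return browserWaitTime * time.Second }

func main() {
	fmt.Println(eventWait(), navigationTimeout(), settleWait())
}
```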

Variables

var SafeBrowsingKey string

SafeBrowsingKey contains the API key to use Google SafeBrowsing API.

var YaraRules *yara.Rules

YaraRules will contain compiled Yara rules provided by InitializeYara.

Functions

func GetSHA256Hash added in v1.8.0

func GetSHA256Hash(text string) string

GetSHA256Hash retrieves a SHA256 hash of a string.
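For reference, hashing a string with SHA-256 in Go's standard library can be sketched as follows. The sha256Hex function is a hypothetical equivalent, and it assumes the hash is returned as a lowercase hexadecimal string, which the documentation above does not state.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the SHA-256 digest of text as a lowercase hex string.
// It is a hypothetical stand-in, not the library's implementation.
func sha256Hex(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(sha256Hex("abc"))
}
```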

func InitializeYara

func InitializeYara(yaraRulesPath string) error

InitializeYara will load any rule files found at the specified path and compile them into a Rules object.

func NormalizeURL

func NormalizeURL(url string) string

NormalizeURL fixes a URL that is e.g. missing a scheme, etc.
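To illustrate the kind of fix described, here is a minimal sketch that prepends a scheme when none is present. It is an assumption about the behavior, not the library's code: the normalizeURLSketch name, the choice of "http://" as the default scheme, and the whitespace trimming are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeURLSketch trims surrounding whitespace and prepends "http://"
// when the input does not already start with an HTTP or HTTPS scheme.
func normalizeURLSketch(raw string) string {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return raw
	}
	lower := strings.ToLower(raw)
	if strings.HasPrefix(lower, "http://") || strings.HasPrefix(lower, "https://") {
		return raw
	}
	return "http://" + raw
}

func main() {
	fmt.Println(normalizeURLSketch("example.com/Login"))
	fmt.Println(normalizeURLSketch("https://example.com"))
}
```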

func SliceContains

func SliceContains(slice []string, item string) bool

SliceContains checks whether a string is contained in a slice of strings.

func TextContains

func TextContains(text, pattern string) bool

TextContains will determine if a substring is present in a string. It is case-insensitive.
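The two helpers above are small enough to sketch with the standard library. The versions below are hypothetical equivalents: sliceContains assumes an exact, case-sensitive match (the documentation does not say), while textContains lowercases both arguments to get the documented case-insensitive behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// sliceContains reports whether item appears in slice (exact match).
func sliceContains(slice []string, item string) bool {
	for _, entry := range slice {
		if entry == item {
			return true
		}
	}
	return false
}

// textContains reports whether pattern occurs in text, ignoring case.
func textContains(text, pattern string) bool {
	return strings.Contains(strings.ToLower(text), strings.ToLower(pattern))
}

func main() {
	fmt.Println(sliceContains([]string{"mybrand.com", "mybrand.net"}, "mybrand.net"))
	fmt.Println(textContains("Sign in to MyBrand", "mybrand"))
}
```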

Types

type Analysis

type Analysis struct {
	URL        string    `json:"url"`
	FinalURL   string    `json:"final_url"`
	HTML       string    `json:"html"`
	Warnings   []Warning `json:"warnings"`
	Score      int       `json:"score"`
	Safelisted bool      `json:"safelisted"`
	Dangerous  bool      `json:"dangerous"`
	Brands     *Brands   `json:"brands"`
}

Analysis contains information on the outcome of the URL and/or HTML analysis.

func NewAnalysis

func NewAnalysis(url, html string) *Analysis

NewAnalysis instantiates a new Analysis struct.

func (*Analysis) AnalyzeDomain

func (a *Analysis) AnalyzeDomain() error

AnalyzeDomain performs all the available checks to be run on a URL or domain.

func (*Analysis) AnalyzeHTML

func (a *Analysis) AnalyzeHTML() error

AnalyzeHTML performs all the available checks to be run on an HTML string.

func (*Analysis) AnalyzePage added in v1.11.0

func (a *Analysis) AnalyzePage(resources []Resource) error

AnalyzePage performs all the available checks to be run on an HTML string as well as the provided list of resources (e.g. downloaded scripts).

func (*Analysis) AnalyzeURL

func (a *Analysis) AnalyzeURL() error

AnalyzeURL performs all the available checks to be run on a URL or domain.

type Brands

type Brands struct {
	Top  *brand.Brand
	List []*brand.Brand
}

Brands defines the attributes of our list of supported brands.

func NewBrands

func NewBrands() *Brands

NewBrands instantiates a new Brands struct.

func (*Brands) AddBrand

func (b *Brands) AddBrand(brand *brand.Brand)

AddBrand adds a new brand to the list.

func (*Brands) GetBrand

func (b *Brands) GetBrand() string

GetBrand determines which among the marked brands is most likely the one impersonated by the page.

func (*Brands) IsDomainSafelisted

func (b *Brands) IsDomainSafelisted(domain, brandName string) bool

IsDomainSafelisted checks if the specified domain is in any of the safelists of the supported brands.

func (*Brands) IsLinkDangerous added in v1.9.1

func (b *Brands) IsLinkDangerous(link, brandName string) bool

IsLinkDangerous checks if the specified link matches a brand's dangerous regexp.

type Browser

type Browser struct {
	URL            string     `json:"url"`
	FinalURL       string     `json:"final_url"`
	Visits         []string   `json:"visits"`
	Resources      []Resource `json:"resources"`
	Downloads      []Download `json:"downloads"`
	Dialogs        []Dialog   `json:"dialogs"`
	HTML           string     `json:"html"`
	ScreenshotPath string     `json:"screenshot_path"`
	ScreenshotData string     `json:"screenshot_data"`
	UseTor         bool       `json:"use_tor"`
	DebugPort      int        `json:"debug_port"`
	DebugURL       string     `json:"debug_url"`
	LogEvents      bool       `json:"log_events"`
	UserAgent      string     `json:"user_agent"`
	ImageName      string     `json:"image_name"`
	ContainerID    string     `json:"container_id"`
}

Browser is a struct containing details about a browser navigation to a URL.

func NewBrowser

func NewBrowser(url string, screenshotPath string, useTor bool, logEvents bool, imageName string) *Browser

NewBrowser instantiates a new Browser struct.

func (*Browser) Run

func (b *Browser) Run() error

Run launches our browser and navigates to the specified URL.

type Check

type Check struct {
	Call        CheckFunction
	Score       int
	Name        string
	Description string
}

Check defines the general properties of a CheckFunction.

func GetDomainChecks

func GetDomainChecks() []Check

GetDomainChecks returns a list of only the checks that work for domain names.

func GetHTMLChecks

func GetHTMLChecks() []Check

GetHTMLChecks returns a list of all the available HTML checks.

func GetURLChecks

func GetURLChecks() []Check

GetURLChecks returns a list of all the available URL checks.

type CheckFunction

type CheckFunction func(*Link, *Page, *Brands) bool

CheckFunction defines the functions used to implement URL or HTML checks.
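To show how Check, CheckFunction and Warning fit together, here is a self-contained sketch of running a list of checks and keeping the ones that match. All types are simplified local stand-ins (the real CheckFunction also receives a *Page and *Brands), and checkNoTLS and runChecks are hypothetical; only the "no-tls" name, score and description are taken from the sample CLI output.

```go
package main

import "fmt"

// link is a reduced stand-in for the documented Link type.
type link struct {
	Scheme string
	Domain string
}

// checkFunction mirrors CheckFunction with a single argument for brevity.
type checkFunction func(*link) bool

type check struct {
	Call        checkFunction
	Score       int
	Name        string
	Description string
}

type warning struct {
	Score       int
	Name        string
	Description string
}

// checkNoTLS is a hypothetical check that matches plain-HTTP links.
func checkNoTLS(l *link) bool { return l.Scheme == "http" }

// runChecks calls every check and converts the matching ones to warnings.
func runChecks(l *link, checks []check) []warning {
	var warnings []warning
	for _, c := range checks {
		if c.Call(l) {
			warnings = append(warnings, warning{Score: c.Score, Name: c.Name, Description: c.Description})
		}
	}
	return warnings
}

func main() {
	checks := []check{{
		Call:        checkNoTLS,
		Score:       20,
		Name:        "no-tls",
		Description: "The website is not using a secure transport layer (HTTPS)",
	}}
	for _, w := range runChecks(&link{Scheme: "http", Domain: "example.com"}, checks) {
		fmt.Println(w.Name, w.Score)
	}
}
```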

type Dialog added in v1.13.0

type Dialog struct {
	URL     string `json:"url"`
	Type    string `json:"type"`
	Message string `json:"message"`
}

Dialog contains details of JavaScript dialogs opened.

type Download added in v1.13.0

type Download struct {
	URL      string `json:"url"`
	FileName string `json:"file_name"`
}

Download contains details of files which were offered for download at the link.

type Link

type Link struct {
	URL        string
	Scheme     string
	Domain     string
	Port       string
	TopDomain  string
	Path       string
	RawQuery   string
	Parameters map[string]string
}

Link defines details of a parsed URL.

func NewLink

func NewLink(urlString string) (*Link, error)

NewLink instantiates a Link struct.
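As an illustration of how a URL maps onto these fields, the sketch below fills a reduced copy of the struct using net/url. It is not the library's implementation: linkSketch and parseLink are hypothetical names, TopDomain is left out because deriving it requires a public suffix list, and keeping only the first value of each query parameter is an assumption.

```go
package main

import (
	"fmt"
	"net/url"
)

// linkSketch mirrors the documented Link fields, minus TopDomain.
type linkSketch struct {
	URL        string
	Scheme     string
	Domain     string
	Port       string
	Path       string
	RawQuery   string
	Parameters map[string]string
}

// parseLink splits a URL string into its components.
func parseLink(raw string) (*linkSketch, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	params := make(map[string]string)
	for key, values := range u.Query() {
		if len(values) > 0 {
			params[key] = values[0] // keep only the first value per key
		}
	}
	return &linkSketch{
		URL:        raw,
		Scheme:     u.Scheme,
		Domain:     u.Hostname(),
		Port:       u.Port(),
		Path:       u.Path,
		RawQuery:   u.RawQuery,
		Parameters: params,
	}, nil
}

func main() {
	l, err := parseLink("http://example.com:8080/Login?user=alice")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(l.Scheme, l.Domain, l.Port, l.Path, l.Parameters["user"])
}
```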

type LogCodec added in v1.13.0

type LogCodec struct {
	// contains filtered or unexported fields
}

Adapted from: https://pkg.go.dev/github.com/mafredri/cdp#example-package-Logging

LogCodec captures the output from writing RPC requests and reading responses on the connection. It implements rpcc.Codec via WriteRequest and ReadResponse.

func (*LogCodec) ReadResponse added in v1.13.0

func (c *LogCodec) ReadResponse(resp *rpcc.Response) error

ReadResponse unmarshals from the connection into resp whilst echoing what is read into a buffer for logging.

func (*LogCodec) WriteRequest added in v1.13.0

func (c *LogCodec) WriteRequest(req *rpcc.Request) error

WriteRequest marshals req into a buffer, writes its contents onto the connection and logs it.

type Page

type Page struct {
	HTML      string
	Soup      soup.Root
	Text      string
	Resources []Resource
}

Page contains information on the HTML page.

func NewPage

func NewPage(html string, resources []Resource) (*Page, error)

NewPage instantiates a new Page struct.

func (*Page) GetEntities

func (p *Page) GetEntities(entityType string) []soup.Root

GetEntities returns any HTML entity of the specified type.

func (*Page) GetInputs

func (p *Page) GetInputs(inputType string) []soup.Root

GetInputs returns any form input.

func (*Page) GetTitle

func (p *Page) GetTitle() string

GetTitle returns the content of the <title> tag from the HTML page.

type Resource added in v1.8.0

type Resource struct {
	Status  int    `json:"status"`
	URL     string `json:"url"`
	Type    string `json:"type"`
	SHA256  string `json:"sha256"`
	Content string `json:"content"`
}

Resource contains details of a resource that was fetched.

type Warning added in v1.8.3

type Warning struct {
	Score       int    `json:"score"`
	Name        string `json:"name"`
	Description string `json:"description"`
}

Warning is a conversion of Check containing only results.
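The struct tags above determine how a warning looks when serialized to JSON. A short sketch with encoding/json, using a local copy of the struct so that it runs without the library (encodeWarning is a hypothetical helper and the field values are made up for the example):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// warning is a local copy of the documented Warning struct.
type warning struct {
	Score       int    `json:"score"`
	Name        string `json:"name"`
	Description string `json:"description"`
}

// encodeWarning serializes a warning using its JSON tags.
func encodeWarning(w warning) (string, error) {
	data, err := json.Marshal(w)
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	out, err := encodeWarning(warning{Score: 20, Name: "no-tls", Description: "no HTTPS"})
	if err != nil {
		fmt.Println("encoding failed:", err)
		return
	}
	fmt.Println(out)
}
```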
