dor

package module
v1.1.0
Published: Nov 20, 2018 License: MIT Imports: 14 Imported by: 4

README

DOR - Domain Ranker

Fast HTTP service that shows a specified domain's rank from the following providers: Alexa, Majestic, Umbrella OpenDNS, Statvoo and Open PageRank.

It can be used as a base for domain categorization, network filters, or suspicious domain detection.

Data is updated once a day automatically.

Right now only in-memory and MongoDB storages are supported, but Dor was built with flexibility in mind, so you can add the storage you like by implementing the Storage interface.

Installation

Check out the releases page.

Manual build

dor supports Go 1.9 and later

go get -u github.com/ilyaglow/dor
go install ./...

Web service usage

Use MongoDB storage located at mongoserver and bind to port 8080

DOR_MONGO_URL=mongoserver DOR_PORT=8080 dor-web-mongodb

Fill database with the data

DOR_MONGO_URL=mongoserver go run cmd/dor-insert-mongo/dor-insert-mongo

Or if you want just in-memory database:

dor-web-inmemory -h

Usage of dor-web-inmemory:
  -listen string
    	Listen address to bind (default "127.0.0.1:8080")

Docker usage

The project ships a docker-compose file that uses MongoDB as storage. Adjust it as needed (folder for data persistence, ports, etc.).

docker-compose up -d

Client usage

$: curl 127.0.0.1:8080/rank/github.com

{
  "data": "github.com",
  "ranks": [
    {
      "domain": "github.com",
      "rank": 33,
      "last_update": "2018-01-11T18:01:27.251103268Z",
      "source": "majestic"
    },
    {
      "domain": "github.com",
      "rank": 66,
      "last_update": "2018-01-11T18:01:27.97067767Z",
      "source": "statvoo"
    },
    {
      "domain": "github.com",
      "rank": 72,
      "last_update": "2018-01-11T18:04:26.267833256Z",
      "source": "alexa"
    },
    {
      "domain": "github.com",
      "rank": 2367,
      "last_update": "2018-01-11T18:06:50.866600102Z",
      "source": "umbrella"
    },
    {
      "domain": "github.com",
      "rank": 115,
      "last_update": "2018-03-27T17:01:13.535Z",
      "source": "pagerank"
    }
  ],
  "timestamp": "2018-01-11T18:07:09.186271429Z"
}

Documentation

Overview

Package dor is a domain rank data collection library and fast HTTP service (built on top of the iris framework) which shows a specified domain's rank from the following providers: Alexa, Majestic, Umbrella OpenDNS, Statvoo and Open PageRank.

It can be used as a base for domain categorization, network filters, or suspicious domain detection. By default the data is updated automatically once a day; the interval is configurable.

Usage:

dor-web-inmemory -h
Usage of dor-web-inmemory:
-host string
	IP-address to bind (default "127.0.0.1")
-port string
	Port to bind (default "8080")

Client request example:

curl 127.0.0.1:8080/rank/github.com

Server response:

{
  "data": "github.com",
  "ranks": [
    {
      "domain": "github.com",
      "rank": 33,
      "last_update": "2018-01-11T18:01:27.251103268Z",
      "source": "majestic"
    },
    {
      "domain": "github.com",
      "rank": 66,
      "last_update": "2018-01-11T18:01:27.97067767Z",
      "source": "statvoo"
    },
    {
      "domain": "github.com",
      "rank": 72,
      "last_update": "2018-01-11T18:04:26.267833256Z",
      "source": "alexa"
    },
    {
      "domain": "github.com",
      "rank": 2367,
      "last_update": "2018-01-11T18:06:50.866600102Z",
      "source": "umbrella"
    },
    {
      "domain": "github.com",
      "rank": 115,
      "last_update": "2018-03-27T17:01:13.535Z",
      "source": "pagerank"
    }
  ],
  "timestamp": "2018-01-11T18:07:09.186271429Z"
}

Types

type AlexaIngester added in v1.0.0

type AlexaIngester struct {
	IngesterConf
}

AlexaIngester represents Ingester implementation for Alexa Top 1 Million websites

func NewAlexa added in v1.0.0

func NewAlexa() *AlexaIngester

NewAlexa bootstraps AlexaIngester

func (*AlexaIngester) Do added in v1.0.0

func (in *AlexaIngester) Do() (chan Rank, error)

Do implements Ingester Do func with the data from Alexa Top 1M CSV file

type App added in v1.0.0

type App struct {
	Ingesters []Ingester
	Storage   Storage
	Keep      bool
}

App represents Dor configuration options

func New added in v1.0.0

func New(stn string, stl string, keep bool) (*App, error)

New bootstraps App struct.

stn - storage name
stl - storage location string
keep - keep new data or overwrite old one (always false for MemoryStorage)

func (*App) Fill added in v1.0.0

func (d *App) Fill() error

Fill runs the available Ingesters and fills the Storage with their data.

func (*App) FillByTimer added in v1.0.0

func (d *App) FillByTimer(duration time.Duration) error

FillByTimer combines filling and updating on a specific duration

func (*App) Find added in v1.0.0

func (d *App) Find(domain string, sources ...string) (*FindResponse, error)

Find represents find operation on the storage available

type ExtendedRank added in v1.0.0

type ExtendedRank struct {
	Domain     string    `json:"domain" db:"domain" bson:"domain"`
	Rank       uint      `json:"rank" db:"rank" bson:"rank"`
	LastUpdate time.Time `json:"last_update" bson:"last_update"`
	Source     string    `json:"source" bson:"source"`
}

ExtendedRank is a SimpleRank with extended fields

func (ExtendedRank) GetDomain added in v1.0.0

func (s ExtendedRank) GetDomain() string

GetDomain is a simple getter for a Domain

func (ExtendedRank) GetRank added in v1.0.0

func (s ExtendedRank) GetRank() uint

GetRank is a simple getter for a Rank

type FindResponse

type FindResponse struct {
	RequestData string    `json:"data"`
	Hits        []Rank    `json:"ranks"`
	Timestamp   time.Time `json:"timestamp"`
}

FindResponse is a find request response.

type Ingester added in v1.0.0

type Ingester interface {
	Do() (chan Rank, error) // returns a channel for consumers
	GetDesc() string        // simple getter for the source
}

Ingester fetches data and uploads it to the Storage

type IngesterConf added in v1.0.0

type IngesterConf struct {
	sync.Mutex
	Description string
	Timestamp   time.Time
}

IngesterConf represents a top popular domains provider configuration.

Ingesters implemented so far:

  • Alexa Top 1 Million
  • Majestic Top 1 Million
  • Umbrella Top 1 Million
  • Statvoo Top 1 Million
  • DomCop PageRank Top 10 Million

func (*IngesterConf) GetDesc added in v1.0.0

func (in *IngesterConf) GetDesc() string

GetDesc is a simple getter for a collection's description

type LookupMap

type LookupMap map[string]uint

LookupMap represents map with domain - rank pairs

type MajesticIngester added in v1.0.0

type MajesticIngester struct {
	IngesterConf
	// contains filtered or unexported fields
}

MajesticIngester is a List implementation which downloads data and translates it to LookupMap

More info: https://blog.majestic.com/development/alexa-top-1-million-sites-retired-heres-majestic-million/

func NewMajestic added in v1.0.0

func NewMajestic() *MajesticIngester

NewMajestic bootstraps MajesticIngester

func (*MajesticIngester) Do added in v1.0.0

func (in *MajesticIngester) Do() (chan Rank, error)

Do implements Ingester interface with the data from Majestic CSV file

type MajesticRank added in v1.0.0

type MajesticRank struct {
	GlobalRank     uint   `json:"rank" bson:"rank"`
	TLDRank        uint   `json:"tld_rank" bson:"tld_rank"`
	Domain         string `json:"domain" bson:"domain"`
	TLD            string `json:"tld" bson:"tld"`
	RefSubNets     uint   `json:"ref_sub_nets" bson:"ref_sub_nets"`
	RefIPs         uint   `json:"ref_ips" bson:"ref_ips"`
	IDNDomain      string `json:"idn_domain" bson:"idn_domain"`
	IDNTLD         string `json:"idn_tld" bson:"idn_tld"`
	PrevGlobalRank uint   `json:"prev_global_rank" bson:"prev_global_rank"`
	PrevTLDRank    uint   `json:"prev_tld_rank" bson:"prev_tld_rank"`
	PrevRefSubNets uint   `json:"prev_ref_sub_nets" bson:"prev_ref_sub_nets"`
	PrevRefIPs     uint   `json:"prev_ref_ips" bson:"prev_ref_ips"`
}

MajesticRank implements Rank interface

func (*MajesticRank) GetDomain added in v1.0.0

func (m *MajesticRank) GetDomain() string

GetDomain is a simple getter for the MajesticRank's domain

func (*MajesticRank) GetRank added in v1.0.0

func (m *MajesticRank) GetRank() uint

GetRank is a simple getter for the MajesticRank's rank

type MemoryStorage added in v1.0.0

type MemoryStorage struct {
	Maps map[string]*memoryCollection
}

MemoryStorage implements Storage interface as in-memory storage

func (*MemoryStorage) Get added in v1.0.0

func (ms *MemoryStorage) Get(d string, sources ...string) ([]Rank, error)

Get implements Get method of the Storage interface

func (*MemoryStorage) GetMore added in v1.0.0

func (ms *MemoryStorage) GetMore(d string, lps int, sources ...string) ([]Rank, error)

GetMore is not supported for the memory storage

func (*MemoryStorage) Put added in v1.0.0

func (ms *MemoryStorage) Put(c <-chan Rank, s string, t time.Time) error

Put implements Put method of the Storage interface

type MongoStorage added in v1.0.0

type MongoStorage struct {
	// contains filtered or unexported fields
}

MongoStorage implements the Storage interface for MongoDB

func NewMongoStorage added in v1.0.0

func NewMongoStorage(u string, db string, col string, size int, w int, ret bool) (*MongoStorage, error)

NewMongoStorage bootstraps MongoStorage, creates indexes

u is the Mongo URL
db is the database name
col is the collection name
size is the bulk message size
w is number of workers
ret is the data retention option

func (*MongoStorage) Get added in v1.0.0

func (m *MongoStorage) Get(d string, sources ...string) ([]Rank, error)

Get implements Storage interface method Get

func (*MongoStorage) GetMore added in v1.0.0

func (m *MongoStorage) GetMore(d string, lps int, sources ...string) ([]Rank, error)

GetMore implements Storage GetMore function

func (*MongoStorage) Put added in v1.0.0

func (m *MongoStorage) Put(c <-chan Rank, s string, t time.Time) error

Put implements Storage interface method Put

s - is the data source
t - is the data datetime

type PageRankIngester added in v1.0.0

type PageRankIngester struct {
	IngesterConf
}

PageRankIngester represents Ingester implementation for DomCop PageRank top 10M domains

func NewPageRank added in v1.0.0

func NewPageRank() *PageRankIngester

NewPageRank bootstraps PageRankIngester

func (*PageRankIngester) Do added in v1.0.0

func (in *PageRankIngester) Do() (chan Rank, error)

Do implements Ingester Do func with the data from DomCop

type Rank added in v1.0.0

type Rank interface {
	GetDomain() string
	GetRank() uint
}

Rank is an interface for different ranking systems

type SimpleRank added in v1.0.0

type SimpleRank struct {
	Domain string `json:"domain" db:"domain" bson:"domain"`
	Rank   uint   `json:"rank" db:"rank" bson:"rank"`
}

SimpleRank is a simple domain rank structure.

func (SimpleRank) GetDomain added in v1.0.0

func (s SimpleRank) GetDomain() string

GetDomain is a simple getter for a Domain

func (SimpleRank) GetRank added in v1.0.0

func (s SimpleRank) GetRank() uint

GetRank is a simple getter for a Rank

type StatvooIngester added in v1.0.0

type StatvooIngester struct {
	IngesterConf
}

StatvooIngester represents top 1 million websites by statvoo

More info: https://statvoo.com/top/sites

func NewStatvoo added in v1.0.0

func NewStatvoo() *StatvooIngester

NewStatvoo bootstraps StatvooIngester

func (*StatvooIngester) Do added in v1.0.0

func (in *StatvooIngester) Do() (chan Rank, error)

Do implements Ingester Do func with the data from Statvoo Top 1M

type Storage added in v1.0.0

type Storage interface {
	Put(<-chan Rank, string, time.Time) error                          // Put is usually a bulk inserter from the channel that works in a goroutine, second argument is a Source of the data and third is the last update time
	Get(domain string, sources ...string) ([]Rank, error)              // Get is a simple getter for the latest rank of the domain in a particular domain rank provider
	GetMore(domain string, lps int, sources ...string) ([]Rank, error) // GetMore is a getter that retrieves historical data on the domain, limited by lps (limit per source)
}

Storage represents an interface to store and query ranks.

type UmbrellaIngester added in v1.0.0

type UmbrellaIngester struct {
	IngesterConf
}

UmbrellaIngester represents Ingester implementation for OpenDNS Umbrella Top 1M domains

More info: https://umbrella.cisco.com/blog/2016/12/14/cisco-umbrella-1-million/

func NewUmbrella added in v1.0.0

func NewUmbrella() *UmbrellaIngester

NewUmbrella bootstraps UmbrellaIngester

func (*UmbrellaIngester) Do added in v1.0.0

func (in *UmbrellaIngester) Do() (chan Rank, error)

Do implements Ingester Do func with the data from OpenDNS

Directories

Path Synopsis
cmd
service
