whoisparser

package module
v0.0.0-...-6ee28f3 Latest
Published: Nov 28, 2019 License: Apache-2.0 Imports: 2 Imported by: 1

README

Whois parser

Description

Extendable whois parser written in Go.

This project is in the development stage and is not ready for use in production systems. Any support is appreciated.

Installation
go get -u github.com/icamys/whois-parser
Usage

To try it, copy and paste the following example into the Go Playground (don't forget to check the "imports" flag):

package main

import (
    "encoding/json"
    "fmt"

    whoisparser "github.com/icamys/whois-parser"
)

func main() {
    domain := "google.com"
    whoisRaw := "Domain Name: GOOGLE.COM"

    // whoisRecord is of Record type, see ./record.go
    whoisRecord := whoisparser.Parse(domain, whoisRaw)

    // Serialize the parsed record to JSON and report a marshaling failure
    whois2b, err := json.Marshal(whoisRecord)
    if err != nil {
        fmt.Println("cannot marshal record:", err)
        return
    }
    fmt.Println(string(whois2b))
}

Supported zones

Contributing

Self-check

Before contributing any code, please check that the following commands produce no warnings or errors.

  1. Check cyclomatic complexity (15 is the maximum acceptable value):

    $ gocyclo -over 15 ./
    
  2. Run tests:

    # Use -count=1 to disable cache usage
    $ go test -count=1 ./...
    
  3. Lint code:

    $ golint ./...
    
Adding new parser for a particular TLD

Let's create a new parser for the TLDs .jp and .co.jp.

  1. Create a file named parser_jp.go in the root directory.

  2. Define the parser and register it:

    package whoisparser
    
    import (
        "github.com/icamys/whois-parser/internal/constants"
        "regexp"
    )
    
    // Defining new parser with regular expressions for each parsed section
    var jpParser = &Parser{
    
        errorRegex: &ParseErrorRegex{
            NoSuchDomain:     regexp.MustCompile(`No match!`),
            RateLimit:        nil,
            MalformedRequest: regexp.MustCompile(`<JPRS WHOIS HELP>`),
        },
    
        registrarRegex: &RegistrarRegex{
            CreatedDate:    regexp.MustCompile(`(?i)\[Created on] *(.+)`),
            DomainName:     regexp.MustCompile(`(?i)\[Domain Name] *(.+)`),
            DomainStatus:   regexp.MustCompile(`(?i)\[Status] *(.+)`),
            Emails:         regexp.MustCompile(`(?i)` + EmailRegex),
            ExpirationDate: regexp.MustCompile(`(?i)\[Expires on] *(.+)`),
            NameServers:    regexp.MustCompile(`(?i)\[Name Server] *(.+)`),
            UpdatedDate:    regexp.MustCompile(`(?i)\[Last Updated] *(.+)`),
        },
    
        registrantRegex: &RegistrantRegex{
            Name:         regexp.MustCompile(`(?i)\[Registrant] *(.+)`),
            Organization: regexp.MustCompile(`(?i)\[Organization] *(.+)`),
        },
    
        adminRegex: &RegistrantRegex{
            ID: regexp.MustCompile(`(?i)\[Administrative Contact] *(.+)`),
        },
    
        techRegex: &RegistrantRegex{
            ID: regexp.MustCompile(`(?i)\[Technical Contact] *(.+)`),
        },
    }
    
    // Register newly created parser for the particular TLD
    func init() {
        RegisterParser(".jp", jpParser)
    }
    
  3. Create a file named parser_co_jp.go in the root directory.

  4. The whois output for .co.jp extends that of .jp, so we copy the .jp parser and extend it in the init() function:

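    package whoisparser

    import "regexp"

    // coJpParser is built in init() from a copy of jpParser
    var coJpParser *Parser

    func init() {
        // jpParser is a pointer, so a plain assignment would only alias it.
        // Copy the Parser value and its registrarRegex so that the changes
        // below do not leak into the .jp parser.
        parserCopy := *jpParser
        registrarCopy := *jpParser.registrarRegex
        parserCopy.registrarRegex = &registrarCopy
        coJpParser = &parserCopy

        // extend coJpParser with additional regexes
        coJpParser.registrarRegex.CreatedDate = regexp.MustCompile(`\[Registered Date\] *(.+)`)
        coJpParser.registrarRegex.ExpirationDate = regexp.MustCompile(`\[State\] *(.+)`)
        coJpParser.registrarRegex.UpdatedDate = regexp.MustCompile(`\[Last Update\] *(.+)`)

        RegisterParser(".co.jp", coJpParser)
    }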
    
  5. Write tests.

    1. Create a whois fixture test/whois_co_jp.txt containing a valid whois response
    2. Write your parser tests in parser_co_jp_test.go
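
Before writing the tests, it can help to check a field expression against a fixture line in isolation. The following is a minimal, self-contained sketch that uses only the standard regexp package. The helper extractFirst and the sample text sampleJP are hypothetical and are not part of this package; the expressions are the ones from the jpParser definition above.

```go
package main

import (
	"fmt"
	"regexp"
)

// sampleJP is a made-up whois fragment in the bracketed "[Label] value"
// layout that the jpParser expressions target. It is not real registry data.
const sampleJP = "[Domain Name]                   EXAMPLE.JP\n" +
	"[Created on]                    2001/03/22\n" +
	"[Expires on]                    2030/03/31\n" +
	"[Status]                        Active\n"

// Expressions copied from the jpParser definition.
var (
	createdDateRe = regexp.MustCompile(`(?i)\[Created on] *(.+)`)
	domainNameRe  = regexp.MustCompile(`(?i)\[Domain Name] *(.+)`)
	updatedDateRe = regexp.MustCompile(`(?i)\[Last Updated] *(.+)`)
)

// extractFirst is a hypothetical helper: it returns the first capture group
// of the first match, or an empty string when the expression does not match.
func extractFirst(re *regexp.Regexp, text string) string {
	match := re.FindStringSubmatch(text)
	if len(match) < 2 {
		return ""
	}
	return match[1]
}

func main() {
	fmt.Printf("%q\n", extractFirst(createdDateRe, sampleJP)) // "2001/03/22"
	fmt.Printf("%q\n", extractFirst(domainNameRe, sampleJP))  // "EXAMPLE.JP"
	fmt.Printf("%q\n", extractFirst(updatedDateRe, sampleJP)) // "" (label absent in the sample)
}
```

Because `.` does not match a newline unless the `s` flag is set, `(.+)` stops at the end of the line, so each expression captures only the value that follows its label.
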
Parsing address with single regex

In some cases the whole address is laid out so that parsing it with a single regular expression is more convenient and faster than using one expression per field. For this purpose we use regex named groups.

Use the following regex group names for the corresponding fields:

Field        Regex group name
Street       street
StreetExt    streetExt
City         city
PostalCode   postalCode
Province     province
Country      country

Example

Let's take a look at an example.

  1. Suppose we have an address:

    Address:          Viale Del Policlinico 123/B
                      Roma
                      00263
                      RM
                      IT
    
  2. We can craft a regular expression as follows:

    (?ms)Registrant(?:.*?Address: *(?P<street>.*?)$.*?)\n *(?P<city>.*?)\n *(?P<postalCode>.*?)\n *(?P<province>.*?)\n *(?P<country>.*?)\n.*?Creat
    

    Here all address regex groups are optional. If a group name is missing from the expression, an empty string is assigned to the corresponding field.

  3. Now we assign our crafted regex to some parser structure and the address will be successfully parsed:

    var itParser = &Parser{
        registrantRegex: &RegistrantRegex{
            Address:    regexp.MustCompile(`(?ms)Registrant(?:.*?Address: *(?P<street>.*?)$.*?)\n *(?P<city>.*?)\n *(?P<postalCode>.*?)\n *(?P<province>.*?)\n *(?P<country>.*?)\n.*?Creat`),
        },
        // ...
    }
    

    Parsing result:

    {
        "registrant": {
            "street" : "Viale Del Policlinico 123/B",
            "city": "Roma",
            "province": "RM",
            "postal_code": "00263",
            "country": "IT"
        }
    }
    
  4. Note that if the Address field is set, then any other address regex fields will be ignored:

    registrantRegex: &RegistrantRegex{
        Address:    regexp.MustCompile(`(?ms)Registrant(?:.*?Address: *(?P<street>.*?)$.*?)\n *(?P<city>.*?)\n *(?P<postalCode>.*?)\n *(?P<province>.*?)\n *(?P<country>.*?)\n.*?Creat`),
        City:       regexp.MustCompile(`City (.*)`), // This regex will be ignored as Address is set
    },
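
To see how the named groups map to fields, the expression can be run on its own with the standard regexp package. This is only a sketch: namedGroups and sampleIT are hypothetical names, the surrounding whois text is made up, and the way this package consumes the groups internally is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
)

// addressRe is the single-expression address regex from the example.
var addressRe = regexp.MustCompile(`(?ms)Registrant(?:.*?Address: *(?P<street>.*?)$.*?)\n *(?P<city>.*?)\n *(?P<postalCode>.*?)\n *(?P<province>.*?)\n *(?P<country>.*?)\n.*?Creat`)

// sampleIT is a made-up whois fragment wrapped around the example address.
const sampleIT = "Registrant\n" +
	"  Organization:     Example Srl\n" +
	"  Address:          Viale Del Policlinico 123/B\n" +
	"                    Roma\n" +
	"                    00263\n" +
	"                    RM\n" +
	"                    IT\n" +
	"  Created:          2010-01-01 00:00:00\n"

// namedGroups is a hypothetical helper that maps every named group of re to
// the text it captured. Groups that are absent from the expression are simply
// not in the map, so looking them up yields an empty string.
func namedGroups(re *regexp.Regexp, text string) map[string]string {
	groups := make(map[string]string)
	match := re.FindStringSubmatch(text)
	if match == nil {
		return groups
	}
	for i, name := range re.SubexpNames() {
		if name != "" {
			groups[name] = match[i]
		}
	}
	return groups
}

func main() {
	groups := namedGroups(addressRe, sampleIT)
	fields := []string{"street", "streetExt", "city", "postalCode", "province", "country"}
	for _, field := range fields {
		fmt.Printf("%-10s %q\n", field, groups[field])
	}
}
```

The expression defines no streetExt group, so that lookup returns an empty string, which mirrors the rule that missing groups produce empty values.
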
    

Documentation

Constants

const (
	// EmailRegex regular expression for email parsing, a bit hacky
	EmailRegex = `(?:[a-z0-9!#$%&'*+/=?^_` + "`{|}~-]+(?:\\.[a-z0-9!#$%&'*+/=?^_`" + `{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9]))\.){3}(?:(2(5[0-5]|[0-4][0-9])|1[0-9][0-9]|[1-9]?[0-9])|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])`
)

Variables

var DefaultParser = Parser{
	// contains filtered or unexported fields
}

DefaultParser is used when no parser is found for the TLD

Functions

func GetErrCodeDescription

func GetErrCodeDescription(code ErrCode) string

GetErrCodeDescription returns error code description

func RegisterParser

func RegisterParser(zone string, parser *Parser)

RegisterParser registers a parser in the catalog that is used to select the parser for a specific domain

Types

type ErrCode

type ErrCode int

ErrCode contains the numeric error code

const (
	// ErrCodeNoError is returned when no request errors are encountered
	ErrCodeNoError ErrCode = 0

	// ErrCodeNoSuchDomain is returned when we've got "no such domain" error
	ErrCodeNoSuchDomain ErrCode = 1

	// ErrCodeRequestRateLimit is returned when the request rate limit is reached
	ErrCodeRequestRateLimit ErrCode = 2

	// ErrCodeMalformedRequest is returned when a malformed request is sent
	ErrCodeMalformedRequest ErrCode = 3

	// ErrCodeTldHasNoServer is returned when the requested TLD has no whois server
	ErrCodeTldHasNoServer ErrCode = 4

	// ErrCodeEmptyWhois is returned when the whois text is empty
	ErrCodeEmptyWhois ErrCode = 5

	// ErrCodeNoErrorRegex is returned when the error checking regular expressions
	// are not set for current parser
	ErrCodeNoErrorRegex ErrCode = 6
)

type IParser

type IParser interface {
	Parse(string) *Record
}

IParser is the parser interface

type ParseErrorRegex

type ParseErrorRegex struct {
	NoSuchDomain     *regexp.Regexp
	RateLimit        *regexp.Regexp
	MalformedRequest *regexp.Regexp
}

ParseErrorRegex contains regular expressions for different kinds of errors
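
As a rough illustration of how such expressions could be turned into an ErrCode, here is a self-contained sketch. The types are local stand-ins that mirror the documented ones, classify is a hypothetical helper, and the order of the checks is an assumption; the package's real logic may differ.

```go
package main

import (
	"fmt"
	"regexp"
)

// ErrCode and ParseErrorRegex are local stand-ins for the documented types,
// declared here only so that the sketch compiles on its own.
type ErrCode int

const (
	ErrCodeNoError          ErrCode = 0
	ErrCodeNoSuchDomain     ErrCode = 1
	ErrCodeRequestRateLimit ErrCode = 2
	ErrCodeMalformedRequest ErrCode = 3
	ErrCodeEmptyWhois       ErrCode = 5
	ErrCodeNoErrorRegex     ErrCode = 6
)

type ParseErrorRegex struct {
	NoSuchDomain     *regexp.Regexp
	RateLimit        *regexp.Regexp
	MalformedRequest *regexp.Regexp
}

// jpErrors reuses the error expressions from the .jp parser example.
var jpErrors = &ParseErrorRegex{
	NoSuchDomain:     regexp.MustCompile(`No match!`),
	RateLimit:        nil,
	MalformedRequest: regexp.MustCompile(`<JPRS WHOIS HELP>`),
}

// classify is a hypothetical helper. The check order is an assumption.
func classify(text string, errs *ParseErrorRegex) ErrCode {
	if text == "" {
		return ErrCodeEmptyWhois
	}
	if errs == nil {
		return ErrCodeNoErrorRegex
	}
	checks := []struct {
		re   *regexp.Regexp
		code ErrCode
	}{
		{errs.NoSuchDomain, ErrCodeNoSuchDomain},
		{errs.RateLimit, ErrCodeRequestRateLimit},
		{errs.MalformedRequest, ErrCodeMalformedRequest},
	}
	for _, check := range checks {
		// A nil expression (RateLimit in the .jp example) is skipped
		if check.re != nil && check.re.MatchString(text) {
			return check.code
		}
	}
	return ErrCodeNoError
}

func main() {
	fmt.Println(classify("No match!!", jpErrors))               // 1
	fmt.Println(classify("<JPRS WHOIS HELP>", jpErrors))        // 3
	fmt.Println(classify("[Domain Name] EXAMPLE.JP", jpErrors)) // 0
	fmt.Println(classify("", jpErrors))                         // 5
	fmt.Println(classify("[Domain Name] EXAMPLE.JP", nil))      // 6
}
```
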

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser represents a structure with regular expressions for specific whois sections

func (*Parser) Parse

func (p *Parser) Parse(text string) *Record

Parse parses whois text

type Record

type Record struct {
	ErrCode    ErrCode     `json:"error_code,omitempty"`
	Registrar  *Registrar  `json:"registrar,omitempty"`
	Registrant *Registrant `json:"registrant,omitempty"`
	Admin      *Registrant `json:"admin,omitempty"`
	Tech       *Registrant `json:"tech,omitempty"`
	Bill       *Registrant `json:"bill,omitempty"`
}

Record is a structure that contains parsed info for particular whois sections
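
Every field carries the omitempty option, so sections and values that were not parsed are left out of the JSON output. The sketch below shows this with trimmed local copies of the documented types (only a few fields are kept, and ErrCode is simplified to int); encode and sampleRecord are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed local copies of the documented types, kept small for illustration.
type Registrar struct {
	DomainName  string `json:"domain_name,omitempty"`
	CreatedDate string `json:"created_date,omitempty"`
}

type Registrant struct {
	Name string `json:"name,omitempty"`
}

type Record struct {
	ErrCode    int         `json:"error_code,omitempty"`
	Registrar  *Registrar  `json:"registrar,omitempty"`
	Registrant *Registrant `json:"registrant,omitempty"`
}

// sampleRecord has a single populated field; everything else is zero or nil.
var sampleRecord = &Record{Registrar: &Registrar{DomainName: "GOOGLE.COM"}}

// encode marshals a record and returns the JSON text.
func encode(record *Record) (string, error) {
	raw, err := json.Marshal(record)
	if err != nil {
		return "", err
	}
	return string(raw), nil
}

func main() {
	out, err := encode(sampleRecord)
	if err != nil {
		fmt.Println("cannot marshal record:", err)
		return
	}
	// The zero error code, the empty created_date and the nil registrant
	// are all omitted.
	fmt.Println(out) // {"registrar":{"domain_name":"GOOGLE.COM"}}
}
```
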

func Parse

func Parse(domain string, text string) *Record

Parse parses whois text for the specified domain. The domain is used to identify the domain zone and to choose the parser that should be used for this zone.
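
The selection logic itself is not part of this documentation. One plausible approach, shown here purely as an assumption, is to pick the longest registered zone that is a suffix of the domain and to fall back to a default otherwise. In this sketch register, selectParser and the string labels are hypothetical and stand in for RegisterParser, the internal lookup and the parser values.

```go
package main

import (
	"fmt"
	"strings"
)

// registry maps a zone such as ".jp" to a parser label. Strings are used
// instead of parser values to keep the sketch self-contained.
var registry = map[string]string{}

// register is a hypothetical stand-in for RegisterParser.
func register(zone, parser string) {
	registry[zone] = parser
}

// selectParser returns the label registered for the longest zone that is a
// suffix of domain, or "default" when no registered zone matches.
// Longest-suffix matching is an assumption, not the documented behavior.
func selectParser(domain string) string {
	domain = strings.ToLower(domain)
	best, label := "", "default"
	for zone, parser := range registry {
		if strings.HasSuffix(domain, zone) && len(zone) > len(best) {
			best, label = zone, parser
		}
	}
	return label
}

func init() {
	register(".jp", "jpParser")
	register(".co.jp", "coJpParser")
}

func main() {
	for _, domain := range []string{"example.jp", "example.co.jp", "google.com"} {
		fmt.Println(domain, "->", selectParser(domain))
	}
}
```

With longest-suffix matching, example.co.jp resolves to the .co.jp parser even though .jp is also a suffix, and google.com falls back to the default.
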

type Registrant

type Registrant struct {
	ID           string `json:"id,omitempty"`
	Name         string `json:"name,omitempty"`
	Organization string `json:"organization,omitempty"`
	Street       string `json:"street,omitempty"`
	StreetExt    string `json:"street_ext,omitempty"`
	City         string `json:"city,omitempty"`
	Province     string `json:"province,omitempty"`
	PostalCode   string `json:"postal_code,omitempty"`
	Country      string `json:"country,omitempty"`
	Phone        string `json:"phone,omitempty"`
	PhoneExt     string `json:"phone_ext,omitempty"`
	Fax          string `json:"fax,omitempty"`
	FaxExt       string `json:"fax_ext,omitempty"`
	Email        string `json:"email,omitempty"`
}

Registrant is a structure that stores parsed registrant info. Registrant is registered by the registrar.

type RegistrantRegex

type RegistrantRegex struct {
	Address      *regexp.Regexp
	ID           *regexp.Regexp
	Name         *regexp.Regexp
	Organization *regexp.Regexp
	Street       *regexp.Regexp
	StreetExt    *regexp.Regexp
	City         *regexp.Regexp
	Province     *regexp.Regexp
	PostalCode   *regexp.Regexp
	Country      *regexp.Regexp
	Phone        *regexp.Regexp
	PhoneExt     *regexp.Regexp
	Fax          *regexp.Regexp
	FaxExt       *regexp.Regexp
	Email        *regexp.Regexp
}

RegistrantRegex struct with regular expressions used to parse Registrant

type Registrar

type Registrar struct {
	CreatedDate    string `json:"created_date,omitempty"`
	DomainDNSSEC   string `json:"domain_dnssec,omitempty"`
	DomainID       string `json:"domain_id,omitempty"`
	DomainName     string `json:"domain_name,omitempty"`
	DomainStatus   string `json:"domain_status,omitempty"`
	ExpirationDate string `json:"expiration_date,omitempty"`
	NameServers    string `json:"name_servers,omitempty"`
	ReferralURL    string `json:"referral_url,omitempty"`
	RegistrarID    string `json:"registrar_id,omitempty"`
	RegistrarName  string `json:"registrar_name,omitempty"`
	UpdatedDate    string `json:"updated_date,omitempty"`
	WhoisServer    string `json:"whois_server,omitempty"`
	Emails         string `json:"emails,omitempty"`
}

Registrar is a structure that stores parsed registrar info. Registrar registers the registrant.

type RegistrarRegex

type RegistrarRegex struct {
	CreatedDate    *regexp.Regexp
	DomainDNSSEC   *regexp.Regexp
	DomainID       *regexp.Regexp
	DomainName     *regexp.Regexp
	DomainStatus   *regexp.Regexp
	Emails         *regexp.Regexp
	ExpirationDate *regexp.Regexp
	NameServers    *regexp.Regexp
	ReferralURL    *regexp.Regexp
	RegistrarID    *regexp.Regexp
	RegistrarName  *regexp.Regexp
	UpdatedDate    *regexp.Regexp
	WhoisServer    *regexp.Regexp
}

RegistrarRegex struct with regular expressions used to parse Registrar
