nzcovid19cases

package module
v0.0.0-...-64e42a0 Latest
Published: Jul 5, 2020 License: MIT Imports: 7 Imported by: 0

README

NZ COVID-19 cases scraper

UPDATE: online scraper API retired

After one too many arbitrary format changes on the MOH web site I've decided to stop updating the scraper and shut down the online API. There are alternative sources of both live statistics and case data (see section below).

For me, this project was an object lesson in the futility of scraping hand-edited information. Open Data is necessary for the public to (feasibly) automatically process government-owned data. It turns out, in a crisis, Open Data is not a priority (indeed, as of 5 July 2020, in NZ the official government portal has scant COVID-19 datasets).

Sorry for the inconvenience, and thank you for your interest.

UPDATE: discontinuation of case detail scraping

On April 12, the MOH stopped publishing all COVID-19 case details in a single table, and began reporting monthly cases. At this point I don't think it makes sense for this API to offer detailed case information. The last successfully scraped case data is now archived. I will leave the scraping code as is for those who want to use the CLI tool to download the current month's case data.

Similarly, the location data (per-DHB statistics), which was derived from the scraped cases, will now be incorrect, and MOH's own per-DHB case summary table also covers only the current month. Again, I will remove the API for /location/* and leave the CLI function in place, in case it is useful to anyone (unlikely, but who knows).

For those who are interested in obtaining a full snapshot of case information, the best source I know of is the arcgis.com dashboard linked from the MOH website.

Specifically, tables that appear to be obtained from, or maintained by, ESR in the backend web service can be dumped in JSON format with the right query strings:

UPDATE: real-time NZ COVID-19 statistics

ESR now provides a dashboard that (presumably) renders statistics directly from the authoritative database that all the NZ COVID-19 data comes from (EpiSurv): https://nzcoviddashboard.esr.cri.nz/

Unfortunately there is no usable API. As far as I can tell, R Shiny-server uses a baroque home-grown protocol. It exchanges strangely encoded messages (mixed with JSON) over streaming XHR connections:

Client: ["0#0|o|"]
Server: a["1#0|m|{\"busy\":\"busy\"}"]
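For anyone curious, the two sample messages above can at least be pulled apart mechanically. The following is a minimal sketch that assumes each message is a JSON array of strings (optionally prefixed with "a"), and that each string has three "|"-separated fields. That layout is a guess from these two samples only, not a documented protocol, and the Frame type and ParseFrames function are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Frame is a hypothetical view of one message, inferred only from the
// two sample messages shown above. The field meanings are guesses.
type Frame struct {
	ID      string // text before the first "|", e.g. "1#0"
	Kind    string // text between the first and second "|", e.g. "m"
	Payload string // everything after the second "|", sometimes JSON
}

// ParseFrames strips an optional leading "a", decodes the remaining JSON
// array of strings, and splits each string into three "|"-separated fields.
func ParseFrames(msg string) ([]Frame, error) {
	msg = strings.TrimPrefix(msg, "a")

	var raw []string
	if err := json.Unmarshal([]byte(msg), &raw); err != nil {
		return nil, fmt.Errorf("decoding message array: %w", err)
	}

	frames := make([]Frame, 0, len(raw))
	for _, s := range raw {
		// Split into at most three parts so that any "|" inside the
		// payload is left untouched.
		parts := strings.SplitN(s, "|", 3)
		if len(parts) != 3 {
			return nil, fmt.Errorf("unexpected frame %q", s)
		}
		frames = append(frames, Frame{ID: parts[0], Kind: parts[1], Payload: parts[2]})
	}
	return frames, nil
}

func main() {
	frames, err := ParseFrames(`a["1#0|m|{\"busy\":\"busy\"}"]`)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, f := range frames {
		fmt.Printf("id=%s kind=%s payload=%s\n", f.ID, f.Kind, f.Payload)
	}
}
```

This only separates the fields; it says nothing about what the identifiers or message kinds mean.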

If anyone feels there is significant value in reverse-engineering this, feel free to open an issue.

Overview

This code is intended to scrape the following sources of COVID-19 data in New Zealand, and render the data in various formats suitable for mapping, visualisation and analysis:

Use this with caution - the NZ government may change their pages and break the scraper at any time.

This code is used as the core of an API service I'm running: https://nzcovid19api.xerra.nz/

Courtesy of @gizmoguy, the metrics exported are scraped by a Prometheus server, and visualised on a Grafana dashboard:

Building

Building directly

To build the utilities, you'll need a Go 1.13+ toolchain installed (check out https://golang.org/dl/ for details).

Running ./build.sh will build each tool in the cmd/ subdirectories.

Building with Docker

If you don't want to futz with Go, a Dockerfile is provided. Use docker to build an image:

$ docker build -t nzcovid19cases .
<snip>
Successfully tagged nzcovid19cases:latest

Usage

For now there is a CLI tool.

Running the directly built binaries

cmd/nzcovid19-cli$ ./nzcovid19-cli 

Usage: ./cmd/nzcovid19-cli/nzcovid19-cli <action>
	Where <action> is one of:
		- cases/json
		- cases/csv
		- locations/json
		- locations/csv
		- alertlevel/json
		- casestats/json
		- clusters/json
		- clusters/csv

Running the docker container

$ docker run -ti --rm nzcovid19cases alertlevel/json
{
  "Level": 4,
  "LevelName": "Eliminate"
}
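The JSON printed by alertlevel/json can be consumed from another Go program. The sketch below assumes the output keeps the two keys shown above; the local AlertLevel struct copies the field names of the package's AlertLevel type, and decodeAlertLevel is a hypothetical helper, not part of the package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AlertLevel copies the field names of the package's AlertLevel type.
// It is redeclared here only to keep the example self-contained.
type AlertLevel struct {
	Level     int
	LevelName string
}

// decodeAlertLevel is a hypothetical helper that decodes the CLI's
// alertlevel/json output into an AlertLevel value.
func decodeAlertLevel(data []byte) (AlertLevel, error) {
	var al AlertLevel
	if err := json.Unmarshal(data, &al); err != nil {
		return AlertLevel{}, fmt.Errorf("decoding alert level: %w", err)
	}
	return al, nil
}

func main() {
	// Sample taken from the docker run output above.
	sample := []byte(`{"Level": 4, "LevelName": "Eliminate"}`)

	al, err := decodeAlertLevel(sample)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("alert level %d (%s)\n", al.Level, al.LevelName)
}
```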

Code license

This code is published under the MIT license.

The data processed by this tool is published under:

Documentation

Constants

const (
	CSVRenderType  = "csv"
	JSONRenderType = "json"
)
const NumLevelREMatches = 2
const TimeFormat = "2/01/2006"

Variables

var ValidDHBsList = []string{
	"Auckland",
	"Bay of Plenty",
	"Canterbury",
	"Capital and Coast",
	"Counties Manukau",
	"Hawke's Bay",
	"Hutt Valley",
	"Lakes",
	"MidCentral",
	"Nelson Marlborough",
	"Northland",
	"South Canterbury",
	"Southern",
	"Tairawhiti",
	"Taranaki",
	"Waikato",
	"Wairarapa",
	"Waitemata",
	"West Coast",
	"Whanganui",
}

Functions

func BuildLocations

func BuildLocations(normCases []*NormalisedCase) map[string]*Location
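The signature suggests that BuildLocations groups normalised cases by location name and counts them. The following is a minimal sketch of that idea, not the package's actual implementation: countByLocation is a hypothetical name, and the two structs are trimmed copies of the package types so the example stands alone.

```go
package main

import "fmt"

// NormalisedCase is a trimmed copy of the package type, keeping only the
// field this sketch needs.
type NormalisedCase struct {
	LocationName string
}

// Location copies the field names of the package's Location type.
type Location struct {
	LocationName string
	CaseCount    int
}

// countByLocation is a hypothetical stand-in for BuildLocations: it
// returns one Location per distinct LocationName with its case count.
func countByLocation(cases []*NormalisedCase) map[string]*Location {
	locations := make(map[string]*Location)
	for _, c := range cases {
		if c == nil {
			continue // skip missing entries rather than panic
		}
		loc, ok := locations[c.LocationName]
		if !ok {
			loc = &Location{LocationName: c.LocationName}
			locations[c.LocationName] = loc
		}
		loc.CaseCount++
	}
	return locations
}

func main() {
	cases := []*NormalisedCase{
		{LocationName: "Auckland"},
		{LocationName: "Waikato"},
		{LocationName: "Auckland"},
	}
	for name, loc := range countByLocation(cases) {
		fmt.Printf("%s: %d\n", name, loc.CaseCount)
	}
}
```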

func RenderCaseStats

func RenderCaseStats(cS CaseStatsResponse, viewType string) (string, error)

func RenderCases

func RenderCases(normCases []*NormalisedCase, viewType string) (string, error)

func RenderClusters

func RenderClusters(clusters []*Cluster, viewType string) (string, error)

func RenderGrants

func RenderGrants(gS GrantsSummary, gR GrantsRegions, viewType string) (string, error)

func RenderLevels

func RenderLevels(levelInt int, levelString, viewType string) (string, error)

func RenderLocations

func RenderLocations(locations map[string]*Location, viewType string) (string, error)
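Each Render function takes a viewType string, which presumably corresponds to the CSVRenderType and JSONRenderType constants. The sketch below shows one way such a dispatch could look for locations. It is an illustration under assumptions: renderLocationsSketch is a hypothetical name, and the CSV header, the sorted row order and the JSON shape are invented here rather than taken from the package.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
)

// Location copies the field names of the package's Location type.
type Location struct {
	LocationName string
	CaseCount    int
}

// renderLocationsSketch is a hypothetical renderer that dispatches on
// viewType ("csv" or "json"). Rows are sorted by name so the output is
// stable even though Go map iteration order is not.
func renderLocationsSketch(locations map[string]*Location, viewType string) (string, error) {
	names := make([]string, 0, len(locations))
	for name := range locations {
		names = append(names, name)
	}
	sort.Strings(names)

	switch viewType {
	case "json":
		ordered := make([]*Location, 0, len(names))
		for _, name := range names {
			ordered = append(ordered, locations[name])
		}
		out, err := json.MarshalIndent(ordered, "", "  ")
		if err != nil {
			return "", err
		}
		return string(out), nil
	case "csv":
		var buf bytes.Buffer
		w := csv.NewWriter(&buf)
		// Assumed header row; the real tool's header may differ.
		if err := w.Write([]string{"LocationName", "CaseCount"}); err != nil {
			return "", err
		}
		for _, name := range names {
			loc := locations[name]
			if err := w.Write([]string{loc.LocationName, strconv.Itoa(loc.CaseCount)}); err != nil {
				return "", err
			}
		}
		w.Flush()
		if err := w.Error(); err != nil {
			return "", err
		}
		return buf.String(), nil
	default:
		return "", fmt.Errorf("unknown view type %q", viewType)
	}
}

func main() {
	locations := map[string]*Location{
		"Waikato":  {LocationName: "Waikato", CaseCount: 1},
		"Auckland": {LocationName: "Auckland", CaseCount: 2},
	}
	out, err := renderLocationsSketch(locations, "csv")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```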

func ScrapeGrants

func ScrapeGrants() (GrantsSummary, GrantsRegions, error)

func ScrapeLevel

func ScrapeLevel() (int, string, error)

Types

type AgeRange

type AgeRange struct {
	Valid             bool
	OlderOrEqualToAge int
	YoungerThanAge    int
}
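Going by the field names, an AgeRange reads as a half-open interval: ages greater than or equal to OlderOrEqualToAge and strictly less than YoungerThanAge, with Valid marking whether the range was known at all. The sketch below illustrates that reading; the Contains method is hypothetical and does not exist in the package.

```go
package main

import "fmt"

// AgeRange copies the field names of the package's AgeRange type.
type AgeRange struct {
	Valid             bool
	OlderOrEqualToAge int
	YoungerThanAge    int
}

// Contains is a hypothetical method: it reports whether age falls in the
// half-open interval [OlderOrEqualToAge, YoungerThanAge). An invalid
// range contains nothing.
func (a AgeRange) Contains(age int) bool {
	return a.Valid && age >= a.OlderOrEqualToAge && age < a.YoungerThanAge
}

func main() {
	twenties := AgeRange{Valid: true, OlderOrEqualToAge: 20, YoungerThanAge: 30}
	fmt.Println(twenties.Contains(20), twenties.Contains(29), twenties.Contains(30))

	unknown := AgeRange{} // Valid is false, so no age matches
	fmt.Println(unknown.Contains(25))
}
```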

type AlertLevel

type AlertLevel struct {
	Level     int
	LevelName string
}

type CaseStatsResponse

type CaseStatsResponse struct {
	ConfirmedCasesTotal  int
	ConfirmedCasesNew24h int
	ProbableCasesTotal   int
	ProbableCasesNew24h  int
	RecoveredCasesTotal  int
	RecoveredCasesNew24h int
	//HospitalisedCasesTotal  int
	//HospitalisedCasesNew24h int
	DeathCasesTotal  int
	DeathCasesNew24h int
}

func ScrapeCaseStats

func ScrapeCaseStats() (CaseStatsResponse, error)

type Cluster

type Cluster struct {
	Name        string
	Location    string
	Cases       int
	CasesNew24h int
}

func ScrapeClusters

func ScrapeClusters() ([]*Cluster, error)

type Grants

type Grants struct {
	Summary GrantsSummary
	Regions GrantsRegions
}

type GrantsRegions

type GrantsRegions struct {
	Auckland    int
	EastCoast   int
	BayOfPlenty int
	Northland   int
	Wellington  int
	Nelson      int
	Canterbury  int
	Southern    int
	Other       int
	Total       int
}

type GrantsSummary

type GrantsSummary struct {
	Clients        int
	Grants         int
	SumGrantAmount int
}

type InvalidUsageError

type InvalidUsageError struct {
	Problem string
}

func (InvalidUsageError) Error

func (e InvalidUsageError) Error() string
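Because InvalidUsageError is a distinct type with a value receiver, callers can tell usage problems apart from other failures with errors.As. The sketch below assumes the error is returned for things like an unknown view type; checkViewType and isUsageError are hypothetical helpers, and the message format in Error is a guess rather than the package's actual text.

```go
package main

import (
	"errors"
	"fmt"
)

// InvalidUsageError copies the shape of the package type.
type InvalidUsageError struct {
	Problem string
}

// Error uses an assumed message format; the package's wording may differ.
func (e InvalidUsageError) Error() string {
	return "invalid usage: " + e.Problem
}

// checkViewType is a hypothetical validator that returns an
// InvalidUsageError for anything other than "csv" or "json".
func checkViewType(viewType string) error {
	switch viewType {
	case "csv", "json":
		return nil
	default:
		return InvalidUsageError{Problem: "unknown view type " + viewType}
	}
}

// isUsageError reports whether err is, or wraps, an InvalidUsageError.
func isUsageError(err error) bool {
	var usageErr InvalidUsageError
	return errors.As(err, &usageErr)
}

func main() {
	// Wrap the error to show that errors.As still finds it.
	err := fmt.Errorf("rendering failed: %w", checkViewType("xml"))
	if isUsageError(err) {
		fmt.Println("usage problem:", err)
	}
}
```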

type Location

type Location struct {
	LocationName string
	CaseCount    int
}

type NormalisedCase

type NormalisedCase struct {
	CaseNumber       int
	ReportedDate     time.Time
	LocationName     string
	Age              AgeRange
	Gender           string
	IsTravelRelated  TravelRelated
	DepartureDate    TravelDate
	ArrivalDate      TravelDate
	LastCityBeforeNZ string
	FlightNumber     string
	CaseType         string
}

func NormaliseCases

func NormaliseCases(rawCases []*RawCase) ([]*NormalisedCase, error)

func (*NormalisedCase) FromRaw

func (n *NormalisedCase) FromRaw(r *RawCase) error

type RawCase

type RawCase struct {
	ReportedDate     string
	Case             int
	Location         string
	Age              string
	Gender           string
	TravelRelated    string
	LastCityBeforeNZ string
	FlightNumber     string
	DepartureDate    string
	ArrivalDate      string
	CaseType         string
}

func ScrapeCases

func ScrapeCases() ([]*RawCase, error)

type TravelDate

type TravelDate struct {
	Valid bool
	Value time.Time
}
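The Valid flag on TravelDate resembles the nullable pattern of database/sql's NullTime: Value is only meaningful when Valid is true. The sketch below shows how a raw date string might be turned into a TravelDate under that reading. parseTravelDate is a hypothetical helper, and treating empty or unparsable input as "not valid" is an assumption about the package's behaviour.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// TimeFormat is copied from the package constants.
const TimeFormat = "2/01/2006"

// TravelDate copies the shape of the package type.
type TravelDate struct {
	Valid bool
	Value time.Time
}

// parseTravelDate is a hypothetical helper. Empty or unparsable input
// yields a zero TravelDate with Valid set to false instead of an error.
func parseTravelDate(raw string) TravelDate {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return TravelDate{}
	}
	t, err := time.Parse(TimeFormat, raw)
	if err != nil {
		return TravelDate{}
	}
	return TravelDate{Valid: true, Value: t}
}

func main() {
	for _, raw := range []string{"12/04/2020", "", "unknown"} {
		d := parseTravelDate(raw)
		if d.Valid {
			fmt.Printf("%q -> %s\n", raw, d.Value.Format("2006-01-02"))
		} else {
			fmt.Printf("%q -> no date\n", raw)
		}
	}
}
```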

type TravelRelated

type TravelRelated struct {
	Valid bool
	Value bool
}

Directories

Path Synopsis
cmd
