readability

v0.0.0-...-2be133e
Published: May 7, 2018 License: MIT Imports: 17 Imported by: 0

README

Go-Readability

Go-Readability is a Go package that strips clutter such as buttons, ads and background images from an HTML page, and changes the page's text size, contrast and layout for better readability.

This package is a fork of readability by ying32, which was inspired by readability for node.js and readability for Python. I also added some functions from Mozilla's Readability.

Why fork?

There are several reasons why I created a new fork instead of sending a PR to the original repository:

  • GitHub seems to be hard to access from China, which is why ying32 is not very active on his repository.
  • Most of the comments and documentation in the original repository are in Chinese, which unfortunately I am not yet able to understand.

Example

package main

import (
	"fmt"
	"time"

	readability "github.com/RadhiFadlillah/go-readability"
)

func main() {
	url := "https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html"

	// Fetch and parse the page with a five-second timeout.
	article, err := readability.ParseFromURL(url, 5*time.Second)
	if err != nil {
		panic(err)
	}

	// Print the extracted metadata, then the cleaned content.
	fmt.Println(article.Meta.Title)
	fmt.Println(article.Meta.Excerpt)
	fmt.Println(article.Meta.Author)
	fmt.Println(article.Content)
}

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Article

type Article struct {
	URL        string
	Meta       Metadata
	Content    string
	RawContent string
}

Article is the content of a URL

func ParseFromPageSource

func ParseFromPageSource(url string, pageSource []byte, timeout time.Duration) (Article, error)

ParseFromPageSource parses a page source, together with its URL, into readability format

func ParseFromURL

func ParseFromURL(url string, timeout time.Duration) (Article, error)

ParseFromURL parses a URL to readability format

type Metadata

type Metadata struct {
	Title       string
	Image       string
	Excerpt     string
	Author      string
	MinReadTime int
	MaxReadTime int
}

Metadata is the metadata of an article
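
To illustrate how these fields might be consumed, here is a small sketch that builds a one-line summary from a Metadata value. The Metadata type below is a local copy of the struct documented above, so the example compiles without the package, and summarize is a hypothetical helper. The documentation does not state the unit of MinReadTime and MaxReadTime, so the sketch prints the raw numbers without one.

```go
package main

import (
	"fmt"
	"strings"
)

// Metadata mirrors the struct documented above so this sketch compiles
// on its own; real code would use readability.Metadata instead.
type Metadata struct {
	Title       string
	Image       string
	Excerpt     string
	Author      string
	MinReadTime int
	MaxReadTime int
}

// summarize is a hypothetical helper (not part of go-readability).
// It returns the title, the author when one was found, and the
// read-time range exactly as stored in the struct.
func summarize(m Metadata) string {
	var b strings.Builder
	b.WriteString(m.Title)
	if m.Author != "" {
		b.WriteString(" by " + m.Author)
	}
	// The unit of the read-time fields is not documented, so none is printed.
	fmt.Fprintf(&b, " (read time %d to %d)", m.MinReadTime, m.MaxReadTime)
	return b.String()
}

func main() {
	m := Metadata{
		Title:       "Example Title",
		Author:      "Jane Doe",
		MinReadTime: 120,
		MaxReadTime: 180,
	}
	fmt.Println(summarize(m)) // prints "Example Title by Jane Doe (read time 120 to 180)"
}
```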
