readability

v0.0.0-...-61a0ddd
Published: Nov 30, 2018 | License: MIT | Imports: 18 | Imported by: 0

README

Go-Readability

Go-Readability is a Go package that strips clutter such as buttons, ads and background images from an HTML page, and changes the page's text size, contrast and layout for better readability.

This package is a fork of readability by ying32, which was inspired by readability for node.js and readability for python. I also added some functions from Readability by Mozilla.

Why fork?

There are several reasons why I created a new fork instead of sending a PR to the original repository:

  • GitHub seems to be hard to access from China, which is why ying32 is not very active on his repository.
  • Most of the comments and documentation in the original repository are in Chinese, which unfortunately I am still unable to understand.

Example

package main

import (
	"fmt"
	nurl "net/url"
	"time"

	"github.com/RadhiFadlillah/go-readability"
)

func main() {
	// Parse the article URL and stop early if it is malformed
	url := "https://www.nytimes.com/2018/01/21/technology/inside-amazon-go-a-store-of-the-future.html"
	parsedURL, err := nurl.Parse(url)
	if err != nil {
		panic(err)
	}

	// Fetch readable content
	article, err := readability.FromURL(parsedURL, 5*time.Second)
	if err != nil {
		panic(err)
	}

	// Show results
	fmt.Println(article.Meta.Title)
	fmt.Println(article.Meta.Excerpt)
	fmt.Println(article.Meta.Author)
	fmt.Println(article.Content)
}

Documentation

Types

type Article

type Article struct {
	URL        string
	Meta       Metadata
	Content    string
	RawContent string
}

Article is the readable content extracted from a URL.

func FromReader

func FromReader(reader io.Reader, url *nurl.URL) (Article, error)

FromReader gets readable content from the specified io.Reader.
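As a rough sketch of how a function with the FromReader signature might be called on HTML that is already in memory, the snippet below wraps a string in an io.Reader and parses the page URL first. To stay runnable without the package, it declares a trimmed local Article type and a hypothetical stand-in named fakeFromReader; neither is part of go-readability. In real code you would import the package and call readability.FromReader directly.

```go
package main

import (
	"fmt"
	"io"
	nurl "net/url"
	"strings"
)

// Article is a trimmed local copy of the documented Article type,
// declared here only so the sketch compiles on its own.
type Article struct {
	URL     string
	Content string
}

// fromReaderFunc has the same shape as the documented FromReader signature.
type fromReaderFunc func(reader io.Reader, url *nurl.URL) (Article, error)

// fakeFromReader is a hypothetical stand-in for readability.FromReader.
// It performs no cleaning and simply returns the raw input as Content.
func fakeFromReader(reader io.Reader, url *nurl.URL) (Article, error) {
	raw, err := io.ReadAll(reader)
	if err != nil {
		return Article{}, err
	}
	return Article{URL: url.String(), Content: string(raw)}, nil
}

// extract parses pageURL, wraps html in an io.Reader, and passes both
// to the supplied extractor function.
func extract(html, pageURL string, fromReader fromReaderFunc) (Article, error) {
	parsedURL, err := nurl.Parse(pageURL)
	if err != nil {
		return Article{}, fmt.Errorf("parse URL: %w", err)
	}
	return fromReader(strings.NewReader(html), parsedURL)
}

func main() {
	article, err := extract("<p>Hello</p>", "https://example.com/post", fakeFromReader)
	if err != nil {
		panic(err)
	}
	fmt.Println(article.URL)
	fmt.Println(article.Content)
}
```

Passing the extractor as a function value is only a convenience for this sketch; it lets the calling code be exercised without network access or the real package.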

func FromURL

func FromURL(url *nurl.URL, timeout time.Duration) (Article, error)

FromURL gets readable content from the specified URL.

type Metadata

type Metadata struct {
	Title       string
	Image       string
	Excerpt     string
	Author      string
	MinReadTime int
	MaxReadTime int
}

Metadata holds the metadata of an article.
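The MinReadTime and MaxReadTime fields lend themselves to a short "read time" label. The sketch below is one possible way to format them. It uses a local copy of the Metadata struct so it compiles standalone, the helper name readTimeLabel is hypothetical, and it assumes the values are in minutes, which the documentation above does not state.

```go
package main

import "fmt"

// Metadata is a local copy of the documented struct,
// declared here only so the sketch compiles on its own.
type Metadata struct {
	Title       string
	Image       string
	Excerpt     string
	Author      string
	MinReadTime int
	MaxReadTime int
}

// readTimeLabel formats the read-time range as a short label.
// The "min" unit is an assumption, not something the package documents.
func readTimeLabel(m Metadata) string {
	if m.MinReadTime == m.MaxReadTime {
		return fmt.Sprintf("%d min read", m.MinReadTime)
	}
	return fmt.Sprintf("%d-%d min read", m.MinReadTime, m.MaxReadTime)
}

func main() {
	meta := Metadata{Title: "Example", MinReadTime: 3, MaxReadTime: 5}
	fmt.Printf("%s (%s)\n", meta.Title, readTimeLabel(meta))
}
```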
