model

package
v0.1.3
Published: Dec 21, 2019 License: Apache-2.0 Imports: 8 Imported by: 0

Documentation

Overview

Package model provides models for item rating and ranking.

There are two kinds of models: rating models and ranking models. A rating model can be used for ranking, and vice versa, but its performance is not guaranteed and the results may not even be meaningful.

  • Item rating models include: Random, Baseline, SVD(optimizer=Regression), SVD++, NMF, KNN, SlopeOne, CoClustering
  • Item ranking models include: ItemPop, WRMF, SVD(optimizer=BPR)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type BPR

type BPR struct {
	Base
	// Model parameters
	UserFactor [][]float64 // p_u
	ItemFactor [][]float64 // q_i

	// Fallback model
	UserRatings []*base.MarginalSubSet
	ItemPop     *ItemPop
	// contains filtered or unexported fields
}

BPR (Bayesian Personalized Ranking) is a pairwise learning algorithm for matrix factorization models with implicit feedback. The pairwise ranking between items i and j for user u is estimated by:

p(i >_u j) = \sigma( p_u^T (q_i - q_j) )

Hyper-parameters:

Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.01.
Lr         - The learning rate of SGD. Default is 0.05.
NFactors   - The number of latent factors. Default is 10.
NEpochs    - The number of iterations of the SGD procedure. Default is 100.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.001.
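The pairwise estimate above can be evaluated directly once the factors are known. The following is a minimal standalone sketch, assuming plain float64 slices for p_u, q_i and q_j; `sigmoid` and `pairwiseProb` are hypothetical helper names, not functions of this package.

```go
package main

import (
	"fmt"
	"math"
)

// sigmoid is the logistic function σ(x) = 1 / (1 + e^{-x}).
func sigmoid(x float64) float64 {
	return 1 / (1 + math.Exp(-x))
}

// pairwiseProb returns σ(p_u^T (q_i - q_j)), the estimated probability
// that user u prefers item i over item j. All three vectors must have
// the same length. (Hypothetical helper, not part of this package.)
func pairwiseProb(pu, qi, qj []float64) float64 {
	score := 0.0
	for f := range pu {
		score += pu[f] * (qi[f] - qj[f])
	}
	return sigmoid(score)
}

func main() {
	pu := []float64{1, 2}
	qi := []float64{0.5, 0.5}
	qj := []float64{0.5, 0.5}
	// Identical item factors give no preference either way.
	fmt.Println(pairwiseProb(pu, qi, qj)) // 0.5
}
```

Swapping i and j negates the score, so the two probabilities sum to one.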

func NewBPR

func NewBPR(params base.Params) *BPR

NewBPR creates a BPR model.

func (*BPR) Fit

func (bpr *BPR) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the BPR model.

func (*BPR) Predict

func (bpr *BPR) Predict(userId, itemId string) float64

Predict by the BPR model.

func (*BPR) SetParams

func (bpr *BPR) SetParams(params base.Params)

SetParams sets hyper-parameters of the BPR model.

type Base

type Base struct {
	Params      base.Params   // Hyper-parameters
	UserIndexer *base.Indexer // Users' ID set
	ItemIndexer *base.Indexer // Items' ID set
	// contains filtered or unexported fields
}

Base model must be included by every recommendation model. Hyper-parameters, ID sets, the random generator and fitting options are managed by the Base model.

func (*Base) Fit

func (model *Base) Fit(trainSet core.DataSet, options *base.RuntimeOptions)

Fit has not been implemented.

func (*Base) GetParams

func (model *Base) GetParams() base.Params

GetParams returns all hyper-parameters.

func (*Base) Init

func (model *Base) Init(trainSet core.DataSetInterface)

Init the Base model. The method must be called at the beginning of Fit.

func (*Base) Predict

func (model *Base) Predict(userId, itemId int) float64

Predict has not been implemented.

func (*Base) SetParams

func (model *Base) SetParams(params base.Params)

SetParams sets hyper-parameters for the Base model.

type BaseLine

type BaseLine struct {
	Base
	UserBias   []float64 // b_u
	ItemBias   []float64 // b_i
	GlobalBias float64   // mu
	// contains filtered or unexported fields
}

BaseLine predicts the rating for given user and item by

\hat{r}_{ui} = b_{ui} = μ + b_u + b_i

If user u is unknown, then the bias b_u is assumed to be zero. The same applies to item i with b_i.

Hyper-parameters:

Reg         - The regularization parameter of the cost function that is
              optimized. Default is 0.02.
Lr          - The learning rate of SGD. Default is 0.005.
NEpochs     - The number of iterations of the SGD procedure. Default is 20.
RandomState - The random seed. Default is 0.
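As an illustration of the prediction rule above, here is a minimal standalone sketch. The `baseline` type and its map-based storage are assumptions made for the example; the package itself stores biases in slices indexed through its ID indexers.

```go
package main

import "fmt"

// baseline holds the quantities from the formula above. The maps keyed
// by string ID are an illustrative layout, not the package's own storage.
type baseline struct {
	globalBias float64            // μ
	userBias   map[string]float64 // b_u
	itemBias   map[string]float64 // b_i
}

// predict returns μ + b_u + b_i. A missing map key yields 0, which
// matches the rule that unknown users and items contribute zero bias.
func (b baseline) predict(userID, itemID string) float64 {
	return b.globalBias + b.userBias[userID] + b.itemBias[itemID]
}

func main() {
	m := baseline{
		globalBias: 3.5,
		userBias:   map[string]float64{"alice": 0.5},
		itemBias:   map[string]float64{"matrix": -0.25},
	}
	fmt.Println(m.predict("alice", "matrix")) // 3.75
	fmt.Println(m.predict("bob", "matrix"))   // 3.25 (unknown user)
}
```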

func NewBaseLine

func NewBaseLine(params base.Params) *BaseLine

NewBaseLine creates a baseline model.

func (*BaseLine) Fit

func (baseLine *BaseLine) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the BaseLine model.

func (*BaseLine) Predict

func (baseLine *BaseLine) Predict(userId, itemId string) float64

Predict by the BaseLine model.

func (*BaseLine) SetParams

func (baseLine *BaseLine) SetParams(params base.Params)

SetParams sets hyper-parameters for the BaseLine model.

type CoClustering

type CoClustering struct {
	Base
	GlobalMean       float64     // A^{global}
	UserMeans        []float64   // A^{R}
	ItemMeans        []float64   // A^{C}
	UserClusters     []int       // ρ(i)
	ItemClusters     []int       // γ(j)
	UserClusterMeans []float64   // A^{RC}
	ItemClusterMeans []float64   // A^{CC}
	CoClusterMeans   [][]float64 // A^{COC}
	// contains filtered or unexported fields
}

CoClustering [5] is a collaborative filtering approach based on a weighted co-clustering algorithm that clusters users and items simultaneously.

Let U={u_i}^m_{i=1} be the set of users such that |U|=m and P={p_j}^n_{j=1} be the set of items such that |P|=n. Let A be the m x n ratings matrix such that A_{ij} is the rating of the user u_i for the item p_j. The approximate matrix \hat{A}_{ij} is given by

\hat{A}_{ij} = A^{COC}_{gh} + (A^R_i - A^{RC}_g) + (A^C_j - A^{CC}_h)

where g=ρ(i), h=γ(j) and A^R_i, A^C_j are the average ratings of user u_i and item p_j, and A^{COC}_{gh}, A^{RC}_g and A^{CC}_h are the average ratings of the corresponding co-cluster, user-cluster and item-cluster respectively.

Hyper-parameters:

NEpochs       - The number of iterations of the optimization procedure. Default is 20.
NUserClusters - The number of user clusters. Default is 3.
NItemClusters - The number of item clusters. Default is 3.
RandomState   - The random seed. Default is 0.
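Once the cluster assignments and means have been fitted, the approximation formula above is a simple lookup. The sketch below assumes dense integer indices for users and items; `coClusterModel` and its field names are illustrative, loosely mirroring the exported fields of CoClustering.

```go
package main

import "fmt"

// coClusterModel mirrors the quantities in the formula above. The field
// names are illustrative and indexed by dense integer IDs.
type coClusterModel struct {
	userMeans        []float64   // A^R_i
	itemMeans        []float64   // A^C_j
	userClusters     []int       // ρ(i)
	itemClusters     []int       // γ(j)
	userClusterMeans []float64   // A^RC_g
	itemClusterMeans []float64   // A^CC_h
	coClusterMeans   [][]float64 // A^COC_gh
}

// predict evaluates A^COC_gh + (A^R_i - A^RC_g) + (A^C_j - A^CC_h)
// with g = ρ(i) and h = γ(j).
func (m coClusterModel) predict(i, j int) float64 {
	g := m.userClusters[i]
	h := m.itemClusters[j]
	return m.coClusterMeans[g][h] +
		(m.userMeans[i] - m.userClusterMeans[g]) +
		(m.itemMeans[j] - m.itemClusterMeans[h])
}

func main() {
	m := coClusterModel{
		userMeans:        []float64{3.5},
		itemMeans:        []float64{4.5},
		userClusters:     []int{1},
		itemClusters:     []int{0},
		userClusterMeans: []float64{2, 3},
		itemClusterMeans: []float64{4},
		coClusterMeans:   [][]float64{{1}, {4}},
	}
	fmt.Println(m.predict(0, 0)) // 4 + (3.5-3) + (4.5-4) = 5
}
```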

func NewCoClustering

func NewCoClustering(params base.Params) *CoClustering

NewCoClustering creates a CoClustering model.

func (*CoClustering) Fit

func (coc *CoClustering) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the CoClustering model.

func (*CoClustering) Predict

func (coc *CoClustering) Predict(userId, itemId string) float64

Predict by the CoClustering model.

func (*CoClustering) SetParams

func (coc *CoClustering) SetParams(params base.Params)

SetParams sets hyper-parameters for the CoClustering model.

type FM

type FM struct {
	Base
	UserFeatures []*base.SparseVector
	ItemFeatures []*base.SparseVector
	// Model parameters
	GlobalBias float64     // w_0
	Bias       []float64   // w_i
	Factors    [][]float64 // v_i

	// Fallback model
	UserRatings []*base.MarginalSubSet
	ItemPop     *ItemPop
	// contains filtered or unexported fields
}

FM is the implementation of factorization machine [12]. The prediction is given by

\hat y(x) = w_0 + \sum^n_{i=1} w_i x_i + \sum^n_{i=1} \sum^n_{j=i+1} <v_i, v_j>x_i x_j

Hyper-parameters:

Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.005.
NFactors   - The number of latent factors. Default is 100.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
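The prediction formula above can be written as two loops: one for the linear terms and one over all feature pairs. This is a minimal sketch that assumes a dense feature vector x for readability; `fmPredict` and `dot` are hypothetical names, and the package itself works with sparse vectors.

```go
package main

import "fmt"

// dot returns the inner product <a, b> of two equal-length vectors.
func dot(a, b []float64) float64 {
	s := 0.0
	for k := range a {
		s += a[k] * b[k]
	}
	return s
}

// fmPredict evaluates
//   w_0 + Σ_i w_i x_i + Σ_i Σ_{j>i} <v_i, v_j> x_i x_j
// for a dense feature vector x. (Illustrative helper only.)
func fmPredict(w0 float64, w []float64, v [][]float64, x []float64) float64 {
	y := w0
	for i := range x {
		y += w[i] * x[i]
	}
	for i := range x {
		for j := i + 1; j < len(x); j++ {
			y += dot(v[i], v[j]) * x[i] * x[j]
		}
	}
	return y
}

func main() {
	w := []float64{0.5, -0.5, 2}
	v := [][]float64{{1, 0}, {1, 1}, {0, 2}}
	x := []float64{1, 1, 0} // only the first two features are active
	fmt.Println(fmPredict(1, w, v, x)) // 1 + 0 + <v_0, v_1> = 2
}
```

The double loop costs O(n^2 k) for n features and k factors, which is fine for illustration; with sparse inputs only the non-zero features need to be visited.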

func NewFM

func NewFM(params base.Params) *FM

NewFM creates a factorization machine.

func (*FM) Fit

func (fm *FM) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the factorization machine.

func (*FM) Predict

func (fm *FM) Predict(userId string, itemId string) float64

Predict by the factorization machine.

func (*FM) SetParams

func (fm *FM) SetParams(params base.Params)

SetParams sets hyper-parameters of the factorization machine.

type ItemPop

type ItemPop struct {
	Base
	Pop []float64
}

ItemPop recommends items by their popularity. The popularity of an item is defined as the occurrence frequency of the item in the training data set.
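A minimal sketch of this definition, assuming the training data is available as a flat list of item IDs (one entry per interaction). `itemPopularity` is a hypothetical helper, and returning raw counts rather than normalized frequencies is an assumption.

```go
package main

import "fmt"

// itemPopularity counts how often each item ID occurs in the training
// interactions. Raw counts are an assumption; the package may normalize.
func itemPopularity(interactions []string) map[string]float64 {
	pop := make(map[string]float64)
	for _, itemID := range interactions {
		pop[itemID]++
	}
	return pop
}

func main() {
	pop := itemPopularity([]string{"a", "b", "a", "c", "a"})
	// Items never seen in training have popularity 0.
	fmt.Println(pop["a"], pop["b"], pop["d"]) // 3 1 0
}
```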

func NewItemPop

func NewItemPop(params base.Params) *ItemPop

NewItemPop creates an ItemPop model.

func (*ItemPop) Fit

func (pop *ItemPop) Fit(set core.DataSetInterface, options *base.RuntimeOptions)

Fit the ItemPop model.

func (*ItemPop) Predict

func (pop *ItemPop) Predict(userId, itemId string) float64

Predict by the ItemPop model.

type KNN

type KNN struct {
	Base
	GlobalMean   float64
	SimMatrix    [][]float64
	LeftRatings  []*base.MarginalSubSet
	RightRatings []*base.MarginalSubSet
	UserRatings  []*base.MarginalSubSet
	LeftMean     []float64 // Centered KNN: user (item) Mean
	StdDev       []float64 // KNN with Z Score: user (item) standard deviation
	Bias         []float64 // KNN Baseline: Bias
	// contains filtered or unexported fields
}

KNN for collaborative filtering. Hyper-parameters:

Type       - The type of KNN ('Basic', 'Centered', 'ZScore', 'Baseline').
             Default is 'Basic'.
Similarity - The similarity function. Default is MSD.
UserBased  - Whether the model is user based (true) or item based (false). Default is true.
K          - The maximum number of neighbors used to predict the rating. Default is 40.
MinK       - The minimum number of neighbors used to predict the rating. Default is 1.

func NewKNN

func NewKNN(params base.Params) *KNN

NewKNN creates a KNN model.

func (*KNN) Fit

func (knn *KNN) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the KNN model.

func (*KNN) Predict

func (knn *KNN) Predict(userId, itemId string) float64

Predict by the KNN model.

func (*KNN) SetParams

func (knn *KNN) SetParams(params base.Params)

SetParams sets hyper-parameters for the KNN model.

type KNNImplicit

type KNNImplicit struct {
	Base
	Matrix [][]float64
	Users  []*base.MarginalSubSet
}

KNNImplicit is the KNN model for implicit feedback.

func NewKNNImplicit

func NewKNNImplicit(params base.Params) *KNNImplicit

NewKNNImplicit creates a KNN model for implicit feedback.

func (*KNNImplicit) Fit

func (knn *KNNImplicit) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the KNN model for implicit feedback.

func (*KNNImplicit) Predict

func (knn *KNNImplicit) Predict(userId, itemId string) float64

Predict by the KNN model for implicit feedback.

type NMF

type NMF struct {
	Base
	GlobalMean float64     // the global mean of ratings
	UserFactor [][]float64 // p_u
	ItemFactor [][]float64 // q_i
	// contains filtered or unexported fields
}

NMF [3] is matrix factorization with non-negative latent factors. The non-negativity constraint, which ensures good representativeness of the learnt model, is critically important during the factorization process.

Hyper-parameters:

Reg      - The regularization parameter of the cost function that is
           optimized. Default is 0.06.
NFactors - The number of latent factors. Default is 15.
NEpochs  - The number of iterations of the SGD procedure. Default is 50.
InitLow  - The lower bound of initial random latent factors. Default is 0.
InitHigh - The upper bound of initial random latent factors. Default is 1.

func NewNMF

func NewNMF(params base.Params) *NMF

NewNMF creates an NMF model.

func (*NMF) Fit

func (nmf *NMF) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the NMF model.

func (*NMF) Predict

func (nmf *NMF) Predict(userId, itemId string) float64

Predict by the NMF model.

func (*NMF) SetParams

func (nmf *NMF) SetParams(params base.Params)

SetParams sets hyper-parameters of the NMF model.

type SVD

type SVD struct {
	Base
	// Model parameters
	UserFactor [][]float64 // p_u
	ItemFactor [][]float64 // q_i
	UserBias   []float64   // b_u
	ItemBias   []float64   // b_i
	GlobalMean float64     // mu

	// Fallback model
	UserRatings []*base.MarginalSubSet
	ItemPop     *ItemPop
	// contains filtered or unexported fields
}

SVD algorithm, as popularized by Simon Funk during the Netflix Prize. The prediction \hat{r}_{ui} is set as:

\hat{r}_{ui} = μ + b_u + b_i + q_i^Tp_u

If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies to item i with b_i and q_i.

Hyper-parameters:

UseBias    - Whether to include the bias terms in the SVD model. Default is true.
Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.005.
NFactors   - The number of latent factors. Default is 100.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
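The prediction rule above, including the zero fallback for unknown users and items, can be sketched as follows. The `svdModel` type and its map-based storage are assumptions made for the example, not the package's internal layout.

```go
package main

import "fmt"

// svdModel stores fitted parameters keyed by string ID. This layout is
// illustrative; unknown IDs simply have no entry.
type svdModel struct {
	globalMean float64              // μ
	userBias   map[string]float64   // b_u
	itemBias   map[string]float64   // b_i
	userFactor map[string][]float64 // p_u
	itemFactor map[string][]float64 // q_i
}

// predict returns μ + b_u + b_i + q_i^T p_u. Missing biases read as 0
// from the maps, and the dot product is skipped when either factor
// vector is missing, which treats it as the zero vector.
func (m svdModel) predict(userID, itemID string) float64 {
	r := m.globalMean + m.userBias[userID] + m.itemBias[itemID]
	pu, qi := m.userFactor[userID], m.itemFactor[itemID]
	if pu != nil && qi != nil {
		for f := range pu {
			r += pu[f] * qi[f]
		}
	}
	return r
}

func main() {
	m := svdModel{
		globalMean: 3,
		userBias:   map[string]float64{"alice": 0.25},
		itemBias:   map[string]float64{"matrix": 0.25},
		userFactor: map[string][]float64{"alice": {1, 2}},
		itemFactor: map[string][]float64{"matrix": {0.5, 0.25}},
	}
	fmt.Println(m.predict("alice", "matrix")) // 3 + 0.25 + 0.25 + 1 = 4.5
	fmt.Println(m.predict("bob", "matrix"))   // 3 + 0 + 0.25 = 3.25
}
```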

func NewSVD

func NewSVD(params base.Params) *SVD

NewSVD creates an SVD model.

func (*SVD) Fit

func (svd *SVD) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the SVD model.

func (*SVD) Predict

func (svd *SVD) Predict(userId, itemId string) float64

Predict by the SVD model.

func (*SVD) SetParams

func (svd *SVD) SetParams(params base.Params)

SetParams sets hyper-parameters of the SVD model.

type SVDpp

type SVDpp struct {
	Base
	TrainSet   core.DataSetInterface
	UserFactor [][]float64 // p_u
	ItemFactor [][]float64 // q_i
	ImplFactor [][]float64 // y_i
	UserBias   []float64   // b_u
	ItemBias   []float64   // b_i
	GlobalMean float64     // mu
	// contains filtered or unexported fields
}

SVDpp (SVD++) [10] is an extension of SVD taking into account implicit interactions. The predicted \hat{r}_{ui} is:

\hat{r}_{ui} = \mu + b_u + b_i + q_i^T\left(p_u + |I_u|^{-\frac{1}{2}} \sum_{j \in I_u}y_j\right)

where the y_j terms are a new set of item factors that capture implicit interactions. Here, an implicit rating describes the fact that a user u rated an item j, regardless of the rating value. If user u is unknown, then the bias b_u and the factors p_u are assumed to be zero. The same applies to item i with b_i, q_i and y_i.

Hyper-parameters:

Reg        - The regularization parameter of the cost function that is
             optimized. Default is 0.02.
Lr         - The learning rate of SGD. Default is 0.007.
NFactors   - The number of latent factors. Default is 20.
NEpochs    - The number of iterations of the SGD procedure. Default is 20.
InitMean   - The mean of initial random latent factors. Default is 0.
InitStdDev - The standard deviation of initial random latent factors. Default is 0.1.
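The SVD++ prediction differs from plain SVD only in the user representation, which adds the normalized sum of implicit factors to p_u. The sketch below assumes the caller has already gathered the y_j vectors for the items in I_u; `svdppPredict` is a hypothetical helper, and skipping the implicit term when I_u is empty is an assumption made to avoid dividing by zero.

```go
package main

import (
	"fmt"
	"math"
)

// svdppPredict evaluates
//   μ + b_u + b_i + q_i^T (p_u + |I_u|^{-1/2} Σ_{j∈I_u} y_j)
// where implicit holds one y_j vector per item in I_u. A nil pu is
// treated as the zero vector (unknown user).
func svdppPredict(mu, bu, bi float64, pu, qi []float64, implicit [][]float64) float64 {
	// z starts as a copy of p_u and accumulates the implicit term.
	z := make([]float64, len(qi))
	copy(z, pu)
	if n := len(implicit); n > 0 {
		norm := 1 / math.Sqrt(float64(n))
		for _, yj := range implicit {
			for f := range z {
				z[f] += norm * yj[f]
			}
		}
	}
	r := mu + bu + bi
	for f := range qi {
		r += qi[f] * z[f]
	}
	return r
}

func main() {
	pu := []float64{1, 0}
	qi := []float64{1, 1}
	// Four rated items, so the normalization factor is 1/2.
	y := [][]float64{{1, 1}, {1, 1}, {1, 1}, {1, 1}}
	fmt.Println(svdppPredict(3, 0, 0, pu, qi, y)) // 3 + (1+2) + (0+2) = 8
}
```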

func NewSVDpp

func NewSVDpp(params base.Params) *SVDpp

NewSVDpp creates an SVD++ model.

func (*SVDpp) Fit

func (svd *SVDpp) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the SVD++ model.

func (*SVDpp) Predict

func (svd *SVDpp) Predict(userId, itemId string) float64

Predict by the SVD++ model.

func (*SVDpp) SetParams

func (svd *SVDpp) SetParams(params base.Params)

SetParams sets hyper-parameters of the SVD++ model.

type SlopeOne

type SlopeOne struct {
	Base
	GlobalMean  float64                // Mean of ratings in training set
	UserRatings []*base.MarginalSubSet // Ratings by each user
	UserMeans   []float64              // Mean of each user's ratings
	Dev         [][]float64            // Deviations
}

SlopeOne [4] predicts ratings with predictors of the form f(x) = x + b, where b is the precomputed average difference between the ratings of one item and another for users who rated both.

First, deviations between pairs of items are computed. Given a training set χ, and any two items j and i with ratings u_j and u_i respectively in some user evaluation u (annotated as u∈S_{j,i}(χ)), the average deviation of item i with respect to item j is computed by:

dev_{j,i} = \sum_{u∈S_{j,i}(χ)} \frac{u_j - u_i}{card(S_{j,i}(χ))}

The computation on deviations could be parallelized.

In the prediction stage, given that dev_{j,i} + u_i is a prediction for u_j given u_i, a reasonable predictor is the average of all such predictions:

P(u)_j = \frac{1}{card(R_j)} \sum_{i∈R_j}(dev_{j,i} + u_i)

where R_j = {i|i ∈ S(u), i \ne j, card(S_{j,i}(χ)) > 0} is the set of all relevant items. The subset of the set of items consisting of all those items which are rated in u is S(u).
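The two stages above can be sketched in a few lines. This is a minimal standalone illustration that assumes each user evaluation is a map from item ID to rating; `deviations` and `predict` are hypothetical helpers and make no claim about how the package stores Dev.

```go
package main

import "fmt"

// deviations computes dev[j][i], the average of u_j - u_i over all users
// who rated both items, together with count[j][i] = card(S_{j,i}(χ)).
func deviations(ratings []map[string]float64) (map[string]map[string]float64, map[string]map[string]int) {
	dev := map[string]map[string]float64{}
	count := map[string]map[string]int{}
	for _, u := range ratings {
		for j, uj := range u {
			for i, ui := range u {
				if i == j {
					continue
				}
				if dev[j] == nil {
					dev[j] = map[string]float64{}
					count[j] = map[string]int{}
				}
				dev[j][i] += uj - ui // accumulate the sum first
				count[j][i]++
			}
		}
	}
	for j := range dev {
		for i := range dev[j] {
			dev[j][i] /= float64(count[j][i]) // turn sums into averages
		}
	}
	return dev, count
}

// predict averages dev[j][i] + u_i over the relevant items R_j. The
// boolean is false when R_j is empty and no prediction can be made.
func predict(u map[string]float64, j string, dev map[string]map[string]float64, count map[string]map[string]int) (float64, bool) {
	total, n := 0.0, 0
	for i, ui := range u {
		if i == j || count[j][i] == 0 {
			continue
		}
		total += dev[j][i] + ui
		n++
	}
	if n == 0 {
		return 0, false
	}
	return total / float64(n), true
}

func main() {
	ratings := []map[string]float64{
		{"A": 5, "B": 3, "C": 2},
		{"A": 3, "B": 4},
		{"B": 2, "C": 5},
	}
	dev, count := deviations(ratings)
	// dev[A][B] = 0.5 and dev[A][C] = 3, so the prediction for the third
	// user on item A is ((0.5 + 2) + (3 + 5)) / 2.
	p, ok := predict(ratings[2], "A", dev, count)
	fmt.Println(p, ok) // 5.25 true
}
```

Each pair (j, i) is accumulated independently, which is why the deviation computation lends itself to parallelization.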

func NewSlopOne

func NewSlopOne(params base.Params) *SlopeOne

NewSlopOne creates a SlopeOne model.

func (*SlopeOne) Fit

func (so *SlopeOne) Fit(trainSet core.DataSetInterface, options *base.RuntimeOptions)

Fit the SlopeOne model.

func (*SlopeOne) Predict

func (so *SlopeOne) Predict(userId, itemId string) float64

Predict by the SlopeOne model.

type WRMF

type WRMF struct {
	Base
	// Model parameters
	UserFactor *mat.Dense // p_u
	ItemFactor *mat.Dense // q_i

	// Fallback model
	UserRatings []*base.MarginalSubSet
	ItemPop     *ItemPop
	// contains filtered or unexported fields
}

WRMF [7] is Weighted Regularized Matrix Factorization, which exploits unique properties of implicit feedback datasets. It treats the data as an indication of positive and negative preference associated with vastly varying confidence levels. This leads to a factor model that is especially tailored to implicit feedback recommenders. The authors also proposed a scalable optimization procedure, which scales linearly with the data size.

Hyper-parameters:

NFactors   - The number of latent factors. Default is 10.
NEpochs    - The number of training epochs. Default is 50.
InitMean   - The mean of initial latent factors. Default is 0.
InitStdDev - The standard deviation of initial latent factors. Default is 0.1.
Reg        - The strength of regularization.

func NewWRMF

func NewWRMF(params base.Params) *WRMF

NewWRMF creates a WRMF model.

func (*WRMF) Fit

func (mf *WRMF) Fit(set core.DataSetInterface, options *base.RuntimeOptions)

Fit the WRMF model.

func (*WRMF) Predict

func (mf *WRMF) Predict(userId, itemId string) float64

Predict by the WRMF model.

func (*WRMF) SetParams

func (mf *WRMF) SetParams(params base.Params)

SetParams sets hyper-parameters for the WRMF model.
