= otf-classifier
Web service to align free text to the learning progressions text as document classification
NOTE: This is experimental and proof-of-concept code
NOTE: Project has moved to go modules, remember to export GO111MODULE=on for go get and go build if you are working without module support as default.
This code is based on https://github.com/nsip/curriculum-align[]. It puts in place a document classifier
(https://en.wikipedia.org/wiki/Tf–idf[]) to classify arbitrary text as aligning to the learning progressions
the code is provisioned with, and outputting the alignments as a web service.
The code is set up to align to indicators; however training corpora are built up against both indicator codes and development level codes, and the latter can be used instead. You can switch by setting the variable `granularity` in classifier.go to be `"Devlevel"` instead of `"Indicator"`.
Binary distributions of the code are available in the `build/` directory.
The web service is made available as a library (`Align()`); the `cmd` directory contains a sample shell for it, which is used in the binary distribution. In the sample shell, the web service runs on port 1576. The test script `test.sh` issues representative REST queries against the web service.
The web service takes the following arguments:
[source,console]
----
GET http://localhost:1576/align?area=W&text=....
----
where _area_ is the learning area (`Numeracy` or `Literacy`), and _text_ is the text to be aligned. Both the _text_ and the _area_ parameters are obligatory.
For example:
[source,console]
----
http://localhost:1576/align?area=Numeracy&text=information
----
For larger payloads or automated environments calling the classifier, an equivalent POST method is also avialable which will accept a JSON payload:
[source,console]
-----
curl http://localhost:1576/align -H 'Content-Type: application/json' -d'{"area":"literacy","text":"confident sentences"}'
-----
The response is a JSON list of structs, one for each curriculum standard that the service is configured for, with the following fields:
* Item: the identifier of the curriculum item (indicator) whose alignment is reported
* DevLevel: the identifier of the development level corresponding to the curriculum item (indicator)
* Path: the path down to the indicator or development level
* Text: the text of the curriculum item whose alignment is reported
* Score: the score of the alignment. This is the score generated by github.com/jbrukh/bayesian: it is a negative number, and the higher the number (i.e. the closer to zero), the better the alignment of the text to the curriculum standard.
* Matches: the top five words that were the basis for the alignment of the curriculum item to the text
** Text: the matching word
** Score: the logarithmic score
For example:
[source,console]
----
[
{
"Item": "uri/version/00b902b5-7065-430f-b6de-f9b92aac85ff",
"Text": "identifies symmetry in the environment",
"DevLevel": "UGP3",
"Path": [
{
"Key": "General Capability",
"Val": "Numeracy"
},
{
"Key": "Element",
"Val": "Measurement and geometry"
},
{
"Key": "Sub-element",
"Val": "Understanding geometric properties"
},
{
"Key": "Progression Level",
"Val": "UGP3"
},
{
"Key": "Heading",
"Val": "Transformations"
},
{
"Key": "Indicator",
"Val": "identifies and creates patterns involving one- and two-step transformations of shapes (e.g. uses pattern blocks to create a pattern and describes how the pattern was created)"
}
],
"Score": -58.50905189596907,
"Matches": [
{
"Word": "collects",
"Score": -25.328436022934504
},
{
"Word": "information",
"Score": -25.328436022934504
}
]
}
]
----
The documents passed to the document classifier are also indexed, and can be queried through
the web service `index`:
[source,console]
----
GET http://localhost:1576/index?search=word
----
The `Path` lookup of an indicator or progression level code or URI can be queried through the web service
`lookup`:
[source,console]
----
GET http://localhost:1576/lookup?search=UGP3
----
To use embedded in other labstack.echo webservers, replicate the `cmd/main.go` main() code:
[source,console]
----
align.Init()
e := echo.New()
e.GET("/align", align.Align)
e.GET("/lookup", func(c echo.Context) error {
query := c.QueryParam("search")
ret, err := align.Lookup(query)
if err != nil {
return err
} else {
return c.String(http.StatusOK, string(ret))
}
})
e.GET("/index", func(c echo.Context) error {
query := c.QueryParam("search")
ret, err := align.Search(query)
if err != nil {
return err
} else {
return c.String(http.StatusOK, string(ret))
}
})
----
The web service is configured to read any JSON files in the `curricula` folder of the executable; the file included in the distribution is a mockup of the proposed machine encoding of the National Learning Progressions.
== 1576
https://en.wikipedia.org/wiki/Curriculum[]:
> The word "curriculum" began as a Latin word which means "a race" or "the course of a race" (which in turn derives from the verb _currere_ meaning "to run/to proceed"). The first known use in an educational context is in the https://books.google.com.au/books?id=bG5EAAAAcAAJ&printsec=frontcover&hl=el&source=gbs_ge_summary_r&cad=0#v=onepage&q=curriculum&f=false[_Professio Regia_], a work by University of Paris professor https://en.wikipedia.org/wiki/Petrus_Ramus[Petrus Ramus] published https://en.wikipedia.org/wiki/St._Bartholomew%27s_Day_massacre[posthumously] in 1576.