llmselector

v0.0.0-...-2349f7e | Published: Jul 11, 2025 | License: Apache-2.0 | Imports: 7 | Imported by: 0

Repository

github.com/kikuchy/llmselector

README

LLM Selector for Playwright

llmselector is a Go library that integrates with playwright-go so that developers can select DOM elements using natural language prompts. It uses a Large Language Model (LLM) to interpret the prompt and returns the matching playwright.Locator objects.

This allows for more intuitive and readable web automation scripts, as you can replace complex CSS selectors or XPath expressions with simple descriptions.

Features

Select web elements using natural language (e.g., "the login button").
Seamlessly integrates with playwright-go.
Supports any OpenAI-compatible LLM API.
Automatic removal of irrelevant HTML tags (<script>, <style>) for better performance and accuracy.

Installation

go get github.com/kikuchy/llmselector

Usage

Here's a basic example of how to use llmselector:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/kikuchy/llmselector"
	"github.com/playwright-community/playwright-go"
)

func main() {
	// Initialize Playwright
	pw, err := playwright.Run()
	if err != nil {
		log.Fatalf("could not start playwright: %v", err)
	}
	defer pw.Stop()

	browser, err := pw.Chromium.Launch()
	if err != nil {
		log.Fatalf("could not launch browser: %v", err)
	}
	defer browser.Close()

	page, err := browser.NewPage()
	if err != nil {
		log.Fatalf("could not create page: %v", err)
	}

	// Navigate to a page (replace with your target URL)
	if _, err := page.Goto("https://example.com"); err != nil {
		log.Fatalf("could not goto: %v", err)
	}

	// Create a new selector instance
	// Make sure to set your API key via environment variables or directly.
	selector, err := llmselector.New(
		llmselector.WithAPIKey("YOUR_OPENAI_API_KEY"), // Or use os.Getenv("OPENAI_API_KEY")
		// Optional: Specify model, endpoint, etc.
		// llmselector.WithModel("gpt-4o"),
	)
	if err != nil {
		log.Fatalf("failed to create selector: %v", err)
	}

	// Find an element using a natural language prompt.
	// Find returns three values; the second ([]string) is not needed here.
	prompt := "the 'More information...' link"
	locators, _, err := selector.Find(context.Background(), page, prompt)
	if err != nil {
		log.Fatalf("failed to find locators: %v", err)
	}

	if len(locators) == 0 {
		fmt.Println("No locators found for prompt:", prompt)
		return
	}

	// Interact with the found element
	fmt.Printf("Found %d locator(s). Clicking the first one...\n", len(locators))
	err = locators[0].Click()
	if err != nil {
		log.Fatalf("failed to click locator: %v", err)
	}

	fmt.Println("Successfully clicked the link!")
}

How It Works

1. The library takes the playwright.Page object and a natural language prompt as input.
2. It reads the HTML content from the page.
3. It preprocesses the HTML by removing <script> and <style> tags to create a clean version for the LLM.
4. It sends the cleaned HTML and the user's prompt to the specified LLM API.
5. The LLM is instructed to return a JSON object containing an array of XPath expressions that match the prompt.
6. The library parses the response and converts each XPath into a playwright.Locator object.
7. A slice of these locators is returned to the user for further interaction.
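
The preprocessing and response-parsing steps above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's actual code: the helper names cleanHTML and selectorsFromResponse, the regular-expression approach, and the "xpaths" JSON key are all hypothetical. The "xpath=" prefix is the explicit XPath selector syntax that Playwright accepts when building a locator.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Hypothetical patterns: (?is) makes matching case-insensitive and lets '.' span newlines.
var (
	scriptTag = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)
	styleTag  = regexp.MustCompile(`(?is)<style\b[^>]*>.*?</style\s*>`)
)

// cleanHTML drops <script> and/or <style> elements before the markup is sent to the LLM.
func cleanHTML(html string, removeScript, removeStyle bool) string {
	if removeScript {
		html = scriptTag.ReplaceAllString(html, "")
	}
	if removeStyle {
		html = styleTag.ReplaceAllString(html, "")
	}
	return html
}

// llmResponse is an assumed response shape; the real JSON field name may differ.
type llmResponse struct {
	XPaths []string `json:"xpaths"`
}

// selectorsFromResponse parses the JSON reply and prefixes each XPath with "xpath=".
func selectorsFromResponse(raw string) ([]string, error) {
	var resp llmResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, fmt.Errorf("parse LLM response: %w", err)
	}
	selectors := make([]string, 0, len(resp.XPaths))
	for _, xp := range resp.XPaths {
		selectors = append(selectors, "xpath="+xp)
	}
	return selectors, nil
}

func main() {
	page := `<html><head><style>a{}</style><script>var x = 1;</script></head>` +
		`<body><a href="/info">More information...</a></body></html>`
	fmt.Println(cleanHTML(page, true, true))

	selectors, err := selectorsFromResponse(`{"xpaths": ["//a[@href='/info']"]}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(selectors)
}
```

With playwright-go, each resulting string could then be passed to page.Locator to obtain a playwright.Locator. A regular expression is used here only to keep the sketch short; an HTML parser would be more robust against unusual markup.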

Configuration

The llmselector.New function accepts functional options to configure the client:

WithAPIKey(string): (Required) Sets the API key for your LLM provider.
WithEndpoint(string): Sets the API endpoint. Defaults to the standard OpenAI endpoint.
WithModel(string): Sets the model name to use (e.g., "gpt-4o", "gpt-3.5-turbo"). Defaults to "gpt-4o".
WithRemoveScriptTags(bool): Toggles removal of <script> tags. Defaults to true.
WithRemoveStyleTags(bool): Toggles removal of <style> tags. Defaults to true.
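
To show how functional options like these typically combine with defaults, here is a minimal self-contained sketch. The Options and Option types mirror the documented declarations, but the option bodies, the buildOptions helper, the required-key check, and the endpoint URL are assumptions made for illustration; the library's own implementation may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Options mirrors the documented configuration fields.
type Options struct {
	APIEndpoint      string
	APIKey           string
	Model            string
	RemoveScriptTags bool
	RemoveStyleTags  bool
}

// Option mutates Options and may reject an invalid value.
type Option func(*Options) error

// WithAPIKey stores the API key (sketch of the documented option).
func WithAPIKey(key string) Option {
	return func(o *Options) error {
		o.APIKey = key
		return nil
	}
}

// WithModel overrides the default model name.
func WithModel(model string) Option {
	return func(o *Options) error {
		o.Model = model
		return nil
	}
}

// WithRemoveScriptTags toggles <script> removal.
func WithRemoveScriptTags(remove bool) Option {
	return func(o *Options) error {
		o.RemoveScriptTags = remove
		return nil
	}
}

// buildOptions is a hypothetical helper: it starts from the documented
// defaults, applies each option in order, and enforces the required API key.
func buildOptions(opts ...Option) (*Options, error) {
	o := &Options{
		APIEndpoint:      "https://api.openai.com/v1", // assumed value for "the standard OpenAI endpoint"
		Model:            "gpt-4o",
		RemoveScriptTags: true,
		RemoveStyleTags:  true,
	}
	for _, opt := range opts {
		if err := opt(o); err != nil {
			return nil, err
		}
	}
	if o.APIKey == "" {
		return nil, errors.New("API key is required")
	}
	return o, nil
}

func main() {
	o, err := buildOptions(WithAPIKey("sk-example"), WithRemoveScriptTags(false))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", *o)
}
```

Because each option is applied after the defaults are set, callers only name the settings they want to change, and new options can be added later without breaking existing call sites.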

License

This project is licensed under the Apache 2.0 License.

Documentation

Index

type Option
type Options
type Selector
- func New(opts ...Option) (*Selector, error)
- func (s *Selector) Find(ctx context.Context, page playwright.Page, prompt string) ([]playwright.Locator, []string, error)

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Option

type Option func(*Options) error

Option is a function type that applies a setting to the Options struct. This functional options pattern allows flexible configuration.

func WithAPIKey

func WithAPIKey(apiKey string) Option

WithAPIKey sets the API key used to authenticate with the LLM API.

func WithEndpoint

func WithEndpoint(endpoint string) Option

WithEndpoint sets the API endpoint URL.

func WithModel

func WithModel(model string) Option

WithModel sets the name of the LLM model to use.

func WithRemoveScriptTags

func WithRemoveScriptTags(remove bool) Option

WithRemoveScriptTags sets whether <script> tags are removed from the HTML.

func WithRemoveStyleTags

func WithRemoveStyleTags(remove bool) Option

WithRemoveStyleTags sets whether <style> tags are removed from the HTML.

type Options

type Options struct {
	APIEndpoint      string
	APIKey           string
	Model            string
	RemoveScriptTags bool
	RemoveStyleTags  bool
}

Options holds the settings that customize the behavior of llmselector.

type Selector

type Selector struct {
	// contains filtered or unexported fields
}

Selector is the main struct for identifying DOM elements from natural language.

func New

func New(opts ...Option) (*Selector, error)

New creates a new Selector instance. Settings such as the API key are passed using the functional options pattern.

func (*Selector) Find

func (s *Selector) Find(ctx context.Context, page playwright.Page, prompt string) ([]playwright.Locator, []string, error)

Find searches the page for DOM elements that may match the given natural language prompt and returns them as a slice of `playwright.Locator`.

Directories

Path	Synopsis
cmd/llmselector	command
internal/html
internal/llm