llmselector

Version: v0.0.0-...-2349f7e
Published: Jul 11, 2025 License: Apache-2.0 Imports: 7 Imported by: 0

README

LLM Selector for Playwright

llmselector is a Go library that integrates with playwright-go so that developers can select DOM elements with natural language prompts. It sends the prompt and the page's HTML to a Large Language Model (LLM) and returns the matching elements as playwright.Locator objects.

This allows for more intuitive and readable web automation scripts, as you can replace complex CSS selectors or XPath expressions with simple descriptions.

Features

  • Select web elements using natural language (e.g., "the login button").
  • Seamlessly integrates with playwright-go.
  • Supports any OpenAI-compatible LLM API.
  • Automatic removal of irrelevant HTML tags (<script>, <style>) for better performance and accuracy.

Installation

go get github.com/kikuchy/llmselector

Usage

Here's a basic example of how to use llmselector:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/kikuchy/llmselector"
	"github.com/playwright-community/playwright-go"
)

func main() {
	// Initialize Playwright
	pw, err := playwright.Run()
	if err != nil {
		log.Fatalf("could not start playwright: %v", err)
	}
	defer pw.Stop()

	browser, err := pw.Chromium.Launch()
	if err != nil {
		log.Fatalf("could not launch browser: %v", err)
	}
	defer browser.Close()

	page, err := browser.NewPage()
	if err != nil {
		log.Fatalf("could not create page: %v", err)
	}

	// Navigate to a page (replace with your target URL)
	if _, err := page.Goto("https://example.com"); err != nil {
		log.Fatalf("could not goto: %v", err)
	}

	// Create a new selector instance
	// Make sure to set your API key via environment variables or directly.
	selector, err := llmselector.New(
		llmselector.WithAPIKey("YOUR_OPENAI_API_KEY"), // Or use os.Getenv("OPENAI_API_KEY")
		// Optional: Specify model, endpoint, etc.
		// llmselector.WithModel("gpt-4o"),
	)
	if err != nil {
		log.Fatalf("failed to create selector: %v", err)
	}

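	// Find an element using a natural language prompt.
	// Find returns three values (see its signature under Documentation);
	// the []string result is not needed here, so it is discarded.
	prompt := "the 'More information...' link"
	locators, _, err := selector.Find(context.Background(), page, prompt)
	if err != nil {
		log.Fatalf("failed to find locators: %v", err)
	}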

	if len(locators) == 0 {
		fmt.Println("No locators found for prompt:", prompt)
		return
	}

	// Interact with the found element
	fmt.Printf("Found %d locator(s). Clicking the first one...\n", len(locators))
	err = locators[0].Click()
	if err != nil {
		log.Fatalf("failed to click locator: %v", err)
	}

	fmt.Println("Successfully clicked the link!")
}

How It Works

  1. The library takes the playwright.Page object and a natural language prompt as input.
  2. It reads the HTML content from the page.
  3. It preprocesses the HTML by removing <script> and <style> tags to create a clean version for the LLM.
  4. It sends the cleaned HTML and the user's prompt to the specified LLM API.
  5. The LLM is instructed to return a JSON object containing an array of XPath expressions that match the prompt.
  6. The library parses the response and converts each XPath into a playwright.Locator object.
  7. A slice of these locators is returned to the user for further interaction.
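
Steps 3, 5 and 6 above can be sketched in plain Go. This is an illustrative sketch only, not the library's actual implementation: the regular expressions, the JSON field name "xpaths", and the helper names cleanHTML, parseXPaths and toSelector are all assumptions made for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"regexp"
)

// Case-insensitive patterns that match whole <script> and <style> elements,
// including their contents. A real implementation might use an HTML parser.
var (
	scriptRe = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)
	styleRe  = regexp.MustCompile(`(?is)<style\b[^>]*>.*?</style\s*>`)
)

// cleanHTML (hypothetical helper) mirrors step 3: it drops the selected tags
// so that less irrelevant markup is sent to the LLM.
func cleanHTML(html string, removeScript, removeStyle bool) string {
	if removeScript {
		html = scriptRe.ReplaceAllString(html, "")
	}
	if removeStyle {
		html = styleRe.ReplaceAllString(html, "")
	}
	return html
}

// llmResponse is an assumed shape for the JSON object described in step 5.
// The real field name used by the library is not documented here.
type llmResponse struct {
	XPaths []string `json:"xpaths"`
}

// parseXPaths (hypothetical helper) decodes the LLM's JSON answer.
func parseXPaths(raw string) ([]string, error) {
	var resp llmResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, fmt.Errorf("decode LLM response: %w", err)
	}
	return resp.XPaths, nil
}

// toSelector builds a Playwright selector string for step 6. With
// playwright-go, such a string could be passed to page.Locator.
func toSelector(xpath string) string {
	return "xpath=" + xpath
}

func main() {
	html := `<html><head><style>a{color:red}</style><script>alert(1)</script></head>` +
		`<body><a href="/more">More information...</a></body></html>`
	fmt.Println(cleanHTML(html, true, true))

	xpaths, err := parseXPaths(`{"xpaths": ["//a[contains(., 'More information')]"]}`)
	if err != nil {
		log.Fatal(err)
	}
	for _, xp := range xpaths {
		fmt.Println(toSelector(xp))
	}
}
```

Regular expressions keep the sketch short, but they can be fooled by unusual markup (for example, a literal "</script>" inside a string), which is one reason a production version might prefer a proper HTML tokenizer.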

Configuration

The llmselector.New function accepts functional options to configure the client:

  • WithAPIKey(string): (Required) Sets the API key for your LLM provider.
  • WithEndpoint(string): Sets the API endpoint. Defaults to the standard OpenAI endpoint.
  • WithModel(string): Sets the model name to use (e.g., "gpt-4o", "gpt-3.5-turbo"). Defaults to "gpt-4o".
  • WithRemoveScriptTags(bool): Toggles removal of <script> tags. Defaults to true.
  • WithRemoveStyleTags(bool): Toggles removal of <style> tags. Defaults to true.
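
The options above follow Go's functional options pattern. The self-contained sketch below shows how such options might be applied on top of the documented defaults. The Option and Options types match the declarations in the Documentation section; the function bodies, the buildOptions helper, the exact default endpoint URL, and the "missing API key" check are assumptions for illustration, not the library's code.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Options and Option mirror the exported declarations of the package.
type Options struct {
	APIEndpoint      string
	APIKey           string
	Model            string
	RemoveScriptTags bool
	RemoveStyleTags  bool
}

type Option func(*Options) error

// The bodies below are illustrative guesses at how each option could work.
func WithAPIKey(apiKey string) Option {
	return func(o *Options) error {
		if apiKey == "" {
			return errors.New("API key must not be empty")
		}
		o.APIKey = apiKey
		return nil
	}
}

func WithModel(model string) Option {
	return func(o *Options) error {
		o.Model = model
		return nil
	}
}

func WithRemoveScriptTags(remove bool) Option {
	return func(o *Options) error {
		o.RemoveScriptTags = remove
		return nil
	}
}

// buildOptions (hypothetical helper) starts from the documented defaults,
// applies each option in order, and rejects a configuration without a key.
func buildOptions(opts ...Option) (*Options, error) {
	o := &Options{
		APIEndpoint:      "https://api.openai.com/v1", // assumed default URL
		Model:            "gpt-4o",
		RemoveScriptTags: true,
		RemoveStyleTags:  true,
	}
	for _, opt := range opts {
		if err := opt(o); err != nil {
			return nil, err
		}
	}
	if o.APIKey == "" {
		return nil, errors.New("API key is required")
	}
	return o, nil
}

func main() {
	o, err := buildOptions(
		WithAPIKey("example-key"),
		WithModel("gpt-3.5-turbo"),
		WithRemoveScriptTags(false),
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", *o)
}
```

Because each Option returns an error, invalid values can be rejected when the selector is constructed rather than on the first call to the LLM.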

License

This project is licensed under the Apache 2.0 License.

Documentation

Types

type Option

type Option func(*Options) error

Option is a function type that applies a setting to an Options struct. This functional options pattern allows flexible configuration.

func WithAPIKey

func WithAPIKey(apiKey string) Option

WithAPIKey sets the API key used to authenticate with the LLM API.

func WithEndpoint

func WithEndpoint(endpoint string) Option

WithEndpoint sets the API endpoint URL.

func WithModel

func WithModel(model string) Option

WithModel sets the name of the LLM model to use.

func WithRemoveScriptTags

func WithRemoveScriptTags(remove bool) Option

WithRemoveScriptTags sets whether <script> tags are removed from the HTML.

func WithRemoveStyleTags

func WithRemoveStyleTags(remove bool) Option

WithRemoveStyleTags sets whether <style> tags are removed from the HTML.

type Options

type Options struct {
	APIEndpoint      string
	APIKey           string
	Model            string
	RemoveScriptTags bool
	RemoveStyleTags  bool
}

Options holds the settings that customize the behavior of llmselector.

type Selector

type Selector struct {
	// contains filtered or unexported fields
}

Selector is the main struct for identifying DOM elements from natural language.

func New

func New(opts ...Option) (*Selector, error)

New creates a new Selector instance. Settings such as the API key are passed using the functional options pattern.

func (*Selector) Find

func (s *Selector) Find(ctx context.Context, page playwright.Page, prompt string) ([]playwright.Locator, []string, error)

Find searches the page for DOM elements that may match the given natural language prompt and returns them as a slice of `playwright.Locator`.

Directories

Path             Synopsis
cmd/llmselector  command
internal/llm
