LLM Selector for Playwright

llmselector is a Go library that integrates with playwright-go to let developers select DOM elements using natural language prompts. It uses a Large Language Model (LLM) to interpret the prompt and returns the matching playwright.Locator objects.
This allows for more intuitive and readable web automation scripts, as you can replace complex CSS selectors or XPath expressions with simple descriptions.
Features
- Select web elements using natural language (e.g., "the login button").
- Seamlessly integrates with playwright-go.
- Supports any OpenAI-compatible LLM API.
- Automatic removal of irrelevant HTML tags (<script>, <style>) for better performance and accuracy.
Installation
go get github.com/kikuchy/llmselector
Usage
Here's a basic example of how to use llmselector:
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/kikuchy/llmselector"
	"github.com/playwright-community/playwright-go"
)

func main() {
	// Initialize Playwright
	pw, err := playwright.Run()
	if err != nil {
		log.Fatalf("could not start playwright: %v", err)
	}
	defer pw.Stop()

	browser, err := pw.Chromium.Launch()
	if err != nil {
		log.Fatalf("could not launch browser: %v", err)
	}
	defer browser.Close()

	page, err := browser.NewPage()
	if err != nil {
		log.Fatalf("could not create page: %v", err)
	}

	// Navigate to a page (replace with your target URL)
	if _, err := page.Goto("https://example.com"); err != nil {
		log.Fatalf("could not goto: %v", err)
	}

	// Create a new selector instance
	// Make sure to set your API key via environment variables or directly.
	selector, err := llmselector.New(
		llmselector.WithAPIKey("YOUR_OPENAI_API_KEY"), // Or use os.Getenv("OPENAI_API_KEY")
		// Optional: Specify model, endpoint, etc.
		// llmselector.WithModel("gpt-4o"),
	)
	if err != nil {
		log.Fatalf("failed to create selector: %v", err)
	}

	// Find an element using a natural language prompt
	prompt := "the 'More information...' link"
	locators, err := selector.Find(context.Background(), page, prompt)
	if err != nil {
		log.Fatalf("failed to find locators: %v", err)
	}
	if len(locators) == 0 {
		fmt.Println("No locators found for prompt:", prompt)
		return
	}

	// Interact with the found element
	fmt.Printf("Found %d locator(s). Clicking the first one...\n", len(locators))
	err = locators[0].Click()
	if err != nil {
		log.Fatalf("failed to click locator: %v", err)
	}
	fmt.Println("Successfully clicked the link!")
}
How It Works
- The library takes the playwright.Page object and a natural language prompt as input.
- It reads the HTML content from the page.
- It preprocesses the HTML by removing <script> and <style> tags to create a clean version for the LLM.
- It sends the cleaned HTML and the user's prompt to the specified LLM API.
- The LLM is instructed to return a JSON object containing an array of XPath expressions that match the prompt.
- The library parses the response and converts each XPath into a playwright.Locator object.
- A slice of these locators is returned to the user for further interaction.
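The preprocessing, request, and parsing steps above can be sketched with the standard library alone. This is a minimal illustration, not the library's actual implementation: the helper names (cleanHTML, buildRequest, parseXPaths), the wording of the system message, and the "xpaths" JSON field are all assumptions. The request body follows the chat completions shape used by OpenAI-compatible APIs. The final step is not shown because it needs a live browser; in playwright-go, an XPath string can be turned into a locator with page.Locator("xpath=" + expr).

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// Hypothetical helpers for illustration; names and the "xpaths" field are assumptions.
var (
	scriptTag = regexp.MustCompile(`(?is)<script\b[^>]*>.*?</script\s*>`)
	styleTag  = regexp.MustCompile(`(?is)<style\b[^>]*>.*?</style\s*>`)
)

// cleanHTML removes <script> and <style> elements together with their contents.
func cleanHTML(html string) string {
	html = scriptTag.ReplaceAllString(html, "")
	return styleTag.ReplaceAllString(html, "")
}

// chatMessage and chatRequest mirror an OpenAI-compatible chat completions body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
}

// buildRequest packs the cleaned HTML and the user's prompt into a request body.
func buildRequest(model, html, prompt string) ([]byte, error) {
	return json.Marshal(chatRequest{
		Model: model,
		Messages: []chatMessage{
			{Role: "system", Content: `Reply with a JSON object {"xpaths": [...]} of XPath expressions matching the description.`},
			{Role: "user", Content: "Description: " + prompt + "\n\nHTML:\n" + html},
		},
	})
}

// parseXPaths decodes the model's JSON reply into a slice of XPath strings.
func parseXPaths(reply string) ([]string, error) {
	var out struct {
		XPaths []string `json:"xpaths"`
	}
	if err := json.Unmarshal([]byte(reply), &out); err != nil {
		return nil, fmt.Errorf("decode LLM reply: %w", err)
	}
	return out.XPaths, nil
}

func main() {
	page := `<html><head><style>a{color:red}</style><script>track()</script></head>` +
		`<body><a href="/info">More information...</a></body></html>`
	cleaned := cleanHTML(page)
	fmt.Println(cleaned)

	body, err := buildRequest("gpt-4o", cleaned, "the 'More information...' link")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(body) > 0)

	// A reply in the assumed format; a real reply would come from the LLM API.
	xpaths, err := parseXPaths(`{"xpaths": ["//a[@href='/info']"]}`)
	if err != nil {
		panic(err)
	}
	fmt.Println(xpaths)
}
```

A regular expression is used here only to keep the sketch dependency-free; an HTML parser would be more robust against unusual markup.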
Configuration
The llmselector.New function accepts functional options to configure the client:
- WithAPIKey(string): (Required) Sets the API key for your LLM provider.
- WithEndpoint(string): Sets the API endpoint. Defaults to the standard OpenAI endpoint.
- WithModel(string): Sets the model name to use (e.g., "gpt-4o", "gpt-3.5-turbo"). Defaults to "gpt-4o".
- WithRemoveScriptTags(bool): Toggles removal of <script> tags. Defaults to true.
- WithRemoveStyleTags(bool): Toggles removal of <style> tags. Defaults to true.
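For readers unfamiliar with the functional options pattern, the following self-contained sketch shows how options like these typically layer on top of defaults. It is an illustration only and does not reproduce the library's source: the config struct, its field names, the newConfig helper, and the default endpoint URL are assumptions. Only the option names and the documented defaults ("gpt-4o", tag removal enabled) come from the list above.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical settings struct; field names are assumptions.
type config struct {
	apiKey           string
	endpoint         string
	model            string
	removeScriptTags bool
	removeStyleTags  bool
}

// Option mutates a config; each With* function returns one.
type Option func(*config)

func WithAPIKey(key string) Option      { return func(c *config) { c.apiKey = key } }
func WithEndpoint(url string) Option    { return func(c *config) { c.endpoint = url } }
func WithModel(name string) Option      { return func(c *config) { c.model = name } }
func WithRemoveScriptTags(on bool) Option { return func(c *config) { c.removeScriptTags = on } }
func WithRemoveStyleTags(on bool) Option  { return func(c *config) { c.removeStyleTags = on } }

// newConfig starts from the documented defaults, applies the options in order,
// and rejects a configuration without the required API key.
func newConfig(opts ...Option) (*config, error) {
	c := &config{
		endpoint:         "https://api.openai.com/v1", // assumed value for "the standard OpenAI endpoint"
		model:            "gpt-4o",
		removeScriptTags: true,
		removeStyleTags:  true,
	}
	for _, opt := range opts {
		opt(c)
	}
	if c.apiKey == "" {
		return nil, errors.New("API key is required")
	}
	return c, nil
}

func main() {
	// Override only what differs from the defaults.
	c, err := newConfig(
		WithAPIKey("example-key"),
		WithModel("gpt-3.5-turbo"),
		WithRemoveStyleTags(false),
	)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *c)
}
```

Because each option only touches one field, callers can pass any subset in any order and the remaining settings keep their defaults.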
License
This project is licensed under the Apache 2.0 License.