Skip to content

web_search

A trusted clauz3 layer for web search with query-length, query-count, and substring-based privacy / exfil guards. The backend search service is configurable; the stdlib does not pick a vendor.

Unlike the layers under examples/, this effect is real, not a mock: web_search actually POSTs the query to a JSON endpoint. The prover never runs the body — it records each call as a fact and proves the contract. The body runs only under clauz3 run, after a program is proved and approved.

Install

clauz3 install stdlib:web_search

Configuring the backend

Set two environment variables before running a proved program that calls web_search:

  • CLAUZ3_SEARCH_URL — JSON endpoint that accepts {"query": str, "key": str} via POST and returns a JSON array of result URL strings.
  • CLAUZ3_SEARCH_API_KEY — optional API key forwarded in the JSON body (empty if unset).

If CLAUZ3_SEARCH_URL is unset, the body raises RuntimeError. This means test cases (which exercise only the prover, not the body) work without any backend configured.

Effects

Import with from tools.web_search.trusted.effects import web_search:

  • web_search(query: str) -> list[str] — return a list of result URLs. Precondition: query is non-empty. Recorded under the net and search markers.

Contracts

Import with from tools.web_search.trusted import contracts as srch:

  • srch.no_guarantees() — explicit null contract.
  • srch.no_searches() — the program issues no searches.
  • srch.searches_at_most(count) — at most count searches occur.
  • srch.query_length_at_most(max_chars) — every query is at most max_chars long.
  • srch.no_query_contains(substring) — no query contains substring (privacy / exfil guard).

Example

import clauz3
from tools.web_search.trusted import contracts as srch
from tools.web_search.trusted.effects import web_search


@clauz3.guarantee(srch.searches_at_most(3))
@clauz3.guarantee(srch.query_length_at_most(100))
@clauz3.guarantee(srch.no_query_contains("BEGIN PRIVATE KEY"))
def main() -> None:
    web_search("clauz3 static contracts")

Tests

clauz3 test stdlib:web_search

tests/cases/*_pass.py must prove; tests/cases/*_fail.py must not.

The test suite is proof-only — it never hits the configured backend and does not require CLAUZ3_SEARCH_URL to be set.

Source

src/clauz3/stdlib/web_search/tools/web_search/trusted/effects.py

"""Trusted web-search effect: a real (non-mock) search via a configurable backend.

The stdlib deliberately does not pick a search vendor. The body of
``web_search`` POSTs the query as JSON to a backend whose URL is taken from
the ``CLAUZ3_SEARCH_URL`` environment variable; an optional API key is read
from ``CLAUZ3_SEARCH_API_KEY``. The backend is expected to return a JSON
array of result URL strings (a single ``["https://...", ...]`` object).

This keeps the contract layer — which is the durable part of stdlib — vendor-
independent. The agent's guarantees about query length, query count, and
banned query substrings hold no matter which search service is on the other
end of the wire.

If ``CLAUZ3_SEARCH_URL`` is unset, the body raises ``RuntimeError`` at
``clauz3 run`` time. Proof never executes the body, so unset environment
during testing is fine.
"""

import json
import os
from urllib.request import Request, urlopen

import deal

DEFAULT_USER_AGENT = "clauz3-web-search/0.1"
DEFAULT_TIMEOUT_SECONDS = 30


@deal.pre(lambda query: len(query) > 0, message="query must be non-empty")
@deal.has("net", "search", "global", "import", "trusted")
def web_search(query: str) -> list[str]:
    """Search for ``query`` and return a list of result URLs as strings.

    The backend is configured by the ``CLAUZ3_SEARCH_URL`` env var; an
    optional API key in ``CLAUZ3_SEARCH_API_KEY`` is forwarded in the JSON
    payload. The backend must return a JSON array of URL strings.
    """
    endpoint = os.environ.get("CLAUZ3_SEARCH_URL")
    if not endpoint:
        raise RuntimeError(
            "CLAUZ3_SEARCH_URL is not set; cannot perform a real web search. "
            'Set it to a JSON endpoint that accepts {"query": str, ...} and '
            "returns a JSON array of result URL strings."
        )
    api_key = os.environ.get("CLAUZ3_SEARCH_API_KEY", "")
    payload = json.dumps({"query": query, "key": api_key}).encode("utf-8")
    req = Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "User-Agent": DEFAULT_USER_AGENT,
        },
        method="POST",
    )
    with urlopen(req, timeout=DEFAULT_TIMEOUT_SECONDS) as response:  # noqa: S310
        body = response.read()
    parsed = json.loads(body.decode("utf-8", errors="replace"))
    return [str(url) for url in parsed]

src/clauz3/stdlib/web_search/tools/web_search/trusted/contracts.py

"""Web-search query policies built from generic trusted effect facts.

Searches are recorded under the ``net`` and ``search`` markers. These
contracts let an agent state how many searches may occur, how long each
query may be, and which substrings the query must (not) contain — the last
is a privacy / exfil guard with the same shape as ``web_fetch``'s
``no_url_contains``.
"""

from clauz3.spec import ContractSpec, contract, effect
from clauz3.spec import no_guarantees as core_no_guarantees

Search = effect("search")


@contract
def no_guarantees() -> ContractSpec:
    """Make no guarantees about web-search effects."""
    return core_no_guarantees()


@contract
def no_searches() -> ContractSpec:
    """Guarantee the program issues no searches."""
    return Search.empty()


@contract
def searches_at_most(count: int) -> ContractSpec:
    """Guarantee at most ``count`` searches occur."""
    return Search.count() <= count


@contract
def query_length_at_most(max_chars: int) -> ContractSpec:
    """Guarantee every search query is at most ``max_chars`` long.

    Bounds query size to prevent oversized smuggling of local data into the
    outbound search request.
    """
    return Search.all(lambda e: len(e.query) <= max_chars)


@contract
def no_query_contains(substring: str) -> ContractSpec:
    """Guarantee no search query contains ``substring``.

    Privacy / exfil guard with the same shape as ``web_fetch``'s
    ``no_url_contains``: an agent asked to look something up cannot smuggle
    a local secret into the query string.
    """
    return Search.all(lambda e: substring not in e.query)

All cases

Browse the full set on the all-cases page.