General Concepts

Bindu is an AI agent framework that speaks the A2A protocol (Agent-to-Agent communication) and the X402 micropayment extension. It handles infrastructure, identity (DIDs + Hydra OAuth2 + optional mTLS), observability, and payments, turning any local agent script into a production-ready microservice without rewriting code. Native AP2 support is on the roadmap, not shipped.
A2A is a JSON-RPC 2.0 protocol that defines task lifecycle, context sharing, agent cards, push notifications, and extensions — Bindu implements A2A v0.3.0. The reason Bindu uses it instead of rolling something custom: agents written in different languages, on different frameworks, deployed by different teams can talk to each other without a translation layer. The task-first execution model (every interaction becomes a tracked task with state) is what makes orchestration possible across that boundary.
Use a Workflow for sequential, step-by-step processing within a single agent (e.g., search -> extract -> summarize). Use a Team when you need multiple specialized agents with different identities and tools collaborating on a complex problem.
No. With auth off, the middleware never inspects the request and no DID is required. As soon as you turn auth on, the bearer token’s client_id becomes a DID (the Bindu default), and the agent demands a matching X-DID header + signature on every call. The cleanest path is to generate a DID once when you bindufy() your agent — bindufy writes it to .bindu/oauth_credentials.json and the same DID survives across restarts.
No — the seed is the private key. Without it, you cannot sign requests as that DID and you cannot prove ownership. Generate a fresh seed, derive a new DID, and re-register it with Hydra (with the new public key in metadata.public_key). Anything that referenced the lost DID (allowlists, peer agent configs) needs to be updated to point at the new one. Treat the seed file with .ssh/id_rsa-level care: chmod 600, never commit, back up to a password manager or HSM.
No. The DID’s last segment is sha256(public_key)[0..32] formatted as a UUID — change the keypair and the derived DID changes too. This is intentional: the DID is a fingerprint of the public key, so “rotating keys but keeping the DID” would defeat the verification chain. If you need to rotate, mint a new DID, register the new public key in Hydra, and update consumers.
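The fingerprint relationship described above can be sketched in a few lines. This is an illustration only: it assumes the [0..32] slice means the first 32 hex characters of the SHA-256 digest, and the function name did_suffix_from_public_key is hypothetical, not a Bindu API. How Bindu encodes the key bytes before hashing is not covered here.

```python
import hashlib
import uuid


def did_suffix_from_public_key(public_key: bytes) -> str:
    """Return a UUID-formatted fingerprint of a public key.

    Assumption: the fingerprint is the first 32 hex characters of
    sha256(public_key), laid out in the 8-4-4-4-12 UUID pattern.
    """
    digest_hex = hashlib.sha256(public_key).hexdigest()
    return str(uuid.UUID(hex=digest_hex[:32]))


old_key = bytes(range(32))      # stand-in for a 32-byte public key
new_key = bytes(range(1, 33))   # stand-in for the rotated key

print(did_suffix_from_public_key(old_key))
print(did_suffix_from_public_key(new_key))
```

The two printed values differ, which is why a rotated keypair cannot keep the old DID.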

Getting Started Confusion

Three layers in the same stack, doing different jobs:
  • Bindu — the agent framework itself. You wrap your handler with bindufy() and get a Starlette HTTP server speaking A2A on port 3773 by default.
  • Gateway — a separate service (gateway/) that plans and fans out work across multiple Bindu agents. You call POST /plan with a high-level intent; the gateway decides which agents to invoke and orchestrates the multi-agent flow.
  • Inbox — the UI (inbox/) that bootstraps a personal agent, registers it with Hydra, signs outbound messages, and renders agents as Gmail-shaped addresses (fleet+agent@getbindu.com). It’s the easiest way to talk to a fleet without writing a caller.
You can run Bindu standalone without the gateway or inbox. The other two are convenience layers.
Yes — bindufy() starts a uvicorn server and blocks. That’s the intended behavior for a single-agent process. If you want to do other work after starting the server, run it in a background thread, use asyncio.create_task on the underlying coroutine, or run multiple agents as separate processes orchestrated by a process manager like supervisord or concurrently.
Technically yes, but it’s almost always the wrong call. Each agent occupies its own port and has its own DID, manifest, and storage. Running them as separate processes is simpler, makes scaling and crash isolation cleaner, and matches how the examples/gateway_test_fleet/ pattern works. If you really need in-process composition, build one Bindu agent whose handler orchestrates several underlying framework agents (Agno team, LangGraph nodes, CrewAI crew).
Default is http://localhost:3773 (default_host=localhost, default_port=3773 in settings.py). Override with BINDU_PORT=4000 python agent.py or BINDU_HOST=0.0.0.0 BINDU_PORT=4000 python agent.py. If you need a different public URL than the bind address (e.g., behind a reverse proxy), set BINDU_DEPLOYMENT_URL=https://my-agent.example.com — that’s what the agent card advertises.
BinduApplication defaults url to "http://localhost" and bindufy() only overrides it when you pass deployment.url through the manifest. Set BINDU_DEPLOYMENT_URL in your env, or pass an explicit url=... into the deployment dict you give to bindufy(). Peers fetching /.well-known/agent.json will then see the real address.
Three options, ordered by use case:
  1. Development / demo: bindufy(launch=True, ...) creates an FRP tunnel and publishes a public URL. Convenient, not durable.
  2. Production: put the agent behind your own load balancer + TLS terminator and set BINDU_DEPLOYMENT_URL to the public address.
  3. Peer-to-peer over public internet: turn on mTLS so the agent serves HTTPS directly with a step-ca cert. See the mTLS section below.
Only if auth is on and you point at the hosted Hydra (https://hydra.getbindu.com). With AUTH__ENABLED=false, STORAGE_TYPE=memory, SCHEDULER_TYPE=memory, and a local LLM (Ollama, LM Studio), the agent runs fully offline. Hydra is the only mandatory external dependency when auth is on — and you can self-host Hydra in an air-gapped network.

Installation & Setup

Bindu requires Python 3.12 or higher.
Bindu is optimized for the uv package manager. You can install the core framework by running uv add bindu.
With uv, run uv venv --python 3.12.9 to create the virtual environment, then activate it with source .venv/bin/activate (macOS/Linux) or .venv\Scripts\activate (Windows) before installing dependencies.
Yes, but uv is strongly recommended. If you use Conda, create your environment (conda create -n bindu-env python=3.12), activate it, and then run pip install uv to manage your Bindu dependencies.
Run uv add bindu --upgrade. Bindu follows semantic versioning, meaning minor and patch updates are backward-compatible.
Partially. Bindu can run entirely locally using InMemoryStorage and local LLMs (like Ollama). However, initial setup (downloading the LLM weights and installing Python packages) requires an internet connection.

Environment Variables

Yes. Bindu automatically loads variables from a .env file in your project root.
At a minimum, you must set an API key for your chosen LLM provider (e.g., OPENROUTER_API_KEY, OPENAI_API_KEY, or MINIMAX_API_KEY).
Use the export command in your terminal (e.g., export BINDU_PORT=4000), or add them to your ~/.bashrc or ~/.zshrc file.
In PowerShell, use the $env: prefix (e.g., $env:BINDU_PORT="4000").
Create separate files (e.g., .env.staging and .env.production) and load the specific file in your deployment environment, or inject the variables directly via your CI/CD pipeline.
Yes. You can use the ENV instruction in your Dockerfile, pass them via docker run -e, or use the environment mapping in a docker-compose.yml file.
Bindu delegates API call execution to your driver framework (Agno, LangChain). Most frameworks read the API key at initialization, meaning a restart is usually required unless you implement a custom dynamic key loader in your handler function.

Switching Models

Bindu is framework-agnostic: it doesn’t talk to the LLM directly; your driver framework does. Switch providers by changing the model config in your framework (e.g., Agno: swap OpenAIChat -> Anthropic; LangChain: swap ChatOpenAI -> ChatAnthropic; CrewAI: swap the llm= argument) and update the API keys in your .env. The Bindu wrapper around your handler stays unchanged.
Configure your driver framework to point at your local endpoint (http://localhost:11434 for Ollama, http://localhost:1234/v1 for LM Studio) and pass that agent to bindufy(). Bindu sees a normal handler — it doesn’t care that the LLM happens to be local.
Yes. Instantiate each agent with its own model in your driver framework (Agno multi-agent team, LangGraph nodes, CrewAI crew, AutoGen group chat), wrap the orchestration logic in a single handler, and pass that handler to bindufy(). Bindu treats the whole composition as one agent with one DID; the model choices live entirely inside your handler.
Implement a standard try/except fallback loop, or use a retry library like Tenacity, inside your handler function to catch API errors and retry the prompt with a secondary model instance.
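A minimal sketch of such a fallback loop, using only the standard library. The two model functions are hypothetical stand-ins for whatever your driver framework exposes; in real code, catch your provider's specific error types rather than Exception.

```python
from typing import Callable, Optional, Sequence


def complete_with_fallback(
    prompt: str,
    models: Sequence[Callable[[str], str]],
) -> str:
    """Try each model callable in order and return the first success."""
    last_error: Optional[Exception] = None
    for call_model in models:
        try:
            return call_model(prompt)
        except Exception as exc:  # narrow this to your provider's errors
            last_error = exc
    raise RuntimeError("all models failed") from last_error


# Hypothetical stand-ins for real provider calls.
def primary_model(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")


def secondary_model(prompt: str) -> str:
    return f"secondary answer to: {prompt}"


print(complete_with_fallback("hello", [primary_model, secondary_model]))
# secondary answer to: hello
```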

Agent Communication

Because Bindu uses the standard A2A JSON-RPC protocol, you can use any HTTP client (like httpx or requests) to send a formatted POST request to the external agent’s URL from inside your handler.
Every Bindu agent has a unique Decentralized Identifier (DID). Agents expose their capabilities and DID at the /.well-known/agent.json endpoint, allowing for cryptographically verifiable discovery.
Yes. You simply write your LangChain or LangGraph execution logic inside the handler function that you pass to bindufy().
Use the context_id provided in the incoming A2A message payload. Passing this ID to subsequent agents ensures they all read and write to the same conversational memory thread.
The task state transitions to failed and becomes immutable — refinements create a new task instead of reopening the old one. Bindu’s internal scheduler and storage layers have retry-with-backoff for transient infrastructure failures (Redis hiccups, Postgres locks), but your handler is not wrapped in automatic retries. If you want LLM-call retries, wrap them yourself with a library like tenacity inside your handler. See the Retry overview for what the framework retries on your behalf.

Tasks & A2A Protocol Mechanics

Three IDs, three scopes:
  • messageId: one inbound or outbound message. Cheap, single-use.
  • taskId: one unit of work the agent is tracking. Has a lifecycle (submitted -> working -> input-required or completed). The task is the thing you poll, cancel, and reference.
  • contextId: a conversation thread that may contain many tasks. Use the same contextId across calls when you want the agent to see prior turns.
Rule of thumb: every task lives inside exactly one context. Every message lives inside exactly one task.
A2A’s task immutability rule. Once a task reaches a terminal state (completed, failed, canceled, rejected), it cannot be reopened. To continue the conversation, create a new task in the same contextId. If the new task should build on prior outputs, include the old task’s ID in referenceTaskIds. That’s the explicit dependency edge an orchestrator uses to chain work.
Send another message/send call with the same taskId (still in input-required, not terminal yet) and the user’s answer in the message body. The agent’s handler will be called again with the new message appended to the task history. Once your handler returns a non-input-required result, the task transitions to its terminal state.
It’s how you express dependencies between tasks. When you create Task4 and need Task2 and Task3’s outputs available, pass referenceTaskIds: [task2_id, task3_id] in the message. The agent (and any orchestrator like Sapthami) can then read the referenced tasks’ artifacts when planning Task4’s work. Without referenceTaskIds, dependencies live only in your application logic and orchestrators can’t reason about them.
Every message/send request must include params.configuration.acceptedOutputModes, even if it’s just ["application/json"]. The JSON-RPC schema validator rejects the request before auth or the handler even runs. Minimum valid params:
{
  "message": { "role": "user", "parts": [{"kind":"text","text":"..."}], "messageId": "...", "contextId": "...", "taskId": "..." },
  "configuration": { "acceptedOutputModes": ["application/json"] }
}
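The params above sit inside a standard JSON-RPC 2.0 envelope. The builder below is a sketch of one way to assemble it from a caller; build_message_send is a hypothetical helper, not part of Bindu, and generating the IDs as UUIDs is an assumption of this example. The optional referenceTaskIds field is placed on the message, as described for task dependencies earlier in this section.

```python
import json
import uuid
from typing import Any, Optional


def build_message_send(
    text: str,
    context_id: Optional[str] = None,
    task_id: Optional[str] = None,
    reference_task_ids: Optional[list] = None,
) -> dict:
    """Build a message/send request with the required configuration block."""
    message: dict[str, Any] = {
        "role": "user",
        "parts": [{"kind": "text", "text": text}],
        "messageId": str(uuid.uuid4()),
        # Reuse a contextId to continue a conversation; otherwise start one.
        "contextId": context_id or str(uuid.uuid4()),
        "taskId": task_id or str(uuid.uuid4()),
    }
    if reference_task_ids:
        message["referenceTaskIds"] = list(reference_task_ids)
    return {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),
        "method": "message/send",
        "params": {
            "message": message,
            # Omitting this block fails schema validation before auth runs.
            "configuration": {"acceptedOutputModes": ["application/json"]},
        },
    }


request = build_message_send(
    "Summarize the two reports.",
    context_id="ctx-1",
    reference_task_ids=["task-2", "task-3"],
)
print(json.dumps(request, indent=2))
```

The resulting dict can be sent as the JSON body of a POST to the agent's URL with any HTTP client.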
Yes — A2A defines a FilePart with two flavors: FileWithBytes (inline base64 payload) for small files, FileWithUri (presigned URL) for large ones. Your handler returns a part with kind: "file", and the artifact attached to the completed task carries the file part. Clients fetch it via tasks/get after the task reaches completed.
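The two file flavors can be sketched as plain dictionaries. The field names used here (file, bytes, uri, mimeType, name) follow this example's reading of the A2A v0.3.0 schema and should be checked against the spec; the helper names are hypothetical, and how your handler hands the part back to Bindu is not shown.

```python
import base64


def file_part_from_bytes(data: bytes, name: str, mime_type: str) -> dict:
    """Inline flavor: the payload travels base64-encoded inside the part."""
    return {
        "kind": "file",
        "file": {
            "name": name,
            "mimeType": mime_type,
            "bytes": base64.b64encode(data).decode("ascii"),
        },
    }


def file_part_from_uri(uri: str, name: str, mime_type: str) -> dict:
    """URI flavor: the part only points at where the client can fetch it."""
    return {
        "kind": "file",
        "file": {"name": name, "mimeType": mime_type, "uri": uri},
    }


small = file_part_from_bytes(b"id,score\n1,0.9\n", "scores.csv", "text/csv")
large = file_part_from_uri(
    "https://example.com/presigned/report.pdf",  # placeholder URL
    "report.pdf",
    "application/pdf",
)
print(small["file"]["bytes"])
print(large["file"]["uri"])
```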
Yes. Call message/stream instead of message/send — the agent returns a Server-Sent Events (SSE) stream of intermediate status updates and the final result. Use this when your handler produces incremental output (token-by-token LLM responses, multi-step tool calls) and you want clients to render progress instead of waiting for the whole task.
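On the client side, an SSE stream arrives as text lines where each event's payload is carried in data: fields and events are separated by blank lines. The parser below is a minimal sketch of that framing; the event shapes in the sample are hypothetical and do not reproduce Bindu's actual stream payloads.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per SSE event."""
    data_lines: list = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # SSE strips a single leading space after the colon.
            if value.startswith(" "):
                value = value[1:]
            data_lines.append(value)
        elif line == "" and data_lines:
            # A blank line terminates the current event.
            yield json.loads("\n".join(data_lines))
            data_lines = []
    if data_lines:  # stream ended without a trailing blank line
        yield json.loads("\n".join(data_lines))


# Hypothetical stream content, for illustration only.
sample = [
    'data: {"kind": "status-update", "state": "working"}',
    "",
    'data: {"kind": "status-update", "state": "completed"}',
    "",
]

for event in iter_sse_json(sample):
    print(event["state"])
```

In practice you would feed this function the line iterator of a streaming HTTP response instead of a list.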
Use push notifications. Call tasks/pushNotificationConfig/set with a webhook URL on a task, and Bindu will POST a notification to that URL on every status transition. Webhook payloads are DID-signed by the agent so the receiver can verify origin. See the Notifications page for the webhook contract.
With STORAGE_TYPE=memory, tasks live for the lifetime of the process. With STORAGE_TYPE=postgres, tasks persist forever unless you implement a retention policy yourself — Bindu does not auto-purge completed tasks. For production, set up a periodic job to archive or delete tasks older than your retention window (e.g., 30 or 90 days).

Multi-Agent Topologies

Yes. Inside your handler, use any HTTP client (httpx, requests) to POST a JSON-RPC message/send to the other agent’s URL. If the called agent has auth on, you need a bearer token + DID signature on the outbound call — easiest to use the gateway as a signing proxy, or use the helper in bindu/utils/did/signature.py to build the signed headers. Pass contextId through if you want the called agent to see your task’s conversation history.
Every Bindu agent serves its public manifest at /.well-known/agent.json. To discover an agent, fetch that URL — you get the agent’s DID, advertised skills, supported protocols, and capabilities. There is no central registry: a fleet “exists” because each agent’s URL is known to its callers (registered in the inbox, configured in the gateway, hardcoded by your orchestrator). For more sophisticated discovery, the negotiation extension (/agent/negotiation) lets agents bid on capability requests.
Use the gateway when you need plan-then-execute multi-agent workflows from outside (a client sends a high-level intent; the gateway decides who runs what). Use handler-to-handler calls when one agent has a deterministic dependency on another (an orchestrator agent that always delegates summarization to a summary agent, for example). Rule of thumb: gateway for dynamic, intent-driven fan-out; direct calls for static, code-driven composition.
Two patterns:
  1. Gateway-driven: POST a single /plan request to the gateway with the high-level intent. The gateway parses, dispatches to several agents in parallel, and aggregates the results.
  2. Handler-driven: in your orchestrator agent’s handler, use asyncio.gather to call N peer agents concurrently, then synthesize the responses. Each peer call produces its own taskId; you can include all of them in referenceTaskIds on the synthesis task so the dependency graph is explicit.
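The handler-driven pattern can be sketched with asyncio.gather. Here call_peer is a hypothetical coroutine that would send message/send to one peer and return its result; the fake version below stands in for the HTTP call so the example runs on its own, and the taskId key in its return value is an assumption of this sketch.

```python
import asyncio
from typing import Awaitable, Callable

PeerCall = Callable[[str, str], Awaitable[dict]]


async def fan_out(peer_urls: list, prompt: str, call_peer: PeerCall):
    """Call every peer concurrently and collect results plus their task IDs."""
    results = await asyncio.gather(
        *(call_peer(url, prompt) for url in peer_urls)
    )
    # gather preserves input order, so task IDs line up with peer_urls.
    task_ids = [result["taskId"] for result in results]
    return list(results), task_ids


async def fake_call_peer(url: str, prompt: str) -> dict:
    """Stand-in for a real HTTP call to a peer agent."""
    await asyncio.sleep(0)
    return {"taskId": f"task-for-{url}", "text": f"{url} handled: {prompt}"}


peers = ["http://agent-a", "http://agent-b", "http://agent-c"]
results, reference_task_ids = asyncio.run(
    fan_out(peers, "summarize Q3", fake_call_peer)
)
print(reference_task_ids)
```

The collected reference_task_ids list is what you would attach as referenceTaskIds on the synthesis task.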
Yes — PostgresStorage namespaces every row by the agent’s DID. Pointing N agents at the same DATABASE_URL is safe and is the recommended way to run a horizontally scaled fleet: each agent has its own slice, but ops only manages one database. Tasks, contexts, and artifacts never leak across DIDs.
Yes — there’s a parallel gRPC transport in bindu/grpc/ for language-agnostic agent clients. The gRPC port is separate from the HTTP port (3773 by default for HTTP). Use gRPC when you need an agent driver in a non-Python language and don’t want to hand-roll the A2A JSON-RPC contract. See the Multi-Language Sidecar docs.

Memory & State

Bindu supports Task History (A2A message arrays), Context Memory (threaded conversations), and Agent State. It provides InMemoryStorage for testing and PostgresStorage for production.
Set STORAGE_TYPE=postgres and provide a DATABASE_URL in your environment variables. Bindu will automatically persist task histories and contexts.
Bindu delegates Retrieval-Augmented Generation (RAG) to your driver framework. Use tools like Agno’s PDFKnowledgeBase or LangChain’s vector store retrievers inside your handler logic.
Context window management (like sliding windows or token summarization) must be handled by your driver framework before passing the message array to the LLM.

Tools & Integrations

Define your custom tools as standard Python functions and provide them to your driver framework’s tool array.
Yes. A skill is a directory containing either a skill.yaml or a SKILL.md file describing the capability (id, name, description, input/output schema). You pass the directory path to load_skills([...]), and the agent advertises each loaded skill at /agent/skills/{skill_id}. There is no implicit skills/ folder the framework auto-scans — paths are registered explicitly via bindufy().
Use the database or vector integrations provided by your driver framework (e.g., LangChain’s SQLDatabaseToolkit or Agno’s PgVector).
If your agent needs human approval, have your handler return a dictionary with "state": "input-required" and a "prompt" asking for clarification. Bindu will pause the task until the user responds.
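A sketch of a handler that pauses for approval. The handler signature and the shape of the incoming messages (a list of dicts with a content key) are assumptions of this example; only the returned dictionary with "state" and "prompt" comes from the description above.

```python
from typing import Union


def handler(messages: list) -> Union[dict, str]:
    """Ask for confirmation before a destructive action."""
    latest = messages[-1]["content"].strip().lower()
    if "delete" in latest and "confirm" not in latest:
        # Returning this dict pauses the task until the user replies.
        return {
            "state": "input-required",
            "prompt": "This will delete data. Reply 'confirm delete' to proceed.",
        }
    return f"Done: {latest}"


print(handler([{"role": "user", "content": "Delete the staging table"}]))
print(handler([{"role": "user", "content": "confirm delete"}]))
```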
Bindu delegates this to the driver framework. You can easily attach tools like DuckDuckGoTools or ScrapeGraph to your agent logic.

Structured Outputs

Enforce JSON schemas using your driver framework (e.g., by passing a Pydantic response_model) and ensure your client sends acceptedOutputModes: ["application/json"] in the A2A request configuration.
If the underlying model does not support native JSON mode, you must prompt the model to return JSON text and manually parse/extract it in your handler function.
Use a retry library like Tenacity inside your handler to catch JSON parsing errors or Pydantic ValidationErrors, and feed the error back to the LLM as a retry prompt.
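The same idea can be written as a plain loop when you want to see the mechanics without a retry library. This is a sketch: call_llm is a hypothetical callable standing in for your driver framework, and the wording of the retry prompt is only an example.

```python
import json
from typing import Callable


def ask_for_json(
    prompt: str,
    call_llm: Callable[[str], str],
    max_attempts: int = 3,
) -> dict:
    """Call the model until it returns a JSON object, feeding errors back."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_llm(current_prompt)
        try:
            parsed = json.loads(raw)
            if not isinstance(parsed, dict):
                raise ValueError("expected a JSON object")
            return parsed
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            current_prompt = (
                f"{prompt}\n\nYour previous reply was not valid JSON ({exc}). "
                "Reply with a JSON object only."
            )
    raise ValueError(f"no valid JSON after {max_attempts} attempts")


# Hypothetical model that fails once, then complies.
replies = iter(["Sure! Here is the data", '{"city": "Pune"}'])


def fake_llm(prompt: str) -> str:
    return next(replies)


print(ask_for_json("Return the city as JSON.", fake_llm))
# {'city': 'Pune'}
```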

Rate Limiting & Cost Management

Tokens-per-minute (TPM) limits are enforced by your LLM provider (OpenAI, Anthropic). You can work within them by adding retry logic with exponential backoff to your handler, or raise them by switching to an enterprise tier.
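Exponential backoff in a handler might look like the sketch below. RateLimitError is a hypothetical stand-in for your provider's rate-limit exception, and the delay values are illustrative rather than recommended settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit exception."""


def with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Retry fn on rate-limit errors, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel workers do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


attempts = {"n": 0}


def flaky_call() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError("429")
    return "ok"


waits: list = []
print(with_backoff(flaky_call, sleep=waits.append))  # ok
```

Passing sleep as a parameter keeps the function testable: the usage above records the waits instead of actually sleeping.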
Use Bindu’s Redis integration by setting SCHEDULER_TYPE=redis. This allows you to queue tasks and control worker concurrency across distributed instances.
Request caching and token counting must be implemented inside your driver framework’s handler logic before the API call is made.

Debugging & Common Errors

Set the environment variable LOGGING__DEFAULT_LEVEL=DEBUG before starting your agent.
The real error class is AuthenticationRequiredError (code -32009), and it means the bearer token is missing, expired, or not recognized by the Hydra introspection endpoint the agent is pointing at. Note: Bindu uses opaque Hydra tokens introspected at runtime, not JWTs. The legacy error message mentions “JWT”, but the actual flow goes through /admin/oauth2/introspect. Quick fixes in order: (1) mint a fresh token, (2) confirm your agent’s HYDRA__ADMIN_URL matches the Hydra that issued the token, (3) for local development, set AUTH__ENABLED=false to bypass auth entirely. See Making Authenticated Requests for the 4-gate decoder table.
These are the three DID-signature failure modes the auth middleware reports — they only appear when the bearer token’s client_id is a DID (the Bindu default):
  • did_mismatch — the X-DID header doesn’t equal the token’s client_id. Mint the token with the same DID you send.
  • public_key_unavailable — Hydra has no metadata.public_key for that DID. GET /admin/clients/<did> and patch the metadata.
  • invalid_signature — body bytes changed between sign and send, JSON canonicalization mismatch (the JS-vs-Python whitespace gotcha), wrong seed, or clock skew >300s.
The full troubleshooting table and a canonical cross-language fixture live at Making Authenticated Requests.
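The whitespace gotcha behind invalid_signature is easy to reproduce. Python's json.dumps inserts spaces after separators by default, while JavaScript's JSON.stringify does not, so the same object serializes to different bytes. Which form Bindu expects for signing is not stated here; the point of this sketch is only that the bytes you sign must be exactly the bytes you send.

```python
import json

payload = {"jsonrpc": "2.0", "id": "1", "method": "message/send"}

python_default = json.dumps(payload)
compact = json.dumps(payload, separators=(",", ":"))

print(python_default)  # {"jsonrpc": "2.0", "id": "1", "method": "message/send"}
print(compact)         # {"jsonrpc":"2.0","id":"1","method":"message/send"}

# Same data, different bytes: a signature over one will not verify the other.
print(python_default.encode() == compact.encode())  # False
```

A safe habit is to serialize the body once, sign those bytes, and send that same byte string without re-encoding it.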
curl http://localhost:3773/health — if this returns 200 without an Authorization header, the agent is up but health is intentionally public. To check enforcement, POST a JSON-RPC method without a token: curl -X POST http://localhost:3773/ -H 'Content-Type: application/json' -d '{"jsonrpc":"2.0","id":"1","method":"message/send","params":{"message":{"role":"user","parts":[{"kind":"text","text":"hi"}],"messageId":"x","contextId":"y","taskId":"z"},"configuration":{"acceptedOutputModes":["application/json"]}}}'. If you get -32009, auth is on. If you get a normal task response, AUTH__ENABLED is false.
Tool looping is a model behavior issue. You can prevent it by setting a maximum tool iteration limit in your driver framework (e.g., max_tool_iterations=5).
This happens when system prompts are ambiguous. Ensure your tool descriptions are highly specific, and use validation inside your handler to catch and reject hallucinated tool calls.
Use Python’s unittest.mock.patch to mock the LLM provider’s response, or build a simple mock handler that returns hardcoded text if a TEST_MODE environment variable is true.
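A minimal sketch of the mocking approach. LLMClient and its complete method are hypothetical stand-ins for whatever provider wrapper your handler uses, and the message shape is an assumption of this example.

```python
from unittest.mock import patch


class LLMClient:
    """Hypothetical wrapper around a provider SDK."""

    def complete(self, prompt: str) -> str:
        raise RuntimeError("network call; should not run in unit tests")


client = LLMClient()


def handler(messages: list) -> str:
    return client.complete(messages[-1]["content"])


def test_handler_uses_llm_reply() -> None:
    # Replace the method for the duration of the block only.
    with patch.object(LLMClient, "complete", return_value="mocked reply") as fake:
        assert handler([{"role": "user", "content": "hi"}]) == "mocked reply"
        fake.assert_called_once_with("hi")


test_handler_uses_llm_reply()
print("handler test passed")
```

Under pytest you would drop the explicit call at the bottom and let the test runner discover the function.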
Set TELEMETRY_ENABLED=true and configure OLTP_ENDPOINT, OLTP_SERVICE_NAME, and OLTP_HEADERS in your environment. Bindu will automatically export traces to platforms like Langfuse, Arize, or any OTLP-compatible collector.
The env var prefix is OLTP_* (not OTLP_*) in the current code. Yes, that’s a typo of the OpenTelemetry Protocol acronym — it’s load-bearing in bindu/utils/config/enricher.py, so use OLTP_* when configuring your .env.

Deployment

Deploy multiple instances of your Bindu agent container behind a load balancer. Point all instances to the same Redis instance (REDIS_URL) and PostgreSQL database (DATABASE_URL) so they can share the task queue and memory state.
Prometheus-formatted metrics including request rate, request latency histograms, active task counts grouped by state (submitted, working, input-required), worker utilization, queue depth, and storage operation durations. Scrape interval 15s is a reasonable default. Combine with the OTLP traces (TELEMETRY_ENABLED=true) for full request-path observability.
Run at least two replicas behind a load balancer, both pointed at the same DATABASE_URL and REDIS_URL. Roll one replica at a time: drain in-flight HTTP requests via a SIGTERM-triggered graceful shutdown (uvicorn handles this), let the second replica pick up new tasks from the shared Redis queue, then bring the new version up. Because tasks live in Postgres and the queue lives in Redis, the rolling restart doesn’t drop work in flight.
GET /health returns 200 with a JSON body including application.penguin_id, application.agent_did, runtime version, and storage/scheduler readiness. The /healthz endpoint is a stricter k8s-style readiness probe — returns 200 only when storage and scheduler are both reachable. Use /health for liveness, /healthz for readiness gates.
Yes. A typical deployment is a Deployment with N replicas, a Service for in-cluster traffic, an Ingress for external traffic, plus a ConfigMap for non-secret env (STORAGE_TYPE=postgres, SCHEDULER_TYPE=redis) and a Secret for credentials (DATABASE_URL, REDIS_URL, Hydra client secrets). Use /healthz as the readiness probe and /health as the liveness probe. For mTLS, store cert files in a Secret mounted at ~/.bindu/ or let the agent fetch them from step-ca on boot.

mTLS & Wire Security

You don’t need mTLS for a single-tenant agent behind your own TLS terminator — Hydra + DID signing already prove who the caller is and that the body wasn’t tampered with. You do need mTLS when peer agents talk to each other directly over the open internet (no shared load balancer), when bearer tokens shouldn’t traverse the wire in cleartext at any layer, or when you want a cryptographic bind between the TCP socket and the DID. See Security Stack for the three-layer model.
export AUTH__ENABLED=true
export AUTH__PROVIDER=hydra
export HYDRA__ADMIN_URL=https://hydra-admin.getbindu.com
export HYDRA__PUBLIC_URL=https://hydra.getbindu.com
export MTLS__ENABLED=true
export MTLS__MODE=hybrid                    # mtls + Hydra both checked
export MTLS__REQUIRE_CLIENT_CERT=false      # set true for strict mTLS
export MTLS__CA_URL=https://ca.getbindu.com
export MTLS__CA_ROOT_URL=https://ca.getbindu.com/roots.pem
The agent then registers with Hydra, exchanges an OIDC token at step-ca for a 24h X.509 cert, and serves uvicorn over HTTPS. Cert TTL is 24h; renewal kicks in 8h before expiry.
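The renewal arithmetic works out as follows. The timestamp is made up for illustration, and the constants simply restate the 24h TTL and 8h renewal window from the text.

```python
from datetime import datetime, timedelta, timezone

CERT_TTL = timedelta(hours=24)
RENEW_BEFORE_EXPIRY = timedelta(hours=8)


def renewal_time(issued_at: datetime) -> datetime:
    """Return when renewal starts for a cert issued at issued_at."""
    return issued_at + CERT_TTL - RENEW_BEFORE_EXPIRY


issued = datetime(2025, 1, 1, 0, 0, tzinfo=timezone.utc)
print(renewal_time(issued).isoformat())  # 2025-01-01T16:00:00+00:00
```

So a cert is in use for roughly 16 hours before the agent starts trying to replace it.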
The #1 cause is load_dotenv ordering. Bindu’s app_settings is constructed at module-import time. If your agent.py imports bindu before calling load_dotenv(), your MTLS__* env vars land in os.environ but never reach the settings singleton, and the agent silently falls back to HTTP. Fix: call load_dotenv() first, before any bindu import. Confirm by grepping the boot log for Bootstrapping mTLS; if it’s missing, the settings never saw your env block.
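The ordering problem can be reproduced without Bindu at all. The Settings class below is a hypothetical stand-in for any settings object that snapshots the environment once when it is constructed; it is not Bindu's app_settings.

```python
import os


class Settings:
    """Stand-in for a settings object that reads the environment once."""

    def __init__(self) -> None:
        raw = os.environ.get("MTLS__ENABLED", "false")
        self.mtls_enabled = raw.lower() == "true"


os.environ.pop("MTLS__ENABLED", None)   # start from a clean environment

early = Settings()                      # built before the variable exists
os.environ["MTLS__ENABLED"] = "true"    # what a late load_dotenv() would do
late = Settings()                       # built after the variable exists

print(early.mtls_enabled, late.mtls_enabled)  # False True
```

The early object never sees the later change, which is the same reason load_dotenv() has to run before the first bindu import in agent.py.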
Your Hydra client was registered before mTLS was enabled, so its audience array doesn’t include step-ca. Recent Bindu builds reconcile this drift on every boot — restart the agent and the registration flow patches the audience. If you’re on an older build, delete .bindu/oauth_credentials.json and restart to force a fresh registration.
openssl x509 \
  -in ~/.bindu/personal/.bindu/tls_cert.pem \
  -noout -subject -dates -ext subjectAltName
You should see the agent’s DID in the SAN URI as https://hydra.getbindu.com#did:bindu:.... Issuer should be CN=Bindu Intermediate CA. Cert validity should be 24h from the renewal timestamp.
Delete the cert files and restart the agent — it regenerates on next boot:
rm ~/.bindu/personal/.bindu/tls_*.pem ~/.bindu/personal/.bindu/ca_bundle.pem
There is no CRL or OCSP in this design — short TTL + renewal is the revocation strategy.