
A Minimal, Self-Evolving Autonomous Agent Framework
~3K lines of seed code · 9 atomic tools · ~100-line Agent Loop
📌 Official: GitHub + https://gaagent.ai only. DintalClaw is the sole authorized commercial partner; others are not affiliated.
GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).
Design philosophy — don't preload skills, evolve them.
Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a reusable Skill. The longer you use it, the more skills accumulate — forming a personal skill tree grown entirely from 3K lines of seed code.
🤖 Self-Bootstrap Proof: Everything in this repository, from installing Git and running `git init` to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.
| Feature | Description |
|---|---|
| 🧬 Self-Evolving | Automatically crystallizes each task into a Skill. Capabilities grow with every use, forming your personal skill tree. |
| 🪶 Minimal Architecture | ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead. |
| ⚡ Strong Execution | TMWebdriver injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system. |
| 🔌 High Compatibility | Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform. |
| 💰 Token Efficient | <30K context window — a fraction of the 200K–1M other agents consume. Less noise, fewer hallucinations, higher success rate, lower cost. |
| Demo | What happens |
|---|---|
| 🛡️ Real-Browser CAPTCHA Survival | While configuring a Discord bot, an hCaptcha "Are you human?" challenge pops up mid-task. GA's real browser session passes it and the task continues. See Browser Realness. |
| 🌐 Autonomous Web Exploration | Autonomously browses and periodically summarizes web content. |
| 🧋 Food Delivery Order | "Order me a milk tea": navigates the delivery app, selects items, completes checkout. |
| 📈 Quantitative Stock Screening | "Find GEM stocks with EXPMA golden cross, turnover > 5%": quantitative screening. |
| 💰 Expense Tracking | "Find expenses over ¥2K in the last 3 months": drives Alipay via ADB. |
| 💬 Batch Messaging | Sends bulk WeChat messages, fully driving the WeChat client. |
⚠️ Python version: use Python 3.11 or 3.12. Do not use Python 3.14; it is incompatible with `pywebview` and a few other GA dependencies.
📖 Detailed installation guide: installation.md · installation_zh.md (Chinese)
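To check an interpreter against the version note above before installing, a small script along these lines can help. It is a minimal sketch: the function name `check_python` and the "untested" label for versions the note does not mention are assumptions of this example, not part of GenericAgent.

```python
import sys

# Versions taken from the note above: 3.11 and 3.12 are recommended,
# 3.14 is reported as incompatible. Everything else is labeled "untested"
# here, which is this sketch's own convention.
RECOMMENDED = {(3, 11), (3, 12)}
INCOMPATIBLE = {(3, 14)}


def check_python(version=None):
    """Classify a (major, minor, ...) version as 'ok', 'incompatible', or 'untested'."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor in INCOMPATIBLE:
        return "incompatible"
    if major_minor in RECOMMENDED:
        return "ok"
    return "untested"


if __name__ == "__main__":
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
```

Run it with the interpreter you plan to use for the virtual environment; anything other than "ok" is a hint to switch to 3.11 or 3.12 first.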
Fetch the installation guide and follow it:
curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/refs/heads/main/docs/installation.md
git clone https://github.com/lsdefine/GenericAgent.git && cd GenericAgent
uv venv && uv pip install -e ".[ui]"
cp mykey_template_en.py mykey.py # fill in your LLM API key
Dependencies are deliberately tiered: the agent core needs only requests, plus four lightweight packages (beautifulsoup4, bottle, simple-websocket-server, aiohttp) for TMWebdriver's local server. The [ui] extra pulls in frontend libraries (Streamlit, prompt_toolkit/rich for the TUI, …) — install it for the bundled UIs, or skip it entirely and drive the agent headless. No Playwright, no LangChain, no browser binaries to download.
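To see which tier is already satisfied in the current environment, something like the following sketch can be used. The mapping from PyPI package names to import names (for example `beautifulsoup4` to `bs4`) is an assumption of this example rather than something taken from the GA source, and the `UI` tier lists only the three libraries named above.

```python
from importlib.util import find_spec

# PyPI name -> import name. These import names are assumptions of this sketch.
CORE = {"requests": "requests"}
WEBDRIVER = {
    "beautifulsoup4": "bs4",
    "bottle": "bottle",
    "simple-websocket-server": "simple_websocket_server",
    "aiohttp": "aiohttp",
}
UI = {"streamlit": "streamlit", "prompt_toolkit": "prompt_toolkit", "rich": "rich"}


def missing(tier):
    """Return the sorted PyPI names in a tier whose import name cannot be found."""
    return sorted(pkg for pkg, module in tier.items() if find_spec(module) is None)


if __name__ == "__main__":
    for label, tier in (("core", CORE), ("TMWebdriver", WEBDRIVER), ("ui extra", UI)):
        gaps = missing(tier)
        print(f"{label}: {'all present' if not gaps else 'missing ' + ', '.join(gaps)}")
```

A headless setup only needs the first two tiers to report "all present".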
Then launch:
python frontends/tui_v3.py # Terminal UI (recommended)
python launch.pyw # Streamlit web UI
The install script sets up a self-contained directory with an isolated Python environment, Git, and a ready-to-run package. The script is in assets/ if you'd like to read it first.
Windows PowerShell
powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm https://raw.githubusercontent.com/lsdefine/GenericAgent/main/assets/ga_install.ps1 | iex"
Linux / macOS
GLOBAL=1 bash -c "$(curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/main/assets/ga_install.sh)"
💡 GenericAgent grows its environment through the Agent itself — don't pre-install everything. See Unlocking Advanced Capabilities below.
A lightweight, scrollback-first terminal interface built on prompt_toolkit + rich. Supports multiple concurrent sessions and real-time streaming.
python frontends/tui_v3.py
⚠️ Windows TUI Troubleshooting
TUI rendering on Windows can be flaky depending on terminal + font. Common causes:
prompt_toolkit / rich are not on the latest version: run `pip install -U prompt_toolkit rich` first.
If rendering is still broken, ask GA to fix it, for example: "My experience using `frontends/tui_v3.py` in PowerShell / cmd / Git Bash on Windows is very poor, with lots of incompatibility. Please refer to Claude Code's best practices for the Windows terminal and fix all font and rendering incompatibilities."
python launch.pyw
GenericAgent also supports IM frontends such as Telegram, Discord, and Lark.
| Platform | Command |
|---|---|
| Telegram | python frontends/tgapp.py |
| Discord | python frontends/dcapp.py |
| Lark / Feishu | python frontends/fsapp.py |
WeChat, QQ, WeCom and DingTalk are also supported — see the Chinese section below. For detailed setup, ask GenericAgent itself.
In GA, advanced capabilities are unlocked by instructing the agent, not by reading docs or installing extras. Each instruction below makes GA read its pre-installed SOPs (battle-tested playbooks in its memory), install whatever is missing, adapt to your OS, and persist the result into its own memory.
| Capability | Just tell GA |
|---|---|
| 🌐 Web automation | "Set up your web automation capability." — GA guides you through the one manual step: dragging the bundled Chrome extension into chrome://extensions. |
| 🔤 OCR | "Set up your OCR capability with rapidocr and save it to memory." |
| 👁️ Vision | "Set up your vision capability from the template in memory/." — GA copies the template, wires it to your existing LLM keys, and self-tests. |
| 🖱️ Computer use | "Probe this system and set up your computer-use capability." |
💡 About language: the pre-installed SOPs are written in Chinese — GA reads them natively, so this never blocks you. If you prefer an English knowledge base, just say: "Read your pre-installed SOPs and rewrite them in English (keep code, paths and error strings verbatim)."
🌍 About platforms: the SOPs were honed on Windows, but cross-platform adaptation is itself a GA task — on macOS/Linux, GA swaps in the platform equivalents (window enumeration, input control, screenshots) on its own. Same self-evolution principle.
GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.
Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.
| Layer | Name | Description |
|---|---|---|
| L0 | Meta Rules | Core behavioral rules and system constraints |
| L1 | Insight Index | Minimal memory index for fast routing and recall |
| L2 | Global Facts | Stable knowledge accumulated over long-term operation |
| L3 | Task Skills / SOPs | Reusable workflows for completing specific task types |
| L4 | Session Archive | Archived task records distilled from finished sessions for long-horizon recall |
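One way to picture the five layers in the table above is as a single structure in which the L1 index routes a task to an L3 skill. The sketch below is illustrative only: the class `LayeredMemory`, its field names, and the keyword-based routing are assumptions of this example, not GenericAgent's actual memory format.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredMemory:
    """Hypothetical in-memory model of the L0-L4 layers."""
    meta_rules: list = field(default_factory=list)       # L0: behavioral rules
    insight_index: dict = field(default_factory=dict)    # L1: keyword -> skill name
    global_facts: dict = field(default_factory=dict)     # L2: stable knowledge
    skills: dict = field(default_factory=dict)           # L3: skill name -> steps
    session_archive: list = field(default_factory=list)  # L4: finished-session records

    def add_skill(self, name, keywords, steps):
        """Store a skill in L3 and register its keywords in the L1 index."""
        self.skills[name] = list(steps)
        for keyword in keywords:
            self.insight_index[keyword.lower()] = name

    def recall(self, task):
        """Route a task through L1; return (name, steps) or None if nothing matches."""
        for word in task.lower().split():
            name = self.insight_index.get(word)
            if name is not None:
                return name, self.skills[name]
        return None
```

The point of the small index is that recall only has to scan L1 to decide which L3 entry to load, which keeps the prompt short.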
Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop
The entire core loop is just ~100 lines of code (agent_loop.py).
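The perceive, reason, execute, remember cycle can be sketched in a few lines. This is a simplified illustration, not the contents of agent_loop.py: the `decide` callback stands in for the LLM call, and the action format (`{"tool": ..., "args": ...}`) and the `"done"` sentinel are assumptions of this example.

```python
def agent_loop(task, decide, tools, memory, max_turns=10):
    """Run perceive -> reason -> execute -> remember until `decide` reports done.

    decide(observation, memory) returns a dict such as
    {"tool": "name", "args": {...}} or {"tool": "done", "result": ...}.
    """
    observation = task
    for turn in range(max_turns):
        action = decide(observation, memory)            # task reasoning
        if action["tool"] == "done":
            return action.get("result")
        tool = tools[action["tool"]]
        result = tool(**action.get("args", {}))         # execute the chosen tool
        memory.append({"turn": turn, "action": action, "result": result})  # write experience
        observation = result                            # perceive the new state
    raise RuntimeError("turn budget exhausted before the task finished")


def scripted_decide(observation, memory):
    """Stand-in for an LLM: add two numbers once, then finish with that result."""
    if not memory:
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    return {"tool": "done", "result": memory[-1]["result"]}
```

Calling `agent_loop("add 2 and 3", scripted_decide, {"add": lambda a, b: a + b}, [])` runs one tool call and then returns its result. Swapping `scripted_decide` for a function that queries a model is the only change needed to make the loop model-driven.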
GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.
| Tool | Function |
|---|---|
| `code_run` | Execute arbitrary code (Python / PowerShell) |
| `file_read` | Read files |
| `file_write` | Write / create / overwrite files |
| `file_patch` | Patch / modify files |
| `web_scan` | Perceive web content |
| `web_execute_js` | Control browser behavior |
| `ask_user` | Human-in-the-loop confirmation |
| `update_working_checkpoint` | (memory) Short-term working notepad |
| `start_long_term_update` | (memory) Distill long-term memory |
Capable of dynamically creating new tools.
Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
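As an illustration of turning a temporary script into a permanent tool, the sketch below writes generated source to disk and loads it as a callable. The helper name `crystallize_tool`, the convention that each tool file exposes a `run` function, and the tools directory are all hypothetical; GenericAgent's real mechanism may differ.

```python
import importlib.util
from pathlib import Path


def crystallize_tool(name, source, tools_dir):
    """Persist `source` as <tools_dir>/<name>.py and return its `run` function.

    Assumes (for this sketch only) that every tool module defines `run`.
    """
    path = Path(tools_dir) / f"{name}.py"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(source, encoding="utf-8")

    # Load the file as a module so the tool is usable immediately.
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.run


# Example source an agent might have written during a task.
WORD_COUNT_SOURCE = "def run(text):\n    return len(text.split())\n"
```

Because the file stays on disk, a later session can load the same tool without regenerating it. Loading generated code executes it, so this pattern only makes sense for code you are willing to run.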

GenericAgent Workflow Diagram
Self-evolution is what fundamentally distinguishes GenericAgent from every other agent framework: a task solved once is stored as a Skill and recalled directly the next time.
[New Task]
│
▼
[Autonomous Exploration] ─► install deps · write scripts · debug · verify
│
▼
[Crystallize into Skill] ─► write to memory layer
│
▼
[Direct Recall on Next Similar Task]
| What you say | First time | Every time after |
|---|---|---|
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save Skill | one-line invoke |
| "Give me a morning digest of Hacker News" | Write scraper → build digest → schedule daily run → save Skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save Skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save Skill | ready to use |
After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.
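The "first time" versus "every time after" split in the table can be sketched as a persistent lookup: explore once, save the steps, and reuse them in later sessions. The class `SkillStore`, the JSON file format, and exact-match task lookup are simplifying assumptions of this example; the real agent recalls skills for similar tasks, not only identical ones.

```python
import json
from pathlib import Path


class SkillStore:
    """Hypothetical on-disk skill cache keyed by the task description."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.skills = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.skills = {}

    def solve(self, task, explore):
        """Return (steps, how), where how is 'recalled' or 'explored'."""
        if task in self.skills:
            return self.skills[task], "recalled"   # every time after: direct recall
        steps = explore(task)                      # first time: autonomous exploration
        self.skills[task] = steps
        self.path.write_text(
            json.dumps(self.skills, ensure_ascii=False, indent=2), encoding="utf-8"
        )                                          # crystallize into a Skill
        return steps, "explored"
```

The expensive `explore` callback runs only on a cache miss, which is where the token and time savings on repeat tasks come from.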
| Feature | GenericAgent | OpenClaw | Claude Code |
|---|---|---|---|
| Codebase | ~3K lines | ~530,000 lines | Open-sourced (large) |
| Deployment | `pip install` + API key | Multi-service orchestration | CLI + subscription |
| Browser Control | Real browse |
$ claude mcp add GenericAgent \
-- python -m otcore.mcp_server <graph>