hub / github.com/lsdefine/GenericAgent

github.com/lsdefine/GenericAgent @desktop-portable-v0.1.4 sqlite

repository ↗ · DeepWiki ↗ · release desktop-portable-v0.1.4 ↗

2,395 symbols 7,930 edges 76 files 540 documented · 23% 3 cross-repo links

README

GenericAgent Banner

GenericAgent

A Minimal, Self-Evolving Autonomous Agent Framework

~3K lines of seed code · 9 atomic tools · ~100-line Agent Loop

English · 中文

📌 Official: GitHub + https://gaagent.ai only. DintalClaw is the sole authorized commercial partner; others are not affiliated.

🌟 Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Design philosophy — don't preload skills, evolve them.

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a reusable Skill. The longer you use it, the more skills accumulate — forming a personal skill tree grown entirely from 3K lines of seed code.

🤖 Self-Bootstrap Proof — Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.

📋 Key Features

Feature	Description
🧬 Self-Evolving	Automatically crystallizes each task into a Skill. Capabilities grow with every use, forming your personal skill tree.
🪶 Minimal Architecture	~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead.
⚡ Strong Execution	TMWebdriver injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system.
🔌 High Compatibility	Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform.
💰 Token Efficient	<30K context window — a fraction of the 200K–1M other agents consume. Less noise, fewer hallucinations, higher success rate, lower cost.

🎯 Demo Showcase

🛡️ Real-Browser CAPTCHA Survival	🌐 Autonomous Web Exploration

_{While configuring a Discord bot, an hCaptcha "Are you human?" challenge pops up mid-task — GA's real browser session passes it and the task continues. See Browser Realness.}	_{Autonomously browses and periodically summarizes web content.}
🧋 Food Delivery Order	📈 Quantitative Stock Screening

_{"Order me a milk tea" — navigates the delivery app, selects items, completes checkout.}	_{"Find GEM stocks with EXPMA golden cross, turnover > 5%" — quantitative screening.}
💰 Expense Tracking	💬 Batch Messaging

_{"Find expenses over ¥2K in the last 3 months" — drives Alipay via ADB.}	_{Sends bulk WeChat messages, fully driving the WeChat client.}

🚀 Quick Start

⚠️ Python version: use Python 3.11 or 3.12. Do not use Python 3.14 — it is incompatible with pywebview and a few other GA dependencies.

📖 Detailed installation guide: installation.md · installation_zh.md（中文）

For LLM Agents

Fetch the installation guide and follow it:

curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/refs/heads/main/docs/installation.md

For Humans

Method 1 — Clone & install (recommended)

git clone https://github.com/lsdefine/GenericAgent.git && cd GenericAgent
uv venv && uv pip install -e ".[ui]"
cp mykey_template_en.py mykey.py   # fill in your LLM API key

Dependencies are deliberately tiered: the agent core needs only requests, plus four lightweight packages (beautifulsoup4, bottle, simple-websocket-server, aiohttp) for TMWebdriver's local server. The [ui] extra pulls in frontend libraries (Streamlit, prompt_toolkit/rich for the TUI, …) — install it for the bundled UIs, or skip it entirely and drive the agent headless. No Playwright, no LangChain, no browser binaries to download.

Then launch:

python frontends/tui_v3.py   # Terminal UI (recommended)
python launch.pyw            # Streamlit web UI

Method 2 — One-line installer (convenience)

Sets up a self-contained directory with an isolated Python environment, Git, and a ready-to-run package. The script is in assets/ if you'd like to read it first.

Windows PowerShell

powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm https://raw.githubusercontent.com/lsdefine/GenericAgent/main/assets/ga_install.ps1 | iex"

Linux / macOS

GLOBAL=1 bash -c "$(curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/main/assets/ga_install.sh)"

💡 GenericAgent grows its environment through the Agent itself — don't pre-install everything. See Unlocking Advanced Capabilities below.

💻 Usage

Frontends

Terminal UI (recommended)

A lightweight, scrollback-first terminal interface built on prompt_toolkit + rich. Supports multiple concurrent sessions and real-time streaming.

python frontends/tui_v3.py

⚠️ Windows TUI Troubleshooting

TUI rendering on Windows can be flaky depending on terminal + font. Common causes:

prompt_toolkit / rich are not on the latest version — pip install -U prompt_toolkit rich first.
PowerShell / cmd ship with terminals that have rough Unicode + key-binding support. Prefer Git Bash on Windows, which is much better behaved.
If it still looks broken, ask GA itself to fix it:

"My experience using frontends/tui_v3.py in PowerShell / cmd / Git Bash on Windows is very poor — lots of incompatibility. Please refer to Claude Code's best practices for the Windows terminal and fix all font and rendering incompatibilities."

Streamlit UI

python launch.pyw

Bot Interface (IM)

GenericAgent also supports IM frontends such as Telegram, Discord, and Lark.

Platform	Command
Telegram	`python frontends/tgapp.py`
Discord	`python frontends/dcapp.py`
Lark / Feishu	`python frontends/fsapp.py`

WeChat, QQ, WeCom and DingTalk are also supported — see the Chinese section below. For detailed setup, ask GenericAgent itself.

🔓 Unlocking Advanced Capabilities

In GA, advanced capabilities are unlocked by instructing the agent, not by reading docs or installing extras. Each instruction below makes GA read its pre-installed SOPs (battle-tested playbooks in its memory), install whatever is missing, adapt to your OS, and persist the result into its own memory.

Capability	Just tell GA
🌐 Web automation	"Set up your web automation capability." — GA guides you through the one manual step: dragging the bundled Chrome extension into `chrome://extensions`.
🔤 OCR	"Set up your OCR capability with rapidocr and save it to memory."
👁️ Vision	"Set up your vision capability from the template in memory/." — GA copies the template, wires it to your existing LLM keys, and self-tests.
🖱️ Computer use	"Probe this system and set up your computer-use capability."

💡 About language: the pre-installed SOPs are written in Chinese — GA reads them natively, so this never blocks you. If you prefer an English knowledge base, just say: "Read your pre-installed SOPs and rewrite them in English (keep code, paths and error strings verbatim)."

🌍 About platforms: the SOPs were honed on Windows, but cross-platform adaptation is itself a GA task — on macOS/Linux, GA swaps in the platform equivalents (window enumeration, input control, screenshots) on its own. Same self-evolution principle.

🧠 Architecture

GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.

1️⃣ Layered Memory System

Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.

Layer	Name	Description
L0	Meta Rules	Core behavioral rules and system constraints
L1	Insight Index	Minimal memory index for fast routing and recall
L2	Global Facts	Stable knowledge accumulated over long-term operation
L3	Task Skills / SOPs	Reusable workflows for completing specific task types
L4	Session Archive	Archived task records distilled from finished sessions for long-horizon recall

2️⃣ Autonomous Execution Loop

Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop

The entire core loop is just ~100 lines of code (agent_loop.py).

3️⃣ Minimal Toolset

GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.

Tool	Function
`code_run`	Execute arbitrary code (Python / PowerShell)
`file_read`	Read files
`file_write`	Write / create / overwrite files
`file_patch`	Patch / modify files
`web_scan`	Perceive web content
`web_execute_js`	Control browser behavior
`ask_user`	Human-in-the-loop confirmation
`update_working_checkpoint`	(memory) Short-term working notepad
`start_long_term_update`	(memory) Distill long-term memory

4️⃣ Capability Extension

Capable of dynamically creating new tools.

Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.

GenericAgent Workflow

GenericAgent Workflow Diagram

🧬 Self-Evolution Mechanism

This is what fundamentally distinguishes GenericAgent from every other agent framework.

[New Task]
   │
   ▼
[Autonomous Exploration]   ─►  install deps · write scripts · debug · verify
   │
   ▼
[Crystallize into Skill]   ─►  write to memory layer
   │
   ▼
[Direct Recall on Next Similar Task]

What you say	First time	Every time after
"Read my WeChat messages"	Install deps → reverse DB → write read script → save Skill	one-line invoke
"Give me a morning digest of Hacker News"	Write scraper → build digest → schedule daily run → save Skill	one-line invoke
"Monitor stocks and alert me"	Install `mootdx` → build selection flow → configure cron → save Skill	one-line start
"Send this file via Gmail"	Configure OAuth → write send script → save Skill	ready to use

After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.

📊 Comparison

Feature	GenericAgent	OpenClaw	Claude Code
Codebase	~3K lines	~530,000 lines	Open-sourced (large)
Deployment	`pip install` + API Key	Multi-service orchestration	CLI + subscription
Browser Control	Real browse

Core symbols most depended-on inside this repo

get

called by 1341

frontends/conductor.py

called by 197

frontends/desktop/static/app.js

write

called by 76

frontends/genericagent_acp_bridge.py

_system

called by 76

frontends/tuiapp_v2.py

frontends/desktop_bridge.py

frontends/desktop/static/app.js

Shape

Function 1,339

Method 924

Class 104

Route 28

Languages

Python84%

TypeScript16%

Modules by API surface

frontends/desktop/static/app.js365 symbols

frontends/tuiapp_v2.py347 symbols

frontends/tui_v3.py242 symbols

frontends/desktop_bridge.py129 symbols

frontends/qtapp.py126 symbols

frontends/tgapp.py92 symbols

llmcore.py81 symbols

frontends/conductor.py64 symbols

frontends/fsapp.py62 symbols

frontends/continue_cmd.py59 symbols

frontends/tuiapp.py53 symbols

ga.py41 symbols

Dependencies from manifests, versioned

@tauri-apps/cli2 · 1×

aiohttp3.9 · 1×

beautifulsoup44.12 · 1×

bottle0.12 · 1×

requests2.28 · 1×

simple-websocket-server0.4 · 1×

For agents

$ claude mcp add GenericAgent \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact