github.com/lsdefine/GenericAgent @desktop-portable-v0.1.4 sqlite

2,395 symbols · 7,930 edges · 76 files · 540 documented (23%) · 3 cross-repo links

GenericAgent

A Minimal, Self-Evolving Autonomous Agent Framework

~3K lines of seed code · 9 atomic tools · ~100-line Agent Loop

📌 Official: GitHub + https://gaagent.ai only. DintalClaw is the sole authorized commercial partner; others are not affiliated.


🌟 Overview

GenericAgent is a minimal, self-evolving autonomous agent framework. Its core is just ~3K lines of code. Through 9 atomic tools + a ~100-line Agent Loop, it grants any LLM system-level control over a local computer — covering browser, terminal, filesystem, keyboard/mouse input, screen vision, and mobile devices (ADB).

Design philosophy — don't preload skills, evolve them.

Every time GenericAgent solves a new task, it automatically crystallizes the execution path into a reusable Skill. The longer you use it, the more skills accumulate — forming a personal skill tree grown entirely from 3K lines of seed code.

🤖 Self-Bootstrap Proof — Everything in this repository, from installing Git and running git init to every commit message, was completed autonomously by GenericAgent. The author never opened a terminal once.


📋 Key Features

| Feature | Description |
|---|---|
| 🧬 Self-Evolving | Automatically crystallizes each task into a Skill. Capabilities grow with every use, forming your personal skill tree. |
| 🪶 Minimal Architecture | ~3K lines of core code. Agent Loop is ~100 lines. No complex dependencies, zero deployment overhead. |
| Strong Execution | TMWebdriver injects into a real browser (preserving login sessions). 9 atomic tools take direct control of the system. |
| 🔌 High Compatibility | Supports Claude / Gemini / Kimi / MiniMax and other major models. Cross-platform. |
| 💰 Token Efficient | <30K context window, a fraction of the 200K–1M other agents consume. Less noise, fewer hallucinations, higher success rate, lower cost. |

🎯 Demo Showcase

| Demo | What it shows |
|---|---|
| 🛡️ Real-Browser CAPTCHA Survival | While configuring a Discord bot, an hCaptcha "Are you human?" challenge pops up mid-task; GA's real browser session passes it and the task continues. See Browser Realness. |
| 🌐 Autonomous Web Exploration | Autonomously browses and periodically summarizes web content. |
| 🧋 Food Delivery Order | "Order me a milk tea": navigates the delivery app, selects items, completes checkout. |
| 📈 Quantitative Stock Screening | "Find GEM stocks with EXPMA golden cross, turnover > 5%": quantitative screening. |
| 💰 Expense Tracking | "Find expenses over ¥2K in the last 3 months": drives Alipay via ADB. |
| 💬 Batch Messaging | Sends bulk WeChat messages, fully driving the WeChat client. |

🚀 Quick Start

⚠️ Python version: use Python 3.11 or 3.12. Do not use Python 3.14 — it is incompatible with pywebview and a few other GA dependencies.
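The version guidance above can be checked before installing anything. This is a minimal sketch, not part of GenericAgent: the function name `is_supported_python` and the `SUPPORTED_MINORS` set are hypothetical and simply encode the two versions the README recommends.

```python
import sys

# The two versions the README recommends; this set is the sketch's own encoding.
SUPPORTED_MINORS = {(3, 11), (3, 12)}


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.11 or 3.12."""
    return (version_info[0], version_info[1]) in SUPPORTED_MINORS


if __name__ == "__main__":
    if not is_supported_python():
        found = f"{sys.version_info[0]}.{sys.version_info[1]}"
        print(f"Python {found} detected; the README recommends 3.11 or 3.12.")
```

Running the file directly prints a warning only when the active interpreter falls outside the recommended versions.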

📖 Detailed installation guide: installation.md · installation_zh.md (Chinese)

For LLM Agents

Fetch the installation guide and follow it:

curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/refs/heads/main/docs/installation.md

For Humans

Method 1 — Clone & install (recommended)

git clone https://github.com/lsdefine/GenericAgent.git && cd GenericAgent
uv venv && uv pip install -e ".[ui]"
cp mykey_template_en.py mykey.py   # fill in your LLM API key

Dependencies are deliberately tiered: the agent core needs only requests, plus four lightweight packages (beautifulsoup4, bottle, simple-websocket-server, aiohttp) for TMWebdriver's local server. The [ui] extra pulls in frontend libraries (Streamlit, prompt_toolkit/rich for the TUI, …) — install it for the bundled UIs, or skip it entirely and drive the agent headless. No Playwright, no LangChain, no browser binaries to download.
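To see which tier is already satisfied in an environment, a small import probe is enough. The sketch below is illustrative only: the helper `missing_modules` is hypothetical, and the import names (`bs4`, `simple_websocket_server`) are assumptions, since a distribution name often differs from the module it installs.

```python
from importlib.util import find_spec

# Import names are assumptions: the distribution beautifulsoup4 is normally
# imported as bs4, and simple-websocket-server as simple_websocket_server.
CORE_MODULES = ["requests"]
WEBDRIVER_MODULES = ["bs4", "bottle", "simple_websocket_server", "aiohttp"]


def missing_modules(module_names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


if __name__ == "__main__":
    print("core missing:", missing_modules(CORE_MODULES))
    print("TMWebdriver server missing:", missing_modules(WEBDRIVER_MODULES))
```

An empty list for both tiers suggests the headless agent can run without the `[ui]` extra.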

Then launch:

python frontends/tui_v3.py   # Terminal UI (recommended)
python launch.pyw            # Streamlit web UI

Method 2 — One-line installer (convenience)

Sets up a self-contained directory with an isolated Python environment, Git, and a ready-to-run package. The script is in assets/ if you'd like to read it first.

Windows PowerShell

powershell -ExecutionPolicy Bypass -c "$env:GLOBAL=1; irm https://raw.githubusercontent.com/lsdefine/GenericAgent/main/assets/ga_install.ps1 | iex"

Linux / macOS

GLOBAL=1 bash -c "$(curl -fsSL https://raw.githubusercontent.com/lsdefine/GenericAgent/main/assets/ga_install.sh)"

💡 GenericAgent grows its environment through the Agent itself — don't pre-install everything. See Unlocking Advanced Capabilities below.


💻 Usage

Frontends

Terminal UI (recommended)

A lightweight, scrollback-first terminal interface built on prompt_toolkit + rich. Supports multiple concurrent sessions and real-time streaming.

python frontends/tui_v3.py

⚠️ Windows TUI Troubleshooting

TUI rendering on Windows can be flaky depending on terminal + font. Common causes:

  1. prompt_toolkit / rich are not on the latest version — pip install -U prompt_toolkit rich first.
  2. PowerShell / cmd ship with terminals that have rough Unicode + key-binding support. Prefer Git Bash on Windows, which is much better behaved.
  3. If it still looks broken, ask GA itself to fix it:

    "My experience using frontends/tui_v3.py in PowerShell / cmd / Git Bash on Windows is very poor — lots of incompatibility. Please refer to Claude Code's best practices for the Windows terminal and fix all font and rendering incompatibilities."

Streamlit UI

python launch.pyw

Bot Interface (IM)

GenericAgent also supports IM frontends such as Telegram, Discord, and Lark.

| Platform | Command |
|---|---|
| Telegram | python frontends/tgapp.py |
| Discord | python frontends/dcapp.py |
| Lark / Feishu | python frontends/fsapp.py |

WeChat, QQ, WeCom and DingTalk are also supported — see the Chinese section below. For detailed setup, ask GenericAgent itself.
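If you switch between frontends often, the commands listed in this section can be wrapped in a tiny launcher. This is only a sketch: the script paths come from this README, while the `FRONTENDS` registry and `launch_command` helper are hypothetical and not shipped with GenericAgent.

```python
import sys

# Script paths are copied from this README; the short names are the sketch's own.
FRONTENDS = {
    "tui": "frontends/tui_v3.py",
    "streamlit": "launch.pyw",
    "telegram": "frontends/tgapp.py",
    "discord": "frontends/dcapp.py",
    "lark": "frontends/fsapp.py",
}


def launch_command(name: str) -> list[str]:
    """Build the argv list that starts the chosen frontend with this interpreter."""
    if name not in FRONTENDS:
        raise ValueError(f"unknown frontend: {name}")
    return [sys.executable, FRONTENDS[name]]
```

From the repository root, `subprocess.run(launch_command("telegram"), check=True)` would then start the Telegram bot with the same interpreter that runs the launcher.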


🔓 Unlocking Advanced Capabilities

In GA, advanced capabilities are unlocked by instructing the agent, not by reading docs or installing extras. Each instruction below makes GA read its pre-installed SOPs (battle-tested playbooks in its memory), install whatever is missing, adapt to your OS, and persist the result into its own memory.

| Capability | Just tell GA |
|---|---|
| 🌐 Web automation | "Set up your web automation capability." GA guides you through the one manual step: dragging the bundled Chrome extension into chrome://extensions. |
| 🔤 OCR | "Set up your OCR capability with rapidocr and save it to memory." |
| 👁️ Vision | "Set up your vision capability from the template in memory/." GA copies the template, wires it to your existing LLM keys, and self-tests. |
| 🖱️ Computer use | "Probe this system and set up your computer-use capability." |

💡 About language: the pre-installed SOPs are written in Chinese — GA reads them natively, so this never blocks you. If you prefer an English knowledge base, just say: "Read your pre-installed SOPs and rewrite them in English (keep code, paths and error strings verbatim)."

🌍 About platforms: the SOPs were honed on Windows, but cross-platform adaptation is itself a GA task — on macOS/Linux, GA swaps in the platform equivalents (window enumeration, input control, screenshots) on its own. Same self-evolution principle.


🧠 Architecture

GenericAgent accomplishes complex tasks through Layered Memory × Minimal Toolset × Autonomous Execution Loop, continuously accumulating experience during execution.

1️⃣ Layered Memory System

Memory crystallizes throughout task execution, letting the agent build stable, efficient working patterns over time.

| Layer | Name | Description |
|---|---|---|
| L0 | Meta Rules | Core behavioral rules and system constraints |
| L1 | Insight Index | Minimal memory index for fast routing and recall |
| L2 | Global Facts | Stable knowledge accumulated over long-term operation |
| L3 | Task Skills / SOPs | Reusable workflows for completing specific task types |
| L4 | Session Archive | Archived task records distilled from finished sessions for long-horizon recall |
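One way to picture the five layers is as a keyed store in which L1 holds only pointers into the deeper layers. The sketch below is an assumption about how such routing could work, not GenericAgent's actual memory format; `Layer`, `LayeredMemory`, `write`, and `recall` are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Layer(IntEnum):
    META_RULES = 0
    INSIGHT_INDEX = 1
    GLOBAL_FACTS = 2
    TASK_SKILLS = 3
    SESSION_ARCHIVE = 4


@dataclass
class LayeredMemory:
    # One dictionary per layer, keyed by a short entry name.
    entries: dict = field(default_factory=lambda: {layer: {} for layer in Layer})

    def write(self, layer: Layer, key: str, text: str) -> None:
        self.entries[layer][key] = text
        # Assumption: L1 stores only a pointer to the layer holding the full entry.
        if layer in (Layer.GLOBAL_FACTS, Layer.TASK_SKILLS, Layer.SESSION_ARCHIVE):
            self.entries[Layer.INSIGHT_INDEX][key] = layer

    def recall(self, key: str):
        """Route through the L1 index, then fetch the full entry (or None)."""
        layer = self.entries[Layer.INSIGHT_INDEX].get(key)
        if layer is None:
            return None
        return self.entries[layer].get(key)
```

Keeping the index small is what would let a lookup stay cheap while the full text lives in L2 to L4.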

2️⃣ Autonomous Execution Loop

Perceive environment state → Task reasoning → Execute tools → Write experience to memory → Loop

The entire core loop is just ~100 lines of code (agent_loop.py).
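The shape of such a loop can be sketched in a few lines. This is not the contents of agent_loop.py: `run_agent`, the `decide` callback (standing in for the LLM), and the list used as memory are all hypothetical simplifications.

```python
def run_agent(task, decide, tools, memory, max_turns=10):
    """Run a perceive -> reason -> execute -> remember loop.

    decide(task, observation) stands in for the LLM and returns either
    (tool_name, argument) or ("done", final_result).
    """
    observation = "start"  # perceived environment state
    for _ in range(max_turns):
        tool_name, argument = decide(task, observation)  # task reasoning
        if tool_name == "done":
            memory.append((task, argument))  # write the outcome to memory
            return argument
        observation = tools[tool_name](argument)  # execute the chosen tool
        memory.append((tool_name, observation))  # record the experience
    return None  # gave up after max_turns
```

The `max_turns` cap is a design choice of the sketch: it guarantees termination even if the model never emits "done".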

3️⃣ Minimal Toolset

GenericAgent provides only 9 atomic tools, forming the foundational capabilities for interacting with the outside world.

| Tool | Function |
|---|---|
| code_run | Execute arbitrary code (Python / PowerShell) |
| file_read | Read files |
| file_write | Write / create / overwrite files |
| file_patch | Patch / modify files |
| web_scan | Perceive web content |
| web_execute_js | Control browser behavior |
| ask_user | Human-in-the-loop confirmation |
| update_working_checkpoint | (memory) Short-term working notepad |
| start_long_term_update | (memory) Distill long-term memory |
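A fixed tool set like this maps naturally onto a name-to-handler registry. The sketch below uses the nine tool names from the table, but `ToolRegistry`, the handler signature, and the sample `file_read` implementation are assumptions made for illustration, not GenericAgent's real interfaces.

```python
from pathlib import Path

# Tool names as listed in the table above.
TOOL_NAMES = (
    "code_run", "file_read", "file_write", "file_patch", "web_scan",
    "web_execute_js", "ask_user", "update_working_checkpoint",
    "start_long_term_update",
)


class ToolRegistry:
    def __init__(self):
        self._handlers = {}

    def register(self, name):
        """Decorator that binds a handler to one of the known tool names."""
        if name not in TOOL_NAMES:
            raise ValueError(f"unknown tool: {name}")

        def decorator(handler):
            self._handlers[name] = handler
            return handler

        return decorator

    def call(self, name, **arguments):
        if name not in self._handlers:
            raise KeyError(f"no handler registered for {name}")
        return self._handlers[name](**arguments)


registry = ToolRegistry()


@registry.register("file_read")
def file_read(path):
    # Hypothetical handler: return the file's text content.
    return Path(path).read_text(encoding="utf-8")
```

Rejecting unknown names at registration time keeps the tool surface closed, which matches the idea of a small, fixed set of atomic tools.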

4️⃣ Capability Extension

GenericAgent can create new tools at runtime.

Via code_run, GenericAgent can dynamically install Python packages, write new scripts, call external APIs, or control hardware at runtime — crystallizing temporary abilities into permanent tools.
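Turning a freshly written script into a callable tool can be done with the standard library alone. The following is a sketch of that idea under stated assumptions: `load_tool_from_source` is a hypothetical helper, and GenericAgent's own persistence mechanism may look quite different.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_tool_from_source(name, source, directory):
    """Write source to <directory>/<name>.py, import it, and return the
    function called `name` that the module defines."""
    path = Path(directory) / f"{name}.py"
    path.write_text(source, encoding="utf-8")  # the script persists on disk
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # run the new module's code
    return getattr(module, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        double = load_tool_from_source(
            "double", "def double(x):\n    return 2 * x\n", tmp
        )
        print(double(21))
```

Because the generated file stays on disk, a later session could import it again instead of regenerating it. Executing generated code carries the usual risks, so a real setup would want some review or sandboxing step.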

[Figure: GenericAgent workflow diagram]


🧬 Self-Evolution Mechanism

Self-evolution is what fundamentally distinguishes GenericAgent from other agent frameworks.

[New Task]
   │
   ▼
[Autonomous Exploration]   ─►  install deps · write scripts · debug · verify
   │
   ▼
[Crystallize into Skill]   ─►  write to memory layer
   │
   ▼
[Direct Recall on Next Similar Task]

| What you say | First time | Every time after |
|---|---|---|
| "Read my WeChat messages" | Install deps → reverse DB → write read script → save Skill | one-line invoke |
| "Give me a morning digest of Hacker News" | Write scraper → build digest → schedule daily run → save Skill | one-line invoke |
| "Monitor stocks and alert me" | Install mootdx → build selection flow → configure cron → save Skill | one-line start |
| "Send this file via Gmail" | Configure OAuth → write send script → save Skill | ready to use |
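The explore-once, recall-afterwards pattern in this section behaves like a persistent cache keyed by task type. Here is a minimal sketch, assuming skills are stored as JSON lists of steps; `SkillStore`, `solve`, and the file layout are hypothetical and do not describe GenericAgent's actual skill format.

```python
import json
import tempfile
from pathlib import Path


class SkillStore:
    """Persist one list of steps per task type in a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self.skills = json.loads(self.path.read_text(encoding="utf-8"))
        else:
            self.skills = {}

    def solve(self, task_type, explore):
        """Return (steps, how), where how is 'recalled' or 'explored'."""
        if task_type in self.skills:
            return self.skills[task_type], "recalled"  # direct recall
        steps = explore(task_type)  # expensive first-time exploration
        self.skills[task_type] = steps  # crystallize into a skill
        self.path.write_text(json.dumps(self.skills, indent=2), encoding="utf-8")
        return steps, "explored"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = SkillStore(Path(tmp) / "skills.json")
        explore = lambda task: ["write scraper", "build digest"]
        print(store.solve("hn_digest", explore)[1])
        print(store.solve("hn_digest", explore)[1])
```

The second call never invokes `explore`, which is the cost saving the table's "Every time after" column describes.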

After a few weeks, your agent instance will have a skill tree no one else in the world has — all grown from 3K lines of seed code.


📊 Comparison

| Feature | GenericAgent | OpenClaw | Claude Code |
|---|---|---|---|
| Codebase | ~3K lines | ~530,000 lines | Open-sourced (large) |
| Deployment | pip install + API Key | Multi-service orchestration | CLI + subscription |
Browser Control Real browse

Core symbols (most depended-on inside this repo)

| Symbol | Called by | File |
|---|---|---|
| get | 1341 | frontends/conductor.py |
| t | 197 | frontends/desktop/static/app.js |
| write | 76 | frontends/genericagent_acp_bridge.py |
| _system | 76 | frontends/tuiapp_v2.py |
| commit | 72 | frontends/tui_v3.py |
| json_ok | 64 | frontends/desktop_bridge.py |
| start | 61 | frontends/qqapp.py |
| set | 56 | frontends/desktop/static/app.js |

Shape

Function 1,339
Method 924
Class 104
Route 28

Languages

Python 84%
TypeScript 16%

Modules by API surface

frontends/desktop/static/app.js: 365 symbols
frontends/tuiapp_v2.py: 347 symbols
frontends/tui_v3.py: 242 symbols
frontends/desktop_bridge.py: 129 symbols
frontends/qtapp.py: 126 symbols
frontends/tgapp.py: 92 symbols
llmcore.py: 81 symbols
frontends/conductor.py: 64 symbols
frontends/fsapp.py: 62 symbols
frontends/continue_cmd.py: 59 symbols
frontends/tuiapp.py: 53 symbols
ga.py: 41 symbols

Dependencies from manifests, versioned

@tauri-apps/cli 2 · 1×
aiohttp 3.9 · 1×
beautifulsoup4 4.12 · 1×
bottle 0.12 · 1×
requests 2.28 · 1×
simple-websocket-server 0.4 · 1×

For agents

$ claude mcp add GenericAgent \
  -- python -m otcore.mcp_server <graph>
