github.com/huichen/sego @main sqlite

108 symbols 255 edges 13 files 32 documented · 30%
README

sego

Chinese word segmentation in Go

The dictionary is implemented as a double-array trie, and the segmenter finds a frequency-based shortest path using dynamic programming.
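The frequency-based shortest path idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: it assumes each word costs -log(freq/total), uses a plain Go map in place of the double-array trie, and charges a fixed penalty for unknown single characters. The function name `segment` and the toy dictionary are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// segment returns the word sequence with the lowest total cost, where a
// dictionary word costs log(total) - log(freq). This cost function is an
// assumption made for illustration.
func segment(dict map[string]int, text string) []string {
	runes := []rune(text)
	n := len(runes)

	total := 1 // start at 1 so the logarithm is defined for an empty dictionary
	maxLen := 1
	for w, f := range dict {
		total += f
		if l := len([]rune(w)); l > maxLen {
			maxLen = l
		}
	}
	logTotal := math.Log(float64(total))
	unknownCost := logTotal + 1 // costlier than any dictionary word

	// best[i] is the minimal cost of segmenting runes[:i];
	// prev[i] is the start index of the last word on that path.
	best := make([]float64, n+1)
	prev := make([]int, n+1)
	for i := 1; i <= n; i++ {
		best[i] = math.Inf(1)
		for j := i - 1; j >= 0 && i-j <= maxLen; j-- {
			cost := math.Inf(1)
			if f, ok := dict[string(runes[j:i])]; ok {
				cost = logTotal - math.Log(float64(f))
			} else if i-j == 1 {
				cost = unknownCost // unknown single character keeps the path connected
			}
			if best[j]+cost < best[i] {
				best[i] = best[j] + cost
				prev[i] = j
			}
		}
	}

	// Walk the prev links back from the end to recover the words in order.
	var words []string
	for i := n; i > 0; i = prev[i] {
		words = append([]string{string(runes[prev[i]:i])}, words...)
	}
	return words
}

func main() {
	// Toy frequencies, made up for this example.
	dict := map[string]int{"研究": 100, "研究生": 30, "生命": 80, "命": 5, "起源": 40}
	fmt.Println(segment(dict, "研究生命起源")) // prints [研究 生命 起源]
}
```

With these toy frequencies the path 研究/生命/起源 is cheaper than 研究生/命/起源 because the rare word 命 carries a high cost.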

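Supports two segmentation modes (normal and search engine), user dictionaries, and part-of-speech tagging, and can run as a JSON RPC service.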
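The difference between the two modes can be illustrated with a toy model. The sketch below assumes that search mode emits the shorter dictionary words contained in a long word in addition to the word itself, while normal mode emits only the long word. The `token` type and `tokenToSlice` function are hypothetical and are not part of the sego API.

```go
package main

import "fmt"

// token is a hypothetical dictionary entry; parts holds the shorter
// dictionary words found inside it.
type token struct {
	text  string
	parts []*token
}

// tokenToSlice lists the output words for one token. In search mode the
// nested parts are emitted first, followed by the token itself.
func tokenToSlice(t *token, searchMode bool) []string {
	var out []string
	if searchMode {
		for _, p := range t.parts {
			out = append(out, tokenToSlice(p, true)...)
		}
	}
	return append(out, t.text)
}

func main() {
	republic := &token{text: "共和国", parts: []*token{{text: "共和"}}}
	whole := &token{
		text:  "中华人民共和国",
		parts: []*token{{text: "中华"}, {text: "人民"}, republic},
	}
	fmt.Println(tokenToSlice(whole, false)) // prints [中华人民共和国]
	fmt.Println(tokenToSlice(whole, true))  // prints [中华 人民 共和 共和国 中华人民共和国]
}
```

Emitting the nested words gives a search index more keywords to match against, at the price of overlapping output.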

Segmentation throughput is 9MB/s on a single thread and 42MB/s with concurrent goroutines (8-core MacBook Pro).
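One way to get the concurrent speedup is a small worker pool that shares one read-only segmenter across goroutines. The sketch below is an assumption about how such a setup might look, not the repository's benchmark code: `segmentFunc`, `segmentAll`, and `splitRunes` are hypothetical names, and `splitRunes` merely stands in for a real segmenter call.

```go
package main

import (
	"fmt"
	"sync"
)

// segmentFunc is a stand-in for a segmenter call that is safe to share
// between goroutines because it only reads its dictionary.
type segmentFunc func(text string) []string

// segmentAll runs seg over every line using a fixed number of workers and
// returns the results in input order.
func segmentAll(lines []string, workers int, seg segmentFunc) [][]string {
	if workers < 1 {
		workers = 1
	}
	results := make([][]string, len(lines))
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker, so no lock is needed.
				results[i] = seg(lines[i])
			}
		}()
	}
	for i := range lines {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

// splitRunes is a placeholder segmenter that emits one word per character.
func splitRunes(text string) []string {
	var out []string
	for _, r := range text {
		out = append(out, string(r))
	}
	return out
}

func main() {
	lines := []string{"中文", "分词"}
	fmt.Println(segmentAll(lines, 4, splitRunes)) // prints [[中 文] [分 词]]
}
```

Writing each result into its own slice index keeps the output ordered without a mutex or a results channel.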

Install/update

go get -u github.com/huichen/sego

Usage

package main

import (
    "fmt"
    "github.com/huichen/sego"
)

func main() {
    // Load the dictionary
    var segmenter sego.Segmenter
    segmenter.LoadDictionary("github.com/huichen/sego/data/dictionary.txt")

    // Segment the text
    text := []byte("中华人民共和国中央人民政府")
    segments := segmenter.Segment(text)

    // Process the segmentation result
    // Both normal mode and search mode are supported; see the comments on the SegmentsToString function in the code.
    fmt.Println(sego.SegmentsToString(segments, false))
}

Core symbols most depended-on inside this repo

expect · called by 26 · test_utils.go
splitTextToWords · called by 11 · segmenter.go
bytesToString · called by 9 · test_utils.go
SegmentsToString · called by 8 · utils.go
LoadDictionary · called by 7 · segmenter.go
Segment · called by 7 · segmenter.go
textSliceToString · called by 6 · utils.go
internalSegment · called by 6 · segmenter.go

Shape

Function 79
Method 21
Struct 7
TypeAlias 1

Languages

Go 61%
TypeScript 39%

Modules by API surface

server/static/jquery.min.js · 42 symbols
segmenter.go · 14 symbols
utils_test.go · 10 symbols
utils.go · 8 symbols
dictionary.go · 8 symbols
token.go · 7 symbols
test_utils.go · 4 symbols
server/server.go · 4 symbols
segment.go · 4 symbols
segmenter_test.go · 3 symbols
tools/goroutines.go · 2 symbols
tools/example.go · 1 symbol

Dependencies from manifests, versioned

github.com/adamzy/cedar-go v0.0.0-2017080503471 · 1×
github.com/adamzy/sego v0.0.0-2015100418492 · 1×
github.com/issue9/assert v1.4.1 · 1×

For agents

$ claude mcp add sego \
  -- python -m otcore.mcp_server <graph>
