
github.com/gocolly/colly @v2.3.0 sqlite

361 symbols · 1,583 edges · 50 files · 188 documented (52%)
README

Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.


Features

  • Clean API
  • Fast (>1k request/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensions

Example


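package main

import (
    "fmt"
    "log"

    "github.com/gocolly/colly/v2"
)

func main() {
    c := colly.NewCollector()

    // Find and visit all links
    c.OnHTML("a[href]", func(e *colly.HTMLElement) {
        // Visit resolves relative hrefs against the current page; its error
        // (for example, an already visited URL) is ignored here.
        e.Request.Visit(e.Attr("href"))
    })

    // Log every request before it is sent
    c.OnRequest(func(r *colly.Request) {
        fmt.Println("Visiting", r.URL)
    })

    // Start the crawl and report a failure of the first request
    if err := c.Visit("http://go-colly.org/"); err != nil {
        log.Fatal(err)
    }
}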

See examples folder for more detailed examples.

Installation

go get github.com/gocolly/colly/v2

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

If you are using Colly in a project please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License


Extension points (exported contracts: how you extend this code)

Debugger (Interface)
Debugger is an interface for different types of debugging backends [2 implementers]
debug/debug.go
CollectorOption (FuncType)
A CollectorOption sets an option on a Collector.
colly.go
Storage (Interface)
Storage is an interface which handles Collector's internal data, like visited urls and cookies. The default Storage of t
storage/storage.go
Storage (Interface)
Storage is the interface of the queue's storage backend. Storage must be concurrently safe for multiple goroutines.
queue/queue.go
RequestCallback (FuncType)
RequestCallback is a type alias for OnRequest callback functions
colly.go
ResponseHeadersCallback (FuncType)
ResponseHeadersCallback is a type alias for OnResponseHeaders callback functions
colly.go
ResponseCallback (FuncType)
ResponseCallback is a type alias for OnResponse callback functions
colly.go
HTMLCallback (FuncType)
HTMLCallback is a type alias for OnHTML callback functions
colly.go

Core symbols (most depended-on inside this repo)

Visit
called by 93
colly.go
Error
called by 86
colly.go
NewCollector
called by 75
colly.go
Close
called by 66
storage/storage.go
OnResponse
called by 46
colly.go
String
called by 42
colly.go
OnHTML
called by 39
colly.go
Get
called by 36
context.go

Shape

Function 166
Method 142
Struct 38
FuncType 11
Interface 3
TypeAlias 1

Languages

Go 100%

Modules by API surface

colly.go · 106 symbols
colly_test.go · 55 symbols
queue/queue.go · 23 symbols
storage/storage.go · 16 symbols
request.go · 14 symbols
http_backend.go · 12 symbols
extensions/random_user_agent.go · 11 symbols
unmarshal.go · 9 symbols
htmlelement.go · 9 symbols
context.go · 9 symbols
xmlelement.go · 8 symbols
unmarshal_test.go · 8 symbols

Dependencies (from manifests, versioned)

github.com/andybalholm/cascadia v1.3.3 · 1×
github.com/antchfx/htmlquery v1.3.5 · 1×
github.com/antchfx/xmlquery v1.5.0 · 1×
github.com/antchfx/xpath v1.3.5 · 1×
github.com/bits-and-blooms/bitset v1.24.4 · 1×
github.com/gobwas/glob v0.2.3 · 1×
github.com/golang/groupcache v0.0.0-2024112921072 · 1×
github.com/jawher/mow.cli v1.1.0 · 1×
github.com/kennygrant/sanitize v1.2.4 · 1×

For agents

$ claude mcp add colly \
  -- python -m otcore.mcp_server <graph>
