
github.com/Boris-code/feapder @v1.9.3 sqlite

1,254 symbols · 4,150 edges · 171 files · 453 documented (36%)
README

FEAPDER


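Introduction

  1. feapder is an easy-to-use yet powerful Python crawler framework. It ships with four spider types, AirSpider, Spider, TaskSpider, and BatchSpider, each suited to different scenarios.
  2. It supports resumable crawling, monitoring and alerting, browser rendering, and deduplication of massive datasets.
  3. It is also backed by feaplat, a powerful spider management system that makes deployment and scheduling convenient.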
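To illustrate the idea behind deduplicating massive datasets, here is a minimal pure-Python Bloom filter. It is a generic sketch under our own assumptions (SHA-256 double hashing, an in-process bytearray); it is not feapder's dedup implementation, which lives under feapder/dedup.

```python
import hashlib
import math


class BloomFilter:
    """Generic in-memory Bloom filter (illustrative; not feapder's dedup code)."""

    def __init__(self, capacity: int, error_rate: float = 0.001):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.size = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.hash_count = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit slices of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero step
        return [(h1 + i * h2) % self.size for i in range(self.hash_count)]

    def add(self, item: str) -> bool:
        """Insert item; return True if it was definitely not seen before."""
        is_new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                is_new = True
                self.bits[byte] |= 1 << bit
        return is_new

    def __contains__(self, item: str) -> bool:
        # May return a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter(capacity=100_000)
    for url in ["https://example.com/a", "https://example.com/b", "https://example.com/a"]:
        print(url, "new" if seen.add(url) else "duplicate (probably)")
```

A Bloom filter never reports a truly new URL as seen, so nothing is crawled twice by mistake; the price is a small, tunable false-positive rate (error_rate) in exchange for a fixed memory footprint that does not grow with URL length.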

Pronunciation: [ˈfiːpdə]

feapder

Documentation

  • Official docs: https://feapder.com
  • GitHub: https://github.com/Boris-code/feapder
  • Changelog: https://github.com/Boris-code/feapder/releases
  • Spider management system: http://feapder.com/#/feapder_platform/feaplat

Requirements:

  • Python 3.6.0+
  • Works on Linux, Windows, macOS
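A minimal sketch for failing fast on interpreters older than the stated Python 3.6.0 minimum. The function name python_supported and the MIN_PYTHON constant are our own illustration, not part of feapder.

```python
import sys

MIN_PYTHON = (3, 6)  # minimum stated in the requirements above


def python_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not python_supported():
        sys.exit("feapder requires Python 3.6.0+, found %s" % sys.version.split()[0])
    print("Python version OK:", sys.version.split()[0])
```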

Installation

From PyPI:

Lite version:

pip install feapder

Browser rendering version:

pip install "feapder[render]"

Full version:

pip install "feapder[all]"

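Differences between the three versions:

  1. Lite: no browser rendering, no in-memory deduplication, no MongoDB storage
  2. Browser rendering: no in-memory deduplication, no MongoDB storage
  3. Full: supports all features

Installing the full version may fail; if it does, see Installation Issues.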
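After installing, you can check which optional pieces are importable. The sketch below uses the standard library's importlib.util.find_spec. The mapping from features to modules is our assumption, not feapder's official extras definition (bitarray and pymongo do appear among the repository's manifest dependencies; selenium for rendering is a guess), so check the project's setup.py for the real lists.

```python
import importlib.util
from typing import Dict, Iterable, List

# ASSUMPTION: illustrative feature -> top-level module mapping, not taken from
# feapder's packaging metadata.
FEATURE_MODULES = {
    "in-memory dedup": ["bitarray"],
    "MongoDB storage": ["pymongo"],
    "browser rendering": ["selenium"],
}


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def feature_report(features: Dict[str, List[str]] = FEATURE_MODULES) -> Dict[str, bool]:
    """Map each feature to True if all of its (assumed) modules are importable."""
    return {feature: not missing_modules(mods) for feature, mods in features.items()}


if __name__ == "__main__":
    for feature, ok in feature_report().items():
        print("%-18s %s" % (feature, "available" if ok else "missing"))
```

Only pass top-level names: find_spec raises ModuleNotFoundError for a dotted name whose parent package is missing, instead of returning None.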

Quick Start

Create a spider

feapder create -s first_spider

The generated spider code is as follows:

import feapder


class FirstSpider(feapder.AirSpider):
    def start_requests(self):
        yield feapder.Request("https://www.baidu.com")

    def parse(self, request, response):
        print(response)


if __name__ == "__main__":
    FirstSpider().start()

Run it directly; it prints the following:

Thread-2|2021-02-09 14:55:11,373|request.py|get_response|line:283|DEBUG|
                -------------- FirstSpider.parse request for ----------------
                url  = https://www.baidu.com
                method = GET
                body = {'timeout': 22, 'stream': True, 'verify': False, 'headers': {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.93 Safari/537.36'}}

<Response [200]>
Thread-2|2021-02-09 14:55:11,610|parser_control.py|run|line:415|DEBUG| parser 等待任务...
FirstSpider|2021-02-09 14:55:14,620|air_spider.py|run|line:80|INFO| 无任务,爬虫结束
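The log lines above appear to follow a pipe-delimited layout: thread or spider name | timestamp | file | function | line:N | level | message. The sketch below parses one such line into fields, for example to filter records by level. The layout is inferred from this sample output only, not from a documented feapder log format, and LogRecord/parse_log_line are hypothetical names.

```python
from typing import NamedTuple


class LogRecord(NamedTuple):
    source: str   # thread or spider name, e.g. "Thread-2" or "FirstSpider"
    time: str     # timestamp as printed, e.g. "2021-02-09 14:55:11,373"
    file: str     # source file that emitted the record
    func: str     # function name
    line: int     # line number, taken from the "line:N" field
    level: str    # DEBUG / INFO / ...
    message: str  # rest of the line (empty when the message continues on later lines)


def parse_log_line(text: str) -> LogRecord:
    """Split one header line of the sample output into its seven fields.

    maxsplit=6 keeps any '|' inside the message intact; a line with fewer than
    seven fields raises ValueError during unpacking.
    """
    source, time, file, func, line_field, level, message = text.split("|", 6)
    _, _, number = line_field.partition(":")
    return LogRecord(source, time, file, func, int(number), level, message.strip())
```

Multi-line records such as the DEBUG request dump only carry their fields on the first line; the indented continuation lines belong to that record's message.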

Code explanation:

  1. start_requests: produces tasks
  2. parse: parses the data
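To make this division of labor concrete, here is a toy, single-threaded scheduler in plain Python: start_requests yields the initial tasks, and the scheduler hands each fetched response to parse, which may yield follow-up requests or parsed data. MiniAirSpider, Request, DemoSpider, and the injected fetch function are hypothetical stand-ins written for this sketch; the real AirSpider downloads over HTTP and runs parsers in worker threads (note the Thread-2 entries in the log above).

```python
from collections import deque
from typing import Any, Callable, Iterable, List, NamedTuple, Optional


class Request(NamedTuple):
    """Hypothetical task object; not feapder.Request."""
    url: str
    callback: Optional[str] = None  # name of the parse method; None means "parse"


class MiniAirSpider:
    """Toy single-threaded scheduler illustrating the start_requests/parse split."""

    def __init__(self, fetch: Callable[[str], Any]):
        self.fetch = fetch          # injected downloader, so the sketch needs no network
        self.items: List[Any] = []  # non-Request values yielded by parse callbacks

    def start_requests(self) -> Iterable[Request]:
        raise NotImplementedError   # produce the initial tasks

    def parse(self, request: Request, response: Any):
        raise NotImplementedError   # parse one response

    def start(self) -> None:
        queue = deque(self.start_requests())
        while queue:
            request = queue.popleft()
            response = self.fetch(request.url)
            callback = getattr(self, request.callback or "parse")
            for result in callback(request, response) or ():
                if isinstance(result, Request):
                    queue.append(result)       # follow-up task
                else:
                    self.items.append(result)  # parsed data
        # Queue drained: no tasks left, so the spider stops.


class DemoSpider(MiniAirSpider):
    def start_requests(self):
        yield Request("https://example.com/list")

    def parse(self, request, response):
        # A list page yields a detail task; a detail page yields data.
        if request.url.endswith("/list"):
            yield Request("https://example.com/detail")
        else:
            yield {"url": request.url, "length": len(response)}


if __name__ == "__main__":
    spider = DemoSpider(fetch=lambda url: "<html>%s</html>" % url)
    spider.start()
    print(spider.items)
```

Injecting fetch keeps the scheduling logic testable without a network; the stopping condition (an empty queue) mirrors the final log line above, where the spider ends once no tasks remain.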

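Contributing

Please read the Contribution Guide before contributing.

Thanks to everyone who has contributed!

Recommended crawler tools

  1. Online crawler toolkit: http://www.spidertools.cn
  2. Spider management system: http://feapder.com/#/feapder_platform/feaplat
  3. CAPTCHA recognition library: https://github.com/sml2h3/ddddocr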

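WeChat Tips

If this project has helped you, you can buy the author a coffee as encouragement 🍹

You can also connect with the author to get help with problems you run into while using feapder.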


Community

Knowledge Planet (知识星球): 17321694 · Author's WeChat: boris_tm · QQ group: 521494615

When sending a friend request, add the note: feapder

Core symbols most depended-on inside this repo

  • info (feapder/utils/log.py): called by 88
  • join (feapder/core/scheduler.py): called by 84
  • add_argument (feapder/utils/custom_argparse.py): called by 77
  • error (feapder/utils/log.py): called by 68
  • get (feapder/network/proxy_pool_old.py): called by 64
  • get (feapder/dedup/__init__.py): called by 58
  • start (feapder/utils/tail_thread.py): called by 50
  • exception (feapder/utils/log.py): called by 45

Shape

Method 803
Function 319
Class 132

Languages

Python 92%
TypeScript 8%

Modules by API surface

feapder/utils/tools.py: 158 symbols
docs/lib/docsify/lib/docsify.min.js: 61 symbols
feapder/db/redisdb.py: 56 symbols
feapder/network/user_pool/base_user_pool.py: 33 symbols
feapder/network/proxy_pool_old.py: 30 symbols
feapder/core/spiders/batch_spider.py: 30 symbols
feapder/network/response.py: 29 symbols
feapder/network/request.py: 28 symbols
feapder/utils/metrics.py: 25 symbols
tests/test_csv_pipeline/test_functionality.py: 24 symbols
feapder/core/spiders/task_spider.py: 23 symbols
feapder/core/scheduler.py: 23 symbols

Dependencies from manifests, versioned

DBUtils 2.0 · 1×
PyExecJS 1.5.1 · 1×
PyMySQL 0.9.3 · 1×
better-exceptions 0.2.2 · 1×
bitarray 1.5.3 · 1×
bs4 0.0.1 · 1×
cryptography 3.3.2 · 1×
influxdb 5.3.1 · 1×
ipython 7.14.0 · 1×
loguru 0.5.3 · 1×
parsel 1.5.2 · 1×
pymongo 3.10.1 · 1×

Datastores touched

(mongodb): Database · 1 repo
(mysql): Database · 1 repo
db: Database · 1 repo
db: Database · 1 repo
feapder: Database · 1 repo
feapder: Database · 1 repo

For agents

$ claude mcp add feapder \
  -- python -m otcore.mcp_server <graph>
