github.com/rchipka/node-osmosis @1.1.2 sqlite

84 symbols · 125 edges · 65 files · 19 documented (23%)

README

Osmosis

HTML/XML parser and web scraper for NodeJS.

Features

- Uses native libxml C bindings
- Clean promise-like interface
- Supports CSS 3.0 and XPath 1.0 selector hybrids
- Sizzle selectors, Slick selectors, and more
- No large dependencies like jQuery, cheerio, or jsdom
- Compose deep and complex data structures
- HTML parser features
  - Fast parsing
  - Very fast searching
  - Small memory footprint
- HTML DOM features
  - Load and search ajax content
  - DOM interaction and events
  - Execute embedded and remote scripts
  - Execute code in the DOM
- HTTP request features
  - Logs urls, redirects, and errors
  - Cookie jar and custom cookies/headers/user agent
  - Login/form submission, session cookies, and basic auth
  - Single proxy or multiple proxies and handles proxy failure
  - Retries and redirect limits

Example

var osmosis = require('osmosis');

osmosis
.get('www.craigslist.org/about/sites')
.find('h1 + div a')
.set('location')
.follow('@href')
.find('header + div + div li > a')
.set('category')
.follow('@href')
.paginate('.totallink + a.button.next:first')
.find('p > a')
.follow('@href')
.set({
    'title':        'section > h2',
    'description':  '#postingbody',
    'subcategory':  'div.breadbox > span[4]',
    'date':         'time@datetime',
    'latitude':     '#map@data-latitude',
    'longitude':    '#map@data-longitude',
    'images':       ['img@src']
})
.data(function(listing) {
    // do something with listing data
})
.log(console.log)
.error(console.log)
.debug(console.log)
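
Several values passed to `.set()` above, such as `'time@datetime'` and `'#map@data-latitude'`, appear to pair an element selector with an attribute name after the `@`. The sketch below only illustrates how to read those strings; `splitSelector` is a hypothetical helper written for this example, not part of Osmosis and not how the library parses selectors internally.

```javascript
// Hypothetical helper (not part of Osmosis): splits a "selector@attribute"
// string into its element selector and its attribute name.
function splitSelector(spec) {
  const at = spec.lastIndexOf('@');
  if (at === -1) {
    // No "@" suffix: the whole string selects elements.
    return { selector: spec, attribute: null };
  }
  return {
    selector: spec.slice(0, at),   // part before the last "@"
    attribute: spec.slice(at + 1), // attribute name after the last "@"
  };
}

// A few of the field definitions from the example above.
const fields = {
  title: 'section > h2',
  date: 'time@datetime',
  latitude: '#map@data-latitude',
};

for (const [name, spec] of Object.entries(fields)) {
  console.log(name, splitSelector(spec));
}
```

Read this way, `'@href'` in `.follow('@href')` has an empty element part and names only the `href` attribute of the current node.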

Documentation

For documentation and examples, see https://rchipka.github.com/node-osmosis/

Dependencies

libxmljs-dom - DOM wrapper for libxmljs C bindings
needle - Lightweight HTTP wrapper

Donate

Please consider a donation if you depend on web scraping and Osmosis makes your job a bit easier. Your contribution allows me to spend more time making this the best web scraper for Node.

Donation offers:

$25 - A custom Osmosis scraper to extract the data you need efficiently and in as few lines of code as possible.
$25/month - Become a sponsor. Your company will be listed on this page. Priority support and bug fixes.

Core symbols (most depended-on inside this repo)

lib/commands/learn.js

lib/commands/paginate.js

Shape

Function 84

Languages

TypeScript 100%

Modules (by API surface)

lib/commands/set.js · 12 symbols

lib/commands/learn.js · 8 symbols

lib/Command.js · 7 symbols

lib/commands/then.js · 5 symbols

lib/commands/paginate.js · 5 symbols

lib/Request.js · 5 symbols

benchmark/index.js · 5 symbols

lib/commands/get.js · 4 symbols

index.js · 3 symbols

lib/commands/match.js · 2 symbols

lib/commands/do.js · 2 symbols

lib/commands/contains.js · 2 symbols

Dependencies (from manifests, versioned)

jscs >=3.0.2 · 1×

libxmljs-dom 0.0.8 · 1×

needle 1.3.0 · 1×

nodeunit 0.9.0 · 1×

For agents

$ claude mcp add node-osmosis \
  -- python -m otcore.mcp_server <graph>
