Sunspot is a Ruby library for expressive, powerful interaction with the Solr search engine. Sunspot is built on top of the RSolr library, which provides a low-level interface for Solr interaction; Sunspot provides a simple, intuitive, expressive DSL backed by powerful features for indexing objects and searching for them.
Sunspot is designed to be easily plugged in to any ORM, or even non-database-backed objects such as the filesystem.
This README provides a high level overview; class-by-class and method-by-method documentation is available in the API reference.
For questions about how to use Sunspot in your app, please use the Sunspot Mailing List or search Stack Overflow.
Add to Gemfile:
gem 'sunspot_rails'
gem 'sunspot_solr' # optional pre-packaged Solr distribution for use in development. Not for use in production.
Bundle it!
bundle install
Generate a default configuration file:
rails generate sunspot_rails:install
If sunspot_solr was installed, start the packaged Solr distribution
with:
bundle exec rake sunspot:solr:start # or sunspot:solr:run to start in foreground
This will generate a /solr folder with default configuration files and indexes.
If you're using source control, it's recommended that the files generated for indexing and running (PIDs) are not checked in. You can do this by adding the following lines to .gitignore:
solr/data
solr/test/data
solr/development/data
solr/default/data
solr/pids
Add a searchable block to the objects you wish to index.
class Post < ActiveRecord::Base
searchable do
text :title, :body
text :comments do
comments.map { |comment| comment.body }
end
boolean :featured
integer :blog_id
integer :author_id
integer :category_ids, :multiple => true
double :average_rating
time :published_at
time :expired_at
string :sort_title do
title.downcase.sub(/\A(an?|the)\s+/, '') # strip one leading article and the whitespace after it
end
end
end
text fields will be full-text searchable. Other fields (e.g.,
integer and string) can be used to scope queries.
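The sort_title field in the class above stores a normalized copy of the title so that results can be ordered without the leading article getting in the way. As a minimal sketch in plain Ruby (no Solr required), the helper below shows the effect of that kind of normalization on sort order. The name sort_title_for is hypothetical and not part of Sunspot, and the regex assumes the article must be followed by whitespace.

```ruby
# Hypothetical helper (not part of Sunspot): lowercases a title and drops
# one leading article ("a", "an", "the") followed by whitespace.
def sort_title_for(title)
  title.downcase.sub(/\A(an?|the)\s+/, '')
end

titles = ["The Best Pizza", "A Tale of Two Slices", "Another Calzone"]

# Sort by the normalized key while keeping the original titles for display.
sorted = titles.sort_by { |title| sort_title_for(title) }
puts sorted
```

Requiring whitespace after the article keeps words such as "Another" intact, which a bare prefix match would truncate.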
Post.search do
fulltext 'best pizza'
with :blog_id, 1
with(:published_at).less_than Time.now
field_list :blog_id, :title
order_by :published_at, :desc
paginate :page => 2, :per_page => 15
facet :category_ids, :author_id
end
Given the Post class set up in the earlier steps:
# All posts with a `text` field (:title, :body, or :comments) containing 'pizza'
Post.search { fulltext 'pizza' }
# Posts with pizza, scored higher if pizza appears in the title
Post.search do
fulltext 'pizza' do
boost_fields :title => 2.0
end
end
# Posts with pizza, scored higher if featured
Post.search do
fulltext 'pizza' do
boost(2.0) { with(:featured, true) }
end
end
# Posts with pizza *only* in the title
Post.search do
fulltext 'pizza' do
fields(:title)
end
end
# Posts with pizza in the title (boosted) or in the body (not boosted)
Post.search do
fulltext 'pizza' do
fields(:body, :title => 2.0)
end
end
Solr allows searching for phrases: search terms that are close together.
In the default query parser used by Sunspot (edismax), phrase searches are represented as a double quoted group of words.
# Posts with the exact phrase "great pizza"
Post.search do
fulltext '"great pizza"'
end
If specified, query_phrase_slop sets the number of words that may appear between the words in a phrase.
# One word can appear between the words in the phrase, so "great big pizza"
# also matches, in addition to "great pizza"
Post.search do
fulltext '"great pizza"' do
query_phrase_slop 1
end
end
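To make the effect of slop concrete, here is a simplified plain-Ruby model of in-order phrase matching. It is only an illustration of the "words allowed between phrase terms" idea described above, not Solr's actual matching algorithm, and the method name phrase_match? is hypothetical.

```ruby
# Simplified model: a phrase matches when its terms appear in order and the
# number of extra words inside the matched span is at most `slop`.
def phrase_match?(text, phrase, slop: 0)
  words = text.downcase.scan(/\w+/)
  terms = phrase.downcase.scan(/\w+/)
  return false if terms.empty?

  words.each_index.any? do |start|
    next false unless words[start] == terms.first

    # Greedily place each remaining term at its earliest later position,
    # which yields the shortest span beginning at `start`.
    last = start
    in_order = terms.drop(1).all? do |term|
      found = ((last + 1)...words.length).find { |i| words[i] == term }
      last = found if found
      found
    end

    in_order && (last - start + 1 - terms.length) <= slop
  end
end

puts phrase_match?("great pizza", "great pizza")              # exact phrase
puts phrase_match?("great big pizza", "great pizza")          # one word in between, slop 0
puts phrase_match?("great big pizza", "great pizza", slop: 1) # one word in between, slop 1
```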
Phrase boosts add boost to terms that appear in close proximity; the terms do not have to appear in a phrase, but if they do, the document will score more highly.
# Matches documents with great and pizza, and scores documents more
# highly if the terms appear in a phrase in the title field
Post.search do
fulltext 'great pizza' do
phrase_fields :title => 2.0
end
end
# Matches documents with great and pizza, and scores documents more
# highly if the terms appear in a phrase (or with one word between them)
# in the title field
Post.search do
fulltext 'great pizza' do
phrase_fields :title => 2.0
phrase_slop 1
end
end
Fields not defined as text (e.g., integer, boolean, time, etc.)
can be used to scope (restrict) queries before full-text
matching is performed.
# Posts with a blog_id of 1
Post.search do
with(:blog_id, 1)
end
# Posts with an average rating between 3.0 and 5.0
Post.search do
with(:average_rating, 3.0..5.0)
end
# Posts with a category of 1, 3, or 5
Post.search do
with(:category_ids, [1, 3, 5])
end
# Posts published since a week ago
Post.search do
with(:published_at).greater_than(1.week.ago)
end
# Posts not in category 1 or 3
Post.search do
without(:category_ids, [1, 3])
end
# All of the `with` examples above also work negated using `without`
# Passing an empty array is equivalent to a no-op, allowing you to replace this...
Post.search do
with(:category_ids, id_list) if id_list.present?
end
# ...with this
Post.search do
with(:category_ids, id_list)
end
# Posts with a blog_id of 1, requesting only the title field
Post.search do
with(:blog_id, 1)
field_list [:title]
end
Post.search do
without(:category_ids, [1, 3])
field_list [:title, :author_id]
end
# Posts that do not have an expired time or have not yet expired
Post.search do
any_of do
with(:expired_at).greater_than(Time.now)
with(:expired_at, nil)
end
end
# Posts with blog_id 1 and author_id 2
Post.search do
all_of do
with(:blog_id, 1)
with(:author_id, 2)
end
end
# Posts matching "keyword1" in the title or "keyword2" in the body
Post.search do
any do
fulltext "keyword1", :fields => :title
fulltext "keyword2", :fields => :body
end
end
Disjunctions and conjunctions may be nested:
Post.search do
any_of do
with(:blog_id, 1)
all_of do
with(:blog_id, 2)
with(:category_ids, 3)
end
end
any do
all do
fulltext "keyword", :fields => :title
fulltext "keyword", :fields => :body
end
all do
fulltext "keyword", :fields => :first_name
fulltext "keyword", :fields => :last_name
end
fulltext "keyword", :fields => :description
end
end
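The scope part of the nested example above (the any_of block containing an all_of block) corresponds to an ordinary boolean expression. The following plain-Ruby sketch evaluates that expression over in-memory hashes to show which records it would select. The sample data and the method name matches_nested_scope? are made up for illustration, and the fulltext any/all part of the example is not modelled here.

```ruby
# Equivalent of:
#   any_of do
#     with(:blog_id, 1)
#     all_of do
#       with(:blog_id, 2)
#       with(:category_ids, 3)
#     end
#   end
def matches_nested_scope?(post)
  post[:blog_id] == 1 ||
    (post[:blog_id] == 2 && post[:category_ids].include?(3))
end

# Illustrative records only; these are not loaded from Solr or a database.
posts = [
  { id: 1, blog_id: 1, category_ids: [5] },
  { id: 2, blog_id: 2, category_ids: [3, 4] },
  { id: 3, blog_id: 2, category_ids: [4] },
  { id: 4, blog_id: 3, category_ids: [3] }
]

matching_ids = posts.select { |post| matches_nested_scope?(post) }.map { |post| post[:id] }
puts matching_ids.inspect
```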
Scopes/restrictions can be combined with full-text searching. The scope/restriction pares down the objects that are searched for the full-text term.
# Posts with blog_id 1 and 'pizza' in any text field
Post.search do
with(:blog_id, 1)
fulltext("pizza")
end
All results from Solr are paginated.
The results array that is returned has methods mixed in that allow it to operate seamlessly with common pagination libraries like will_paginate and kaminari.
By default, Sunspot requests the first 30 results from Solr.
search = Post.search do
fulltext "pizza"
end
# Imagine there are 60 *total* results (at 30 results/page, that is two pages)
results = search.results # => Array with 30 Post elements
search.total # => 60
results.total_pages # => 2
results.first_page? # => true
results.last_page? # => false
results.previous_page # => nil
results.next_page # => 2
results.out_of_bounds? # => false
results.offset # => 0
To retrieve the next page of results, recreate the search and use the
paginate method.
search = Post.search do
fulltext "pizza"
paginate :page => 2
end
# Again, imagine there are 60 total results; this is the second page
results = search.results # => Array with 30 Post elements
search.total # => 60
results.total_pages # => 2
results.first_page? # => false
results.last_page? # => true
results.previous_page # => 1
results.next_page # => nil
results.out_of_bounds? # => false
results.offset # => 30
A custom number of results per page can be specified with the
:per_page option to paginate:
search = Post.search do
fulltext "pizza"
paginate :page => 1, :per_page => 50
end
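The values shown in the pagination examples above follow from simple page arithmetic. The sketch below reproduces them in plain Ruby for 60 total results at the default 30 per page. The method name page_info is hypothetical, and the sketch models only the arithmetic, not the behavior of will_paginate or kaminari in edge cases such as an empty result set.

```ruby
# Hypothetical helper: derives pagination values from a total count,
# a 1-based page number, and a page size.
def page_info(total:, page: 1, per_page: 30)
  total_pages = (total + per_page - 1) / per_page # integer ceiling division

  {
    total_pages: total_pages,
    offset: (page - 1) * per_page,
    first_page: page == 1,
    last_page: page >= total_pages,
    previous_page: page > 1 ? page - 1 : nil,
    next_page: page < total_pages ? page + 1 : nil,
    out_of_bounds: page > total_pages
  }
end

first  = page_info(total: 60, page: 1)
second = page_info(total: 60, page: 2)

puts first.inspect
puts second.inspect
```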
Solr 4.7 and above
With default Solr pagination, the same records may appear on different pages (e.g., if many records have the same search score). Cursor-based pagination avoids this.
It is useful for exports, infinite scroll, and similar cases.
The cursor for the first page is "*".
search = Post.search do
fulltext "pizza"
paginate :cursor => "*"
end
results = search.results
# Results will contain cursor for the next page
results.next_page_cursor # => "AoIIP4AAACxQcm9maWxlIDEwMTk="
# Imagine there are 60 *total* results (at 30 results/page, that is two pages)
results.current_cursor # => "*"
results.total_pages # => 2
results.first_page? # => true
results.last_page? # => false
To retrieve the next page of results, recreate the search and pass the cursor from the previous results to the paginate method.
search = Post.search do
fulltext "pizza"
paginate :cursor => "AoIIP4AAACxQcm9maWxlIDEwMTk="
end
results = search.results
# Again, imagine there are 60 total results; this is the second page
results.next_page_cursor # => "AoEsUHJvZmlsZSAxNzY5"
results.current_cursor # => "AoIIP4AAACxQcm9maWxlIDEwMTk="
results.total_pages # => 2
results.first_page? # => false
# The last page is detected only when the current page contains fewer than per_page elements or is empty
results.last_page? # => false
The :per_page option is also supported with cursor-based pagination.
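A typical use of cursors is to walk every page in a loop. The sketch below shows that loop shape in plain Ruby against a stand-in data source, so that it runs without Solr. The fetch_page lambda and its numeric cursor strings are assumptions made for illustration; real Solr cursors are opaque strings such as those shown above. It also shows why a third request is needed for 60 results at 30 per page: the end is detected only once a page comes back with fewer than per_page elements.

```ruby
records  = (1..60).to_a
per_page = 30

# Stand-in for a cursor-paginated search. "*" starts at the beginning; any
# other cursor is treated as an opaque token issued by the previous page.
fetch_page = lambda do |cursor|
  start = cursor == "*" ? 0 : Integer(cursor)
  items = records[start, per_page] || []
  { items: items, next_cursor: (start + items.size).to_s }
end

collected = []
requests  = 0
cursor    = "*"

loop do
  page = fetch_page.call(cursor)
  requests += 1
  collected.concat(page[:items])

  # A short (or empty) page signals that there is nothing left to fetch.
  break if page[:items].size < per_page

  cursor = page[:next_cursor]
end

puts "Fetched #{collected.size} records in #{requests} requests"
```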
Faceting is a feature of Solr that determines the number of documents that match a given search and an additional criterion. This allows you to build powerful drill-down interfaces for search.
Each facet returns zero or more rows, each of which represents a particular criterion conjoined with the actual query being performed. For field facets, each row represents a particular value for a given field. For query facets, each row represents an arbitrary scope; the facet itself is just a means of logically grouping the scopes.
By default Sunspot will only return the first 100 facet values. You can increase this limit, or force it to return all facets by setting limit to -1.
# Posts that match 'pizza' returning counts for each :author_id
search = Post.search do
fulltext "pizza"
facet :author_id
end
search.facet(:author_id).rows.each do |facet|
puts "Author #{facet.value} has #{facet.count} pizza posts!"
end
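Conceptually, a field facet is a count of matching documents grouped by the value of one field. The plain-Ruby sketch below computes the same kind of rows over an in-memory array. The sample data is invented, and the substring check is only a crude stand-in for Solr's full-text matching.

```ruby
# Illustrative records only; these are not loaded from Solr or a database.
posts = [
  { title: "Pizza night",        author_id: 1 },
  { title: "Best pizza in town", author_id: 2 },
  { title: "Pizza dough basics", author_id: 1 },
  { title: "Salad days",         author_id: 3 }
]

# Crude stand-in for `fulltext "pizza"`.
matching = posts.select { |post| post[:title].downcase.include?("pizza") }

# One row per distinct author_id, most frequent first.
facet_rows = matching
  .group_by { |post| post[:author_id] }
  .map { |author_id, group| { value: author_id, count: group.size } }
  .sort_by { |row| [-row[:count], row[:value]] }

facet_rows.each do |row|
  puts "Author #{row[:value]} has #{row[:count]} pizza posts!"
end
```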
If you are filtering by a specific field and still want to see all the values available in that field, you can exclude that filter from the facet.
# Posts that match 'pizza' and author with id 42
# Returning counts for each :author_id (even those not in the search result)
search = Post.search do
fulltext "pizza"
author_filter = with(:author_id, 42)
facet :author_id, exclude: [author_filter]
end
search.facet(:author_id).rows.each do |facet|
puts "Author #{facet.value} has #{facet.count} pizza posts!"
end
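The difference between a plain facet and one that excludes a filter can be seen in a small plain-Ruby model: the results honor the author filter, while the facet counts are computed as if that filter were absent. The data and variable names below are illustrative only.

```ruby
# Illustrative records only; these are not loaded from Solr or a database.
posts = [
  { title: "Pizza night",        author_id: 42 },
  { title: "Pizza dough basics", author_id: 42 },
  { title: "Best pizza in town", author_id: 7 },
  { title: "Salad days",         author_id: 7 }
]

count_by_author = lambda do |list|
  list.each_with_object(Hash.new(0)) { |post, counts| counts[post[:author_id]] += 1 }
end

pizza_posts = posts.select { |post| post[:title].downcase.include?("pizza") }
results     = pizza_posts.select { |post| post[:author_id] == 42 }

# Without exclusion, the facet only sees the filtered results.
plain_facet = count_by_author.call(results)

# With exclusion, the facet ignores the author filter but keeps the text query.
excluding_facet = count_by_author.call(pizza_posts)

puts plain_facet.inspect
puts excluding_facet.inspect
```

This is what lets a drill-down interface keep showing the other authors after one has been selected.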
# Posts faceted by ranges of average ratings
search = Post.search do
facet(:average_rating) do
row(1.0..2.0) do
with(:average_rating, 1.0..2.0)
end
row(2.0..3.0) do
with(:average_rating, 2.0..3.0)
end
row(3.0..4.0) do
with(:average_rating, 3.0..4.0)
end
row(4.0..5.0) do
with(:average_rating, 4.0..5.0)
end
end
end
# e.g.,
# Number of posts with rating within 1.0..2.0: 2
# Number of posts with rating within 2.0..3.0: 1
search.facet(:average_rating).rows.each do |facet|
puts "Number of posts with rating within #{facet.value}: #{facet.count}"
end
# Posts faceted by range of average ratings
Sunspot.search(Post) do
facet :average_rating, :range => 1..5, :range_interval => 1
end
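A range facet with an interval splits the range into equal-width buckets and counts the values that fall in each. The plain-Ruby sketch below models this for the 1..5 range with an interval of 1. It assumes half-open buckets (lower bound included, upper bound excluded), which is an assumption about the bucket edges rather than something stated in this README. The method name range_facet is hypothetical.

```ruby
# Hypothetical helper: counts values per bucket of width `interval` inside
# `range`, treating each bucket as lower-inclusive and upper-exclusive.
def range_facet(values, range:, interval:)
  counts = Hash.new(0)

  values.each do |value|
    next unless value >= range.begin && value < range.end

    lower = range.begin + ((value - range.begin) / interval).floor * interval
    counts[lower...(lower + interval)] += 1
  end

  counts.sort_by { |bucket, _count| bucket.begin }.to_h
end

ratings = [1.5, 1.9, 2.0, 3.2, 4.8, 5.0]
buckets = range_facet(ratings, range: 1..5, interval: 1)

buckets.each do |bucket, count|
  puts "Number of posts with rating within #{bucket}: #{count}"
end
```

Under this assumption a rating of exactly 5.0 falls outside the last bucket, so it is not counted.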
The json facet can be used with the following syntax:
Sunspot.search(Post) do
json_facet(:title)
end
There are some options you can pass to the json facet:
:limit
:minimum_count
:sort
:prefix
:missing
:all_buckets
:method
Some examples
# limit the results to 10
Sunspot.search(Post) do
json_facet(:title, limit: 10)
end
# returns only the results with a minimum count of 10
Sunspot.search(Post) do
json_facet(:title, minimum_count: 10)
end
# sort by count
Sunspot.search(Post) do
json_facet(:title, sort: :count)
end
# filter titles by prefix 't'
Sunspot.search(Post) do
json_facet(:title, prefix: 't')
end
# compute the total number of records in all buckets
# accessible via search.other_count('allBuckets')
search = Sunspot.search(Post) do
json_facet(:title, all_buckets: true)
end
# compute the total number of records that do not have a title value
# accessible via search.other_count('missing')
search = Sunspot.search(Post) do
json_facet(:title, missing: true)
end
# force usage of the dv faceting algorithm
search = Sunspot.search(Post) do
json_facet(:title, method: 'dv')
end
Range facets are supported on numeric, date, or time fiel