MCPcopy
hub / github.com/cube2222/octosql

github.com/cube2222/octosql @v0.13.0 sqlite

repository ↗ · DeepWiki ↗ · release v0.13.0 ↗
1,936 symbols 4,395 edges 147 files 452 documented · 23%
README

OctoSQL

OctoSQL is predominantly a CLI tool which lets you query a plethora of databases and file formats using SQL through a unified interface, even do JOINs between them. (Ever needed to join a JSON file with a PostgreSQL table? OctoSQL can help you with that.)

At the same time it's an easily extensible full-blown dataflow engine, and you can use it to add a SQL interface to your own applications.

GitHub Go Report Card GoDoc License Latest Version Gitter

Demo

Usage

octosql "SELECT * FROM ./myfile.json"
octosql "SELECT * FROM ./myfile.json" --describe  # Show the schema of the file.
octosql "SELECT invoices.id, address, amount
         FROM invoices.csv JOIN db.customers ON invoices.customer_id = customers.id
         ORDER BY amount DESC"
octosql "SELECT customer_id, SUM(amount)
         FROM invoices.csv
         GROUP BY customer_id"

OctoSQL supports a bunch of file formats out of the box, but you can additionally install plugins to add support for other databases.

octosql "SELECT * FROM plugins.available_plugins"
octosql plugin install postgres
echo "databases:
  - name: mydb
    type: postgres
    config:
      host: localhost
      port: 5443
      database: mydb
      user: postgres
      password: postgres" > octosql.yml
octosql "SELECT * FROM mydb.users" --describe
octosql "SELECT * FROM mydb.users"

You can specify the output format using the --output flag. Available values for it are live_table, batch_table, csv and stream_native.

The documentation about available aggregates and functions is contained within OctoSQL itself. It's in the aggregates, aggregate_signatures, functions and function_signatures tables in the docs database.

octosql "SELECT * FROM docs.functions fs"
+------------------+----------------------------------------+
|     fs.name      |             fs.description             |
+------------------+----------------------------------------+
| 'abs'            | 'Returns absolute value                |
|                  | of argument.'                          |
| 'ceil'           | 'Returns ceiling of                    |
|                  | argument.'                             |
| ...              | ...                                    |
+------------------+----------------------------------------+

Installation

Homebrew

You can install OctoSQL using Homebrew on MacOS or Linux:

brew install cube2222/octosql/octosql

After running it for the first time on MacOS you'll have to go into Preferences -> Security and Privacy -> Allow OctoSQL, as with any app that's not notarized.

Pre-Compiled binary

You can also download the binary for your operating system directly from the Releases page.

Nix Package

The package can be installed in the local nix-profile.

nix-env -iA nixpkgs.octosql

For adhoc or testing purposes a shell with the package can be spawned.

nix-shell -p octosql

For NixOS users it is highly recommended to install the package by adding it to the list of systemPackages.

environment.systemPackages = with pkgs; [
  octosql
  # ...
];

Building from source

With Go in version >= 1.18 the application can be built from source. This can be achieved by cloning the repository and running go install from the project directory.

git clone https://github.com/cube2222/octosql
cd octosql
go install

File Access

Support for multiple file types is included by default in OctoSQL: - JSON (in JSONLines format, one object per line) - CSV - TSV - Parquet - Lines (reading a file line by line)

If your file has a matching extension, you can use its path directly as a table:

~> octosql "SELECT * FROM my/file/path.json"

or, if the extension is not right, you can use this alternative notation, where the extension is used in place of the database name:

~> octosql "SELECT * FROM `json.my/file/path.whatever`"

You can also specify additional options using the following notation: myfile.ext?key=value&key2=value2

The following options are available: - CSV - header: true/false (default: true) - Whether the file has a header row. - JSON - tail: true/false (default: false) - Whether to keep waiting for new content after reaching the end of the file. - Lines - tail: true/false (default: false) - Whether to keep waiting for new content after reaching the end of the file.

Reading from Standard Input

You can also pipe data in through stdin, and OctoSQL will expose it as the stdin.<file_type> table. For example:

~> echo '{"hello": "world"}' | octosql "SELECT * FROM stdin.json"
+---------+
|  hello  |
+---------+
| 'world' |
+---------+
~> seq 100 | octosql "SELECT SUM(int(text)) FROM stdin.lines"
+------+
| sum  |
+------+
| 5050 |
+------+

Plugins

To use databases which are not included in the core of OctoSQL - like PostgreSQL or MySQL - you need to install a plugin. Installing plugins is very easy. The following command installs the latest version of the PostgreSQL plugin:

octosql plugin install postgres

Plugins are grouped into repositories, and potentially have many versions available. The above uses the default core repository and tries to install the latest version. So if 0.42.0 was the latest version, the above would be equivalent to:

octosql plugin install core/postgres@0.42.0

Browsing available and installed plugins is possible through OctoSQL itself, behind a SQL interface. The available tables are: plugins.repositories, plugins.available_plugins, plugins.available_versions, plugins.installed_plugins, plugins.installed_versions.

~> octosql "SELECT name, description FROM plugins.available_plugins LIMIT 2"
+------------------------+-------------------------------+
| available_plugins.name | available_plugins.description |
+------------------------+-------------------------------+
| 'postgres'             | 'Adds support for             |
|                        | querying PostgreSQL           |
|                        | databases.'                   |
| 'random_data'          | 'Generates random data        |
|                        | for testing.'                 |
+------------------------+-------------------------------+
~> octosql "SELECT plugin_name, version FROM plugins.available_versions WHERE plugin_name='random_data'"
+--------------------------------+----------------------------+
| available_versions.plugin_name | available_versions.version |
+--------------------------------+----------------------------+
| 'random_data'                  | '0.1.0'                    |
| 'random_data'                  | '0.1.1'                    |
| 'random_data'                  | '0.2.0'                    |
+--------------------------------+----------------------------+

Some plugins, like the random_data plugin, can be used without any additional configuration:

~> octosql plugin install random_data
Downloading core/random_data@0.2.0...
~> octosql "SELECT * FROM random_data.users" --describe
+---------------------------------+--------------------------+------------+
|              name               |           type           | time_field |
+---------------------------------+--------------------------+------------+
| 'users.avatar'                  | 'String'                 | false      |
| 'users.credit_card'             | '{cc_number: String}'    | false      |
| 'users.date_of_birth'           | 'String'                 | false      |
| 'users.email'                   | 'String'                 | false      |
| 'users.first_name'              | 'String'                 | false      |
| 'users.last_name'               | 'String'                 | false      |
| ...                             | ...                      | ...        |
+---------------------------------+--------------------------+------------+
~> octosql "SELECT first_name, last_name, date_of_birth FROM random_data.users LIMIT 3"
+------------------+-----------------+---------------------+
| users.first_name | users.last_name | users.date_of_birth |
+------------------+-----------------+---------------------+
| 'Alethea'        | 'Kuvalis'       | '1997-01-07'        |
| 'Ambrose'        | 'Spencer'       | '1979-04-18'        |
| 'Antione'        | 'Hodkiewicz'    | '1980-03-04'        |
+------------------+-----------------+---------------------+

Others, like the postgres plugin, require additional configuration. The configuration file is located at ~/.octosql/octosql.yml. You can find the available configuration settings for a plugin in its own documentation.

~> octosql plugin install postgres
Downloading core/postgres@0.1.0...
echo "databases:
  - name: mydb
    type: postgres
    config:
      host: localhost
      port: 5432
      database: postgres
      user: postgres
      password: mypassword" > ~/.octosql/octosql.yml
~> octosql "SELECT * FROM mydb.customers" --describe
+--------------------------+-----------------+------------+
|           name           |      type       | time_field |
+--------------------------+-----------------+------------+
| 'customers.email'        | 'String'        | false      |
| 'customers.first_name'   | 'String'        | false      |
| 'customers.id'           | 'Int'           | false      |
| 'customers.last_name'    | 'String'        | false      |
| 'customers.phone_number' | 'String | NULL' | false      |
+--------------------------+-----------------+------------+
~> octosql "SELECT COUNT(*) FROM mydb.customers"
+-------+
| count |
+-------+
|   183 |
+-------+

In order to create your own plugins, see examples of existing plugins: - MySQL - PostgreSQL - Random Data

To test plugins while developing locally, put the plugin binary into ~/.octosql/plugins/core/octosql-plugin-<plugin name>/0.1.0/octosql-plugin-<plugin name>. That's the location where OctoSQL will be looking for it.

Troubleshooting

OctoSQL writes logs to ~/.octosql/logs.txt, which is the place to look for any errors or issues during execution. Only logs of the most recent execution are kept.

Advanced Features

The Type System

OctoSQL is statically typed. That means that queries are verified, typechecked, and optimized based on the schemas of the tables and types of any values used in the query.

Most of the type system is straight-forward and intuitive, similar to what you'd find in other SQL dialects, even though the types have names which are closer to common programming languages, not SQL databases - in OctoSQL you'll find String's, not varchar's.

However, OctoSQL also supports union types, which means that a value might be one of multiple types. For example, you might have a dataset where a column is usually a Float, but occasionally also a String with the Float inside. Thus, the type of the column would be Float | String.

Moreover, NULL is its own type, which means that a nullable Int column would be represented as Int | NULL in OctoSQL.

There's a few helper features to handle this union types in OctoSQL.

First, whenever a type, i.e. String | Int, is used in a place where a subtype, i.e. Int, is expected, OctoSQL will add a dynamic runtime check which will fail execution only if a String value actually ever reaches that place.

Second, you can use type assertions to get a value only if it's of a certain type and otherwise evaluate to NULL. The syntax for that is value::type. So for example we might have a column age of type String | Int and would like to get its value only if it's an Int. We can write age::Int to express that.

Third, there's a bunch of conversion functions which can help you turn types into other types. For example, the int function is able to turn values of many types, including strings, to integers. You could use it like this: int(age_string).

Fourth, and final, there's the COALESCE operator which accepts an arbitrary number of arguments and returns the first non-null one. It works very well with what's described in the previous two paragraphs. This way, if you have an age column of type String | Int and would like to clean it up, you can write COALESCE(age::int, int(age::string), 0). This would return the value of age as-is if it's an Int, try to parse it if it's a String, and just evaluate to 0 if that fails.

Additionally, you can work with objects and lists using the following syntax: - List access: list[index] - Object field access: object->field

Explaining Query Plans

You can use the --explain flag to get a visual explanation of the query plan. Setting it to 1 gives you a query plan but without type and schema information, setting it to 2 includes those too. For the visualization to work you need to have the graphviz dot command installed.

For example,

Extension points exported contracts — how you extend this code

SQLNode (Interface)
SQLNode defines the interface for all nodes generated by the parser. [90 implementers]
parser/sqlparser/ast.go
TableValuedFunctionArgumentValue (Interface)
(no doc) [6 implementers]
logical/tvf.go
DatasourceImplementation (Interface)
(no doc) [17 implementers]
physical/physical.go
Aggregate (Interface)
(no doc) [11 implementers]
execution/nodes/custom_trigger_group_by.go
FieldNamer (Interface)
FieldNamer can be implemented by expressions with pretty default names based on their content. [1 implementers]
logical/logical.go
Manager (Interface)
(no doc) [1 implementers]
plugins/executor/executor.go
Trigger (Interface)
(no doc)
logical/group_by.go
SchemaOption (FuncType)
(no doc)
physical/nodes.go

Core symbols most depended-on inside this repo

Myprintf
called by 194
parser/sqlparser/tracked_buffer.go
Walk
called by 100
parser/sqlparser/ast.go
Error
called by 62
parser/sqlparser/sql.go
String
called by 62
parser/sqlparser/ast.go
NewColIdent
called by 56
parser/sqlparser/ast.go
NewString
called by 53
octosql/values.go
WriteString
called by 52
parser/sqlparser/dependency/bytes2/buffer.go
Materialize
called by 48
physical/physical.go

Shape

Method 1,040
Function 421
Struct 386
Interface 42
TypeAlias 40
FuncType 7

Languages

Go100%

Modules by API surface

parser/sqlparser/ast.go539 symbols
plugins/internal/plugins/plugins.pb.go186 symbols
parser/sqlparser/dependency/query/query.pb.go139 symbols
plugins/internal/plugins/plugins_grpc.pb.go53 symbols
logical/logical.go51 symbols
execution/expressions.go45 symbols
parser/sqlparser/dependency/sqltypes/value.go36 symbols
physical/nodes.go32 symbols
execution/triggers.go32 symbols
parser/sqlparser/token.go29 symbols
logical/tvf.go24 symbols
parser/sqlparser/sql.go22 symbols

Dependencies from manifests, versioned

github.com/Masterminds/semverv1.5.0 · 1×
github.com/adrg/xdgv0.4.0 · 1×
github.com/andybalholm/brotliv1.0.3 · 1×
github.com/awalterschulze/gographvizv2.0.3+incompatible · 1×
github.com/cespare/xxhashv1.1.0 · 1×
github.com/cockroachdb/apdv1.1.0 · 1×
github.com/davecgh/go-spewv1.1.1 · 1×
github.com/dsnet/compressv0.0.1 · 1×
github.com/frankban/quicktestv1.14.0 · 1×
github.com/fsnotify/fsnotifyv1.4.9 · 1×

For agents

$ claude mcp add octosql \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact