Tools for moving and saving indices from Elasticsearch and OpenSearch

- Version 1.0.0 of Elasticdump changes the format of the files created by the dump. Files created with version 0.x.x of this tool are unlikely to work with later versions. To learn more about the breaking changes, visit the release notes for version 1.0.0. If you receive an "out of memory" error, this is most likely the cause.
- Version 2.0.0 of Elasticdump removes the bulk options. These options were buggy and differed between versions of Elasticsearch. If you need to export multiple indexes, see the multielasticdump section of the tool.
- Version 2.1.0 of Elasticdump moves from using scan/scroll (ES 1.x) to just scroll (ES 2.x). This is a backwards-compatible change within Elasticsearch, but performance may suffer on Elasticsearch versions prior to 2.x.
- Version 3.0.0 of Elasticdump has the default queries updated to only work for Elasticsearch version 5+. The tool may be compatible with earlier versions of Elasticsearch, but our version detection method may not work for all ES cluster topologies.
- Version 5.0.0 of Elasticdump contains a breaking change for the s3 transport. The s3Bucket and s3RecordKey params are no longer supported; please use s3urls instead.
- Versions 6.1.0 and higher of Elasticdump contain a change to the upload/dump process that allows overlapping promise processing. The benefit is improved performance due to increased parallel processing. The side effect is that records (the data set) are no longer processed sequentially, so ordering is no longer guaranteed.
- Versions 6.67.0 and higher of Elasticdump will quit if the Node.js version does not meet the minimum requirement (v10.0.0).
- Versions 6.76.0 and higher of Elasticdump add support for OpenSearch (forked from Elasticsearch 7.10.2).
(local)
npm install elasticdump
./bin/elasticdump
(global)
npm install elasticdump -g
elasticdump
Elasticdump works by sending an input to an output. Each can be an Elasticsearch/OpenSearch URL, a file, or standard I/O.
Elasticsearch/OpenSearch:
- format: {protocol}://{host}:{port}/{index}
- example: http://127.0.0.1:9200/my_index
File:
- format: {FilePath}
- example: /Users/evantahler/Desktop/dump.json
Stdio:
- format: stdin / stdout
- example: $
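To make the location formats above concrete, here is a minimal Python sketch that sorts an --input or --output value into one of the listed kinds. The function names classify_location and index_name are hypothetical, and the rules are a simplification based only on the formats listed here; this is not elasticdump's own parsing logic.

```python
from urllib.parse import urlsplit


def classify_location(location: str) -> str:
    """Guess which kind of elasticdump location a string describes.

    Illustrative only: this mirrors the formats listed above and is not
    how elasticdump itself decides.
    """
    if location == "$":
        return "stdio"  # "$" stands for stdin/stdout
    scheme = urlsplit(location).scheme
    if scheme in ("http", "https"):
        return "elasticsearch"  # {protocol}://{host}:{port}/{index}
    return "file"  # anything else is treated as a file path in this sketch


def index_name(location: str) -> str:
    """Return the {index} part of an Elasticsearch/OpenSearch URL."""
    return urlsplit(location).path.strip("/")


if __name__ == "__main__":
    for loc in ("http://127.0.0.1:9200/my_index",
                "/Users/evantahler/Desktop/dump.json",
                "$"):
        print(loc, "->", classify_location(loc))
```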
You can then do things like:
# Copy an index from production to staging with analyzer and mapping:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=analyzer
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data
# Backup index data to a file:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index_mapping.json \
--type=mapping
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index.json \
--type=data
# Backup an index to a gzip using stdout:
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=$ \
| gzip > /data/my_index.json.gz
# Backup the results of a query to a file
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=query.json \
--searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}"
# Specify searchBody from a file
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=query.json \
--searchBody=@/data/searchbody.json
# Copy a single shard data:
elasticdump \
--input=http://es.com:9200/api \
--output=http://es.com:9200/api2 \
--input-params="{\"preference\":\"_shards:0\"}"
# Backup aliases to a file
elasticdump \
--input=http://es.com:9200/index-name/alias-filter \
--output=alias.json \
--type=alias
# Import aliases into ES
elasticdump \
--input=./alias.json \
--output=http://es.com:9200 \
--type=alias
# Backup templates to a file
elasticdump \
--input=http://es.com:9200/template-filter \
--output=templates.json \
--type=template
# Import templates into ES
elasticdump \
--input=./templates.json \
--output=http://es.com:9200 \
--type=template
# Split files into multiple parts
elasticdump \
--input=http://production.es.com:9200/my_index \
--output=/data/my_index.json \
--fileSize=10mb
# Import data from S3 into ES (using s3urls)
elasticdump \
--s3AccessKeyId "${access_key_id}" \
--s3SecretAccessKey "${access_key_secret}" \
--input "s3://${bucket_name}/${file_name}.json" \
--output=http://production.es.com:9200/my_index
# Export ES data to S3 (using s3urls)
elasticdump \
--s3AccessKeyId "${access_key_id}" \
--s3SecretAccessKey "${access_key_secret}" \
--input=http://production.es.com:9200/my_index \
--output "s3://${bucket_name}/${file_name}.json"
# Import data from MINIO (s3 compatible) into ES (using s3urls)
elasticdump \
--s3AccessKeyId "${access_key_id}" \
--s3SecretAccessKey "${access_key_secret}" \
--input "s3://${bucket_name}/${file_name}.json" \
--output=http://production.es.com:9200/my_index \
--s3ForcePathStyle true \
--s3Endpoint https://production.minio.co
# Export ES data to MINIO (s3 compatible) (using s3urls)
elasticdump \
--s3AccessKeyId "${access_key_id}" \
--s3SecretAccessKey "${access_key_secret}" \
--input=http://production.es.com:9200/my_index \
--output "s3://${bucket_name}/${file_name}.json" \
--s3ForcePathStyle true \
--s3Endpoint https://production.minio.co
# Import data from CSV file into ES (using csvurls)
# The csv:// prefix must be included to allow parsing of csv files,
# e.g. --input "csv://${file_path}.csv"
# --csvSkipRows skips parsed rows (this does not include the headers row)
# The default csvDelimiter is ','
elasticdump \
--input "csv:///data/cars.csv" \
--output=http://production.es.com:9200/my_index \
--csvSkipRows 1 \
--csvDelimiter ";"
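Hand-escaping JSON for --searchBody, as in the query example above, is easy to get wrong. One option is to generate the argument, or the file used with the @ prefix, from a script. The following is a minimal Python sketch: the helper names are hypothetical, and only the --searchBody flag and the @ file prefix are taken from the examples above.

```python
import json
import shlex
from pathlib import Path


def search_body_arg(query: dict) -> str:
    """Return a shell-quoted --searchBody=... argument for the given query."""
    body = json.dumps(query, separators=(",", ":"))
    # shlex.quote wraps the whole argument in single quotes, so the
    # double quotes inside the JSON need no backslash escaping.
    return shlex.quote(f"--searchBody={body}")


def write_search_body(path: str, query: dict) -> str:
    """Write the query to a file and return the @-prefixed argument."""
    Path(path).write_text(json.dumps(query, indent=2), encoding="utf-8")
    return f"--searchBody=@{path}"


if __name__ == "__main__":
    query = {"query": {"term": {"username": "admin"}}}
    print(search_body_arg(query))
```

The printed string can be pasted into a shell command in place of the hand-escaped version.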
If Elasticsearch/OpenSearch is not being served from the root directory, the --input-index and
--output-index options are required. If they are not provided, the additional sub-directories will
be parsed for index and type.
Elasticsearch/OpenSearch:
- format: {protocol}://{host}:{port}/{sub}/{directory...}
- example: http://127.0.0.1:9200/api/search
# Copy a single index from an Elasticsearch cluster:
elasticdump \
--input=http://es.com:9200/api/search \
--input-index=my_index \
--output=http://es.com:9200/api/search \
--output-index=my_index \
--type=mapping
# Copy a single type:
elasticdump \
--input=http://es.com:9200/api/search \
--input-index=my_index/my_type \
--output=http://es.com:9200/api/search \
--output-index=my_index \
--type=mapping
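When many indices live behind the same sub-directory URL, it can help to build the argument list in a script instead of repeating it by hand. The sketch below is illustrative only: copy_args and index_arg are hypothetical helper names, and the flags are the ones used in the examples above. It only prints the command and does not run it.

```python
import shlex
from typing import List, Optional


def index_arg(index: str, doc_type: Optional[str] = None) -> str:
    """Format an index (and optional type) as index or index/type."""
    return f"{index}/{doc_type}" if doc_type else index


def copy_args(src_url: str, dst_url: str, index: str,
              doc_type: Optional[str] = None,
              dump_type: str = "mapping") -> List[str]:
    """Build an elasticdump argument list for clusters behind a sub-path."""
    return [
        "elasticdump",
        f"--input={src_url}",
        f"--input-index={index_arg(index, doc_type)}",
        f"--output={dst_url}",
        f"--output-index={index}",
        f"--type={dump_type}",
    ]


if __name__ == "__main__":
    args = copy_args("http://es.com:9200/api/search",
                     "http://es.com:9200/api/search",
                     "my_index", doc_type="my_type")
    print(shlex.join(args))
```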
If you prefer to run elasticdump with Docker, you can pull the image from Docker Hub:
docker pull elasticdump/elasticsearch-dump
Then you can use it by:
- running docker run --rm -ti elasticdump/elasticsearch-dump
- mounting your file storage directory into the container with -v <your dumps dir>:<your mount point>
Example:
# Copy an index from production to staging with mappings:
docker run --rm -ti elasticdump/elasticsearch-dump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=mapping
docker run --rm -ti elasticdump/elasticsearch-dump \
--input=http://production.es.com:9200/my_index \
--output=http://staging.es.com:9200/my_index \
--type=data
# Backup index data to a file:
docker run --rm -ti -v /data:/tmp elasticdump/elasticsearch-dump \
--input=http://production.es.com:9200/my_index \
--output=/tmp/my_index.json \
--type=data
If you need to run using localhost as your ES host:
docker run --net=host --rm -ti elasticdump/elasticsearch-dump \
--input=http://staging.es.com:9200/my_index \
--output=http://localhost:9200/my_index \
--type=data
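A detail that is easy to miss in the Docker examples above is that --output must point at the path inside the container (the mount point), not at the host directory. The following Python sketch makes that mapping explicit. The function name docker_dump_command is hypothetical; the image name and flags are copied from the examples above, and the script only prints the command.

```python
import shlex
from typing import List

IMAGE = "elasticdump/elasticsearch-dump"


def docker_dump_command(host_dir: str, mount_point: str, index_url: str,
                        file_name: str, dump_type: str = "data") -> List[str]:
    """Build a docker run command that dumps an index to a mounted directory."""
    return [
        "docker", "run", "--rm", "-ti",
        "-v", f"{host_dir}:{mount_point}",       # host dir : container dir
        IMAGE,
        f"--input={index_url}",
        f"--output={mount_point}/{file_name}",   # container-side path
        f"--type={dump_type}",
    ]


if __name__ == "__main__":
    cmd = docker_dump_command("/data", "/tmp",
                              "http://production.es.com:9200/my_index",
                              "my_index.json")
    print(shlex.join(cmd))
```

With the values shown, the dump written to /tmp/my_index.json inside the container appears as /data/my_index.json on the host.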
This tool generates line-delimited JSON files. The dump file itself is not valid JSON, but each line is. We do this so that dump files can be streamed and appended without worrying about whole-file parser integrity.
For example, if you wanted to parse every line, you could do:
while IFS= read -r LINE; do jsonlint-py "${LINE}" ; done < dump.data.json
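The same line-by-line approach can be sketched in Python using only the standard library. This is a minimal illustration, not part of elasticdump: iter_dump is a hypothetical helper, and the sample records in the demo are made up to show the shape of the loop rather than the exact fields a real dump contains.

```python
import json
from typing import Iterable, Iterator


def iter_dump(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON document per non-empty line of a dump."""
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {number} is not valid JSON: {err}") from err


if __name__ == "__main__":
    import io
    # Made-up sample lines; a real dump would be opened with open(path).
    sample = io.StringIO('{"_id": "1", "_source": {"username": "admin"}}\n'
                         '{"_id": "2", "_source": {"username": "guest"}}\n')
    for doc in iter_dump(sample):
        print(doc["_id"], doc["_source"]["username"])
```

Because the function consumes lines lazily, it works on large dump files without loading them into memory.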
```
elasticdump: Import and export tools for elasticsearch
version: %%version%%
Usage: elasticdump --input SOURCE --output DESTINATION [OPTIONS]
--input Source location (required)
--input-index Source index and type (default: all, example: index/type)
--output Destination location (required)
--output-index Destination index and type (default: all, example: index/type)
--big-int-fields Specifies a comma-separated list of fields that should be checked for big-int support (default '')
--bulkAction Sets the operation type to be used when preparing the request body to be sent to Elasticsearch. For more info - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html (default: index, options: [index, update, delete, create])
--ca, --input-ca, --output-ca CA certificate. Use --ca if source and destination are identical. Otherwise, use the one prefixed with --input or --output as needed.
--cert, --input-cert, --output-cert Client certificate file. Use --cert if source and destination are identical. Otherwise, use the one prefixed with --input or --output as needed.
--csvConfigs Set all fast-csv configurations. An escaped JSON string or file can be supplied. File location must be prefixed with the @ symbol (default: null)
--csvCustomHeaders A comma-separated list of values that will be used as headers for your data. This param must be used in conjunction with csvRenameHeaders (default : null)
--csvDelimiter The delimiter that will separate columns. (default : ',')
--csvFirstRowAsHeaders If set to true the first row will be treated as the headers. (default : true)
--csvHandleNestedData Set to true to handle nested JSON/CSV data. NB : This is a very opinionated implementation ! (default : false)
--csvIdColumn Name of the column to extract the record identifier (id) from. When exporting to CSV this column can be used to override the default id (@id) column name (default : null)
--csvIgnoreAutoColumns Set to true to prevent the following columns @id, @index, @type from being written to the output file (default : false)
--csvIgnoreEmpty Set to true to ignore empty rows. (default : false)
--csvIncludeEndRowDelimiter Set to true to include a row delimiter at the end of the csv (default : false)
--csvIndexColumn Name of the column to extract the record index from. When exporting to CSV this column can be used to override the default index (@index) column name (default : null)
--csvLTrim Set to true to left trim all columns. (default : false)
--csvMaxRows If number is > 0 then only the specified number of rows will be parsed (e.g. 100 would return the first 100 rows of data) (default : 0)
--csvRTrim Set to true to right trim all columns. (default : false)
--csvRenameHeaders If you want the first line of the file to be removed and replaced by the one provided in the csvCustomHeaders option (default : true)
--csvSkipLines If number is > 0 the specified number of lines will be skipped. (default : 0)
--csvSkipRows If number is > 0 then the specified number of parsed rows will be skipped NB: (If the first row is treated as headers, they aren't a part of the count) (default : 0)
--csvTrim Set to true to trim all white space from columns. (default : false)
--csvTypeColumn Name of t