1. Introduction
If you store documents in Elasticsearch and your IDs carry meaning, for example because they come from an external system, it is worth considering retrieving documents by those IDs. You probably know the get request, which fetches one exact document from an index. There is also a multi-get version (mget) that takes away the need to write a wrapper around several get calls, because Elasticsearch does it for you.
Let me show you how.
2. Start Elasticsearch cluster
docker network create elkai
docker run --rm \
--name elk \
--net elkai \
-e ES_JAVA_OPTS="-Xms4g -Xmx4g" \
-d \
-p 9200:9200 \
docker.elastic.co/elasticsearch/elasticsearch:8.15.2
2.1. Set password for the elastic user
docker exec -it elk bash -c "(mkfifo pipe1); ( (elasticsearch-reset-password -u elastic -i < pipe1) & ( echo $'y\n123456\n123456' > pipe1) );sleep 5;rm pipe1"
3. Load test data
The bulk request below loads dummy data of US citizens, using each person's personal number as the document _id.
curl -k -X POST -u elastic:123456 -H "Content-Type: application/x-ndjson" "https://localhost:9200/_bulk" -d'
{"index":{"_index":"people","_id":"999-99-9999"}}
{"name":"John","surname":"Doe","address":"123 Main St"}
{"index":{"_index":"people","_id":"999-99-9998"}}
{"name":"Jane","surname":"Smith","address":"456 Oak Ave"}
{"index":{"_index":"people","_id":"999-99-9997"}}
{"name":"Peter","surname":"Jones","address":"789 Pine Ln"}
{"index":{"_index":"people","_id":"999-99-9996"}}
{"name":"Mary","surname":"Brown","address":"1001 Elm St"}
{"index":{"_index":"people","_id":"999-99-9995"}}
{"name":"Robert","surname":"Davis","address":"1234 Maple Dr"}
{"index":{"_index":"people","_id":"999-99-9994"}}
{"name":"Linda","surname":"Wilson","address":"5678 Birch Rd"}
{"index":{"_index":"people","_id":"999-99-9993"}}
{"name":"Michael","surname":"Taylor","address":"9012 Cedar Ave"}
{"index":{"_index":"people","_id":"999-99-9992"}}
{"name":"Barbara","surname":"Anderson","address":"1357 Willow St"}
{"index":{"_index":"people","_id":"999-99-9991"}}
{"name":"David","surname":"Thomas","address":"2468 Redwood Ln"}
{"index":{"_index":"people","_id":"999-99-9990"}}
{"name":"Susan","surname":"Jackson","address":"1479 Oakwood Dr"}
'
4. Retrieve multiple documents
Imagine you want to check the data of five people and you know their personal numbers. Here you can use the mget API call: provide a list of document _id values and get the whole documents in the response.
curl -X GET -u elastic:123456 -H "Content-Type: application/json" "https://localhost:9200/people/_mget?pretty" -k -d'
{
"docs": [
{ "_id": "999-99-9999" },
{ "_id": "999-99-9998" },
{ "_id": "999-99-9997" },
{ "_id": "999-99-9996" },
{ "_id": "999-99-9995" }
]
}
'
Response:
{
"docs" : [
{
"_index" : "people",
"_id" : "999-99-9999",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "John",
"surname" : "Doe",
"address" : "123 Main St"
}
},
{
"_index" : "people",
"_id" : "999-99-9998",
"_version" : 1,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Jane",
"surname" : "Smith",
"address" : "456 Oak Ave"
}
},
{
"_index" : "people",
"_id" : "999-99-9997",
"_version" : 1,
"_seq_no" : 2,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Peter",
"surname" : "Jones",
"address" : "789 Pine Ln"
}
},
{
"_index" : "people",
"_id" : "999-99-9996",
"_version" : 1,
"_seq_no" : 3,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Mary",
"surname" : "Brown",
"address" : "1001 Elm St"
}
},
{
"_index" : "people",
"_id" : "999-99-9995",
"_version" : 1,
"_seq_no" : 4,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Robert",
"surname" : "Davis",
"address" : "1234 Maple Dr"
}
}
]
}
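When every document lives in the index named in the URL, the mget body can also be written with the shorter ids array instead of docs. The following is a minimal sketch for a POSIX shell that builds such a body from a list of IDs. The helper name build_ids_body is hypothetical, and the sketch assumes the IDs contain no characters that would need JSON escaping.

```shell
# Build an mget request body of the form {"ids":["a","b"]} from the arguments.
# Hypothetical helper; assumes IDs contain no quotes or backslashes.
# The body runs in a subshell ( ... ) so its variables do not leak.
build_ids_body() (
  body='{"ids":['
  sep=''
  for id in "$@"; do
    body="${body}${sep}\"${id}\""
    sep=','
  done
  printf '%s\n' "${body}]}"
)

build_ids_body 999-99-9999 999-99-9998 999-99-9997
```

The printed body can replace the literal one in the request from this section, for example by passing -d "$(build_ids_body 999-99-9999 999-99-9998)" to curl.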
4.1. Missing documents during retrieval
You may get fewer documents back than you asked for. The response still contains an entry for every requested ID, but a missing document is marked with found equal to false. If you see that behavior, check which routing key was used during data ingestion. By default, mget uses the _id as the routing key, so if someone ingested the data with a custom routing key, your request makes Elasticsearch look for the documents in the wrong shards.
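To make this concrete, the sketch below lists the IDs that came back with found set to false and builds a retry body that passes a routing value with each entry in docs, which mget accepts. It assumes the response was requested with ?pretty (one field per line, as in the output above) and that IDs need no JSON escaping. The helper names missing_ids and build_routed_body, the ID 999-99-0000, and the routing value region-1 are all hypothetical.

```shell
# Print the _id of every entry marked "found" : false in a pretty-printed
# mget response read from stdin. Relies on _id appearing before found.
missing_ids() {
  awk -F'"' '
    /"_id" *:/          { id = $4 }
    /"found" *: *false/ { print id }
  '
}

# Build {"docs":[{"_id":"...","routing":"..."},...]} for one routing value.
# Usage: build_routed_body ROUTING ID...
build_routed_body() (
  routing=$1
  shift
  body='{"docs":['
  sep=''
  for id in "$@"; do
    body="${body}${sep}{\"_id\":\"${id}\",\"routing\":\"${routing}\"}"
    sep=','
  done
  printf '%s\n' "${body}]}"
)

# Sample response fragment with one hit and one miss (made-up data).
response='{
  "docs" : [
    {
      "_index" : "people",
      "_id" : "999-99-9999",
      "found" : true
    },
    {
      "_index" : "people",
      "_id" : "999-99-0000",
      "found" : false
    }
  ]
}'

missing=$(printf '%s\n' "$response" | missing_ids)
echo "missing: $missing"
# $missing is left unquoted on purpose so each ID becomes its own argument.
build_routed_body region-1 $missing
```

A routing value that applies to the whole request can alternatively be given once as the routing query parameter in the URL.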
Routing is specific to distributed systems, where buckets (here called shards) are spread across the nodes of a cluster and you have to decide which one a document is assigned to. Elasticsearch makes that decision by hashing the routing value (the _id by default) and, in simplified form, taking the result modulo the number of primary shards. Hashing spreads documents roughly evenly across the shards.
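The effect of the routing value can be illustrated with a toy calculation. The sketch below is not Elasticsearch's real algorithm: it substitutes the POSIX cksum CRC for the actual hash function, and the function name shard_for, the routing value region-1, and the shard count of 5 are made up for illustration. It only shows that the shard is a deterministic function of the routing value, so a lookup with a different value than the one used at ingestion can land on a different shard.

```shell
# Toy shard selection: checksum(routing value) modulo number of shards.
# cksum is a stand-in for the real hash; results will not match Elasticsearch.
# Usage: shard_for ROUTING_VALUE NUMBER_OF_SHARDS
shard_for() (
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
)

echo "routed by _id:        shard $(shard_for 999-99-9999 5)"
echo "routed by custom key: shard $(shard_for region-1 5)"
```

The same routing value always yields the same shard number, which is why the value used for retrieval has to match the one used for indexing.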
In a future tutorial I will simulate such a case with ready-to-execute commands that you can follow.
5. Summary
In this tutorial you have learned how to retrieve multiple documents by their IDs using the mget API. You were also warned about a scenario in which the result does not contain all requested documents.
Happy coding!