
Everything You Should Know About Field Storage and Retrieval in Elasticsearch


1. Introduction

Elasticsearch is a document database. Documents can contain plain or nested fields. Each field has a data type and a value.

Additionally, you can control how fields are stored. In this knowledge article I want to give you a summary of the field storage options and how stored values can be accessed.

2. Field storage

In this section I want to show you different settings for field storage.

2.1. Full source

On disk storage.

When a document is indexed in Elasticsearch, its original JSON body is stored in full in the `_source` field by default. Think of `_source` as the master copy that preserves all of the original information.

2.2. Particular fields

On disk storage.

You can also choose to store specific fields separately by setting "store": true on the field in the mapping. These stored fields can be retrieved directly without loading the entire `_source`, much like selecting individual columns from a database table. Here’s how you retrieve them:

				
GET my-index-000001/_search
{
  "stored_fields": [ "eagle_wingspan", "eagle_nesting_location" ]
}
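
The request above only returns values for fields that were mapped as stored. A minimal sketch of such a mapping follows. The field types (`float`, `keyword`) are assumptions, since the article does not state them, and the request assumes the index does not exist yet.

```console
# Hypothetical mapping: the field types are assumed for illustration only.
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "eagle_wingspan":         { "type": "float",   "store": true },
      "eagle_nesting_location": { "type": "keyword", "store": true }
    }
  }
}
```

Because "store" defaults to false, fields without this setting are only available through `_source` or the other mechanisms described below.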
				
			

2.3. doc_values

On disk storage.

Most field types (excluding ‘text’ and ‘annotated_text’) support ‘doc_values’. This feature stores field values on disk in a columnar format, which is highly efficient for sorting, aggregations, and scripting. For some field types, doc values also make it possible to search a field that is not indexed, although such queries are slower.

If you don’t require sorting, aggregations, or scripting on a specific field, you can disable doc values for it by setting "doc_values": false in the mapping to save disk space.
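
As a rough illustration, doc values can be switched off per field in the mapping. The index name and the `session_id` field below are hypothetical and not taken from the article.

```console
# Hypothetical index and field: session_id is only searched, never sorted or aggregated.
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "session_id": {
        "type": "keyword",
        "doc_values": false
      }
    }
  }
}
```

The field is still indexed, so it remains searchable; it just cannot be used for sorting, aggregations, or `doc[...]` access in scripts.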

2.4. fielddata

In-memory storage.

‘fielddata’ allows access to the individual tokens of analyzed `text` fields for aggregations, sorting, or scripting. For instance, a `text` field containing the value “San Francisco” can be aggregated on the tokens “san” and “francisco” if fielddata is enabled for that field.

				
PUT my-index-000001/_mapping
{
  "properties": {
    "my_field": { 
      "type":     "text",
      "fielddata": true
    }
  }
}
				
			

This setting is disabled by default because it requires uninverting the inverted index and loading the result into the fielddata cache in heap memory, which is expensive. `text` fields do not support the disk-based doc_values described earlier, so fielddata is the only way to aggregate on their analyzed tokens.

2.5. Runtime fields

No storage.

Runtime fields are neither indexed nor stored; their values are calculated at query time.

3. Field retrieval

In this section I want to describe different ways to get document fields from Elasticsearch.

3.1. Whole _source document

There are two main ways to retrieve the entire ‘_source’:

  • _source filtering: This retrieves the entire ‘_source’ document but only returns the specified fields to the client. While the whole document is processed internally, this approach reduces the amount of data transferred over the network.
				
GET /_search
{
  "_source": [ "user.first_name", "user.last_name" ],
  "query": {
    "match": {
      "user.id": "12345"
    }
  }
}
				
			
  • `fields` parameter: This also loads the entire ‘_source’, but it consults the mapping, so it supports formatting, multi-fields and aliases, runtime fields, and fields from related indices via lookups. The response always returns an array of values for each requested field.
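
A minimal sketch of the `fields` parameter, reusing the `user.*` fields from the request above. The `created_at` date field is a hypothetical addition, included only to show per-field formatting.

```console
# created_at is a hypothetical date field, used to illustrate the "format" option.
GET /_search
{
  "query": {
    "match": {
      "user.id": "12345"
    }
  },
  "fields": [
    "user.first_name",
    "user.last_name",
    { "field": "created_at", "format": "epoch_millis" }
  ],
  "_source": false
}
```

Each hit then carries a `fields` object in which every requested field maps to an array, even when it holds a single value.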

3.2. Retrieving Specific Fields

There are several ways to retrieve specific fields without fetching the entire ‘_source’:

3.2.1. docvalue_fields

For fields with ‘doc_values’ enabled (except text type), use ‘docvalue_fields’ for efficient retrieval.

				
GET /products/_search
{
  "query": {
    "match_phrase": {
      "product_id": {
        "query": "XYZ-123",
        "slop": 0
      }
    }
  },
  "docvalue_fields": [
    "product_id", "product_price"
  ],
  "_source": false
}
				
			

This returns an array of values for each specified field.

3.2.2. stored_fields

Retrieve fields explicitly marked with "store": true in the mapping. Requested fields that are not stored are ignored.

				
GET /products/_search
{
  "query": {
    "match_phrase": {
      "product_id": {
        "query": "ABC-789",
        "slop": 0
      }
    }
  },
  "stored_fields": [
    "product_name", "product_description"
  ],
  "_source": false
}
				
			

3.2.3. Runtime

Runtime fields can be defined and retrieved within the search request itself.

				
{
  "runtime_mappings": {
    "day_of_week": {
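
The snippet above is cut off in the original. One possible complete request is sketched below. It assumes an index named `my-index-000001` with a `@timestamp` date field, neither of which is confirmed by the article, and the script is illustrative rather than the author's.

```console
# Assumes a date field named @timestamp; index name and script are illustrative.
GET my-index-000001/_search
{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ENGLISH))"
      }
    }
  },
  "fields": [ "day_of_week" ],
  "_source": false
}
```

The runtime field exists only for the duration of this request and is returned through the `fields` parameter.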
				
			

You can also define them in the index mapping, in a section called `runtime`:

				
{
  "mappings": {
    "runtime": {
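
This snippet is also truncated in the original. A hedged sketch of a full index creation request with a `runtime` section might look like the following, again assuming a hypothetical index name and a `@timestamp` date field.

```console
# Assumes a new index and a date field named @timestamp; both are illustrative.
PUT my-index-000001
{
  "mappings": {
    "runtime": {
      "day_of_week": {
        "type": "keyword",
        "script": {
          "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ENGLISH))"
        }
      }
    },
    "properties": {
      "@timestamp": { "type": "date" }
    }
  }
}
```

Defined this way, `day_of_week` is available to every search against the index without repeating the definition in each request.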
				
			

4. Summary

In this knowledge article you have reviewed multiple ways of storing document fields and studied different approaches to retrieving them. This summary should help you better understand how Elasticsearch works.
