1. Introduction
When one of your input fields contains a JSON document as a string, you can parse it and store it in structured form. This is done with the help of ingest processors, which are accessible from Painless.
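To make the idea concrete before touching Elasticsearch, here is a minimal local sketch in Python. It is only an analogy: `json.loads` stands in for the parsing step, and the tiny `doc` dictionary is a made-up example, not the pipeline itself.

```python
import json

# A document as it might arrive: "details" holds JSON serialized as a string.
doc = {
    "animal": "Penguin",
    "details": "{\"classification\": {\"class\": \"Aves\"}}",
}

# Local stand-in for the ingest step: replace the string with the parsed object.
doc["details"] = json.loads(doc["details"])

# After parsing, nested keys can be addressed directly.
print(type(doc["details"]).__name__)
print(doc["details"]["classification"]["class"])
```

The same before/after shape is what the pipeline in the following sections produces for the `details` field.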
2. Start Elasticsearch
Start your Elasticsearch instance by running:
docker run --rm \
--name animals \
-d \
-p 9200:9200 \
docker.elastic.co/elasticsearch/elasticsearch:8.12.0
Then set the password for the ‘elastic’ user to 123456, which the later commands use:
docker exec -it animals bash -c "(mkfifo pipe1); ( (elasticsearch-reset-password -u elastic -i < pipe1) & ( echo $'y\n123456\n123456' > pipe1) );sleep 5;rm pipe1"
3. Dry run of Painless script
As you read in the article, you can test a Painless script by simulating ingestion. This helps with debugging and confirms that you will get the right results before any data is indexed.
The document you can use is:
{
"classification": {
"kingdom": "Animalia",
"phylum": "Chordata",
"class": "Aves",
"order": "Sphenisciformes",
"family": "Spheniscidae",
"genus": "Aptenodytes",
"species": "Aptenodytes forsteri",
"commonName": "Emperor Penguin"
}
}
The request below simulates the pipeline. The document is passed as an escaped JSON string in the "details" field:
curl -k -u elastic:123456 -XPOST "https://localhost:9200/_ingest/pipeline/_simulate?pretty" \
-H 'content-type: application/json' -d'
{
"pipeline": {
"processors": [
{
"script": {
"lang": "painless",
"source": "ctx.details = Processors.json(ctx.details);"
}
}
]
},
"docs": [
{
"_source": { "animal":"Penguin",
"details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Aves\",\"order\":\"Sphenisciformes\",\"family\":\"Spheniscidae\",\"genus\":\"Aptenodytes\",\"species\":\"Aptenodytes forsteri\",\"commonName\":\"Emperor Penguin\"}}"
}
}
]
}'
The result of the execution will be similar to this (the order of keys may differ):
{
"docs" : [
{
"doc" : {
"_index" : "_index",
"_version" : "-3",
"_id" : "_id",
"_source" : {
"animal" : "Penguin",
"details" : {
"classification" : {
"commonName" : "Emperor Penguin",
"phylum" : "Chordata",
"genus" : "Aptenodytes",
"species" : "Aptenodytes forsteri",
"family" : "Spheniscidae",
"kingdom" : "Animalia",
"class" : "Aves",
"order" : "Sphenisciformes"
}
}
},
"_ingest" : {
"timestamp" : "2024-01-30T01:54:31.389949888Z"
}
}
}
]
}
4. Create Pipeline
The pipeline is rather simple. You could optionally save the script as a stored script, but it is short enough that this step can be skipped.
curl -k -u elastic:123456 -XPUT "https://localhost:9200/_ingest/pipeline/parsejson" \
-H 'content-type: application/json' -d'
{
"description": "Parse JSON document",
"processors": [
{
"script": {
"source": "ctx.details = Processors.json(ctx.details);"
}
}
]
}'
5. Load test data through pipeline
Here are 5 animals to load, each with its classification as a JSON string:
curl -k -u elastic:123456 -XPOST "https://localhost:9200/animals/_bulk?pipeline=parsejson" \
-H 'content-type: application/json' -d'
{"index":{"_id":"1"}}
{"animal": "Polar Bear","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Carnivora\",\"family\":\"Ursidae\",\"genus\":\"Ursus\",\"species\":\"maritimus\"}}"}
{"index":{"_id":"2"}}
{"animal": "Arctic Fox","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Carnivora\",\"family\":\"Canidae\",\"genus\":\"Vulpes\",\"species\":\"lagopus\"}}"}
{"index":{"_id":"3"}}
{"animal": "Snowy Owl","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Aves\",\"order\":\"Strigiformes\",\"family\":\"Strigidae\",\"genus\":\"Bubo\",\"species\":\"scandiacus\"}}"}
{"index":{"_id":"4"}}
{"animal": "Narwhal","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Cetacea\",\"family\":\"Monodontidae\",\"genus\":\"Monodon\",\"species\":\"monoceros\"}}"}
{"index":{"_id":"5"}}
{"animal": "Arctic Hare","details": "{\"classification\": {\"kingdom\":\"Animalia\",\"phylum\":\"Chordata\",\"class\":\"Mammalia\",\"order\":\"Lagomorpha\",\"family\":\"Leporidae\",\"genus\":\"Lepus\",\"species\":\"arcticus\"}}"}
'
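Escaping the inner JSON by hand, as in the request above, is error-prone. The sketch below shows one way such a body could be generated: the classification is serialized once to become a string, and the whole document is serialized again to become one NDJSON line. The helper name `bulk_lines` and the shortened classifications are illustrative assumptions, not part of the original tutorial.

```python
import json

def bulk_lines(animals):
    """Build an NDJSON _bulk body where "details" is JSON serialized as a string."""
    lines = []
    for doc_id, (name, classification) in enumerate(animals, start=1):
        # Action line: index the document under a sequential string id.
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        # First serialization turns the nested object into a plain string.
        details = json.dumps({"classification": classification})
        # Second serialization escapes that string inside the source line.
        lines.append(json.dumps({"animal": name, "details": details}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"

# Shortened sample data (hypothetical subset of the fields used above).
animals = [
    ("Polar Bear", {"kingdom": "Animalia", "genus": "Ursus", "species": "maritimus"}),
    ("Arctic Fox", {"kingdom": "Animalia", "genus": "Vulpes", "species": "lagopus"}),
]
body = bulk_lines(animals)
print(body, end="")
```

The printed text could be saved to a file and sent with curl's `--data-binary @file` option, which keeps the newlines intact.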
6. Check results
After a successful load, you can check the results by searching the index and returning only the _source of each document.
curl -k -u elastic:123456 -XPOST "https://localhost:9200/animals/_search?pretty&filter_path=hits.hits._source"
After executing the above command, you should get the result below:
{
"hits" : {
"hits" : [
{
"_source" : {
"animal" : "Polar Bear",
"details" : {
"classification" : {
"phylum" : "Chordata",
"genus" : "Ursus",
"species" : "maritimus",
"family" : "Ursidae",
"kingdom" : "Animalia",
"class" : "Mammalia",
"order" : "Carnivora"
}
}
}
},
{
"_source" : {
"animal" : "Arctic Fox",
"details" : {
"classification" : {
"phylum" : "Chordata",
"genus" : "Vulpes",
"species" : "lagopus",
"family" : "Canidae",
"kingdom" : "Animalia",
"class" : "Mammalia",
"order" : "Carnivora"
}
}
}
},
{
"_source" : {
"animal" : "Snowy Owl",
"details" : {
"classification" : {
"phylum" : "Chordata",
"genus" : "Bubo",
"species" : "scandiacus",
"family" : "Strigidae",
"kingdom" : "Animalia",
"class" : "Aves",
"order" : "Strigiformes"
}
}
}
},
{
"_source" : {
"animal" : "Narwhal",
"details" : {
"classification" : {
"phylum" : "Chordata",
"genus" : "Monodon",
"species" : "monoceros",
"family" : "Monodontidae",
"kingdom" : "Animalia",
"class" : "Mammalia",
"order" : "Cetacea"
}
}
}
},
{
"_source" : {
"animal" : "Arctic Hare",
"details" : {
"classification" : {
"phylum" : "Chordata",
"genus" : "Lepus",
"species" : "arcticus",
"family" : "Leporidae",
"kingdom" : "Animalia",
"class" : "Mammalia",
"order" : "Lagomorpha"
}
}
}
}
]
}
}
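If you would rather verify the response with a script than by eye, a check along these lines could be used. It is a sketch under assumptions: `response_text` is a trimmed, hand-written sample shaped like the output above, and `unparsed_animals` is a hypothetical helper name.

```python
import json

# Trimmed sample shaped like the filter_path response above (one hit only).
response_text = """
{"hits": {"hits": [
  {"_source": {"animal": "Polar Bear",
               "details": {"classification": {"genus": "Ursus", "species": "maritimus"}}}}
]}}
"""

def unparsed_animals(response):
    """Return the animals whose "details" field is still a plain string."""
    return [
        hit["_source"]["animal"]
        for hit in response["hits"]["hits"]
        if isinstance(hit["_source"].get("details"), str)
    ]

response = json.loads(response_text)
# An empty list means every hit has structured details.
print(unparsed_animals(response))
```

A non-empty list would point to documents that bypassed the pipeline, for example because they were indexed without the `pipeline=parsejson` parameter.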
7. Conclusion
You have practiced parsing a JSON object that arrives as a string in the input data. Notice that the final result is structured JSON instead of a plain string. Now you can use this functionality in your own project.