In this blog we shall see:
- Gluster block storage setup
- Tcmu-runner target emulation setup
- Initiator side setup (on Elasticsearch node)
- Elasticsearch Configuration with single node
- Testing
- Conclusion
- References
Before we begin,
- In this post, I will try not to talk much about gluster block storage itself, as that is not our main focus; see my previous posts (linked under References) for more details on block storage terminology and architecture.
- This post does not explain everything about Elasticsearch; it is just a POC that shows how to set up gluster block storage as the backend persistent storage for the Elasticsearch engine, and
- Finally, be aware that gluster block storage is new and still at the POC stage.
All we need to perform this POC is two nodes with Fedora 24 installed, each having ~50G of disk space.
Setup at a glance:
On Node1:
1. Install and run gluster and create a volume
2. Mount the volume created in step 1 and create a file of size 40G in the volume
3. Install and run tcmu-runner, then create and export a LUN using the targetcli user:glfs handler
On Node2:
1. Discover and log in to the target device exported from Node1
2. Notice the new block device (/dev/sda), format it with XFS, and mount it
3. Install Elasticsearch, configure it to use the mount point from step 2 as its data path, and run it
4. Play with the Elasticsearch engine by creating indices and querying them
Let's begin …
Gluster block storage setup
Installing glusterfs-server and configuring volume
Installing glusterfs
# dnf install glusterfs-server
got glusterfs-server-3.8.5-1.fc24.x86_64.rpm
Run
# systemctl start glusterd
# systemctl status glusterd
Create a gluster volume
# gluster vol create block 10.70.42.151:/root/brick force
volume create: block: success: please start the volume to access data
Start the volume
# gluster vol start block
volume start: block: success
Mount the gluster volume
# mount.glusterfs localhost:/block /mnt/
Create a big file that will act as the target device
# fallocate -l 40G /mnt/elastic-media.img
# ls -l /mnt/
total 41943040
-rw-r--r--. 1 root root 42949672960 Nov 17 12:56 elastic-media.img
# df -Th
[...]
localhost:/block fuse.glusterfs 50G 41G 10G 81% /mnt
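Before moving on, you can sanity-check the volume with the standard gluster CLI queries (nothing specific to this setup):
# gluster vol info block
# gluster vol status block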
Tcmu-runner target emulation setup
Install tcmu-runner
# dnf install tcmu-runner
Run
# systemctl start tcmu-runner
# systemctl status tcmu-runner
Choose an iSCSI Qualified Name
# IQN=iqn.2016-11.org.gluster:10.70.42.151
Create the backend with the glfs storage module
# targetcli /backstores/user:glfs create glfsLUN 40G block@10.70.42.151/elastic-media.img
Created user-backed storage object glfsLUN size 42949672960.
Create a target
# targetcli /iscsi create $IQN
Created target iqn.2016-11.org.gluster:10.70.42.151.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.
Share the glfs-backed LUN without any auth checks (fine for a POC, not for production)
# targetcli /iscsi/$IQN/tpg1 set attribute generate_node_acls=1 demo_mode_write_protect=0
Parameter generate_node_acls is now '1'.
Parameter demo_mode_write_protect is now '0'.
Set/Export the LUN
# targetcli /iscsi/$IQN/tpg1/luns create /backstores/user:glfs/glfsLUN
Created LUN 0.
Flush the firewall rules so the initiator can reach the portal on port 3260
# iptables -F
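At this point it is worth eyeballing the whole target configuration; targetcli ls prints the object tree (backstore, target, TPG and LUN), and saveconfig persists it across reboots (optional for a POC, but a good habit):
# targetcli ls
# targetcli saveconfig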
Initiator side setup (on the Elasticsearch node, Node2)
Install the iSCSI initiator utilities
# dnf install iscsi-initiator-utils
Check existing block devices
# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                           11:0    1 1024M  0 rom
vda                          252:0    0   40G  0 disk
├─vda2                       252:2    0 39.5G  0 part
│ ├─fedora_dhcp42--17-swap   253:1    0    4G  0 lvm  [SWAP]
│ └─fedora_dhcp42--17-root   253:0    0   15G  0 lvm  /
└─vda1                       252:1    0  500M  0 part /boot
Discover and log in to the target
# iscsiadm -m discovery -t st -p 10.70.42.151 -l
10.70.42.151:3260,1 iqn.2016-11.org.gluster:10.70.42.151
Logging in to [iface: default, target: iqn.2016-11.org.gluster:10.70.42.151, portal: 10.70.42.151,3260] (multiple)
Login to [iface: default, target: iqn.2016-11.org.gluster:10.70.42.151, portal: 10.70.42.151,3260] successful.
Boom! We got sda with 40G of space
# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                           11:0    1 1024M  0 rom
sda                            8:0    0   40G  0 disk
vda                          252:0    0   40G  0 disk
├─vda2                       252:2    0 39.5G  0 part
│ ├─fedora_dhcp42--17-swap   253:1    0    4G  0 lvm  [SWAP]
│ └─fedora_dhcp42--17-root   253:0    0   15G  0 lvm  /
└─vda1                       252:1    0  500M  0 part /boot
Let's format the block device with XFS
# mkfs.xfs /dev/sda
# mkdir /home/pkalever/block
Mount the block device
# mount /dev/sda /home/pkalever/block
# df -Th
Filesystem     Type  Size  Used Avail Use% Mounted on
[...]
/dev/sda       xfs    40G  0.2G 39.8G   1% /home/pkalever/block
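If you later need to inspect or tear down the initiator side, the standard iscsiadm invocations below should cover it (a small sketch; the IQN and portal are the ones used above).
Show the active session and the attached device
# iscsiadm -m session -P 1
Log out of the target, if you ever need to remove the device
# iscsiadm -m node -T iqn.2016-11.org.gluster:10.70.42.151 -p 10.70.42.151 -u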
Elasticsearch configuration with single node
Elasticsearch is an open-source, distributed, scalable, enterprise-grade search engine. Accessible through an extensive and elaborate API, Elasticsearch can power extremely fast searches that support your data discovery applications.
We will use elasticsearch-2.3.4, as this version is compatible with the Wikipedia indexes/dumps used later in this post. Download the rpm
# wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.4/elasticsearch-2.3.4.rpm
Install Elasticsearch
# dnf install ./elasticsearch-2.3.4.rpm
Install Command-line JSON processor
# dnf install jq
Run
# sudo systemctl daemon-reload
# sudo systemctl enable elasticsearch.service
# sudo systemctl start elasticsearch.service
Check the status
# sudo systemctl status elasticsearch.service
Configure Elasticsearch to use gluster block mount directory for storage
Uncomment and edit the parameters below to suit your setup
# sudo vi /etc/elasticsearch/elasticsearch.yml
cluster.name: gluster-block-17
node.name: node-17
path.data: /home/pkalever/block/data2
path.logs: /home/pkalever/block/logs2
# mkdir ~/block/data2 ~/block/logs2
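One caveat worth calling out: the Elasticsearch service runs as its own elasticsearch user, so the data and log directories on the block mount must be writable by it (the paths below assume the layout above):
# chown -R elasticsearch:elasticsearch /home/pkalever/block/data2 /home/pkalever/block/logs2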
# /usr/share/elasticsearch/bin/plugin install analysis-icu
# sudo systemctl restart elasticsearch.service
Check the status
# sudo systemctl status elasticsearch.service
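To confirm the engine is actually up and answering on its REST port (9200 by default), a quick curl does it; the cluster name in the output should match what we set in elasticsearch.yml:
# curl -XGET http://localhost:9200/?pretty
# curl -XGET http://localhost:9200/_cluster/health?pretty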
Testing
Simple test to make sure setup works
List the Indices
# curl -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
Now let’s create an index named "bank"
# curl -XPUT http://localhost:9200/bank?pretty
{
"acknowledged" : true
}
# curl -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 5 1 0 0 650b 650b
Note docs.count = 0
Let’s now put something into our bank index.
In order to index a document, we must tell Elasticsearch which type in the index it should go to.
Let’s index a simple document into the bank index, "account" type, with an ID of 1 as follows:
# curl -XPUT http://localhost:9200/bank/account/1?pretty -d '
{
"account_number": "999120999",
"name": "pkalever"
}'
And the response:
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_version" : 1,
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"created" : true
}
By looking at the response we can say that a new bank document was successfully created.
# curl -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open bank 5 1 1 0 3.7kb 3.7kb
Note that docs.count is now 1.
Query a document
# curl -XGET http://localhost:9200/bank/account/1?pretty
{
"_index" : "bank",
"_type" : "account",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"account_number" : "999120999",
"name" : "pkalever"
}
}
If we study the above commands carefully, we can actually see a pattern of how we access data in Elasticsearch.
That pattern can be summarized as follows:
<REST Verb> /<Index>/<Type>/<ID>
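For instance, staying with the same bank index, the pattern maps onto curl like this (document ID 2 and its field values are made up purely for illustration):
# curl -XPUT http://localhost:9200/bank/account/2?pretty -d '{"account_number": "123", "name": "example"}'
# curl -XGET http://localhost:9200/bank/account/2?pretty
# curl -XDELETE http://localhost:9200/bank/account/2?pretty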
Delete the entry
# curl -XDELETE http://localhost:9200/bank/account/1?pretty
So far we have manually created an index and added documents. Let's now load some of the data sets/search indices that Wikipedia provides.
Loading Wikipedia’s Search Index
In the next script we:
1. Delete the index named 'enwikiquote' if it already exists
2. Fetch the settings that en.wikiquote.org uses for its index and use them to create a new index
3. Fetch the mapping for the content index and apply it
# cat > run1.sh
export es=localhost:9200
export site=en.wikiquote.org
export index=enwikiquote

curl -XDELETE $es/$index?pretty

curl -s 'https://'$site'/w/api.php?action=cirrus-settings-dump&format=json&formatversion=2' |
  jq '{
    analysis: .content.page.index.analysis,
    number_of_shards: 1,
    number_of_replicas: 0
  }' |
  curl -XPUT $es/$index?pretty -d @-

curl -s 'https://'$site'/w/api.php?action=cirrus-mapping-dump&format=json&formatversion=2' |
  jq .content |
  curl -XPUT $es/$index/_mapping/page?pretty -d @-
# ./run1.sh
{
  "acknowledged" : true
}
{
  "acknowledged" : true
}
{
  "acknowledged" : true
}
Now let's download the wiki dump (the JSON-formatted documents)
# wget https://dumps.wikimedia.org/other/cirrussearch/current/enwikiquote-20161114-cirrussearch-content.json.gz
Or you can go to https://dumps.wikimedia.org/other/cirrussearch/ and download whatever you need.
In the next script we:
1. Create a directory named chunks, and
2. Split the dump into 500-line chunks (250 metadata lines and 250 actual document lines, since the bulk format alternates them)
# cat > run2.sh
export dump=enwikiquote-20161114-cirrussearch-content.json.gz
export index=enwikiquote

mkdir chunks
cd chunks
zcat ../$dump | split -a 10 -l 500 - $index
# ./run2.sh
# ls chunks/
enwikiquoteaaaaaaaaaa  enwikiquoteaaaaaaaabd  enwikiquoteaaaaaaaacg  enwikiquoteaaaaaaaadj
enwikiquoteaaaaaaaaab  enwikiquoteaaaaaaaabe  enwikiquoteaaaaaaaach  enwikiquoteaaaaaaaadk
[...]
enwikiquoteaaaaaaaaba  enwikiquoteaaaaaaaacd  enwikiquoteaaaaaaaadg  enwikiquoteaaaaaaaaej
enwikiquoteaaaaaaaabb  enwikiquoteaaaaaaaace  enwikiquoteaaaaaaaadh  enwikiquoteaaaaaaaaek
enwikiquoteaaaaaaaabc  enwikiquoteaaaaaaaacf  enwikiquoteaaaaaaaadi
The loop in the next script loads each chunk through the _bulk API and deletes the file once it has loaded.
# cat > ./run3.sh
export es=localhost:9200
export index=enwikiquote

cd chunks
for file in *; do
  echo -n "${file}: "
  took=$(curl -s -XPOST $es/$index/_bulk?pretty --data-binary @$file | grep took | cut -d':' -f 2 | cut -d',' -f 1)
  printf '%7s\n' $took
  [ "x$took" = "x" ] || rm $file
done
# ./run3.sh
enwikiquoteaaaaaaaaaa:    9306
enwikiquoteaaaaaaaaab:   10607
enwikiquoteaaaaaaaaac:    6652
[...]
enwikiquoteaaaaaaaaaz:    4178
enwikiquoteaaaaaaaaba:    4800
enwikiquoteaaaaaaaabb:    4469
enwikiquoteaaaaaaaabc:    4349
[...]
enwikiquoteaaaaaaaabz:    8228
enwikiquoteaaaaaaaaca:    5152
enwikiquoteaaaaaaaacb:    4134
enwikiquoteaaaaaaaacc:    4510
[...]
List the indices
# curl -XGET http://localhost:9200/_cat/indices?v
health status index       pri rep docs.count docs.deleted store.size pri.store.size
green  open   enwikiquote   1   0      28533            0      1.1gb          1.1gb
Query for page 1
# curl -XGET http://localhost:9200/enwikiquote/page/1?pretty
# curl -XGET http://localhost:9200/enwikiquote/_search | less
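Beyond fetching documents by ID, the _search endpoint accepts a query DSL body; here is a minimal full-text match sketch (the search term is arbitrary, and the title field is assumed from the cirrus mapping loaded above):
# curl -XGET http://localhost:9200/enwikiquote/_search?pretty -d '
{
  "query": {
    "match": {
      "title": "freedom"
    }
  }
}'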
Conclusion
This blog just showcases, at POC level, how Gluster block storage can be used as backend persistent storage for the Elasticsearch engine. More details will follow in future posts.
References
https://www.elastic.co/blog/loading-wikipedia
https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/_create_an_index.html
Previous posts on gluster block storage
https://pkalever.wordpress.com/2016/06/29/non-shared-persistent-gluster-storage-with-kubernetes/
https://pkalever.wordpress.com/2016/11/04/gluster-as-block-storage-with-qemu-tcmu/