Elasticsearch with Gluster Block Storage

In this blog post we shall see:

  1. Gluster block storage setup
  2. Elasticsearch Configuration with single node
  3. Testing
  4. Conclusion
  5. References

Before we begin:

  • In this post, I will not talk much about gluster block storage, as that is not our main focus; see my previous posts for details on block storage terminology and architecture.
  • This post does not explain everything about Elasticsearch; it is just a POC that sets up gluster block storage as the backend persistent storage for the Elasticsearch engine.
  • Finally, be aware that gluster block storage is new and still at the POC stage.

All we need for this POC is two nodes running Fedora 24, each with ~50G of disk space.

Setup at a glance:

On Node1:
1. Install and run gluster and create a volume
2. Mount the volume created in step 1 and create a file of size 40G in the volume
3. Install and run tcmu-runner, then create and export a LUN using the targetcli user:glfs handler
On Node2:
1. Discover and log in to the target exported by Node1
2. Find the new block device (/dev/sda), format it with xfs and mount it
3. Install Elasticsearch, configure it to use the mount point from step 2 as its data path, and run it.
4. Play with the Elasticsearch engine by creating indices and querying.

Let's begin…

Gluster block storage setup

Installing glusterfs-server and configuring the volume

Install glusterfs-server
# dnf install glusterfs-server
This installed glusterfs-server-3.8.5-1.fc24.x86_64.rpm

Start glusterd and check its status
# systemctl start glusterd
# systemctl status glusterd

Create a gluster volume
# gluster vol create block 10.70.42.151:/root/brick force
volume create: block: success: please start the volume to access data

Start the volume
# gluster vol start block
volume start: block: success

Mount the gluster volume
# mount.glusterfs localhost:/block /mnt/

Create a large file that will act as the target device
# fallocate -l 40G /mnt/elastic-media.img

# ls -l /mnt/
total 41943040
-rw-r--r--. 1 root root 42949672960 Nov 17 12:56 elastic-media.img

# df -Th
[...]
localhost:/block fuse.glusterfs 50G 41G 10G 81% /mnt
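The 40G passed to fallocate (and later to targetcli) is in binary units, which is why ls reports 42949672960 bytes. As an optional sanity check before exporting the file, here is a small sketch that converts GiB to bytes and compares the result with the size of the backing file. The function names are hypothetical helpers, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helpers for sanity-checking the backing file size.

# gib_to_bytes N: print N GiB (what "fallocate -l NG" allocates) in bytes
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# size_matches FILE N: succeed if FILE is exactly N GiB long
# (stat -c %s prints the apparent size in bytes; GNU coreutils assumed)
size_matches() {
    [ "$(stat -c %s "$1")" -eq "$(gib_to_bytes "$2")" ]
}

# Example on Node1:
#   size_matches /mnt/elastic-media.img 40 && echo "backing file is 40 GiB"
```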

Tcmu-runner target emulation setup

Install tcmu-runner
# dnf install tcmu-runner

Start tcmu-runner and check its status
# systemctl start tcmu-runner
# systemctl status tcmu-runner

Choose an iSCSI Qualified Name (IQN) for the target
# IQN=iqn.2016-11.org.gluster:10.70.42.151

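Create the backstore with the glfs storage module; the last argument names the gluster volume, the host serving it and the file inside the volume (volume@host/file)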
# targetcli /backstores/user:glfs create glfsLUN 40G block@10.70.42.151/elastic-media.img
Created user-backed storage object glfsLUN size 42949672960.

Create a target
# targetcli /iscsi create $IQN
Created target iqn.2016-11.org.gluster:10.70.42.151.
Created TPG 1.
Global pref auto_add_default_portal=true
Created default portal listening on all IPs (0.0.0.0), port 3260.

Allow any initiator to access the glfs-backed LUN, without authentication checks
# targetcli /iscsi/$IQN/tpg1 set attribute generate_node_acls=1 demo_mode_write_protect=0
Parameter generate_node_acls is now '1'.
Parameter demo_mode_write_protect is now '0'.

Export the LUN
# targetcli /iscsi/$IQN/tpg1/luns create /backstores/user:glfs/glfsLUN
Created LUN 0.

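Flush the firewall rules so that Node2 can reach the iSCSI portal on port 3260. This drops all firewall protection on Node1, which is only acceptable for a throwaway POC like this one.
# iptables -F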
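The target-side targetcli steps above can be collected into one parameterised function, which makes it easier to repeat the POC with a different volume, file or IQN. This is a sketch rather than part of the original setup: the variable names (VOLNAME, TARGET_HOST, IMG_FILE, LUN_SIZE, IQN) and the DRY_RUN switch are my own, their defaults mirror the values used in this post, and the targetcli commands are the ones shown above.

```shell
#!/bin/sh
# Sketch: the Node1 targetcli steps above as one parameterised function.
# VOLNAME, TARGET_HOST, IMG_FILE, LUN_SIZE and IQN default to this post's
# values; set DRY_RUN=1 to print the commands instead of running them.

# run CMD...: execute CMD, or only print it when DRY_RUN=1
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

setup_target() {
    vol=${VOLNAME:-block}
    host=${TARGET_HOST:-10.70.42.151}
    img=${IMG_FILE:-elastic-media.img}
    size=${LUN_SIZE:-40G}
    iqn=${IQN:-iqn.2016-11.org.gluster:$host}

    # 1. user:glfs backstore; config string is <volume>@<host>/<file in volume>
    # 2. iSCSI target (TPG 1 and a portal on 0.0.0.0:3260 are created for us)
    # 3. demo mode: any initiator may log in, no ACLs or authentication
    # 4. export the backstore as LUN 0
    run targetcli /backstores/user:glfs create glfsLUN "$size" "$vol@$host/$img" &&
    run targetcli /iscsi create "$iqn" &&
    run targetcli "/iscsi/$iqn/tpg1" set attribute generate_node_acls=1 demo_mode_write_protect=0 &&
    run targetcli "/iscsi/$iqn/tpg1/luns" create /backstores/user:glfs/glfsLUN
}

# Example on Node1:
#   DRY_RUN=1 setup_target     # review the commands first
#   setup_target               # then run them
```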

Initiator side setup (on the Elasticsearch node, Node2)

# dnf install iscsi-initiator-utils

Check existing block devices
# lsblk
NAME                       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                         11:0    1 1024M  0 rom  
vda                        252:0    0   40G  0 disk 
├─vda2                     252:2    0 39.5G  0 part 
│ ├─fedora_dhcp42--17-swap 253:1    0    4G  0 lvm  [SWAP]
│ └─fedora_dhcp42--17-root 253:0    0   15G  0 lvm  /
└─vda1                     252:1    0  500M  0 part /boot

Discover and log in to the target
# iscsiadm -m discovery -t st -p 10.70.42.151 -l
10.70.42.151:3260,1 iqn.2016-11.org.gluster:10.70.42.151
Logging in to [iface: default, target: iqn.2016-11.org.gluster:10.70.42.151, portal: 10.70.42.151,3260] (multiple)
Login to [iface: default, target: iqn.2016-11.org.gluster:10.70.42.151, portal: 10.70.42.151,3260] successful.

Boom! We got sda with 40G of space
# lsblk
NAME                       MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                         11:0    1 1024M  0 rom  
sda                          8:0    0   40G  0 disk
vda                        252:0    0   40G  0 disk 
├─vda2                     252:2    0 39.5G  0 part 
│ ├─fedora_dhcp42--17-swap 253:1    0    4G  0 lvm  [SWAP]
│ └─fedora_dhcp42--17-root 253:0    0   15G  0 lvm  /
└─vda1                     252:1    0  500M  0 part /boot
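The name sda is assigned in discovery order and can change across reboots or when more disks are attached. udev normally also creates a stable symlink for iSCSI LUNs under /dev/disk/by-path, named after the portal, the target IQN and the LUN number. A sketch that builds that name, assuming udev's usual ip-<portal>-iscsi-<iqn>-lun-<n> scheme (check ls /dev/disk/by-path on your system); iscsi_by_path is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: build the udev by-path symlink for an iSCSI LUN.
# Assumes udev's usual naming: ip-<portal>-iscsi-<target IQN>-lun-<LUN>.
iscsi_by_path() {
    portal=$1      # e.g. 10.70.42.151:3260
    iqn=$2         # IQN of the target created on Node1
    lun=${3:-0}    # LUN number; this post exports LUN 0
    printf '/dev/disk/by-path/ip-%s-iscsi-%s-lun-%s\n' "$portal" "$iqn" "$lun"
}

# Example on Node2 (readlink -f should resolve to the device lsblk shows):
#   dev=$(iscsi_by_path 10.70.42.151:3260 iqn.2016-11.org.gluster:10.70.42.151)
#   readlink -f "$dev"
```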

Let's format the block device with xfs
# mkfs.xfs /dev/sda

# mkdir /home/pkalever/block

Mount the block device
# mount /dev/sda /home/pkalever/block

# df -Th
Filesystem Type Size Used Avail Use% Mounted on
[...]
/dev/sda xfs 40G 0.2G 39.8G 1% /home/pkalever/block
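This mount does not survive a reboot. One way to make it persistent is an /etc/fstab entry with the _netdev option, which marks the filesystem as needing the network so that it is mounted only after networking is up. The helper below only prints such a line; its name is hypothetical, using a /dev/disk/by-path name instead of /dev/sda is my suggestion rather than part of the original setup, and it assumes the initiator logs back in to the target at boot (governed by node.startup in /etc/iscsi/iscsid.conf).

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line for the iSCSI-backed xfs mount.
# _netdev: the device needs the network, so mount it after networking is up.
# The last two fields disable dump and boot-time fsck for this filesystem.
fstab_entry() {
    dev=$1   # block device, ideally a stable /dev/disk/by-path name
    mnt=$2   # mount point
    printf '%s %s xfs _netdev 0 0\n' "$dev" "$mnt"
}

# Example on Node2 (review the line before appending it):
#   fstab_entry /dev/disk/by-path/ip-10.70.42.151:3260-iscsi-iqn.2016-11.org.gluster:10.70.42.151-lun-0 \
#       /home/pkalever/block >> /etc/fstab
```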

Elasticsearch configuration with single node

Elasticsearch is an open-source, distributed, scalable, enterprise-grade search engine. Accessible through an extensive and elaborate API, Elasticsearch can power extremely fast searches that support your data discovery applications.

We use elasticsearch-2.3.4, as this version is compatible with the Wikipedia search index dumps loaded later.

Download the RPM
# wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.4/elasticsearch-2.3.4.rpm

Install Elasticsearch
# dnf install ./elasticsearch-2.3.4.rpm

Install jq, the command-line JSON processor used by the loading scripts later on
# dnf install jq

Reload systemd, then enable and start the Elasticsearch service
# sudo systemctl daemon-reload
# sudo systemctl enable elasticsearch.service
# sudo systemctl start elasticsearch.service

Check the status
# sudo systemctl status elasticsearch.service

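Configure Elasticsearch to use the gluster block mount directory for storage.
Uncomment and edit the following parameters as needed:
# sudo vi /etc/elasticsearch/elasticsearch.yml
cluster.name: gluster-block-17
node.name: node-17
path.data: /home/pkalever/block/data2
path.logs: /home/pkalever/block/logs2

Create the data and log directories (their names must match path.data and path.logs exactly) and give them to the elasticsearch user, which the RPM runs the service as:
# mkdir /home/pkalever/block/data2 /home/pkalever/block/logs2
# chown elasticsearch:elasticsearch /home/pkalever/block/data2 /home/pkalever/block/logs2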
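The directories created here must match path.data and path.logs exactly; a typo such as log2 instead of logs2 only shows up when the service fails to start. A sketch that reads both settings back from elasticsearch.yml and creates whatever they name, assuming the flat one-line "key: value" form used above. yml_get and make_es_dirs are hypothetical helpers, not part of Elasticsearch.

```shell
#!/bin/sh
# Hypothetical helpers: create the directories named in elasticsearch.yml.
# Only handles flat one-line settings such as "path.data: /some/dir".

# yml_get KEY FILE: print the value of KEY, ignoring commented-out lines
yml_get() {
    awk -v key="$1" '
        /^[ \t]*#/ { next }                    # skip comments like "#path.data: ..."
        {
            i = index($0, ":")
            if (i == 0) next
            k = substr($0, 1, i - 1)
            v = substr($0, i + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", k)     # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)     # trim the value
            if (k == key) { print v; exit }
        }' "$2"
}

# make_es_dirs [FILE]: mkdir -p the path.data and path.logs directories
make_es_dirs() {
    conf=${1:-/etc/elasticsearch/elasticsearch.yml}
    for key in path.data path.logs; do
        dir=$(yml_get "$key" "$conf")
        if [ -z "$dir" ]; then
            echo "$key is not set in $conf" >&2
            return 1
        fi
        mkdir -p "$dir" || return 1
    done
}

# Example on Node2, then hand the directories to the service user:
#   make_es_dirs && chown elasticsearch:elasticsearch /home/pkalever/block/data2 /home/pkalever/block/logs2
```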

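Install the ICU analysis plugin, which the Wikipedia index settings loaded later rely on
# /usr/share/elasticsearch/bin/plugin install analysis-icu

Restart Elasticsearch so that it picks up the new configuration and plugin
# sudo systemctl restart elasticsearch.service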

Check the status
# sudo systemctl status elasticsearch.service

Testing

A simple test to make sure the setup works

List the Indices
# curl -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size 

Now let’s create an index named "bank"
# curl -XPUT http://localhost:9200/bank?pretty 
{
  "acknowledged" : true
}

# curl -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size 
yellow open bank 5 1 0 0 650b 650b 

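Note that docs.count is 0. The index health is yellow because each of its 5 shards has 1 replica by default, and a replica cannot be allocated on the same node as its primary in this single-node cluster.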

Let’s now put something into our bank index.
In order to index a document, we must tell Elasticsearch which type in the index it should go to.
Let’s index a simple document into the bank index, "account" type, with an ID of 1 as follows:
# curl -XPUT http://localhost:9200/bank/account/1?pretty -d '
{
 "account_number": "999120999",
 "name": "pkalever"
}'

And the response:
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "1",
  "_version" : 1,
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "created" : true
}

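The response ("created" : true) shows that the new document was successfully created. Since the replica cannot be allocated on this single node, _shards reports only 1 of 2 copies as successful.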
# curl -XGET http://localhost:9200/_cat/indices?v
health status index pri rep docs.count docs.deleted store.size pri.store.size 
yellow open bank 5 1 1 0 3.7kb 3.7kb

Note that docs.count is now 1.
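Rather than reading docs.count off the table by eye, a script can pick one column for one index out of the _cat/indices?v output. A sketch with a hypothetical helper name that locates the columns by their header names; it assumes every row has all columns filled in, which holds for open indices like the ones in this post but not for closed ones.

```shell
#!/bin/sh
# Hypothetical helper: print one column for one index from `_cat/indices?v`
# output read on stdin. Columns are found by header name, not by position.
cat_indices_field() {
    awk -v idx="$1" -v col="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) {
                if ($i == "index") ic = i
                if ($i == col) c = i
            }
            next
        }
        ic && c && $ic == idx { print $c }'
}

# Example on Node2:
#   curl -s 'http://localhost:9200/_cat/indices?v' | cat_indices_field bank docs.count
```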
 
Retrieve the document
# curl -XGET http://localhost:9200/bank/account/1?pretty
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "1",
  "_version" : 1,
  "found" : true,
  "_source" : {
    "account_number" : "999120999",
    "name" : "pkalever"
  }
}

If we study the above commands carefully, we can actually see a pattern of how we access data in Elasticsearch.
That pattern can be summarized as follows:
<REST Verb> /<Index>/<Type>/<ID>
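Following that pattern, a request URL can be assembled from its parts. A tiny sketch: es_url and the ES variable (defaulting to localhost:9200, as in this post) are hypothetical conveniences, and the path layout is the /<Index>/<Type>/<ID> form of Elasticsearch 2.x used above.

```shell
#!/bin/sh
# Hypothetical helper: build http://<host:port>/<index>/<type>/<id>?pretty.
# ES defaults to the single node used throughout this post.
es_url() {
    index=$1 typ=$2 id=$3
    printf 'http://%s/%s/%s/%s?pretty\n' "${ES:-localhost:9200}" "$index" "$typ" "$id"
}

# Examples, one per REST verb used above:
#   curl -XGET    "$(es_url bank account 1)"
#   curl -XDELETE "$(es_url bank account 1)"
```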

Delete the entry
# curl -XDELETE http://localhost:9200/bank/account/1?pretty

So far we have created an index and added a document by hand. Let's now load one of the search indices that Wikipedia provides.

Loading Wikipedia’s Search Index

The next script:
1. deletes the index named 'enwikiquote', if it exists,
2. fetches the analysis settings that en.wikiquote.org uses for its content index and creates a new index with them (1 shard, no replicas), and
3. fetches the mapping for the content index and applies it to the new index.
# cat > run1.sh 
export es=localhost:9200
export site=en.wikiquote.org
export index=enwikiquote

curl -XDELETE $es/$index?pretty

curl -s 'https://'$site'/w/api.php?action=cirrus-settings-dump&format=json&formatversion=2' |
  jq '{
    analysis: .content.page.index.analysis,
    number_of_shards: 1,
    number_of_replicas: 0
  }' |
  curl -XPUT $es/$index?pretty -d @-

curl -s 'https://'$site'/w/api.php?action=cirrus-mapping-dump&format=json&formatversion=2' |
  jq .content |
  curl -XPUT $es/$index/_mapping/page?pretty -d @-

# chmod +x run1.sh && ./run1.sh
{
  "acknowledged" : true
}
{
  "acknowledged" : true
}
{
  "acknowledged" : true
}

Now let's download the wiki dump (JSON-formatted documents)
# wget https://dumps.wikimedia.org/other/cirrussearch/current/enwikiquote-20161114-cirrussearch-content.json.gz

Or browse https://dumps.wikimedia.org/other/cirrussearch/ and download whichever dump you need. Dated dumps are rotated out of current/ over time, so adjust the file name to one that is listed there.

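The next script:
1. creates a directory named chunks, and
2. splits the decompressed dump into 500-line chunks. The dump is in bulk format, where each document takes two lines (an action/metadata line followed by the document itself), so each chunk holds 250 documents.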
# cat > run2.sh 
export dump=enwikiquote-20161114-cirrussearch-content.json.gz
export index=enwikiquote

mkdir chunks
cd chunks
zcat ../$dump | split -a 10 -l 500 - $index


# chmod +x run2.sh && ./run2.sh
# ls chunks/
enwikiquoteaaaaaaaaaa  enwikiquoteaaaaaaaabd  enwikiquoteaaaaaaaacg  enwikiquoteaaaaaaaadj
enwikiquoteaaaaaaaaab  enwikiquoteaaaaaaaabe  enwikiquoteaaaaaaaach  enwikiquoteaaaaaaaadk
[...]
enwikiquoteaaaaaaaaba  enwikiquoteaaaaaaaacd  enwikiquoteaaaaaaaadg  enwikiquoteaaaaaaaaej
enwikiquoteaaaaaaaabb  enwikiquoteaaaaaaaace  enwikiquoteaaaaaaaadh  enwikiquoteaaaaaaaaek
enwikiquoteaaaaaaaabc  enwikiquoteaaaaaaaacf  enwikiquoteaaaaaaaadi
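split -l 500 keeps each action line together with its document only because 500 is even. If you change the chunk size, or simply want to double-check the chunks before loading them, a sketch like the following verifies that every chunk has an even number of lines. check_chunks is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: make sure every chunk holds whole action/document pairs.
# check_chunks DIR: fail (and name the file) if any chunk has an odd line count.
check_chunks() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue              # skip an unmatched glob / non-files
        n=$(wc -l < "$f")
        if [ $(( n % 2 )) -ne 0 ]; then
            echo "odd line count ($n) in $f" >&2
            return 1
        fi
    done
}

# Example on Node2, from the directory holding run2.sh:
#   check_chunks chunks && echo "all chunks contain complete pairs"
```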

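The next script loops over the chunks, bulk-loads each one into the index, prints the took value from the response (the time Elasticsearch spent on the request, in milliseconds), and deletes the chunk once it has been loaded.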
# cat > ./run3.sh
export es=localhost:9200
export index=enwikiquote
cd chunks
for file in *; do
  echo -n "${file}:  "
  took=$(curl -s -XPOST $es/$index/_bulk?pretty --data-binary @$file |
    grep took | cut -d':' -f 2 | cut -d',' -f 1)
  printf '%7s\n' $took
  [ "x$took" = "x" ] || rm $file
done

# chmod +x run3.sh && ./run3.sh
enwikiquoteaaaaaaaaaa:     9306
enwikiquoteaaaaaaaaab:    10607
enwikiquoteaaaaaaaaac:     6652
[...]
enwikiquoteaaaaaaaaaz:     4178
enwikiquoteaaaaaaaaba:     4800
enwikiquoteaaaaaaaabb:     4469
enwikiquoteaaaaaaaabc:     4349
[...]
enwikiquoteaaaaaaaabz:     8228
enwikiquoteaaaaaaaaca:     5152
enwikiquoteaaaaaaaacb:     4134
enwikiquoteaaaaaaaacc:     4510
[...]
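One caveat with the loop in run3.sh: it deletes a chunk as soon as the response contains took, but a _bulk request can fail for individual documents and still report took. The response also carries a top-level errors flag. A sketch of a stricter check, assuming the pretty-printed response of Elasticsearch 2.x; bulk_ok is a hypothetical helper, and its grep patterns assume the "key" : value spacing that ?pretty produces.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a pretty-printed _bulk response
# reports a "took" time and "errors" : false (no per-document failures).
bulk_ok() {
    printf '%s\n' "$1" | grep -q '"took"' || return 1
    printf '%s\n' "$1" | grep -q '"errors" *: *false'
}

# Possible use inside the run3.sh loop (es, index and file as defined there):
#   resp=$(curl -s -XPOST "$es/$index/_bulk?pretty" --data-binary @"$file")
#   if bulk_ok "$resp"; then rm "$file"; else echo "keeping $file" >&2; fi
```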

List the indices 
# curl -XGET  http://localhost:9200/_cat/indices?v
health status index       pri rep docs.count docs.deleted store.size pri.store.size 
green  open   enwikiquote   1   0      28533            0      1.1gb          1.1gb

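Fetch the page with ID 1
# curl -XGET http://localhost:9200/enwikiquote/page/1?pretty

Run a search without a query body, which matches all documents and returns the first 10 hits, and page through the result
# curl -XGET http://localhost:9200/enwikiquote/_search?pretty | less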
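Beyond match-all, the search API accepts a JSON query body. A sketch that builds a simple match query for one field; match_query is a hypothetical helper, and the title field is an assumption about the CirrusSearch mapping loaded by run1.sh (inspect it with curl http://localhost:9200/enwikiquote/_mapping?pretty).

```shell
#!/bin/sh
# Hypothetical helper: print a JSON body for a full-text "match" query.
# match_query FIELD TEXT; backslashes and double quotes in TEXT are escaped.
match_query() {
    text=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"query":{"match":{"%s":"%s"}}}\n' "$1" "$text"
}

# Example on Node2 ("title" is assumed to exist in the enwikiquote mapping):
#   curl -XGET 'http://localhost:9200/enwikiquote/_search?pretty' \
#        -d "$(match_query title 'Albert Einstein')"
```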

Conclusion

This blog just showcases how Gluster block storage can be used as backend persistent storage for the Elasticsearch engine at POC level. More details will follow in future posts.

References

https://www.elastic.co/blog/loading-wikipedia

https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html

https://www.elastic.co/guide/en/elasticsearch/reference/current/_create_an_index.html

Previous posts on gluster block storage

https://pkalever.wordpress.com/2016/06/23/gluster-solution-for-non-shared-persistent-storage-in-docker-container/

https://pkalever.wordpress.com/2016/06/29/non-shared-persistent-gluster-storage-with-kubernetes/

https://pkalever.wordpress.com/2016/08/16/read-write-once-persistent-storage-for-openshift-origin-using-gluster/

https://pkalever.wordpress.com/2016/11/04/gluster-as-block-storage-with-qemu-tcmu/


Read Write Once Persistent Storage for OpenShift Origin using Gluster

In this blog we shall learn about:

  1. Containers and Persistent Storage
  2. About OpenShift Origin
  3. Terminology and background
  4. Our approach
  5. Setting up
    • Gluster and iSCSI target
    • iSCSI Initiator
    • Origin master and nodes
  6. Conclusion
  7. References

 

Containers and Persistent Storage

As we all know, containers are stateless entities used to deploy applications, and hence they need persistent storage to keep application data available across container incarnations.

Persistent storage for containers is of two types: shared and non-shared.
Shared storage:
Consider this a volume/store where multiple containers perform both read and write operations on the same data. This is useful for applications like web servers that need to serve the same data from multiple container instances.

Non Shared/Read Write Once Storage:
Only a single container can perform write operations to this store at a given time.

This blog explains non-shared storage for OpenShift Origin using gluster.

 

About OpenShift Origin

OpenShift Origin is a distribution of Kubernetes optimized for continuous application development and multi-tenant deployment.

A few interesting features include multi-tenancy support, a web console, centralized administration, and the capability to automatically deploy applications on a new commit to the source repo.

[Figure: differences between Docker and OpenShift]

Read more on the Origin GitHub page.

 

Terminology and background

Refer to the ‘Terminology and background’ section of our previous post.

 

Our Approach

With the background covered, let me jump to the essence of this blog and explain how we can expose a file in a gluster volume as read write once persistent storage in OpenShift pods.

The Kubernetes version that Origin uses in my setup (v1.2.x) does not support multipathing; the patch adding it was merged in the v1.3.alpha3 release.

Hence, in this blog I’m going with multipath disabled. Once the ansible playbook is upgraded to the latest Origin, which uses k8s v1.3.0, I shall update the blog with the multipath changes.

In our approach, each OpenShift Origin node initiates the iSCSI session, attaches the iSCSI target as a block device, and serves it to the pod that runs the application requiring persistent storage.


Now without any delay let me walk through the setup details…

 

Setting Up

You need 6 nodes to set this up: 3 act as gluster nodes from which the iSCSI target is served, 1 acts as the OpenShift Origin master, and the other 2 act as iSCSI initiators, which are also Origin nodes.

  • We create a gluster replica 3 volume using the 3 nodes {Node1, Node2 and Node3}.
  • Define the iSCSI target on the same nodes and expose the ‘LUN’ from each of them.
  • Use Node4 and Node5 as iSCSI initiators, by logging in to the iSCSI target session created above (no multipathing).
  • Set up the OpenShift Origin cluster using {Node4, Node5 and Node6}; Node6 is the master and the other 2 are slave nodes.
  • From Node 6 create the pod and examine the iSCSI target device mount inside it.

Gluster and iSCSI target Setup

Refer to the ‘Gluster and iSCSI target Setup’ section of our previous post.

iSCSI initiator Setup

Refer to the ‘iSCSI initiator Setup’ section of our previous post.

OpenShift Origin Master and Nodes Setup

Master -> Node6
Slaves -> Node5 & Node4

Clone the openshift ansible repo
[root@Node6 ~]# git clone https://github.com/openshift/openshift-ansible.git

Install ansible on all the nodes including master
# dnf install -y ansible pyOpenSSL python-cryptography

Configure the nodes in the inventory file;
all you need to do is replace the host addresses (Node4, Node5 and Node6 below) with your own
[root@Node6 ~]# cat > /etc/ansible/hosts
# Create an OSEv3 group that contains the masters and nodes groups
[OSEv3:children]
masters
nodes

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

# If ansible_ssh_user is not root, ansible_sudo must be set to true
#ansible_sudo=true

deployment_type=origin

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
#openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true',
# 'kind': 'HTPasswdPasswordIdentityProvider', 'filename': '/etc/origin/master/htpasswd'}]

# host group for masters
[masters]
Node6

# host group for nodes, includes region info
[nodes]
Node6 openshift_node_labels="{'region': 'infra', 'zone': 'default'}"
Node5 openshift_node_labels="{'region': 'primary', 'zone': 'east'}"
Node4 openshift_node_labels="{'region': 'primary', 'zone': 'west'}"
^C

Set up passwordless SSH logins on all machines

Generate an SSH key
# ssh-keygen

Share the SSH key with all the nodes. To do so, run the command below on the master once per host,
with $HOSTS set to each address/IP in turn, including the master's own
[root@Node6 ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub $HOSTS
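Running ssh-copy-id by hand for every host gets tedious. Since the [nodes] group of the inventory above already lists every machine (the master included), a loop can read the host names from it. A sketch, assuming the simple INI layout used above; inventory_hosts is a hypothetical helper, not an Ansible command.

```shell
#!/bin/sh
# Hypothetical helper: list the hosts of one group in an INI-style Ansible
# inventory such as /etc/ansible/hosts. The host name is the first field;
# trailing variables like openshift_node_labels=... are ignored.
inventory_hosts() {
    awk -v grp="[$2]" '
        /^[ \t]*#/ { next }                    # comment line
        NF == 0 { next }                       # blank line
        /^\[/ { in_grp = ($1 == grp); next }   # section header
        in_grp { print $1 }' "$1"
}

# Example on the master, copying the key to every node (master included):
#   for h in $(inventory_hosts /etc/ansible/hosts nodes); do
#       ssh-copy-id -i ~/.ssh/id_rsa.pub "$h"
#   done
```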

As a precaution, put SELinux into permissive mode on all the hosts (setenforce 0 does not disable SELinux, and the setting does not persist across reboots)
# setenforce 0

Install some package dependencies that the playbook does not install itself
[root@Node6 ~]# ansible all -m shell -a "dnf install python2-dnf -y" 
[root@Node6 ~]# ansible all -m shell -a "dnf install python-dbus -y"
[root@Node6 ~]# ansible all -m shell -a "dnf install libsemanage-python -y"

Let's execute the playbook from the openshift-ansible checkout (it was cloned into root's home directory above)
[root@Node6 ~]# cd ~/openshift-ansible
[root@Node6 openshift-ansible]# ansible-playbook playbooks/byo/config.yml
It took about 40 minutes to finish in my setup.

Check that all nodes are ready
[root@Node6 ~]# oc get nodes
NAME STATUS AGE
Node4 Ready 1h
Node5 Ready 1h
Node6 Ready,SchedulingDisabled 1h
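When scripting the installation, this check can be automated: every row of oc get nodes should have a STATUS that starts with Ready (the master additionally shows SchedulingDisabled). A sketch with a hypothetical helper that reads the table on stdin, assuming the NAME STATUS AGE column layout shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if every node in `oc get nodes` output
# (read on stdin) is Ready. STATUS is the second column; values such as
# "Ready,SchedulingDisabled" count as Ready, "NotReady" does not.
all_nodes_ready() {
    awk 'NR > 1 && NF > 0 {
             split($2, s, ",")
             if (s[1] != "Ready") bad = 1
         }
         END { exit (bad ? 1 : 0) }'
}

# Example on the master:
#   oc get nodes | all_nodes_ready && echo "all nodes are Ready"
```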

Check for pods
[root@Node6 ~]# oc get pods

 

Log in to the Origin web console at https://Node6:8443
Credentials: user->admin, passwd->admin

[Screenshot: Origin web console]

Create a new project, say “blockstore-gluster”

[Screenshot: creating the new project]

 

Switch to the 'blockstore-gluster' project
[root@Node6 ~]# oc project blockstore-gluster
Now using project "blockstore-gluster" on server "https://Node6:8443".

Write a manifest/artifact for the pod
[root@Node6 ~]# cat > iscsi-pod.json
{
   "apiVersion": "v1",
   "kind": "Pod",
   "metadata": {
      "name": "glusterpod"
   },
   "spec": {
      "containers": [
         {
            "name": "iscsi-rw",
            "image": "fedora",
            "volumeMounts": [
               {
                  "mountPath": "/mnt/gluster-store",
                  "name": "iscsi-rw"
               }
            ],
            "command": [ "sleep", "100000" ]
         }
      ],
      "volumes": [
         {
            "name": "iscsi-rw",
            "iscsi": {
               "targetPortal": "Node1:3260",
               "iqn": "iqn.2016-06.org.gluster:Node1",
               "lun": 0,
               "fsType": "xfs",
               "readOnly": false
            }
         }
      ]
   } 
}
^C

Create the pod
[root@Node6 ~]# oc create -f ~/iscsi-pod.json 
pod "glusterpod" created

Get the pod info
[root@Node6 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
glusterpod 0/1 ContainerCreating 0 20s

Check events
[root@Node6 ~]# oc get events -w
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
2016-08-16 16:16:10 +0530 IST 2016-08-16 16:16:10 +0530 IST 1 glusterpod Pod Normal Scheduled {default-scheduler } Successfully assigned glusterpod to dhcp43-73.lab.eng.blr.redhat.com
FIRSTSEEN LASTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
2016-08-16 16:16:14 +0530 IST 2016-08-16 16:16:14 +0530 IST 1 glusterpod Pod spec.containers{iscsi-rw} Normal Pulling {kubelet Node5} pulling image "fedora"
2016-08-16 16:17:17 +0530 IST 2016-08-16 16:17:17 +0530 IST 1 glusterpod Pod spec.containers{iscsi-rw} Normal Pulled {kubelet Node5} Successfully pulled image "fedora"
2016-08-16 16:17:18 +0530 IST 2016-08-16 16:17:18 +0530 IST 1 glusterpod Pod spec.containers{iscsi-rw} Normal Created {kubelet Node5} Created container with docker id 0208911923f1
2016-08-16 16:17:18 +0530 IST 2016-08-16 16:17:18 +0530 IST 1 glusterpod Pod spec.containers{iscsi-rw} Normal Started {kubelet Node5} Started container with docker id 0208911923f1

[root@Node6 ~]# oc get pods
NAME READY STATUS RESTARTS AGE
glusterpod 1/1 Running 0 1m

Get into the pod
[root@Node6 ~]# oc exec -it glusterpod bash

[root@glusterpod /]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
[...]
/dev/sda xfs 8G 33M 8G 1% /mnt/gluster-store
/dev/mapper/fedora_dhcp42--82-root xfs 15G 1.8G 14G 12% /etc/hosts
[...]

[root@glusterpod /]# cd /mnt/gluster-store/
[root@glusterpod gluster-store]# ls
1 10 2 3 4 5 6 7 8 9 


 

[Screenshot: Origin web console with the pod running]

[Screenshot: details of the pod]

That’s cool, isn’t it?

 

Conclusion

This just showcases how Gluster can be used as a distributed block store with an OpenShift Origin cluster. More details about multipathing, integration with Mesos, etc. will follow in future posts.

 

References

https://docs.openshift.org/latest/welcome/index.html

https://github.com/openshift/openshift-ansible/

http://kubernetes.io/

http://severalnines.com/blog/installing-kubernetes-cluster-minions-centos7-manage-pods-services

http://rootfs.github.io/iSCSI-Kubernetes/

http://blog.gluster.org/2016/04/using-lio-with-gluster/

https://docs.docker.com/engine/tutorials/dockervolumes/

http://scst.sourceforge.net/scstvslio.html

http://events.linuxfoundation.org/sites/events/files/slides/tcmu-bobw_0.pdf

https://www.kernel.org/doc/Documentation/target/tcmu-design.txt

https://lwn.net/Articles/424004/

http://www.gluster.org/community/documentation/index.php/GlusterFS_Documentation