Friday, January 24, 2014

How Data Is Stored In CEPH Cluster


HOW :: Data Is Stored Inside a Ceph Cluster






This is something you would definitely be wondering about : how exactly does data get stored inside a Ceph cluster ?

Here is an easy-to-understand look at ceph data storage.


## POOLS : A Ceph cluster has POOLS ; pools are the logical groups for storing objects . These pools are made up of PGs ( Placement Groups ). At the time of pool creation we have to provide the number of placement groups that the pool is going to contain and the number of object replicas ( the default value is used if nothing else is specified ).

  • Creating a pool ( pool-A ) with 128 placement groups
# ceph osd pool create pool-A 128
pool 'pool-A' created
  • Listing pools
# ceph osd lspools
0 data,1 metadata,2 rbd,36 pool-A,
  • Find out total number of placement groups being used by pool
# ceph osd pool get pool-A pg_num
pg_num: 128
  • Find out replication level being used by pool ( see rep size value for replication )
# ceph osd dump | grep -i pool-A
pool 36 'pool-A' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 4051 owner 0
  • Changing the replication level for a pool ( compare with the step above ; rep size has changed )
# ceph osd pool set pool-A size 3
set pool 36 size to 3
#
# ceph osd dump | grep -i pool-A
pool 36 'pool-A' rep size 3 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 4054 owner 0

This means all the objects of pool-A will now be replicated 3 times , on 3 different OSDs.

Now , let's put some data in pool-A ; the data will be stored in the form of objects :-) that's the thumb rule.

# dd if=/dev/zero of=object-A bs=10M count=1
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.0222705 s, 471 MB/s
#

# dd if=/dev/zero of=object-B bs=10M count=1
1+0 records in
1+0 records out
10485760 bytes (10 MB) copied, 0.0221176 s, 474 MB/s
#
  • Putting some objects in pool-A
# rados -p pool-A put object-A  object-A
# rados -p pool-A put object-B  object-B
  • Checking which objects the pool contains
# rados -p pool-A ls
object-A
object-B
#

## PG ( Placement Group ) : The Ceph cluster maps objects --> PGs . These PGs , containing the objects , are spread across multiple OSDs , which improves reliability.

## Object : An object is the smallest unit of data storage in a ceph cluster ; each and everything is stored in the form of objects , and that is why a ceph cluster is also known as an Object Storage Cluster. Objects are mapped to PGs , and these objects / their copies are always spread across different OSDs. This is how ceph is designed.

  • Locating an object : which PG does it belong to , and where is it stored ?
# ceph osd map pool-A object-A
osdmap e4055 pool 'pool-A' (36) object 'object-A' -> pg 36.b301e3e8 (36.68) -> up [122,63,62] acting [122,63,62]
#
# ceph osd map pool-A object-B
osdmap e4055 pool 'pool-A' (36) object 'object-B' -> pg 36.47f173fb (36.7b) -> up [153,110,118] acting [153,110,118]
#
Now , recall that we created pool-A , changed its replication level to 3 and added objects ( object-A and object-B ) to it. Observe the above output ; it gives a lot of information :

  1. OSD map version id is e4055
  2. pool name is pool-A
  3. pool id is 36
  4. object name ( which was enquired , object-A and object-B )
  5. The Placement Group ids to which these objects belong are ( 36.68 ) and ( 36.7b )
  6. Our pool-A has its replication level set to 3 , so every object of this pool should have 3 copies on different OSDs ; here object-A's 3 copies reside on OSD.122 , OSD.63 and OSD.62
  • Log in to the ceph nodes hosting OSD 122 , 63 and 62
  • You can see your OSD mounted
# df -h /var/lib/ceph/osd/ceph-122
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdj1             2.8T  1.8T  975G  65% /var/lib/ceph/osd/ceph-122
#
  • Browse to the directory where ACTUAL OBJECTS are stored
# pwd
/var/lib/ceph/osd/ceph-122/current
#
  • Under this directory , if you do an ls you will see the PG IDs ; in our case the PG id for object-A is 36.68
# ls -la | grep -i 36.68
drwxr-xr-x 1 root root    54 Jan 24 16:45 36.68_head
#
  • Browse to the PG head directory and run ls . There you go , you have reached your OBJECT.
# pwd
/var/lib/ceph/osd/ceph-122/current/36.68_head
#
# ls -l
total 10240
-rw-r--r-- 1 root root 10485760 Jan 24 16:45 object-A__head_B301E3E8__24
#

Moral of the Story

  • A Ceph storage cluster can have more than one pool.
  • Each pool SHOULD have multiple Placement Groups . More PGs generally means better data distribution across OSDs , and hence better performance and reliability ( up to a point ; see the PG count formula below ).
  • A PG contains multiple objects.
  • A PG is spread over multiple OSDs , i.e. objects are spread across OSDs . The first OSD mapped to a PG is its primary OSD and the other OSDs of the same PG are its secondary OSDs ( you can verify this mapping with the quick check below ).
  • An object is mapped to exactly one PG.
  • Many PGs can be mapped to ONE OSD.
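
If you want to double-check the object --> PG --> OSD chain for any single PG , you can also query the PG directly ( shown here for PG 36.68 from the example above ; the up / acting OSD sets it reports should match the ceph osd map output shown earlier ) :

# ceph pg map 36.68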

How many PGs do you need for a POOL :


           (OSDs * 100)
Total PGs = ------------
              Replicas

# ceph osd stat
     osdmap e4055: 154 osds: 154 up, 154 in
#

Applying the formula gives ( 154 * 100 ) / 3 = 5133.33

Now round this value up to the next power of 2 ; that gives the number of PGs you should have for a pool with replication size 3 in a cluster of 154 OSDs.

Final Value = 8192 PG
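
If you prefer to let the shell do this arithmetic , a one-liner works too ( a minimal sketch assuming python is available on the node ; substitute your own OSD count and replica size ) :

# python -c "import math; osds, rep = 154, 3; print(2 ** int(math.ceil(math.log(osds * 100.0 / rep, 2))))"
8192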


Friday, January 10, 2014

CephFS with a dedicated pool


CephFS with a Dedicated Pool



This blog is about configuring a dedicated pool ( a user-defined pool ) for cephfs. If you are looking to configure cephfs itself , please visit the CephFS Step by Step blog.


  • Create a new pool for cephfs ( obviously you can use an existing pool )
# rados mkpool cephfs
  • Grab pool id
# ceph osd dump | grep -i cephfs
pool 34 'cephfs' rep size 2 min_size 1 crush_ruleset 0 object_hash rjenkins pg_num 8 pgp_num 8 last_change 860 owner 0
# 
  • Assign the pool to MDS
# ceph mds add_data_pool 34 
    • Mount your cephfs share
    # mount -t ceph 192.168.100.101:/ /cephfs -o name=cephfs,secretfile=/etc/ceph/client.cephfs
    
    • Check the current layout of cephfs ; you will notice that the default layout.data_pool is set to 0 , which means cephfs will store its data in pool 0 , i.e. the data pool
    # cephfs /cephfs/ show_layout
    layout.data_pool:     0
    layout.object_size:   4194304
    layout.stripe_unit:   4194304
    layout.stripe_count:  1
    
    • Set a new layout for data_pool in cephfs , using the pool id of the pool that we created above.
    # cephfs /cephfs/ set_layout -p 34
    # cephfs /cephfs/ show_layout
    layout.data_pool:     34
    layout.object_size:   4194304
    layout.stripe_unit:   4194304
    layout.stripe_count:  1
    [root@na_csc_fedora19 ~]#
    
    • Remount your cephfs share
    # umount /cephfs
    # mount -t ceph 192.168.100.101:/ /cephfs -o name=cephfs,secretfile=/etc/ceph/client.cephfs
    
    • Check the objects present in the cephfs pool ; there should be none , as this is a fresh pool and does not contain any data yet . But if you list the objects of any other pool ( the metadata pool , for example ) , you will see objects.
    # rados --pool=cephfs ls
    #
    # rados --pool=metadata ls
    1.00000000.inode
    100.00000000
    100.00000000.inode
    1.00000000
    2.00000000
    200.00000000
    this is a tesf fine
    200.00000001
    600.00000000
    601.00000000
    602.00000000
    603.00000000
    604.00000000
    605.00000000
    606.00000000
    607.00000000
    608.00000000
    609.00000000
    mds0_inotable
    mds0_sessionmap
    mds_anchortable
    mds_snaptable
    #
    
    • Go to your cephfs directory and create some files ( put data in your file ) .
    # cd /cephfs/
    # vi test
    
    • Recheck for objects in the cephfs pool ; now it will show you objects.
    # rados --pool=cephfs ls
    10000000005.00000000
    #
    To summarise : we created a new pool named "cephfs" , changed the layout of cephfs to store its data in the new pool "cephfs" , and finally we saw cephfs data getting stored in the pool named cephfs ( i know , that's a lot of cephfs ; read it again if you were sleeping and didn't follow )
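
    As an optional extra check ( the exact field names vary a little between ceph releases ) , you can dump the MDS map and confirm that pool id 34 now appears among the data pools :
    # ceph mds dump | grep -i pool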



    Thursday, January 9, 2014

    Kraken :: The First Free Ceph Dashboard in Town




    Kraken :: The Free Ceph Dashboard is Finally Live

    Kraken is a Free ceph dashboard for monitoring and statistics. Special thanks to Donald Talton for this beautiful dashboard. 


    Installing Kraken

    •  Install Prerequisites 
    # yum install git
    # yum install django
    # yum install python-pip
    # pip install requests
    Requirement already satisfied (use --upgrade to upgrade): requests in /usr/lib/python2.7/site-packages
    Cleaning up...
    #
    # pip install django
    Requirement already satisfied (use --upgrade to upgrade): django in /usr/lib/python2.7/site-packages
    Cleaning up...
    #
    # yum install screen
    • Create a new user account 
    # useradd kraken
    • Clone kraken from github
    # cd /home/kraken
    # git clone https://github.com/krakendash/krakendash
    Cloning into 'krakendash'...
    remote: Counting objects: 511, done.
    remote: Compressing objects: 100% (276/276), done.
    remote: Total 511 (delta 240), reused 497 (delta 226)
    Receiving objects: 100% (511/511), 1.53 MiB | 343.00 KiB/s, done.
    Resolving deltas: 100% (240/240), done.
    #
    • Execute api.sh and django.sh one by one ; they get launched inside screen sessions . Use Ctrl+A followed by D to detach from a screen

    # ./api.sh
    [detached from 14662.api]
    # ./django.sh
    [detached from 14698.django]
    #
    # ps -ef | grep -i screen
    root     14662     1  0 07:29 ?        00:00:00 SCREEN -S api sudo ceph-rest-api -c /etc/ceph/ceph.conf --cluster ceph -i admin
    root     14698     1  0 07:30 ?        00:00:00 SCREEN -S django sudo python krakendash/kraken/manage.py runserver 0.0.0.0:8000
    root     14704 14472  0 07:30 pts/0    00:00:00 grep --color=auto -i screen
    #
    • Open your browser and navigate to http://localhost:8000/


    kraken dashboard and kraken pools ( screenshots )

    • Great you have a Ceph GUI dashboard running now :-)
    • Watch this space for new features of kraken



    Thursday, January 2, 2014

    Zero To Hero Guide :: For CEPH CLUSTER PLANNING


    What it is all about :

    Whenever you think or talk about Ceph , the most common question that strikes your mind is "What Hardware Should I Select For My CEPH Storage Cluster ?" And yes , if this question really crossed your mind , congratulations , you seem to be serious about ceph technology . And you should be , because CEPH IS THE FUTURE OF STORAGE.

    Ceph runs on commodity hardware , ohh yeah !! everyone knows that by now . It is designed to build multi-petabyte storage clusters while providing enterprise-ready features : no single point of failure , scaling to exabytes , self-managing and self-healing ( saves operational cost ) , and running on commodity hardware ( no vendor lock-in , saves capital investment ).

    Ceph Overview :-

    The soul of the ceph storage cluster is RADOS ( Reliable Autonomic Distributed Object Store ). Ceph uses the powerful CRUSH ( Controlled Replication Under Scalable Hashing ) algorithm to optimize data placement and to stay self-managing and self-healing . The RESTful interface is provided by the Ceph Object Gateway ( RGW ) , aka the RADOS Gateway , and virtual disks are provisioned by the Ceph Block Device ( RBD ).



    Ceph Overview - Image Credit : Inktank


    Ceph Components :-

    # Ceph OSD ( Object Storage Daemon ) stores data in objects , manages data replication , recovery and rebalancing , and provides state information to the Ceph Monitors . It's recommended to use 1 OSD per physical disk.

    # Ceph MON ( Monitor ) maintains the overall health of the cluster by keeping cluster map state , including the Monitor map , OSD map , Placement Group ( PG ) map and CRUSH map . Monitors receive state information from the other components to maintain the maps , and circulate these maps to the other Monitor and OSD nodes.

    # Ceph RGW ( Object Gateway / RADOS Gateway ) provides a RESTful API interface compatible with Amazon S3 and OpenStack Swift.

    # Ceph RBD ( RADOS Block Device ) provides block storage to VMs / bare metal as well as regular clients , and supports OpenStack and CloudStack . Includes enterprise features like snapshots , thin provisioning and copy-on-write cloning.

    # CephFS ( Ceph File System ) provides distributed , POSIX-compliant NAS storage.
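
    To quickly see which of these daemons are up on an existing cluster , the standard status commands give a one-line summary each ( just a convenience check ; the output depends on your cluster ) :

    # ceph mon stat
    # ceph osd stat
    # ceph mds stat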


    Few Thumb Rules :-

    • Run OSDs on dedicated storage nodes ( servers with multiple disks ) ; the actual data is stored in the form of objects.
    • Run Monitors on separate dedicated hardware , or co-located with ceph client nodes ( other than OSD nodes ) such as RGW or CephFS nodes . For production it's recommended to run Monitors on dedicated low-cost servers , since Monitors are not resource hungry.


    Monitor Hardware Configuration :-

    The Monitor maintains the health of the entire cluster ; it holds the PG logs and OSD logs . A minimum of three monitor nodes is recommended for cluster quorum . Ceph monitor nodes are not resource hungry ; they work well with a fairly low-end CPU and modest memory . A 1U server with a low-cost processor such as the E5-2603 , 16GB RAM and a 1GbE network should be sufficient in most cases . If the PG , Monitor and OSD logs are stored on the local disk of the monitor node , make sure you have a sufficient amount of local storage so that it does not fill up.

    Unhealthy clusters require more storage for logs ; these can reach several GB , and even hundreds of GB , if the cluster is left unhealthy for a very long time . If verbose output is set on the monitor nodes , they are bound to generate a huge amount of logging information . Refer to the ceph documentation for the monitor log settings.
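
    For reference , monitor-side log verbosity is tuned in ceph.conf ; a snippet like the one below keeps monitor logging modest ( an illustrative example only , not a recommendation ; see the ceph documentation for the full list of debug settings ) :

    [mon]
        debug mon = 1
        debug paxos = 1
        debug auth = 1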

    It's recommended to run monitors on physically separate nodes , rather than all on one node or on virtual machines hosted on the same physical machine , to prevent a single point of failure.


    The Planning Stage :-

    Deploying a ceph cluster in production requires a little bit of homework . You should gather the information below so that you can design a better , more reliable and scalable ceph cluster to fit your IT needs . The answers are very specific to your needs and your IT environment , and they will help you size your storage requirements better.


    • Business Requirement
      • Budget ?
      • Do you need the Ceph cluster for day-to-day operations or for a special purpose ?
    • Technical Requirement
      • What applications will be running on your ceph cluster ?
      • What type of data will be stored on your ceph cluster ?
      • Should the ceph cluster be optimized for capacity or for performance ?
      • What should the usable storage capacity be ?
      • What is the expected growth rate ?
      • How many IOPS should the cluster support ?
      • How much throughput should the cluster support ?
      • How much data replication ( reliability level ) do you need ?


    Collect as much information as possible during the planning stage ; it will give you all the answers required to construct a better ceph cluster.

    The Physical Node and clustering technique:-

    In addition to the information collected above , also take into account rack density , power budget and data center space cost to size the optimal node configuration . Ceph replicates data across multiple nodes in a storage cluster to provide data redundancy and higher availability . It's important to consider :


    • Should the replicated node be on the same rack or multiple racks to avoid SPOF ?
    • Should the OSD traffic stay within the rack or span across racks in a dedicated or shared network ?
    • How many node failures can be tolerated ?
    • If the nodes are separated out across multiple racks , network traffic increases , and the impact of latency and the number of network switch hops should be considered.
    Ceph will automatically recover by re-replicating data from the failed nodes using the secondary copies present on other nodes in the cluster . A node failure thus has several effects :

    • Total cluster capacity is reduced by some fraction.
    • Total cluster throughput is reduced by some fraction.
    • The cluster enters a write-heavy recovery process.

    A general rule of thumb to calculate recovery time in a ceph cluster , given 1 disk per OSD node , is :

    Recovery Time in seconds = disk capacity in Gigabits / ( network speed *(nodes-1) )
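
    As a rough worked example ( assumed numbers , purely to illustrate the arithmetic ) : a 4 TB disk is about 32000 Gigabits , so on a 10 Gb/s network with 5 nodes :

    # python -c "print((4 * 8 * 1000) / (10.0 * (5 - 1)))"
    800.0

    i.e. roughly 13 minutes to re-replicate a single failed disk , before counting any other load on the cluster.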



    # POC Environment -- can have a minimum of 3 physical nodes with 10 OSDs each . This provides 66% cluster availability upon a physical node failure and 97% uptime upon an OSD failure . RGW and Monitor daemons can be put on OSD nodes , but this may impact performance and is not recommended for production.

    # Production Environment -- a minimum of 5 physically separated nodes and a minimum of 100 OSDs @ 4TB per OSD . The cluster capacity is then over 130TB usable ( roughly 400TB raw with 3x replication ) and provides 80% uptime on physical node failure and 99% uptime on OSD failure . RGW and Monitors should be on separate nodes.
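
    The usable-capacity figure is easy to recompute for your own numbers ( a quick sketch that assumes the 3-way replication used earlier in this guide ) :

    # python -c "osds, size_tb, replicas = 100, 4, 3; print(int(osds * size_tb / replicas))"
    133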

    Based on the outcome of the planning phase and the physical node and clustering stage , have a look at the hardware available in the market as per your budget.


    OSD CPU selection :-


    < Under Construction ... Stay Tuned >




    Monday, December 23, 2013

    Ceph and OpenStack in a Nutshell





    Ceph Filesystem ( CephFS) :: Step by Step Configuration


    CephFS 

    Ceph Filesystem is a POSIX-compliant file system that uses the ceph storage cluster to store its data . This is the only ceph component that is not yet ready for production ; I would rather call it ready for pre-production.


    Internals 
    Thanks to http://ceph.com/docs/master/cephfs/ for Image 

    Requirement of CephFS


    • You need a running ceph cluster with at least one MDS node. MDS is required for CephFS to work.
    • If you don't have an MDS , configure one :
      • # ceph-deploy mds create <MDS-NODE-ADDRESS>
    Note : If you are running short of hardware , or simply want to save some , you can run the MDS service on existing Monitor nodes ; the MDS service does not need many resources ( you can verify the MDS is up with the quick check shown after this list )
    • A Ceph client to mount cephFS
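
    A quick way to confirm the MDS is alive before continuing ( just a sanity check ; the output format differs between releases ) :
    # ceph mds stat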

    Configuring CephFS
    • Install ceph on client node
    [root@storage0101-ib ceph]# ceph-deploy install na_fedora19
    [ceph_deploy.cli][INFO  ] Invoked (1.3.2): /usr/bin/ceph-deploy install na_fedora19
    [ceph_deploy.install][DEBUG ] Installing stable version emperor on cluster ceph hosts na_csc_fedora19
    [ceph_deploy.install][DEBUG ] Detecting platform for host na_fedora19 ...
    [na_csc_fedora19][DEBUG ] connected to host: na_csc_fedora19
    [na_csc_fedora19][DEBUG ] detect platform information from remote host
    [na_csc_fedora19][DEBUG ] detect machine type
    [ceph_deploy.install][INFO  ] Distro info: Fedora 19 Schrödinger’s Cat
    [na_csc_fedora19][INFO  ] installing ceph on na_fedora19
    [na_csc_fedora19][INFO  ] Running command: rpm --import https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc
    [na_csc_fedora19][INFO  ] Running command: rpm -Uvh --replacepkgs --force --quiet http://ceph.com/rpm-emperor/fc19/noarch/ceph-release-1-0.fc19.noarch.rpm
    [na_csc_fedora19][DEBUG ] ########################################
    [na_csc_fedora19][DEBUG ] Updating / installing...
    [na_csc_fedora19][DEBUG ] ########################################
    [na_csc_fedora19][INFO  ] Running command: yum -y -q install ceph
    
    [na_csc_fedora19][ERROR ] Warning: RPMDB altered outside of yum.
    [na_csc_fedora19][DEBUG ] No Presto metadata available for Ceph
    [na_csc_fedora19][INFO  ] Running command: ceph --version
    [na_csc_fedora19][DEBUG ] ceph version 0.72.2 (a913ded2ff138aefb8cb84d347d72164099cfd60)
    [root@storage0101-ib ceph]#
    • Create a new pool for CephFS
    # rados mkpool cephfs
    • Create a new keyring (client.cephfs) for cephfs 
    # ceph auth get-or-create client.cephfs mon 'allow r' osd 'allow rwx pool=cephfs' -o /etc/ceph/client.cephfs.keyring
    • Extract secret key from keyring
    # ceph-authtool -p -n client.cephfs /etc/ceph/client.cephfs.keyring > /etc/ceph/client.cephfs
    • Copy the secret file to the client node under /etc/ceph . This allows the filesystem to be mounted when cephx authentication is enabled
    # scp client.cephfs na_fedora19:/etc/ceph
    client.cephfs                                                                100%   41     0.0KB/s   00:00
    • List all the keys on ceph cluster
    # ceph auth list                                               


    Option-1 : Mount CephFS with Kernel Driver


    • On the client machine , add a mount point entry in /etc/fstab . Provide the IP address of your ceph monitor node and the path of the secret key file that we created above
    192.168.200.101:6789:/ /cephfs ceph name=cephfs,secretfile=/etc/ceph/client.cephfs,noatime 0 2    
    • Mount the cephfs mount point . You might see a "mount: error writing /etc/mtab: Invalid argument" message , but you can ignore it and check df -h
    [root@na_fedora19 ceph]# mount /cephfs
    mount: error writing /etc/mtab: Invalid argument
    
    [root@na_fedora19 ceph]#
    [root@na_fedora19 ceph]# df -h
    Filesystem              Size  Used Avail Use% Mounted on
    /dev/vda1               7.8G  2.1G  5.4G  28% /
    devtmpfs                3.9G     0  3.9G   0% /dev
    tmpfs                   3.9G     0  3.9G   0% /dev/shm
    tmpfs                   3.9G  288K  3.9G   1% /run
    tmpfs                   3.9G     0  3.9G   0% /sys/fs/cgroup
    tmpfs                   3.9G  2.6M  3.9G   1% /tmp
    192.168.200.101:6789:/  419T  8.5T  411T   3% /cephfs
    [root@na_fedora19 ceph]#

    Option-2 : Mounting CephFS as FUSE
    • Copy the ceph configuration file ( ceph.conf ) from the monitor node to the client node and make sure it has 644 permissions
    # scp ceph.conf na_fedora19:/etc/ceph
    # chmod 644 ceph.conf
    • Copy the secret file from the monitor node to the client node under /etc/ceph . This allows the filesystem to be mounted when cephx authentication is enabled ( we already did this earlier )
    # scp client.cephfs na_fedora19:/etc/ceph
    client.cephfs                                                                100%   41     0.0KB/s   00:00
    • Make sure you have "ceph-fuse" package installed on client machine
    # rpm -qa | grep -i ceph-fuse
    ceph-fuse-0.72.2-0.fc19.x86_64 
    • To mount the Ceph Filesystem as FUSE , use the ceph-fuse command
    [root@na_fedora19 ceph]# ceph-fuse -m 192.168.100.101:6789  /cephfs
    ceph-fuse[3256]: starting ceph client
    ceph-fuse[3256]: starting fuse
    [root@na_csc_fedora19 ceph]#
    
    [root@na_fedora19 ceph]# df -h
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/vda1       7.8G  2.1G  5.4G  28% /
    devtmpfs        3.9G     0  3.9G   0% /dev
    tmpfs           3.9G     0  3.9G   0% /dev/shm
    tmpfs           3.9G  292K  3.9G   1% /run
    tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
    tmpfs           3.9G  2.6M  3.9G   1% /tmp
    ceph-fuse       419T  8.5T  411T   3% /cephfs
    [root@na_fedora19 ceph]#



    Thursday, December 5, 2013

    Ceph + OpenStack :: Part-5



    OpenStack Instance boot from Ceph Volume

    • Get a list of images to choose from when creating a bootable volume
    [root@rdo /(keystone_admin)]# nova image-list
    +--------------------------------------+-----------------------------+--------+--------+
    | ID                                   | Name                        | Status | Server |
    +--------------------------------------+-----------------------------+--------+--------+
    | f61edc8d-c9a1-4ff4-b4fc-c8128bd1a10b | Ubuntu 12.04 cloudimg amd64 | ACTIVE |        |
    | fcc07414-bbb3-4473-a8df-523664c8c9df | ceph-glance-image           | ACTIVE |        |
    | be62a5bf-879f-4d1f-846c-fdef960224ff | precise-cloudimg.raw        | ACTIVE |        |
    | 3c2db0ad-8d1e-400d-ba13-a506448f2a8e | precise-server-cloudimg     | ACTIVE |        |
    +--------------------------------------+-----------------------------+--------+--------+
    [root@rdo /(keystone_admin)]#
    
    • To create a bootable volume from an image , include the image ID in the command . Until the volume finishes building , its bootable state is false.
    [root@rdo qemu(keystone_admin)]# cinder create --image-id be62a5bf-879f-4d1f-846c-fdef960224ff --display-name my-boot-vol 10
    +---------------------+--------------------------------------+
    |       Property      |                Value                 |
    +---------------------+--------------------------------------+
    |     attachments     |                  []                  |
    |  availability_zone  |                 nova                 |
    |       bootable      |                false                 |
    |      created_at     |      2013-12-05T13:34:38.296723      |
    | display_description |                 None                 |
    |     display_name    |             my-boot-vol              |
    |          id         | 5fca6e1b-b494-4773-9c78-63f72703bfdf |
    |       image_id      | be62a5bf-879f-4d1f-846c-fdef960224ff |
    |       metadata      |                  {}                  |
    |         size        |                  10                  |
    |     snapshot_id     |                 None                 |
    |     source_volid    |                 None                 |
    |        status       |               creating               |
    |     volume_type     |                 None                 |
    +---------------------+--------------------------------------+
    [root@rdo qemu(keystone_admin)]#
    [root@rdo qemu(keystone_admin)]# cinder list
    +--------------------------------------+-------------+--------------+------+--------------+----------+--------------------------------------+
    |                  ID                  |    Status   | Display Name | Size | Volume Type  | Bootable |             Attached to              |
    +--------------------------------------+-------------+--------------+------+--------------+----------+--------------------------------------+
    | 0e2bfced-be6a-44ec-a3ca-22c771c66cdc |    in-use   |  nova-vol_1  |  2   |     None     |  false   | 9d3c327f-1893-40ff-8a82-16fad9ce6d91 |
    | 10cc0855-652a-4a9b-baa1-80bc86dc12ac |  available  |  ceph-vol1   |  5   | ceph-storage |  false   |                                      |
    | 5fca6e1b-b494-4773-9c78-63f72703bfdf | downloading | my-boot-vol  |  10  |     None     |  false   |                                      |
    +--------------------------------------+-------------+--------------+------+--------------+----------+--------------------------------------+
    
    
    • Wait a few minutes until the bootable state turns to true . Copy the value in the ID field of your volume.
    [root@rdo qemu(keystone_admin)]# cinder list
    +--------------------------------------+-----------+--------------+------+--------------+----------+--------------------------------------+
    |                  ID                  |   Status  | Display Name | Size | Volume Type  | Bootable |             Attached to              |
    +--------------------------------------+-----------+--------------+------+--------------+----------+--------------------------------------+
    | 0e2bfced-be6a-44ec-a3ca-22c771c66cdc |   in-use  |  nova-vol_1  |  2   |     None     |  false   | 9d3c327f-1893-40ff-8a82-16fad9ce6d91 |
    | 10cc0855-652a-4a9b-baa1-80bc86dc12ac | available |  ceph-vol1   |  5   | ceph-storage |  false   |                                      |
    | 5fca6e1b-b494-4773-9c78-63f72703bfdf | available | my-boot-vol  |  10  |     None     |   true   |                                      |
    +--------------------------------------+-----------+--------------+------+--------------+----------+--------------------------------------+
    [root@rdo qemu(keystone_admin)]#
    
    • Create a nova instance which will boot from the ceph volume
    [root@rdo qemu(keystone_admin)]# nova boot --flavor 2 --image be62a5bf-879f-4d1f-846c-fdef960224ff --block_device_mapping vda=5fca6e1b-b494-4773-9c78-63f72703bfdf::0 --security_groups=default --nic net-id=4fe5909e-02db-4517-89f2-1278248fa26c  myInstanceFromVolume
    +--------------------------------------+--------------------------------------+
    | Property                             | Value                                |
    +--------------------------------------+--------------------------------------+
    | OS-EXT-STS:task_state                | scheduling                           |
    | image                                | precise-cloudimg.raw                 |
    | OS-EXT-STS:vm_state                  | building                             |
    | OS-EXT-SRV-ATTR:instance_name        | instance-0000001e                    |
    | OS-SRV-USG:launched_at               | None                                 |
    | flavor                               | m1.small                             |
    | id                                   | f24a0b29-9f1e-444b-b895-c3c694f2f1bc |
    | security_groups                      | [{u'name': u'default'}]              |
    | user_id                              | 99f8019ba2694d78a680a5de46aa1afd     |
    | OS-DCF:diskConfig                    | MANUAL                               |
    | accessIPv4                           |                                      |
    | accessIPv6                           |                                      |
    | progress                             | 0                                    |
    | OS-EXT-STS:power_state               | 0                                    |
    | OS-EXT-AZ:availability_zone          | nova                                 |
    | config_drive                         |                                      |
    | status                               | BUILD                                |
    | updated                              | 2013-12-05T13:47:34Z                 |
    | hostId                               |                                      |
    | OS-EXT-SRV-ATTR:host                 | None                                 |
    | OS-SRV-USG:terminated_at             | None                                 |
    | key_name                             | None                                 |
    | OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                 |
    | name                                 | myInstanceFromVolume                 |
    | adminPass                            | qt34izQiLkG3                         |
    | tenant_id                            | 0dafe42cfde242ddbb67b681f59bdb00     |
    | created                              | 2013-12-05T13:47:34Z                 |
    | os-extended-volumes:volumes_attached | []                                   |
    | metadata                             | {}                                   |
    +--------------------------------------+--------------------------------------+
    [root@rdo qemu(keystone_admin)]#
    [root@rdo qemu(keystone_admin)]#
    [root@rdo qemu(keystone_admin)]#
    [root@rdo qemu(keystone_admin)]# nova list
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | ID                                   | Name                 | Status  | Task State | Power State | Networks            |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | 0043a8be-60d1-43ed-ba43-1ccd0bba7559 | instance2            | SHUTOFF | None       | Shutdown    | public=172.24.4.228 |
    | f24a0b29-9f1e-444b-b895-c3c694f2f1bc | myInstanceFromVolume | BUILD   | spawning   | NOSTATE     | private=10.0.0.3    |
    | 9d3c327f-1893-40ff-8a82-16fad9ce6d91 | small-ubuntu         | ACTIVE  | None       | Running     | public=172.24.4.230 |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    [root@rdo qemu(keystone_admin)]#
    
    • In just a few minutes the instance starts RUNNING ; time for a party now
    [root@rdo qemu(keystone_admin)]# nova list
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | ID                                   | Name                 | Status  | Task State | Power State | Networks            |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | 0043a8be-60d1-43ed-ba43-1ccd0bba7559 | instance2            | SHUTOFF | None       | Shutdown    | public=172.24.4.228 |
    | f24a0b29-9f1e-444b-b895-c3c694f2f1bc | myInstanceFromVolume | ACTIVE  | None       | Running     | private=10.0.0.3    |
    | 9d3c327f-1893-40ff-8a82-16fad9ce6d91 | small-ubuntu         | ACTIVE  | None       | Running     | public=172.24.4.230 |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    [root@rdo qemu(keystone_admin)]#
    

    OpenStack Instance boot from Ceph Volume :: Troubleshooting


    • During boot from volume , I encountered some errors after creating the nova instance ; the image was not able to boot from the volume
    [root@rdo nova(keystone_admin)]# nova boot --flavor 2 --image be62a5bf-879f-4d1f-846c-fdef960224ff --block_device_mapping vda=dd315dda-b22a-4cf8-8b77-7c2b2f163155:::0 --security_groups=default --nic net-id=4fe5909e-02db-4517-89f2-1278248fa26c  myInstanceFromVolume
    +--------------------------------------+----------------------------------------------------+
    | Property                             | Value                                              |
    +--------------------------------------+----------------------------------------------------+
    | OS-EXT-STS:task_state                | scheduling                                         |
    | image                                | precise-cloudimg.raw                               |
    | OS-EXT-STS:vm_state                  | building                                           |
    | OS-EXT-SRV-ATTR:instance_name        | instance-0000001d                                  |
    | OS-SRV-USG:launched_at               | None                                               |
    | flavor                               | m1.small                                           |
    | id                                   | f324e9b8-ec3a-4174-8b97-bf78dba62932               |
    | security_groups                      | [{u'name': u'default'}]                            |
    | user_id                              | 99f8019ba2694d78a680a5de46aa1afd                   |
    | OS-DCF:diskConfig                    | MANUAL                                             |
    | accessIPv4                           |                                                    |
    | accessIPv6                           |                                                    |
    | progress                             | 0                                                  |
    | OS-EXT-STS:power_state               | 0                                                  |
    | OS-EXT-AZ:availability_zone          | nova                                               |
    | config_drive                         |                                                    |
    | status                               | BUILD                                              |
    | updated                              | 2013-12-05T12:42:22Z                               |
    | hostId                               |                                                    |
    | OS-EXT-SRV-ATTR:host                 | None                                               |
    | OS-SRV-USG:terminated_at             | None                                               |
    | key_name                             | None                                               |
    | OS-EXT-SRV-ATTR:hypervisor_hostname  | None                                               |
    | name                                 | myInstanceFromVolume                               |
    | adminPass                            | eish5pu56CiE                                       |
    | tenant_id                            | 0dafe42cfde242ddbb67b681f59bdb00                   |
    | created                              | 2013-12-05T12:42:21Z                               |
    | os-extended-volumes:volumes_attached | [{u'id': u'dd315dda-b22a-4cf8-8b77-7c2b2f163155'}] |
    | metadata                             | {}                                                 |
    +--------------------------------------+----------------------------------------------------+
    [root@rdo nova(keystone_admin)]#
    [root@rdo nova(keystone_admin)]#
    [root@rdo nova(keystone_admin)]#
    [root@rdo nova(keystone_admin)]# nova list
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | ID                                   | Name                 | Status  | Task State | Power State | Networks            |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | 0043a8be-60d1-43ed-ba43-1ccd0bba7559 | instance2            | SHUTOFF | None       | Shutdown    | public=172.24.4.228 |
    | f324e9b8-ec3a-4174-8b97-bf78dba62932 | myInstanceFromVolume | ERROR   | None       | NOSTATE     | private=10.0.0.3    |
    | 9d3c327f-1893-40ff-8a82-16fad9ce6d91 | small-ubuntu         | ACTIVE  | None       | Running     | public=172.24.4.230 |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    [root@rdo nova(keystone_admin)]#
    
    • Checking the logs in /var/log/libvirt/qemu/instance-0000001d.log showed the following
    qemu-kvm: -drive file=rbd:ceph-volumes/volume-dd315dda-b22a-4cf8-8b77-7c2b2f163155:id=volumes:key=AQC804xS8HzFJxAAD/zzQ8LMzq9wDLq/5a472g==:auth_supported=cephx\;none:mon_host=192.168.1.31\:6789\;192.168.1.33\:6789\;192.168.1.38\:6789,if=none,id=drive-virtio-disk0,format=raw,serial=dd315dda-b22a-4cf8-8b77-7c2b2f163155,cache=none: could not open disk image rbd:ceph-volumes/volume-dd315dda-b22a-4cf8-8b77-7c2b2f163155:id=volumes:key=AQC804xS8HzFJxAAD/zzQ8LMzq9wDLq/5a472g==:auth_supported=cephx\;none:mon_host=192.168.1.31\:6789\;192.168.1.33\:6789\;192.168.1.38\:6789: No such file or directory
    2013-12-05 12:42:29.544+0000: shutting down
    
    • Run the qemu-img -h command to check the supported formats . Here I found that the rbd format is not supported by this qemu build , so something fishy is going on
    Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2 qed vhdx parallels nbd blkdebug host_cdrom host_floppy host_device file gluster gluster gluster gluster
    
    • Check the installed qemu version
    [root@rdo qemu(keystone_admin)]# rpm -qa | grep -i qemu
    qemu-img-0.12.1.2-2.415.el6_5.3.x86_64
    qemu-guest-agent-0.12.1.2-2.415.el6_5.3.x86_64
    gpxe-roms-qemu-0.9.7-6.10.el6.noarch
    qemu-kvm-0.12.1.2-2.415.el6_5.3.x86_64
    qemu-kvm-tools-0.12.1.2-2.415.el6_5.3.x86_64
    [root@rdo qemu(keystone_admin)]#
    
    • Have a look at the previous post to see the installation of the correct qemu version . After this your nova instance should boot from the volume


    [root@rdo qemu(keystone_admin)]# rpm -qa | grep -i qemu
    qemu-img-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64
    qemu-guest-agent-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64
    qemu-kvm-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64
    gpxe-roms-qemu-0.9.7-6.10.el6.noarch
    qemu-kvm-tools-0.12.1.2-2.355.el6.2.cuttlefish.async.x86_64
    [root@rdo qemu(keystone_admin)]#
    
    [root@rdo /(keystone_admin)]# qemu-img -h | grep -i rbd
    Supported formats: raw cow qcow vdi vmdk cloop dmg bochs vpc vvfat qcow2 qed parallels nbd blkdebug host_cdrom host_floppy host_device file rbd
    [root@rdo /(keystone_admin)]#
    
    
    [root@rdo qemu(keystone_admin)]# nova list
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | ID                                   | Name                 | Status  | Task State | Power State | Networks            |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    | 0043a8be-60d1-43ed-ba43-1ccd0bba7559 | instance2            | SHUTOFF | None       | Shutdown    | public=172.24.4.228 |
    | f24a0b29-9f1e-444b-b895-c3c694f2f1bc | myInstanceFromVolume | ACTIVE  | None       | Running     | private=10.0.0.3    |
    | 9d3c327f-1893-40ff-8a82-16fad9ce6d91 | small-ubuntu         | ACTIVE  | None       | Running     | public=172.24.4.230 |
    +--------------------------------------+----------------------+---------+------------+-------------+---------------------+
    [root@rdo qemu(keystone_admin)]#