Benchmarking EBS volumes in EC2

I ran the following benchmark in mid-2010 when evaluating Oracle database storage in EC2. We never made the move to EC2, in part because of the poor I/O performance of EBS volumes, even when aggregated in software RAID. The result spreadsheet is available here: ebsresult.ods

- Test protocol

All EBS volumes were zeroed before use (dd if=/dev/zero of=<VOLUME> bs=1M) to avoid the first-write penalty.

The bonnie++ command is:
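
When several volumes need zeroing, the dd runs can go in parallel, as long as nothing else touches the volumes before every dd has finished. A minimal sketch, assuming a hypothetical helper named zero_volumes and a hypothetical COUNT_MB variable that caps the write for dry runs on regular files (neither is part of the original setup):

```shell
# zero_volumes TARGET...
# Write zeros over every target in parallel, then block until all writers exit.
# COUNT_MB (hypothetical, optional): stop after that many 1 MiB blocks.
# Leave it unset for real block devices so dd runs to the end of the device.
zero_volumes() {
    for target in "$@"; do
        if [ -n "${COUNT_MB:-}" ]; then
            dd if=/dev/zero of="$target" bs=1M count="$COUNT_MB" 2>/dev/null &
        else
            # dd exits non-zero with "No space left on device" once the
            # device is full; that is the expected way for this run to end.
            dd if=/dev/zero of="$target" bs=1M 2>/dev/null &
        fi
    done
    # Do not build arrays or file systems until every dd is done.
    wait
}
```

For the four-volume setup further down, the call would look like zero_volumes /dev/sdh1 /dev/sdh2 /dev/sdh3 /dev/sdh4, followed by the mdadm commands only once it returns.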

# time bonnie++ -d /mnt/orabackup/store/ \
-s 70G:4k -m ORACLE-EC2 -u oracle -g oinstall

The sysbench script is:

set -u
set -x
set -e

for size in 256M 16G; do
   for mode in seqwr seqrd rndrd rndwr rndrw; do
      # create the single test file for this size/mode combination
      sysbench --test=fileio --file-num=1 --file-total-size=$size prepare
      for threads in 1 4 8 16; do
         echo PARAMS $size $mode $threads > sysbench-size-$size-mode-$mode-threads-$threads
         sysbench --test=fileio --file-total-size=$size --file-test-mode=$mode \
            --max-time=60 --max-requests=10000000 --num-threads=$threads --init-rng=on \
            --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 run \
            >> sysbench-size-$size-mode-$mode-threads-$threads 2>&1
      done
      # remove the test file before moving on to the next mode
      sysbench --test=fileio --file-num=1 --file-total-size=$size cleanup
   done
done

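To parse the sysbench results, I use the following script to generate CSV output:

for size in 256M 16G; do
   for mode in seqwr seqrd rndrd rndwr rndrw; do
      for threads in 1 4 8 16; do
        # result file written by the benchmark script
        FILE=sysbench-size-$size-mode-$mode-threads-$threads
        # throughput in MB/s, despite the variable name
        BYTESPERSEC=$(grep "Total transferred" "$FILE"|awk '{print $8}'|cut -d '(' -f 2|cut -d 'M' -f 1)
        REQPERSEC=$(grep "Requests/sec executed" "$FILE"|awk '{print $1}')
        MINREQRESP=$(grep "min:" "$FILE"|awk '{print $2}')
        AVGREQRESP=$(grep "avg:" "$FILE"|awk '{print $2}')
        MAXREQRESP=$(grep "max:" "$FILE"|awk '{print $2}')
        PCTREQRESP=$(grep "approx.  95 percentile:" "$FILE"|awk '{print $4}')
        # one CSV row per run (column order is an assumption, the original output line is missing)
        echo "$size,$mode,$threads,$BYTESPERSEC,$REQPERSEC,$MINREQRESP,$AVGREQRESP,$MAXREQRESP,$PCTREQRESP"
      done
   done
done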
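
Because the benchmark script runs under set -e, a failed sysbench call stops it early and leaves some result files unwritten. Before parsing, it may help to check that all 40 files are there. A small sketch, assuming two hypothetical helpers (expected_results and missing_results) that reuse the file naming scheme of the benchmark script:

```shell
# expected_results
# Print the 40 result file names (2 sizes x 5 modes x 4 thread counts)
# using the naming scheme of the benchmark script.
expected_results() {
    for size in 256M 16G; do
        for mode in seqwr seqrd rndrd rndwr rndrw; do
            for threads in 1 4 8 16; do
                echo "sysbench-size-$size-mode-$mode-threads-$threads"
            done
        done
    done
}

# missing_results [DIR]
# Print the expected result files that do not exist in DIR (default: .).
missing_results() {
    dir=${1:-.}
    expected_results | while read -r name; do
        [ -f "$dir/$name" ] || echo "$name"
    done
}
```

Running missing_results in the directory that holds the results prints nothing when every run produced a file.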

- Non-EBS ephemeral storage (as a reference)

# time bonnie++ -d /mnt/test/ -s 70G:4k -m ORACLE-EC2-EPHEMERAL -u oracle -g oinstall
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2-E 70G:4k 28674  34 76965  14 32880   1 55955  57 73941   0 144.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

real    140m24.643s
user    24m3.727s
sys     6m16.934s

- LVM over 2 EBS

2 EBS volumes of 200GB each, basic LVM setup, no RAID, XFS file system with default options (block size = 4KB).

Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2   70G:4k 63355  71 66032   8 28465   0 58438  64 84394   0 179.6   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2369   2 +++++ +++  2698   0  4049   3 +++++ +++  2491   1

real	117m46.785s
user	24m49.691s
sys	4m25.653s

- LVM over RAID 0 over 2 EBS

2 EBS volumes of 200GB each.

# mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 --chunk=4 --name="orabackup" /dev/sdg{1,2}
mdadm: array /dev/md0 started.
# pvcreate /dev/md0
  Physical volume "/dev/md0" successfully created

# vgcreate ORABACKUP /dev/md0
  Volume group "ORABACKUP" successfully created

# lvcreate -L 390G -n store ORABACKUP
  Logical volume "store" created
# mkfs.xfs /dev/ORABACKUP/store
meta-data=/dev/ORABACKUP/store   isize=256    agcount=16, agsize=6389760 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=102236160, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal log           bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

# mount /dev/ORABACKUP/store /mnt/orabackup/store/
# time bonnie++ -d /mnt/orabackup/store/ -s 70G:4k -m ORACLE-EC2 -u oracle -g oinstall
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2   70G:4k 88914  98 112718  16 20814   0 14713  14 63929   0 265.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2575   2 +++++ +++  3405   1  3839   4 +++++ +++  2735   1

real    186m44.444s
user    23m18.176s
sys     4m52.625s

- LVM over RAID 10 2*2 EBS

4 EBS volumes of 100GB each, aggregated in RAID1 first (2 by 2) and then in a RAID0, topped with LVM.
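
The usable capacity of each layout follows from simple arithmetic: RAID0 adds the members up, RAID1 keeps the size of one member, and RAID10 keeps half of the total. A rough sketch, assuming a hypothetical helper named raid_usable_gb that ignores md and LVM metadata overhead:

```shell
# raid_usable_gb LEVEL COUNT SIZE_GB
# Print the approximate usable capacity, in GB, of COUNT equal-sized
# members of SIZE_GB each at the given RAID level (0, 1 or 10).
# Metadata overhead is ignored, so real arrays come out slightly smaller.
raid_usable_gb() {
    level=$1
    count=$2
    size_gb=$3
    case "$level" in
        0)  echo $(( count * size_gb )) ;;        # striping: all members add up
        1)  echo "$size_gb" ;;                    # mirroring: one member's worth
        10) echo $(( count / 2 * size_gb )) ;;    # striped mirrors: half the total
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}
```

For the layout of this section, raid_usable_gb 10 4 100 gives 200, which is why the logical volume is created a little below that, at 190G.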

$ for i in $(seq 1 4);do ./ec2-create-volume -K ../../pk.pem -C ../../cert.pem -s 100 -z us-east-1d;done
VOLUME	vol-9953e6f2	100		us-east-1d	creating	2011-05-20T12:51:47+0000
VOLUME	vol-9f53e6f4	100		us-east-1d	creating	2011-05-20T12:51:51+0000
VOLUME	vol-6b52e700	100		us-east-1d	creating	2011-05-20T12:51:55+0000
VOLUME	vol-6d52e706	100		us-east-1d	creating	2011-05-20T12:52:00+0000
$ count=1;for i in vol-9953e6f2 vol-9f53e6f4 vol-6b52e700 vol-6d52e706 ;do ./ec2-attach-volume -K ../../pk.pem -C ../../cert.pem $i -i i-927921fd -d /dev/sdh$count;count=$(( count + 1 ));done
ATTACHMENT	vol-9953e6f2	i-927921fd	/dev/sdh1	attaching	2011-05-20T12:56:56+0000
ATTACHMENT	vol-9f53e6f4	i-927921fd	/dev/sdh2	attaching	2011-05-20T12:57:00+0000
ATTACHMENT	vol-6b52e700	i-927921fd	/dev/sdh3	attaching	2011-05-20T12:57:03+0000
ATTACHMENT	vol-6d52e706	i-927921fd	/dev/sdh4	attaching	2011-05-20T12:57:07+0000
# dd if=/dev/zero of=/dev/sdh1 bs=1M &
[1] 17737
# dd if=/dev/zero of=/dev/sdh2 bs=1M &
[2] 17738
# dd if=/dev/zero of=/dev/sdh3 bs=1M &
[3] 17739
# dd if=/dev/zero of=/dev/sdh4 bs=1M &
[4] 17740
# mdadm --create /dev/md1 --verbose --level=raid1 --raid-devices=2 /dev/sdh1 /dev/sdh2
mdadm: size set to 104857536K
mdadm: array /dev/md1 started.
# mdadm --create /dev/md2 --verbose --level=raid1 --raid-devices=2 /dev/sdh3 /dev/sdh4
mdadm: size set to 104857536K
mdadm: array /dev/md2 started.
# mdadm --create /dev/md3 --verbose --chunk=4 --level=raid0 --raid-devices=2 /dev/md1 /dev/md2
mdadm: array /dev/md3 started.
# pvcreate /dev/md3
  Physical volume "/dev/md3" successfully created
# vgcreate RAID10 /dev/md3
  Volume group "RAID10" successfully created
# lvcreate -L 190G -n store RAID10
  Logical volume "store" created
# mkfs.xfs /dev/RAID10/store 
meta-data=/dev/RAID10/store      isize=256    agcount=16, agsize=3112960 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=49807360, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal log           bsize=4096   blocks=24320, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkdir -p /mnt/raid10/store
# mount /dev/RAID10/store /mnt/raid10/store/
# chown oracle:oinstall /mnt/raid10/ -R
# time bonnie++ -d /mnt/raid10/store/ -s 70G:4k -m ORACLE-EC2-RAID10 -u oracle -g oinstall
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2-R 70G:4k 59424  68 56086   9 16275   0 14360  15 33553   0 289.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1082   1 +++++ +++  1436   2  1775   3 +++++ +++  1307   2

real    240m57.833s
user    24m22.156s
sys     6m5.419s

- LVM over RAID 0 4 EBS

4 EBS volumes of 100GB each, aggregated in RAID0 with a chunk size of 8KB, topped with LVM.

mkfs.xfs -f -d su=8k,sw=4 -i attr=2 /dev/ORACLE/oradata

mount -o noatime,nodiratime,logbufs=8,osyncisdsync /dev/ORACLE/oradata /home/oradata/
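
The su and sw values passed to mkfs.xfs follow from the array geometry: su is the RAID chunk size and sw is the number of data-bearing disks in the stripe. The mkfs.xfs outputs in the earlier sections report sunit=0 and swidth=0, so no stripe alignment was applied there. A minimal sketch, assuming a hypothetical helper named xfs_stripe_opts:

```shell
# xfs_stripe_opts CHUNK_KB DATA_DISKS
# Print the value for mkfs.xfs -d that aligns the file system with a
# striped array: su = chunk size, sw = number of data-bearing disks.
# For RAID0 every member carries data; for RAID10 only half of them do.
xfs_stripe_opts() {
    chunk_kb=$1
    data_disks=$2
    echo "su=${chunk_kb}k,sw=${data_disks}"
}
```

With an 8KB chunk over 4 disks in RAID0, xfs_stripe_opts 8 4 prints the su=8k,sw=4 string used in the mkfs.xfs command of this section.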

- Settling for 2x 1TB over RAID1

I noticed a bug with older kernels and RAID1 over 1TB EBS volumes on m2.4xlarge instances: the instance appears to freeze under heavy I/O. That was with kernel 2.6.21, and it doesn't seem to be a problem with a more recent 2.6.35 kernel.

Creating the RAID
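
A setup script can warn when it runs on a kernel older than the one that behaved well here. The sketch below assumes a hypothetical helper named version_ge. Note that 2.6.35 is only the version I saw working, not a known boundary for the fix:

```shell
# version_ge HAVE WANT
# Succeed when kernel version HAVE is at least WANT, comparing the first
# three numeric fields (major, minor, patch). Suffixes such as
# "-22-virtual" are ignored.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, /[.-]/)
        split(b, y, /[.-]/)
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

A typical use would be: version_ge "$(uname -r)" 2.6.35 || echo "kernel older than 2.6.35, RAID1 over 1TB EBS volumes may freeze" >&2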

# mdadm --create /dev/md0 --level=raid1 --verbose --raid-devices=2 --metadata=1.1 /dev/xvdf{1,2}
mdadm: size set to 1073740668K
mdadm: array /dev/md0 started.

Creating the file system.

I initially wanted to create a file system with a block size of 8KB, because that's what Oracle recommends. But on Linux, the file system block size cannot be larger than the page size, which is 4KB here. So I settled for the XFS default options, which are usually the best choice anyway.
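
The page size limit can be checked before picking a block size. A minimal sketch, assuming a hypothetical helper named fs_block_size_ok and the constraint as it applied to the kernels used here:

```shell
# fs_block_size_ok BYTES
# Succeed when a file system block size of BYTES does not exceed the
# memory page size reported by getconf, fail otherwise.
fs_block_size_ok() {
    page_size=$(getconf PAGESIZE)
    [ "$1" -le "$page_size" ]
}
```

On an instance with 4KB pages, fs_block_size_ok 4096 succeeds and fs_block_size_ok 8192 fails, which matches the choice of the default 4KB XFS block size.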

# mkfs.xfs /dev/md0

Mount options

mount -o noatime,nodiratime,logbufs=8 /dev/md0 /home/oradata/