Benchmarking EBS volumes in EC2
I ran the following benchmark in mid-2010 when evaluating Oracle database storage in EC2. We never made the move to EC2, in part because of the poor I/O performance of EBS volumes, even when aggregated in software RAID. The result spreadsheet is available here: ebsresult.ods
- Test protocol
All EBS volumes were zeroed before use (dd if=/dev/zero of=<VOLUME> bs=1M) to avoid the first-write penalty, i.e. the extra latency of the first write to each block of a new EBS volume.
The Bonnie++ command is:
# time bonnie++ -d /mnt/orabackup/store/ \
    -s 70G:4k -m ORACLE-EC2 -u oracle -g oinstall
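bonnie++ expects the test file to be at least twice the size of RAM, so that reads cannot be served from the page cache (it complains otherwise). The helper below is a small sketch, not part of the original setup: `bonnie_file_size` is a hypothetical name, and it only reads `MemTotal` from /proc/meminfo and prints a size usable with `-s`.

```sh
#!/bin/sh
# Hypothetical helper: suggest a bonnie++ "-s" value of twice the RAM size,
# so the test file cannot fit in the page cache. The meminfo path is a
# parameter only so the function can be tried on a copy of the file.
bonnie_file_size() {
    meminfo=${1:-/proc/meminfo}
    # MemTotal is reported in kB; convert 2 x RAM to GB, rounding up
    awk '/^MemTotal:/ {
        gb = $2 * 2 / 1048576
        r = int(gb)
        if (r < gb) r++
        print r "G"
    }' "$meminfo"
}

# Example: bonnie++ -d /mnt/orabackup/store/ -s "$(bonnie_file_size):4k" ...
```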
The sysbench script is:
#!/bin/sh
set -u
set -x
set -e
for size in 256M 16G; do
  for mode in seqwr seqrd rndrd rndwr rndrw; do
    # Create the single test file of $size used by all runs below
    sysbench --test=fileio --file-num=1 --file-total-size=$size prepare
    for threads in 1 4 8 16; do
      echo PARAMS $size $mode $threads > sysbench-size-$size-mode-$mode-threads-$threads
      # Each run lasts at most 60 s; "direct" opens the file with O_DIRECT (no page cache)
      sysbench --test=fileio --file-total-size=$size --file-test-mode=$mode \
        --max-time=60 --max-requests=10000000 --num-threads=$threads --init-rng=on \
        --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 run \
        >> sysbench-size-$size-mode-$mode-threads-$threads 2>&1
    done
    # Remove the test file (same --file-num as prepare)
    sysbench --test=fileio --file-num=1 --file-total-size=$size cleanup
  done
done
To parse sysbench results, I use the following script to generate a CSV output:
#!/bin/sh
# Prints one CSV line per sysbench result file; $1 labels the setup under test
TESTNAME="$1"
for size in 256M 16G; do
  for mode in seqwr seqrd rndrd rndwr rndrw; do
    for threads in 1 4 8 16; do
      FILE=sysbench-size-$size-mode-$mode-threads-$threads
      # Throughput from "Total transferred ... (<n>Mb/sec)"; sysbench's Mb are megabytes
      BYTESPERSEC=$(grep "Total transferred" $FILE|awk '{print $8}'|cut -d '(' -f 2|cut -d 'M' -f 1)
      REQPERSEC=$(grep "Requests/sec executed" $FILE|awk '{print $1}')
      # Per-request latency statistics: min, avg, max, 95th percentile
      MINREQRESP=$(grep "min:" $FILE|awk '{print $2}')
      AVGREQRESP=$(grep "avg:" $FILE|awk '{print $2}')
      MAXREQRESP=$(grep "max:" $FILE|awk '{print $2}')
      PCTREQRESP=$(grep "approx. 95 percentile:" $FILE|awk '{print $4}')
      echo "$TESTNAME,$size,$mode,$threads,$BYTESPERSEC,$REQPERSEC,$MINREQRESP,$AVGREQRESP,$MAXREQRESP,$PCTREQRESP"
    done
  done
done
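With one CSV line per size/mode/thread combination, comparing setups mostly means filtering on one mode. A sketch of a hypothetical helper, `summarize_mode`, assuming the column order produced by the parsing script above (TESTNAME, size, mode, threads, MB/s, requests/s, min, avg, max, 95th percentile):

```sh
#!/bin/sh
# Hypothetical helper: show one sysbench mode from the CSV as a small table.
# Assumed columns: TESTNAME,size,mode,threads,MB/s,req/s,min,avg,max,p95
summarize_mode() {
    csv=$1
    mode=$2
    printf '%-6s %7s %10s %10s %10s\n' size threads MB/s req/s p95
    awk -F, -v mode="$mode" '$3 == mode {
        printf "%-6s %7s %10s %10s %10s\n", $2, $4, $5, $6, $10
    }' "$csv"
}

# Example: sh parse.sh RAID10 > raid10.csv; summarize_mode raid10.csv rndrd
```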
- Non-EBS ephemeral storage (as a reference)
# time bonnie++ -d /mnt/test/ -s 70G:4k -m ORACLE-EC2-EPHEMERAL -u oracle -g oinstall
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2-E 70G:4k 28674  34 76965  14 32880   1 55955  57 73941   0 144.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
ORACLE-EC2-EPHEMERAL,70G:4k,28674,34,76965,14,32880,1,55955,57,73941,0,144.7,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
real    140m24.643s
user    24m3.727s
sys     6m16.934s
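Bonnie++ reports throughput in K/sec. The sketch below (hypothetical function name `bonnie_summary`) converts the block write, rewrite, block read and seek columns of a bonnie++ 1.03 CSV line, such as the last line of each run on this page, into MB/s. bonnie++ also ships `bon_csv2txt` and `bon_csv2html` for fuller reports.

```sh
#!/bin/sh
# Hypothetical helper: summarize bonnie++ 1.03 CSV lines (files or stdin).
# 1.03 CSV fields used: $1 machine, $5 block write K/s, $7 rewrite K/s,
# $11 block read K/s, $13 random seeks/s.
bonnie_summary() {
    awk -F, '{
        printf "%s: write %.1f MB/s, rewrite %.1f MB/s, read %.1f MB/s, %.1f seeks/s\n",
            $1, $5 / 1024, $7 / 1024, $11 / 1024, $13
    }' "$@"
}
```

Fed the ephemeral-storage CSV line above, it reports about 75.2 MB/s block writes, 32.1 MB/s rewrites, 72.2 MB/s block reads and 144.7 seeks/s.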
- LVM over 2 EBS
Two EBS volumes of 200GB each in a basic LVM setup (no RAID), with an XFS file system created with default options (block size = 4KB).
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2   70G:4k 63355  71 66032   8 28465   0 58438  64 84394   0 179.6   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2369   2 +++++ +++  2698   0  4049   3 +++++ +++  2491   1
ORACLE-EC2,70G:4k,63355,71,66032,8,28465,0,58438,64,84394,0,179.6,0,16,2369,2,+++++,+++,2698,0,4049,3,+++++,+++,2491,1
real    117m46.785s
user    24m49.691s
sys     4m25.653s
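The commands for this layout are not shown here. A plausible sketch, assuming a plain linear logical volume: the function name, VG/LV names, size and devices are hypothetical. Without `-i/--stripes`, `lvcreate` concatenates the physical volumes instead of striping across them, so any given range of the logical volume lives on a single EBS volume; `lvcreate -i 2 -I <stripe size>` would be the LVM-level striping alternative.

```sh
#!/bin/sh
# Hypothetical sketch of a linear (non-striped) LVM volume over several disks.
# Usage: setup_linear_lvm VG LV SIZE DEVICE...   (set RUN=echo for a dry run)
setup_linear_lvm() {
    vg=$1; lv=$2; size=$3; shift 3
    ${RUN-} pvcreate "$@"
    ${RUN-} vgcreate "$vg" "$@"
    # No -i/--stripes: extents are allocated linearly, filling one PV after the other
    ${RUN-} lvcreate -L "$size" -n "$lv" "$vg"
    ${RUN-} mkfs.xfs "/dev/$vg/$lv"
}

# Example (dry run): RUN=echo setup_linear_lvm ORABACKUP store 390G /dev/sdg1 /dev/sdg2
```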
- LVM over RAID 0 over 2 EBS
Two EBS volumes of 200GB each, striped with a 4KB chunk size:
# mdadm --create --verbose /dev/md0 --level=0 --raid-devices=2 --chunk=4 --name="orabackup" /dev/sdg{1,2}
mdadm: array /dev/md0 started.
# pvcreate /dev/md0
Physical volume "/dev/md0" successfully created
# vgcreate ORABACKUP /dev/md0
Volume group "ORABACKUP" successfully created
# lvcreate -L 390G -n store ORABACKUP
Logical volume "store" created
# mkfs.xfs /dev/ORABACKUP/store
meta-data=/dev/ORABACKUP/store   isize=256    agcount=16, agsize=6389760 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=102236160, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mount /dev/ORABACKUP/store /mnt/orabackup/store/
# time bonnie++ -d /mnt/orabackup/store/ -s 70G:4k -m ORACLE-EC2 -u oracle -g oinstall
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2   70G:4k 88914  98 112718  16 20814   0 14713  14 63929   0 265.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  2575   2 +++++ +++  3405   1  3839   4 +++++ +++  2735   1
ORACLE-EC2,70G:4k,88914,98,112718,16,20814,0,14713,14,63929,0,265.8,0,16,2575,2,+++++,+++,3405,1,3839,4,+++++,+++,2735,1
real    186m44.444s
user    23m18.176s
sys     4m52.625s
- LVM over RAID 10 2*2 EBS
Four 100GB EBS volumes, first aggregated in pairs into two RAID1 mirrors, then striped together in RAID0, topped with LVM.
$ for i in $(seq 1 4); do ./ec2-create-volume -K ../../pk.pem -C ../../cert.pem -s 100 -z us-east-1d; done
VOLUME vol-9953e6f2 100 us-east-1d creating 2011-05-20T12:51:47+0000
VOLUME vol-9f53e6f4 100 us-east-1d creating 2011-05-20T12:51:51+0000
VOLUME vol-6b52e700 100 us-east-1d creating 2011-05-20T12:51:55+0000
VOLUME vol-6d52e706 100 us-east-1d creating 2011-05-20T12:52:00+0000
$ count=1; for i in vol-9953e6f2 vol-9f53e6f4 vol-6b52e700 vol-6d52e706; do ./ec2-attach-volume -K ../../pk.pem -C ../../cert.pem $i -i i-927921fd -d /dev/sdh$count; count=$(( count + 1 )); done
ATTACHMENT vol-9953e6f2 i-927921fd /dev/sdh1 attaching 2011-05-20T12:56:56+0000
ATTACHMENT vol-9f53e6f4 i-927921fd /dev/sdh2 attaching 2011-05-20T12:57:00+0000
ATTACHMENT vol-6b52e700 i-927921fd /dev/sdh3 attaching 2011-05-20T12:57:03+0000
ATTACHMENT vol-6d52e706 i-927921fd /dev/sdh4 attaching 2011-05-20T12:57:07+0000
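The ec2-api-tools used above have since been superseded by the AWS CLI. A rough equivalent of the attach loop, assuming a configured AWS CLI; `attach_volumes` is a hypothetical helper, and the device names accepted depend on the instance type, so the prefix is a parameter.

```sh
#!/bin/sh
# Hypothetical helper: attach volumes as PREFIX1, PREFIX2, ... with the AWS CLI.
# Usage: attach_volumes INSTANCE_ID DEVICE_PREFIX VOLUME_ID...
# Set RUN=echo to print the commands instead of running them.
attach_volumes() {
    instance=$1; prefix=$2; shift 2
    n=1
    for vol in "$@"; do
        ${RUN-} aws ec2 attach-volume --volume-id "$vol" \
            --instance-id "$instance" --device "$prefix$n"
        n=$((n + 1))
    done
}

# Example: attach_volumes i-927921fd /dev/sdh vol-9953e6f2 vol-9f53e6f4
```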
# dd if=/dev/zero of=/dev/sdh1 bs=1M &
[1] 17737
# dd if=/dev/zero of=/dev/sdh2 bs=1M &
[2] 17738
# dd if=/dev/zero of=/dev/sdh3 bs=1M &
[3] 17739
# dd if=/dev/zero of=/dev/sdh4 bs=1M &
[4] 17740
# mdadm --create /dev/md1 --verbose --level=raid1 --raid-devices=2 /dev/sdh1 /dev/sdh2
mdadm: size set to 104857536K
mdadm: array /dev/md1 started.
# mdadm --create /dev/md2 --verbose --level=raid1 --raid-devices=2 /dev/sdh3 /dev/sdh4
mdadm: size set to 104857536K
mdadm: array /dev/md2 started.
# mdadm --create /dev/md3 --verbose --chunk=4 --level=raid0 --raid-devices=2 /dev/md1 /dev/md2
mdadm: array /dev/md3 started.
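A new RAID1 array starts an initial resync in the background, which competes with benchmark I/O. `mdadm --wait /dev/md1 /dev/md2` blocks until that activity is finished; the sketch below (hypothetical `md_busy`) does a similar check by looking for resync or recovery progress lines in /proc/mdstat.

```sh
#!/bin/sh
# Hypothetical helper: succeed if any md array is still resyncing, recovering,
# reshaping or checking, according to /proc/mdstat (path overridable).
md_busy() {
    mdstat=${1:-/proc/mdstat}
    # Matches e.g. "resync =  8.5% ..." and "resync=DELAYED"
    grep -Eq '(resync|recovery|reshape|check) *=' "$mdstat"
}

# Example: poll before benchmarking
# while md_busy; do sleep 60; done
```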
# pvcreate /dev/md3
Physical volume "/dev/md3" successfully created
# vgcreate RAID10 /dev/md3
Volume group "RAID10" successfully created
# lvcreate -L 190G -n store RAID10
Logical volume "store" created
# mkfs.xfs /dev/RAID10/store
meta-data=/dev/RAID10/store      isize=256    agcount=16, agsize=3112960 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=49807360, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal log           bsize=4096   blocks=24320, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
# mkdir -p /mnt/raid10/store
# mount /dev/RAID10/store /mnt/raid10/store/
# chown oracle:oinstall /mnt/raid10/ -R
# time bonnie++ -d /mnt/raid10/store/ -s 70G:4k -m ORACLE-EC2-RAID10 -u oracle -g oinstall
Using uid:500, gid:500.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.03        ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
ORACLE-EC2-R 70G:4k 59424  68 56086   9 16275   0 14360  15 33553   0 289.8   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  1082   1 +++++ +++  1436   2  1775   3 +++++ +++  1307   2
ORACLE-EC2-RAID10,70G:4k,59424,68,56086,9,16275,0,14360,15,33553,0,289.8,0,16,1082,1,+++++,+++,1436,2,1775,3,+++++,+++,1307,2
real    240m57.833s
user    24m22.156s
sys     6m5.419s
- LVM over RAID 0 over 4 EBS
Four 100GB EBS volumes aggregated in RAID0 with a chunk size of 8KB, topped with LVM.
mkfs.xfs -f -d su=8k,sw=4 -i attr=2 /dev/ORACLE/oradata
mount -o noatime,nodiratime,logbufs=8,osyncisdsync /dev/ORACLE/oradata /home/oradata/
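Here su and sw describe the stripe: su is the md chunk size and sw the number of data disks (4 for a four-volume RAID0). The mkfs.xfs output of the earlier arrays shows sunit=0 swidth=0, i.e. no stripe geometry was recorded there. A sketch (hypothetical `xfs_stripe_opts`) that reads the geometry of an md RAID0 array from sysfs (`level`, `chunk_size` in bytes, `raid_disks`) and prints matching options:

```sh
#!/bin/sh
# Hypothetical helper: derive mkfs.xfs stripe options from an md RAID0 array.
# Argument: the array's sysfs md directory (default /sys/block/md0/md).
xfs_stripe_opts() {
    md=${1:-/sys/block/md0/md}
    level=$(cat "$md/level")
    if [ "$level" != "raid0" ]; then
        echo "expected a raid0 array, got $level" >&2
        return 1
    fi
    chunk=$(cat "$md/chunk_size")    # chunk size in bytes
    disks=$(cat "$md/raid_disks")    # every RAID0 member holds data
    printf '%s\n' "-d su=$((chunk / 1024))k,sw=$disks"
}

# Example: mkfs.xfs -f $(xfs_stripe_opts /sys/block/md0/md) /dev/ORACLE/oradata
```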
Settling for 2x 1TB over RAID1
I ran into a bug with older kernels and a RAID1 of 1TB EBS volumes on m2.4xlarge instances: the instance appears to freeze under heavy I/O. This happened with an older kernel (2.6.21) and does not seem to occur with a more recent 2.6.35 kernel.
Creating the RAID
# mdadm --create /dev/md0 --level=raid1 --verbose --raid-devices=2 --metadata=1.1 /dev/xvdf{1,2}
mdadm: size set to 1073740668K
mdadm: array /dev/md0 started.
Creating the file system
I initially wanted to create a filesystem with a block size of 8KB, because that's what Oracle recommends. But on Linux, the filesystem block size cannot be larger than the page size, which is 4KB on x86. So I settled for the XFS default options, which are usually the best choice anyway.
# mkfs.xfs /dev/md0
Mount options
mount -o noatime,nodiratime,logbufs=8 /dev/md0 /home/oradata/
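To confirm which options are actually in effect, for example after moving this mount to /etc/fstab, one can look the mount point up in /proc/mounts. A sketch with a hypothetical helper name, `mount_has_opt`:

```sh
#!/bin/sh
# Hypothetical helper: succeed if mount point $1 is mounted with option $2.
# Usage: mount_has_opt MOUNTPOINT OPTION [MOUNTS_FILE]
mount_has_opt() {
    mnt=${1%/}; [ -n "$mnt" ] || mnt=/   # /proc/mounts has no trailing slash
    opt=$2
    mounts=${3:-/proc/mounts}
    # /proc/mounts fields: device mountpoint fstype options dump pass
    awk -v m="$mnt" -v o="$opt" '
        $2 == m { n = split($4, a, ","); for (i = 1; i <= n; i++) if (a[i] == o) found = 1 }
        END { exit !found }
    ' "$mounts"
}

# Example: mount_has_opt /home/oradata noatime || echo "noatime is not active"
```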