Discussion:
ahcisata poor performance
Dima Veselov
2014-07-13 22:38:14 UTC
Hello!

Can someone tell me - is there a way to diagnose poor hard drive performance?

Two client computers - Windows and Linux - are connected to a NetBSD
host over SMB and NFS.

Writing over SMB (GE interface) freezes the Linux NFS client (writing a
few bytes can take 2 seconds).

I believe it's a hard drive problem, because iostat shows no more than
10MB/s even when copying locally on NetBSD from one drive to another.
top shows up to 10% interrupt.
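
(For reference, something like the following is enough to watch it
during a copy - a rough sketch, exact flags may differ:

iostat -d -w 2 wd0 wd1    # per-disk transfer statistics, refreshed every 2 seconds

)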

The hardware is quite modern, which could itself be the problem, but I
could also be missing proper RAID or filesystem tuning.

ahcisata0 at pci0 dev 9 function 0: vendor 0x10de product 0x0ad4 (rev. 0xa2)
ACPI: Picked IRQ 20 with weight 1
ahcisata0: interrupting at ioapic0 pin 20
ahcisata0: 64-bit DMA
ahcisata0: AHCI revision 1.20, 6 ports, 32 slots, CAP 0xe3229f05<PMD,SPM,ISS=0x2=Gen2,SCLO,SAL,SSNTF,SNCQ,S64A>

wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)
wd1 at atabus2 drive 0
wd1: <ST3000VN000-1H4167>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 2794 GB, 5814021 cyl, 16 head, 63 sec, 512 bytes/sect x 5860533168 sectors

pcictl:
000:09:0: NVIDIA nForce MCP77 AHCI Controller (SATA mass storage, interface 0x01, revision 0xa2)

--
Sincerely yours
Manuel Bouyer
2014-07-14 10:37:09 UTC
Post by Dima Veselov
Hello!
Can someone tell me - is there a way to diagnose poor hard drive performance?
Two client computers - Windows and Linux - are connected to a NetBSD
host over SMB and NFS.
Writing over SMB (GE interface) freezes the Linux NFS client (writing a
few bytes can take 2 seconds).
I believe it's a hard drive problem, because iostat shows no more than
10MB/s even when copying locally on NetBSD from one drive to another.
top shows up to 10% interrupt.
I would try some benchmark like iozone.
A simple dd from /dev/zero can also give some information:
dd if=/dev/zero of=file bs=1m count=50000
(you have to make sure the file created doesn't fit in ram).
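
For instance, to size the file and watch the disks while it runs
(a rough sketch; the device names are guesses, adjust to your setup):

sysctl hw.physmem64              # physical RAM in bytes; make the dd file several times larger
iostat -d -w 2 wd0 wd1 raid0     # in another terminal: per-disk throughput during the dd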

Also you don't say what kind of access you're doing: a single large
file write, or lots of small files ?
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Dima Veselov
2014-07-17 05:07:14 UTC
Post by Manuel Bouyer
Post by Dima Veselov
Hello!
Can someone tell me - is there a way to diagnose poor hard drive performance?
Two client computers - Windows and Linux - are connected to a NetBSD
host over SMB and NFS.
Writing over SMB (GE interface) freezes the Linux NFS client (writing a
few bytes can take 2 seconds).
I believe it's a hard drive problem, because iostat shows no more than
10MB/s even when copying locally on NetBSD from one drive to another.
top shows up to 10% interrupt.
By 'hard drive problem' I meant 'not the network'. The dd benchmarking
proves that.
Post by Manuel Bouyer
I would try some benchmark like iozone.
I don't know if I'm interpreting the iozone data correctly, but there is
a significant loss of write speed at 32k (and bigger) blocks compared to
16k or smaller. It shows up in the write, rewrite, fwrite and frewrite
columns. I'm in doubt because I don't believe the 800 MB/s or 1 GB/s
fwrite figures (for less than 16k), but 17-20 MB/s (for 32k and more)
looks quite real.
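
(For anyone reproducing this, a run roughly like the following isolates
the record-size effect - a sketch from memory; the file path is made up
and the size must exceed RAM so the small-record numbers aren't pure cache:

iozone -i 0 -i 1 -r 16k -s 8g -f /data/iozone.tmp    # write/rewrite and read at 16k records
iozone -i 0 -i 1 -r 32k -s 8g -f /data/iozone.tmp    # same tests at 32k records

)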

The original question remains: how do I find out where the speed is
being lost?
Post by Manuel Bouyer
dd if=/dev/zero of=file bs=1m count=50000
(you have to make sure the file created doesn't fit in ram).
Stats are awful.

Linux/amd64 on an ordinary workstation gave 111 MB/s

NetBSD 6.0/amd64 (with RAIDframe), Intel 82801JI SATA Controller,
Intel Xeon E5506, working well as a server: 54.5 MB/s

NetBSD 6.1.4/amd64, the one originally discussed
RAIDframe dk-raid-dk sandwich
AMD Athlon(tm) 7750 Dual-Core Processor
NVIDIA nForce MCP77 ATA133 IDE Controller: 13 MB/s
Post by Manuel Bouyer
Also you don't say what kind of access you're doing: a single large
file write, or lots of small files ?
I don't see much difference. I would say big files make it freeze more,
but maybe that's just because of the little breathers between the small
ones? :)

iozone doesn't load the machine much - it takes 2-10% CPU and no more
than 1% interrupt, but a Samba big-file copy takes 6% CPU and up to 10%
interrupt.

Anyway, saving this message while iozone was running let me feel the
problem (1-2 seconds of freeze while saving).

One more thing - this system previously ran NetBSD 5 for years and I
never noticed anything like this. The hardware was the same, except it
was a 1x1TB RAIDframe disk instead of the 2x3TB now.
--
Sincerely yours
Ilia Zykov
2014-07-17 06:48:29 UTC
Post by Dima Veselov
Post by Manuel Bouyer
dd if=/dev/zero of=file bs=1m count=50000
(you have to make sure the file created doesn't fit in ram).
Stats are awful.
Linux/amd64 on an ordinary workstation gave 111 MB/s
Do you get it on the same hardware with the same disk?
Post by Dima Veselov
AMD Athlon(tm) 7750 Dual-Core Processor
NVIDIA nForce MCP77 ATA133 IDE Controller: 13 MB/s
Ilia.
Manuel Bouyer
2014-07-17 07:56:25 UTC
Post by Dima Veselov
[...]
One more thing - this system previously ran NetBSD 5 for years and I
never noticed anything like this. The hardware was the same, except it
was a 1x1TB RAIDframe disk instead of the 2x3TB now.
That could be the difference. Are the partitions properly aligned to 4K?
Can you show the gpt tables (of both disks, and of the raid device)?

Also, what does dkctl strategy say for each component involved?
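
Something like this should show it (a sketch; device names guessed from
your dmesg, adjust as needed):

gpt show wd0
gpt show wd1
gpt show raid0          # the gpt inside the raid device, if that's how it's set up
dkctl wd0 strategy
dkctl wd1 strategy
dkctl raid0 strategy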
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Dima Veselov
2014-07-26 09:05:05 UTC
Hello,

Sorry, I lost your last message, but I'd like to know more about
4K alignment. What should be aligned to what?

I assume the sector size is 512 bytes and gpt reports sizes in sectors.
So I have a gpt disk partition of 5801288527 sectors, which doesn't
align to anything.

raidctl reports in sectors too, and the size is 5801288448, so it is
aligned to 4K, and also to sectPerSU, which is 128 blocks.

dk10 is 5801288381, which again aligns to nothing, so the filesystem
residing on this wedge seems to be unaligned. Is the only thing I have
to do to shrink the filesystem so it aligns to 4K blocks? Or is it
better to align it to 128*512 to match the raid somehow? Or do I have to
rebuild the wedges to align to something?

Or maybe I am completely wrong and everything should be aligned not
only by its size but also by its start?
Post by Manuel Bouyer
Post by Dima Veselov
Hello!
Can someone tell me - is there a way to diagnose poor hard drive performance?
Two client computers - Windows and Linux - are connected to a NetBSD
host over SMB and NFS.
Writing over SMB (GE interface) freezes the Linux NFS client (writing a
few bytes can take 2 seconds).
I believe it's a hard drive problem, because iostat shows no more than
10MB/s even when copying locally on NetBSD from one drive to another.
top shows up to 10% interrupt.
I would try some benchmark like iozone.
dd if=/dev/zero of=file bs=1m count=50000
(you have to make sure the file created doesn't fit in ram).
Also you don't say what kind of access you're doing: a single large
file write, or lots of small files ?
--
NetBSD: 26 years of experience will always make the difference
--
--
Sincerely yours
Manuel Bouyer
2014-07-26 09:12:24 UTC
Post by Dima Veselov
Hello,
Sorry, I lost your last message, but I'd like to know more about
4K alignment. What should be aligned to what?
I assume the sector size is 512 bytes and gpt reports sizes in sectors.
So I have a gpt disk partition of 5801288527 sectors, which doesn't
align to anything.
raidctl reports in sectors too, and the size is 5801288448, so it is
aligned to 4K, and also to sectPerSU, which is 128 blocks.
dk10 is 5801288381, which again aligns to nothing, so the filesystem
residing on this wedge seems to be unaligned. Is the only thing I have
to do to shrink the filesystem so it aligns to 4K blocks? Or is it
better to align it to 128*512 to match the raid somehow? Or do I have to
rebuild the wedges to align to something?
Or maybe I am completely wrong and everything should be aligned not
only by its size but also by its start?
Size doesn't matter much, I guess; the start is important.
Disks larger than 2TB use 4K sectors internally, so even though they
present themselves as disks with 512-byte sectors, any transfer which is
not aligned to 4K and a multiple of 4K in size will be handled as a
read-modify-write transaction internally (which is much slower).
So every partition should be aligned to 4K on the physical media
(and so you need to make sure everything is properly aligned in
a gpt-on-raid-on-gpt setup), and it's better to use filesystems with
a fragment size >= 4K too.
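
A quick way to check would be something like this (a sketch, reusing the
dk10 name from your earlier mail; a start is 4K-aligned when it is a
multiple of 8 512-byte sectors):

gpt show wd1                                  # every partition start should be divisible by 8
gpt show raid0                                # same for the wedges on the raid device
dumpfs /dev/rdk10 | grep -E '^(bsize|fsize)'  # fsize should be >= 4096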
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Patrick Welche
2014-07-26 09:14:53 UTC
Post by Dima Veselov
Sorry, I lost your last message, but I'd like to know more about
4K alignment. What should be aligned to what?
I suspect it is something like:

# gpt show wd0
      start        size  index  contents
          0           1         PMBR
          1           1         Pri GPT header
          2          32         Pri GPT table
         34          30
         64    14680192      1  GPT part - NetBSD RAIDFrame component
   14680256    33554432      2  GPT part - NetBSD swap

Note how I didn't start index 1 at the default of 34 (in 512-byte
blocks that's an offset of 17408 bytes, which is not divisible by 4k =
4096 bytes), but used gpt add "-b" to start at 64 (i.e. an offset of
32768 bytes, which is divisible by 4096).

(and 14680256*512/4096 = 1835032 exactly, so the swap partition's start
is 4K-aligned too.)
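
Something roughly like this produces that layout (a sketch; the type
alias might be "raid" or "raidframe" depending on the gpt version, so
check gpt(8)):

# gpt create wd0
# gpt add -b 64 -s 14680192 -t raid wd0     # RAIDframe component, starting on a 4K boundary
# gpt add -s 33554432 -t swap wd0           # swap follows at 14680256, still 4K-aligned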

Cheers,

Patrick
Dima Veselov
2014-07-26 12:39:09 UTC
Thank god I used your manual for the setup instead of the one on the
web, and my first wedges are aligned like yours (same gap from 34 to 64).

It's a pity I cannot rebuild the biggest partition without copying 2TB
of data, because of the last wedge, which is on the raid device. I will
do that tomorrow. At least I don't need extra drives to build one more
raid device and copy the data between them.
Post by Patrick Welche
Post by Dima Veselov
Sorry, I lost your last message, but I'd like to know more about
4K alignment. What should be aligned to what?
# gpt show wd0
      start        size  index  contents
          0           1         PMBR
          1           1         Pri GPT header
          2          32         Pri GPT table
         34          30
         64    14680192      1  GPT part - NetBSD RAIDFrame component
   14680256    33554432      2  GPT part - NetBSD swap
Note how I didn't start index 1 at the default of 34 (in 512-byte
blocks that's an offset of 17408 bytes, which is not divisible by 4k =
4096 bytes), but used gpt add "-b" to start at 64 (i.e. an offset of
32768 bytes, which is divisible by 4096).
(and 14680256*512/4096 = 1835032 exactly, so the swap partition's start
is 4K-aligned too.)
Cheers,
Patrick
--
Sincerely yours
Dima Veselov
2014-07-26 12:55:11 UTC
Hello once again,

I understand I should not use an FS with less than a 4K fragment, but
for a 2.7TB disk, what would be your advice for the RAIDframe block
size and for the FS block size?

Usually I use 128 1 1 1 in the raidctl config (not knowing any
better :) and newfs -b 64k.
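
For reference, those numbers go into the layout section of the
RAIDframe config, roughly like this for a two-disk mirror (a sketch;
the wedge names here are made up):

START array
# numRow numCol numSpare
1 2 0

START disks
/dev/dk0
/dev/dk1

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
128 1 1 1

START queue
fifo 100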
Post by Patrick Welche
Post by Dima Veselov
Sorry, I lost your last message, but I'd like to know more about
4K alignment. What should be aligned to what?
# gpt show wd0
      start        size  index  contents
          0           1         PMBR
          1           1         Pri GPT header
          2          32         Pri GPT table
         34          30
         64    14680192      1  GPT part - NetBSD RAIDFrame component
   14680256    33554432      2  GPT part - NetBSD swap
Note how I didn't start index 1 at the default of 34 (in 512-byte
blocks that's an offset of 17408 bytes, which is not divisible by 4k =
4096 bytes), but used gpt add "-b" to start at 64 (i.e. an offset of
32768 bytes, which is divisible by 4096).
(and 14680256*512/4096 = 1835032 exactly, so the swap partition's start
is 4K-aligned too.)
Cheers,
Patrick
--
Sincerely yours
Manuel Bouyer
2014-07-27 18:14:19 UTC
Post by Dima Veselov
Hello once again,
I understand I should not use an FS with less than a 4K fragment, but
for a 2.7TB disk, what would be your advice for the RAIDframe block
size and for the FS block size?
It depends on how you use the filesystem (especially the average file size)
but usually 32k or 64k is fine for large filesystems.
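
For example, something along these lines (a sketch, reusing the dk10
name from earlier; note that FFS fragments can't be smaller than 1/8 of
the block size, so with -b 64k the smallest fragment is 8k):

newfs -O 2 -b 32k -f 4k /dev/rdk10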
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Ottavio Caruso
2014-07-14 14:38:52 UTC
Post by Dima Veselov
Can someone tell me - is there a way to diagnose poor hard drive performance?
On my ancient Lenovo X61 I can run a SMART test directly from the
BIOS. I don't know how reliable it is. Otherwise I have used
SystemRescueCd in the past.
--
Ottavio