docs: document how to use the l2-cache-entry-size parameter

Message ID 20180219145459.8143-1-berto@igalia.com (mailing list archive)
State New, archived

Commit Message

Alberto Garcia Feb. 19, 2018, 2:54 p.m. UTC
This patch updates docs/qcow2-cache.txt explaining how to use the new
l2-cache-entry-size parameter.

Here's a more detailed technical description of this feature:

   https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html

And here are some performance numbers:

   https://lists.gnu.org/archive/html/qemu-block/2017-12/msg00507.html

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 docs/qcow2-cache.txt | 46 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 43 insertions(+), 3 deletions(-)

Comments

Eric Blake Feb. 19, 2018, 4:44 p.m. UTC | #1
On 02/19/2018 08:54 AM, Alberto Garcia wrote:
> This patch updates docs/qcow2-cache.txt explaining how to use the new
> l2-cache-entry-size parameter.
> 
> Here's a more detailed technical description of this feature:
> 
>     https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
> 
> And here are some performance numbers:
> 
>     https://lists.gnu.org/archive/html/qemu-block/2017-12/msg00507.html
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   docs/qcow2-cache.txt | 46 +++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 43 insertions(+), 3 deletions(-)
> 

> +Some things to take into account:
> +
> + - The L2 cache entry size has the same restrictions as the cluster
> +   size (power of two, at least 512 bytes).

Worth mentioning the upper limit of the cluster size?

Otherwise,
Reviewed-by: Eric Blake <eblake@redhat.com>
Alberto Garcia Feb. 19, 2018, 6:26 p.m. UTC | #2
On Mon 19 Feb 2018 05:44:23 PM CET, Eric Blake <eblake@redhat.com> wrote:
> On 02/19/2018 08:54 AM, Alberto Garcia wrote:
>> This patch updates docs/qcow2-cache.txt explaining how to use the new
>> l2-cache-entry-size parameter.
>> 
>> Here's a more detailed technical description of this feature:
>> 
>>     https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
>> 
>> And here are some performance numbers:
>> 
>>     https://lists.gnu.org/archive/html/qemu-block/2017-12/msg00507.html
>> 
>> Signed-off-by: Alberto Garcia <berto@igalia.com>
>> ---
>>   docs/qcow2-cache.txt | 46 +++++++++++++++++++++++++++++++++++++++++++---
>>   1 file changed, 43 insertions(+), 3 deletions(-)
>> 
>
>> +Some things to take into account:
>> +
>> + - The L2 cache entry size has the same restrictions as the cluster
>> +   size (power of two, at least 512 bytes).
>
> Worth mentioning the upper limit of the cluster size?

I thought it would be unnecessary since that's already mentioned
several times in the text ("you can change the size [...] and make it
smaller than the cluster size", "Using smaller cache entries"), and I
think it's pretty obvious if you actually read that section.

Berto
Eric Blake Feb. 19, 2018, 7:10 p.m. UTC | #3
On 02/19/2018 12:26 PM, Alberto Garcia wrote:

>>> + - The L2 cache entry size has the same restrictions as the cluster
>>> +   size (power of two, at least 512 bytes).
>>
>> Worth mentioning the upper limit of the cluster size?
> 
> I thought it would be unnecessary since that's already mentioned
> several times in the text ("you can change the size [...] and make it
> smaller than the cluster size", "Using smaller cache entries"), and I
> think it's pretty obvious if you actually read that section.

Works for me, then
Kevin Wolf Feb. 21, 2018, 6:33 p.m. UTC | #4
On 19.02.2018 at 15:54, Alberto Garcia wrote:
> This patch updates docs/qcow2-cache.txt explaining how to use the new
> l2-cache-entry-size parameter.
> 
> Here's a more detailed technical description of this feature:
> 
>    https://lists.gnu.org/archive/html/qemu-block/2017-09/msg00635.html
> 
> And here are some performance numbers:
> 
>    https://lists.gnu.org/archive/html/qemu-block/2017-12/msg00507.html
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>

Thanks, applied to the block branch.


While reviewing this, I read the whole document and stumbled across
these paragraphs:

> The reason for this 1/4 ratio is to ensure that both caches cover the
> same amount of disk space. Note however that this is only valid with
> the default value of refcount_bits (16). If you are using a different
> value you might want to calculate both cache sizes yourself since QEMU
> will always use the same 1/4 ratio.

Sounds like we should fix our defaults?
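
As a rough sketch of the arithmetic behind that ratio (Python, with
made-up cache sizes, assuming the 8-byte L2 entry size):

    def l2_coverage(cache_bytes, cluster_size):
        # Each 8-byte L2 entry maps one cluster of guest data
        return cache_bytes // 8 * cluster_size

    def refcount_coverage(cache_bytes, cluster_size, refcount_bits=16):
        # Each refcount entry, refcount_bits wide, tracks one cluster
        return cache_bytes * 8 // refcount_bits * cluster_size

    cluster = 65536
    l2_cache = 4 * 1024 * 1024      # hypothetical 4 MB L2 cache
    refcount_cache = l2_cache // 4  # the fixed 1/4 ratio

    # Equal coverage only holds for the default refcount_bits=16:
    assert l2_coverage(l2_cache, cluster) == \
        refcount_coverage(refcount_cache, cluster, 16)
    # With refcount_bits=64 the refcount cache covers 4x less disk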

While we're at it, would l2-cache-entry-size = MIN(cluster_size, 64k)
make sense as a default?

> It's also worth mentioning that there's no strict need for both caches
> to cover the same amount of disk space. The refcount cache is used
> much less often than the L2 cache, so it's perfectly reasonable to
> keep it small.

More precisely, it is only used for cluster allocation, not for read or
for rewrites. Usually this means that it's indeed accessed a lot less,
though especially in benchmarks, this isn't necessarily less often.

However, the more important part is that even for allocating writes with
random I/O, the refcount cache is still accessed sequentially and we
don't really take advantage of having more than a single refcount block
in memory. This only stops being true as soon as you add something that
can free clusters (discards, overwriting compressed clusters, deleting
internal snapshots).

We have a minimum refcount block cache size of 4 clusters because of the
possible recursion during refcount table growth, which leaves some room
to hold the refcount block for an occasional discard (and subsequent
reallocation).
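
For scale, a quick calculation of what that 4-cluster minimum covers
with the defaults (64k clusters, refcount_bits=16):

    cluster_size = 65536
    refcount_bits = 16
    min_cache_clusters = 4

    entries_per_block = cluster_size * 8 // refcount_bits  # 32768
    covered_per_block = entries_per_block * cluster_size   # 2 GiB
    print(min_cache_clusters * covered_per_block // 2**30) # -> 8 (GiB)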

So should we default to this minimum on the grounds that for most
people, refcount blocks are probably only accessed sequentially in
practice? The remaining memory of the total cache size seems to help the
average case more if it's added to the L2 cache instead.

Kevin
Alberto Garcia Feb. 22, 2018, 1:06 p.m. UTC | #5
On Wed 21 Feb 2018 07:33:55 PM CET, Kevin Wolf wrote:

 [docs/qcow2-cache.txt]
> While reviewing this, I read the whole document and stumbled across
> these paragraphs:
>
>> The reason for this 1/4 ratio is to ensure that both caches cover the
>> same amount of disk space. Note however that this is only valid with
>> the default value of refcount_bits (16). If you are using a different
>> value you might want to calculate both cache sizes yourself since
>> QEMU will always use the same 1/4 ratio.
>
> Sounds like we should fix our defaults?

We could do that, yes, we would be "breaking" compatibility with
previous versions, but we're not talking about semantic changes here, so
it's not a big deal in and of itself.

Of course this would be a problem if the new defaults make things work
slower, but I don't think that's the case here.

Just to confirm: do you suggest reducing the refcount cache size
(because it's not so necessary) or changing the formula so the number of
refcount_bits is taken into account so that both caches cover the same
amount of disk space in all cases?

I suppose it's the former based on what you write below.

> While we're at it, would l2-cache-entry-size = MIN(cluster_size, 64k)
> make sense as a default?

Any reason why you chose 64k, or is it because it's the default cluster
size?

In general I'd be cautious about reducing the default entry size
unconditionally because with rotating HDDs it may even have a negative
(although slight) effect on performance. But the larger the cluster, the
larger the area mapped by L2 entries, so we need less metadata and it
makes more sense to read less in general.

In summary: while I think it's probably a good idea, it would be good to
make some tests before changing the default.

>> It's also worth mentioning that there's no strict need for both
>> caches to cover the same amount of disk space. The refcount cache is
>> used much less often than the L2 cache, so it's perfectly reasonable
>> to keep it small.
>
> More precisely, it is only used for cluster allocation, not for read
> or for rewrites. Usually this means that it's indeed accessed a lot
> less, though especially in benchmarks, this isn't necessarily less
> often.
>
> However, the more important part is that even for allocating writes
> with random I/O, the refcount cache is still accessed sequentially and
> we don't really take advantage of having more than a single refcount
> block in memory. This only stops being true as soon as you add
> something that can free clusters (discards, overwriting compressed
> clusters, deleting internal snapshots).
>
> We have a minimum refcount block cache size of 4 clusters because of
> the possible recursion during refcount table growth, which leaves some
> room to hold the refcount block for an occasional discard (and
> subsequent reallocation).
>
> So should we default to this minimum on the grounds that for most
> people, refcount blocks are probably only accessed sequentially in
> practice? The remaining memory of the total cache size seems to help
> the average case more if it's added to the L2 cache instead.

That sounds like a good idea. We should double check that the minimum is
indeed the required minimum, and run some tests, but otherwise I'm all
for it.

I think I can take care of this if you want.

Berto
Kevin Wolf Feb. 22, 2018, 2:17 p.m. UTC | #6
On 22.02.2018 at 14:06, Alberto Garcia wrote:
> On Wed 21 Feb 2018 07:33:55 PM CET, Kevin Wolf wrote:
> 
>  [docs/qcow2-cache.txt]
> > While reviewing this, I read the whole document and stumbled across
> > these paragraphs:
> >
> >> The reason for this 1/4 ratio is to ensure that both caches cover the
> >> same amount of disk space. Note however that this is only valid with
> >> the default value of refcount_bits (16). If you are using a different
> >> value you might want to calculate both cache sizes yourself since
> >> QEMU will always use the same 1/4 ratio.
> >
> > Sounds like we should fix our defaults?
> 
> We could do that, yes, we would be "breaking" compatibility with
> previous versions, but we're not talking about semantic changes here, so
> it's not a big deal in and of itself.

I don't think changing behaviour and breaking compatibility is the same
thing. If there is an option, you should expect that its default can
change in newer versions. If you want a specific setting, you should be
explicit.

> Of course this would be a problem if the new defaults make things work
> slower, but I don't think that's the case here.
> 
> Just to confirm: do you suggest reducing the refcount cache size
> (because it's not so necessary) or changing the formula so the number of
> refcount_bits is taken into account so that both caches cover the same
> amount of disk space in all cases?
> 
> I suppose it's the former based on what you write below.

I was first thinking to make the ratio dynamic based on the refcount
width, but then it occurred to me that a fixed absolute number probably
makes even more sense. Either way should be better than the current
default, though.
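
Purely to illustrate the two options (a sketch only; neither is what
QEMU currently does, and the names are made up):

    def refcount_cache_dynamic(l2_cache_size, refcount_bits):
        # Scale with the refcount width so both caches keep
        # covering the same amount of disk
        return l2_cache_size * refcount_bits // 64

    def refcount_cache_fixed(cluster_size):
        # Simply use the existing 4-cluster minimum
        return 4 * cluster_size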

> > While we're at it, would l2-cache-entry-size = MIN(cluster_size, 64k)
> > make sense as a default?
> 
> Any reason why you chose 64k, or is it because it's the default cluster
> size?
> 
> In general I'd be cautious about reducing the default entry size
> unconditionally because with rotating HDDs it may even have a negative
> (although slight) effect on performance. But the larger the cluster, the
> larger the area mapped by L2 entries, so we need less metadata and it
> makes more sense to read less in general.
> 
> In summary: while I think it's probably a good idea, it would be good to
> make some tests before changing the default.

The exact value of 64k is more or less because it's the default cluster
size, yes. Not changing anything for the default cluster size makes it
a bit easier to justify.

What I really want to fix is the 2 MB entry size with the maximum
cluster size, because that's just unreasonably large. It's not
completely clear what the ideal size is, but when we increased the
default cluster size from 4k, the optimal values (on rotating hard
disks back then) were 64k or 128k. So I assume that it's somewhere
around these sizes that unnecessary I/O starts to hurt more than reading
one big chunk instead of two smaller ones helps.

Of course, guest I/O a few years ago and metadata I/O today aren't
exactly the same thing, so I agree that a change should be measured
first. But 64k-128k feels right as an educated guess where to start.
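
To put numbers on the extreme case, a rough sketch of what one cache
entry costs and maps at the maximum cluster size:

    def cache_entry_cost(cluster_size, entry_size=None):
        # By default a cache entry is a whole L2 table (one cluster)
        entry_size = entry_size or cluster_size
        mapped = entry_size // 8 * cluster_size  # guest data per entry
        return entry_size, mapped

    # 2 MiB clusters: a cache miss reads 2 MiB of metadata mapping
    # 512 GiB of guest data
    print(cache_entry_cost(2 * 1024 * 1024))

    # A hypothetical 64k entry size: a miss reads 64 KiB and still
    # maps 16 GiB
    print(cache_entry_cost(2 * 1024 * 1024, 65536))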

> >> It's also worth mentioning that there's no strict need for both
> >> caches to cover the same amount of disk space. The refcount cache is
> >> used much less often than the L2 cache, so it's perfectly reasonable
> >> to keep it small.
> >
> > More precisely, it is only used for cluster allocation, not for read
> > or for rewrites. Usually this means that it's indeed accessed a lot
> > less, though especially in benchmarks, this isn't necessarily less
> > often.
> >
> > However, the more important part is that even for allocating writes
> > with random I/O, the refcount cache is still accessed sequentially and
> > we don't really take advantage of having more than a single refcount
> > block in memory. This only stops being true as soon as you add
> > something that can free clusters (discards, overwriting compressed
> > clusters, deleting internal snapshots).
> >
> > We have a minimum refcount block cache size of 4 clusters because of
> > the possible recursion during refcount table growth, which leaves some
> > room to hold the refcount block for an occasional discard (and
> > subsequent reallocation).
> >
> > So should we default to this minimum on the grounds that for most
> > people, refcount blocks are probably only accessed sequentially in
> > practice? The remaining memory of the total cache size seems to help
> > the average case more if it's added to the L2 cache instead.
> 
> That sounds like a good idea. We should double check that the minimum is
> indeed the required minimum, and run some tests, but otherwise I'm all
> for it.
> 
> I think I can take care of this if you want.

That would be good.

I'm pretty confident that the minimum is enough because it's also the
default for 64k clusters. Maybe it's too high if I miscalculated back
then, but I haven't seen a crash report related to running out of
refcount block cache entries.

Kevin
Alberto Garcia Feb. 22, 2018, 4:28 p.m. UTC | #7
On Thu 22 Feb 2018 03:17:57 PM CET, Kevin Wolf wrote:
>> > While we're at it, would l2-cache-entry-size = MIN(cluster_size,
>> > 64k) make sense as a default?
>> 
>> Any reason why you chose 64k, or is it because it's the default
>> cluster size?
>> 
>> In general I'd be cautious about reducing the default entry size
>> unconditionally because with rotating HDDs it may even have a
>> negative (although slight) effect on performance. But the larger the
>> cluster, the larger the area mapped by L2 entries, so we need less
>> metadata and it makes more sense to read less in general.
>> 
>> In summary: while I think it's probably a good idea, it would be good
>> to make some tests before changing the default.
>
> The exact value of 64k is more or less because it's the default
> cluster size, yes. Not changing anything for the default cluster size
> makes it a bit easier to justify.
>
> What I really want to fix is the 2 MB entry size with the maximum
> cluster size, because that's just unreasonably large. It's not
> completely clear what the ideal size is, but when we increased the
> default cluster size from 4k, the optimal values (on rotating hard
> disks back then) were 64k or 128k. So I assume that it's somewhere
> around these sizes that unnecessary I/O starts to hurt more than
> reading one big chunk instead of two smaller ones helps.
>
> Of course, guest I/O a few years ago and metadata I/O today aren't
> exactly the same thing, so I agree that a change should be measured
> first. But 64k-128k feels right as an educated guess where to start.

Yes, I agree.

>> > So should we default to this minimum on the grounds that for most
>> > people, refcount blocks are probably only accessed sequentially in
>> > practice? The remaining memory of the total cache size seems to
>> > help the average case more if it's added to the L2 cache instead.
>> 
>> That sounds like a good idea. We should double check that the minimum
>> is indeed the required minimum, and run some tests, but otherwise I'm
>> all for it.
>> 
>> I think I can take care of this if you want.
>
> That would be good.
>
> I'm pretty confident that the minimum is enough because it's also the
> default for 64k clusters. Maybe it's too high if I miscalculated back
> then, but I haven't seen a crash report related to running out of
> refcount block cache entries.

Ok, I'll take a look.

Berto
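
As a reference point for the patch below, a sketch of what its example
configuration (a 2 MB L2 cache with 4 KiB entries) works out to,
assuming the default 64k cluster size:

    l2_cache_size = 2 * 1024 * 1024  # l2-cache-size=2097152
    entry_size = 4096                # l2-cache-entry-size=4096
    cluster_size = 65536             # default cluster size (assumed)

    entries = l2_cache_size // entry_size              # 512 entries
    mapped_per_entry = entry_size // 8 * cluster_size  # 32 MiB each
    print(entries * mapped_per_entry // 2**30)         # -> 16 (GiB)

The total coverage is unchanged compared to whole-table caching, but a
cache miss now reads 4 KiB instead of 64 KiB of metadata.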

Patch

diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index b0571de4b8..170191a242 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@ 
 qcow2 L2/refcount cache configuration
 =====================================
-Copyright (C) 2015 Igalia, S.L.
+Copyright (C) 2015, 2018 Igalia, S.L.
 Author: Alberto Garcia <berto@igalia.com>
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -118,8 +118,8 @@  There are three options available, and all of them take bytes:
 
 There are two things that need to be taken into account:
 
- - Both caches must have a size that is a multiple of the cluster
-   size.
+ - Both caches must have a size that is a multiple of the cluster size
+   (or the cache entry size: see "Using smaller cache entries" below).
 
  - If you only set one of the options above, QEMU will automatically
    adjust the others so that the L2 cache is 4 times bigger than the
@@ -143,6 +143,46 @@  much less often than the L2 cache, so it's perfectly reasonable to
 keep it small.
 
 
+Using smaller cache entries
+---------------------------
+The qcow2 L2 cache stores complete tables by default. This means that
+if QEMU needs an entry from an L2 table then the whole table is read
+from disk and is kept in the cache. If the cache is full then a
+complete table needs to be evicted first.
+
+This can be inefficient with large cluster sizes since it results in
+more disk I/O and wastes more cache memory.
+
+Since QEMU 2.12 you can change the size of the L2 cache entry and make
+it smaller than the cluster size. This can be configured using the
+"l2-cache-entry-size" parameter:
+
+   -drive file=hd.qcow2,l2-cache-size=2097152,l2-cache-entry-size=4096
+
+Some things to take into account:
+
+ - The L2 cache entry size has the same restrictions as the cluster
+   size (power of two, at least 512 bytes).
+
+ - Smaller entry sizes generally improve the cache efficiency and make
+   disk I/O faster. This is particularly true with solid state drives
+   so it's a good idea to reduce the entry size in those cases. With
+   rotating hard drives the situation is a bit more complicated so you
+   should test it first and stay with the default size if unsure.
+
+ - Try different entry sizes to see which one gives faster performance
+   in your case. The block size of the host filesystem is generally a
+   good default (usually 4096 bytes in the case of ext4).
+
+ - Only the L2 cache can be configured this way. The refcount cache
+   always uses the cluster size as the entry size.
+
+ - If the L2 cache is big enough to hold all of the image's L2 tables
+   (as explained in the "Choosing the right cache sizes" section
+   earlier in this document) then none of this is necessary and you
+   can omit the "l2-cache-entry-size" parameter altogether.
+
+
 Reducing the memory usage
 -------------------------
 It is possible to clean unused cache entries in order to reduce the