
[RFC,v3,05/27] qcow2: Document the Extended L2 Entries feature

Message ID 0b884ddcd0ac3a3c0b8cdd9d09c74566ac107c9a.1577014346.git.berto@igalia.com (mailing list archive)
State New, archived
Series Add subcluster allocation to qcow2

Commit Message

Alberto Garcia Dec. 22, 2019, 11:36 a.m. UTC
Subcluster allocation in qcow2 is implemented by extending the
existing L2 table entries and adding additional information to
indicate the allocation status of each subcluster.

This patch documents the changes to the qcow2 format and how they
affect the calculation of the L2 cache size.

Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
 docs/qcow2-cache.txt   | 19 +++++++++++-
 2 files changed, 83 insertions(+), 4 deletions(-)

Comments

Eric Blake Feb. 20, 2020, 2:28 p.m. UTC | #1
On 12/22/19 5:36 AM, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>   docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>   docs/qcow2-cache.txt   | 19 +++++++++++-
>   2 files changed, 83 insertions(+), 4 deletions(-)

This adds a new feature bit; where is the corresponding patch to qcow2.c 
to advertise the feature bit name in the optional feature name table?

/me reads ahead

good, patch 25 covers it.  Quick comment added there as a result.


> +== Extended L2 Entries ==
> +
> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
> +field of the header.
> +
> +In these images standard data clusters are divided into 32 subclusters of the
> +same size. They are contiguous and start from the beginning of the cluster.
> +Subclusters can be allocated independently and the L2 entry contains information
> +indicating the status of each one of them. Compressed data clusters don't have
> +subclusters so they are treated like in images without this feature.

Grammar; I'd suggest:

...don't have subclusters, so they are treated the same as in images 
without this feature.

Are they truly the same, or do you still need to document that the extra 
64 bits of the extended L2 entry are all zero?

> +
> +The size of an extended L2 entry is 128 bits so the number of entries per table
> +is calculated using this formula:
> +
> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
> +
> +The first 64 bits have the same format as the standard L2 table entry described
> +in the previous section, with the exception of bit 0 of the standard cluster
> +descriptor.
> +
> +The last 64 bits contain a subcluster allocation bitmap with this format:
> +
> +Subcluster Allocation Bitmap (for standard clusters):
> +
> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
> +
> +                    1: the subcluster is allocated. In this case the
> +                       host cluster offset field must contain a valid
> +                       offset.
> +                    0: the subcluster is not allocated. In this case
> +                       read requests shall go to the backing file or
> +                       return zeros if there is no backing file data.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 31 - x)
> +

Missing trailing '.'

> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
> +
> +                    1: the subcluster reads as zeros. In this case the
> +                       allocation status bit must be unset. The host
> +                       cluster offset field may or may not be set.

Why must the allocation bit be unset?  When we preallocate, we want a 
cluster to reserve space, but still read as zero, so the combination of 
both bits set makes sense to me.

> +                    0: no effect.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 63 - x)

and again.

> +
> +Subcluster Allocation Bitmap (for compressed clusters):
> +
> +    Bit  0 -  63:   Reserved (set to 0)
> +                    Compressed clusters don't have subclusters,
> +                    so this field is not used.
>   
>   == Snapshots ==
>   
> diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
> index d57f409861..04eb4ce2f1 100644
> --- a/docs/qcow2-cache.txt
> +++ b/docs/qcow2-cache.txt
> @@ -1,6 +1,6 @@
>   qcow2 L2/refcount cache configuration
>   =====================================
> -Copyright (C) 2015, 2018 Igalia, S.L.
> +Copyright (C) 2015, 2018-2019 Igalia, S.L.

Our review is late; you could add 2020 if desired, now.

>   Author: Alberto Garcia <berto@igalia.com>
>   
>   This work is licensed under the terms of the GNU GPL, version 2 or
> @@ -222,3 +222,20 @@ support this functionality, and is 0 (disabled) on other platforms.
>   This functionality currently relies on the MADV_DONTNEED argument for
>   madvise() to actually free the memory. This is a Linux-specific feature,
>   so cache-clean-interval is not supported on other systems.
> +
> +
> +Extended L2 Entries
> +-------------------
> +All numbers shown in this document are valid for qcow2 images with normal
> +64-bit L2 entries.
> +
> +Images with extended L2 entries need twice as much L2 metadata, so the L2
> +cache size must be twice as large for the same disk space.
> +
> +   disk_size = l2_cache_size * cluster_size / 16
> +
> +i.e.
> +
> +   l2_cache_size = disk_size * 16 / cluster_size
> +
> +Refcount blocks are not affected by this.
>
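
To put numbers on the cache-size formulas quoted above, here is a minimal
sketch in C. The helper name l2_cache_size_for is hypothetical (it is not a
QEMU function), and rounding the cluster count up is an assumption that the
formula in the text does not spell out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU: number of bytes of L2 cache
 * needed to map disk_size bytes of virtual disk.
 * One L2 entry maps one cluster; an entry is 8 bytes with standard
 * L2 entries and 16 bytes with extended L2 entries.
 */
static uint64_t l2_cache_size_for(uint64_t disk_size, uint64_t cluster_size,
                                  int extended_l2)
{
    uint64_t entry_size = extended_l2 ? 16 : 8;
    uint64_t clusters;

    assert(cluster_size != 0);

    /* Assumption: round up so a partial last cluster is still covered. */
    clusters = (disk_size + cluster_size - 1) / cluster_size;
    return clusters * entry_size;
}
```

For a 16 GiB disk with 64 KiB clusters this gives 4 MiB with extended L2
entries and 2 MiB with standard ones, i.e. the "twice as large" from the text.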
Eric Blake Feb. 20, 2020, 2:33 p.m. UTC | #2
On 12/22/19 5:36 AM, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---

> @@ -437,7 +445,7 @@ cannot be relaxed without an incompatible layout change).
>   Given an offset into the virtual disk, the offset into the image file can be
>   obtained as follows:
>   
> -    l2_entries = (cluster_size / sizeof(uint64_t))
> +    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
>   
>       l2_index = (offset / cluster_size) % l2_entries
>       l1_index = (offset / cluster_size) / l2_entries
> @@ -447,6 +455,8 @@ obtained as follows:
>   
>       return cluster_offset + (offset % cluster_size)
>   
> +    [*] this changes if Extended L2 Entries are enabled, see next section

> +The size of an extended L2 entry is 128 bits so the number of entries per table
> +is calculated using this formula:
> +
> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))

Is it worth unifying these statements by writing:

l2_entries = (cluster_size / ((1 + extended_l2) * sizeof(uint64_t)))

or is that too confusing?
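
For comparison, a minimal sketch in C of what the unified formula would look
like next to the index calculation. All names here (l2_location,
l2_entries_per_table, locate) are hypothetical and chosen for illustration
only; the subcluster index assumes 32 equally sized subclusters per cluster,
as the proposed text says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SUBCLUSTERS_PER_CLUSTER 32  /* taken from the proposed spec text */

/* Hypothetical result type: where a guest offset lands in the metadata. */
struct l2_location {
    uint64_t l1_index;
    uint64_t l2_index;
    unsigned subcluster;  /* only meaningful with extended L2 entries */
};

/* The unified formula: 8-byte entries normally, 16-byte when extended. */
static uint64_t l2_entries_per_table(uint64_t cluster_size, bool extended_l2)
{
    return cluster_size / ((1 + (extended_l2 ? 1 : 0)) * sizeof(uint64_t));
}

static struct l2_location locate(uint64_t offset, uint64_t cluster_size,
                                 bool extended_l2)
{
    struct l2_location loc;
    uint64_t l2_entries = l2_entries_per_table(cluster_size, extended_l2);
    uint64_t cluster_no = offset / cluster_size;

    assert(l2_entries != 0);
    assert(cluster_size >= SUBCLUSTERS_PER_CLUSTER);

    loc.l1_index = cluster_no / l2_entries;
    loc.l2_index = cluster_no % l2_entries;
    /* Subclusters are contiguous and start at the beginning of the cluster. */
    loc.subcluster = (unsigned)((offset % cluster_size) /
                                (cluster_size / SUBCLUSTERS_PER_CLUSTER));
    return loc;
}
```

With 64 KiB clusters this yields 8192 entries per L2 table for standard
entries and 4096 for extended ones.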
Alberto Garcia Feb. 20, 2020, 2:49 p.m. UTC | #3
On Thu 20 Feb 2020 03:28:17 PM CET, Eric Blake wrote:
>> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
>> +field of the header.
>> +
>> +In these images standard data clusters are divided into 32 subclusters of the
>> +same size. They are contiguous and start from the beginning of the cluster.
>> +Subclusters can be allocated independently and the L2 entry contains information
>> +indicating the status of each one of them. Compressed data clusters don't have
>> +subclusters so they are treated like in images without this feature.
>
> Grammar; I'd suggest:
>
> ...don't have subclusters, so they are treated the same as in images 
> without this feature.

Ok

> Are they truly the same, or do you still need to document that the
> extra 64 bits of the extended L2 entry are all zero?

It is documented later in the same patch ("Subcluster Allocation Bitmap
for compressed clusters").

By the way, this series treats an L2 entry as invalid if any of those
bits is not zero, but I think I'll change that. Conceivably those bits
could be used for a future compatible feature, but it can only be
compatible if the previous versions ignore those bits.

>> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
>> +
>> +                    1: the subcluster reads as zeros. In this case the
>> +                       allocation status bit must be unset. The host
>> +                       cluster offset field may or may not be set.
>
> Why must the allocation bit be unset?  When we preallocate, we want a
> cluster to reserve space, but still read as zero, so the combination
> of both bits set makes sense to me.

Since 00 means unallocated and 01 allocated, there are two options left
to represent the "reads as zero" case: 10 and 11.

I think that one could argue for either one and there is no "right"
choice. I chose the former because I understood the allocation bit as
"the guest visible data is obtained from the raw data in that
subcluster" but the other option also makes sense.

Berto
Eric Blake Feb. 20, 2020, 3:16 p.m. UTC | #4
On 2/20/20 8:49 AM, Alberto Garcia wrote:
> On Thu 20 Feb 2020 03:28:17 PM CET, Eric Blake wrote:
>>> +An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
>>> +field of the header.
>>> +
>>> +In these images standard data clusters are divided into 32 subclusters of the
>>> +same size. They are contiguous and start from the beginning of the cluster.
>>> +Subclusters can be allocated independently and the L2 entry contains information
>>> +indicating the status of each one of them. Compressed data clusters don't have
>>> +subclusters so they are treated like in images without this feature.
>>
>> Grammar; I'd suggest:
>>
>> ...don't have subclusters, so they are treated the same as in images
>> without this feature.
> 
> Ok
> 
>> Are they truly the same, or do you still need to document that the
>> extra 64 bits of the extended L2 entry are all zero?
> 
> It is documented later in the same patch ("Subcluster Allocation Bitmap
> for compressed clusters").

Yes, I saw the mention later.  I'm just wondering if we need to 
rearrange text to mention that the bits are reserved (set to 0, ignore 
on read) closer to the point where we document compressed clusters have 
no subclusters.

> 
> By the way, this series treats an L2 entry as invalid if any of those
> bits is not zero, but I think I'll change that. Conceivably those bits
> could be used for a future compatible feature, but it can only be
> compatible if the previous versions ignore those bits.
> 
>>> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
>>> +
>>> +                    1: the subcluster reads as zeros. In this case the
>>> +                       allocation status bit must be unset. The host
>>> +                       cluster offset field may or may not be set.
>>
>> Why must the allocation bit be unset?  When we preallocate, we want a
>> cluster to reserve space, but still read as zero, so the combination
>> of both bits set makes sense to me.
> 
> Since 00 means unallocated and 01 allocated, there are two options left
> to represent the "reads as zero" case: 10 and 11.
> 
> I think that one could argue for either one and there is no "right"
> choice. I chose the former because I understood the allocation bit as
> "the guest visible data is obtained from the raw data in that
> subcluster" but the other option also makes sense.

My argument is that BOTH bit settings make sense:

10 - reads as zero, but subcluster is not allocated
11 - reads as zero, and subcluster is allocated

Oh, I see.  I'm getting confused on the meanings of "allocated". 
Meaning 1: a host address is reserved for the guest address 
(pre-allocation sense).  Meaning 2: guest reads come from this layer 
rather than from the backing layer (COW/COR sense).

Pre-allocation is ALWAYS done a cluster at a time (you only have ONE 
host offset, shared among all 32 subclusters, per L2 entry), so either 
all 32 subclusters have a preallocated location, or none of them do. 
What is left, then, is a determination of whether to read locally or 
from the backing file, AND when reading locally, whether to read from 
the pre-allocated space or to just read zeroes.

We have 8 potential combinations (not all make sense):

host  zero  alloc
  0     0     0    cluster unallocated, subcluster defers to backing
  0     0     1    error (except maybe for external data file)
  0     1     0    cluster unallocated, subcluster reads as zero
  0     1     1    error (except maybe for external data file)
addr    0     0    cluster allocated, subcluster defers to backing
addr    0     1    cluster allocated, subcluster reads from host
addr    1     0    cluster allocated, subcluster reads as zero
addr    1     1    error, or cluster allocated, subcluster reads as zero

Hmm - normally addr is non-zero (because the 0 addr is the metadata 
cluster of qcow2), but with external data file, host addr 0 is required 
for guest offset 0.  How do subclusters play with external data files? 
It makes sense to still have subclusters read as 0 or defer to backing 
with an external file (except maybe when raw external file is set).  But 
your wording says that if the alloc bit is set, the "host cluster offset 
field must contain a valid offset", which includes an offset of 0 for an 
external data file.

If we mandate 10 for the reads-as-zero form, then whether addr is valid 
is irrelevant. If we mandate 11 for the reads-as-zero form, then addr 
must be valid even though we don't reference addr.  Having written all 
that, I agree that either form should work, but also that mandating one 
form leaves the door open for a future extension to define meaning to 
the form we did not permit (that is, either 10 or 11 becomes a reserved 
pattern that we can later give meaning to), vs. allowing both forms now 
and locking ourselves out of a future meaning.  And mandating that addr be 
valid even though reading zeroes doesn't use addr feels odd.

So, I'm okay with your choice of picking 00, 01, and 10 as the mandated 
forms, and declaring 11 as invalid for now (but a possible future 
extension).  Maybe I'll change my mind when seeing what complexity it 
adds to the qcow2 reference implementation, but hopefully not.
Max Reitz Feb. 20, 2020, 3:54 p.m. UTC | #5
On 22.12.19 12:36, Alberto Garcia wrote:
> Subcluster allocation in qcow2 is implemented by extending the
> existing L2 table entries and adding additional information to
> indicate the allocation status of each subcluster.
> 
> This patch documents the changes to the qcow2 format and how they
> affect the calculation of the L2 cache size.
> 
> Signed-off-by: Alberto Garcia <berto@igalia.com>
> ---
>  docs/interop/qcow2.txt | 68 ++++++++++++++++++++++++++++++++++++++++--
>  docs/qcow2-cache.txt   | 19 +++++++++++-
>  2 files changed, 83 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
> index af5711e533..d34261f955 100644
> --- a/docs/interop/qcow2.txt
> +++ b/docs/interop/qcow2.txt
> @@ -39,6 +39,9 @@ The first cluster of a qcow2 image contains the file header:
>                      as the maximum cluster size and won't be able to open images
>                      with larger cluster sizes.
>  
> +                    Note: if the image has Extended L2 Entries then cluster_bits
> +                    must be at least 14 (i.e. 16384 byte clusters).
> +
>           24 - 31:   size
>                      Virtual disk size in bytes.
>  
> @@ -109,7 +112,12 @@ in the description of a field.
>                                  An External Data File Name header extension may
>                                  be present if this bit is set.
>  
> -                    Bits 3-63:  Reserved (set to 0)
> +                    Bit 3:      Extended L2 Entries.  If this bit is set then

I suppose bit 4 now.  (Compression is bit 3.)

[...]

> +Subcluster Allocation Bitmap (for standard clusters):
> +
> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
> +
> +                    1: the subcluster is allocated. In this case the
> +                       host cluster offset field must contain a valid
> +                       offset.
> +                    0: the subcluster is not allocated. In this case
> +                       read requests shall go to the backing file or
> +                       return zeros if there is no backing file data.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 31 - x)

I still prefer it the other way round, both personally (e.g. it’s the C
ordering), and because other places in qcow2 use LSb for bit ordering
(the refcount order).

I don’t see ease of debugging as a particularly good reason; but then
again, I didn’t have to debug this feature yet (as opposed to you).

But since I’m used to counting bits from the right (because this is how
it’s done basically everywhere), I can’t imagine I would find it more
difficult than counting them from the left.

Max

> +        32 -  63    Subcluster reads as zeros (one bit per subcluster)
> +
> +                    1: the subcluster reads as zeros. In this case the
> +                       allocation status bit must be unset. The host
> +                       cluster offset field may or may not be set.
> +                    0: no effect.
> +
> +                    Bits are assigned starting from the most significant one.
> +                    (i.e. bit x is used for subcluster 63 - x)
Eric Blake Feb. 20, 2020, 4:02 p.m. UTC | #6
On 2/20/20 9:54 AM, Max Reitz wrote:

>> +Subcluster Allocation Bitmap (for standard clusters):
>> +
>> +    Bit  0 -  31:   Allocation status (one bit per subcluster)
>> +
>> +                    1: the subcluster is allocated. In this case the
>> +                       host cluster offset field must contain a valid
>> +                       offset.
>> +                    0: the subcluster is not allocated. In this case
>> +                       read requests shall go to the backing file or
>> +                       return zeros if there is no backing file data.
>> +
>> +                    Bits are assigned starting from the most significant one.
>> +                    (i.e. bit x is used for subcluster 31 - x)
> 
> I still prefer it the other way round, both personally (e.g. it’s the C
> ordering), and because other places in qcow2 use LSb for bit ordering
> (the refcount order).

Internal consistency with refcount order using LSb ordering is the 
strongest reason to flip things, and have bit x be subcluster x.
Alberto Garcia Feb. 20, 2020, 4:04 p.m. UTC | #7
On Thu 20 Feb 2020 05:02:22 PM CET, Eric Blake wrote:
>>> +                    Bits are assigned starting from the most significant one.
>>> +                    (i.e. bit x is used for subcluster 31 - x)
>> 
>> I still prefer it the other way round, both personally (e.g. it’s the
>> C ordering), and because other places in qcow2 use LSb for bit
>> ordering (the refcount order).
>
> Internal consistency with refcount order using LSb ordering is the
> strongest reason to flip things, and have bit x be subcluster x.

Ok, I think you're both right, I'll change that.
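
To make the two orderings concrete, here is a minimal sketch in C. The
function names are hypothetical. The "v3" accessors follow the text of this
patch (bit x is subcluster 31 - x, or 63 - x for the zero bits); the "lsb"
accessors follow the ordering suggested in the review (bit x is subcluster x).
Placing the zero bit of subcluster x at bit 32 + x in the LSb variant is an
assumption, since the thread only spells out the allocation bits.

```c
#include <assert.h>
#include <stdint.h>

/* Ordering in this patch (v3): most significant bit first. */
static int alloc_bit_v3(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (int)((bitmap >> (31 - sc)) & 1);
}

static int zero_bit_v3(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (int)((bitmap >> (63 - sc)) & 1);
}

/* Ordering suggested in review: least significant bit first. */
static int alloc_bit_lsb(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (int)((bitmap >> sc) & 1);
}

/* Assumption: zero bits mirror the allocation bits, offset by 32. */
static int zero_bit_lsb(uint64_t bitmap, unsigned sc)
{
    assert(sc < 32);
    return (int)((bitmap >> (32 + sc)) & 1);
}
```

The same bitmap value therefore describes different subclusters under the two
conventions: with only bit 31 set, the v3 ordering marks subcluster 0 as
allocated while the LSb ordering marks subcluster 31.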

Berto
Alberto Garcia Feb. 20, 2020, 4:10 p.m. UTC | #8
On Thu 20 Feb 2020 03:33:57 PM CET, Eric Blake wrote:
>>   Given an offset into the virtual disk, the offset into the image file can be
>>   obtained as follows:
>>   
>> -    l2_entries = (cluster_size / sizeof(uint64_t))
>> +    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
>>   
>>       l2_index = (offset / cluster_size) % l2_entries
>>       l1_index = (offset / cluster_size) / l2_entries
>> @@ -447,6 +455,8 @@ obtained as follows:
>>   
>>       return cluster_offset + (offset % cluster_size)
>>   
>> +    [*] this changes if Extended L2 Entries are enabled, see next section
>
>> +The size of an extended L2 entry is 128 bits so the number of entries per table
>> +is calculated using this formula:
>> +
>> +    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
>
> Is it worth unifying these statements by writing:
>
> l2_entries = (cluster_size / ((1 + extended_l2) * sizeof(uint64_t)))
>
> or is that too confusing?

I think it's too confusing...

Berto
Alberto Garcia Feb. 26, 2020, 4:57 p.m. UTC | #9
On Thu 20 Feb 2020 04:16:12 PM CET, Eric Blake wrote:

>>>> +In these images standard data clusters are divided into 32 subclusters of the
>>>> +same size. They are contiguous and start from the beginning of the cluster.
>>>> +Subclusters can be allocated independently and the L2 entry contains information
>>>> +indicating the status of each one of them. Compressed data clusters don't have
>>>> +subclusters so they are treated like in images without this feature.
>>>
>>> Grammar; I'd suggest:
>>>
>>> ...don't have subclusters, so they are treated the same as in images
>>> without this feature.
>> 
>> Ok
>> 
>>> Are they truly the same, or do you still need to document that the
>>> extra 64 bits of the extended L2 entry are all zero?
>> 
>> It is documented later in the same patch ("Subcluster Allocation
>> Bitmap for compressed clusters").
>
> Yes, I saw the mention later.  I'm just wondering if we need to
> rearrange text to mention that the bits are reserved (set to 0, ignore
> on read) closer to the point where we document compressed clusters
> have no subclusters.

When I say that "compressed data clusters are treated the same as in
images without this feature" I mean that there are no semantic
changes. I don't think it's necessary to add anything else considering
that the sentence immediately after that one says that the L2 entry size
is now 128 bits, so it's not hard to guess that compressed cluster
descriptors must somehow be affected by this.

> We have 8 potential combinations (not all make sense):
>
> host   zero alloc
>    0      0    0     cluster unallocated, subcluster defers to backing
>    0      0    1     error (except maybe for external data file)

Correct (without the 'maybe')

>    0      1    0     cluster unallocated, subcluster reads as zero
>    0      1    1     error (except maybe for external data file)

This is an error in all cases.

>   addr    0    0     cluster allocated, subcluster defers to backing
>   addr    0    1     cluster allocated, subcluster reads from host
>   addr    1    0     cluster allocated, subcluster reads as zero
>   addr    1    1   error, or cluster allocated, subcluster reads as zero

The last one is also an error.

> Hmm - normally addr is non-zero (because the 0 addr is the metadata 
> cluster of qcow2), but with external data file, host addr 0 is required 
> for guest offset 0. How do subclusters play with external data files?

No difference:

    /* ... */ if (!(l2_entry & L2E_OFFSET_MASK)) {
        /* Offset 0 generally means unallocated, but it is ambiguous with
         * external data files because 0 is a valid offset there. However, all
         * clusters in external data files always have refcount 1, so we can
         * rely on QCOW_OFLAG_COPIED to disambiguate. */
        if (has_data_file(bs) && (l2_entry & QCOW_OFLAG_COPIED)) {
            return QCOW2_CLUSTER_NORMAL;
        } else {
            return QCOW2_CLUSTER_UNALLOCATED;
        }
    } /* ... */

This code doesn't change if there are subclusters, and is still used to
determine whether a cluster is allocated or not, and therefore whether
the subcluster allocation bits need to be checked or not.

> It makes sense to still have subclusters read as 0 or defer to backing
> with an external file (except maybe when raw external file is set).
> But you did word it as if the alloc bit is set, the "host cluster
> offset field must contain a valid offset" which includes an offset of
> 0 for external data file.

Yes, that is possible with subclusters (unless there's a bug).

> If we mandate 10 for the reads-as-zero form, then whether addr is
> valid is irrelevant. If we mandate 11 for the reads-as-zero form, then
> addr must be valid even though we don't reference addr.  Having
> written all that, I agree that either form should work, but also that
> mandating one form leaves the door open for a future extension to
> define meaning to the form we did not permit (that is, either 10 or 11
> becomes a reserved pattern that we can later give meaning to),
> vs. allowing both forms now and locking ourselves out of a future
> meaning.  And mandating addr to be valid even when reading zeroes
> doesn't use addr feels odd.

Yes, we definitely don't want to make 10 and 11 synonymous. One of them
should return an error and maybe in the future we can think of a new
meaning.

> So, I'm okay with your choice of picking 00, 01, and 10 as the
> mandated forms, and declaring 11 as invalid for now (but a possible
> future extension).  Maybe I'll change my mind when seeing what
> complexity it adds to the qcow2 reference implementation, but
> hopefully not.

From the implementation point of view there's no difference in
complexity.
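
As a summary of the table with the corrections above, here is a minimal
sketch in C of the per-subcluster decision. The enum and function names are
hypothetical and this is not the code from the series. cluster_allocated
stands for the result of the existing cluster-type check on the first 64 bits
(so it is true for offset 0 in an external data file when QCOW_OFLAG_COPIED is
set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status values for one subcluster. */
enum sc_status {
    SC_DEFER_TO_BACKING,  /* zero=0 alloc=0 */
    SC_READ_FROM_HOST,    /* zero=0 alloc=1, needs an allocated cluster */
    SC_READ_ZERO,         /* zero=1 alloc=0, host offset may or may not be set */
    SC_INVALID            /* zero=1 alloc=1, or alloc=1 without a cluster */
};

static enum sc_status classify(bool cluster_allocated, bool zero, bool alloc)
{
    if (zero && alloc) {
        /* 11 is rejected for now so it can be given a meaning later. */
        return SC_INVALID;
    }
    if (zero) {
        return SC_READ_ZERO;
    }
    if (alloc) {
        /* The data must come from the host cluster, so it has to exist. */
        return cluster_allocated ? SC_READ_FROM_HOST : SC_INVALID;
    }
    return SC_DEFER_TO_BACKING;
}
```

The three valid encodings (00, 01, 10) each map to one status, and every
remaining combination ends up as SC_INVALID.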

Berto

Patch

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index af5711e533..d34261f955 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -39,6 +39,9 @@  The first cluster of a qcow2 image contains the file header:
                     as the maximum cluster size and won't be able to open images
                     with larger cluster sizes.
 
+                    Note: if the image has Extended L2 Entries then cluster_bits
+                    must be at least 14 (i.e. 16384 byte clusters).
+
          24 - 31:   size
                     Virtual disk size in bytes.
 
@@ -109,7 +112,12 @@  in the description of a field.
                                 An External Data File Name header extension may
                                 be present if this bit is set.
 
-                    Bits 3-63:  Reserved (set to 0)
+                    Bit 3:      Extended L2 Entries.  If this bit is set then
+                                L2 table entries use an extended format that
+                                allows subcluster-based allocation. See the
+                                Extended L2 Entries section for more details.
+
+                    Bits 4-63:  Reserved (set to 0)
 
          80 -  87:  compatible_features
                     Bitmask of compatible features. An implementation can
@@ -437,7 +445,7 @@  cannot be relaxed without an incompatible layout change).
 Given an offset into the virtual disk, the offset into the image file can be
 obtained as follows:
 
-    l2_entries = (cluster_size / sizeof(uint64_t))
+    l2_entries = (cluster_size / sizeof(uint64_t))        [*]
 
     l2_index = (offset / cluster_size) % l2_entries
     l1_index = (offset / cluster_size) / l2_entries
@@ -447,6 +455,8 @@  obtained as follows:
 
     return cluster_offset + (offset % cluster_size)
 
+    [*] this changes if Extended L2 Entries are enabled, see next section
+
 L1 table entry:
 
     Bit  0 -  8:    Reserved (set to 0)
@@ -487,7 +497,8 @@  Standard Cluster Descriptor:
                     nor is data read from the backing file if the cluster is
                     unallocated.
 
-                    With version 2, this is always 0.
+                    With version 2 or with extended L2 entries (see the next
+                    section), this is always 0.
 
          1 -  8:    Reserved (set to 0)
 
@@ -524,6 +535,57 @@  file (except if bit 0 in the Standard Cluster Descriptor is set). If there is
 no backing file or the backing file is smaller than the image, they shall read
 zeros for all parts that are not covered by the backing file.
 
+== Extended L2 Entries ==
+
+An image uses Extended L2 Entries if bit 3 is set on the incompatible_features
+field of the header.
+
+In these images standard data clusters are divided into 32 subclusters of the
+same size. They are contiguous and start from the beginning of the cluster.
+Subclusters can be allocated independently and the L2 entry contains information
+indicating the status of each one of them. Compressed data clusters don't have
+subclusters so they are treated like in images without this feature.
+
+The size of an extended L2 entry is 128 bits so the number of entries per table
+is calculated using this formula:
+
+    l2_entries = (cluster_size / (2 * sizeof(uint64_t)))
+
+The first 64 bits have the same format as the standard L2 table entry described
+in the previous section, with the exception of bit 0 of the standard cluster
+descriptor.
+
+The last 64 bits contain a subcluster allocation bitmap with this format:
+
+Subcluster Allocation Bitmap (for standard clusters):
+
+    Bit  0 -  31:   Allocation status (one bit per subcluster)
+
+                    1: the subcluster is allocated. In this case the
+                       host cluster offset field must contain a valid
+                       offset.
+                    0: the subcluster is not allocated. In this case
+                       read requests shall go to the backing file or
+                       return zeros if there is no backing file data.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 31 - x)
+
+        32 -  63    Subcluster reads as zeros (one bit per subcluster)
+
+                    1: the subcluster reads as zeros. In this case the
+                       allocation status bit must be unset. The host
+                       cluster offset field may or may not be set.
+                    0: no effect.
+
+                    Bits are assigned starting from the most significant one.
+                    (i.e. bit x is used for subcluster 63 - x)
+
+Subcluster Allocation Bitmap (for compressed clusters):
+
+    Bit  0 -  63:   Reserved (set to 0)
+                    Compressed clusters don't have subclusters,
+                    so this field is not used.
 
 == Snapshots ==
 
diff --git a/docs/qcow2-cache.txt b/docs/qcow2-cache.txt
index d57f409861..04eb4ce2f1 100644
--- a/docs/qcow2-cache.txt
+++ b/docs/qcow2-cache.txt
@@ -1,6 +1,6 @@ 
 qcow2 L2/refcount cache configuration
 =====================================
-Copyright (C) 2015, 2018 Igalia, S.L.
+Copyright (C) 2015, 2018-2019 Igalia, S.L.
 Author: Alberto Garcia <berto@igalia.com>
 
 This work is licensed under the terms of the GNU GPL, version 2 or
@@ -222,3 +222,20 @@  support this functionality, and is 0 (disabled) on other platforms.
 This functionality currently relies on the MADV_DONTNEED argument for
 madvise() to actually free the memory. This is a Linux-specific feature,
 so cache-clean-interval is not supported on other systems.
+
+
+Extended L2 Entries
+-------------------
+All numbers shown in this document are valid for qcow2 images with normal
+64-bit L2 entries.
+
+Images with extended L2 entries need twice as much L2 metadata, so the L2
+cache size must be twice as large for the same disk space.
+
+   disk_size = l2_cache_size * cluster_size / 16
+
+i.e.
+
+   l2_cache_size = disk_size * 16 / cluster_size
+
+Refcount blocks are not affected by this.