diff mbox series

[v2,1/2] KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes

Message ID 20230324145424.293889-2-nrb@linux.ibm.com (mailing list archive)
State Accepted
Commit 285cff4c0454340a4dc53f46e67f2cb1c293bd74
Headers show
Series KVM: s390: CMMA migration selftest and small bugfix | expand

Commit Message

Nico Boehr March 24, 2023, 2:54 p.m. UTC
The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace
specifies a start_gfn outside of memslots.

This can occur when a VM has multiple memslots with a hole in between:

+-----+----------+--------+--------+
| ... | Slot N-1 | <hole> | Slot N |
+-----+----------+--------+--------+
      ^          ^        ^        ^
      |          |        |        |
GFN   A          A+B      |        |
                          A+B+C    |
			           A+B+C+D

When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the
CMMA values of the first dirty page in Slot N. However, userspace may get a
start_gfn of A+B+C+D with a count of 0, hence completely skipping over any
dirty pages in slot N.

The error is in kvm_s390_next_dirty_cmma(), which assumes
gfn_to_memslot_approx() will return the memslot _below_ the specified GFN
when the specified GFN lies outside a memslot. In reality it may return
either the memslot below or above the specified GFN.

When a memslot above the specified GFN is returned this happens:

- ofs is calculated, but since the memslot's base_gfn is larger than the
  specified cur_gfn, ofs will underflow to a huge number.
- ofs is passed to find_next_bit(). Since ofs will exceed the memslot's
  number of pages, the number of pages in the memslot is returned,
  completely skipping over all bits in the memslot userspace would be
  interested in.

Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is
returned (cur_gfn < ms->base_gfn).

Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Janosch Frank May 12, 2023, 7:20 a.m. UTC | #1
On 3/24/23 15:54, Nico Boehr wrote:
> The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace
> specifies a start_gfn outside of memslots.
> 
> This can occur when a VM has multiple memslots with a hole in between:
> 
> +-----+----------+--------+--------+
> | ... | Slot N-1 | <hole> | Slot N |
> +-----+----------+--------+--------+
>        ^          ^        ^        ^
>        |          |        |        |
> GFN   A          A+B      |        |
>                            A+B+C    |
> 			           A+B+C+D
> 
> When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the
> CMMA values of the first dirty page in Slot N. However, userspace may get a
> start_gfn of A+B+C+D with a count of 0, hence completely skipping over any
> dirty pages in slot N.
> 
> The error is in kvm_s390_next_dirty_cmma(), which assumes
> gfn_to_memslot_approx() will return the memslot _below_ the specified GFN
> when the specified GFN lies outside a memslot. In reality it may return
> either the memslot below or above the specified GFN.
> 
> When a memslot above the specified GFN is returned this happens:
> 
> - ofs is calculated, but since the memslot's base_gfn is larger than the
>    specified cur_gfn, ofs will underflow to a huge number.
> - ofs is passed to find_next_bit(). Since ofs will exceed the memslot's
>    number of pages, the number of pages in the memslot is returned,
>    completely skipping over all bits in the memslot userspace would be
>    interested in.
> 
> Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is
> returned (cur_gfn < ms->base_gfn).
> 
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Does this need a fix tag?
Nico Boehr May 12, 2023, 7:54 a.m. UTC | #2
Quoting Nico Boehr (2023-03-24 15:54:23)
> The KVM_S390_GET_CMMA_BITS ioctl may return incorrect values when userspace
> specifies a start_gfn outside of memslots.
> 
> This can occur when a VM has multiple memslots with a hole in between:
> 
> +-----+----------+--------+--------+
> | ... | Slot N-1 | <hole> | Slot N |
> +-----+----------+--------+--------+
>       ^          ^        ^        ^
>       |          |        |        |
> GFN   A          A+B      |        |
>                           A+B+C    |
>                                    A+B+C+D
> 
> When userspace specifies a GFN in [A+B, A+B+C), it would expect to get the
> CMMA values of the first dirty page in Slot N. However, userspace may get a
> start_gfn of A+B+C+D with a count of 0, hence completely skipping over any
> dirty pages in slot N.
> 
> The error is in kvm_s390_next_dirty_cmma(), which assumes
> gfn_to_memslot_approx() will return the memslot _below_ the specified GFN
> when the specified GFN lies outside a memslot. In reality it may return
> either the memslot below or above the specified GFN.
> 
> When a memslot above the specified GFN is returned this happens:
> 
> - ofs is calculated, but since the memslot's base_gfn is larger than the
>   specified cur_gfn, ofs will underflow to a huge number.
> - ofs is passed to find_next_bit(). Since ofs will exceed the memslot's
>   number of pages, the number of pages in the memslot is returned,
>   completely skipping over all bits in the memslot userspace would be
>   interested in.
> 
> Fix this by resetting ofs to zero when a memslot _above_ cur_gfn is
> returned (cur_gfn < ms->base_gfn).
> 
> Signed-off-by: Nico Boehr <nrb@linux.ibm.com>
> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>

Since Janosch asked for it, I would suggest the following Fixes tag:

Fixes: afdad61615cc ("KVM: s390: Fix storage attributes migration with memory slots")
diff mbox series

Patch

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 39b36562c043..db2de76bb0ef 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2155,6 +2155,10 @@  static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots,
 		ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);
 		ofs = 0;
 	}
+
+	if (cur_gfn < ms->base_gfn)
+		ofs = 0;
+
 	ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs);
 	while (ofs >= ms->npages && (mnode = rb_next(mnode))) {
 		ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);