diff mbox series

KVM: arm64: vgic: fix wrong loop condition in scan_its_table()

Message ID acd9f1643980fbd27cd22523d2d84ca7c9add84a.1665592448.git.renzhengeek@gmail.com (mailing list archive)
State New, archived
Headers show
Series KVM: arm64: vgic: fix wrong loop condition in scan_its_table() | expand

Commit Message

Eric Ren Oct. 12, 2022, 4:59 p.m. UTC
Reproducer hints:
1. Create ARM virt VM with pxb-pcie bus which adds
   extra host bridges, with qemu command like:

```
  -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \
  -device pcie-root-port,..,bus=pci.x \
  ...
  -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \
  -device pcie-root-port,..,bus=pci.y \
  ...

```
2. Perform VM migration which calls save/restore device tables.

In that setup, we get a big "offset" between 2 device_ids (
one is small, another is big), which makes unsigned "len" round
up a big positive number, causing loop to continue exceptionally.

Signed-off-by: Eric Ren <renzhengeek@gmail.com>
---
 arch/arm64/kvm/vgic/vgic-its.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Marc Zyngier Oct. 12, 2022, 6:33 p.m. UTC | #1
Hi Eric,

Before I comment on this patch, a couple of things that need
addressing:

> "Cc: marc.zyngier@arm.com, cdall@linaro.org"

None of these two addresses are valid anymore, and haven't been for
several years.

Please consult the MAINTAINERS file for up-to-date addresses for
current maintainers and reviewers, all of whom should be Cc'd on this
email. I've now added them as well as Eric Auger who has written most
of the ITS migration code, and the new mailing list (the Columbia list
is about to be killed).

On Wed, 12 Oct 2022 17:59:25 +0100,
Eric Ren <renzhengeek@gmail.com> wrote:
> 
> Reproducer hints:
> 1. Create ARM virt VM with pxb-pcie bus which adds
>    extra host bridges, with qemu command like:
> 
> ```
>   -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \
>   -device pcie-root-port,..,bus=pci.x \
>   ...
>   -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \
>   -device pcie-root-port,..,bus=pci.y \
>   ...
> 
> ```
> 2. Perform VM migration which calls save/restore device tables.
> 
> In that setup, we get a big "offset" between 2 device_ids (
> one is small, another is big), which makes unsigned "len" round
> up a big positive number, causing loop to continue exceptionally.

You'll have to spell it out for me here. If you have a very sparse
device ID and you are only using a single level device table, you are
bound to have a large len. Now, is the issue that 'size' is so large
that it is negative as an 'int'? Describing the exact situation you're
in would help a lot.

>
> Signed-off-by: Eric Ren <renzhengeek@gmail.com>
> ---
>  arch/arm64/kvm/vgic/vgic-its.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 24d7778d1ce6..673554ef02f9 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2141,7 +2141,7 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
>  			  int start_id, entry_fn_t fn, void *opaque)
>  {
>  	struct kvm *kvm = its->dev->kvm;
> -	unsigned long len = size;
> +	ssize_t len = size;

This feels wrong, really. If anything, all these types should be
unsigned, not signed. Signed types in this context make very little
sense...

Thanks,

	M.
Eric Auger Oct. 12, 2022, 8:14 p.m. UTC | #2
Hi,

On 10/12/22 18:59, Eric Ren wrote:
> Reproducer hints:
> 1. Create ARM virt VM with pxb-pcie bus which adds
>    extra host bridges, with qemu command like:
>
> ```
>   -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \
>   -device pcie-root-port,..,bus=pci.x \
>   ...
>   -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \
>   -device pcie-root-port,..,bus=pci.y \
>   ...
>
> ```
> 2. Perform VM migration which calls save/restore device tables.
>
> In that setup, we get a big "offset" between 2 device_ids (
> one is small, another is big), which makes unsigned "len" round
> up a big positive number, causing loop to continue exceptionally.
>
> Signed-off-by: Eric Ren <renzhengeek@gmail.com>

I fixed Marc's address and removed Christoffer's one. Please use the
scripts/get_maintainer.pl to identify the right email addresses.

Just to make sure I correctly understand, you mean len -= byte_offset
becomes negative and that is not properly reflected due to the unsigned
type. I agree we should be robust against that but doesn't it also mean
that the saved table has an issue in the first place (the offset points
to a location outside of the max size of the table)?

Thanks

Eric
> ---
>  arch/arm64/kvm/vgic/vgic-its.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 24d7778d1ce6..673554ef02f9 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2141,7 +2141,7 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
>  			  int start_id, entry_fn_t fn, void *opaque)
>  {
>  	struct kvm *kvm = its->dev->kvm;
> -	unsigned long len = size;
> +	ssize_t len = size;
>  	int id = start_id;
>  	gpa_t gpa = base;
>  	char entry[ESZ_MAX];
Eric Auger Oct. 13, 2022, 4:42 p.m. UTC | #3
Hi Eric,

On 10/12/22 20:33, Marc Zyngier wrote:
> Hi Eric,
> 
> Before I comment on this patch, a couple of things that need
> addressing:
> 
>> "Cc: marc.zyngier@arm.com, cdall@linaro.org"
> 
> None of these two addresses are valid anymore, and haven't been for
> several years.
> 
> Please consult the MAINTAINERS file for up-to-date addresses for
> current maintainers and reviewers, all of whom should be Cc'd on this
> email. I've now added them as well as Eric Auger who has written most
> of the ITS migration code, and the new mailing list (the Columbia list
> is about to be killed).
> 
> On Wed, 12 Oct 2022 17:59:25 +0100,
> Eric Ren <renzhengeek@gmail.com> wrote:
>>
>> Reproducer hints:
>> 1. Create ARM virt VM with pxb-pcie bus which adds
>>    extra host bridges, with qemu command like:
>>
>> ```
>>   -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \
>>   -device pcie-root-port,..,bus=pci.x \
>>   ...
>>   -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \
>>   -device pcie-root-port,..,bus=pci.y \
>>   ...
>>
>> ```
>> 2. Perform VM migration which calls save/restore device tables.
>>
>> In that setup, we get a big "offset" between 2 device_ids (
>> one is small, another is big), which makes unsigned "len" round
>> up a big positive number, causing loop to continue exceptionally.
> 
> You'll have to spell it out for me here. If you have a very sparse
> device ID and you are only using a single level device table, you are
> bound to have a large len. Now, is the issue that 'size' is so large
> that it is negative as an 'int'? Describing the exact situation you're
> in would help a lot.
> 
>>
>> Signed-off-by: Eric Ren <renzhengeek@gmail.com>
>> ---
>>  arch/arm64/kvm/vgic/vgic-its.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
>> index 24d7778d1ce6..673554ef02f9 100644
>> --- a/arch/arm64/kvm/vgic/vgic-its.c
>> +++ b/arch/arm64/kvm/vgic/vgic-its.c
>> @@ -2141,7 +2141,7 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
>>  			  int start_id, entry_fn_t fn, void *opaque)
>>  {
>>  	struct kvm *kvm = its->dev->kvm;
>> -	unsigned long len = size;
>> +	ssize_t len = size;
> 
> This feels wrong, really. If anything, all these types should be
> unsigned, not signed. Signed types in this context make very little
> sense...

After digging into the code back again, I realized I told you something
wrong. The next_offset is the delta between the current device id and
the next one. The next device can perfectly be in a different L1 device
table, - it is your case actually- , in which case the code is
definitely broken.

So I guess we should rather have a
while (true) {
	../..
	if (byte_offset >= len)
		break;
	len -= byte_offset;
}

You can add a Fixes tag too:
Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for table
lookup")
and cc stable@vger.kernel.org

Thanks

Eric
> 
> Thanks,
> 
> 	M.
>
Marc Zyngier Oct. 14, 2022, 2:28 p.m. UTC | #4
On Thu, 13 Oct 2022 17:42:31 +0100,
Eric Auger <eauger@redhat.com> wrote:
> 
> Hi Eric,
> 
> On 10/12/22 20:33, Marc Zyngier wrote:
> > Hi Eric,
> > 
> > Before I comment on this patch, a couple of things that need
> > addressing:
> > 
> >> "Cc: marc.zyngier@arm.com, cdall@linaro.org"
> > 
> > None of these two addresses are valid anymore, and haven't been for
> > several years.
> > 
> > Please consult the MAINTAINERS file for up-to-date addresses for
> > current maintainers and reviewers, all of whom should be Cc'd on this
> > email. I've now added them as well as Eric Auger who has written most
> > of the ITS migration code, and the new mailing list (the Columbia list
> > is about to be killed).

Duh, I never CC'd the new list... Now hopefully done.

> > 
> > On Wed, 12 Oct 2022 17:59:25 +0100,
> > Eric Ren <renzhengeek@gmail.com> wrote:
> >>
> >> Reproducer hints:
> >> 1. Create ARM virt VM with pxb-pcie bus which adds
> >>    extra host bridges, with qemu command like:
> >>
> >> ```
> >>   -device pxb-pcie,bus_nr=8,id=pci.x,numa_node=0,bus=pcie.0 \
> >>   -device pcie-root-port,..,bus=pci.x \
> >>   ...
> >>   -device pxb-pcie,bus_nr=37,id=pci.y,numa_node=1,bus=pcie.0 \
> >>   -device pcie-root-port,..,bus=pci.y \
> >>   ...
> >>
> >> ```
> >> 2. Perform VM migration which calls save/restore device tables.
> >>
> >> In that setup, we get a big "offset" between 2 device_ids (
> >> one is small, another is big), which makes unsigned "len" round
> >> up a big positive number, causing loop to continue exceptionally.
> > 
> > You'll have to spell it out for me here. If you have a very sparse
> > device ID and you are only using a single level device table, you are
> > bound to have a large len. Now, is the issue that 'size' is so large
> > that it is negative as an 'int'? Describing the exact situation you're
> > in would help a lot.
> > 
> >>
> >> Signed-off-by: Eric Ren <renzhengeek@gmail.com>
> >> ---
> >>  arch/arm64/kvm/vgic/vgic-its.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> >> index 24d7778d1ce6..673554ef02f9 100644
> >> --- a/arch/arm64/kvm/vgic/vgic-its.c
> >> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> >> @@ -2141,7 +2141,7 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
> >>  			  int start_id, entry_fn_t fn, void *opaque)
> >>  {
> >>  	struct kvm *kvm = its->dev->kvm;
> >> -	unsigned long len = size;
> >> +	ssize_t len = size;
> > 
> > This feels wrong, really. If anything, all these types should be
> > unsigned, not signed. Signed types in this context make very little
> > sense...
> 
> After digging into the code back again, I realized I told you something
> wrong. The next_offset is the delta between the current device id and
> the next one. The next device can perfectly be in a different L1 device

A different L2 table, surely? By definition, we only have a single L1
table.

> table, - it is your case actually- , in which case the code is
> definitely broken.
> 
> So I guess we should rather have a
> while (true) {
> 	../..
> 	if (byte_offset >= len)
> 		break;
> 	len -= byte_offset;
> }
> 
> You can add a Fixes tag too:
> Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for table
> lookup")
> and cc stable@vger.kernel.org

Just to make it clear, do you mean this:

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 9d3299a70242..e722cafdff60 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2162,6 +2162,9 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
 			return next_offset;
 
 		byte_offset = next_offset * esz;
+		if (byte_offset >= len)
+			break;
+
 		id += next_offset;
 		gpa += byte_offset;
 		len -= byte_offset;


If so, then I agree that this is a sensible fix. EricR, do you mind
respinning this ASAP so that I can get it merged and backported?

Thanks,

	M.
Eric Ren Oct. 15, 2022, 2:41 a.m. UTC | #5
Hi all,

snip...

> > > You'll have to spell it out for me here. If you have a very sparse
> > > device ID and you are only using a single level device table, you are
> > > bound to have a large len. Now, is the issue that 'size' is so large
> > > that it is negative as an 'int'? Describing the exact situation you're
> > > in would help a lot.
> > >

The problem happens when we have a very sparse device ID and use "2
level" device
tables, ie. GITS_BASERn.Indirect enabled.

For example,
1. L1 table has 2 entries;
2. and we are now scanning at L2 table entry index 2075 (pointed by L1
first entry)
3. if next device id is 9472, we will get a big offset: 7397;
4. with signed 'len', 'len -= offset * esz', len will underflow to a
positive number, mistakenly into next iteration with a bad GPA;
(It should break the current L2 table scanning, and jump into the next
L1 table entry)
5. that bad GPA fails the guest read.

hope this make it clean:-)
> > >>
> > >> Signed-off-by: Eric Ren <renzhengeek@gmail.com>
> > >> ---
> > >>  arch/arm64/kvm/vgic/vgic-its.c | 2 +-
> > >>  1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> > >> index 24d7778d1ce6..673554ef02f9 100644
> > >> --- a/arch/arm64/kvm/vgic/vgic-its.c
> > >> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> > >> @@ -2141,7 +2141,7 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
> > >>                      int start_id, entry_fn_t fn, void *opaque)
> > >>  {
> > >>    struct kvm *kvm = its->dev->kvm;
> > >> -  unsigned long len = size;
> > >> +  ssize_t len = size;
> > >
> > > This feels wrong, really. If anything, all these types should be
> > > unsigned, not signed. Signed types in this context make very little
> > > sense...
> >
> > After digging into the code back again, I realized I told you something
> > wrong. The next_offset is the delta between the current device id and
> > the next one. The next device can perfectly be in a different L1 device
>
> A different L2 table, surely? By definition, we only have a single L1
> table.
>
> > table, - it is your case actually- , in which case the code is
> > definitely broken.
> >

Yes. You've got the point, hah.

> > So I guess we should rather have a
> > while (true) {
> >       ../..
> >       if (byte_offset >= len)
> >               break;
> >       len -= byte_offset;
> > }
> >

Thanks. This looks better. I'll send v2.

> > You can add a Fixes tag too:
> > Fixes: 920a7a8fa92a ("KVM: arm64: vgic-its: Add infrastructure for table
> > lookup")
> > and cc stable@vger.kernel.org
>
> Just to make it clear, do you mean this:

Yes, exactly.

>
> diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
> index 9d3299a70242..e722cafdff60 100644
> --- a/arch/arm64/kvm/vgic/vgic-its.c
> +++ b/arch/arm64/kvm/vgic/vgic-its.c
> @@ -2162,6 +2162,9 @@ static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
>                         return next_offset;
>
>                 byte_offset = next_offset * esz;
> +               if (byte_offset >= len)
> +                       break;
> +
>                 id += next_offset;
>                 gpa += byte_offset;
>                 len -= byte_offset;
>
>
> If so, then I agree that this is a sensible fix. EricR, do you mind
> respinning this ASAP so that I can get it merged and backported?

OK, please see V2:-)

Thanks all!
Eric
diff mbox series

Patch

diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 24d7778d1ce6..673554ef02f9 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -2141,7 +2141,7 @@  static int scan_its_table(struct vgic_its *its, gpa_t base, int size, u32 esz,
 			  int start_id, entry_fn_t fn, void *opaque)
 {
 	struct kvm *kvm = its->dev->kvm;
-	unsigned long len = size;
+	ssize_t len = size;
 	int id = start_id;
 	gpa_t gpa = base;
 	char entry[ESZ_MAX];