Message ID | 20200403153050.20569-2-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | KVM: s390: vsie: fixes and cleanups | expand |
On 03.04.20 17:30, David Hildenbrand wrote: > In case we have a region 1 ASCE, our shadow/g3 address can have any value. > Unfortunately, (-1UL << 64) is undefined and triggers sometimes, > rejecting valid shadow addresses when trying to walk our shadow table > hierarchy. I thin the range of the addresses do not matter. Took me a while to understand maybe rephrase that: In case we have a region 1 the following calculation (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) results in 64. As shifts beyond the size are undefined the compiler is free to use instructions like sllg. sllg will only use 6 bits of the shift value (here 64) resulting in no shift at all. That means that ALL addresses will be rejected. With that this makes sense. Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> > > The result is that the prefix cannot get mapped and will loop basically > forever trying to map it (-EAGAIN loop). > > After all, the broken check is only a sanity check, our table shadowing > code in kvm_s390_shadow_tables() already checks these conditions, injecting > proper translation exceptions. Turn it into a WARN_ON_ONCE(). > > Fixes: 4be130a08420 ("s390/mm: add shadow gmap support") > Tested-by: Janosch Frank <frankja@linux.ibm.com> > Reported-by: Janosch Frank <frankja@linux.ibm.com> > Cc: <stable@vger.kernel.org> # v4.8+ > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > arch/s390/mm/gmap.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c > index 2fbece47ef6f..b93dd54b234a 100644 > --- a/arch/s390/mm/gmap.c > +++ b/arch/s390/mm/gmap.c > @@ -787,14 +787,18 @@ static void gmap_call_notifier(struct gmap *gmap, unsigned long start, > static inline unsigned long *gmap_table_walk(struct gmap *gmap, > unsigned long gaddr, int level) > { > + const int asce_type = gmap->asce & _ASCE_TYPE_MASK; > unsigned long *table; > > if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4)) > return NULL; > if (gmap_is_shadow(gmap) && gmap->removed) > return NULL; > - if (gaddr & (-1UL << (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11))) > + > + if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1 && > + gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))) > return NULL; > + > table = gmap->table; > switch (gmap->asce & _ASCE_TYPE_MASK) { > case _ASCE_TYPE_REGION1: >
> Am 03.04.2020 um 19:56 schrieb Christian Borntraeger <borntraeger@de.ibm.com>: > > > >> On 03.04.20 17:30, David Hildenbrand wrote: >> In case we have a region 1 ASCE, our shadow/g3 address can have any value. >> Unfortunately, (-1UL << 64) is undefined and triggers sometimes, >> rejecting valid shadow addresses when trying to walk our shadow table >> hierarchy. > > I thin the range of the addresses do not matter. > Took me a while to understand maybe rephrase that: > > In case we have a region 1 the following calculation > (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) > results in 64. As shifts beyond the size are undefined the compiler is free to use > instructions like sllg. sllg will only use 6 bits of the shift value (here 64) > resulting in no shift at all. That means that ALL addresses will be rejected. Interestingly, it would not fail when shadowing the r2t, but only when trying to shadow the r3t. > > With that this makes sense. > > Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> > In case there are no other comments, can you fixup when applying, or do you want me to resend? Cheers
On 03.04.20 21:55, David Hildenbrand wrote: > > >> Am 03.04.2020 um 19:56 schrieb Christian Borntraeger <borntraeger@de.ibm.com>: >> >> >> >>> On 03.04.20 17:30, David Hildenbrand wrote: >>> In case we have a region 1 ASCE, our shadow/g3 address can have any value. >>> Unfortunately, (-1UL << 64) is undefined and triggers sometimes, >>> rejecting valid shadow addresses when trying to walk our shadow table >>> hierarchy. >> >> I thin the range of the addresses do not matter. >> Took me a while to understand maybe rephrase that: >> >> In case we have a region 1 the following calculation >> (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11) >> results in 64. As shifts beyond the size are undefined the compiler is free to use >> instructions like sllg. sllg will only use 6 bits of the shift value (here 64) >> resulting in no shift at all. That means that ALL addresses will be rejected. > > Interestingly, it would not fail when shadowing the r2t, but only when trying to shadow the r3t. > >> >> With that this makes sense. >> >> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> >> > > In case there are no other comments, can you fixup when applying, or do you want me to resend? I can fixup.
On 03.04.20 17:30, David Hildenbrand wrote: > In case we have a region 1 ASCE, our shadow/g3 address can have any value. > Unfortunately, (-1UL << 64) is undefined and triggers sometimes, > rejecting valid shadow addresses when trying to walk our shadow table > hierarchy. > > The result is that the prefix cannot get mapped and will loop basically > forever trying to map it (-EAGAIN loop). > > After all, the broken check is only a sanity check, our table shadowing > code in kvm_s390_shadow_tables() already checks these conditions, injecting > proper translation exceptions. Turn it into a WARN_ON_ONCE(). After some testing I now triggered this warning: [ 541.633114] ------------[ cut here ]------------ [ 541.633128] WARNING: CPU: 38 PID: 2812 at arch/s390/mm/gmap.c:799 gmap_shadow_pgt_lookup+0x98/0x1a0 [ 541.633129] Modules linked in: vhost_net vhost macvtap macvlan tap kvm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT tun bridge stp llc xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter ip6_tables iptable_filter rpcrdma sunrpc rdma_ucm rdma_cm iw_cm ib_cm configfs mlx5_ib s390_trng ghash_s390 prng aes_s390 ib_uverbs des_s390 ib_core libdes sha3_512_s390 genwqe_card sha3_256_s390 vfio_ccw crc_itu_t vfio_mdev sha512_s390 mdev vfio_iommu_type1 sha1_s390 vfio eadm_sch zcrypt_cex4 sch_fq_codel ip_tables x_tables mlx5_core sha256_s390 sha_common pkey zcrypt rng_core autofs4 [ 541.633164] CPU: 38 PID: 2812 Comm: CPU 0/KVM Not tainted 5.6.0+ #354 [ 541.633166] Hardware name: IBM 3906 M04 704 (LPAR) [ 541.633167] Krnl PSW : 0704d00180000000 00000014e05dc454 (gmap_shadow_pgt_lookup+0x9c/0x1a0) [ 541.633169] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 [ 541.633171] Krnl GPRS: 0000000000000000 0000001f00000000 0000001f487b8000 ffffffff80000000 [ 541.633172] ffffffffffffffff 000003e003defa18 000003e003defa1c 000003e003defa18 [ 541.633173] fffffffffffff000 000003e003defa18 000003e003defa28 0000001f70e06300 [ 541.633174] 0000001f43484000 00000000043ed200 000003e003def978 000003e003def920 [ 541.633203] Krnl Code: 00000014e05dc448: b9800038 ngr %r3,%r8 00000014e05dc44c: a7840014 brc 8,00000014e05dc474 #00000014e05dc450: af000000 mc 0,0 >00000014e05dc454: a728fff5 lhi %r2,-11 00000014e05dc458: a7180000 lhi %r1,0 00000014e05dc45c: b2fa0070 niai 7,0 00000014e05dc460: 4010b04a sth %r1,74(%r11) 00000014e05dc464: b9140022 lgfr %r2,%r2 [ 541.633215] Call Trace: [ 541.633218] [<00000014e05dc454>] gmap_shadow_pgt_lookup+0x9c/0x1a0 [ 541.633257] [<000003ff804c57d6>] kvm_s390_shadow_fault+0x66/0x1e8 [kvm] [ 541.633265] [<000003ff804c72dc>] vsie_run+0x43c/0x710 [kvm] [ 541.633273] [<000003ff804c85ca>] kvm_s390_handle_vsie+0x632/0x750 [kvm] [ 541.633281] [<000003ff804c123c>] kvm_s390_handle_b2+0x84/0x4e0 [kvm] [ 541.633289] [<000003ff804b46b2>] kvm_handle_sie_intercept+0x172/0xcb8 [kvm] [ 541.633297] [<000003ff804b18a8>] __vcpu_run+0x658/0xc90 [kvm] [ 541.633305] [<000003ff804b2920>] kvm_arch_vcpu_ioctl_run+0x248/0x858 [kvm] [ 541.633313] [<000003ff8049d454>] kvm_vcpu_ioctl+0x284/0x7b0 [kvm] [ 541.633316] [<00000014e087d5ae>] ksys_ioctl+0xae/0xe8 [ 541.633318] [<00000014e087d652>] __s390x_sys_ioctl+0x2a/0x38 [ 541.633323] [<00000014e0ff02a2>] system_call+0x2a6/0x2c8 [ 541.633323] Last Breaking-Event-Address: [ 541.633334] [<000003ff804983e0>] kvm_running_vcpu+0x3ea9ee997d8/0x3ea9ee99950 [kvm] [ 541.633335] ---[ end trace f69b6021855ea189 ]--- Unfortunately no dump at that point in time. I have other tests which are clearly fixed by this patch, so we should propbably go forward anyway. Question is, is this just another bug we need to fix or is the assumption that somebody else checked all conditions so we can warn false?
On 07.04.20 09:33, Christian Borntraeger wrote: > > On 03.04.20 17:30, David Hildenbrand wrote: >> In case we have a region 1 ASCE, our shadow/g3 address can have any value. >> Unfortunately, (-1UL << 64) is undefined and triggers sometimes, >> rejecting valid shadow addresses when trying to walk our shadow table >> hierarchy. >> >> The result is that the prefix cannot get mapped and will loop basically >> forever trying to map it (-EAGAIN loop). >> >> After all, the broken check is only a sanity check, our table shadowing >> code in kvm_s390_shadow_tables() already checks these conditions, injecting >> proper translation exceptions. Turn it into a WARN_ON_ONCE(). > > After some testing I now triggered this warning: > > [ 541.633114] ------------[ cut here ]------------ > [ 541.633128] WARNING: CPU: 38 PID: 2812 at arch/s390/mm/gmap.c:799 gmap_shadow_pgt_lookup+0x98/0x1a0 > [ 541.633129] Modules linked in: vhost_net vhost macvtap macvlan tap kvm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT tun bridge stp llc xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter ip6_tables iptable_filter rpcrdma sunrpc rdma_ucm rdma_cm iw_cm ib_cm configfs mlx5_ib s390_trng ghash_s390 prng aes_s390 ib_uverbs des_s390 ib_core libdes sha3_512_s390 genwqe_card sha3_256_s390 vfio_ccw crc_itu_t vfio_mdev sha512_s390 mdev vfio_iommu_type1 sha1_s390 vfio eadm_sch zcrypt_cex4 sch_fq_codel ip_tables x_tables mlx5_core sha256_s390 sha_common pkey zcrypt rng_core autofs4 > [ 541.633164] CPU: 38 PID: 2812 Comm: CPU 0/KVM Not tainted 5.6.0+ #354 > [ 541.633166] Hardware name: IBM 3906 M04 704 (LPAR) > [ 541.633167] Krnl PSW : 0704d00180000000 00000014e05dc454 (gmap_shadow_pgt_lookup+0x9c/0x1a0) > [ 541.633169] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 > [ 541.633171] Krnl GPRS: 0000000000000000 0000001f00000000 0000001f487b8000 ffffffff80000000 > [ 541.633172] ffffffffffffffff 000003e003defa18 000003e003defa1c 000003e003defa18 > [ 541.633173] fffffffffffff000 000003e003defa18 000003e003defa28 0000001f70e06300 > [ 541.633174] 0000001f43484000 00000000043ed200 000003e003def978 000003e003def920 > [ 541.633203] Krnl Code: 00000014e05dc448: b9800038 ngr %r3,%r8 > 00000014e05dc44c: a7840014 brc 8,00000014e05dc474 > #00000014e05dc450: af000000 mc 0,0 > >00000014e05dc454: a728fff5 lhi %r2,-11 > 00000014e05dc458: a7180000 lhi %r1,0 > 00000014e05dc45c: b2fa0070 niai 7,0 > 00000014e05dc460: 4010b04a sth %r1,74(%r11) > 00000014e05dc464: b9140022 lgfr %r2,%r2 > [ 541.633215] Call Trace: > [ 541.633218] [<00000014e05dc454>] gmap_shadow_pgt_lookup+0x9c/0x1a0 > [ 541.633257] [<000003ff804c57d6>] kvm_s390_shadow_fault+0x66/0x1e8 [kvm] > [ 541.633265] [<000003ff804c72dc>] vsie_run+0x43c/0x710 [kvm] > [ 541.633273] [<000003ff804c85ca>] kvm_s390_handle_vsie+0x632/0x750 [kvm] > [ 541.633281] [<000003ff804c123c>] kvm_s390_handle_b2+0x84/0x4e0 [kvm] > [ 541.633289] [<000003ff804b46b2>] kvm_handle_sie_intercept+0x172/0xcb8 [kvm] > [ 541.633297] [<000003ff804b18a8>] __vcpu_run+0x658/0xc90 [kvm] > [ 541.633305] [<000003ff804b2920>] kvm_arch_vcpu_ioctl_run+0x248/0x858 [kvm] > [ 541.633313] [<000003ff8049d454>] kvm_vcpu_ioctl+0x284/0x7b0 [kvm] > [ 541.633316] [<00000014e087d5ae>] ksys_ioctl+0xae/0xe8 > [ 541.633318] [<00000014e087d652>] __s390x_sys_ioctl+0x2a/0x38 > [ 541.633323] [<00000014e0ff02a2>] system_call+0x2a6/0x2c8 > [ 541.633323] Last Breaking-Event-Address: > [ 541.633334] [<000003ff804983e0>] kvm_running_vcpu+0x3ea9ee997d8/0x3ea9ee99950 [kvm] > [ 541.633335] ---[ end trace f69b6021855ea189 ]--- > > > Unfortunately no dump at that point in time. > I have other tests which are clearly fixed by this patch, so we should propbably go forward anyway. > Question is, is this just another bug we need to fix or is the assumption that somebody else checked > all conditions so we can warn false? Yeah, I think it is via kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()->gmap_table_walk() where we just peek if there is already something shadowed. If not, we go via the full kvm_s390_shadow_tables() path. So we could either do sanity checks in gmap_shadow_pgt_lookup(), or rather drop the WARN_ON_ONCE. I think the latter makes sense, now that we understood the problem.
On 07.04.20 09:49, David Hildenbrand wrote: > On 07.04.20 09:33, Christian Borntraeger wrote: >> >> On 03.04.20 17:30, David Hildenbrand wrote: >>> In case we have a region 1 ASCE, our shadow/g3 address can have any value. >>> Unfortunately, (-1UL << 64) is undefined and triggers sometimes, >>> rejecting valid shadow addresses when trying to walk our shadow table >>> hierarchy. >>> >>> The result is that the prefix cannot get mapped and will loop basically >>> forever trying to map it (-EAGAIN loop). >>> >>> After all, the broken check is only a sanity check, our table shadowing >>> code in kvm_s390_shadow_tables() already checks these conditions, injecting >>> proper translation exceptions. Turn it into a WARN_ON_ONCE(). >> >> After some testing I now triggered this warning: >> >> [ 541.633114] ------------[ cut here ]------------ >> [ 541.633128] WARNING: CPU: 38 PID: 2812 at arch/s390/mm/gmap.c:799 gmap_shadow_pgt_lookup+0x98/0x1a0 >> [ 541.633129] Modules linked in: vhost_net vhost macvtap macvlan tap kvm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT tun bridge stp llc xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter ip6_tables iptable_filter rpcrdma sunrpc rdma_ucm rdma_cm iw_cm ib_cm configfs mlx5_ib s390_trng ghash_s390 prng aes_s390 ib_uverbs des_s390 ib_core libdes sha3_512_s390 genwqe_card sha3_256_s390 vfio_ccw crc_itu_t vfio_mdev sha512_s390 mdev vfio_iommu_type1 sha1_s390 vfio eadm_sch zcrypt_cex4 sch_fq_codel ip_tables x_tables mlx5_core sha256_s390 sha_common pkey zcrypt rng_core autofs4 >> [ 541.633164] CPU: 38 PID: 2812 Comm: CPU 0/KVM Not tainted 5.6.0+ #354 >> [ 541.633166] Hardware name: IBM 3906 M04 704 (LPAR) >> [ 541.633167] Krnl PSW : 0704d00180000000 00000014e05dc454 (gmap_shadow_pgt_lookup+0x9c/0x1a0) >> [ 541.633169] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 >> [ 541.633171] Krnl GPRS: 0000000000000000 0000001f00000000 0000001f487b8000 ffffffff80000000 >> [ 541.633172] ffffffffffffffff 000003e003defa18 000003e003defa1c 000003e003defa18 >> [ 541.633173] fffffffffffff000 000003e003defa18 000003e003defa28 0000001f70e06300 >> [ 541.633174] 0000001f43484000 00000000043ed200 000003e003def978 000003e003def920 >> [ 541.633203] Krnl Code: 00000014e05dc448: b9800038 ngr %r3,%r8 >> 00000014e05dc44c: a7840014 brc 8,00000014e05dc474 >> #00000014e05dc450: af000000 mc 0,0 >> >00000014e05dc454: a728fff5 lhi %r2,-11 >> 00000014e05dc458: a7180000 lhi %r1,0 >> 00000014e05dc45c: b2fa0070 niai 7,0 >> 00000014e05dc460: 4010b04a sth %r1,74(%r11) >> 00000014e05dc464: b9140022 lgfr %r2,%r2 >> [ 541.633215] Call Trace: >> [ 541.633218] [<00000014e05dc454>] gmap_shadow_pgt_lookup+0x9c/0x1a0 >> [ 541.633257] [<000003ff804c57d6>] kvm_s390_shadow_fault+0x66/0x1e8 [kvm] >> [ 541.633265] [<000003ff804c72dc>] vsie_run+0x43c/0x710 [kvm] >> [ 541.633273] [<000003ff804c85ca>] kvm_s390_handle_vsie+0x632/0x750 [kvm] >> [ 541.633281] [<000003ff804c123c>] kvm_s390_handle_b2+0x84/0x4e0 [kvm] >> [ 541.633289] [<000003ff804b46b2>] kvm_handle_sie_intercept+0x172/0xcb8 [kvm] >> [ 541.633297] [<000003ff804b18a8>] __vcpu_run+0x658/0xc90 [kvm] >> [ 541.633305] [<000003ff804b2920>] kvm_arch_vcpu_ioctl_run+0x248/0x858 [kvm] >> [ 541.633313] [<000003ff8049d454>] kvm_vcpu_ioctl+0x284/0x7b0 [kvm] >> [ 541.633316] [<00000014e087d5ae>] ksys_ioctl+0xae/0xe8 >> [ 541.633318] [<00000014e087d652>] __s390x_sys_ioctl+0x2a/0x38 >> [ 541.633323] [<00000014e0ff02a2>] system_call+0x2a6/0x2c8 >> [ 541.633323] Last Breaking-Event-Address: >> [ 541.633334] [<000003ff804983e0>] kvm_running_vcpu+0x3ea9ee997d8/0x3ea9ee99950 [kvm] >> [ 541.633335] ---[ end trace f69b6021855ea189 ]--- >> >> >> Unfortunately no dump at that point in time. >> I have other tests which are clearly fixed by this patch, so we should propbably go forward anyway. >> Question is, is this just another bug we need to fix or is the assumption that somebody else checked >> all conditions so we can warn false? > > Yeah, I think it is via > > kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()->gmap_table_walk() > > where we just peek if there is already something shadowed. If not, we go > via the full kvm_s390_shadow_tables() path. > > So we could either do sanity checks in gmap_shadow_pgt_lookup(), or > rather drop the WARN_ON_ONCE. I think the latter makes sense, now that > we understood the problem. Ok, so I will drop the WARN_ON_ONCE and fixup the commit message.
On 07.04.20 09:52, Christian Borntraeger wrote: > > > On 07.04.20 09:49, David Hildenbrand wrote: >> On 07.04.20 09:33, Christian Borntraeger wrote: >>> >>> On 03.04.20 17:30, David Hildenbrand wrote: >>>> In case we have a region 1 ASCE, our shadow/g3 address can have any value. >>>> Unfortunately, (-1UL << 64) is undefined and triggers sometimes, >>>> rejecting valid shadow addresses when trying to walk our shadow table >>>> hierarchy. >>>> >>>> The result is that the prefix cannot get mapped and will loop basically >>>> forever trying to map it (-EAGAIN loop). >>>> >>>> After all, the broken check is only a sanity check, our table shadowing >>>> code in kvm_s390_shadow_tables() already checks these conditions, injecting >>>> proper translation exceptions. Turn it into a WARN_ON_ONCE(). >>> >>> After some testing I now triggered this warning: >>> >>> [ 541.633114] ------------[ cut here ]------------ >>> [ 541.633128] WARNING: CPU: 38 PID: 2812 at arch/s390/mm/gmap.c:799 gmap_shadow_pgt_lookup+0x98/0x1a0 >>> [ 541.633129] Modules linked in: vhost_net vhost macvtap macvlan tap kvm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT tun bridge stp llc xt_tcpudp ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter ip6_tables iptable_filter rpcrdma sunrpc rdma_ucm rdma_cm iw_cm ib_cm configfs mlx5_ib s390_trng ghash_s390 prng aes_s390 ib_uverbs des_s390 ib_core libdes sha3_512_s390 genwqe_card sha3_256_s390 vfio_ccw crc_itu_t vfio_mdev sha512_s390 mdev vfio_iommu_type1 sha1_s390 vfio eadm_sch zcrypt_cex4 sch_fq_codel ip_tables x_tables mlx5_core sha256_s390 sha_common pkey zcrypt rng_core autofs4 >>> [ 541.633164] CPU: 38 PID: 2812 Comm: CPU 0/KVM Not tainted 5.6.0+ #354 >>> [ 541.633166] Hardware name: IBM 3906 M04 704 (LPAR) >>> [ 541.633167] Krnl PSW : 0704d00180000000 00000014e05dc454 (gmap_shadow_pgt_lookup+0x9c/0x1a0) >>> [ 541.633169] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3 >>> [ 541.633171] Krnl GPRS: 0000000000000000 0000001f00000000 0000001f487b8000 ffffffff80000000 >>> [ 541.633172] ffffffffffffffff 000003e003defa18 000003e003defa1c 000003e003defa18 >>> [ 541.633173] fffffffffffff000 000003e003defa18 000003e003defa28 0000001f70e06300 >>> [ 541.633174] 0000001f43484000 00000000043ed200 000003e003def978 000003e003def920 >>> [ 541.633203] Krnl Code: 00000014e05dc448: b9800038 ngr %r3,%r8 >>> 00000014e05dc44c: a7840014 brc 8,00000014e05dc474 >>> #00000014e05dc450: af000000 mc 0,0 >>> >00000014e05dc454: a728fff5 lhi %r2,-11 >>> 00000014e05dc458: a7180000 lhi %r1,0 >>> 00000014e05dc45c: b2fa0070 niai 7,0 >>> 00000014e05dc460: 4010b04a sth %r1,74(%r11) >>> 00000014e05dc464: b9140022 lgfr %r2,%r2 >>> [ 541.633215] Call Trace: >>> [ 541.633218] [<00000014e05dc454>] gmap_shadow_pgt_lookup+0x9c/0x1a0 >>> [ 541.633257] [<000003ff804c57d6>] kvm_s390_shadow_fault+0x66/0x1e8 [kvm] >>> [ 541.633265] [<000003ff804c72dc>] vsie_run+0x43c/0x710 [kvm] >>> [ 541.633273] [<000003ff804c85ca>] kvm_s390_handle_vsie+0x632/0x750 [kvm] >>> [ 541.633281] [<000003ff804c123c>] kvm_s390_handle_b2+0x84/0x4e0 [kvm] >>> [ 541.633289] [<000003ff804b46b2>] kvm_handle_sie_intercept+0x172/0xcb8 [kvm] >>> [ 541.633297] [<000003ff804b18a8>] __vcpu_run+0x658/0xc90 [kvm] >>> [ 541.633305] [<000003ff804b2920>] kvm_arch_vcpu_ioctl_run+0x248/0x858 [kvm] >>> [ 541.633313] [<000003ff8049d454>] kvm_vcpu_ioctl+0x284/0x7b0 [kvm] >>> [ 541.633316] [<00000014e087d5ae>] ksys_ioctl+0xae/0xe8 >>> [ 541.633318] [<00000014e087d652>] __s390x_sys_ioctl+0x2a/0x38 >>> [ 541.633323] [<00000014e0ff02a2>] system_call+0x2a6/0x2c8 >>> [ 541.633323] Last Breaking-Event-Address: >>> [ 541.633334] [<000003ff804983e0>] kvm_running_vcpu+0x3ea9ee997d8/0x3ea9ee99950 [kvm] >>> [ 541.633335] ---[ end trace f69b6021855ea189 ]--- >>> >>> >>> Unfortunately no dump at that point in time. >>> I have other tests which are clearly fixed by this patch, so we should propbably go forward anyway. >>> Question is, is this just another bug we need to fix or is the assumption that somebody else checked >>> all conditions so we can warn false? >> >> Yeah, I think it is via >> >> kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()->gmap_table_walk() >> >> where we just peek if there is already something shadowed. If not, we go >> via the full kvm_s390_shadow_tables() path. >> >> So we could either do sanity checks in gmap_shadow_pgt_lookup(), or >> rather drop the WARN_ON_ONCE. I think the latter makes sense, now that >> we understood the problem. > > Ok, so I will drop the WARN_ON_ONCE and fixup the commit message. > Perfect, thanks!
On Tue, 7 Apr 2020 09:52:53 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > On 07.04.20 09:49, David Hildenbrand wrote: > > On 07.04.20 09:33, Christian Borntraeger wrote: > >> > >> On 03.04.20 17:30, David Hildenbrand wrote: > >>> In case we have a region 1 ASCE, our shadow/g3 address can have > >>> any value. Unfortunately, (-1UL << 64) is undefined and triggers > >>> sometimes, rejecting valid shadow addresses when trying to walk > >>> our shadow table hierarchy. > >>> > >>> The result is that the prefix cannot get mapped and will loop > >>> basically forever trying to map it (-EAGAIN loop). > >>> > >>> After all, the broken check is only a sanity check, our table > >>> shadowing code in kvm_s390_shadow_tables() already checks these > >>> conditions, injecting proper translation exceptions. Turn it into > >>> a WARN_ON_ONCE(). > >> > >> After some testing I now triggered this warning: > >> > >> [ 541.633114] ------------[ cut here ]------------ > >> [ 541.633128] WARNING: CPU: 38 PID: 2812 at > >> arch/s390/mm/gmap.c:799 gmap_shadow_pgt_lookup+0x98/0x1a0 [ > >> 541.633129] Modules linked in: vhost_net vhost macvtap macvlan tap > >> kvm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp xt_CT > >> tun bridge stp llc xt_tcpudp ip6t_REJECT nf_reject_ipv6 > >> ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ip6table_nat > >> ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat > >> iptable_mangle iptable_raw iptable_security nf_conntrack > >> nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ip6table_filter > >> ip6_tables iptable_filter rpcrdma sunrpc rdma_ucm rdma_cm iw_cm > >> ib_cm configfs mlx5_ib s390_trng ghash_s390 prng aes_s390 > >> ib_uverbs des_s390 ib_core libdes sha3_512_s390 genwqe_card > >> sha3_256_s390 vfio_ccw crc_itu_t vfio_mdev sha512_s390 mdev > >> vfio_iommu_type1 sha1_s390 vfio eadm_sch zcrypt_cex4 sch_fq_codel > >> ip_tables x_tables mlx5_core sha256_s390 sha_common pkey zcrypt > >> rng_core autofs4 [ 541.633164] CPU: 38 PID: 2812 Comm: CPU 0/KVM > >> Not tainted 5.6.0+ #354 [ 541.633166] Hardware name: IBM 3906 M04 > >> 704 (LPAR) [ 541.633167] Krnl PSW : 0704d00180000000 > >> 00000014e05dc454 (gmap_shadow_pgt_lookup+0x9c/0x1a0) [ > >> 541.633169] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 > >> CC:1 PM:0 RI:0 EA:3 [ 541.633171] Krnl GPRS: 0000000000000000 > >> 0000001f00000000 0000001f487b8000 ffffffff80000000 [ 541.633172] > >> ffffffffffffffff 000003e003defa18 000003e003defa1c > >> 000003e003defa18 [ 541.633173] fffffffffffff000 > >> 000003e003defa18 000003e003defa28 0000001f70e06300 [ 541.633174] > >> 0000001f43484000 00000000043ed200 000003e003def978 > >> 000003e003def920 [ 541.633203] Krnl Code: 00000014e05dc448: > >> b9800038 ngr %r3,%r8 00000014e05dc44c: > >> a7840014 brc 8,00000014e05dc474 > >> #00000014e05dc450: af000000 mc 0,0 > >> >00000014e05dc454: a728fff5 > >> > lhi %r2,-11 > >> 00000014e05dc458: a7180000 > >> lhi %r1,0 00000014e05dc45c: b2fa0070 > >> niai 7,0 00000014e05dc460: 4010b04a > >> sth %r1,74(%r11) 00000014e05dc464: b9140022 > >> lgfr %r2,%r2 [ 541.633215] Call Trace: > >> [ 541.633218] [<00000014e05dc454>] > >> gmap_shadow_pgt_lookup+0x9c/0x1a0 [ 541.633257] > >> [<000003ff804c57d6>] kvm_s390_shadow_fault+0x66/0x1e8 [kvm] [ > >> 541.633265] [<000003ff804c72dc>] vsie_run+0x43c/0x710 [kvm] [ > >> 541.633273] [<000003ff804c85ca>] kvm_s390_handle_vsie+0x632/0x750 > >> [kvm] [ 541.633281] [<000003ff804c123c>] > >> kvm_s390_handle_b2+0x84/0x4e0 [kvm] [ 541.633289] > >> [<000003ff804b46b2>] kvm_handle_sie_intercept+0x172/0xcb8 [kvm] [ > >> 541.633297] [<000003ff804b18a8>] __vcpu_run+0x658/0xc90 [kvm] [ > >> 541.633305] [<000003ff804b2920>] > >> kvm_arch_vcpu_ioctl_run+0x248/0x858 [kvm] [ 541.633313] > >> [<000003ff8049d454>] kvm_vcpu_ioctl+0x284/0x7b0 [kvm] [ > >> 541.633316] [<00000014e087d5ae>] ksys_ioctl+0xae/0xe8 [ > >> 541.633318] [<00000014e087d652>] __s390x_sys_ioctl+0x2a/0x38 [ > >> 541.633323] [<00000014e0ff02a2>] system_call+0x2a6/0x2c8 [ > >> 541.633323] Last Breaking-Event-Address: [ 541.633334] > >> [<000003ff804983e0>] kvm_running_vcpu+0x3ea9ee997d8/0x3ea9ee99950 > >> [kvm] [ 541.633335] ---[ end trace f69b6021855ea189 ]--- > >> > >> > >> Unfortunately no dump at that point in time. > >> I have other tests which are clearly fixed by this patch, so we > >> should propbably go forward anyway. Question is, is this just > >> another bug we need to fix or is the assumption that somebody else > >> checked all conditions so we can warn false? > > > > Yeah, I think it is via > > > > kvm_s390_shadow_fault()->gmap_shadow_pgt_lookup()->gmap_table_walk() > > > > where we just peek if there is already something shadowed. If not, > > we go via the full kvm_s390_shadow_tables() path. > > > > So we could either do sanity checks in gmap_shadow_pgt_lookup(), or > > rather drop the WARN_ON_ONCE. I think the latter makes sense, now > > that we understood the problem. > > Ok, so I will drop the WARN_ON_ONCE and fixup the commit message. > with those fixes, you can also add: Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 2fbece47ef6f..b93dd54b234a 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -787,14 +787,18 @@ static void gmap_call_notifier(struct gmap *gmap, unsigned long start, static inline unsigned long *gmap_table_walk(struct gmap *gmap, unsigned long gaddr, int level) { + const int asce_type = gmap->asce & _ASCE_TYPE_MASK; unsigned long *table; if ((gmap->asce & _ASCE_TYPE_MASK) + 4 < (level * 4)) return NULL; if (gmap_is_shadow(gmap) && gmap->removed) return NULL; - if (gaddr & (-1UL << (31 + ((gmap->asce & _ASCE_TYPE_MASK) >> 2)*11))) + + if (WARN_ON_ONCE(asce_type != _ASCE_TYPE_REGION1 && + gaddr & (-1UL << (31 + (asce_type >> 2) * 11)))) return NULL; + table = gmap->table; switch (gmap->asce & _ASCE_TYPE_MASK) { case _ASCE_TYPE_REGION1: