Message ID | 20240919104238.232704-2-michal.orzel@amd.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | xen/arm: dt overlay fixes | expand |
On Thu, 19 Sep 2024, Michal Orzel wrote: > Attempt to attach an overlay (xl dt-overlay attach) to a domain without > first adding this overlay to Xen (xl dt-overlay add) results in an > overlay track entry being NULL in handle_attach_overlay_nodes(). This > leads to NULL pointer dereference and the following data abort crash: > > (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. > (XEN) Data Abort Trap. Syndrome=0x5 > (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 > (XEN) 0TH[0x000] = 0x46940f7f > (XEN) 1ST[0x000] = 0x0 > (XEN) CPU0: Unexpected Trap: Data Abort > (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- > ... > (XEN) Xen call trace: > (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) > (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) > (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 > > Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") > Signed-off-by: Michal Orzel <michal.orzel@amd.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> > --- > xen/common/dt-overlay.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c > index d53b4706cd2f..8606b14d1e8e 100644 > --- a/xen/common/dt-overlay.c > +++ b/xen/common/dt-overlay.c > @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, > out: > spin_unlock(&overlay_lock); > > - rangeset_destroy(entry->irq_ranges); > - rangeset_destroy(entry->iomem_ranges); > + if ( entry ) > + { > + rangeset_destroy(entry->irq_ranges); > + rangeset_destroy(entry->iomem_ranges); > + } > > return rc; > } > -- > 2.37.6 >
Hi Michal, On 19/09/2024 12:42, Michal Orzel wrote: > Attempt to attach an overlay (xl dt-overlay attach) to a domain without > first adding this overlay to Xen (xl dt-overlay add) results in an > overlay track entry being NULL in handle_attach_overlay_nodes(). This > leads to NULL pointer dereference and the following data abort crash: > > (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. > (XEN) Data Abort Trap. Syndrome=0x5 > (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 > (XEN) 0TH[0x000] = 0x46940f7f > (XEN) 1ST[0x000] = 0x0 > (XEN) CPU0: Unexpected Trap: Data Abort > (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- > ... > (XEN) Xen call trace: > (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) > (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) > (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 > > Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") > Signed-off-by: Michal Orzel <michal.orzel@amd.com> > --- > xen/common/dt-overlay.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c > index d53b4706cd2f..8606b14d1e8e 100644 > --- a/xen/common/dt-overlay.c > +++ b/xen/common/dt-overlay.c > @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, > out: > spin_unlock(&overlay_lock); > > - rangeset_destroy(entry->irq_ranges); > - rangeset_destroy(entry->iomem_ranges); > + if ( entry ) > + { > + rangeset_destroy(entry->irq_ranges); > + rangeset_destroy(entry->iomem_ranges); > + } While looking at the error paths in handle_attach_overlay_nodes(), I noticed we don't revert any partial changes made by handle_device(). In this case, I am wondering whether it is correct to destroy the rangeset. How would you be able to revert the changes? Cheers,
Hi Julien, On 20/09/2024 10:29, Julien Grall wrote: > > > Hi Michal, > > On 19/09/2024 12:42, Michal Orzel wrote: >> Attempt to attach an overlay (xl dt-overlay attach) to a domain without >> first adding this overlay to Xen (xl dt-overlay add) results in an >> overlay track entry being NULL in handle_attach_overlay_nodes(). This >> leads to NULL pointer dereference and the following data abort crash: >> >> (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. >> (XEN) Data Abort Trap. Syndrome=0x5 >> (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 >> (XEN) 0TH[0x000] = 0x46940f7f >> (XEN) 1ST[0x000] = 0x0 >> (XEN) CPU0: Unexpected Trap: Data Abort >> (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- >> ... >> (XEN) Xen call trace: >> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) >> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) >> (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 >> >> Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") >> Signed-off-by: Michal Orzel <michal.orzel@amd.com> >> --- >> xen/common/dt-overlay.c | 7 +++++-- >> 1 file changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c >> index d53b4706cd2f..8606b14d1e8e 100644 >> --- a/xen/common/dt-overlay.c >> +++ b/xen/common/dt-overlay.c >> @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, >> out: >> spin_unlock(&overlay_lock); >> >> - rangeset_destroy(entry->irq_ranges); >> - rangeset_destroy(entry->iomem_ranges); >> + if ( entry ) >> + { >> + rangeset_destroy(entry->irq_ranges); >> + rangeset_destroy(entry->iomem_ranges); >> + } > > While looking at the error paths in handle_attach_overlay_nodes(), I > noticed we don't revert any partial changes made by handle_device(). > > In this case, I am wondering whether it is correct to destroy the > rangeset. How would you be able to revert the changes? I guess the same story applies as for the partial add/remove which was stated by Vikram in the commit msg of 7e5c4a8b86f12942de0566b1d61f71d15774364b meaning partial success withe some failures may lead to other failures and might need a system reboot. I did not carefully look into this series, my plan was to fix the issues without changing the functional behavior. FWICS, we do not yet support detachment (only add/remove and attach) and removal of nodes and ranges is only possible if the nodes are assigned to hwdom. ~Michal
On 23/09/2024 12:05, Michal Orzel wrote: > Hi Julien, Hi Michal, > On 20/09/2024 10:29, Julien Grall wrote: >> >> >> Hi Michal, >> >> On 19/09/2024 12:42, Michal Orzel wrote: >>> Attempt to attach an overlay (xl dt-overlay attach) to a domain without >>> first adding this overlay to Xen (xl dt-overlay add) results in an >>> overlay track entry being NULL in handle_attach_overlay_nodes(). This >>> leads to NULL pointer dereference and the following data abort crash: >>> >>> (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. >>> (XEN) Data Abort Trap. Syndrome=0x5 >>> (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 >>> (XEN) 0TH[0x000] = 0x46940f7f >>> (XEN) 1ST[0x000] = 0x0 >>> (XEN) CPU0: Unexpected Trap: Data Abort >>> (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- >>> ... >>> (XEN) Xen call trace: >>> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) >>> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) >>> (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 >>> >>> Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") >>> Signed-off-by: Michal Orzel <michal.orzel@amd.com> >>> --- >>> xen/common/dt-overlay.c | 7 +++++-- >>> 1 file changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c >>> index d53b4706cd2f..8606b14d1e8e 100644 >>> --- a/xen/common/dt-overlay.c >>> +++ b/xen/common/dt-overlay.c >>> @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, >>> out: >>> spin_unlock(&overlay_lock); >>> >>> - rangeset_destroy(entry->irq_ranges); >>> - rangeset_destroy(entry->iomem_ranges); >>> + if ( entry ) >>> + { >>> + rangeset_destroy(entry->irq_ranges); >>> + rangeset_destroy(entry->iomem_ranges); >>> + } >> >> While looking at the error paths in handle_attach_overlay_nodes(), I >> noticed we don't revert any partial changes made by handle_device(). >> >> In this case, I am wondering whether it is correct to destroy the >> rangeset. How would you be able to revert the changes? > I guess the same story applies as for the partial add/remove which was stated by Vikram > in the commit msg of 7e5c4a8b86f12942de0566b1d61f71d15774364b meaning partial success withe some > failures may lead to other failures and might need a system reboot. I did not carefully look into > this series, my plan was to fix the issues without changing the functional behavior. Do you mean in this series or forever? If the former, would you be able to outline what you expect after the end of this series? What should work? What should not work? > FWICS, we do not > yet support detachment (only add/remove and attach) and removal of nodes and ranges is only > possible if the nodes are assigned to hwdom. I need some clarifications. By "we do not yet support detachment", do you mean while a guest is running or do you also include a domain shutting down? Cheers,
Hi Julien, On 30/09/2024 12:37, Julien Grall wrote: > > > On 23/09/2024 12:05, Michal Orzel wrote: >> Hi Julien, > > Hi Michal, > >> On 20/09/2024 10:29, Julien Grall wrote: >>> >>> >>> Hi Michal, >>> >>> On 19/09/2024 12:42, Michal Orzel wrote: >>>> Attempt to attach an overlay (xl dt-overlay attach) to a domain without >>>> first adding this overlay to Xen (xl dt-overlay add) results in an >>>> overlay track entry being NULL in handle_attach_overlay_nodes(). This >>>> leads to NULL pointer dereference and the following data abort crash: >>>> >>>> (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. >>>> (XEN) Data Abort Trap. Syndrome=0x5 >>>> (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 >>>> (XEN) 0TH[0x000] = 0x46940f7f >>>> (XEN) 1ST[0x000] = 0x0 >>>> (XEN) CPU0: Unexpected Trap: Data Abort >>>> (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- >>>> ... >>>> (XEN) Xen call trace: >>>> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) >>>> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) >>>> (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 >>>> >>>> Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") >>>> Signed-off-by: Michal Orzel <michal.orzel@amd.com> >>>> --- >>>> xen/common/dt-overlay.c | 7 +++++-- >>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c >>>> index d53b4706cd2f..8606b14d1e8e 100644 >>>> --- a/xen/common/dt-overlay.c >>>> +++ b/xen/common/dt-overlay.c >>>> @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, >>>> out: >>>> spin_unlock(&overlay_lock); >>>> >>>> - rangeset_destroy(entry->irq_ranges); >>>> - rangeset_destroy(entry->iomem_ranges); >>>> + if ( entry ) >>>> + { >>>> + rangeset_destroy(entry->irq_ranges); >>>> + rangeset_destroy(entry->iomem_ranges); >>>> + } >>> >>> While looking at the error paths in handle_attach_overlay_nodes(), I >>> noticed we don't revert any partial changes made by handle_device(). >>> >>> In this case, I am wondering whether it is correct to destroy the >>> rangeset. How would you be able to revert the changes? >> I guess the same story applies as for the partial add/remove which was stated by Vikram >> in the commit msg of 7e5c4a8b86f12942de0566b1d61f71d15774364b meaning partial success withe some >> failures may lead to other failures and might need a system reboot. I did not carefully look into >> this series, my plan was to fix the issues without changing the functional behavior. > > Do you mean in this series or forever? If the former, would you be able > to outline what you expect after the end of this series? What should > work? What should not work? The goal of this series is to fix the issues I encountered while doing some other DT overlay work as stated in the patch 0. The goal of each patch is clearly outlined in the commit message by stating what is being fixed and why. The first 3 patches contain "Fixes" tag, therefore there is no functional behavior change - only fixing what should have been fixed by the original series. Patch 4 does not have "Fixes" tag although in theory it could. There is no doc mentioning that adding nodes right into "/" is forbidden. Linux supports it so I added support for Xen as well. After end of *this* series I expect the issues mentioned in each patch to be fixed + the ability to add nodes directly under root node. It's definitely not the end of fixes for this series. I have already written at least 1 more fix that allows to have other nodes in the overlay tree that should be ignored (at the moment the code assumes that each node one level below root, needs to specify target-path. This is wrong as we can have __fixups__, __symbols__, and other nodes that we should simply ignore according to docs. > >> FWICS, we do not >> yet support detachment (only add/remove and attach) and removal of nodes and ranges is only >> possible if the nodes are assigned to hwdom. > > I need some clarifications. By "we do not yet support detachment", do > you mean while a guest is running or do you also include a domain > shutting down?In order to reason about detachment we need to go back to see what is supported in terms of attachment. At the moment, we can only attach dtbo to 1:1 domUs, therefore we can stop talking about libxl domUs for now. For dom0less domUs, we also cannot really shut down them. So it leaves us with dom0 whose reboot implies system reboot. At the moment there is no code that would release IRQs and unmap MMIOs. ~Michal
Hi, On 30/09/2024 15:04, Michal Orzel wrote: > Hi Julien, > > On 30/09/2024 12:37, Julien Grall wrote: >> >> >> On 23/09/2024 12:05, Michal Orzel wrote: >>> Hi Julien, >> >> Hi Michal, >> >>> On 20/09/2024 10:29, Julien Grall wrote: >>>> >>>> >>>> Hi Michal, >>>> >>>> On 19/09/2024 12:42, Michal Orzel wrote: >>>>> Attempt to attach an overlay (xl dt-overlay attach) to a domain without >>>>> first adding this overlay to Xen (xl dt-overlay add) results in an >>>>> overlay track entry being NULL in handle_attach_overlay_nodes(). This >>>>> leads to NULL pointer dereference and the following data abort crash: >>>>> >>>>> (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. >>>>> (XEN) Data Abort Trap. Syndrome=0x5 >>>>> (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 >>>>> (XEN) 0TH[0x000] = 0x46940f7f >>>>> (XEN) 1ST[0x000] = 0x0 >>>>> (XEN) CPU0: Unexpected Trap: Data Abort >>>>> (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- >>>>> ... >>>>> (XEN) Xen call trace: >>>>> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) >>>>> (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) >>>>> (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 >>>>> >>>>> Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") >>>>> Signed-off-by: Michal Orzel <michal.orzel@amd.com> >>>>> --- >>>>> xen/common/dt-overlay.c | 7 +++++-- >>>>> 1 file changed, 5 insertions(+), 2 deletions(-) >>>>> >>>>> diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c >>>>> index d53b4706cd2f..8606b14d1e8e 100644 >>>>> --- a/xen/common/dt-overlay.c >>>>> +++ b/xen/common/dt-overlay.c >>>>> @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, >>>>> out: >>>>> spin_unlock(&overlay_lock); >>>>> >>>>> - rangeset_destroy(entry->irq_ranges); >>>>> - rangeset_destroy(entry->iomem_ranges); >>>>> + if ( entry ) >>>>> + { >>>>> + rangeset_destroy(entry->irq_ranges); >>>>> + rangeset_destroy(entry->iomem_ranges); >>>>> + } >>>> >>>> While looking at the error paths in handle_attach_overlay_nodes(), I >>>> noticed we don't revert any partial changes made by handle_device(). >>>> >>>> In this case, I am wondering whether it is correct to destroy the >>>> rangeset. How would you be able to revert the changes? >>> I guess the same story applies as for the partial add/remove which was stated by Vikram >>> in the commit msg of 7e5c4a8b86f12942de0566b1d61f71d15774364b meaning partial success withe some >>> failures may lead to other failures and might need a system reboot. I did not carefully look into >>> this series, my plan was to fix the issues without changing the functional behavior. >> >> Do you mean in this series or forever? If the former, would you be able >> to outline what you expect after the end of this series? What should >> work? What should not work? > The goal of this series is to fix the issues I encountered while doing some other DT overlay work > as stated in the patch 0. The goal of each patch is clearly outlined in the commit message > by stating what is being fixed and why. The first 3 patches contain "Fixes" tag, therefore there> is no functional behavior change - only fixing what should have been fixed by the original series. > Patch 4 does not have "Fixes" tag although in theory it could. There is no doc mentioning that adding > nodes right into "/" is forbidden. Linux supports it so I added support for Xen as well. > > After end of *this* series I expect the issues mentioned in each patch to be fixed + the ability to add > nodes directly under root node. > > It's definitely not the end of fixes for this series. Good. This wasn't clear from your wording. Thanks for clarification! >> >>> FWICS, we do not >>> yet support detachment (only add/remove and attach) and removal of nodes and ranges is only >>> possible if the nodes are assigned to hwdom. >> >> I need some clarifications. By "we do not yet support detachment", do >> you mean while a guest is running or do you also include a domain >> shutting down?In order to reason about detachment we need to go back to see what is supported in terms of attachment. > At the moment, we can only attach dtbo to 1:1 domUs, therefore we can stop talking about libxl domUs for now. Ah, I missed the check. I thought we already supported device attachment to guest started by xl. > For dom0less domUs, we also cannot really shut down them. So it leaves us with dom0 whose reboot implies system > reboot. At the moment there is no code that would release IRQs and unmap MMIOs. Understood. Then no action expected from my side for now. But it would be good to keep track of the unknown issues when we will implement detach. Cheers,
diff --git a/xen/common/dt-overlay.c b/xen/common/dt-overlay.c index d53b4706cd2f..8606b14d1e8e 100644 --- a/xen/common/dt-overlay.c +++ b/xen/common/dt-overlay.c @@ -908,8 +908,11 @@ static long handle_attach_overlay_nodes(struct domain *d, out: spin_unlock(&overlay_lock); - rangeset_destroy(entry->irq_ranges); - rangeset_destroy(entry->iomem_ranges); + if ( entry ) + { + rangeset_destroy(entry->irq_ranges); + rangeset_destroy(entry->iomem_ranges); + } return rc; }
Attempt to attach an overlay (xl dt-overlay attach) to a domain without first adding this overlay to Xen (xl dt-overlay add) results in an overlay track entry being NULL in handle_attach_overlay_nodes(). This leads to NULL pointer dereference and the following data abort crash: (XEN) Cannot find any matching tracker with input dtbo. Operation is supported only for prior added dtbo. (XEN) Data Abort Trap. Syndrome=0x5 (XEN) Walking Hypervisor VA 0x40 on CPU0 via TTBR 0x0000000046948000 (XEN) 0TH[0x000] = 0x46940f7f (XEN) 1ST[0x000] = 0x0 (XEN) CPU0: Unexpected Trap: Data Abort (XEN) ----[ Xen-4.20-unstable arm64 debug=y Not tainted ]---- ... (XEN) Xen call trace: (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (PC) (XEN) [<00000a0000208b30>] dt_overlay_domctl+0x304/0x370 (LR) (XEN) [<00000a0000274b7c>] arch_do_domctl+0x48/0x328 Fixes: 4c733873b5c2 ("xen/arm: Add XEN_DOMCTL_dt_overlay and device attachment to domains") Signed-off-by: Michal Orzel <michal.orzel@amd.com> --- xen/common/dt-overlay.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)