Message ID | 1596478888-23030-5-git-send-email-olekstysh@gmail.com (mailing list archive)
---|---
State | Superseded
Series | IOREQ feature (+ virtio-mmio) on Arm
> -----Original Message----- > From: Xen-devel <xen-devel-bounces@lists.xenproject.org> On Behalf Of Oleksandr Tyshchenko > Sent: 03 August 2020 19:21 > To: xen-devel@lists.xenproject.org > Cc: Stefano Stabellini <sstabellini@kernel.org>; Julien Grall <julien@xen.org>; Wei Liu <wl@xen.org>; > Andrew Cooper <andrew.cooper3@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; George Dunlap > <george.dunlap@citrix.com>; Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Julien Grall > <julien.grall@arm.com>; Jan Beulich <jbeulich@suse.com>; Daniel De Graaf <dgdegra@tycho.nsa.gov>; > Volodymyr Babchuk <Volodymyr_Babchuk@epam.com> > Subject: [RFC PATCH V1 04/12] xen/arm: Introduce arch specific bits for IOREQ/DM features > > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> > > This patch makes possible to forward Guest MMIO accesses > to a device emulator on Arm and enables that support for > Arm64. > > Also update XSM code a bit to let DM op be used on Arm. > New arch DM op will be introduced in the follow-up patch. > > Please note, at the moment build on Arm32 is broken > (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone > wants to enable CONFIG_IOREQ_SERVER due to the lack of > cmpxchg_64 support on Arm32. 
> > Please note, this is a split/cleanup of Julien's PoC: > "Add support for Guest IO forwarding to a device emulator" > > Signed-off-by: Julien Grall <julien.grall@arm.com> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> > --- > tools/libxc/xc_dom_arm.c | 25 +++++++--- > xen/arch/arm/Kconfig | 1 + > xen/arch/arm/Makefile | 2 + > xen/arch/arm/dm.c | 34 +++++++++++++ > xen/arch/arm/domain.c | 9 ++++ > xen/arch/arm/hvm.c | 46 +++++++++++++++++- > xen/arch/arm/io.c | 67 +++++++++++++++++++++++++- > xen/arch/arm/ioreq.c | 86 +++++++++++++++++++++++++++++++++ > xen/arch/arm/traps.c | 17 +++++++ > xen/common/memory.c | 5 +- > xen/include/asm-arm/domain.h | 80 +++++++++++++++++++++++++++++++ > xen/include/asm-arm/hvm/ioreq.h | 103 ++++++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/mmio.h | 1 + > xen/include/asm-arm/p2m.h | 7 +-- > xen/include/xsm/dummy.h | 4 +- > xen/include/xsm/xsm.h | 6 +-- > xen/xsm/dummy.c | 2 +- > xen/xsm/flask/hooks.c | 5 +- > 18 files changed, 476 insertions(+), 24 deletions(-) > create mode 100644 xen/arch/arm/dm.c > create mode 100644 xen/arch/arm/ioreq.c > create mode 100644 xen/include/asm-arm/hvm/ioreq.h > > diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c > index 931404c..b5fc066 100644 > --- a/tools/libxc/xc_dom_arm.c > +++ b/tools/libxc/xc_dom_arm.c > @@ -26,11 +26,19 @@ > #include "xg_private.h" > #include "xc_dom.h" > > -#define NR_MAGIC_PAGES 4 > + > #define CONSOLE_PFN_OFFSET 0 > #define XENSTORE_PFN_OFFSET 1 > #define MEMACCESS_PFN_OFFSET 2 > #define VUART_PFN_OFFSET 3 > +#define IOREQ_SERVER_PFN_OFFSET 4 > + > +#define NR_IOREQ_SERVER_PAGES 8 > +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES) > + > +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) > + > +#define special_pfn(x) (GUEST_MAGIC_BASE_PFN + (x)) Why introduce 'magic pages' for Arm? It's quite a horrible hack that we have begun to do away with by adding resource mapping. Paul
Hi Paul, On 04/08/2020 08:49, Paul Durrant wrote: >> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c >> index 931404c..b5fc066 100644 >> --- a/tools/libxc/xc_dom_arm.c >> +++ b/tools/libxc/xc_dom_arm.c >> @@ -26,11 +26,19 @@ >> #include "xg_private.h" >> #include "xc_dom.h" >> >> -#define NR_MAGIC_PAGES 4 >> + >> #define CONSOLE_PFN_OFFSET 0 >> #define XENSTORE_PFN_OFFSET 1 >> #define MEMACCESS_PFN_OFFSET 2 >> #define VUART_PFN_OFFSET 3 >> +#define IOREQ_SERVER_PFN_OFFSET 4 >> + >> +#define NR_IOREQ_SERVER_PAGES 8 >> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES) >> + >> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) >> + >> +#define special_pfn(x) (GUEST_MAGIC_BASE_PFN + (x)) > > Why introduce 'magic pages' for Arm? It's quite a horrible hack that we have begun to do away with by adding resource mapping. This would require us to mandate at least Linux 4.17 in a domain that will run an IOREQ server. If we don't mandate this, the minimum version would be 4.10 where DM OP was introduced. Because of XSA-300, we could technically not safely run an IOREQ server with existing Linux. So it is probably OK to enforce the use of the acquire interface. Note that I haven't yet looked at the rest of the series. So I am not sure if there is more work necessary to enable it. Cheers,
On Tue, 4 Aug 2020, Julien Grall wrote: > On 04/08/2020 08:49, Paul Durrant wrote: > > > diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c > > > index 931404c..b5fc066 100644 > > > --- a/tools/libxc/xc_dom_arm.c > > > +++ b/tools/libxc/xc_dom_arm.c > > > @@ -26,11 +26,19 @@ > > > #include "xg_private.h" > > > #include "xc_dom.h" > > > > > > -#define NR_MAGIC_PAGES 4 > > > + > > > #define CONSOLE_PFN_OFFSET 0 > > > #define XENSTORE_PFN_OFFSET 1 > > > #define MEMACCESS_PFN_OFFSET 2 > > > #define VUART_PFN_OFFSET 3 > > > +#define IOREQ_SERVER_PFN_OFFSET 4 > > > + > > > +#define NR_IOREQ_SERVER_PAGES 8 > > > +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES) > > > + > > > +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) > > > + > > > +#define special_pfn(x) (GUEST_MAGIC_BASE_PFN + (x)) > > > > Why introduce 'magic pages' for Arm? It's quite a horrible hack that we have > > begun to do away with by adding resource mapping. > > This would require us to mandate at least Linux 4.17 in a domain that will run > an IOREQ server. If we don't mandate this, the minimum version would be 4.10 > where DM OP was introduced. > > Because of XSA-300, we could technically not safely run an IOREQ server with > existing Linux. So it is probably OK to enforce the use of the acquire > interface. +1
On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> > > This patch makes possible to forward Guest MMIO accesses > to a device emulator on Arm and enables that support for > Arm64. > > Also update XSM code a bit to let DM op be used on Arm. > New arch DM op will be introduced in the follow-up patch. > > Please note, at the moment build on Arm32 is broken > (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone Speaking of buffered_ioreq, if I recall correctly, they were only used for VGA-related things on x86. It looks like it is still true. If so, do we need it on ARM? Note that I don't think we can get rid of it from the interface as it is baked into ioreq, but it might be possible to have a dummy implementation on ARM. Or maybe not: looking at xen/common/hvm/ioreq.c it looks like it would be difficult to disentangle bufioreq stuff from the rest of the code. > wants to enable CONFIG_IOREQ_SERVER due to the lack of > cmpxchg_64 support on Arm32. > > Please note, this is a split/cleanup of Julien's PoC: > "Add support for Guest IO forwarding to a device emulator" > > Signed-off-by: Julien Grall <julien.grall@arm.com> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> [...] > @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) > */ > void leave_hypervisor_to_guest(void) > { > +#ifdef CONFIG_IOREQ_SERVER > + /* > + * XXX: Check the return. Shall we call that in > + * continue_running and context_switch instead? > + * The benefits would be to avoid calling > + * handle_hvm_io_completion on every return. 
> + */ Yeah, that could be a simple and good optimization > + local_irq_enable(); > + handle_hvm_io_completion(current); > +#endif > local_irq_disable(); > > check_for_vcpu_work(); > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e2f582..e060b0a 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -11,12 +11,64 @@ > #include <asm/vgic.h> > #include <asm/vpl011.h> > #include <public/hvm/params.h> > +#include <public/hvm/dm_op.h> > +#include <public/hvm/ioreq.h> > #include <xen/serial.h> > #include <xen/rbtree.h> > > +struct hvm_ioreq_page { > + gfn_t gfn; > + struct page_info *page; > + void *va; > +}; > + > +struct hvm_ioreq_vcpu { > + struct list_head list_entry; > + struct vcpu *vcpu; > + evtchn_port_t ioreq_evtchn; > + bool pending; > +}; > + > +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) > +#define MAX_NR_IO_RANGES 256 > + > +#define MAX_NR_IOREQ_SERVERS 8 > +#define DEFAULT_IOSERVID 0 > + > +struct hvm_ioreq_server { > + struct domain *target, *emulator; > + > + /* Lock to serialize toolstack modifications */ > + spinlock_t lock; > + > + struct hvm_ioreq_page ioreq; > + struct list_head ioreq_vcpu_list; > + struct hvm_ioreq_page bufioreq; > + > + /* Lock to serialize access to buffered ioreq ring */ > + spinlock_t bufioreq_lock; > + evtchn_port_t bufioreq_evtchn; > + struct rangeset *range[NR_IO_RANGE_TYPES]; > + bool enabled; > + uint8_t bufioreq_handling; > +}; > + > struct hvm_domain > { > uint64_t params[HVM_NR_PARAMS]; > + > + /* Guest page range used for non-default ioreq servers */ > + struct { > + unsigned long base; > + unsigned long mask; > + unsigned long legacy_mask; /* indexed by HVM param number */ > + } ioreq_gfn; > + > + /* Lock protects all other values in the sub-struct and the default */ > + struct { > + spinlock_t lock; > + struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS]; > + } ioreq_server; > }; > > #ifdef CONFIG_ARM_64 > @@ -93,6 +145,29 @@ struct 
arch_domain > #endif > } __cacheline_aligned; > > +enum hvm_io_completion { > + HVMIO_no_completion, > + HVMIO_mmio_completion, > + HVMIO_pio_completion, > + HVMIO_realmode_completion realmode is an x86-ism (as pio), I wonder if we could get rid of it on ARM > +}; > + > +struct hvm_vcpu_io { > + /* I/O request in flight to device model. */ > + enum hvm_io_completion io_completion; > + ioreq_t io_req; > + > + /* > + * HVM emulation: > + * Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn. > + * The latter is known to be an MMIO frame (not RAM). > + * This translation is only valid for accesses as per @mmio_access. > + */ > + struct npfec mmio_access; > + unsigned long mmio_gla; > + unsigned long mmio_gpfn; > +}; > + > struct arch_vcpu > { > struct { > @@ -206,6 +281,11 @@ struct arch_vcpu > */ > bool need_flush_to_ram; > > + struct hvm_vcpu > + { > + struct hvm_vcpu_io hvm_io; > + } hvm; > + > } __cacheline_aligned; > > void vcpu_show_execution_state(struct vcpu *); > diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h > new file mode 100644 > index 0000000..83a560c > --- /dev/null > +++ b/xen/include/asm-arm/hvm/ioreq.h > @@ -0,0 +1,103 @@ > +/* > + * hvm.h: Hardware virtual machine assist interface definitions. > + * > + * Copyright (c) 2016 Citrix Systems Inc. > + * Copyright (c) 2019 Arm ltd. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; If not, see <http://www.gnu.org/licenses/>. 
> + */ > + > +#ifndef __ASM_ARM_HVM_IOREQ_H__ > +#define __ASM_ARM_HVM_IOREQ_H__ > + > +#include <public/hvm/ioreq.h> > +#include <public/hvm/dm_op.h> > + > +#define has_vpci(d) (false) > + > +bool handle_mmio(void); > + > +static inline bool handle_pio(uint16_t port, unsigned int size, int dir) > +{ > + /* XXX */ > + BUG(); > + return true; > +} > + > +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) > +{ > + return p->addr; > +} > + > +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p) > +{ > + unsigned long size = p->size; > + > + return p->addr + size - 1; > +} > + > +struct hvm_ioreq_server; > + > +static inline int p2m_set_ioreq_server(struct domain *d, > + unsigned int flags, > + struct hvm_ioreq_server *s) > +{ > + return -EOPNOTSUPP; > +} > + > +static inline void msix_write_completion(struct vcpu *v) > +{ > +} > + > +static inline void handle_realmode_completion(void) > +{ > + ASSERT_UNREACHABLE(); > +} > + > +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn) > +{ > +} > + > +static inline void hvm_get_ioreq_server_range_type(struct domain *d, > + ioreq_t *p, > + uint8_t *type, > + uint64_t *addr) > +{ > + *type = (p->type == IOREQ_TYPE_PIO) ? 
> + XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY; > + *addr = p->addr; > +} > + > +static inline void arch_hvm_ioreq_init(struct domain *d) > +{ > +} > + > +static inline void arch_hvm_ioreq_destroy(struct domain *d) > +{ > +} > + > +#define IOREQ_IO_HANDLED IO_HANDLED > +#define IOREQ_IO_UNHANDLED IO_UNHANDLED > +#define IOREQ_IO_RETRY IO_RETRY > + > +#endif /* __ASM_X86_HVM_IOREQ_H__ */ > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h > index 5fdb6e8..5823f11 100644 > --- a/xen/include/asm-arm/p2m.h > +++ b/xen/include/asm-arm/p2m.h > @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, > mfn_t mfn) > { > /* > - * NOTE: If this is implemented then proper reference counting of > - * foreign entries will need to be implemented. > + * XXX: handle properly reference. It looks like the page may not always > + * belong to d. Just as a reference, and without taking away anything from the comment, I think that QEMU is doing its own internal reference counting for these mappings. > */ > - return -EOPNOTSUPP; > + > + return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw); > } > > /*
On 05.08.2020 01:22, Stefano Stabellini wrote: > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: >> --- a/xen/include/asm-arm/p2m.h >> +++ b/xen/include/asm-arm/p2m.h >> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, >> mfn_t mfn) >> { >> /* >> - * NOTE: If this is implemented then proper reference counting of >> - * foreign entries will need to be implemented. >> + * XXX: handle properly reference. It looks like the page may not always >> + * belong to d. > > Just as a reference, and without taking away anything from the comment, > I think that QEMU is doing its own internal reference counting for these > mappings. Which of course in no way replaces the need to do proper ref counting in Xen. (Just FAOD, as I'm not sure why you've said what you've said.) Jan
Hi Stefano, On 05/08/2020 00:22, Stefano Stabellini wrote: > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >> >> This patch makes possible to forward Guest MMIO accesses >> to a device emulator on Arm and enables that support for >> Arm64. >> >> Also update XSM code a bit to let DM op be used on Arm. >> New arch DM op will be introduced in the follow-up patch. >> >> Please note, at the moment build on Arm32 is broken >> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone > > Speaking of buffered_ioreq, if I recall correctly, they were only used > for VGA-related things on x86. It looks like it is still true. > > If so, do we need it on ARM? Note that I don't think we can get rid of > it from the interface as it is baked into ioreq, but it might be > possible to have a dummy implementation on ARM. Or maybe not: looking at > xen/common/hvm/ioreq.c it looks like it would be difficult to > disentangle bufioreq stuff from the rest of the code. We possibly don't need it right now. However, this could possibly be used in the future (e.g. virtio notification doorbell). >> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >> */ >> void leave_hypervisor_to_guest(void) >> { >> +#ifdef CONFIG_IOREQ_SERVER >> + /* >> + * XXX: Check the return. Shall we call that in >> + * continue_running and context_switch instead? >> + * The benefits would be to avoid calling >> + * handle_hvm_io_completion on every return. >> + */ > > Yeah, that could be a simple and good optimization Well, it is not as simple as it sounds :). handle_hvm_io_completion() is the function in charge of marking the vCPU as waiting for I/O. So we would at least need to split the function. I wrote this TODO because I wasn't sure about the complexity of handle_hvm_io_completion(current). Looking at it again, the main complexity is the looping over the IOREQ servers.
I think it would be better to optimize handle_hvm_io_completion() rather than trying to hack the context_switch() or continue_running(). [...] >> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >> index 5fdb6e8..5823f11 100644 >> --- a/xen/include/asm-arm/p2m.h >> +++ b/xen/include/asm-arm/p2m.h >> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, >> mfn_t mfn) >> { >> /* >> - * NOTE: If this is implemented then proper reference counting of >> - * foreign entries will need to be implemented. >> + * XXX: handle properly reference. It looks like the page may not always >> + * belong to d. > > Just as a reference, and without taking away anything from the comment, > I think that QEMU is doing its own internal reference counting for these > mappings. I am not sure how this matters here? We can't really trust the DM to do the right thing if it is not running in dom0. But, IIRC, the problem is that some of the pages don't belong to a domain, so it is not possible to treat them as foreign mappings (e.g. you wouldn't be able to grab a reference). This investigation was done a couple of years ago, so this may have changed in recent Xen. As a side note, I am a bit surprised to see most of my original TODOs still present in the code. What is the plan to solve them? Cheers,
Hi, On 03/08/2020 19:21, Oleksandr Tyshchenko wrote: > From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> > > This patch makes possible to forward Guest MMIO accesses > to a device emulator on Arm and enables that support for > Arm64. > > Also update XSM code a bit to let DM op be used on Arm. > New arch DM op will be introduced in the follow-up patch. > > Please note, at the moment build on Arm32 is broken > (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone > wants to enable CONFIG_IOREQ_SERVER due to the lack of > cmpxchg_64 support on Arm32. > > Please note, this is a split/cleanup of Julien's PoC: > "Add support for Guest IO forwarding to a device emulator" > > Signed-off-by: Julien Grall <julien.grall@arm.com> > Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> > --- > tools/libxc/xc_dom_arm.c | 25 +++++++--- > xen/arch/arm/Kconfig | 1 + > xen/arch/arm/Makefile | 2 + > xen/arch/arm/dm.c | 34 +++++++++++++ > xen/arch/arm/domain.c | 9 ++++ > xen/arch/arm/hvm.c | 46 +++++++++++++++++- > xen/arch/arm/io.c | 67 +++++++++++++++++++++++++- > xen/arch/arm/ioreq.c | 86 +++++++++++++++++++++++++++++++++ > xen/arch/arm/traps.c | 17 +++++++ > xen/common/memory.c | 5 +- > xen/include/asm-arm/domain.h | 80 +++++++++++++++++++++++++++++++ > xen/include/asm-arm/hvm/ioreq.h | 103 ++++++++++++++++++++++++++++++++++++++++ > xen/include/asm-arm/mmio.h | 1 + > xen/include/asm-arm/p2m.h | 7 +-- > xen/include/xsm/dummy.h | 4 +- > xen/include/xsm/xsm.h | 6 +-- > xen/xsm/dummy.c | 2 +- > xen/xsm/flask/hooks.c | 5 +- > 18 files changed, 476 insertions(+), 24 deletions(-) > create mode 100644 xen/arch/arm/dm.c > create mode 100644 xen/arch/arm/ioreq.c > create mode 100644 xen/include/asm-arm/hvm/ioreq.h It feels to me the patch is doing quite a few things that are indirectly related. Can this be split to make the review easier? 
I would like at least the following split from the series: - The tools changes - The P2M changes - The HVMOP plumbing (if we still require them) [...] > diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c > new file mode 100644 > index 0000000..2437099 > --- /dev/null > +++ b/xen/arch/arm/dm.c > @@ -0,0 +1,34 @@ > +/* > + * Copyright (c) 2019 Arm ltd. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; If not, see <http://www.gnu.org/licenses/>. > + */ > + > +#include <xen/hypercall.h> > +#include <asm/vgic.h> The list of includes sounds strange. Can we make sure to include only necessary headers and add the others when they are required? > + > +int arch_dm_op(struct xen_dm_op *op, struct domain *d, > + const struct dmop_args *op_args, bool *const_op) > +{ > + return -EOPNOTSUPP; > +} > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) > { > long rc = 0; > @@ -111,7 +155,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) > if ( rc ) > goto param_fail; > > - d->arch.hvm.params[a.index] = a.value; > + rc = hvmop_set_param(d, &a); > } > else > { > diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c > index ae7ef96..436f669 100644 > --- a/xen/arch/arm/io.c > +++ b/xen/arch/arm/io.c > @@ -16,6 +16,7 @@ > * GNU General Public License for more details. 
> */ > > +#include <xen/hvm/ioreq.h> > #include <xen/lib.h> > #include <xen/spinlock.h> > #include <xen/sched.h> > @@ -107,6 +108,62 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d, > return handler; > } > > +#ifdef CONFIG_IOREQ_SERVER Can we just implement this function in ioreq.c and provide a stub when !CONFIG_IOREQ_SERVER? > +static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs, > + struct vcpu *v, mmio_info_t *info) > +{ > + struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; > + ioreq_t p = { > + .type = IOREQ_TYPE_COPY, > + .addr = info->gpa, > + .size = 1 << info->dabt.size, > + .count = 0, > + .dir = !info->dabt.write, > + .df = 0, /* XXX: What's for? */ > + .data = get_user_reg(regs, info->dabt.reg), > + .state = STATE_IOREQ_READY, > + }; > + struct hvm_ioreq_server *s = NULL; > + enum io_state rc; > + > + switch ( vio->io_req.state ) > + { > + case STATE_IOREQ_NONE: > + break; > + default: > + printk("d%u wrong state %u\n", v->domain->domain_id, > + vio->io_req.state); This will likely want to be a gprintk() or gdprintk() to avoid a guest spamming Xen. > + return IO_ABORT; > + } > + > + s = hvm_select_ioreq_server(v->domain, &p); > + if ( !s ) > + return IO_UNHANDLED; > + > + if ( !info->dabt.valid ) > + { > + printk("Valid bit not set\n"); Same here. However, I am not convinced this is a useful message to keep. > + return IO_ABORT; > + } > + > + vio->io_req = p; > + > + rc = hvm_send_ioreq(s, &p, 0); > + if ( rc != IO_RETRY || v->domain->is_shutting_down ) > + vio->io_req.state = STATE_IOREQ_NONE; > + else if ( !hvm_ioreq_needs_completion(&vio->io_req) ) > + rc = IO_HANDLED; > + else > + vio->io_completion = HVMIO_mmio_completion; > + > + /* XXX: Decide what to do */ We want to understand how IO_RETRY can happen on x86 first. With that, we should be able to understand whether this can happen on Arm as well. 
> + if ( rc == IO_RETRY ) > + rc = IO_HANDLED; > + > + return rc; > +} > +#endif > + > enum io_state try_handle_mmio(struct cpu_user_regs *regs, > const union hsr hsr, > paddr_t gpa) > @@ -123,7 +180,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs, > > handler = find_mmio_handler(v->domain, info.gpa); > if ( !handler ) > - return IO_UNHANDLED; > + { > + int rc = IO_UNHANDLED; > + > +#ifdef CONFIG_IOREQ_SERVER > + rc = try_fwd_ioserv(regs, v, &info); > +#endif > + > + return rc; > + } > > /* All the instructions used on emulated MMIO region should be valid */ > if ( !dabt.valid ) > diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c > new file mode 100644 > index 0000000..a9cc839 > --- /dev/null > +++ b/xen/arch/arm/ioreq.c > @@ -0,0 +1,86 @@ > +/* > + * arm/ioreq.c: hardware virtual machine I/O emulation > + * > + * Copyright (c) 2019 Arm ltd. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; If not, see <http://www.gnu.org/licenses/>. 
> + */ > + > +#include <xen/ctype.h> > +#include <xen/hvm/ioreq.h> > +#include <xen/init.h> > +#include <xen/lib.h> > +#include <xen/trace.h> > +#include <xen/sched.h> > +#include <xen/irq.h> > +#include <xen/softirq.h> > +#include <xen/domain.h> > +#include <xen/domain_page.h> > +#include <xen/event.h> > +#include <xen/paging.h> > +#include <xen/vpci.h> > + > +#include <public/hvm/dm_op.h> > +#include <public/hvm/ioreq.h> > + > +bool handle_mmio(void) The name of the function is pretty generic and can be confusing on Arm (we already have a try_handle_mmio()). What is this function supposed to do? > +{ > + struct vcpu *v = current; > + struct cpu_user_regs *regs = guest_cpu_user_regs(); > + const union hsr hsr = { .bits = regs->hsr }; > + const struct hsr_dabt dabt = hsr.dabt; > + /* Code is similar to handle_read */ > + uint8_t size = (1 << dabt.size) * 8; > + register_t r = v->arch.hvm.hvm_io.io_req.data; > + > + /* We should only be here on Guest Data Abort */ > + ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL); > + > + /* We are done with the IO */ > + /* XXX: Is it the right place? */ > + v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE; > + > + /* XXX: Do we need to take care of write here ? */ > + if ( dabt.write ) > + return true; > + > + /* > + * Sign extend if required. > + * Note that we expect the read handler to have zeroed the bits > + * outside the requested access size. > + */ > + if ( dabt.sign && (r & (1UL << (size - 1))) ) > + { > + /* > + * We are relying on register_t using the same as > + * an unsigned long in order to keep the 32-bit assembly > + * code smaller. > + */ > + BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); > + r |= (~0UL) << size; > + } > + > + set_user_reg(regs, dabt.reg, r); > + > + return true; > +} [...] 
> diff --git a/xen/common/memory.c b/xen/common/memory.c > index 9283e5e..0000477 100644 > --- a/xen/common/memory.c > +++ b/xen/common/memory.c > @@ -8,6 +8,7 @@ > */ > > #include <xen/domain_page.h> > +#include <xen/hvm/ioreq.h> > #include <xen/types.h> > #include <xen/lib.h> > #include <xen/mm.h> > @@ -30,10 +31,6 @@ > #include <public/memory.h> > #include <xsm/xsm.h> > > -#ifdef CONFIG_IOREQ_SERVER > -#include <xen/hvm/ioreq.h> > -#endif > - Why do you remove something you just introduced? > #ifdef CONFIG_X86 > #include <asm/guest.h> > #endif > diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h > index 4e2f582..e060b0a 100644 > --- a/xen/include/asm-arm/domain.h > +++ b/xen/include/asm-arm/domain.h > @@ -11,12 +11,64 @@ > #include <asm/vgic.h> > #include <asm/vpl011.h> > #include <public/hvm/params.h> > +#include <public/hvm/dm_op.h> > +#include <public/hvm/ioreq.h> > #include <xen/serial.h> > #include <xen/rbtree.h> > > +struct hvm_ioreq_page { > + gfn_t gfn; > + struct page_info *page; > + void *va; > +}; AFAICT all the structures/defines you introduced here are used by the common code. So it feels to me they should be defined in a common header.
> + > +struct hvm_ioreq_vcpu { > + struct list_head list_entry; > + struct vcpu *vcpu; > + evtchn_port_t ioreq_evtchn; > + bool pending; > +}; > + > +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) > +#define MAX_NR_IO_RANGES 256 > + > +#define MAX_NR_IOREQ_SERVERS 8 > +#define DEFAULT_IOSERVID 0 > + > +struct hvm_ioreq_server { > + struct domain *target, *emulator; > + > + /* Lock to serialize toolstack modifications */ > + spinlock_t lock; > + > + struct hvm_ioreq_page ioreq; > + struct list_head ioreq_vcpu_list; > + struct hvm_ioreq_page bufioreq; > + > + /* Lock to serialize access to buffered ioreq ring */ > + spinlock_t bufioreq_lock; > + evtchn_port_t bufioreq_evtchn; > + struct rangeset *range[NR_IO_RANGE_TYPES]; > + bool enabled; > + uint8_t bufioreq_handling; > +}; > + > struct hvm_domain > { > uint64_t params[HVM_NR_PARAMS]; > + > + /* Guest page range used for non-default ioreq servers */ > + struct { > + unsigned long base; > + unsigned long mask; > + unsigned long legacy_mask; /* indexed by HVM param number */ > + } ioreq_gfn; > + > + /* Lock protects all other values in the sub-struct and the default */ > + struct { > + spinlock_t lock; > + struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS]; > + } ioreq_server; > }; > > #ifdef CONFIG_ARM_64 > @@ -93,6 +145,29 @@ struct arch_domain > #endif > } __cacheline_aligned; > > +enum hvm_io_completion { > + HVMIO_no_completion, > + HVMIO_mmio_completion, > + HVMIO_pio_completion, > + HVMIO_realmode_completion > +}; > + > +struct hvm_vcpu_io { > + /* I/O request in flight to device model. */ > + enum hvm_io_completion io_completion; > + ioreq_t io_req; > + > + /* > + * HVM emulation: > + * Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn. > + * The latter is known to be an MMIO frame (not RAM). > + * This translation is only valid for accesses as per @mmio_access. 
> + */ > + struct npfec mmio_access; > + unsigned long mmio_gla; > + unsigned long mmio_gpfn; > +}; > + > struct arch_vcpu > { > struct { > @@ -206,6 +281,11 @@ struct arch_vcpu > */ > bool need_flush_to_ram; > > + struct hvm_vcpu > + { > + struct hvm_vcpu_io hvm_io; > + } hvm; > + > } __cacheline_aligned; > > void vcpu_show_execution_state(struct vcpu *); > diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h > new file mode 100644 > index 0000000..83a560c > --- /dev/null > +++ b/xen/include/asm-arm/hvm/ioreq.h > @@ -0,0 +1,103 @@ > +/* > + * hvm.h: Hardware virtual machine assist interface definitions. > + * > + * Copyright (c) 2016 Citrix Systems Inc. > + * Copyright (c) 2019 Arm ltd. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; If not, see <http://www.gnu.org/licenses/>. > + */ > + > +#ifndef __ASM_ARM_HVM_IOREQ_H__ > +#define __ASM_ARM_HVM_IOREQ_H__ > + > +#include <public/hvm/ioreq.h> > +#include <public/hvm/dm_op.h> > + > +#define has_vpci(d) (false) It feels to me this wants to be defined in vpci.h. > + > +bool handle_mmio(void); > + > +static inline bool handle_pio(uint16_t port, unsigned int size, int dir) > +{ > + /* XXX */ Can you expand this TODO? What do you expect to do? > + BUG(); > + return true; > +} > + > +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) > +{ > + return p->addr; > +} I understand that the x86 version is more complex as it checks p->df.
However, aside from reducing the complexity, I am not sure why we would want to diverge it. After all, IOREQ is now meant to be a common feature. > +
> +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
> +{
> + unsigned long size = p->size;
> +
> + return p->addr + size - 1;
> +} Same. > +
> +struct hvm_ioreq_server; Why do we need a forward declaration? > +
> +static inline int p2m_set_ioreq_server(struct domain *d,
> + unsigned int flags,
> + struct hvm_ioreq_server *s)
> +{
> + return -EOPNOTSUPP;
> +} This should be defined in p2m.h. But I am not even sure what it is meant for. Can you expand it? > +
> +static inline void msix_write_completion(struct vcpu *v)
> +{
> +}
> +
> +static inline void handle_realmode_completion(void)
> +{
> + ASSERT_UNREACHABLE();
> +} realmode is very x86 specific. So I don't think this function should be called from common code. It might be worth considering splitting handle_hvm_io_completion() in 2 parts: common and arch specific. > +
> +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
> +{
> +} This will want to be stubbed in asm-arm/paging.h. > +
> +static inline void hvm_get_ioreq_server_range_type(struct domain *d,
> + ioreq_t *p,
> + uint8_t *type,
> + uint64_t *addr)
> +{
> + *type = (p->type == IOREQ_TYPE_PIO) ? 
> + XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
> + *addr = p->addr;
> +}
> +
> +static inline void arch_hvm_ioreq_init(struct domain *d)
> +{
> +}
> +
> +static inline void arch_hvm_ioreq_destroy(struct domain *d)
> +{
> +}
> +
> +#define IOREQ_IO_HANDLED IO_HANDLED
> +#define IOREQ_IO_UNHANDLED IO_UNHANDLED
> +#define IOREQ_IO_RETRY IO_RETRY
> +
> +#endif /* __ASM_X86_HVM_IOREQ_H__ */ s/X86/ARM/ > +
> +/*
> + * Local variables:
> + * mode: C
> + * c-file-style: "BSD"
> + * c-basic-offset: 4
> + * tab-width: 4
> + * indent-tabs-mode: nil
> + * End:
> + */
> diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
> index 8dbfb27..7ab873c 100644
> --- a/xen/include/asm-arm/mmio.h
> +++ b/xen/include/asm-arm/mmio.h
> @@ -37,6 +37,7 @@ enum io_state
> IO_ABORT, /* The IO was handled by the helper and led to an abort. */
> IO_HANDLED, /* The IO was successfully handled by the helper. */
> IO_UNHANDLED, /* The IO was not handled by the helper. */
> + IO_RETRY, /* Retry the emulation for some reason */
> };
>
> typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
> index 5fdb6e8..5823f11 100644
> --- a/xen/include/asm-arm/p2m.h
> +++ b/xen/include/asm-arm/p2m.h
> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
> mfn_t mfn)
> {
> /*
> - * NOTE: If this is implemented then proper reference counting of
> - * foreign entries will need to be implemented.
> + * XXX: handle properly reference. It looks like the page may not always
> + * belong to d.
> */
> - return -EOPNOTSUPP;
> +
> + return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw); Treating foreign as p2m_ram_rw is more of a hack than a real fix. I have answered to this separately (see my answer on Stefano's e-mail), so we can continue the conversation there. Cheers,
On 05.08.2020 16:12, Julien Grall wrote: > On 03/08/2020 19:21, Oleksandr Tyshchenko wrote: >> --- /dev/null >> +++ b/xen/include/asm-arm/hvm/ioreq.h >> @@ -0,0 +1,103 @@ >> +/* >> + * hvm.h: Hardware virtual machine assist interface definitions. >> + * >> + * Copyright (c) 2016 Citrix Systems Inc. >> + * Copyright (c) 2019 Arm ltd. >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms and conditions of the GNU General Public License, >> + * version 2, as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope it will be useful, but WITHOUT >> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License along with >> + * this program; If not, see <http://www.gnu.org/licenses/>. >> + */ >> + >> +#ifndef __ASM_ARM_HVM_IOREQ_H__ >> +#define __ASM_ARM_HVM_IOREQ_H__ >> + >> +#include <public/hvm/ioreq.h> >> +#include <public/hvm/dm_op.h> >> + >> +#define has_vpci(d) (false) > > It feels to me this wants to be defined in vcpi.h. On x86 it wants to live together with a bunch of other has_v*() macros. Jan
On 05.08.20 12:32, Julien Grall wrote: Hi Julien. > Hi Stefano, > > On 05/08/2020 00:22, Stefano Stabellini wrote: >> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >>> >>> This patch makes possible to forward Guest MMIO accesses >>> to a device emulator on Arm and enables that support for >>> Arm64. >>> >>> Also update XSM code a bit to let DM op be used on Arm. >>> New arch DM op will be introduced in the follow-up patch. >>> >>> Please note, at the moment build on Arm32 is broken >>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone >> >> Speaking of buffered_ioreq, if I recall correctly, they were only used >> for VGA-related things on x86. It looks like it is still true. >> >> If so, do we need it on ARM? Note that I don't think we can get rid of >> it from the interface as it is baked into ioreq, but it might be >> possible to have a dummy implementation on ARM. Or maybe not: looking at >> xen/common/hvm/ioreq.c it looks like it would be difficult to >> disentangle bufioreq stuff from the rest of the code. > > We possibly don't need it right now. However, this could possibly be > used in the future (e.g. virtio notification doorbell). > >>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>> */ >>> void leave_hypervisor_to_guest(void) >>> { >>> +#ifdef CONFIG_IOREQ_SERVER >>> + /* >>> + * XXX: Check the return. Shall we call that in >>> + * continue_running and context_switch instead? >>> + * The benefits would be to avoid calling >>> + * handle_hvm_io_completion on every return. >>> + */ >> >> Yeah, that could be a simple and good optimization > > Well, it is not as simple as it sounds :). handle_hvm_io_completion() > is the function in charge of marking the vCPU as waiting for I/O. So we > would at least need to split the function. > > I wrote this TODO because I wasn't sure about the complexity of > handle_hvm_io_completion(current). 
Looking at it again, the main > complexity is the looping over the IOREQ servers. > > I think it would be better to optimize handle_hvm_io_completion() > rather than trying to hack the context_switch() or continue_running(). > > [...] > >>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >>> index 5fdb6e8..5823f11 100644 >>> --- a/xen/include/asm-arm/p2m.h >>> +++ b/xen/include/asm-arm/p2m.h >>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct >>> domain *d, unsigned long gfn, >>> mfn_t mfn) >>> { >>> /* >>> - * NOTE: If this is implemented then proper reference counting of >>> - * foreign entries will need to be implemented. >>> + * XXX: handle properly reference. It looks like the page may >>> not always >>> + * belong to d. >> >> Just as a reference, and without taking away anything from the comment, >> I think that QEMU is doing its own internal reference counting for these >> mappings. > > I am not sure how this matters here? We can't really trust the DM to > do the right thing if it is not running in dom0. > > But, IIRC, the problem is that some of the pages don't belong to a > domain, so it is not possible to treat them as foreign mappings (e.g. > you wouldn't be able to grab a reference). This investigation was done > a couple of years ago, so this may have changed in recent Xen. > > As a side note, I am a bit surprised to see most of my original TODOs > present in the code. What is the plan to solve them? The plan is to solve the most critical TODOs in the current series, and the rest in follow-up series, if there are no objections of course. Any pointers on how to solve them properly would be much appreciated. Unfortunately, right now I have only a weak understanding of how they should be fixed. I see at least 3 major TODOs here: 1. handle references properly in set_foreign_p2m_entry() 2. optimize handle_hvm_io_completion() 3. handle IO_RETRY properly in try_fwd_ioserv()
On 03.08.2020 20:21, Oleksandr Tyshchenko wrote: > --- a/xen/include/xsm/dummy.h > +++ b/xen/include/xsm/dummy.h > @@ -713,14 +713,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int > } > } > > +#endif /* CONFIG_X86 */ > + > static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d) > { > XSM_ASSERT_ACTION(XSM_DM_PRIV); > return xsm_default_action(action, current->domain, d); > } > > -#endif /* CONFIG_X86 */ > - > #ifdef CONFIG_ARGO > static XSM_INLINE int xsm_argo_enable(const struct domain *d) > { > diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h > index a80bcf3..2a9b39d 100644 > --- a/xen/include/xsm/xsm.h > +++ b/xen/include/xsm/xsm.h > @@ -177,8 +177,8 @@ struct xsm_operations { > int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); > int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); > int (*pmu_op) (struct domain *d, unsigned int op); > - int (*dm_op) (struct domain *d); > #endif > + int (*dm_op) (struct domain *d); > int (*xen_version) (uint32_t cmd); > int (*domain_resource_map) (struct domain *d); > #ifdef CONFIG_ARGO > @@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int > return xsm_ops->pmu_op(d, op); > } > > +#endif /* CONFIG_X86 */ > + > static inline int xsm_dm_op(xsm_default_t def, struct domain *d) > { > return xsm_ops->dm_op(d); > } > > -#endif /* CONFIG_X86 */ > - > static inline int xsm_xen_version (xsm_default_t def, uint32_t op) > { > return xsm_ops->xen_version(op); > diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c > index d4cce68..e3afd06 100644 > --- a/xen/xsm/dummy.c > +++ b/xen/xsm/dummy.c > @@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops) > set_to_dummy_if_null(ops, ioport_permission); > set_to_dummy_if_null(ops, ioport_mapping); > set_to_dummy_if_null(ops, pmu_op); > - set_to_dummy_if_null(ops, dm_op); > #endif > + set_to_dummy_if_null(ops, dm_op); > 
set_to_dummy_if_null(ops, xen_version); > set_to_dummy_if_null(ops, domain_resource_map); > #ifdef CONFIG_ARGO > diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c > index a314bf8..645192a 100644 > --- a/xen/xsm/flask/hooks.c > +++ b/xen/xsm/flask/hooks.c > @@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op) > return -EPERM; > } > } > +#endif /* CONFIG_X86 */ > > static int flask_dm_op(struct domain *d) > { > return current_has_perm(d, SECCLASS_HVM, HVM__DM); > } > > -#endif /* CONFIG_X86 */ > - > static int flask_xen_version (uint32_t op) > { > u32 dsid = domain_sid(current->domain); > @@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = { > .ioport_permission = flask_ioport_permission, > .ioport_mapping = flask_ioport_mapping, > .pmu_op = flask_pmu_op, > - .dm_op = flask_dm_op, > #endif > + .dm_op = flask_dm_op, > .xen_version = flask_xen_version, > .domain_resource_map = flask_domain_resource_map, > #ifdef CONFIG_ARGO All of this looks to belong into patch 2? Jan
On Wed, 5 Aug 2020, Jan Beulich wrote: > On 05.08.2020 01:22, Stefano Stabellini wrote: > > On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: > >> --- a/xen/include/asm-arm/p2m.h > >> +++ b/xen/include/asm-arm/p2m.h > >> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, > >> mfn_t mfn) > >> { > >> /* > >> - * NOTE: If this is implemented then proper reference counting of > >> - * foreign entries will need to be implemented. > >> + * XXX: handle properly reference. It looks like the page may not always > >> + * belong to d. > > > > Just as a reference, and without taking away anything from the comment, > > I think that QEMU is doing its own internal reference counting for these > > mappings. > > Which of course in no way replaces the need to do proper ref counting > in Xen. (Just FAOD, as I'm not sure why you've said what you've said.) Given the state of the series, which is an RFC, I only meant to say that the lack of refcounting shouldn't prevent things from working when using QEMU. In the sense that if somebody wants to give it a try for an early demo, they should be able to see it running without crashes. Of course, refcounting needs to be implemented.
On 05.08.20 17:12, Julien Grall wrote: > Hi, Hi Julien > > On 03/08/2020 19:21, Oleksandr Tyshchenko wrote: >> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >> >> This patch makes possible to forward Guest MMIO accesses >> to a device emulator on Arm and enables that support for >> Arm64. >> >> Also update XSM code a bit to let DM op be used on Arm. >> New arch DM op will be introduced in the follow-up patch. >> >> Please note, at the moment build on Arm32 is broken >> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone >> wants to enable CONFIG_IOREQ_SERVER due to the lack of >> cmpxchg_64 support on Arm32. >> >> Please note, this is a split/cleanup of Julien's PoC: >> "Add support for Guest IO forwarding to a device emulator" >> >> Signed-off-by: Julien Grall <julien.grall@arm.com> >> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >> --- >> tools/libxc/xc_dom_arm.c | 25 +++++++--- >> xen/arch/arm/Kconfig | 1 + >> xen/arch/arm/Makefile | 2 + >> xen/arch/arm/dm.c | 34 +++++++++++++ >> xen/arch/arm/domain.c | 9 ++++ >> xen/arch/arm/hvm.c | 46 +++++++++++++++++- >> xen/arch/arm/io.c | 67 +++++++++++++++++++++++++- >> xen/arch/arm/ioreq.c | 86 >> +++++++++++++++++++++++++++++++++ >> xen/arch/arm/traps.c | 17 +++++++ >> xen/common/memory.c | 5 +- >> xen/include/asm-arm/domain.h | 80 +++++++++++++++++++++++++++++++ >> xen/include/asm-arm/hvm/ioreq.h | 103 >> ++++++++++++++++++++++++++++++++++++++++ >> xen/include/asm-arm/mmio.h | 1 + >> xen/include/asm-arm/p2m.h | 7 +-- >> xen/include/xsm/dummy.h | 4 +- >> xen/include/xsm/xsm.h | 6 +-- >> xen/xsm/dummy.c | 2 +- >> xen/xsm/flask/hooks.c | 5 +- >> 18 files changed, 476 insertions(+), 24 deletions(-) >> create mode 100644 xen/arch/arm/dm.c >> create mode 100644 xen/arch/arm/ioreq.c >> create mode 100644 xen/include/asm-arm/hvm/ioreq.h > > It feels to me the patch is doing quite a few things that are > indirectly related. Can this be split to make the review easier? 
> > I would like at least the following split from the series: > - The tools changes > - The P2M changes > - The HVMOP plumbing (if we still require them) Sure, will split. However, I don't quite understand what we should do with the HVMOP plumbing. If I understand correctly, the suggestion was to switch to the acquire interface instead (which requires at least Linux v4.17)? I suspect this is all about "xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE" support for Linux? Sorry if I am asking a lot of questions; my development environment is based on the vendor's BSP, which currently uses v4.14. > > [...] > >> diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c >> new file mode 100644 >> index 0000000..2437099 >> --- /dev/null >> +++ b/xen/arch/arm/dm.c >> @@ -0,0 +1,34 @@ >> +/* >> + * Copyright (c) 2019 Arm ltd. >> + * >> + * This program is free software; you can redistribute it and/or >> modify it >> + * under the terms and conditions of the GNU General Public License, >> + * version 2, as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope it will be useful, but >> WITHOUT >> + * ANY WARRANTY; without even the implied warranty of >> MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public >> License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License >> along with >> + * this program; If not, see <http://www.gnu.org/licenses/>. >> + */ >> + >> +#include <xen/hypercall.h> >> +#include <asm/vgic.h> > > The list of includes sounds strange. Can we make sure to include only > necessary headers and add the others when they are required? Sure, I moved the arch_dm_op internals to the next patch in this series, but forgot to move the corresponding headers as well. 
> > >> + >> +int arch_dm_op(struct xen_dm_op *op, struct domain *d, >> + const struct dmop_args *op_args, bool *const_op) >> +{ >> + return -EOPNOTSUPP; >> +} >> + >> +/* >> + * Local variables: >> + * mode: C >> + * c-file-style: "BSD" >> + * c-basic-offset: 4 >> + * tab-width: 4 >> + * indent-tabs-mode: nil >> + * End: >> + */ > >> long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) >> { >> long rc = 0; >> @@ -111,7 +155,7 @@ long do_hvm_op(unsigned long op, >> XEN_GUEST_HANDLE_PARAM(void) arg) >> if ( rc ) >> goto param_fail; >> - d->arch.hvm.params[a.index] = a.value; >> + rc = hvmop_set_param(d, &a); >> } >> else >> { >> diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c >> index ae7ef96..436f669 100644 >> --- a/xen/arch/arm/io.c >> +++ b/xen/arch/arm/io.c >> @@ -16,6 +16,7 @@ >> * GNU General Public License for more details. >> */ >> +#include <xen/hvm/ioreq.h> >> #include <xen/lib.h> >> #include <xen/spinlock.h> >> #include <xen/sched.h> >> @@ -107,6 +108,62 @@ static const struct mmio_handler >> *find_mmio_handler(struct domain *d, >> return handler; >> } >> +#ifdef CONFIG_IOREQ_SERVER > > Can we just implement this function in ioreq.c and provide a stub when > !CONFIG_IOREQ_SERVER? Sure > > >> +static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs, >> + struct vcpu *v, mmio_info_t *info) >> +{ >> + struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; >> + ioreq_t p = { >> + .type = IOREQ_TYPE_COPY, >> + .addr = info->gpa, >> + .size = 1 << info->dabt.size, >> + .count = 0, >> + .dir = !info->dabt.write, >> + .df = 0, /* XXX: What's for? 
*/ >> + .data = get_user_reg(regs, info->dabt.reg), >> + .state = STATE_IOREQ_READY, >> + }; >> + struct hvm_ioreq_server *s = NULL; >> + enum io_state rc; >> + >> + switch ( vio->io_req.state ) >> + { >> + case STATE_IOREQ_NONE: >> + break; >> + default: >> + printk("d%u wrong state %u\n", v->domain->domain_id, >> + vio->io_req.state); > > This will likely want to be a gprintk() or gdprintk() to avoid a guest > spamming Xen. ok > >> + return IO_ABORT; >> + } >> + >> + s = hvm_select_ioreq_server(v->domain, &p); >> + if ( !s ) >> + return IO_UNHANDLED; >> + >> + if ( !info->dabt.valid ) >> + { >> + printk("Valid bit not set\n"); > > Same here. However, I am not convinced this is a useful message to keep. ok > >> + return IO_ABORT; >> + } >> + >> + vio->io_req = p; >> + >> + rc = hvm_send_ioreq(s, &p, 0); >> + if ( rc != IO_RETRY || v->domain->is_shutting_down ) >> + vio->io_req.state = STATE_IOREQ_NONE; >> + else if ( !hvm_ioreq_needs_completion(&vio->io_req) ) >> + rc = IO_HANDLED; >> + else >> + vio->io_completion = HVMIO_mmio_completion; >> + >> + /* XXX: Decide what to do */ > > We want to understand how IO_RETRY can happen on x86 first. With that, > we should be able to understand whether this can happen on Arm as well. 
Noted > > >> + if ( rc == IO_RETRY ) >> + rc = IO_HANDLED; >> + >> + return rc; >> +} >> +#endif >> + >> enum io_state try_handle_mmio(struct cpu_user_regs *regs, >> const union hsr hsr, >> paddr_t gpa) >> @@ -123,7 +180,15 @@ enum io_state try_handle_mmio(struct >> cpu_user_regs *regs, >> handler = find_mmio_handler(v->domain, info.gpa); >> if ( !handler ) >> - return IO_UNHANDLED; >> + { >> + int rc = IO_UNHANDLED; >> + >> +#ifdef CONFIG_IOREQ_SERVER >> + rc = try_fwd_ioserv(regs, v, &info); >> +#endif >> + >> + return rc; >> + } >> /* All the instructions used on emulated MMIO region should >> be valid */ >> if ( !dabt.valid ) >> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c >> new file mode 100644 >> index 0000000..a9cc839 >> --- /dev/null >> +++ b/xen/arch/arm/ioreq.c >> @@ -0,0 +1,86 @@ >> +/* >> + * arm/ioreq.c: hardware virtual machine I/O emulation >> + * >> + * Copyright (c) 2019 Arm ltd. >> + * >> + * This program is free software; you can redistribute it and/or >> modify it >> + * under the terms and conditions of the GNU General Public License, >> + * version 2, as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope it will be useful, but >> WITHOUT >> + * ANY WARRANTY; without even the implied warranty of >> MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public >> License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License >> along with >> + * this program; If not, see <http://www.gnu.org/licenses/>. 
>> + */ >> + >> +#include <xen/ctype.h> >> +#include <xen/hvm/ioreq.h> >> +#include <xen/init.h> >> +#include <xen/lib.h> >> +#include <xen/trace.h> >> +#include <xen/sched.h> >> +#include <xen/irq.h> >> +#include <xen/softirq.h> >> +#include <xen/domain.h> >> +#include <xen/domain_page.h> >> +#include <xen/event.h> >> +#include <xen/paging.h> >> +#include <xen/vpci.h> >> + >> +#include <public/hvm/dm_op.h> >> +#include <public/hvm/ioreq.h> >> + >> +bool handle_mmio(void) > > The name of the function is pretty generic and can be confusing on Arm > (we already have a try_handle_mmio()). > > What is this function supposed to do? Agreed, it sounds a bit confusing. I assume it is supposed to complete the Guest MMIO access after finishing emulation. Shall I rename it to something more appropriate (maybe by adding an ioreq prefix)? > > >> +{ >> + struct vcpu *v = current; >> + struct cpu_user_regs *regs = guest_cpu_user_regs(); >> + const union hsr hsr = { .bits = regs->hsr }; >> + const struct hsr_dabt dabt = hsr.dabt; >> + /* Code is similar to handle_read */ >> + uint8_t size = (1 << dabt.size) * 8; >> + register_t r = v->arch.hvm.hvm_io.io_req.data; >> + >> + /* We should only be here on Guest Data Abort */ >> + ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL); >> + >> + /* We are done with the IO */ >> + /* XXX: Is it the right place? */ >> + v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE; >> + >> + /* XXX: Do we need to take care of write here ? */ >> + if ( dabt.write ) >> + return true; >> + >> + /* >> + * Sign extend if required. >> + * Note that we expect the read handler to have zeroed the bits >> + * outside the requested access size. >> + */ >> + if ( dabt.sign && (r & (1UL << (size - 1))) ) >> + { >> + /* >> + * We are relying on register_t using the same as >> + * an unsigned long in order to keep the 32-bit assembly >> + * code smaller. 
>> + */ >> + BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); >> + r |= (~0UL) << size; >> + } >> + >> + set_user_reg(regs, dabt.reg, r); >> + >> + return true; >> +} > > [...] > >> diff --git a/xen/common/memory.c b/xen/common/memory.c >> index 9283e5e..0000477 100644 >> --- a/xen/common/memory.c >> +++ b/xen/common/memory.c >> @@ -8,6 +8,7 @@ >> */ >> #include <xen/domain_page.h> >> +#include <xen/hvm/ioreq.h> >> #include <xen/types.h> >> #include <xen/lib.h> >> #include <xen/mm.h> >> @@ -30,10 +31,6 @@ >> #include <public/memory.h> >> #include <xsm/xsm.h> >> -#ifdef CONFIG_IOREQ_SERVER >> -#include <xen/hvm/ioreq.h> >> -#endif >> - > > Why do you remove something you just introduced? The reason I guarded that header is to make the "xen/mm: Make x86's XENMEM_resource_ioreq_server handling common" (previous) patch buildable on Arm without the arch IOREQ header added yet. I tried to make sure that the result after each patch was buildable, to retain bisectability. As the current patch adds the Arm IOREQ specific bits (including the header), that guard could be removed as it is not needed anymore. > >> #ifdef CONFIG_X86 >> #include <asm/guest.h> >> #endif >> diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h >> index 4e2f582..e060b0a 100644 >> --- a/xen/include/asm-arm/domain.h >> +++ b/xen/include/asm-arm/domain.h >> @@ -11,12 +11,64 @@ >> #include <asm/vgic.h> >> #include <asm/vpl011.h> >> #include <public/hvm/params.h> >> +#include <public/hvm/dm_op.h> >> +#include <public/hvm/ioreq.h> >> #include <xen/serial.h> >> #include <xen/rbtree.h> >> +struct hvm_ioreq_page { >> + gfn_t gfn; >> + struct page_info *page; >> + void *va; >> +}; > > AFAICT all the structures/defines you introduced here are used by the > common code. So it feels to me they should be defined in a common header. Makes sense. Probably worth moving. I assume this also applies to the x86 ones. 
> > >> + >> +struct hvm_ioreq_vcpu { >> + struct list_head list_entry; >> + struct vcpu *vcpu; >> + evtchn_port_t ioreq_evtchn; >> + bool pending; >> +}; >> + >> +#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1) >> +#define MAX_NR_IO_RANGES 256 >> + >> +#define MAX_NR_IOREQ_SERVERS 8 >> +#define DEFAULT_IOSERVID 0 >> + >> +struct hvm_ioreq_server { >> + struct domain *target, *emulator; >> + >> + /* Lock to serialize toolstack modifications */ >> + spinlock_t lock; >> + >> + struct hvm_ioreq_page ioreq; >> + struct list_head ioreq_vcpu_list; >> + struct hvm_ioreq_page bufioreq; >> + >> + /* Lock to serialize access to buffered ioreq ring */ >> + spinlock_t bufioreq_lock; >> + evtchn_port_t bufioreq_evtchn; >> + struct rangeset *range[NR_IO_RANGE_TYPES]; >> + bool enabled; >> + uint8_t bufioreq_handling; >> +}; >> + >> struct hvm_domain >> { >> uint64_t params[HVM_NR_PARAMS]; >> + >> + /* Guest page range used for non-default ioreq servers */ >> + struct { >> + unsigned long base; >> + unsigned long mask; >> + unsigned long legacy_mask; /* indexed by HVM param number */ >> + } ioreq_gfn; >> + >> + /* Lock protects all other values in the sub-struct and the >> default */ >> + struct { >> + spinlock_t lock; >> + struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS]; >> + } ioreq_server; >> }; >> #ifdef CONFIG_ARM_64 >> @@ -93,6 +145,29 @@ struct arch_domain >> #endif >> } __cacheline_aligned; >> +enum hvm_io_completion { >> + HVMIO_no_completion, >> + HVMIO_mmio_completion, >> + HVMIO_pio_completion, >> + HVMIO_realmode_completion >> +}; >> + >> +struct hvm_vcpu_io { >> + /* I/O request in flight to device model. */ >> + enum hvm_io_completion io_completion; >> + ioreq_t io_req; >> + >> + /* >> + * HVM emulation: >> + * Linear address @mmio_gla maps to MMIO physical frame >> @mmio_gpfn. >> + * The latter is known to be an MMIO frame (not RAM). >> + * This translation is only valid for accesses as per >> @mmio_access. 
>> + */ >> + struct npfec mmio_access; >> + unsigned long mmio_gla; >> + unsigned long mmio_gpfn; >> +}; >> + >> struct arch_vcpu >> { >> struct { >> @@ -206,6 +281,11 @@ struct arch_vcpu >> */ >> bool need_flush_to_ram; >> + struct hvm_vcpu >> + { >> + struct hvm_vcpu_io hvm_io; >> + } hvm; >> + >> } __cacheline_aligned; >> void vcpu_show_execution_state(struct vcpu *); >> diff --git a/xen/include/asm-arm/hvm/ioreq.h >> b/xen/include/asm-arm/hvm/ioreq.h >> new file mode 100644 >> index 0000000..83a560c >> --- /dev/null >> +++ b/xen/include/asm-arm/hvm/ioreq.h >> @@ -0,0 +1,103 @@ >> +/* >> + * hvm.h: Hardware virtual machine assist interface definitions. >> + * >> + * Copyright (c) 2016 Citrix Systems Inc. >> + * Copyright (c) 2019 Arm ltd. >> + * >> + * This program is free software; you can redistribute it and/or >> modify it >> + * under the terms and conditions of the GNU General Public License, >> + * version 2, as published by the Free Software Foundation. >> + * >> + * This program is distributed in the hope it will be useful, but >> WITHOUT >> + * ANY WARRANTY; without even the implied warranty of >> MERCHANTABILITY or >> + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public >> License for >> + * more details. >> + * >> + * You should have received a copy of the GNU General Public License >> along with >> + * this program; If not, see <http://www.gnu.org/licenses/>. >> + */ >> + >> +#ifndef __ASM_ARM_HVM_IOREQ_H__ >> +#define __ASM_ARM_HVM_IOREQ_H__ >> + >> +#include <public/hvm/ioreq.h> >> +#include <public/hvm/dm_op.h> >> + >> +#define has_vpci(d) (false) > > It feels to me this wants to be defined in vcpi.h. ok, will move. > > >> + >> +bool handle_mmio(void); >> + >> +static inline bool handle_pio(uint16_t port, unsigned int size, int >> dir) >> +{ >> + /* XXX */ > > Can you expand this TODO? What do you expect to do? I didn't expect this to be called on Arm. Sorry, I am not sure I have an idea of how to handle this properly. 
I would keep it unimplemented until there is a real reason to do otherwise. Will expand the TODO. > > >> + BUG(); >> + return true; >> +} >> + >> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) >> +{ >> + return p->addr; >> +} > > I understand that the x86 version is more complex as it check p->df. > However, aside reducing the complexity, I am not sure why we would > want to diverge it. > > After all, IOREQ is now meant to be a common feature. Well, no objections at all. Could you please clarify how 'df' (Direction Flag?) could be handled/used on Arm? I see that try_fwd_ioserv() always sets it to 0. Or do I need to just reuse x86's helpers as is, which (together with count = df = 0) will result in what we actually have here? > >> + >> +static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p) >> +{ >> + unsigned long size = p->size; >> + >> + return p->addr + size - 1; >> +} > > Same. + > >> + >> +struct hvm_ioreq_server; > > Why do we need a forward declaration? I don't remember exactly; probably this way I temporarily solved a build issue. Please let me recheck whether we could avoid using it. > > >> + >> +static inline int p2m_set_ioreq_server(struct domain *d, >> + unsigned int flags, >> + struct hvm_ioreq_server *s) >> +{ >> + return -EOPNOTSUPP; >> +} > > This should be defined in p2m.h. But I am not even sure what it is > meant for. Can you expand it? ok, will move. In this series I tried to make as much IOREQ code common as possible and avoid complicating things; in order to achieve that, a few stubs were added here. Please note that I also considered splitting it into arch parts, but some functions couldn't be split easily. This one is called from the common hvm_destroy_ioreq_server() with flags being 0 (which will result in unmapping the ioreq server from the p2m type on x86). I could add a comment describing why this stub is present here. 
> > >> + >> +static inline void msix_write_completion(struct vcpu *v) >> +{ >> +} >> + >> +static inline void handle_realmode_completion(void) >> +{ >> + ASSERT_UNREACHABLE(); >> +} > > realmode is very x86 specific. So I don't think this function should > be called from common code. It might be worth considering to split > handle_hvm_io_completion() is 2 parts: common and arch specific. I agree with you that realmode is x86 specific and does not look good in an Arm header. I was thinking about how to split handle_hvm_io_completion() gracefully but failed to find a good solution for that, so I decided to add two stubs (msix_write_completion and handle_realmode_completion) on Arm. I could add a comment describing why they are here, if appropriate. But if you think they shouldn't be called from the common code in any way, I will try to split it. > >> + >> +static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn) >> +{ >> +} > > This will want to be stubbed in asm-arm/paging.h. ok > > >> + >> +static inline void hvm_get_ioreq_server_range_type(struct domain *d, >> + ioreq_t *p, >> + uint8_t *type, >> + uint64_t *addr) >> +{ >> + *type = (p->type == IOREQ_TYPE_PIO) ? 
>> + XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY; >> + *addr = p->addr; >> +} >> + >> +static inline void arch_hvm_ioreq_init(struct domain *d) >> +{ >> +} >> + >> +static inline void arch_hvm_ioreq_destroy(struct domain *d) >> +{ >> +} >> + >> +#define IOREQ_IO_HANDLED IO_HANDLED >> +#define IOREQ_IO_UNHANDLED IO_UNHANDLED >> +#define IOREQ_IO_RETRY IO_RETRY >> + >> +#endif /* __ASM_X86_HVM_IOREQ_H__ */ > > s/X86/ARM/ ok > >> + >> +/* >> + * Local variables: >> + * mode: C >> + * c-file-style: "BSD" >> + * c-basic-offset: 4 >> + * tab-width: 4 >> + * indent-tabs-mode: nil >> + * End: >> + */ >> diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h >> index 8dbfb27..7ab873c 100644 >> --- a/xen/include/asm-arm/mmio.h >> +++ b/xen/include/asm-arm/mmio.h >> @@ -37,6 +37,7 @@ enum io_state >> IO_ABORT, /* The IO was handled by the helper and led to >> an abort. */ >> IO_HANDLED, /* The IO was successfully handled by the >> helper. */ >> IO_UNHANDLED, /* The IO was not handled by the helper. */ >> + IO_RETRY, /* Retry the emulation for some reason */ >> }; >> typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info, >> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >> index 5fdb6e8..5823f11 100644 >> --- a/xen/include/asm-arm/p2m.h >> +++ b/xen/include/asm-arm/p2m.h >> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct >> domain *d, unsigned long gfn, >> mfn_t mfn) >> { >> /* >> - * NOTE: If this is implemented then proper reference counting of >> - * foreign entries will need to be implemented. >> + * XXX: handle properly reference. It looks like the page may >> not always >> + * belong to d. >> */ >> - return -EOPNOTSUPP; >> + >> + return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw); > > Treating foreign as p2m_ram_rw is more an hack that a real fix. I have > answered to this separately (see my answer on Stefano's e-mail), so we > can continue the conversation there. ok
On 05.08.20 19:41, Stefano Stabellini wrote: Hi Stefano > On Wed, 5 Aug 2020, Jan Beulich wrote: >> On 05.08.2020 01:22, Stefano Stabellini wrote: >>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: >>>> --- a/xen/include/asm-arm/p2m.h >>>> +++ b/xen/include/asm-arm/p2m.h >>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn, >>>> mfn_t mfn) >>>> { >>>> /* >>>> - * NOTE: If this is implemented then proper reference counting of >>>> - * foreign entries will need to be implemented. >>>> + * XXX: handle properly reference. It looks like the page may not always >>>> + * belong to d. >>> Just as a reference, and without taking away anything from the comment, >>> I think that QEMU is doing its own internal reference counting for these >>> mappings. >> Which of course in no way replaces the need to do proper ref counting >> in Xen. (Just FAOD, as I'm not sure why you've said what you've said.) > Given the state of the series, which is a RFC, I only meant to say that > the lack of refcounting shouldn't prevent things from working when using > QEMU. In the sense that if somebody wants to give it a try for an early > demo, they should be able to see it running without crashes. Yes, for the early demo it works fine, however I don't use Qemu. > > Of course, refcounting needs to be implemented. +
On 05.08.20 19:13, Jan Beulich wrote: Hi, Jan > On 03.08.2020 20:21, Oleksandr Tyshchenko wrote: >> --- a/xen/include/xsm/dummy.h >> +++ b/xen/include/xsm/dummy.h >> @@ -713,14 +713,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int >> } >> } >> >> +#endif /* CONFIG_X86 */ >> + >> static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d) >> { >> XSM_ASSERT_ACTION(XSM_DM_PRIV); >> return xsm_default_action(action, current->domain, d); >> } >> >> -#endif /* CONFIG_X86 */ >> - >> #ifdef CONFIG_ARGO >> static XSM_INLINE int xsm_argo_enable(const struct domain *d) >> { >> diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h >> index a80bcf3..2a9b39d 100644 >> --- a/xen/include/xsm/xsm.h >> +++ b/xen/include/xsm/xsm.h >> @@ -177,8 +177,8 @@ struct xsm_operations { >> int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); >> int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow); >> int (*pmu_op) (struct domain *d, unsigned int op); >> - int (*dm_op) (struct domain *d); >> #endif >> + int (*dm_op) (struct domain *d); >> int (*xen_version) (uint32_t cmd); >> int (*domain_resource_map) (struct domain *d); >> #ifdef CONFIG_ARGO >> @@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int >> return xsm_ops->pmu_op(d, op); >> } >> >> +#endif /* CONFIG_X86 */ >> + >> static inline int xsm_dm_op(xsm_default_t def, struct domain *d) >> { >> return xsm_ops->dm_op(d); >> } >> >> -#endif /* CONFIG_X86 */ >> - >> static inline int xsm_xen_version (xsm_default_t def, uint32_t op) >> { >> return xsm_ops->xen_version(op); >> diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c >> index d4cce68..e3afd06 100644 >> --- a/xen/xsm/dummy.c >> +++ b/xen/xsm/dummy.c >> @@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops) >> set_to_dummy_if_null(ops, ioport_permission); >> set_to_dummy_if_null(ops, ioport_mapping); >> 
set_to_dummy_if_null(ops, pmu_op); >> - set_to_dummy_if_null(ops, dm_op); >> #endif >> + set_to_dummy_if_null(ops, dm_op); >> set_to_dummy_if_null(ops, xen_version); >> set_to_dummy_if_null(ops, domain_resource_map); >> #ifdef CONFIG_ARGO >> diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c >> index a314bf8..645192a 100644 >> --- a/xen/xsm/flask/hooks.c >> +++ b/xen/xsm/flask/hooks.c >> @@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op) >> return -EPERM; >> } >> } >> +#endif /* CONFIG_X86 */ >> >> static int flask_dm_op(struct domain *d) >> { >> return current_has_perm(d, SECCLASS_HVM, HVM__DM); >> } >> >> -#endif /* CONFIG_X86 */ >> - >> static int flask_xen_version (uint32_t op) >> { >> u32 dsid = domain_sid(current->domain); >> @@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = { >> .ioport_permission = flask_ioport_permission, >> .ioport_mapping = flask_ioport_mapping, >> .pmu_op = flask_pmu_op, >> - .dm_op = flask_dm_op, >> #endif >> + .dm_op = flask_dm_op, >> .xen_version = flask_xen_version, >> .domain_resource_map = flask_domain_resource_map, >> #ifdef CONFIG_ARGO > All of this looks to belong into patch 2? Good point. Will move.
Hi, On 05/08/2020 16:41, Oleksandr wrote: > > On 05.08.20 12:32, Julien Grall wrote: > > Hi Julien. > >> Hi Stefano, >> >> On 05/08/2020 00:22, Stefano Stabellini wrote: >>> On Mon, 3 Aug 2020, Oleksandr Tyshchenko wrote: >>>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >>>> >>>> This patch makes possible to forward Guest MMIO accesses >>>> to a device emulator on Arm and enables that support for >>>> Arm64. >>>> >>>> Also update XSM code a bit to let DM op be used on Arm. >>>> New arch DM op will be introduced in the follow-up patch. >>>> >>>> Please note, at the moment build on Arm32 is broken >>>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone >>> >>> Speaking of buffered_ioreq, if I recall correctly, they were only used >>> for VGA-related things on x86. It looks like it is still true. >>> >>> If so, do we need it on ARM? Note that I don't think we can get rid of >>> it from the interface as it is baked into ioreq, but it might be >>> possible to have a dummy implementation on ARM. Or maybe not: looking at >>> xen/common/hvm/ioreq.c it looks like it would be difficult to >>> disentangle bufioreq stuff from the rest of the code. >> >> We possibly don't need it right now. However, this could possibly be >> used in the future (e.g. virtio notification doorbell). >> >>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>>> */ >>>> void leave_hypervisor_to_guest(void) >>>> { >>>> +#ifdef CONFIG_IOREQ_SERVER >>>> + /* >>>> + * XXX: Check the return. Shall we call that in >>>> + * continue_running and context_switch instead? >>>> + * The benefits would be to avoid calling >>>> + * handle_hvm_io_completion on every return. >>>> + */ >>> >>> Yeah, that could be a simple and good optimization >> >> Well, it is not simple as it is sounds :). handle_hvm_io_completion() >> is the function in charge to mark the vCPU as waiting for I/O. So we >> would at least need to split the function. 
>> >> I wrote this TODO because I wasn't sure about the complexity of >> handle_hvm_io_completion(current). Looking at it again, the main >> complexity is the looping over the IOREQ servers. >> >> I think it would be better to optimize handle_hvm_io_completion() >> rather than trying to hack the context_switch() or continue_running(). >> >> [...] >> >>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >>>> index 5fdb6e8..5823f11 100644 >>>> --- a/xen/include/asm-arm/p2m.h >>>> +++ b/xen/include/asm-arm/p2m.h >>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct >>>> domain *d, unsigned long gfn, >>>> mfn_t mfn) >>>> { >>>> /* >>>> - * NOTE: If this is implemented then proper reference counting of >>>> - * foreign entries will need to be implemented. >>>> + * XXX: handle properly reference. It looks like the page may >>>> not always >>>> + * belong to d. >>> >>> Just as a reference, and without taking away anything from the comment, >>> I think that QEMU is doing its own internal reference counting for these >>> mappings. >> >> I am not sure how this matters here? We can't really trust the DM to >> do the right thing if it is not running in dom0. >> >> But, IIRC, the problem is some of the pages doesn't belong to do a >> domain, so it is not possible to treat them as foreign mapping (e.g. >> you wouldn't be able to grab a reference). This investigation was done >> a couple of years ago, so this may have changed in recent Xen. >> >> As a side note, I am a bit surprised to see most of my original TODOs >> present in the code. What is the plan to solve them? > The plan is to solve most critical TODOs in current series, and rest in > follow-up series if no objections of course. Any pointers how to solve > them properly would be much appreciated. Unfortunately, now I have a > weak understanding how they should be fixed. AFAICT, there is already some discussion about those 3 major TODOs happening. 
I would suggest going through the discussions. We can clarify anything if needed. Cheers,
On 05/08/2020 20:30, Oleksandr wrote: > > On 05.08.20 17:12, Julien Grall wrote: >> Hi, > > Hi Julien > > >> >> On 03/08/2020 19:21, Oleksandr Tyshchenko wrote: >>> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >>> >>> This patch makes possible to forward Guest MMIO accesses >>> to a device emulator on Arm and enables that support for >>> Arm64. >>> >>> Also update XSM code a bit to let DM op be used on Arm. >>> New arch DM op will be introduced in the follow-up patch. >>> >>> Please note, at the moment build on Arm32 is broken >>> (see cmpxchg usage in hvm_send_buffered_ioreq()) if someone >>> wants to enable CONFIG_IOREQ_SERVER due to the lack of >>> cmpxchg_64 support on Arm32. >>> >>> Please note, this is a split/cleanup of Julien's PoC: >>> "Add support for Guest IO forwarding to a device emulator" >>> >>> Signed-off-by: Julien Grall <julien.grall@arm.com> >>> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> >>> --- >>> tools/libxc/xc_dom_arm.c | 25 +++++++--- >>> xen/arch/arm/Kconfig | 1 + >>> xen/arch/arm/Makefile | 2 + >>> xen/arch/arm/dm.c | 34 +++++++++++++ >>> xen/arch/arm/domain.c | 9 ++++ >>> xen/arch/arm/hvm.c | 46 +++++++++++++++++- >>> xen/arch/arm/io.c | 67 +++++++++++++++++++++++++- >>> xen/arch/arm/ioreq.c | 86 >>> +++++++++++++++++++++++++++++++++ >>> xen/arch/arm/traps.c | 17 +++++++ >>> xen/common/memory.c | 5 +- >>> xen/include/asm-arm/domain.h | 80 +++++++++++++++++++++++++++++++ >>> xen/include/asm-arm/hvm/ioreq.h | 103 >>> ++++++++++++++++++++++++++++++++++++++++ >>> xen/include/asm-arm/mmio.h | 1 + >>> xen/include/asm-arm/p2m.h | 7 +-- >>> xen/include/xsm/dummy.h | 4 +- >>> xen/include/xsm/xsm.h | 6 +-- >>> xen/xsm/dummy.c | 2 +- >>> xen/xsm/flask/hooks.c | 5 +- >>> 18 files changed, 476 insertions(+), 24 deletions(-) >>> create mode 100644 xen/arch/arm/dm.c >>> create mode 100644 xen/arch/arm/ioreq.c >>> create mode 100644 xen/include/asm-arm/hvm/ioreq.h >> >> It feels to me the patch is doing quite a few 
things that are >> indirectly related. Can this be split to make the review easier? >> >> I would like at least the following split from the series: >> - The tools changes >> - The P2M changes >> - The HVMOP plumbing (if we still require them) > Sure, will split. > However, I don't quite understand where we should leave HVMOP plumbing. I think they will need to be droppped if we decide to use the acquire interface. > If I understand correctly the suggestion was to switch to acquire > interface instead (which requires a Linux version to be v4.17 at least)? This was the suggestion. > I suspect, this is all about "xen/privcmd: add > IOCTL_PRIVCMD_MMAP_RESOURCE" support for Linux? Correct. >> What is this function supposed to do? > Agree, sounds confusing a bit. I assume it is supposed to complete Guest > MMIO access after finishing emulation. > > Shall I rename it to something appropriate (maybe by adding ioreq prefix)? How about ioreq_handle_complete_mmio()? >>> diff --git a/xen/common/memory.c b/xen/common/memory.c >>> index 9283e5e..0000477 100644 >>> --- a/xen/common/memory.c >>> +++ b/xen/common/memory.c >>> @@ -8,6 +8,7 @@ >>> */ >>> #include <xen/domain_page.h> >>> +#include <xen/hvm/ioreq.h> >>> #include <xen/types.h> >>> #include <xen/lib.h> >>> #include <xen/mm.h> >>> @@ -30,10 +31,6 @@ >>> #include <public/memory.h> >>> #include <xsm/xsm.h> >>> -#ifdef CONFIG_IOREQ_SERVER >>> -#include <xen/hvm/ioreq.h> >>> -#endif >>> - >> >> Why do you remove something your just introduced? > The reason I guarded that header is to make "xen/mm: Make x86's > XENMEM_resource_ioreq_server handling common" (previous) patch buildable > on Arm > without arch IOREQ header added yet. I tried to make sure that the > result after each patch was buildable to retain bisectability. > As current patch adds Arm IOREQ specific bits (including header), that > guard could be removed as not needed anymore. I agree we want to have the build bisectable. 
However, I am still puzzled why it is necessary to remove the #ifdef and move it earlier in the list. Do you mind to provide more details? [...] >>> + >>> +bool handle_mmio(void); >>> + >>> +static inline bool handle_pio(uint16_t port, unsigned int size, int >>> dir) >>> +{ >>> + /* XXX */ >> >> Can you expand this TODO? What do you expect to do? > I didn't expect this to be called on Arm. Sorry, I am not sure l have an > idea how to handle this properly. I would keep it unimplemented until a > real reason. > Will expand TODO. Let see how the conversation on patch#1 goes about PIO vs MMIO. >> >> >>> + BUG(); >>> + return true; >>> +} >>> + >>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) >>> +{ >>> + return p->addr; >>> +} >> >> I understand that the x86 version is more complex as it check p->df. >> However, aside reducing the complexity, I am not sure why we would >> want to diverge it. >> >> After all, IOREQ is now meant to be a common feature. > Well, no objections at all. > Could you please clarify how could 'df' (Direction Flag?) be > handled/used on Arm? On x86, this is used by 'rep' instruction to tell the direction to iterate (forward or backward). On Arm, all the accesses to MMIO region will do a single memory access. So for now, we can safely always set to 0. > I see that try_fwd_ioserv() always sets it 0. Or I > need to just reuse x86's helpers as is, > which (together with count = df = 0) will result in what we actually > have here? AFAIU, both count and df should be 0 on Arm. >> >> >>> + >>> +static inline int p2m_set_ioreq_server(struct domain *d, >>> + unsigned int flags, >>> + struct hvm_ioreq_server *s) >>> +{ >>> + return -EOPNOTSUPP; >>> +} >> >> This should be defined in p2m.h. But I am not even sure what it is >> meant for. Can you expand it? > > ok, will move. > > > In this series I tried to make as much IOREQ code common as possible and > avoid complicating things, in order to achieve that a few stubs were > added here. 
Please note, > that I also considered splitting into arch parts. But some functions > couldn't be split easily. > This one is called from common hvm_destroy_ioreq_server() with flag > being 0 (which will result in unmapping ioreq server from p2m type on x86). > I could add a comment describing why this stub is present here. Sorry if I wasn't clear. I wasn't asking why the stub is there but what should be the expected implementation of the function. In particular, you are returning -EOPNOTSUPP. The only reason you are getting away from trouble is because the caller doesn't check the return. Would it make sense to have a stub arch_hvm_destroy_ioreq_server()? > > >> >> >>> + >>> +static inline void msix_write_completion(struct vcpu *v) >>> +{ >>> +} >>> + >>> +static inline void handle_realmode_completion(void) >>> +{ >>> + ASSERT_UNREACHABLE(); >>> +} >> >> realmode is very x86 specific. So I don't think this function should >> be called from common code. It might be worth considering to split >> handle_hvm_io_completion() is 2 parts: common and arch specific. > > I agree with you that realmode is x86 specific and looks not good in Arm > header. It is not a problem of looking good or not. Instead, it is about abstraction. A developper shouldn't need to understand all the other architectures we support in order to follow the common code. > I was thinking how to split handle_hvm_io_completion() > gracefully but I failed find a good solution for that, so decided to add > two stubs (msix_write_completion and handle_realmode_completion) on Arm. > I could add a comment describing why they are here if appropriate. But > if you think they shouldn't be called from the common code in any way, I > will try to split it. I am not entirely sure what msix_write_completion is meant to do on x86. Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help? 
Regarding handle_realmode_completion, I would add a new stub: arch_ioreq_handle_io_completion() that is called from the default case of the switch. On x86 it would be implemented as: switch (io_completion) { case HVMIO_realmode_completion: ... default: ASSERT_UNREACHABLE(); } On Arm, it would be implemented as: ASSERT_UNREACHABLE(); Cheers,
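[Editor's note] The split Julien sketches above could look roughly like the following compilable sketch: the common completion handler deals with the cases every architecture shares and defers anything else to an arch hook, so x86-only notions such as realmode never leak into common code. The enum values and the hook name `arch_ioreq_complete_io` are simplified stand-ins, not the actual Xen identifiers.

```c
/* Sketch of a common/arch split for I/O completion handling. */
#include <assert.h>
#include <stdbool.h>

enum hvm_io_completion {
    HVMIO_no_completion,
    HVMIO_mmio_completion,
    HVMIO_pio_completion,
    HVMIO_realmode_completion,   /* only meaningful on x86 */
};

/* Arm flavour of the arch hook: nothing beyond the common cases exists. */
static bool arch_ioreq_complete_io(enum hvm_io_completion c)
{
    (void)c;
    /* ASSERT_UNREACHABLE() in real code */
    return false;
}

/* Common part, shared by all architectures. */
static bool handle_io_completion(enum hvm_io_completion c)
{
    switch ( c )
    {
    case HVMIO_no_completion:
        return true;
    case HVMIO_mmio_completion:
        /* ioreq_handle_complete_mmio() */
        return true;
    case HVMIO_pio_completion:
        /* handle_pio(...) */
        return true;
    default:
        /* realmode etc. handled by the x86 hook */
        return arch_ioreq_complete_io(c);
    }
}
```

An x86 implementation of the hook would contain its own switch with a `HVMIO_realmode_completion` case and `ASSERT_UNREACHABLE()` in the default, exactly as outlined above.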
On 06.08.2020 13:08, Julien Grall wrote: > On 05/08/2020 20:30, Oleksandr wrote: >> I was thinking how to split handle_hvm_io_completion() >> gracefully but I failed find a good solution for that, so decided to add >> two stubs (msix_write_completion and handle_realmode_completion) on Arm. >> I could add a comment describing why they are here if appropriate. But >> if you think they shouldn't be called from the common code in any way, I >> will try to split it. > > I am not entirely sure what msix_write_completion is meant to do on x86. > Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help? Due to the split brain model of handling PCI pass-through (between Xen and qemu), a guest writing to an MSI-X entry needs this write handed to qemu, and upon completion of the write there Xen also needs to take some extra action. Jan
On 06.08.20 14:08, Julien Grall wrote: Hi Julien > >>> What is this function supposed to do? >> Agree, sounds confusing a bit. I assume it is supposed to complete >> Guest MMIO access after finishing emulation. >> >> Shall I rename it to something appropriate (maybe by adding ioreq >> prefix)? > > How about ioreq_handle_complete_mmio()? For me it sounds fine. > >>>> diff --git a/xen/common/memory.c b/xen/common/memory.c >>>> index 9283e5e..0000477 100644 >>>> --- a/xen/common/memory.c >>>> +++ b/xen/common/memory.c >>>> @@ -8,6 +8,7 @@ >>>> */ >>>> #include <xen/domain_page.h> >>>> +#include <xen/hvm/ioreq.h> >>>> #include <xen/types.h> >>>> #include <xen/lib.h> >>>> #include <xen/mm.h> >>>> @@ -30,10 +31,6 @@ >>>> #include <public/memory.h> >>>> #include <xsm/xsm.h> >>>> -#ifdef CONFIG_IOREQ_SERVER >>>> -#include <xen/hvm/ioreq.h> >>>> -#endif >>>> - >>> >>> Why do you remove something your just introduced? >> The reason I guarded that header is to make "xen/mm: Make x86's >> XENMEM_resource_ioreq_server handling common" (previous) patch >> buildable on Arm >> without arch IOREQ header added yet. I tried to make sure that the >> result after each patch was buildable to retain bisectability. >> As current patch adds Arm IOREQ specific bits (including header), >> that guard could be removed as not needed anymore. > I agree we want to have the build bisectable. However, I am still > puzzled why it is necessary to remove the #ifdef and move it earlier > in the list. > > Do you mind to provide more details? Previous patch "xen/mm: Make x86's XENMEM_resource_ioreq_server handling common" breaks build on Arm as it includes xen/hvm/ioreq.h which requires arch header to be present (asm/hvm/ioreq.h). But the missing arch header together with other arch specific bits are introduced here in current patch. Probably I should have rearranged changes in a way to not introduce #ifdef and then remove it... > > [...] 
> >>>> + >>>> +bool handle_mmio(void); >>>> + >>>> +static inline bool handle_pio(uint16_t port, unsigned int size, >>>> int dir) >>>> +{ >>>> + /* XXX */ >>> >>> Can you expand this TODO? What do you expect to do? >> I didn't expect this to be called on Arm. Sorry, I am not sure l have >> an idea how to handle this properly. I would keep it unimplemented >> until a real reason. >> Will expand TODO. > > Let see how the conversation on patch#1 goes about PIO vs MMIO. ok > >>> >>> >>>> + BUG(); >>>> + return true; >>>> +} >>>> + >>>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) >>>> +{ >>>> + return p->addr; >>>> +} >>> >>> I understand that the x86 version is more complex as it check p->df. >>> However, aside reducing the complexity, I am not sure why we would >>> want to diverge it. >>> >>> After all, IOREQ is now meant to be a common feature. >> Well, no objections at all. >> Could you please clarify how could 'df' (Direction Flag?) be >> handled/used on Arm? > > On x86, this is used by 'rep' instruction to tell the direction to > iterate (forward or backward). > > On Arm, all the accesses to MMIO region will do a single memory > access. So for now, we can safely always set to 0. > >> I see that try_fwd_ioserv() always sets it 0. Or I need to just reuse >> x86's helpers as is, >> which (together with count = df = 0) will result in what we actually >> have here? > AFAIU, both count and df should be 0 on Arm. Thanks for the explanation. The only one question remains where to put common helpers hvm_mmio_first_byte/hvm_mmio_last_byte (common io.h?)? >>> >>>> + >>>> +static inline int p2m_set_ioreq_server(struct domain *d, >>>> + unsigned int flags, >>>> + struct hvm_ioreq_server *s) >>>> +{ >>>> + return -EOPNOTSUPP; >>>> +} >>> >>> This should be defined in p2m.h. But I am not even sure what it is >>> meant for. Can you expand it? >> >> ok, will move. 
>> >> >> In this series I tried to make as much IOREQ code common as possible >> and avoid complicating things, in order to achieve that a few stubs >> were added here. Please note, >> that I also considered splitting into arch parts. But some functions >> couldn't be split easily. >> This one is called from common hvm_destroy_ioreq_server() with flag >> being 0 (which will result in unmapping ioreq server from p2m type on >> x86). >> I could add a comment describing why this stub is present here. > > Sorry if I wasn't clear. I wasn't asking why the stub is there but > what should be the expected implementation of the function. > > In particular, you are returning -EOPNOTSUPP. The only reason you are > getting away from trouble is because the caller doesn't check the return. True. > > Would it make sense to have a stub arch_hvm_destroy_ioreq_server()? With what has been said above, it make sense, will create. >>>> + >>>> +static inline void msix_write_completion(struct vcpu *v) >>>> +{ >>>> +} >>>> + >>>> +static inline void handle_realmode_completion(void) >>>> +{ >>>> + ASSERT_UNREACHABLE(); >>>> +} >>> >>> realmode is very x86 specific. So I don't think this function should >>> be called from common code. It might be worth considering to split >>> handle_hvm_io_completion() is 2 parts: common and arch specific. >> >> I agree with you that realmode is x86 specific and looks not good in >> Arm header. > It is not a problem of looking good or not. Instead, it is about > abstraction. A developper shouldn't need to understand all the other > architectures we support in order to follow the common code. > >> I was thinking how to split handle_hvm_io_completion() gracefully but >> I failed find a good solution for that, so decided to add two stubs >> (msix_write_completion and handle_realmode_completion) on Arm. I >> could add a comment describing why they are here if appropriate. 
But >> if you think they shouldn't be called from the common code in any >> way, I will try to split it. > > I am not entirely sure what msix_write_completion is meant to do on > x86. Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could > help? > > Regarding handle_realmode_completion, I would add a new stub: > > arch_ioreq_handle_io_completion() that is called from the default case > of the switch. > > On x86 it would be implemented as: > > switch (io_completion) > { > case HVMIO_realmode_completion: > ... > default: > ASSERT_UNREACHABLE(); > } > > On Arm, it would be implemented as: > > ASSERT_UNREACHABLE(); Good point, will update.
On 05.08.20 12:32, Julien Grall wrote: Hi Julien > >>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>> */ >>> void leave_hypervisor_to_guest(void) >>> { >>> +#ifdef CONFIG_IOREQ_SERVER >>> + /* >>> + * XXX: Check the return. Shall we call that in >>> + * continue_running and context_switch instead? >>> + * The benefits would be to avoid calling >>> + * handle_hvm_io_completion on every return. >>> + */ >> >> Yeah, that could be a simple and good optimization > > Well, it is not simple as it is sounds :). handle_hvm_io_completion() > is the function in charge to mark the vCPU as waiting for I/O. So we > would at least need to split the function. > > I wrote this TODO because I wasn't sure about the complexity of > handle_hvm_io_completion(current). Looking at it again, the main > complexity is the looping over the IOREQ servers. > > I think it would be better to optimize handle_hvm_io_completion() > rather than trying to hack the context_switch() or continue_running(). Well, is the idea in proposed dirty test patch below close to what you expect? Patch optimizes handle_hvm_io_completion() to avoid extra actions if vcpu's domain doesn't have ioreq_server, alternatively the check could be moved out of handle_hvm_io_completion() to avoid calling that function at all. BTW, TODO also suggests checking the return value of handle_hvm_io_completion(), but I am completely sure we can simply just return from leave_hypervisor_to_guest() at this point. Could you please share your opinion? 
--- xen/common/hvm/ioreq.c | 12 +++++++++++- xen/include/asm-arm/domain.h | 1 + xen/include/xen/hvm/ioreq.h | 5 +++++ 3 files changed, 17 insertions(+), 1 deletion(-) diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c index 7e1fa23..dc6647a 100644 --- a/xen/common/hvm/ioreq.c +++ b/xen/common/hvm/ioreq.c @@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id, struct hvm_ioreq_server *s) { ASSERT(id < MAX_NR_IOREQ_SERVERS); - ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]); + ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) || + (s && !d->arch.hvm.ioreq_server.server[id])); d->arch.hvm.ioreq_server.server[id] = s; + + if ( s ) + d->arch.hvm.ioreq_server.nr_servers ++; + else + d->arch.hvm.ioreq_server.nr_servers --; } /* @@ -169,6 +175,9 @@ bool handle_hvm_io_completion(struct vcpu *v) return false; } + if ( !hvm_domain_has_ioreq_server(d) ) + return true; + FOR_EACH_IOREQ_SERVER(d, id, s) { struct hvm_ioreq_vcpu *sv; @@ -1415,6 +1424,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered) void hvm_ioreq_init(struct domain *d) { spin_lock_init(&d->arch.hvm.ioreq_server.lock); + d->arch.hvm.ioreq_server.nr_servers = 0; arch_hvm_ioreq_init(d); } diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h index 6a01d69..484bd1a 100644 --- a/xen/include/asm-arm/domain.h +++ b/xen/include/asm-arm/domain.h @@ -68,6 +68,7 @@ struct hvm_domain struct { spinlock_t lock; struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS]; + unsigned int nr_servers; } ioreq_server; bool_t qemu_mapcache_invalidate; diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h index 40b7b5e..8f78852 100644 --- a/xen/include/xen/hvm/ioreq.h +++ b/xen/include/xen/hvm/ioreq.h @@ -23,6 +23,11 @@ #include <asm/hvm/ioreq.h> +static inline bool hvm_domain_has_ioreq_server(const struct domain *d) +{ + return (d->arch.hvm.ioreq_server.nr_servers > 0); +} + #define GET_IOREQ_SERVER(d, id) \ (d)->arch.hvm.ioreq_server.server[id]
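[Editor's note] The bookkeeping the test patch adds can be modelled in isolation: `set_ioreq_server()` must be called either to register a server in a free slot or to clear an occupied one (never to overwrite), and the counter tracks occupancy so the completion path can bail out early when the domain has no IOREQ server at all. Types below are simplified stand-ins for the real `struct hvm_ioreq_server` plumbing.

```c
/* Standalone model of the nr_servers fast-path bookkeeping. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NR_IOREQ_SERVERS 8

struct ioreq_domain {
    void *server[MAX_NR_IOREQ_SERVERS];
    unsigned int nr_servers;
};

static void set_ioreq_server(struct ioreq_domain *d, unsigned int id, void *s)
{
    assert(id < MAX_NR_IOREQ_SERVERS);
    /* Either registering into a free slot, or clearing an occupied one. */
    assert((!s && d->server[id]) || (s && !d->server[id]));

    d->server[id] = s;
    if ( s )
        d->nr_servers++;
    else
        d->nr_servers--;
}

/* The cheap check handle_hvm_io_completion() can use to return early. */
static bool domain_has_ioreq_server(const struct ioreq_domain *d)
{
    return d->nr_servers > 0;
}
```

Because registrations are serialised under `ioreq_server.lock` in the real code, a plain counter suffices; the fast-path read in the completion handler only needs to distinguish "zero servers" from "some servers".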
Hi > >> >>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>>> */ >>>> void leave_hypervisor_to_guest(void) >>>> { >>>> +#ifdef CONFIG_IOREQ_SERVER >>>> + /* >>>> + * XXX: Check the return. Shall we call that in >>>> + * continue_running and context_switch instead? >>>> + * The benefits would be to avoid calling >>>> + * handle_hvm_io_completion on every return. >>>> + */ >>> >>> Yeah, that could be a simple and good optimization >> >> Well, it is not simple as it is sounds :). handle_hvm_io_completion() >> is the function in charge to mark the vCPU as waiting for I/O. So we >> would at least need to split the function. >> >> I wrote this TODO because I wasn't sure about the complexity of >> handle_hvm_io_completion(current). Looking at it again, the main >> complexity is the looping over the IOREQ servers. >> >> I think it would be better to optimize handle_hvm_io_completion() >> rather than trying to hack the context_switch() or continue_running(). > Well, is the idea in proposed dirty test patch below close to what you > expect? Patch optimizes handle_hvm_io_completion() to avoid extra > actions if vcpu's domain doesn't have ioreq_server, alternatively > the check could be moved out of handle_hvm_io_completion() to avoid > calling that function at all. BTW, TODO also suggests checking the > return value of handle_hvm_io_completion(), but I am completely sure > we can simply > just return from leave_hypervisor_to_guest() at this point. Sorry, made a mistake in last sentence). s / I am completely sure / I am not completely sure
On 06/08/2020 14:27, Oleksandr wrote: > > On 06.08.20 14:08, Julien Grall wrote: > > Hi Julien > >> >>>> What is this function supposed to do? >>> Agree, sounds confusing a bit. I assume it is supposed to complete >>> Guest MMIO access after finishing emulation. >>> >>> Shall I rename it to something appropriate (maybe by adding ioreq >>> prefix)? >> >> How about ioreq_handle_complete_mmio()? > > For me it sounds fine. > > > >> >>>>> diff --git a/xen/common/memory.c b/xen/common/memory.c >>>>> index 9283e5e..0000477 100644 >>>>> --- a/xen/common/memory.c >>>>> +++ b/xen/common/memory.c >>>>> @@ -8,6 +8,7 @@ >>>>> */ >>>>> #include <xen/domain_page.h> >>>>> +#include <xen/hvm/ioreq.h> >>>>> #include <xen/types.h> >>>>> #include <xen/lib.h> >>>>> #include <xen/mm.h> >>>>> @@ -30,10 +31,6 @@ >>>>> #include <public/memory.h> >>>>> #include <xsm/xsm.h> >>>>> -#ifdef CONFIG_IOREQ_SERVER >>>>> -#include <xen/hvm/ioreq.h> >>>>> -#endif >>>>> - >>>> >>>> Why do you remove something your just introduced? >>> The reason I guarded that header is to make "xen/mm: Make x86's >>> XENMEM_resource_ioreq_server handling common" (previous) patch >>> buildable on Arm >>> without arch IOREQ header added yet. I tried to make sure that the >>> result after each patch was buildable to retain bisectability. >>> As current patch adds Arm IOREQ specific bits (including header), >>> that guard could be removed as not needed anymore. >> I agree we want to have the build bisectable. However, I am still >> puzzled why it is necessary to remove the #ifdef and move it earlier >> in the list. >> >> Do you mind to provide more details? > Previous patch "xen/mm: Make x86's XENMEM_resource_ioreq_server handling > common" breaks build on Arm as it includes xen/hvm/ioreq.h which > requires arch header > to be present (asm/hvm/ioreq.h). But the missing arch header together > with other arch specific bits are introduced here in current patch. 
I understand that both Arm and x86 now implement the asm/hvm/ioreq.h. However, please keep in mind that there might be other architectures in the future. With your change here, you would impose a new arch to implement asm/hvm/ioreq.h even if the developper have no plan to use the feature. > Probably I should have rearranged > changes in a way to not introduce #ifdef and then remove it... Ideally we want to avoid #ifdef in the common code. But if this can't be done in an header, then the #ifdef here would be fine. > > >> >> [...] >> >>>>> + >>>>> +bool handle_mmio(void); >>>>> + >>>>> +static inline bool handle_pio(uint16_t port, unsigned int size, >>>>> int dir) >>>>> +{ >>>>> + /* XXX */ >>>> >>>> Can you expand this TODO? What do you expect to do? >>> I didn't expect this to be called on Arm. Sorry, I am not sure l have >>> an idea how to handle this properly. I would keep it unimplemented >>> until a real reason. >>> Will expand TODO. >> >> Let see how the conversation on patch#1 goes about PIO vs MMIO. > > ok > > >> >>>> >>>> >>>>> + BUG(); >>>>> + return true; >>>>> +} >>>>> + >>>>> +static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p) >>>>> +{ >>>>> + return p->addr; >>>>> +} >>>> >>>> I understand that the x86 version is more complex as it check p->df. >>>> However, aside reducing the complexity, I am not sure why we would >>>> want to diverge it. >>>> >>>> After all, IOREQ is now meant to be a common feature. >>> Well, no objections at all. >>> Could you please clarify how could 'df' (Direction Flag?) be >>> handled/used on Arm? >> >> On x86, this is used by 'rep' instruction to tell the direction to >> iterate (forward or backward). >> >> On Arm, all the accesses to MMIO region will do a single memory >> access. So for now, we can safely always set to 0. >> >>> I see that try_fwd_ioserv() always sets it 0. Or I need to just reuse >>> x86's helpers as is, >>> which (together with count = df = 0) will result in what we actually >>> have here? 
>> AFAIU, both count and df should be 0 on Arm.
>
> Thanks for the explanation. The only remaining question is where to put
> the common helpers hvm_mmio_first_byte/hvm_mmio_last_byte (common io.h?)?

It feels to me it should be part of the common ioreq.h.

Cheers,
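For reference, the x86 helpers being discussed can be sketched as below. This is a self-contained approximation written for illustration, not the actual Xen code: the `ioreq_t` here is a minimal stand-in carrying only the fields the helpers use. With `df` set, `addr` points at the last element of a repeating transfer, so the first byte lies `(count - 1) * size` bytes earlier; on Arm, where every access is a single transfer with `df` cleared, both helpers reduce to `addr` and `addr + size - 1`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Minimal stand-in for Xen's ioreq_t; only the fields the helpers use. */
typedef struct {
    uint64_t addr;
    uint32_t size;
    uint32_t count;
    uint8_t  df;   /* direction flag: 1 = iterate backwards ('rep' on x86) */
} ioreq_t;

/* With df set, addr is the address of the *last* element of the transfer,
 * so the first byte sits (count - 1) * size bytes below it. */
static paddr_t hvm_mmio_first_byte(const ioreq_t *p)
{
    return p->df ? p->addr - (p->count - 1ul) * p->size : p->addr;
}

static paddr_t hvm_mmio_last_byte(const ioreq_t *p)
{
    unsigned long size = p->size;

    return p->df ? p->addr + size - 1 : p->addr + (p->count * size) - 1;
}
```

With `df = 0` and `count = 1` (the only case an Arm access produces), the pair degenerates to the trivial `p->addr` version shown in the patch, which is why sharing the x86 helpers costs nothing.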
On 10/08/2020 19:09, Oleksandr wrote:
>
> On 05.08.20 12:32, Julien Grall wrote:
>
> Hi Julien
>
>>
>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void)
>>>>       */
>>>>      void leave_hypervisor_to_guest(void)
>>>>      {
>>>> +#ifdef CONFIG_IOREQ_SERVER
>>>> +    /*
>>>> +     * XXX: Check the return. Shall we call that in
>>>> +     * continue_running and context_switch instead?
>>>> +     * The benefits would be to avoid calling
>>>> +     * handle_hvm_io_completion on every return.
>>>> +     */
>>>
>>> Yeah, that could be a simple and good optimization
>>
>> Well, it is not as simple as it sounds :). handle_hvm_io_completion()
>> is the function in charge of marking the vCPU as waiting for I/O. So
>> we would at least need to split the function.
>>
>> I wrote this TODO because I wasn't sure about the complexity of
>> handle_hvm_io_completion(current). Looking at it again, the main
>> complexity is the looping over the IOREQ servers.
>>
>> I think it would be better to optimize handle_hvm_io_completion()
>> rather than trying to hack the context_switch() or continue_running().
> Well, is the idea in the proposed dirty test patch below close to what
> you expect? The patch optimizes handle_hvm_io_completion() to avoid
> extra actions if the vCPU's domain doesn't have an ioreq_server;
> alternatively the check could be moved out of handle_hvm_io_completion()
> to avoid calling that function at all.

This looks ok to me.

> BTW, the TODO also suggests checking the return value of
> handle_hvm_io_completion(), but I am not completely sure we can simply
> just return from leave_hypervisor_to_guest() at this point. Could you
> please share your opinion?

From my understanding, handle_hvm_io_completion() may return false if
there is pending I/O or a failure.

In the former case, I think we want to call handle_hvm_io_completion()
later on. Possibly after we call do_softirq().

I am wondering whether check_for_vcpu_work() could return whether there
is more work to do on behalf of the vCPU.
So we could have:

do
{
    check_for_pcpu_work();
} while ( check_for_vcpu_work() );

The implementation of check_for_vcpu_work() would be:

    if ( !handle_hvm_io_completion() )
        return true;

    /* Rest of the existing code */

    return false;

Cheers,
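The control flow proposed above can be checked with a toy model. Everything below is a stand-in written purely for illustration (none of it is real Xen code): handle_hvm_io_completion() is simulated to report pending I/O twice before succeeding, and the do/while structure guarantees that check_for_pcpu_work(), where softirqs would be processed, runs on every spin.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated state: pretend I/O is still pending for the first two polls. */
static int pending_polls = 2;
static int pcpu_work_runs;

static bool handle_hvm_io_completion(void)
{
    return pending_polls-- <= 0;    /* false while I/O is pending */
}

static void check_for_pcpu_work(void)
{
    pcpu_work_runs++;               /* softirqs would be handled here */
}

static bool check_for_vcpu_work(void)
{
    if ( !handle_hvm_io_completion() )
        return true;                /* more work: go around the loop again */

    /* rest of the existing vCPU work (e.g. the P2M flush) would go here */
    return false;
}

static void leave_hypervisor_to_guest(void)
{
    do
    {
        check_for_pcpu_work();
    } while ( check_for_vcpu_work() );
}
```

With two pending polls, the loop spins three times in total and pCPU work is processed on each pass, which is the property relied on later in the thread: even if the vCPU spins waiting for the Device Emulator, softirq processing keeps running.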
On 10.08.20 21:25, Julien Grall wrote: Hi Julien > >>> >>> Do you mind to provide more details? >> Previous patch "xen/mm: Make x86's XENMEM_resource_ioreq_server >> handling common" breaks build on Arm as it includes xen/hvm/ioreq.h >> which requires arch header >> to be present (asm/hvm/ioreq.h). But the missing arch header together >> with other arch specific bits are introduced here in current patch. > > I understand that both Arm and x86 now implement the asm/hvm/ioreq.h. > However, please keep in mind that there might be other architectures > in the future. > > With your change here, you would impose a new arch to implement > asm/hvm/ioreq.h even if the developper have no plan to use the feature. > >> Probably I should have rearranged >> changes in a way to not introduce #ifdef and then remove it... > > Ideally we want to avoid #ifdef in the common code. But if this can't > be done in an header, then the #ifdef here would be fine. Got it. > >>>>> I understand that the x86 version is more complex as it check >>>>> p->df. However, aside reducing the complexity, I am not sure why >>>>> we would want to diverge it. >>>>> >>>>> After all, IOREQ is now meant to be a common feature. >>>> Well, no objections at all. >>>> Could you please clarify how could 'df' (Direction Flag?) be >>>> handled/used on Arm? >>> >>> On x86, this is used by 'rep' instruction to tell the direction to >>> iterate (forward or backward). >>> >>> On Arm, all the accesses to MMIO region will do a single memory >>> access. So for now, we can safely always set to 0. >>> >>>> I see that try_fwd_ioserv() always sets it 0. Or I need to just >>>> reuse x86's helpers as is, >>>> which (together with count = df = 0) will result in what we >>>> actually have here? >>> AFAIU, both count and df should be 0 on Arm. >> >> Thanks for the explanation. The only one question remains where to >> put common helpers hvm_mmio_first_byte/hvm_mmio_last_byte (common >> io.h?)? 
>
> It feels to me it should be part of the common ioreq.h.

ok, will move.
On 10.08.20 22:00, Julien Grall wrote: Hi Julien > >>> >>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>>>> */ >>>>> void leave_hypervisor_to_guest(void) >>>>> { >>>>> +#ifdef CONFIG_IOREQ_SERVER >>>>> + /* >>>>> + * XXX: Check the return. Shall we call that in >>>>> + * continue_running and context_switch instead? >>>>> + * The benefits would be to avoid calling >>>>> + * handle_hvm_io_completion on every return. >>>>> + */ >>>> >>>> Yeah, that could be a simple and good optimization >>> >>> Well, it is not simple as it is sounds :). >>> handle_hvm_io_completion() is the function in charge to mark the >>> vCPU as waiting for I/O. So we would at least need to split the >>> function. >>> >>> I wrote this TODO because I wasn't sure about the complexity of >>> handle_hvm_io_completion(current). Looking at it again, the main >>> complexity is the looping over the IOREQ servers. >>> >>> I think it would be better to optimize handle_hvm_io_completion() >>> rather than trying to hack the context_switch() or continue_running(). >> Well, is the idea in proposed dirty test patch below close to what >> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra >> actions if vcpu's domain doesn't have ioreq_server, alternatively >> the check could be moved out of handle_hvm_io_completion() to avoid >> calling that function at all. > > This looks ok to me. > >> BTW, TODO also suggests checking the return value of >> handle_hvm_io_completion(), but I am completely sure we can simply >> just return from leave_hypervisor_to_guest() at this point. Could you >> please share your opinion? > > From my understanding, handle_hvm_io_completion() may return false if > there is pending I/O or a failure. It seems, yes > > In the former case, I think we want to call handle_hvm_io_completion() > later on. Possibly after we call do_softirq(). 
>
> I am wondering whether check_for_vcpu_work() could return whether there
> is more work to do on behalf of the vCPU.
>
> So we could have:
>
> do
> {
>    check_for_pcpu_work();
> } while (check_for_vcpu_work())
>
> The implementation of check_for_vcpu_work() would be:
>
> if ( !handle_hvm_io_completion() )
>    return true;
>
> /* Rest of the existing code */
>
> return false;

Thank you, will give it a try.

Can we behave the same way for both "pending I/O" and "failure", or do we
need to distinguish them?

Probably we need some sort of safe timeout/number of attempts in order to
not spin forever?
On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@gmail.com> wrote: > > > On 10.08.20 22:00, Julien Grall wrote: > > Hi Julien > > > > >>> > >>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) > >>>>> */ > >>>>> void leave_hypervisor_to_guest(void) > >>>>> { > >>>>> +#ifdef CONFIG_IOREQ_SERVER > >>>>> + /* > >>>>> + * XXX: Check the return. Shall we call that in > >>>>> + * continue_running and context_switch instead? > >>>>> + * The benefits would be to avoid calling > >>>>> + * handle_hvm_io_completion on every return. > >>>>> + */ > >>>> > >>>> Yeah, that could be a simple and good optimization > >>> > >>> Well, it is not simple as it is sounds :). > >>> handle_hvm_io_completion() is the function in charge to mark the > >>> vCPU as waiting for I/O. So we would at least need to split the > >>> function. > >>> > >>> I wrote this TODO because I wasn't sure about the complexity of > >>> handle_hvm_io_completion(current). Looking at it again, the main > >>> complexity is the looping over the IOREQ servers. > >>> > >>> I think it would be better to optimize handle_hvm_io_completion() > >>> rather than trying to hack the context_switch() or continue_running(). > >> Well, is the idea in proposed dirty test patch below close to what > >> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra > >> actions if vcpu's domain doesn't have ioreq_server, alternatively > >> the check could be moved out of handle_hvm_io_completion() to avoid > >> calling that function at all. > > > > This looks ok to me. > > > >> BTW, TODO also suggests checking the return value of > >> handle_hvm_io_completion(), but I am completely sure we can simply > >> just return from leave_hypervisor_to_guest() at this point. Could you > >> please share your opinion? > > > > From my understanding, handle_hvm_io_completion() may return false if > > there is pending I/O or a failure. 
> > It seems, yes > > > > > > In the former case, I think we want to call handle_hvm_io_completion() > > later on. Possibly after we call do_softirq(). > > > > I am wondering whether check_for_vcpu_work() could return whether > > there are more work todo on the behalf of the vCPU. > > > > So we could have: > > > > do > > { > > check_for_pcpu_work(); > > } while (check_for_vcpu_work()) > > > > The implementation of check_for_vcpu_work() would be: > > > > if ( !handle_hvm_io_completion() ) > > return true; > > > > /* Rest of the existing code */ > > > > return false; > > Thank you, will give it a try. > > Can we behave the same way for both "pending I/O" and "failure" or we > need to distinguish them? We don't need to distinguish them. In both cases, we will want to process softirqs. In all the failure cases, the domain will have crashed. Therefore the vCPU will be unscheduled. > > Probably we need some sort of safe timeout/number attempts in order to > not spin forever? Well, anything based on timeout/number of attempts is flaky. How do you know whether the I/O is just taking a "long time" to complete? But a vCPU shouldn't continue until an I/O has completed. This is nothing very different than what a processor would do. In Xen case, if an I/O never completes then it most likely means that something went horribly wrong with the Device Emulator. So it is most likely not safe to continue. In HW, when there is a device failure, the OS may receive an SError (this is implementation defined) and could act accordingly if it is able to recognize the issue. It *might* be possible to send a virtual SError but there are a couple of issues with it: * How do you detect a failure? * SErrors are implementations defined. You would need to teach your OS (or the firmware) how to deal with them. I would expect quite a bit of effort in order to design and implement it. For now, it is probably best to just let the vCPU spin forever. 
This wouldn't be an issue for Xen as do_softirq() would be called at every loop. Cheers,
On 11.08.20 01:37, Julien Grall wrote: Hi Julien > On Mon, 10 Aug 2020 at 21:29, Oleksandr <olekstysh@gmail.com> wrote: >> >> On 10.08.20 22:00, Julien Grall wrote: >> >> Hi Julien >> >>>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>>>>>> */ >>>>>>> void leave_hypervisor_to_guest(void) >>>>>>> { >>>>>>> +#ifdef CONFIG_IOREQ_SERVER >>>>>>> + /* >>>>>>> + * XXX: Check the return. Shall we call that in >>>>>>> + * continue_running and context_switch instead? >>>>>>> + * The benefits would be to avoid calling >>>>>>> + * handle_hvm_io_completion on every return. >>>>>>> + */ >>>>>> Yeah, that could be a simple and good optimization >>>>> Well, it is not simple as it is sounds :). >>>>> handle_hvm_io_completion() is the function in charge to mark the >>>>> vCPU as waiting for I/O. So we would at least need to split the >>>>> function. >>>>> >>>>> I wrote this TODO because I wasn't sure about the complexity of >>>>> handle_hvm_io_completion(current). Looking at it again, the main >>>>> complexity is the looping over the IOREQ servers. >>>>> >>>>> I think it would be better to optimize handle_hvm_io_completion() >>>>> rather than trying to hack the context_switch() or continue_running(). >>>> Well, is the idea in proposed dirty test patch below close to what >>>> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra >>>> actions if vcpu's domain doesn't have ioreq_server, alternatively >>>> the check could be moved out of handle_hvm_io_completion() to avoid >>>> calling that function at all. >>> This looks ok to me. >>> >>>> BTW, TODO also suggests checking the return value of >>>> handle_hvm_io_completion(), but I am completely sure we can simply >>>> just return from leave_hypervisor_to_guest() at this point. Could you >>>> please share your opinion? >>> From my understanding, handle_hvm_io_completion() may return false if >>> there is pending I/O or a failure. 
>> It seems, yes >> >> >>> In the former case, I think we want to call handle_hvm_io_completion() >>> later on. Possibly after we call do_softirq(). >>> >>> I am wondering whether check_for_vcpu_work() could return whether >>> there are more work todo on the behalf of the vCPU. >>> >>> So we could have: >>> >>> do >>> { >>> check_for_pcpu_work(); >>> } while (check_for_vcpu_work()) >>> >>> The implementation of check_for_vcpu_work() would be: >>> >>> if ( !handle_hvm_io_completion() ) >>> return true; >>> >>> /* Rest of the existing code */ >>> >>> return false; >> Thank you, will give it a try. >> >> Can we behave the same way for both "pending I/O" and "failure" or we >> need to distinguish them? > We don't need to distinguish them. In both cases, we will want to > process softirqs. In all the failure cases, the domain will have > crashed. Therefore the vCPU will be unscheduled. Got it. >> Probably we need some sort of safe timeout/number attempts in order to >> not spin forever? > Well, anything based on timeout/number of attempts is flaky. How do > you know whether the I/O is just taking a "long time" to complete? > > But a vCPU shouldn't continue until an I/O has completed. This is > nothing very different than what a processor would do. > > In Xen case, if an I/O never completes then it most likely means that > something went horribly wrong with the Device Emulator. So it is most > likely not safe to continue. In HW, when there is a device failure, > the OS may receive an SError (this is implementation defined) and > could act accordingly if it is able to recognize the issue. > > It *might* be possible to send a virtual SError but there are a couple > of issues with it: > * How do you detect a failure? > * SErrors are implementations defined. You would need to teach > your OS (or the firmware) how to deal with them. > > I would expect quite a bit of effort in order to design and implement > it. For now, it is probably best to just let the vCPU spin forever. 
>
> This wouldn't be an issue for Xen as do_softirq() would be called at
> every loop.

Thank you for the clarification. Fair enough, that sounds reasonable.
On 05.08.20 12:32, Julien Grall wrote:

Hi Julien, Stefano

>
>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
>>> index 5fdb6e8..5823f11 100644
>>> --- a/xen/include/asm-arm/p2m.h
>>> +++ b/xen/include/asm-arm/p2m.h
>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct
>>> domain *d, unsigned long gfn,
>>>                                           mfn_t mfn)
>>>  {
>>>      /*
>>> -     * NOTE: If this is implemented then proper reference counting of
>>> -     * foreign entries will need to be implemented.
>>> +     * XXX: handle properly reference. It looks like the page may
>>> not always
>>> +     * belong to d.
>>
>> Just as a reference, and without taking away anything from the comment,
>> I think that QEMU is doing its own internal reference counting for these
>> mappings.
>
> I am not sure how this matters here? We can't really trust the DM to
> do the right thing if it is not running in dom0.
>
> But, IIRC, the problem is that some of the pages don't belong to a
> domain, so it is not possible to treat them as foreign mappings (e.g.
> you wouldn't be able to grab a reference). This investigation was done
> a couple of years ago, so this may have changed in recent Xen.

Well, the emulator is going to be used in a driver domain, so this TODO
must be resolved. I suspect that the check for a hardware domain in
acquire_resource(), which I skipped in a hackish way [1], could simply be
removed once proper reference counting is implemented in Xen, correct?

Could you please provide some pointers on that problem? Maybe some
questions need to be investigated again? Unfortunately, it is not
completely clear to me which direction to follow...

***
I am wondering whether a similar problem exists on x86 as well? The
FIXME tag (before checking for a hardware domain in acquire_resource())
in the common code makes me think it is a common issue. On the other
hand, x86's implementation of set_foreign_p2m_entry() exists, unlike
Arm's one (which returned -EOPNOTSUPP so far). Or are these unrelated?
***

[1] https://lists.xen.org/archives/html/xen-devel/2020-08/msg00075.html
On 11/08/2020 18:09, Oleksandr wrote: > > On 05.08.20 12:32, Julien Grall wrote: > > Hi Julien, Stefano > >> >>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >>>> index 5fdb6e8..5823f11 100644 >>>> --- a/xen/include/asm-arm/p2m.h >>>> +++ b/xen/include/asm-arm/p2m.h >>>> @@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct >>>> domain *d, unsigned long gfn, >>>> mfn_t mfn) >>>> { >>>> /* >>>> - * NOTE: If this is implemented then proper reference counting of >>>> - * foreign entries will need to be implemented. >>>> + * XXX: handle properly reference. It looks like the page may >>>> not always >>>> + * belong to d. >>> >>> Just as a reference, and without taking away anything from the comment, >>> I think that QEMU is doing its own internal reference counting for these >>> mappings. >> >> I am not sure how this matters here? We can't really trust the DM to >> do the right thing if it is not running in dom0. >> >> But, IIRC, the problem is some of the pages doesn't belong to do a >> domain, so it is not possible to treat them as foreign mapping (e.g. >> you wouldn't be able to grab a reference). This investigation was done >> a couple of years ago, so this may have changed in recent Xen. > > Well, emulator is going to be used in driver domain, so this TODO must > be resolved. I suspect that the check for a hardware domain in > acquire_resource() which I skipped in a hackish way [1] could be simply > removed once proper reference counting is implemented in Xen, correct? It depends how you are going to solve it. If you manage to solve it in a generic way, then yes you could resolve. If not (i.e. it is solved in an arch-specific way), we would need to keep the check on arch that are not able to deal with it. See more below. > > Could you please provide some pointers on that problem? Maybe some > questions need to be investigated again? Unfortunately, it is not > completely clear to me the direction to follow... 
>
> ***
> I am wondering whether the similar problem exists on x86 as well?

It is somewhat different. On Arm, we are able to properly handle foreign
mappings (i.e. mapping a page from another domain) as we would grab a
reference on the page (see the XENMAPSPACE_gmfn_foreign handling in
xenmem_add_to_physmap()). The reference will then be released when the
entry is removed from the P2M (see p2m_free_entry()).

If all the pages given to set_foreign_p2m_entry() belong to a domain,
then you could use the same approach.

However, I remember running into some issues in some of the cases. I had
a quick look at the callers and I wasn't able to find any use case that
may be an issue.

The refcounting in the IOREQ code has changed after XSA-276 (this was
found while working on the Arm port). Probably the best way to figure
out whether it works would be to try it and see if it fails.

Note that set_foreign_p2m_entry() doesn't have a parameter for the
foreign domain. You would need to add an extra parameter for this.

> The FIXME tag (before checking for a hardware domain in
> acquire_resource()) in the common code makes me think it is a common
> issue. On the other hand, x86's implementation of
> set_foreign_p2m_entry() exists, unlike Arm's one (which returned
> -EOPNOTSUPP so far). Or are these unrelated?

At the moment, x86 doesn't support refcounting for foreign mappings.
Hence the reason to restrict them to the hardware domain.

> ***
>
> [1] https://lists.xen.org/archives/html/xen-devel/2020-08/msg00075.html

Cheers,
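The refcounting scheme described above can be reduced to a toy model (illustrative stand-ins only, not Xen code): installing a foreign entry takes a reference on the backing page, removing the entry drops it, and the owning domain cannot be fully torn down while any reference is outstanding, which is exactly the zombie-domain behaviour observed later in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page with a reference count; stands in for Xen's struct page_info. */
struct page_info {
    unsigned int count;
};

static void get_page(struct page_info *pg) { pg->count++; }
static void put_page(struct page_info *pg) { pg->count--; }

/* Installing a foreign P2M entry grabs a reference on the backing page... */
static void install_foreign_entry(struct page_info *pg)
{
    get_page(pg);
}

/* ...and removing the entry (p2m_free_entry() in Xen) releases it. */
static void remove_foreign_entry(struct page_info *pg)
{
    put_page(pg);
}

/*
 * A domain can only be fully destroyed once no references to its pages
 * remain; otherwise it lingers as a "zombie" domain.
 */
static bool can_fully_destroy(const struct page_info *pg)
{
    return pg->count == 0;
}
```

The invariant this models is the one the discussion hinges on: as long as the reference is held, the foreign page cannot be freed out from under the mapping domain.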
Hi Julien >>>>>>>> @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) >>>>>>>> */ >>>>>>>> void leave_hypervisor_to_guest(void) >>>>>>>> { >>>>>>>> +#ifdef CONFIG_IOREQ_SERVER >>>>>>>> + /* >>>>>>>> + * XXX: Check the return. Shall we call that in >>>>>>>> + * continue_running and context_switch instead? >>>>>>>> + * The benefits would be to avoid calling >>>>>>>> + * handle_hvm_io_completion on every return. >>>>>>>> + */ >>>>>>> Yeah, that could be a simple and good optimization >>>>>> Well, it is not simple as it is sounds :). >>>>>> handle_hvm_io_completion() is the function in charge to mark the >>>>>> vCPU as waiting for I/O. So we would at least need to split the >>>>>> function. >>>>>> >>>>>> I wrote this TODO because I wasn't sure about the complexity of >>>>>> handle_hvm_io_completion(current). Looking at it again, the main >>>>>> complexity is the looping over the IOREQ servers. >>>>>> >>>>>> I think it would be better to optimize handle_hvm_io_completion() >>>>>> rather than trying to hack the context_switch() or >>>>>> continue_running(). >>>>> Well, is the idea in proposed dirty test patch below close to what >>>>> you expect? Patch optimizes handle_hvm_io_completion() to avoid extra >>>>> actions if vcpu's domain doesn't have ioreq_server, alternatively >>>>> the check could be moved out of handle_hvm_io_completion() to avoid >>>>> calling that function at all. >>>> This looks ok to me. >>>> >>>>> BTW, TODO also suggests checking the return value of >>>>> handle_hvm_io_completion(), but I am completely sure we can simply >>>>> just return from leave_hypervisor_to_guest() at this point. Could you >>>>> please share your opinion? >>>> From my understanding, handle_hvm_io_completion() may return false if >>>> there is pending I/O or a failure. >>> It seems, yes >>> >>> >>>> In the former case, I think we want to call handle_hvm_io_completion() >>>> later on. Possibly after we call do_softirq(). 
>>>> >>>> I am wondering whether check_for_vcpu_work() could return whether >>>> there are more work todo on the behalf of the vCPU. >>>> >>>> So we could have: >>>> >>>> do >>>> { >>>> check_for_pcpu_work(); >>>> } while (check_for_vcpu_work()) >>>> >>>> The implementation of check_for_vcpu_work() would be: >>>> >>>> if ( !handle_hvm_io_completion() ) >>>> return true; >>>> >>>> /* Rest of the existing code */ >>>> >>>> return false; >>> Thank you, will give it a try. >>> >>> Can we behave the same way for both "pending I/O" and "failure" or we >>> need to distinguish them? >> We don't need to distinguish them. In both cases, we will want to >> process softirqs. In all the failure cases, the domain will have >> crashed. Therefore the vCPU will be unscheduled. > > Got it. > > >>> Probably we need some sort of safe timeout/number attempts in order to >>> not spin forever? >> Well, anything based on timeout/number of attempts is flaky. How do >> you know whether the I/O is just taking a "long time" to complete? >> >> But a vCPU shouldn't continue until an I/O has completed. This is >> nothing very different than what a processor would do. >> >> In Xen case, if an I/O never completes then it most likely means that >> something went horribly wrong with the Device Emulator. So it is most >> likely not safe to continue. In HW, when there is a device failure, >> the OS may receive an SError (this is implementation defined) and >> could act accordingly if it is able to recognize the issue. >> >> It *might* be possible to send a virtual SError but there are a couple >> of issues with it: >> * How do you detect a failure? >> * SErrors are implementations defined. You would need to teach >> your OS (or the firmware) how to deal with them. >> >> I would expect quite a bit of effort in order to design and implement >> it. For now, it is probably best to just let the vCPU spin forever. >> >> This wouldn't be an issue for Xen as do_softirq() would be called at >> every loop. 
>
> Thank you for clarification. Fair enough and sounds reasonable.

I added logic to properly handle the return value of
handle_hvm_io_completion() as you had suggested. For test purposes, I
simulated handle_hvm_io_completion() to return false sometimes (I
couldn't detect a real "pending I/O" failure during testing) to see how
the new logic behaved. I assume I can take this solution for the non-RFC
series (?)

---
 xen/arch/arm/traps.c         | 36 ++++++++++++++++++++++--------------
 xen/common/hvm/ioreq.c       |  9 ++++++++-
 xen/include/asm-arm/domain.h |  1 +
 xen/include/xen/hvm/ioreq.h  |  5 +++++
 4 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c
index 974c744..f74b514 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -2264,12 +2264,26 @@ static void check_for_pcpu_work(void)
  * Process pending work for the vCPU. Any call should be fast or
  * implement preemption.
  */
-static void check_for_vcpu_work(void)
+static bool check_for_vcpu_work(void)
 {
     struct vcpu *v = current;
 
+#ifdef CONFIG_IOREQ_SERVER
+    if ( hvm_domain_has_ioreq_server(v->domain) )
+    {
+        bool handled;
+
+        local_irq_enable();
+        handled = handle_hvm_io_completion(v);
+        local_irq_disable();
+
+        if ( !handled )
+            return true;
+    }
+#endif
+
     if ( likely(!v->arch.need_flush_to_ram) )
-        return;
+        return false;
 
     /*
      * Give a chance for the pCPU to process work before handling the vCPU
@@ -2280,6 +2294,8 @@ static void check_for_vcpu_work(void)
     local_irq_enable();
     p2m_flush_vm(v);
     local_irq_disable();
+
+    return false;
 }
 
 /*
@@ -2290,20 +2306,12 @@ static void check_for_vcpu_work(void)
  */
 void leave_hypervisor_to_guest(void)
 {
-#ifdef CONFIG_IOREQ_SERVER
-    /*
-     * XXX: Check the return. Shall we call that in
-     * continue_running and context_switch instead?
-     * The benefits would be to avoid calling
-     * handle_hvm_io_completion on every return.
-     */
-    local_irq_enable();
-    handle_hvm_io_completion(current);
-#endif
     local_irq_disable();
 
-    check_for_vcpu_work();
-    check_for_pcpu_work();
+    do
+    {
+        check_for_pcpu_work();
+    } while ( check_for_vcpu_work() );
 
     vgic_sync_to_lrs();

diff --git a/xen/common/hvm/ioreq.c b/xen/common/hvm/ioreq.c
index 7e1fa23..81b41ab 100644
--- a/xen/common/hvm/ioreq.c
+++ b/xen/common/hvm/ioreq.c
@@ -38,9 +38,15 @@ static void set_ioreq_server(struct domain *d, unsigned int id,
                              struct hvm_ioreq_server *s)
 {
     ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+    ASSERT((!s && d->arch.hvm.ioreq_server.server[id]) ||
+           (s && !d->arch.hvm.ioreq_server.server[id]));
 
     d->arch.hvm.ioreq_server.server[id] = s;
+
+    if ( s )
+        d->arch.hvm.ioreq_server.nr_servers++;
+    else
+        d->arch.hvm.ioreq_server.nr_servers--;
 }
 
 /*
@@ -1415,6 +1421,7 @@ unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
 void hvm_ioreq_init(struct domain *d)
 {
     spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+    d->arch.hvm.ioreq_server.nr_servers = 0;
 
     arch_hvm_ioreq_init(d);
 }

diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 6a01d69..484bd1a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -68,6 +68,7 @@ struct hvm_domain
     struct {
         spinlock_t lock;
         struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+        unsigned int nr_servers;
     } ioreq_server;
 
     bool_t qemu_mapcache_invalidate;

diff --git a/xen/include/xen/hvm/ioreq.h b/xen/include/xen/hvm/ioreq.h
index 40b7b5e..8f78852 100644
--- a/xen/include/xen/hvm/ioreq.h
+++ b/xen/include/xen/hvm/ioreq.h
@@ -23,6 +23,11 @@
 
 #include <asm/hvm/ioreq.h>
 
+static inline bool hvm_domain_has_ioreq_server(const struct domain *d)
+{
+    return (d->arch.hvm.ioreq_server.nr_servers > 0);
+}
+
 #define GET_IOREQ_SERVER(d, id) \
     (d)->arch.hvm.ioreq_server.server[id]
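The nr_servers bookkeeping in the patch above can be exercised in isolation. The sketch below strips away the real Xen types (toy_domain is purely illustrative) to show the invariant being relied on: registration and teardown keep the counter in step with the slot array, so the hot-path check becomes a single comparison instead of a scan over all server slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NR_IOREQ_SERVERS 8

/* Illustrative stand-in for the per-domain ioreq_server state. */
struct toy_domain {
    void *server[MAX_NR_IOREQ_SERVERS];
    unsigned int nr_servers;
};

/* Mirrors set_ioreq_server() from the patch: only valid slot transitions
 * (NULL -> server, or server -> NULL) are allowed, and each one adjusts
 * the counter. */
static void set_ioreq_server(struct toy_domain *d, unsigned int id, void *s)
{
    assert(id < MAX_NR_IOREQ_SERVERS);
    assert((!s && d->server[id]) || (s && !d->server[id]));

    d->server[id] = s;

    if ( s )
        d->nr_servers++;
    else
        d->nr_servers--;
}

/* The cheap check performed on every return to the guest. */
static bool hvm_domain_has_ioreq_server(const struct toy_domain *d)
{
    return d->nr_servers > 0;
}
```

This is why the fast path in check_for_vcpu_work() stays O(1) for domains without any IOREQ server: the common case pays only one comparison before falling through to the existing code.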
On 11.08.20 20:50, Julien Grall wrote: Hi Julien > > > On 11/08/2020 18:09, Oleksandr wrote: >> >> On 05.08.20 12:32, Julien Grall wrote: >> >> Hi Julien, Stefano >> >>> >>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >>>>> index 5fdb6e8..5823f11 100644 >>>>> --- a/xen/include/asm-arm/p2m.h >>>>> +++ b/xen/include/asm-arm/p2m.h >>>>> @@ -385,10 +385,11 @@ static inline int >>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn, >>>>> mfn_t mfn) >>>>> { >>>>> /* >>>>> - * NOTE: If this is implemented then proper reference >>>>> counting of >>>>> - * foreign entries will need to be implemented. >>>>> + * XXX: handle properly reference. It looks like the page may >>>>> not always >>>>> + * belong to d. >>>> >>>> Just as a reference, and without taking away anything from the >>>> comment, >>>> I think that QEMU is doing its own internal reference counting for >>>> these >>>> mappings. >>> >>> I am not sure how this matters here? We can't really trust the DM to >>> do the right thing if it is not running in dom0. >>> >>> But, IIRC, the problem is some of the pages doesn't belong to do a >>> domain, so it is not possible to treat them as foreign mapping (e.g. >>> you wouldn't be able to grab a reference). This investigation was >>> done a couple of years ago, so this may have changed in recent Xen. >> >> Well, emulator is going to be used in driver domain, so this TODO >> must be resolved. I suspect that the check for a hardware domain in >> acquire_resource() which I skipped in a hackish way [1] could be >> simply removed once proper reference counting is implemented in Xen, >> correct? > > It depends how you are going to solve it. If you manage to solve it in > a generic way, then yes you could resolve. If not (i.e. it is solved > in an arch-specific way), we would need to keep the check on arch that > are not able to deal with it. See more below. > >> >> Could you please provide some pointers on that problem? 
Maybe some >> questions need to be investigated again? Unfortunately, it is not >> completely clear to me the direction to follow... >> >> *** >> I am wondering whether the similar problem exists on x86 as well? > > It is somewhat different. On Arm, we are able to handle properly > foreign mapping (i.e. mapping page from a another domain) as we would > grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in > xenmem_add_to_physmap()). The reference will then be released when the > entry is removed from the P2M (see p2m_free_entry()). > > If all the pages given to set_foreign_p2m_entry() belong to a domain, > then you could use the same approach. > > However, I remember to run into some issues in some of the cases. I > had a quick looked at the caller and I wasn't able to find any use > cases that may be an issue. > > The refcounting in the IOREQ code has changed after XSA-276 (this was > found while working on the Arm port). Probably the best way to figure > out if it works would be to try it and see if it fails. > > Note that set_foreign_p2m_entry() doesn't have a parameter for the > foreign domain. You would need to add a extra parameter for this. > >> The FIXME tag (before checking for a hardware domain in >> acquire_resource()) in the common code makes me think it is a common >> issue. From other side x86's >> implementation of set_foreign_p2m_entry() is exists unlike Arm's one >> (which returned -EOPNOTSUPP so far). Or these are unrelated? > > At the moment, x86 doesn't support refcounting for foreign mapping. > Hence the reason to restrict them to the hardware domain. Thank you for the pointers! I checked that all pages given to set_foreign_p2m_entry() belonged to a domain (at least in my use-case). I noticed two calls for acquiring resource at the DomU creation time, the first call was for grant table (single gfn) and the second for ioreq server which carried 2 gfns (for shared and buffered rings I assume). 
For testing purposes, I passed these gfns to get_page_from_gfn() in
order to grab references on the pages; after that I tried to destroy the
DomU without calling put_page() for these pages. The fact that I
couldn't destroy the DomU completely (a zombie domain was observed) made
me think that the references were still held, so it worked as expected.

I implemented a test patch (which uses the approach from
xenmem_add_to_physmap_one() for the XENMAPSPACE_gmfn_foreign case) to
check whether it would work.

---
 xen/arch/arm/p2m.c        | 30 ++++++++++++++++++++++++++++++
 xen/common/memory.c       |  2 +-
 xen/include/asm-arm/p2m.h | 12 ++----------
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e9ccba8..7359715 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }
 
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page;
+    p2m_type_t p2mt;
+    int rc;
+
+    /*
+     * Take a reference on the foreign domain page. The reference will be
+     * released in p2m_put_l3_page().
+     */
+    page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC);
+    if ( !page )
+        return -EINVAL;
+
+    if ( p2m_is_ram(p2mt) )
+        p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : p2m_map_foreign_ro;
+    else
+    {
+        put_page(page);
+        return -EINVAL;
+    }
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8d9f0a8..1de1d4f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1171,7 +1171,7 @@ static int acquire_resource(
 
     for ( i = 0; !rc && i < xmar.nr_frames; i++ )
     {
-        rc = set_foreign_p2m_entry(currd, gfn_list[i],
+        rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                    _mfn(mfn_list[i]));
         /* rc should be -EIO for any iteration other than the first */
         if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5823f11..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }
 
-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * XXX: handle properly reference. It looks like the page may not always
-     * belong to d.
-     */
-
-    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
 
 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
On 13/08/2020 19:41, Oleksandr wrote: > Rebooting domain 2 > root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at > ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 > (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]---- > (XEN) CPU: 3 > (XEN) PC: 0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c > (XEN) LR: 0000000000246ef0 > (XEN) SP: 0000800725eafd80 > (XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler) > (XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f > (XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000 > (XEN) X6: 0000800725eafe24 X7: 0000ffffd1ef3e08 X8: 0000000000000020 > (XEN) X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000 > (XEN) X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001 > (XEN) X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0 > (XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0 > (XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1 > (XEN) X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d > (XEN) X27: ffff000008ad1000 X28: ffff800052e65100 FP: ffff0000223dbd20 > (XEN) > (XEN) VTCR_EL2: 80023558 > (XEN) VTTBR_EL2: 0002000765f04000 > (XEN) > (XEN) SCTLR_EL2: 30cd183d > (XEN) HCR_EL2: 000000008078663f > (XEN) TTBR0_EL2: 00000000781c5000 > (XEN) > (XEN) ESR_EL2: f2000001 > (XEN) HPFAR_EL2: 0000000000030010 > (XEN) FAR_EL2: ffff000008005f00 > (XEN) > (XEN) Xen stack trace from sp=0000800725eafd80: > (XEN) 0000800725e68ec0 0000000000247078 00008004dc040000 > 00000000002477c8 > (XEN) ffffffffffffffea 0000000000000001 ffff8000460ec500 > 0000000000000002 > (XEN) 000000000024645c 00000000002462dc 0000800725eafeb0 > 0000800725eafeb0 > (XEN) 0000800725eaff30 0000000060000145 000000000027882c > 0000800725eafeb0 > (XEN) 0000800725eafeb0 01ff00000935de80 00008004dc040000 > 0000000000000006 > (XEN) ffff800000000000 0000000000000002 000000005a000ea1 > 000000019bc60002 > (XEN) 0000ffffd1ef3e08 
0000000000000020 0000000000000004 > 000000000027c7d8 > (XEN) 000000005a000ea1 0000800725eafeb0 000000005a000ea1 > 0000000000279f98 > (XEN) 0000000000000000 ffff8000460ec200 0000800725eaffb8 > 0000000000262c58 > (XEN) 0000000000262c4c 07e0000160000249 0000000000000002 > 0000000000000001 > (XEN) ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 > ffff800052e65100 > (XEN) 000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 > 0000000000000000 > (XEN) 00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 > 0000000000000002 > (XEN) 0000000000000001 0000000000000001 0000000000000029 > 0000ffff9badb3d0 > (XEN) 000000000000010f ffff8000460ec210 ffff8000460ec200 > ffff8000460ec210 > (XEN) 0000000000000001 ffff8000460ec500 ffff8000460ec280 > 0000000000000124 > (XEN) 000000000000001d ffff000008ad1000 ffff800052e65100 > ffff0000223dbd20 > (XEN) ffff000008537004 ffffffffffffffff ffff0000080c17e4 > 5a000ea160000145 > (XEN) 0000000060000000 0000000000000000 0000000000000000 > ffff800052e65100 > (XEN) ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 > 0000000000000000 > (XEN) Xen call trace: > (XEN) [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC) > (XEN) [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR) > (XEN) > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 3: > (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 > (XEN) **************************************** > (XEN) > (XEN) Reboot in five seconds... > (XEN) > (XEN) **************************************** > (XEN) Panic on CPU 0: > (XEN) PSCI cpu off failed for CPU0 err=-3 > (XEN) **************************************** > (XEN) > (XEN) Reboot in five seconds... > > > > Either I did something wrong (most likely) or there is an issue with > page ref-counting in the IOREQ code. I am still trying to understand > what is going on. At a first glance, the implement of set_foreign_p2m_entry() looks fine to me. > Some notes on that: > 1. 
I checked that put_page() was called for these pages in
> p2m_put_l3_page() when destroying domain. This happened before
> hvm_free_ioreq_mfn() execution.
> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
> be fully destroyed because of the reference taken.

This definitely looks like a page reference issue. Would it be possible
to print where the page references are dropped? A WARN() in put_page()
would help.

To avoid a lot of messages, I tend to use a global variable that stores
the page I want to watch.

Cheers,
Hi,

Sorry for the possible format issue.

On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote:

> [full quote of the previous message snipped]
>
> And with that patch applied I was facing a BUG when destroying/rebooting
> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered
> that BUG:
>
> [BUG splat snipped; identical to the one quoted earlier in the thread]
>
> Either I did something wrong (most likely) or there is an issue with
> page ref-counting in the IOREQ code. I am still trying to understand
> what is going on.
> Some notes on that:
> 1.
I checked that put_page() was called for these pages in
> p2m_put_l3_page() when destroying domain. This happened before
> hvm_free_ioreq_mfn() execution.
> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
> be fully destroyed because of the reference taken.
>

I think I understand why the BUG is triggered. I checked
"page->count_info & PGC_count_mask" and noticed that get_page_from_gfn()
doesn't seem to increase the ref counter (but it should?):

1. hvm_alloc_ioreq_mfn() -> ref 2
2. set_foreign_p2m_entry() -> ref still 2
3. p2m_put_l3_page() -> ref 1
4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1, which
triggers the BUG
On 13.08.20 23:36, Julien Grall wrote:

Hi Julien,

>
> On 13/08/2020 19:41, Oleksandr wrote:
>> Rebooting domain 2
>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at
>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
>>
>> [rest of the BUG splat snipped; identical to the one quoted earlier]
>>
>> Either I did something wrong (most likely) or there is an issue with
>> page ref-counting in the IOREQ code.
I am still trying to understand >> what is going on. > > At a first glance, the implement of set_foreign_p2m_entry() looks fine > to me. > >> Some notes on that: >> 1. I checked that put_page() was called for these pages in >> p2m_put_l3_page() when destroying domain. This happened before >> hvm_free_ioreq_mfn() execution. >> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of >> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU >> couldn't be fully destroyed because of the reference taken. > > This definitely looks like a page reference issue. Would it be > possible to print where the page reference are dropped? A WARN() in > put_page() would help. > > To avoid a lot of message, I tend to use a global variable that store > the page I want to watch. Unfortunately it is unclear from the log who calls put_page() two times. One of the call is made by p2m_put_l3_page() I assume, but who makes a second call? Needs debugging. Rebooting domain 2 root@generic-armv8-xt-dom0:~# (XEN) put_page[1553] 0000000810e60e38 ---> ref = 3 (XEN) Xen WARN at mm.c:1554 (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]---- (XEN) CPU: 2 (XEN) PC: 0000000000272a48 put_page+0xa0/0xc4 (XEN) LR: 0000000000272a48 (XEN) SP: 0000800725eaf990 (XEN) CPSR: 80000249 MODE:64-bit EL2h (Hypervisor, handler) (XEN) X0: 0000000000310028 X1: 0000000000000001 X2: 0000800725ca4000 (XEN) X3: 0000000000000020 X4: 0000000000000000 X5: 0000000000000020 (XEN) X6: 0080808080808080 X7: fefefefefefeff09 X8: 7f7f7f7f7f7f7f7f (XEN) X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101 (XEN) X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eaf6e8 (XEN) X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eaf070 (XEN) X18: 000000000000010f X19: 8040000000000003 X20: 0000000810e60e38 (XEN) X21: 0000800725f07208 X22: 0000800725f07208 X23: 000000000051c041 (XEN) X24: 0000000000000001 X25: 0000800725eafa78 X26: 000000000009c041 (XEN) X27: 0000000000000000 X28: 
0000000000000007 FP: ffff00002212bd50 (XEN) (XEN) VTCR_EL2: 80023558 (XEN) VTTBR_EL2: 0002000765f04000 (XEN) (XEN) SCTLR_EL2: 30cd183d (XEN) HCR_EL2: 000000008078663f (XEN) TTBR0_EL2: 00000000781c5000 (XEN) (XEN) ESR_EL2: f2000001 (XEN) HPFAR_EL2: 0000000000030010 (XEN) FAR_EL2: ffff000008005f00 (XEN) (XEN) Xen stack trace from sp=0000800725eaf990: (XEN) 034000051c0417ff 0000000000000000 00000000002743e8 0000800725f07208 (XEN) 034000051c0417ff 0000000000000000 0000800725e6d208 0000800725f07208 (XEN) 0000000000000003 0000000000274910 0000000000000000 ffffffffffffffff (XEN) 0000000000000001 000000000009c041 0000800725f07208 0000000000000000 (XEN) 0000000000000012 0000000000000007 0000000000000001 0000000000000009 (XEN) 0000000000274e00 0000800725f07208 ffffffffffffffff 0000000025eafb0c (XEN) 0000800725f07000 ffff000008f86528 0000000000000000 0000000200000000 (XEN) 00000041000000e0 0000800725e6d000 0000000000000001 0000800725f07208 (XEN) 000000000009c041 0000000000000000 0000800725f07000 ffff000008f86528 (XEN) ffff8000501b6540 ffff8000501b6550 ffff8000517bdb40 ffff800052784380 (XEN) 0000000000275770 0000800725eafb08 0000000000000000 0000000025f07000 (XEN) fffffffffffffffe 0000800725f07000 0000000810e60e38 000000000021a930 (XEN) 0000000000236d18 0000800725f1b990 0000800725eafeb0 0000800725eafeb0 (XEN) 0000800725eaff30 00000000a0000145 000000005a000ea1 ffff000008f86528 (XEN) 0000800725f566a0 00000000002a9088 ffff00000814acc0 0000000a55bb542d (XEN) 00000000002789d8 0000800725f1b990 00000000002b6cb8 0000800725f03920 (XEN) 00000000002b6cb8 0000000c9084be29 0000000000310228 0000800725eafbf0 (XEN) ffff00000814f160 0000800725eafbe0 0000000000310228 0000800725eafbf0 (XEN) 0000000000310228 0000000100000002 00000000002b6cb8 0000000000237aa8 (XEN) 0000000000000002 0000000000000000 0000000000000000 0000800725eafc70 (XEN) 0000800725ecd6c8 0000800725ecddf0 00000000000000a0 0000000000000240 (XEN) 000000000026c920 0000800725ecd128 0000000000000002 0000000000000000 (XEN) 
0000800725ecd000 0000000000000001 000000000026bfa8 0000000c90bf0b49 (XEN) 0000000000000002 0000000000000000 0000000000276fc4 0000000000000001 (XEN) 0000800725e64000 0000800725e68a60 0000800725e64000 0000800725f566a0 (XEN) 000000000023f53c 000000000027dc14 0000000000311390 0000000000346008 (XEN) 000000000027651c 0000800725eafdd8 0000000000240e44 0000000000000004 (XEN) 0000000000311390 0000000000240e80 0000800725e64000 0000000000240f20 (XEN) ffff000022127ff0 000000000009c041 000000005a000ea1 ffff800011ae9800 (XEN) 0000000000000124 0000000000240f2c fffffffffffffff2 0000000000000001 (XEN) 0000800725e68ec0 0000800725e68ed0 000000000024694c 00008004dc040000 (XEN) 0000800725e68ec0 00008004dc040000 0000000000247c88 0000000000247c7c (XEN) ffffffffffffffea 0000000000000001 ffff800011ae9e80 0000000000000002 (XEN) 000000005a000ea1 0000ffffc52989c0 00000000002463b8 000000000024613c (XEN) 0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145 (XEN) 00000000002789d8 0000800725eafeb0 0000800725eafeb0 01ff00000935de80 (XEN) 0000800725eca000 0000000000310280 0000000000000000 0000000000000000 (XEN) 000000005a000ea1 ffff800011ae9800 0000000000000124 000000000000001d (XEN) 0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0 (XEN) 000000005a000ea1 000000000027a144 0000000000000000 ffff80004e381f80 (XEN) Xen call trace: (XEN) [<0000000000272a48>] put_page+0xa0/0xc4 (PC) (XEN) [<0000000000272a48>] put_page+0xa0/0xc4 (LR) (XEN) (XEN) put_page[1553] 0000000810e60e38 ---> ref = 2 (XEN) Xen WARN at mm.c:1554 (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]---- (XEN) CPU: 2 (XEN) PC: 0000000000272a48 put_page+0xa0/0xc4 (XEN) LR: 0000000000272a48 (XEN) SP: 0000800725eafaf0 (XEN) CPSR: 80000249 MODE:64-bit EL2h (Hypervisor, handler) (XEN) X0: 0000000000310028 X1: 0000000000000000 X2: 0000800725ca4000 (XEN) X3: 0000000000000020 X4: 0000000000000000 X5: 0000000000000021 (XEN) X6: 0080808080808080 X7: fefefefefefeff09 X8: 7f7f7f7f7f7f7f7f (XEN) X9: 756e6c64513d313b 
X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101 (XEN) X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eaf848 (XEN) X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eaf070 (XEN) X18: 000000000000010f X19: 8040000000000002 X20: 0000000810e60e38 (XEN) X21: 0000000810e60e38 X22: 0000000000000000 X23: 0000800725f07000 (XEN) X24: ffff000008f86528 X25: ffff8000501b6540 X26: ffff8000501b6550 (XEN) X27: ffff8000517bdb40 X28: ffff800052784380 FP: ffff00002212bd50 (XEN) (XEN) VTCR_EL2: 80023558 (XEN) VTTBR_EL2: 0002000765f04000 (XEN) (XEN) SCTLR_EL2: 30cd183d (XEN) HCR_EL2: 000000008078663f (XEN) TTBR0_EL2: 00000000781c5000 (XEN) (XEN) ESR_EL2: f2000001 (XEN) HPFAR_EL2: 0000000000030010 (XEN) FAR_EL2: ffff000008005f00 (XEN) (XEN) Xen stack trace from sp=0000800725eafaf0: (XEN) 0000000000000000 0000800725f07000 000000000021a93c 000000000021a930 (XEN) 0000000000236d18 0000800725f1b990 0000800725eafeb0 0000800725eafeb0 (XEN) 0000800725eaff30 00000000a0000145 000000005a000ea1 ffff000008f86528 (XEN) 0000800725f566a0 00000000002a9088 ffff00000814acc0 0000000a55bb542d (XEN) 00000000002789d8 0000800725f1b990 00000000002b6cb8 0000800725f03920 (XEN) 00000000002b6cb8 0000000c9084be29 0000000000310228 0000800725eafbf0 (XEN) ffff00000814f160 0000800725eafbe0 0000000000310228 0000800725eafbf0 (XEN) 0000000000310228 0000000100000002 00000000002b6cb8 0000000000237aa8 (XEN) 0000000000000002 0000000000000000 0000000000000000 0000800725eafc70 (XEN) 0000800725ecd6c8 0000800725ecddf0 00000000000000a0 0000000000000240 (XEN) 000000000026c920 0000800725ecd128 0000000000000002 0000000000000000 (XEN) 0000800725ecd000 0000000000000001 000000000026bfa8 0000000c90bf0b49 (XEN) 0000000000000002 0000000000000000 0000000000276fc4 0000000000000001 (XEN) 0000800725e64000 0000800725e68a60 0000800725e64000 0000800725f566a0 (XEN) 000000000023f53c 000000000027dc14 0000000000311390 0000000000346008 (XEN) 000000000027651c 0000800725eafdd8 0000000000240e44 0000000000000004 (XEN) 0000000000311390 
0000000000240e80 0000800725e64000 0000000000240f20 (XEN) ffff000022127ff0 000000000009c041 000000005a000ea1 ffff800011ae9800 (XEN) 0000000000000124 0000000000240f2c fffffffffffffff2 0000000000000001 (XEN) 0000800725e68ec0 0000800725e68ed0 000000000024694c 00008004dc040000 (XEN) 0000800725e68ec0 00008004dc040000 0000000000247c88 0000000000247c7c (XEN) ffffffffffffffea 0000000000000001 ffff800011ae9e80 0000000000000002 (XEN) 000000005a000ea1 0000ffffc52989c0 00000000002463b8 000000000024613c (XEN) 0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145 (XEN) 00000000002789d8 0000800725eafeb0 0000800725eafeb0 01ff00000935de80 (XEN) 0000800725eca000 0000000000310280 0000000000000000 0000000000000000 (XEN) 000000005a000ea1 ffff800011ae9800 0000000000000124 000000000000001d (XEN) 0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0 (XEN) 000000005a000ea1 000000000027a144 0000000000000000 ffff80004e381f80 (XEN) 0000800725eaffb8 0000000000262c58 0000000000262c4c 07e0000160000249 (XEN) 000000000000000f ffff00002212bd90 ffff80004e381f80 000000000009c041 (XEN) ffff7dffff000000 0000ffffb603d000 0000000000000000 0000ffffb603c000 (XEN) ffff8000521fdc08 0000000000000200 ffff8000517bdb60 0000000000000000 (XEN) 0000000000000000 0000ffffc529614f 000000000000001b 0000000000000001 (XEN) 000000000000000c 0000ffffb5eaf070 000000000000010f 0000000000000002 (XEN) ffff80004e381f80 0000000000007ff0 ffff7e0000000000 0000000000000001 (XEN) ffff000008f86528 ffff8000501b6540 ffff8000501b6550 ffff8000517bdb40 (XEN) ffff800052784380 ffff00002212bd50 ffff000008537ab8 ffffffffffffffff (XEN) ffff0000080c1790 5a000ea1a0000145 0000000060000000 0000000000000000 (XEN) 0000000000000000 ffff800052784380 ffff00002212bd50 0000ffffb5eaf078 (XEN) Xen call trace: (XEN) [<0000000000272a48>] put_page+0xa0/0xc4 (PC) (XEN) [<0000000000272a48>] put_page+0xa0/0xc4 (LR) (XEN) (XEN) hvm_free_ioreq_mfn[417] ---> ref = 1 (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 
(XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]----
(XEN) CPU: 2
(XEN) PC: 0000000000246e2c ioreq.c#hvm_free_ioreq_mfn+0xbc/0xc0
(XEN) LR: 0000000000246dcc
(XEN) SP: 0000800725eafd70
(XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler)
(XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f
(XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000
(XEN) X6: 0080808080808080 X7: fefefefefefeff09 X8: 7f7f7f7f7f7f7f7f
(XEN) X9: 756e6c64513d313b X10: 7f7f7f7f7f7f7f7f X11: 0101010101010101
(XEN) X12: 0000000000000008 X13: 000000000028b7d0 X14: 0000800725eafac8
(XEN) X15: 0000000000000020 X16: 0000000000000000 X17: 0000ffffb5eab3d0
(XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000000810e60e48
(XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1
(XEN) X24: ffff800011ae9800 X25: 0000000000000124 X26: 000000000000001d
(XEN) X27: ffff000008ad1000 X28: ffff800052784380 FP: ffff00002212bd20
(XEN)
(XEN) VTCR_EL2: 80023558
(XEN) VTTBR_EL2: 0002000765f04000
(XEN)
(XEN) SCTLR_EL2: 30cd183d
(XEN) HCR_EL2: 000000008078663f
(XEN) TTBR0_EL2: 00000000781c5000
(XEN)
(XEN) ESR_EL2: f2000001
(XEN) HPFAR_EL2: 0000000000030010
(XEN) FAR_EL2: ffff000008005f00
(XEN)
(XEN) Xen stack trace from sp=0000800725eafd70:
(XEN) 0000800725e68ec0 0000800725e68ec0 0000000000246f7c 0000800725e68ec0
(XEN) 00008004dc040000 00000000002476cc ffffffffffffffea 0000000000000001
(XEN) ffff800011ae9e80 0000000000000002 00000000002462bc 000000000024613c
(XEN) 0000800725eafeb0 0000800725eafeb0 0000800725eaff30 0000000060000145
(XEN) 00000000002789d8 0000800725fb50a4 00000000ffffffff 0100000000277704
(XEN) 00008004dc040000 0000000000000006 0000000000000000 0000000000310288
(XEN) 0000000000000004 0000000125ec0002 0000ffffc52989a8 0000000000000020
(XEN) 0000000000000004 000000000027c984 000000005a000ea1 0000800725eafeb0
(XEN) 000000005a000ea1 000000000027a144 0000000000000000 ffff800011ae9100
(XEN) 0000800725eaffb8 0000000000262c58 0000000000262c4c 07e0000160000249
(XEN) 0000000000000002 0000000000000001 ffff800011ae9e80 ffff800011ae9e88
(XEN) ffff800011ae9108 ffff800052784380 000000005068e148 0000ffffc5498000
(XEN) ffff80005156ac10 0000000000000000 00e800008f88bf53 0400000000000000
(XEN) ffff7e00013e22c0 0000000000000002 0000000000000001 0000000000000001
(XEN) 0000000000000029 0000ffffb5eab3d0 000000000000010f ffff800011ae9110
(XEN) ffff800011ae9100 ffff800011ae9110 0000000000000001 ffff800011ae9e80
(XEN) ffff800011ae9800 0000000000000124 000000000000001d ffff000008ad1000
(XEN) ffff800052784380 ffff00002212bd20 ffff000008537004 ffffffffffffffff
(XEN) ffff0000080c17e4 5a000ea160000145 0000000060000000 0000000000000000
(XEN) 0000000000000000 ffff800052784380 ffff00002212bd20 0000ffffb5eab3dc
(XEN) 0000000000000000 0000000000000000
(XEN) Xen call trace:
(XEN) [<0000000000246e2c>] ioreq.c#hvm_free_ioreq_mfn+0xbc/0xc0 (PC)
(XEN) [<0000000000246dcc>] ioreq.c#hvm_free_ioreq_mfn+0x5c/0xc0 (LR)
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 2:
(XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) PSCI cpu off failed for CPU0 err=-3
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...
On Thu, 13 Aug 2020 at 21:40, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote: > > > Hi > > Sorry for the possible format issue. > > On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote: >> >> >> On 11.08.20 20:50, Julien Grall wrote: >> >> Hi Julien >> >> > >> > >> > On 11/08/2020 18:09, Oleksandr wrote: >> >> >> >> On 05.08.20 12:32, Julien Grall wrote: >> >> >> >> Hi Julien, Stefano >> >> >> >>> >> >>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >> >>>>> index 5fdb6e8..5823f11 100644 >> >>>>> --- a/xen/include/asm-arm/p2m.h >> >>>>> +++ b/xen/include/asm-arm/p2m.h >> >>>>> @@ -385,10 +385,11 @@ static inline int >> >>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn, >> >>>>> mfn_t mfn) >> >>>>> { >> >>>>> /* >> >>>>> - * NOTE: If this is implemented then proper reference >> >>>>> counting of >> >>>>> - * foreign entries will need to be implemented. >> >>>>> + * XXX: handle properly reference. It looks like the page may >> >>>>> not always >> >>>>> + * belong to d. >> >>>> >> >>>> Just as a reference, and without taking away anything from the >> >>>> comment, >> >>>> I think that QEMU is doing its own internal reference counting for >> >>>> these >> >>>> mappings. >> >>> >> >>> I am not sure how this matters here? We can't really trust the DM to >> >>> do the right thing if it is not running in dom0. >> >>> >> >>> But, IIRC, the problem is some of the pages doesn't belong to do a >> >>> domain, so it is not possible to treat them as foreign mapping (e.g. >> >>> you wouldn't be able to grab a reference). This investigation was >> >>> done a couple of years ago, so this may have changed in recent Xen. >> >> >> >> Well, emulator is going to be used in driver domain, so this TODO >> >> must be resolved. I suspect that the check for a hardware domain in >> >> acquire_resource() which I skipped in a hackish way [1] could be >> >> simply removed once proper reference counting is implemented in Xen, >> >> correct? 
>> > >> > It depends how you are going to solve it. If you manage to solve it in >> > a generic way, then yes you could resolve. If not (i.e. it is solved >> > in an arch-specific way), we would need to keep the check on arch that >> > are not able to deal with it. See more below. >> > >> >> >> >> Could you please provide some pointers on that problem? Maybe some >> >> questions need to be investigated again? Unfortunately, it is not >> >> completely clear to me the direction to follow... >> >> >> >> *** >> >> I am wondering whether the similar problem exists on x86 as well? >> > >> > It is somewhat different. On Arm, we are able to handle properly >> > foreign mapping (i.e. mapping page from a another domain) as we would >> > grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in >> > xenmem_add_to_physmap()). The reference will then be released when the >> > entry is removed from the P2M (see p2m_free_entry()). >> > >> > If all the pages given to set_foreign_p2m_entry() belong to a domain, >> > then you could use the same approach. >> > >> > However, I remember to run into some issues in some of the cases. I >> > had a quick looked at the caller and I wasn't able to find any use >> > cases that may be an issue. >> > >> > The refcounting in the IOREQ code has changed after XSA-276 (this was >> > found while working on the Arm port). Probably the best way to figure >> > out if it works would be to try it and see if it fails. >> > >> > Note that set_foreign_p2m_entry() doesn't have a parameter for the >> > foreign domain. You would need to add a extra parameter for this. >> > >> >> The FIXME tag (before checking for a hardware domain in >> >> acquire_resource()) in the common code makes me think it is a common >> >> issue. From other side x86's >> >> implementation of set_foreign_p2m_entry() is exists unlike Arm's one >> >> (which returned -EOPNOTSUPP so far). Or these are unrelated? 
>> > >> > At the moment, x86 doesn't support refcounting for foreign mapping. >> > Hence the reason to restrict them to the hardware domain. >> >> >> Thank you for the pointers! >> >> >> I checked that all pages given to set_foreign_p2m_entry() belonged to a >> domain (at least in my use-case). I noticed two calls for acquiring >> resource at the DomU creation time, the first call was for grant table >> (single gfn) >> and the second for ioreq server which carried 2 gfns (for shared and >> buffered rings I assume). For the test purpose, I passed these gfns to >> get_page_from_gfn() in order to grab references on the pages, after that >> I tried to destroy DomU without calling put_page() for these pages. The >> fact that I couldn't destroy DomU completely (a zombie domain was >> observed) made me think that references were still taken, so worked as >> expected. >> >> >> I implemented a test patch (which uses approach from >> xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check >> whether it would work. >> >> >> --- >> xen/arch/arm/p2m.c | 30 ++++++++++++++++++++++++++++++ >> xen/common/memory.c | 2 +- >> xen/include/asm-arm/p2m.h | 12 ++---------- >> 3 files changed, 33 insertions(+), 11 deletions(-) >> >> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c >> index e9ccba8..7359715 100644 >> --- a/xen/arch/arm/p2m.c >> +++ b/xen/arch/arm/p2m.c >> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d, >> gfn_t gfn, mfn_t mfn, >> return p2m_remove_mapping(d, gfn, (1 << page_order), mfn); >> } >> >> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd, >> + unsigned long gfn, mfn_t mfn) >> +{ >> + struct page_info *page; >> + p2m_type_t p2mt; >> + int rc; >> + >> + /* >> + * Take reference to the foreign domain page. Reference will be >> released >> + * in p2m_put_l3_page(). 
>> + */ >> + page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC); >> + if ( !page ) >> + return -EINVAL; >> + >> + if ( p2m_is_ram(p2mt) ) >> + p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : >> p2m_map_foreign_ro; >> + else >> + { >> + put_page(page); >> + return -EINVAL; >> + } >> + >> + rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt); >> + if ( rc ) >> + put_page(page); >> + >> + return 0; >> +} >> + >> static struct page_info *p2m_allocate_root(void) >> { >> struct page_info *page; >> diff --git a/xen/common/memory.c b/xen/common/memory.c >> index 8d9f0a8..1de1d4f 100644 >> --- a/xen/common/memory.c >> +++ b/xen/common/memory.c >> @@ -1171,7 +1171,7 @@ static int acquire_resource( >> >> for ( i = 0; !rc && i < xmar.nr_frames; i++ ) >> { >> - rc = set_foreign_p2m_entry(currd, gfn_list[i], >> + rc = set_foreign_p2m_entry(currd, d, gfn_list[i], >> _mfn(mfn_list[i])); >> /* rc should be -EIO for any iteration other than the first */ >> if ( rc && i ) >> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >> index 5823f11..53ce373 100644 >> --- a/xen/include/asm-arm/p2m.h >> +++ b/xen/include/asm-arm/p2m.h >> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, >> unsigned int order) >> return gfn_add(gfn, 1UL << order); >> } >> >> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long >> gfn, >> - mfn_t mfn) >> -{ >> - /* >> - * XXX: handle properly reference. It looks like the page may not >> always >> - * belong to d. >> - */ >> - >> - return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw); >> -} >> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd, >> + unsigned long gfn, mfn_t mfn); >> >> /* >> * A vCPU has cache enabled only when the MMU is enabled and data cache >> -- >> 2.7.4 >> >> >> And with that patch applied I was facing a BUG when destroying/rebooting >> DomU. 
The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered >> that BUG: >> >> >> Rebooting domain 2 >> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at >> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 >> (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]---- >> (XEN) CPU: 3 >> (XEN) PC: 0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c >> (XEN) LR: 0000000000246ef0 >> (XEN) SP: 0000800725eafd80 >> (XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler) >> (XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f >> (XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000 >> (XEN) X6: 0000800725eafe24 X7: 0000ffffd1ef3e08 X8: 0000000000000020 >> (XEN) X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000 >> (XEN) X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001 >> (XEN) X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0 >> (XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0 >> (XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1 >> (XEN) X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d >> (XEN) X27: ffff000008ad1000 X28: ffff800052e65100 FP: ffff0000223dbd20 >> (XEN) >> (XEN) VTCR_EL2: 80023558 >> (XEN) VTTBR_EL2: 0002000765f04000 >> (XEN) >> (XEN) SCTLR_EL2: 30cd183d >> (XEN) HCR_EL2: 000000008078663f >> (XEN) TTBR0_EL2: 00000000781c5000 >> (XEN) >> (XEN) ESR_EL2: f2000001 >> (XEN) HPFAR_EL2: 0000000000030010 >> (XEN) FAR_EL2: ffff000008005f00 >> (XEN) >> (XEN) Xen stack trace from sp=0000800725eafd80: >> (XEN) 0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8 >> (XEN) ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002 >> (XEN) 000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0 >> (XEN) 0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0 >> (XEN) 0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006 >> (XEN) ffff800000000000 0000000000000002 
000000005a000ea1 000000019bc60002 >> (XEN) 0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8 >> (XEN) 000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98 >> (XEN) 0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58 >> (XEN) 0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001 >> (XEN) ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100 >> (XEN) 000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000 >> (XEN) 00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002 >> (XEN) 0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0 >> (XEN) 000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210 >> (XEN) 0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124 >> (XEN) 000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20 >> (XEN) ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145 >> (XEN) 0000000060000000 0000000000000000 0000000000000000 ffff800052e65100 >> (XEN) ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000 >> (XEN) Xen call trace: >> (XEN) [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC) >> (XEN) [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR) >> (XEN) >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 3: >> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 >> (XEN) **************************************** >> (XEN) >> (XEN) Reboot in five seconds... >> (XEN) >> (XEN) **************************************** >> (XEN) Panic on CPU 0: >> (XEN) PSCI cpu off failed for CPU0 err=-3 >> (XEN) **************************************** >> (XEN) >> (XEN) Reboot in five seconds... >> >> >> >> Either I did something wrong (most likely) or there is an issue with >> page ref-counting in the IOREQ code. I am still trying to understand >> what is going on. >> Some notes on that: >> 1. 
I checked that put_page() was called for these pages in
>> p2m_put_l3_page() when destroying domain. This happened before
>> hvm_free_ioreq_mfn() execution.
>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of
>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't
>> be fully destroyed because of the reference taken.
>
>
> I think I understand why the BUG is triggered.
>
> I checked "page->count_info & PGC_count_mask" and noticed that
> get_page_from_gfn() doesn't seem to increase the ref counter (but it should?)
>
> 1. hvm_alloc_ioreq_mfn() -> ref 2
> 2. set_foreign_p2m_entry() -> ref still 2
> 3. p2m_put_l3_page() -> ref 1
> 4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which triggers BUG

I looked again at your diff. It is actually not doing the right thing.
The parameter 'gfn' is a physical frame from 'd' (your current domain),
not 'fd', so you will end up grabbing a reference count on the wrong page.
You are quite lucky that 'gfn' also happens to be valid in your foreign
domain.

But in this case, you already have the MFN in hand, so what you want
to do is something like:

if (!get_page(mfn_to_page(mfn), fd))
    return -EINVAL;

/* Map page */
On 14.08.20 01:14, Julien Grall wrote: Hi Julien > On Thu, 13 Aug 2020 at 21:40, Oleksandr Tyshchenko <olekstysh@gmail.com> wrote: >> >> Hi >> >> Sorry for the possible format issue. >> >> On Thu, Aug 13, 2020 at 9:42 PM Oleksandr <olekstysh@gmail.com> wrote: >>> >>> On 11.08.20 20:50, Julien Grall wrote: >>> >>> Hi Julien >>> >>>> >>>> On 11/08/2020 18:09, Oleksandr wrote: >>>>> On 05.08.20 12:32, Julien Grall wrote: >>>>> >>>>> Hi Julien, Stefano >>>>> >>>>>>>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >>>>>>>> index 5fdb6e8..5823f11 100644 >>>>>>>> --- a/xen/include/asm-arm/p2m.h >>>>>>>> +++ b/xen/include/asm-arm/p2m.h >>>>>>>> @@ -385,10 +385,11 @@ static inline int >>>>>>>> set_foreign_p2m_entry(struct domain *d, unsigned long gfn, >>>>>>>> mfn_t mfn) >>>>>>>> { >>>>>>>> /* >>>>>>>> - * NOTE: If this is implemented then proper reference >>>>>>>> counting of >>>>>>>> - * foreign entries will need to be implemented. >>>>>>>> + * XXX: handle properly reference. It looks like the page may >>>>>>>> not always >>>>>>>> + * belong to d. >>>>>>> Just as a reference, and without taking away anything from the >>>>>>> comment, >>>>>>> I think that QEMU is doing its own internal reference counting for >>>>>>> these >>>>>>> mappings. >>>>>> I am not sure how this matters here? We can't really trust the DM to >>>>>> do the right thing if it is not running in dom0. >>>>>> >>>>>> But, IIRC, the problem is some of the pages doesn't belong to do a >>>>>> domain, so it is not possible to treat them as foreign mapping (e.g. >>>>>> you wouldn't be able to grab a reference). This investigation was >>>>>> done a couple of years ago, so this may have changed in recent Xen. >>>>> Well, emulator is going to be used in driver domain, so this TODO >>>>> must be resolved. 
I suspect that the check for a hardware domain in >>>>> acquire_resource() which I skipped in a hackish way [1] could be >>>>> simply removed once proper reference counting is implemented in Xen, >>>>> correct? >>>> It depends how you are going to solve it. If you manage to solve it in >>>> a generic way, then yes you could resolve. If not (i.e. it is solved >>>> in an arch-specific way), we would need to keep the check on arch that >>>> are not able to deal with it. See more below. >>>> >>>>> Could you please provide some pointers on that problem? Maybe some >>>>> questions need to be investigated again? Unfortunately, it is not >>>>> completely clear to me the direction to follow... >>>>> >>>>> *** >>>>> I am wondering whether the similar problem exists on x86 as well? >>>> It is somewhat different. On Arm, we are able to handle properly >>>> foreign mapping (i.e. mapping page from a another domain) as we would >>>> grab a reference on the page (see XENMAPSPACE_gmfn_foreign handling in >>>> xenmem_add_to_physmap()). The reference will then be released when the >>>> entry is removed from the P2M (see p2m_free_entry()). >>>> >>>> If all the pages given to set_foreign_p2m_entry() belong to a domain, >>>> then you could use the same approach. >>>> >>>> However, I remember to run into some issues in some of the cases. I >>>> had a quick looked at the caller and I wasn't able to find any use >>>> cases that may be an issue. >>>> >>>> The refcounting in the IOREQ code has changed after XSA-276 (this was >>>> found while working on the Arm port). Probably the best way to figure >>>> out if it works would be to try it and see if it fails. >>>> >>>> Note that set_foreign_p2m_entry() doesn't have a parameter for the >>>> foreign domain. You would need to add a extra parameter for this. >>>> >>>>> The FIXME tag (before checking for a hardware domain in >>>>> acquire_resource()) in the common code makes me think it is a common >>>>> issue. 
From other side x86's >>>>> implementation of set_foreign_p2m_entry() is exists unlike Arm's one >>>>> (which returned -EOPNOTSUPP so far). Or these are unrelated? >>>> At the moment, x86 doesn't support refcounting for foreign mapping. >>>> Hence the reason to restrict them to the hardware domain. >>> >>> Thank you for the pointers! >>> >>> >>> I checked that all pages given to set_foreign_p2m_entry() belonged to a >>> domain (at least in my use-case). I noticed two calls for acquiring >>> resource at the DomU creation time, the first call was for grant table >>> (single gfn) >>> and the second for ioreq server which carried 2 gfns (for shared and >>> buffered rings I assume). For the test purpose, I passed these gfns to >>> get_page_from_gfn() in order to grab references on the pages, after that >>> I tried to destroy DomU without calling put_page() for these pages. The >>> fact that I couldn't destroy DomU completely (a zombie domain was >>> observed) made me think that references were still taken, so worked as >>> expected. >>> >>> >>> I implemented a test patch (which uses approach from >>> xenmem_add_to_physmap_one() for XENMAPSPACE_gmfn_foreign case) to check >>> whether it would work. >>> >>> >>> --- >>> xen/arch/arm/p2m.c | 30 ++++++++++++++++++++++++++++++ >>> xen/common/memory.c | 2 +- >>> xen/include/asm-arm/p2m.h | 12 ++---------- >>> 3 files changed, 33 insertions(+), 11 deletions(-) >>> >>> diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c >>> index e9ccba8..7359715 100644 >>> --- a/xen/arch/arm/p2m.c >>> +++ b/xen/arch/arm/p2m.c >>> @@ -1385,6 +1385,36 @@ int guest_physmap_remove_page(struct domain *d, >>> gfn_t gfn, mfn_t mfn, >>> return p2m_remove_mapping(d, gfn, (1 << page_order), mfn); >>> } >>> >>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd, >>> + unsigned long gfn, mfn_t mfn) >>> +{ >>> + struct page_info *page; >>> + p2m_type_t p2mt; >>> + int rc; >>> + >>> + /* >>> + * Take reference to the foreign domain page. 
Reference will be >>> released >>> + * in p2m_put_l3_page(). >>> + */ >>> + page = get_page_from_gfn(fd, gfn, &p2mt, P2M_ALLOC); >>> + if ( !page ) >>> + return -EINVAL; >>> + >>> + if ( p2m_is_ram(p2mt) ) >>> + p2mt = (p2mt == p2m_ram_rw) ? p2m_map_foreign_rw : >>> p2m_map_foreign_ro; >>> + else >>> + { >>> + put_page(page); >>> + return -EINVAL; >>> + } >>> + >>> + rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2mt); >>> + if ( rc ) >>> + put_page(page); >>> + >>> + return 0; >>> +} >>> + >>> static struct page_info *p2m_allocate_root(void) >>> { >>> struct page_info *page; >>> diff --git a/xen/common/memory.c b/xen/common/memory.c >>> index 8d9f0a8..1de1d4f 100644 >>> --- a/xen/common/memory.c >>> +++ b/xen/common/memory.c >>> @@ -1171,7 +1171,7 @@ static int acquire_resource( >>> >>> for ( i = 0; !rc && i < xmar.nr_frames; i++ ) >>> { >>> - rc = set_foreign_p2m_entry(currd, gfn_list[i], >>> + rc = set_foreign_p2m_entry(currd, d, gfn_list[i], >>> _mfn(mfn_list[i])); >>> /* rc should be -EIO for any iteration other than the first */ >>> if ( rc && i ) >>> diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h >>> index 5823f11..53ce373 100644 >>> --- a/xen/include/asm-arm/p2m.h >>> +++ b/xen/include/asm-arm/p2m.h >>> @@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, >>> unsigned int order) >>> return gfn_add(gfn, 1UL << order); >>> } >>> >>> -static inline int set_foreign_p2m_entry(struct domain *d, unsigned long >>> gfn, >>> - mfn_t mfn) >>> -{ >>> - /* >>> - * XXX: handle properly reference. It looks like the page may not >>> always >>> - * belong to d. 
>>> - */ >>> - >>> - return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw); >>> -} >>> +int set_foreign_p2m_entry(struct domain *d, struct domain *fd, >>> + unsigned long gfn, mfn_t mfn); >>> >>> /* >>> * A vCPU has cache enabled only when the MMU is enabled and data cache >>> -- >>> 2.7.4 >>> >>> >>> And with that patch applied I was facing a BUG when destroying/rebooting >>> DomU. The call of put_page_alloc_ref() in hvm_free_ioreq_mfn() triggered >>> that BUG: >>> >>> >>> Rebooting domain 2 >>> root@generic-armv8-xt-dom0:~# (XEN) Xen BUG at >>> ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 >>> (XEN) ----[ Xen-4.14.0 arm64 debug=y Not tainted ]---- >>> (XEN) CPU: 3 >>> (XEN) PC: 0000000000246f28 ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c >>> (XEN) LR: 0000000000246ef0 >>> (XEN) SP: 0000800725eafd80 >>> (XEN) CPSR: 60000249 MODE:64-bit EL2h (Hypervisor, handler) >>> (XEN) X0: 0000000000000001 X1: 403fffffffffffff X2: 000000000000001f >>> (XEN) X3: 0000000080000000 X4: 0000000000000000 X5: 0000000000400000 >>> (XEN) X6: 0000800725eafe24 X7: 0000ffffd1ef3e08 X8: 0000000000000020 >>> (XEN) X9: 0000000000000000 X10: 00e800008ecebf53 X11: 0400000000000000 >>> (XEN) X12: ffff7e00013b3ac0 X13: 0000000000000002 X14: 0000000000000001 >>> (XEN) X15: 0000000000000001 X16: 0000000000000029 X17: 0000ffff9badb3d0 >>> (XEN) X18: 000000000000010f X19: 0000000810e60e38 X20: 0000800725e68ec0 >>> (XEN) X21: 0000000000000000 X22: 00008004dc0404a0 X23: 000000005a000ea1 >>> (XEN) X24: ffff8000460ec280 X25: 0000000000000124 X26: 000000000000001d >>> (XEN) X27: ffff000008ad1000 X28: ffff800052e65100 FP: ffff0000223dbd20 >>> (XEN) >>> (XEN) VTCR_EL2: 80023558 >>> (XEN) VTTBR_EL2: 0002000765f04000 >>> (XEN) >>> (XEN) SCTLR_EL2: 30cd183d >>> (XEN) HCR_EL2: 000000008078663f >>> (XEN) TTBR0_EL2: 00000000781c5000 >>> (XEN) >>> (XEN) ESR_EL2: f2000001 >>> (XEN) HPFAR_EL2: 0000000000030010 >>> (XEN) FAR_EL2: ffff000008005f00 >>> (XEN) >>> (XEN) Xen stack trace from 
sp=0000800725eafd80: >>> (XEN) 0000800725e68ec0 0000000000247078 00008004dc040000 00000000002477c8 >>> (XEN) ffffffffffffffea 0000000000000001 ffff8000460ec500 0000000000000002 >>> (XEN) 000000000024645c 00000000002462dc 0000800725eafeb0 0000800725eafeb0 >>> (XEN) 0000800725eaff30 0000000060000145 000000000027882c 0000800725eafeb0 >>> (XEN) 0000800725eafeb0 01ff00000935de80 00008004dc040000 0000000000000006 >>> (XEN) ffff800000000000 0000000000000002 000000005a000ea1 000000019bc60002 >>> (XEN) 0000ffffd1ef3e08 0000000000000020 0000000000000004 000000000027c7d8 >>> (XEN) 000000005a000ea1 0000800725eafeb0 000000005a000ea1 0000000000279f98 >>> (XEN) 0000000000000000 ffff8000460ec200 0000800725eaffb8 0000000000262c58 >>> (XEN) 0000000000262c4c 07e0000160000249 0000000000000002 0000000000000001 >>> (XEN) ffff8000460ec500 ffff8000460ec508 ffff8000460ec208 ffff800052e65100 >>> (XEN) 000000005060b478 0000ffffd20f3000 ffff7e00013c77e0 0000000000000000 >>> (XEN) 00e800008ecebf53 0400000000000000 ffff7e00013b3ac0 0000000000000002 >>> (XEN) 0000000000000001 0000000000000001 0000000000000029 0000ffff9badb3d0 >>> (XEN) 000000000000010f ffff8000460ec210 ffff8000460ec200 ffff8000460ec210 >>> (XEN) 0000000000000001 ffff8000460ec500 ffff8000460ec280 0000000000000124 >>> (XEN) 000000000000001d ffff000008ad1000 ffff800052e65100 ffff0000223dbd20 >>> (XEN) ffff000008537004 ffffffffffffffff ffff0000080c17e4 5a000ea160000145 >>> (XEN) 0000000060000000 0000000000000000 0000000000000000 ffff800052e65100 >>> (XEN) ffff0000223dbd20 0000ffff9badb3dc 0000000000000000 0000000000000000 >>> (XEN) Xen call trace: >>> (XEN) [<0000000000246f28>] ioreq.c#hvm_free_ioreq_mfn+0x68/0x6c (PC) >>> (XEN) [<0000000000246ef0>] ioreq.c#hvm_free_ioreq_mfn+0x30/0x6c (LR) >>> (XEN) >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 3: >>> (XEN) Xen BUG at ...tAUTOINC+bb71237a55-r0/git/xen/include/xen/mm.h:683 >>> (XEN) **************************************** >>> (XEN) >>> (XEN) 
Reboot in five seconds... >>> (XEN) >>> (XEN) **************************************** >>> (XEN) Panic on CPU 0: >>> (XEN) PSCI cpu off failed for CPU0 err=-3 >>> (XEN) **************************************** >>> (XEN) >>> (XEN) Reboot in five seconds... >>> >>> >>> >>> Either I did something wrong (most likely) or there is an issue with >>> page ref-counting in the IOREQ code. I am still trying to understand >>> what is going on. >>> Some notes on that: >>> 1. I checked that put_page() was called for these pages in >>> p2m_put_l3_page() when destroying domain. This happened before >>> hvm_free_ioreq_mfn() execution. >>> 2. There was no BUG detected if I passed "p2m_ram_rw" instead of >>> "p2m_map_foreign_rw" in guest_physmap_add_entry(), but the DomU couldn't >>> be fully destroyed because of the reference taken. >> >> I think I understand why BUG is triggered. >> >> I checked "page->count_info & PGC_count_mask" and noticed that get_page_from_gfn() doesn't seem to increase ref counter (but it should?) >> >> 1. hvm_alloc_ioreq_mfn() -> ref 2 >> 2. set_foreign_p2m_entry() -> ref still 2 >> 3. p2m_put_l3_page() -> ref 1 >> 4. hvm_free_ioreq_mfn() calls put_page_alloc_ref() with ref 1 which triggers BUG > I looked again at your diff. It is actually not doing the right thing. > The parameter 'gfn' is a physical frame from 'd' (your current domain) > not 'fd'. > So you will end up grabbing a reference count on the wrong page. You > are quite lucky the 'gfn' is also valid for your foreign domain. > > But in this case, you already have the MFN in hand. So what you want > to do is something like: > > if (!get_page(mfn_to_page(mfn), fd)) > return -EINVAL; > > /* Map page */ Indeed, thank you for the pointer. Now it is clear for me what went wrong. With the proposed change I didn't face any issues in my setup! BTW, below the IOREQ server page life-cycle which looks correct. 
create domain:

(XEN) 0000000810e60e38(0->1): hvm_alloc_ioreq_mfn() -> alloc_domheap_page()
(XEN) 0000000810e60e38(1->2): hvm_alloc_ioreq_mfn() -> get_page_and_type()
(XEN) 0000000810e60e38(2->3): acquire_resource() -> set_foreign_p2m_entry() -> get_page()

reboot domain:

(XEN) 0000000810e60e38(3->4): do_memory_op() -> get_page_from_gfn()
(XEN) 0000000810e60e38(4->3): do_memory_op() -> guest_physmap_remove_page() -> p2m_put_l3_page() -> put_page()
(XEN) 0000000810e60e38(3->2): do_memory_op() -> put_page()
(XEN) 0000000810e60e38(2->1): hvm_free_ioreq_mfn() -> put_page_alloc_ref()
(XEN) 0000000810e60e38(1->0): hvm_free_ioreq_mfn() -> put_page_and_type()

---
 xen/arch/arm/p2m.c        | 16 ++++++++++++++++
 xen/common/memory.c       |  2 +-
 xen/include/asm-arm/p2m.h | 12 ++----------
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index e9ccba8..4c99dd6 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1385,6 +1385,22 @@ int guest_physmap_remove_page(struct domain *d, gfn_t gfn, mfn_t mfn,
     return p2m_remove_mapping(d, gfn, (1 << page_order), mfn);
 }
 
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn)
+{
+    struct page_info *page = mfn_to_page(mfn);
+    int rc;
+
+    if ( !get_page(page, fd) )
+        return -EINVAL;
+
+    rc = guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_map_foreign_rw);
+    if ( rc )
+        put_page(page);
+
+    return rc;
+}
+
 static struct page_info *p2m_allocate_root(void)
 {
     struct page_info *page;
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 8d9f0a8..1de1d4f 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1171,7 +1171,7 @@ static int acquire_resource(
 
     for ( i = 0; !rc && i < xmar.nr_frames; i++ )
     {
-        rc = set_foreign_p2m_entry(currd, gfn_list[i],
+        rc = set_foreign_p2m_entry(currd, d, gfn_list[i],
                                    _mfn(mfn_list[i]));
         /* rc should be -EIO for any iteration other than the first */
         if ( rc && i )
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5823f11..53ce373 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -381,16 +381,8 @@ static inline gfn_t gfn_next_boundary(gfn_t gfn, unsigned int order)
     return gfn_add(gfn, 1UL << order);
 }
 
-static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
-                                        mfn_t mfn)
-{
-    /*
-     * XXX: handle properly reference. It looks like the page may not always
-     * belong to d.
-     */
-
-    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
-}
+int set_foreign_p2m_entry(struct domain *d, struct domain *fd,
+                          unsigned long gfn, mfn_t mfn);
 
 /*
  * A vCPU has cache enabled only when the MMU is enabled and data cache
Hi,

On 04/08/2020 15:01, Julien Grall wrote:
> On 04/08/2020 08:49, Paul Durrant wrote:
>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
>>> index 931404c..b5fc066 100644
>>> --- a/tools/libxc/xc_dom_arm.c
>>> +++ b/tools/libxc/xc_dom_arm.c
>>> @@ -26,11 +26,19 @@
>>> #include "xg_private.h"
>>> #include "xc_dom.h"
>>>
>>> -#define NR_MAGIC_PAGES 4
>>> +
>>> #define CONSOLE_PFN_OFFSET 0
>>> #define XENSTORE_PFN_OFFSET 1
>>> #define MEMACCESS_PFN_OFFSET 2
>>> #define VUART_PFN_OFFSET 3
>>> +#define IOREQ_SERVER_PFN_OFFSET 4
>>> +
>>> +#define NR_IOREQ_SERVER_PAGES 8
>>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES)
>>> +
>>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT)
>>> +
>>> +#define special_pfn(x) (GUEST_MAGIC_BASE_PFN + (x))
>>
>> Why introduce 'magic pages' for Arm? It's quite a horrible hack that
>> we have begun to do away with by adding resource mapping.
>
> This would require us to mandate at least Linux 4.17 in a domain that
> will run an IOREQ server. If we don't mandate this, the minimum version
> would be 4.10 where DM OP was introduced.
>
> Because of XSA-300, we could technically not safely run an IOREQ server
> with existing Linux. So it is probably OK to enforce the use of the
> acquire interface.

One more thing. We are using atomic operations on the IOREQ pages. As
our implementation is based on LL/SC instructions so far, we have a
mitigation in place to prevent a domain from DoSing Xen. However, this
relies on the page being mapped in a single domain at a time.

AFAICT, with the legacy interface, the pages will be mapped in both the
target and the emulator. So this would defeat the mitigation we have in
place.

Because the legacy interface relies on foreign mapping, the page has to
be mapped in the target P2M. It might be possible to restrict the access
for the target by setting the p2m r and w bits to 0. This would still
allow the foreign mapping to work, as we only check the p2m type during
mapping.

Anyway, I think we agreed that we want to avoid introducing the legacy
interface. But I wanted to answer just for completeness and keep a
record of the potential pitfalls of the legacy interface on Arm.

> Note that I haven't yet looked at the rest of the series. So I am not
> sure if there is more work necessary to enable it.
>
> Cheers,
On 15.08.20 20:56, Julien Grall wrote: Hi Julien. > Hi, > > On 04/08/2020 15:01, Julien Grall wrote: >> On 04/08/2020 08:49, Paul Durrant wrote: >>>> diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c >>>> index 931404c..b5fc066 100644 >>>> --- a/tools/libxc/xc_dom_arm.c >>>> +++ b/tools/libxc/xc_dom_arm.c >>>> @@ -26,11 +26,19 @@ >>>> #include "xg_private.h" >>>> #include "xc_dom.h" >>>> >>>> -#define NR_MAGIC_PAGES 4 >>>> + >>>> #define CONSOLE_PFN_OFFSET 0 >>>> #define XENSTORE_PFN_OFFSET 1 >>>> #define MEMACCESS_PFN_OFFSET 2 >>>> #define VUART_PFN_OFFSET 3 >>>> +#define IOREQ_SERVER_PFN_OFFSET 4 >>>> + >>>> +#define NR_IOREQ_SERVER_PAGES 8 >>>> +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES) >>>> + >>>> +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) >>>> + >>>> +#define special_pfn(x) (GUEST_MAGIC_BASE_PFN + (x)) >>> >>> Why introduce 'magic pages' for Arm? It's quite a horrible hack that >>> we have begun to do away with by adding resource mapping. >> >> This would require us to mandate at least Linux 4.17 in a domain that >> will run an IOREQ server. If we don't mandate this, the minimum >> version would be 4.10 where DM OP was introduced. >> >> Because of XSA-300, we could technically not safely run an IOREQ >> server with existing Linux. So it is probably OK to enforce the use >> of the acquire interface. > One more thing. We are using atomic operations on the IOREQ pages. As > our implementation is based on LL/SC instructions so far, we have > mitigation in place to prevent a domain DoS Xen. However, this relies > on the page to be mapped in a single domain at the time. > > AFAICT, with the legacy interface, the pages will be mapped in both > the target and the emulator. So this would defeat the mitigation we > have in place. > > Because the legacy interface is relying on foreign mapping, the page > has to be mapped in the target P2M. 
> It might be possible to restrict the access for the target by setting
> the p2m bits r, w to 0. This would still allow the foreign mapping to
> work as we only check the p2m type during mapping.
>
> Anyway, I think we agreed that we want to avoid to introduce the
> legacy interface. But I wanted to answer just for completeness and
> keep a record of potential pitfalls with the legacy interface on Arm.

OK, the HVMOP plumbing on Arm will be dropped for the non-RFC series. It
seems that xenforeignmemory_map_resource() does what is needed. Of
course, the corresponding Linux patch to support
IOCTL_PRIVCMD_MMAP_RESOURCE was cherry-picked for that purpose (I am
currently using v4.14).

Thank you.
Hello all.

I would like to clarify some questions based on the comments for the
patch series. I put them together (please see below).

On 06.08.20 14:29, Jan Beulich wrote:
> On 06.08.2020 13:08, Julien Grall wrote:
>> On 05/08/2020 20:30, Oleksandr wrote:
>>> I was thinking how to split handle_hvm_io_completion()
>>> gracefully but I failed find a good solution for that, so decided to add
>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm.
>>> I could add a comment describing why they are here if appropriate. But
>>> if you think they shouldn't be called from the common code in any way, I
>>> will try to split it.
>> I am not entirely sure what msix_write_completion is meant to do on x86.
>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help?
> Due to the split brain model of handling PCI pass-through (between
> Xen and qemu), a guest writing to an MSI-X entry needs this write
> handed to qemu, and upon completion of the write there Xen also
> needs to take some extra action.

1. Regarding the common handle_hvm_io_completion() implementation:

Could msix_write_completion() be called later on, so we would be able to
split handle_hvm_io_completion() gracefully, or could we call it from
handle_mmio()?
The reason I am asking is to avoid calling it from the common code, in
order to avoid introducing a stub on Arm which is never going to be
implemented (if msix_write_completion() is purely x86 material).
For the non-RFC patch series I moved handle_realmode_completion to the
x86 code, and now my local implementation looks like:

bool handle_hvm_io_completion(struct vcpu *v)
{
    struct domain *d = v->domain;
    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
    struct hvm_ioreq_server *s;
    struct hvm_ioreq_vcpu *sv;
    enum hvm_io_completion io_completion;

    if ( has_vpci(d) && vpci_process_pending(v) )
    {
        raise_softirq(SCHEDULE_SOFTIRQ);
        return false;
    }

    sv = get_pending_vcpu(v, &s);
    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
        return false;

    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;

    msix_write_completion(v);
    vcpu_end_shutdown_deferral(v);

    io_completion = vio->io_completion;
    vio->io_completion = HVMIO_no_completion;

    switch ( io_completion )
    {
    case HVMIO_no_completion:
        break;

    case HVMIO_mmio_completion:
        return handle_mmio();

    case HVMIO_pio_completion:
        return handle_pio(vio->io_req.addr, vio->io_req.size,
                          vio->io_req.dir);

    default:
        return arch_handle_hvm_io_completion(io_completion);
    }

    return true;
}

2. Regarding renaming the common handle_mmio() to
ioreq_handle_complete_mmio():

There was a request to consider renaming that function, which is called
from the common code, in the context of the IOREQ series.
The point is that the name of the function is pretty generic and can be
confusing on Arm (we already have a try_handle_mmio()).
I noticed that, apart from the common code, that function is called from
a few places on x86 (I am not even sure whether all of them are IOREQ
related).
The question is: would x86 folks be happy with such renaming?

Alternatively, I could provide the following in
include/asm-arm/hvm/ioreq.h without renaming it in the common code and
still use a non-confusing variant on Arm (however I am not sure whether
this is a good idea):

#define handle_mmio ioreq_handle_complete_mmio

3. Regarding the common IOREQ/DM stuff location:

Currently it is located at:
common/hvm/...
include/xen/hvm/...

For the non-RFC patch series I am going to avoid using the "hvm" name
(which is an internal detail of arch-specific code and shouldn't be
exposed to the common code).
The question is whether I should use another directory name (probably
ioreq?) or just place the files in the common root directory?

Could you please share your opinion?
On 20.08.2020 20:30, Oleksandr wrote: > On 06.08.20 14:29, Jan Beulich wrote: >> On 06.08.2020 13:08, Julien Grall wrote: >>> On 05/08/2020 20:30, Oleksandr wrote: >>>> I was thinking how to split handle_hvm_io_completion() >>>> gracefully but I failed find a good solution for that, so decided to add >>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm. >>>> I could add a comment describing why they are here if appropriate. But >>>> if you think they shouldn't be called from the common code in any way, I >>>> will try to split it. >>> I am not entirely sure what msix_write_completion is meant to do on x86. >>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help? >> Due to the split brain model of handling PCI pass-through (between >> Xen and qemu), a guest writing to an MSI-X entry needs this write >> handed to qemu, and upon completion of the write there Xen also >> needs to take some extra action. > > > 1. Regarding common handle_hvm_io_completion() implementation: > > Could msix_write_completion() be called later on so we would be able to > split handle_hvm_io_completion() gracefully or could we call it from > handle_mmio()? > The reason I am asking is to avoid calling it from the common code in > order to avoid introducing stub on Arm which is not going to be ever > implemented > (if msix_write_completion() is purely x86 material). I'm unconvinced of this last fact, but as with about everything it is quite certainly possible to call the function later. The question is how ugly this would become, as this may involve redundant conditionals (i.e. ones which need to remain in sync) and/or extra propagation of state. 
> For the non-RFC patch series I moved handle_realmode_completion to the > x86 code and now my local implementation looks like: > > bool handle_hvm_io_completion(struct vcpu *v) > { > struct domain *d = v->domain; > struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; > struct hvm_ioreq_server *s; > struct hvm_ioreq_vcpu *sv; > enum hvm_io_completion io_completion; > > if ( has_vpci(d) && vpci_process_pending(v) ) > { > raise_softirq(SCHEDULE_SOFTIRQ); > return false; > } > > sv = get_pending_vcpu(v, &s); > if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) ) > return false; > > vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ? > STATE_IORESP_READY : STATE_IOREQ_NONE; > > msix_write_completion(v); > vcpu_end_shutdown_deferral(v); > > io_completion = vio->io_completion; > vio->io_completion = HVMIO_no_completion; > > switch ( io_completion ) > { > case HVMIO_no_completion: > break; > > case HVMIO_mmio_completion: > return handle_mmio(); > > case HVMIO_pio_completion: > return handle_pio(vio->io_req.addr, vio->io_req.size, > vio->io_req.dir); > > default: > return arch_handle_hvm_io_completion(io_completion); > } > > return true; > } > > 2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio(): > > There was a request to consider renaming that function which is called > from the common code in the context of IOREQ series. > The point is, that the name of the function is pretty generic and can be > confusing on Arm (we already have a try_handle_mmio()). > I noticed that except common code that function is called from a few > places on x86 (I am not even sure whether all of them are IOREQ related). > The question is would x86 folks be happy with such renaming? handle_mmio() without any parameters and used for a varying set of purposes was imo never a good choice of name. The situation has improved, but can do with further improvement. The new name, if to be used for truly renaming the function need to fit all uses though. 
As such, I don't think ioreq_handle_complete_mmio() is an appropriate name. > Alternatively I could provide the following in > include/asm-arm/hvm/ioreq.h without renaming it in the common code and > still using non-confusing variant on Arm (however I am not sure whether > this is a good idea): > > #define handle_mmio ioreq_handle_complete_mmio If anything, for x86 it ought to be the other way around, at which point you wouldn't need any alias #define on Arm. > 3. Regarding common IOREQ/DM stuff location: > > Currently it is located at: > common/hvm/... > include/xen/hvm/... > > For the non-RFC patch series I am going to avoid using "hvm" name (which > is internal detail of arch specific code and shouldn't be exposed to the > common code). > The question is whether I should use another directory name (probably > ioreq?) or just place them in common root directory? I think there are arguments for and against hvm/. I'm not of the opinion that ioreq/ would be a good name, so if hvm/ was to be ruled out, I think the file(s) shouldn't go into separate subdirs at all. Jan
On 21.08.20 09:16, Jan Beulich wrote: Hi Jan. Thank you for your answer. > On 20.08.2020 20:30, Oleksandr wrote: >> On 06.08.20 14:29, Jan Beulich wrote: >>> On 06.08.2020 13:08, Julien Grall wrote: >>>> On 05/08/2020 20:30, Oleksandr wrote: >>>>> I was thinking how to split handle_hvm_io_completion() >>>>> gracefully but I failed find a good solution for that, so decided to add >>>>> two stubs (msix_write_completion and handle_realmode_completion) on Arm. >>>>> I could add a comment describing why they are here if appropriate. But >>>>> if you think they shouldn't be called from the common code in any way, I >>>>> will try to split it. >>>> I am not entirely sure what msix_write_completion is meant to do on x86. >>>> Is it dealing with virtual MSIx? Maybe Jan, Roger or Paul could help? >>> Due to the split brain model of handling PCI pass-through (between >>> Xen and qemu), a guest writing to an MSI-X entry needs this write >>> handed to qemu, and upon completion of the write there Xen also >>> needs to take some extra action. >> >> 1. Regarding common handle_hvm_io_completion() implementation: >> >> Could msix_write_completion() be called later on so we would be able to >> split handle_hvm_io_completion() gracefully or could we call it from >> handle_mmio()? >> The reason I am asking is to avoid calling it from the common code in >> order to avoid introducing stub on Arm which is not going to be ever >> implemented >> (if msix_write_completion() is purely x86 material). > I'm unconvinced of this last fact, but as with about everything it is > quite certainly possible to call the function later. The question is > how ugly this would become, as this may involve redundant conditionals > (i.e. ones which need to remain in sync) and/or extra propagation of > state. I understand. Would it be better to make handle_hvm_io_completion() per arch then? 
This would avoid using various stubs on Arm (we could get rid of
has_vpci, msix_write_completion, handle_pio,
arch_handle_hvm_io_completion, etc.) and avoid renaming handle_mmio().
Julien, what is your opinion on that?

For example, the Arm implementation would look like:

bool handle_hvm_io_completion(struct vcpu *v)
{
    struct domain *d = v->domain;
    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
    struct hvm_ioreq_server *s;
    struct hvm_ioreq_vcpu *sv;
    enum hvm_io_completion io_completion;

    sv = get_pending_vcpu(v, &s);
    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
        return false;

    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;

    vcpu_end_shutdown_deferral(v);

    io_completion = vio->io_completion;
    vio->io_completion = HVMIO_no_completion;

    switch ( io_completion )
    {
    case HVMIO_no_completion:
        break;

    case HVMIO_mmio_completion:
        return ioreq_handle_complete_mmio();

    default:
        ASSERT_UNREACHABLE();
        break;
    }

    return true;
}

>>
>> 2. Regarding renaming common handle_mmio() to ioreq_handle_complete_mmio():
>>
>> There was a request to consider renaming that function which is called
>> from the common code in the context of IOREQ series.
>> The point is, that the name of the function is pretty generic and can be
>> confusing on Arm (we already have a try_handle_mmio()).
>> I noticed that except common code that function is called from a few
>> places on x86 (I am not even sure whether all of them are IOREQ related).
>> The question is would x86 folks be happy with such renaming?
> handle_mmio() without any parameters and used for a varying set
> of purposes was imo never a good choice of name. The situation
> has improved, but can do with further improvement. The new name,
> if to be used for truly renaming the function need to fit all
> uses though. As such, I don't think ioreq_handle_complete_mmio()
> is an appropriate name.

>
>> Alternatively I could provide the following in
>> include/asm-arm/hvm/ioreq.h without renaming it in the common code and
>> still using non-confusing variant on Arm (however I am not sure whether
>> this is a good idea):
>>
>> #define handle_mmio ioreq_handle_complete_mmio
> If anything, for x86 it ought to be the other way around, at
> which point you wouldn't need any alias #define on Arm.

But could this approach be accepted? I think it would be the easiest way
to avoid confusion on Arm and avoid renaming that function in the whole
x86 code.

>
>> 3. Regarding common IOREQ/DM stuff location:
>>
>> Currently it is located at:
>> common/hvm/...
>> include/xen/hvm/...
>>
>> For the non-RFC patch series I am going to avoid using "hvm" name (which
>> is internal detail of arch specific code and shouldn't be exposed to the
>> common code).
>> The question is whether I should use another directory name (probably
>> ioreq?) or just place them in common root directory?
> I think there are arguments for and against hvm/. I'm not of
> the opinion that ioreq/ would be a good name, so if hvm/ was to
> be ruled out, I think the file(s) shouldn't go into separate
> subdirs at all.

Got it.
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c index 931404c..b5fc066 100644 --- a/tools/libxc/xc_dom_arm.c +++ b/tools/libxc/xc_dom_arm.c @@ -26,11 +26,19 @@ #include "xg_private.h" #include "xc_dom.h" -#define NR_MAGIC_PAGES 4 + #define CONSOLE_PFN_OFFSET 0 #define XENSTORE_PFN_OFFSET 1 #define MEMACCESS_PFN_OFFSET 2 #define VUART_PFN_OFFSET 3 +#define IOREQ_SERVER_PFN_OFFSET 4 + +#define NR_IOREQ_SERVER_PAGES 8 +#define NR_MAGIC_PAGES (4 + NR_IOREQ_SERVER_PAGES) + +#define GUEST_MAGIC_BASE_PFN (GUEST_MAGIC_BASE >> XC_PAGE_SHIFT) + +#define special_pfn(x) (GUEST_MAGIC_BASE_PFN + (x)) #define LPAE_SHIFT 9 @@ -51,7 +59,7 @@ const char *xc_domain_get_native_protocol(xc_interface *xch, static int alloc_magic_pages(struct xc_dom_image *dom) { int rc, i; - const xen_pfn_t base = GUEST_MAGIC_BASE >> XC_PAGE_SHIFT; + const xen_pfn_t base = special_pfn(0); xen_pfn_t p2m[NR_MAGIC_PAGES]; BUILD_BUG_ON(NR_MAGIC_PAGES > GUEST_MAGIC_SIZE >> XC_PAGE_SHIFT); @@ -71,10 +79,9 @@ static int alloc_magic_pages(struct xc_dom_image *dom) dom->xenstore_pfn = base + XENSTORE_PFN_OFFSET; dom->vuart_gfn = base + VUART_PFN_OFFSET; - xc_clear_domain_page(dom->xch, dom->guest_domid, dom->console_pfn); - xc_clear_domain_page(dom->xch, dom->guest_domid, dom->xenstore_pfn); - xc_clear_domain_page(dom->xch, dom->guest_domid, base + MEMACCESS_PFN_OFFSET); - xc_clear_domain_page(dom->xch, dom->guest_domid, dom->vuart_gfn); + /* XXX: Check return */ + xc_clear_domain_pages(dom->xch, dom->guest_domid, special_pfn(0), + NR_MAGIC_PAGES); xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_CONSOLE_PFN, dom->console_pfn); @@ -88,6 +95,12 @@ static int alloc_magic_pages(struct xc_dom_image *dom) xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_STORE_EVTCHN, dom->xenstore_evtchn); + /* Tell the domain where the pages are and how many there are. 
*/ + xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_IOREQ_SERVER_PFN, + special_pfn(IOREQ_SERVER_PFN_OFFSET)); + xc_hvm_param_set(dom->xch, dom->guest_domid, HVM_PARAM_NR_IOREQ_SERVER_PAGES, + NR_IOREQ_SERVER_PAGES); + return 0; } diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig index 2777388..6b8a969 100644 --- a/xen/arch/arm/Kconfig +++ b/xen/arch/arm/Kconfig @@ -13,6 +13,7 @@ config ARM_64 def_bool y depends on 64BIT select HAS_FAST_MULTIPLY + select IOREQ_SERVER config ARM def_bool y diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile index 7e82b21..617fa3e 100644 --- a/xen/arch/arm/Makefile +++ b/xen/arch/arm/Makefile @@ -13,6 +13,7 @@ obj-y += cpuerrata.o obj-y += cpufeature.o obj-y += decode.o obj-y += device.o +obj-$(CONFIG_IOREQ_SERVER) += dm.o obj-y += domain.o obj-y += domain_build.init.o obj-y += domctl.o @@ -27,6 +28,7 @@ obj-y += guest_atomics.o obj-y += guest_walk.o obj-y += hvm.o obj-y += io.o +obj-$(CONFIG_IOREQ_SERVER) += ioreq.o obj-y += irq.o obj-y += kernel.init.o obj-$(CONFIG_LIVEPATCH) += livepatch.o diff --git a/xen/arch/arm/dm.c b/xen/arch/arm/dm.c new file mode 100644 index 0000000..2437099 --- /dev/null +++ b/xen/arch/arm/dm.c @@ -0,0 +1,34 @@ +/* + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. 
+ */ + +#include <xen/hypercall.h> +#include <asm/vgic.h> + +int arch_dm_op(struct xen_dm_op *op, struct domain *d, + const struct dmop_args *op_args, bool *const_op) +{ + return -EOPNOTSUPP; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index 3116932..658eec0 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -12,6 +12,7 @@ #include <xen/bitops.h> #include <xen/errno.h> #include <xen/grant_table.h> +#include <xen/hvm/ioreq.h> #include <xen/hypercall.h> #include <xen/init.h> #include <xen/lib.h> @@ -681,6 +682,10 @@ int arch_domain_create(struct domain *d, ASSERT(config != NULL); +#ifdef CONFIG_IOREQ_SERVER + hvm_ioreq_init(d); +#endif + /* p2m_init relies on some value initialized by the IOMMU subsystem */ if ( (rc = iommu_domain_init(d, config->iommu_opts)) != 0 ) goto fail; @@ -999,6 +1004,10 @@ int domain_relinquish_resources(struct domain *d) if (ret ) return ret; +#ifdef CONFIG_IOREQ_SERVER + hvm_destroy_all_ioreq_servers(d); +#endif + PROGRESS(xen): ret = relinquish_memory(d, &d->xenpage_list); if ( ret ) diff --git a/xen/arch/arm/hvm.c b/xen/arch/arm/hvm.c index 8951b34..0379493 100644 --- a/xen/arch/arm/hvm.c +++ b/xen/arch/arm/hvm.c @@ -51,6 +51,14 @@ static int hvm_allow_set_param(const struct domain *d, unsigned int param) case HVM_PARAM_MONITOR_RING_PFN: return d == current->domain ? -EPERM : 0; + /* + * XXX: Do we need to follow x86's logic here: + * "The following parameters should only be changed once"? + */ + case HVM_PARAM_IOREQ_SERVER_PFN: + case HVM_PARAM_NR_IOREQ_SERVER_PAGES: + return 0; + /* Writeable only by Xen, hole, deprecated, or out-of-range. */ default: return -EINVAL; @@ -69,6 +77,11 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param) case HVM_PARAM_CONSOLE_EVTCHN: return 0; + /* XXX: Could these be read by someone? What policy to apply? 
*/ + case HVM_PARAM_IOREQ_SERVER_PFN: + case HVM_PARAM_NR_IOREQ_SERVER_PAGES: + return 0; + /* * The following parameters are intended for toolstack usage only. * They may not be read by the domain. @@ -82,6 +95,37 @@ static int hvm_allow_get_param(const struct domain *d, unsigned int param) } } +static int hvmop_set_param(struct domain *d, const struct xen_hvm_param *a) +{ + int rc = 0; + + switch ( a->index ) + { + case HVM_PARAM_IOREQ_SERVER_PFN: + d->arch.hvm.ioreq_gfn.base = a->value; + break; + case HVM_PARAM_NR_IOREQ_SERVER_PAGES: + { + unsigned int i; + + if ( a->value == 0 || + a->value > sizeof(d->arch.hvm.ioreq_gfn.mask) * 8 ) + { + rc = -EINVAL; + break; + } + for ( i = 0; i < a->value; i++ ) + set_bit(i, &d->arch.hvm.ioreq_gfn.mask); + + break; + } + } + + d->arch.hvm.params[a->index] = a->value; + + return rc; +} + long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) { long rc = 0; @@ -111,7 +155,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg) if ( rc ) goto param_fail; - d->arch.hvm.params[a.index] = a.value; + rc = hvmop_set_param(d, &a); } else { diff --git a/xen/arch/arm/io.c b/xen/arch/arm/io.c index ae7ef96..436f669 100644 --- a/xen/arch/arm/io.c +++ b/xen/arch/arm/io.c @@ -16,6 +16,7 @@ * GNU General Public License for more details. */ +#include <xen/hvm/ioreq.h> #include <xen/lib.h> #include <xen/spinlock.h> #include <xen/sched.h> @@ -107,6 +108,62 @@ static const struct mmio_handler *find_mmio_handler(struct domain *d, return handler; } +#ifdef CONFIG_IOREQ_SERVER +static enum io_state try_fwd_ioserv(struct cpu_user_regs *regs, + struct vcpu *v, mmio_info_t *info) +{ + struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io; + ioreq_t p = { + .type = IOREQ_TYPE_COPY, + .addr = info->gpa, + .size = 1 << info->dabt.size, + .count = 0, + .dir = !info->dabt.write, + .df = 0, /* XXX: What's for? 
*/ + .data = get_user_reg(regs, info->dabt.reg), + .state = STATE_IOREQ_READY, + }; + struct hvm_ioreq_server *s = NULL; + enum io_state rc; + + switch ( vio->io_req.state ) + { + case STATE_IOREQ_NONE: + break; + default: + printk("d%u wrong state %u\n", v->domain->domain_id, + vio->io_req.state); + return IO_ABORT; + } + + s = hvm_select_ioreq_server(v->domain, &p); + if ( !s ) + return IO_UNHANDLED; + + if ( !info->dabt.valid ) + { + printk("Valid bit not set\n"); + return IO_ABORT; + } + + vio->io_req = p; + + rc = hvm_send_ioreq(s, &p, 0); + if ( rc != IO_RETRY || v->domain->is_shutting_down ) + vio->io_req.state = STATE_IOREQ_NONE; + else if ( !hvm_ioreq_needs_completion(&vio->io_req) ) + rc = IO_HANDLED; + else + vio->io_completion = HVMIO_mmio_completion; + + /* XXX: Decide what to do */ + if ( rc == IO_RETRY ) + rc = IO_HANDLED; + + return rc; +} +#endif + enum io_state try_handle_mmio(struct cpu_user_regs *regs, const union hsr hsr, paddr_t gpa) @@ -123,7 +180,15 @@ enum io_state try_handle_mmio(struct cpu_user_regs *regs, handler = find_mmio_handler(v->domain, info.gpa); if ( !handler ) - return IO_UNHANDLED; + { + int rc = IO_UNHANDLED; + +#ifdef CONFIG_IOREQ_SERVER + rc = try_fwd_ioserv(regs, v, &info); +#endif + + return rc; + } /* All the instructions used on emulated MMIO region should be valid */ if ( !dabt.valid ) diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c new file mode 100644 index 0000000..a9cc839 --- /dev/null +++ b/xen/arch/arm/ioreq.c @@ -0,0 +1,86 @@ +/* + * arm/ioreq.c: hardware virtual machine I/O emulation + * + * Copyright (c) 2019 Arm ltd. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>. + */ + +#include <xen/ctype.h> +#include <xen/hvm/ioreq.h> +#include <xen/init.h> +#include <xen/lib.h> +#include <xen/trace.h> +#include <xen/sched.h> +#include <xen/irq.h> +#include <xen/softirq.h> +#include <xen/domain.h> +#include <xen/domain_page.h> +#include <xen/event.h> +#include <xen/paging.h> +#include <xen/vpci.h> + +#include <public/hvm/dm_op.h> +#include <public/hvm/ioreq.h> + +bool handle_mmio(void) +{ + struct vcpu *v = current; + struct cpu_user_regs *regs = guest_cpu_user_regs(); + const union hsr hsr = { .bits = regs->hsr }; + const struct hsr_dabt dabt = hsr.dabt; + /* Code is similar to handle_read */ + uint8_t size = (1 << dabt.size) * 8; + register_t r = v->arch.hvm.hvm_io.io_req.data; + + /* We should only be here on Guest Data Abort */ + ASSERT(dabt.ec == HSR_EC_DATA_ABORT_LOWER_EL); + + /* We are done with the IO */ + /* XXX: Is it the right place? */ + v->arch.hvm.hvm_io.io_req.state = STATE_IOREQ_NONE; + + /* XXX: Do we need to take care of write here ? */ + if ( dabt.write ) + return true; + + /* + * Sign extend if required. + * Note that we expect the read handler to have zeroed the bits + * outside the requested access size. + */ + if ( dabt.sign && (r & (1UL << (size - 1))) ) + { + /* + * We are relying on register_t using the same as + * an unsigned long in order to keep the 32-bit assembly + * code smaller. 
+ */ + BUILD_BUG_ON(sizeof(register_t) != sizeof(unsigned long)); + r |= (~0UL) << size; + } + + set_user_reg(regs, dabt.reg, r); + + return true; +} + +/* + * Local variables: + * mode: C + * c-file-style: "BSD" + * c-basic-offset: 4 + * tab-width: 4 + * indent-tabs-mode: nil + * End: + */ diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c index 8f40d0e..4cdf098 100644 --- a/xen/arch/arm/traps.c +++ b/xen/arch/arm/traps.c @@ -18,6 +18,7 @@ #include <xen/domain_page.h> #include <xen/errno.h> +#include <xen/hvm/ioreq.h> #include <xen/hypercall.h> #include <xen/init.h> #include <xen/iocap.h> @@ -1384,6 +1385,9 @@ static arm_hypercall_t arm_hypercall_table[] = { #ifdef CONFIG_HYPFS HYPERCALL(hypfs_op, 5), #endif +#ifdef CONFIG_IOREQ_SERVER + HYPERCALL(dm_op, 3), +#endif }; #ifndef NDEBUG @@ -1958,6 +1962,9 @@ static void do_trap_stage2_abort_guest(struct cpu_user_regs *regs, case IO_UNHANDLED: /* IO unhandled, try another way to handle it. */ break; + default: + /* XXX: Handle IO_RETRY */ + ASSERT_UNREACHABLE(); } } @@ -2275,6 +2282,16 @@ static void check_for_vcpu_work(void) */ void leave_hypervisor_to_guest(void) { +#ifdef CONFIG_IOREQ_SERVER + /* + * XXX: Check the return. Shall we call that in + * continue_running and context_switch instead? + * The benefits would be to avoid calling + * handle_hvm_io_completion on every return. 
+ */
+    local_irq_enable();
+
+    handle_hvm_io_completion(current);
+#endif
+
     local_irq_disable();

     check_for_vcpu_work();
diff --git a/xen/common/memory.c b/xen/common/memory.c
index 9283e5e..0000477 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -8,6 +8,7 @@
  */

 #include <xen/domain_page.h>
+#include <xen/hvm/ioreq.h>
 #include <xen/types.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -30,10 +31,6 @@
 #include <public/memory.h>
 #include <xsm/xsm.h>

-#ifdef CONFIG_IOREQ_SERVER
-#include <xen/hvm/ioreq.h>
-#endif
-
 #ifdef CONFIG_X86
 #include <asm/guest.h>
 #endif
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 4e2f582..e060b0a 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -11,12 +11,64 @@
 #include <asm/vgic.h>
 #include <asm/vpl011.h>
 #include <public/hvm/params.h>
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
 #include <xen/serial.h>
 #include <xen/rbtree.h>

+struct hvm_ioreq_page {
+    gfn_t gfn;
+    struct page_info *page;
+    void *va;
+};
+
+struct hvm_ioreq_vcpu {
+    struct list_head list_entry;
+    struct vcpu      *vcpu;
+    evtchn_port_t    ioreq_evtchn;
+    bool             pending;
+};
+
+#define NR_IO_RANGE_TYPES (XEN_DMOP_IO_RANGE_PCI + 1)
+#define MAX_NR_IO_RANGES  256
+
+#define MAX_NR_IOREQ_SERVERS 8
+#define DEFAULT_IOSERVID 0
+
+struct hvm_ioreq_server {
+    struct domain          *target, *emulator;
+
+    /* Lock to serialize toolstack modifications */
+    spinlock_t             lock;
+
+    struct hvm_ioreq_page  ioreq;
+    struct list_head       ioreq_vcpu_list;
+    struct hvm_ioreq_page  bufioreq;
+
+    /* Lock to serialize access to buffered ioreq ring */
+    spinlock_t             bufioreq_lock;
+    evtchn_port_t          bufioreq_evtchn;
+    struct rangeset        *range[NR_IO_RANGE_TYPES];
+    bool                   enabled;
+    uint8_t                bufioreq_handling;
+};
+
 struct hvm_domain
 {
     uint64_t params[HVM_NR_PARAMS];
+
+    /* Guest page range used for non-default ioreq servers */
+    struct {
+        unsigned long base;
+        unsigned long mask;
+        unsigned long legacy_mask; /* indexed by HVM param number */
+    } ioreq_gfn;
+
+    /* Lock protects all other values in the sub-struct and the default */
+    struct {
+        spinlock_t              lock;
+        struct hvm_ioreq_server *server[MAX_NR_IOREQ_SERVERS];
+    } ioreq_server;
 };

 #ifdef CONFIG_ARM_64
@@ -93,6 +145,29 @@ struct arch_domain
 #endif
 }  __cacheline_aligned;

+enum hvm_io_completion {
+    HVMIO_no_completion,
+    HVMIO_mmio_completion,
+    HVMIO_pio_completion,
+    HVMIO_realmode_completion
+};
+
+struct hvm_vcpu_io {
+    /* I/O request in flight to device model. */
+    enum hvm_io_completion io_completion;
+    ioreq_t                io_req;
+
+    /*
+     * HVM emulation:
+     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
+     *  The latter is known to be an MMIO frame (not RAM).
+     *  This translation is only valid for accesses as per @mmio_access.
+     */
+    struct npfec        mmio_access;
+    unsigned long       mmio_gla;
+    unsigned long       mmio_gpfn;
+};
+
 struct arch_vcpu
 {
     struct {
@@ -206,6 +281,11 @@ struct arch_vcpu
      */
     bool need_flush_to_ram;

+    struct hvm_vcpu
+    {
+        struct hvm_vcpu_io hvm_io;
+    } hvm;
+
 }  __cacheline_aligned;

 void vcpu_show_execution_state(struct vcpu *);
diff --git a/xen/include/asm-arm/hvm/ioreq.h b/xen/include/asm-arm/hvm/ioreq.h
new file mode 100644
index 0000000..83a560c
--- /dev/null
+++ b/xen/include/asm-arm/hvm/ioreq.h
@@ -0,0 +1,103 @@
+/*
+ * hvm.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ * Copyright (c) 2019 Arm ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __ASM_ARM_HVM_IOREQ_H__
+#define __ASM_ARM_HVM_IOREQ_H__
+
+#include <public/hvm/ioreq.h>
+#include <public/hvm/dm_op.h>
+
+#define has_vpci(d) (false)
+
+bool handle_mmio(void);
+
+static inline bool handle_pio(uint16_t port, unsigned int size, int dir)
+{
+    /* XXX */
+    BUG();
+    return true;
+}
+
+static inline paddr_t hvm_mmio_first_byte(const ioreq_t *p)
+{
+    return p->addr;
+}
+
+static inline paddr_t hvm_mmio_last_byte(const ioreq_t *p)
+{
+    unsigned long size = p->size;
+
+    return p->addr + size - 1;
+}
+
+struct hvm_ioreq_server;
+
+static inline int p2m_set_ioreq_server(struct domain *d,
+                                       unsigned int flags,
+                                       struct hvm_ioreq_server *s)
+{
+    return -EOPNOTSUPP;
+}
+
+static inline void msix_write_completion(struct vcpu *v)
+{
+}
+
+static inline void handle_realmode_completion(void)
+{
+    ASSERT_UNREACHABLE();
+}
+
+static inline void paging_mark_pfn_dirty(struct domain *d, pfn_t pfn)
+{
+}
+
+static inline void hvm_get_ioreq_server_range_type(struct domain *d,
+                                                   ioreq_t *p,
+                                                   uint8_t *type,
+                                                   uint64_t *addr)
+{
+    *type = (p->type == IOREQ_TYPE_PIO) ?
+             XEN_DMOP_IO_RANGE_PORT : XEN_DMOP_IO_RANGE_MEMORY;
+    *addr = p->addr;
+}
+
+static inline void arch_hvm_ioreq_init(struct domain *d)
+{
+}
+
+static inline void arch_hvm_ioreq_destroy(struct domain *d)
+{
+}
+
+#define IOREQ_IO_HANDLED     IO_HANDLED
+#define IOREQ_IO_UNHANDLED   IO_UNHANDLED
+#define IOREQ_IO_RETRY       IO_RETRY
+
+#endif /* __ASM_ARM_HVM_IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-arm/mmio.h b/xen/include/asm-arm/mmio.h
index 8dbfb27..7ab873c 100644
--- a/xen/include/asm-arm/mmio.h
+++ b/xen/include/asm-arm/mmio.h
@@ -37,6 +37,7 @@ enum io_state
     IO_ABORT,       /* The IO was handled by the helper and led to an abort. */
     IO_HANDLED,     /* The IO was successfully handled by the helper. */
     IO_UNHANDLED,   /* The IO was not handled by the helper. */
+    IO_RETRY,       /* Retry the emulation for some reason */
 };

 typedef int (*mmio_read_t)(struct vcpu *v, mmio_info_t *info,
diff --git a/xen/include/asm-arm/p2m.h b/xen/include/asm-arm/p2m.h
index 5fdb6e8..5823f11 100644
--- a/xen/include/asm-arm/p2m.h
+++ b/xen/include/asm-arm/p2m.h
@@ -385,10 +385,11 @@ static inline int set_foreign_p2m_entry(struct domain *d, unsigned long gfn,
                                         mfn_t mfn)
 {
     /*
-     * NOTE: If this is implemented then proper reference counting of
-     *       foreign entries will need to be implemented.
+     * XXX: handle the reference counting properly. It looks like the page
+     * may not always belong to d.
      */
-    return -EOPNOTSUPP;
+
+    return guest_physmap_add_entry(d, _gfn(gfn), mfn, 0, p2m_ram_rw);
 }

 /*
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 2368ace..317455a 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -713,14 +713,14 @@ static XSM_INLINE int xsm_pmu_op (XSM_DEFAULT_ARG struct domain *d, unsigned int
     }
 }

+#endif /* CONFIG_X86 */
+
 static XSM_INLINE int xsm_dm_op(XSM_DEFAULT_ARG struct domain *d)
 {
     XSM_ASSERT_ACTION(XSM_DM_PRIV);
     return xsm_default_action(action, current->domain, d);
 }

-#endif /* CONFIG_X86 */
-
 #ifdef CONFIG_ARGO
 static XSM_INLINE int xsm_argo_enable(const struct domain *d)
 {
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index a80bcf3..2a9b39d 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -177,8 +177,8 @@ struct xsm_operations {
     int (*ioport_permission) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*ioport_mapping) (struct domain *d, uint32_t s, uint32_t e, uint8_t allow);
     int (*pmu_op) (struct domain *d, unsigned int op);
-    int (*dm_op) (struct domain *d);
 #endif
+    int (*dm_op) (struct domain *d);
     int (*xen_version) (uint32_t cmd);
     int (*domain_resource_map) (struct domain *d);
 #ifdef CONFIG_ARGO
@@ -688,13 +688,13 @@ static inline int xsm_pmu_op (xsm_default_t def, struct domain *d, unsigned int
     return xsm_ops->pmu_op(d, op);
 }

+#endif /* CONFIG_X86 */
+
 static inline int xsm_dm_op(xsm_default_t def, struct domain *d)
 {
     return xsm_ops->dm_op(d);
 }

-#endif /* CONFIG_X86 */
-
 static inline int xsm_xen_version (xsm_default_t def, uint32_t op)
 {
     return xsm_ops->xen_version(op);
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index d4cce68..e3afd06 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -148,8 +148,8 @@ void __init xsm_fixup_ops (struct xsm_operations *ops)
     set_to_dummy_if_null(ops, ioport_permission);
     set_to_dummy_if_null(ops, ioport_mapping);
     set_to_dummy_if_null(ops, pmu_op);
-    set_to_dummy_if_null(ops, dm_op);
 #endif
+    set_to_dummy_if_null(ops, dm_op);
     set_to_dummy_if_null(ops, xen_version);
     set_to_dummy_if_null(ops, domain_resource_map);
 #ifdef CONFIG_ARGO
diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index a314bf8..645192a 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -1662,14 +1662,13 @@ static int flask_pmu_op (struct domain *d, unsigned int op)
         return -EPERM;
     }
 }
+#endif /* CONFIG_X86 */

 static int flask_dm_op(struct domain *d)
 {
     return current_has_perm(d, SECCLASS_HVM, HVM__DM);
 }

-#endif /* CONFIG_X86 */
-
 static int flask_xen_version (uint32_t op)
 {
     u32 dsid = domain_sid(current->domain);
@@ -1872,8 +1871,8 @@ static struct xsm_operations flask_ops = {
     .ioport_permission = flask_ioport_permission,
     .ioport_mapping = flask_ioport_mapping,
     .pmu_op = flask_pmu_op,
-    .dm_op = flask_dm_op,
 #endif
+    .dm_op = flask_dm_op,
     .xen_version = flask_xen_version,
     .domain_resource_map = flask_domain_resource_map,
 #ifdef CONFIG_ARGO