
[V1,02/16] xen/ioreq: Make x86's IOREQ feature common

Message ID 1599769330-17656-3-git-send-email-olekstysh@gmail.com (mailing list archive)
State Superseded
Series IOREQ feature (+ virtio-mmio) on Arm

Commit Message

Oleksandr Tyshchenko Sept. 10, 2020, 8:21 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

As a lot of x86 code can be re-used on Arm later on, this patch
moves previously prepared IOREQ support to the common code.

The code movement is an almost verbatim copy; only the headers are
re-ordered alphabetically.

This support is going to be used on Arm to be able to run device
emulators outside of the Xen hypervisor.

Signed-off-by: Julien Grall <julien.grall@arm.com>
Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
   - was split into three patches:
     - x86/ioreq: Prepare IOREQ feature for making it common
     - xen/ioreq: Make x86's IOREQ feature common
     - xen/ioreq: Make x86's hvm_ioreq_needs_completion() common
   - update MAINTAINERS file
   - do not use a separate subdir for the IOREQ stuff, move it to:
     - xen/common/ioreq.c
     - xen/include/xen/ioreq.h
   - update x86's files to include xen/ioreq.h
   - remove unneeded headers in arch/x86/hvm/ioreq.c
   - re-order the headers alphabetically in common/ioreq.c
   - update common/ioreq.c according to the newly introduced arch functions:
     arch_hvm_destroy_ioreq_server()/arch_handle_hvm_io_completion()
---
---
 MAINTAINERS                     |    8 +-
 xen/arch/x86/Kconfig            |    1 +
 xen/arch/x86/hvm/dm.c           |    2 +-
 xen/arch/x86/hvm/emulate.c      |    2 +-
 xen/arch/x86/hvm/hvm.c          |    2 +-
 xen/arch/x86/hvm/io.c           |    2 +-
 xen/arch/x86/hvm/ioreq.c        | 1425 +--------------------------------------
 xen/arch/x86/hvm/stdvga.c       |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c     |    3 +-
 xen/arch/x86/mm.c               |    2 +-
 xen/arch/x86/mm/shadow/common.c |    2 +-
 xen/common/Kconfig              |    3 +
 xen/common/Makefile             |    1 +
 xen/common/ioreq.c              | 1410 ++++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/ioreq.h |   35 +-
 xen/include/xen/ioreq.h         |   82 +++
 16 files changed, 1533 insertions(+), 1449 deletions(-)
 create mode 100644 xen/common/ioreq.c
 create mode 100644 xen/include/xen/ioreq.h

Comments

Jan Beulich Sept. 14, 2020, 2:17 p.m. UTC | #1
On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
> ---
>  MAINTAINERS                     |    8 +-
>  xen/arch/x86/Kconfig            |    1 +
>  xen/arch/x86/hvm/dm.c           |    2 +-
>  xen/arch/x86/hvm/emulate.c      |    2 +-
>  xen/arch/x86/hvm/hvm.c          |    2 +-
>  xen/arch/x86/hvm/io.c           |    2 +-
>  xen/arch/x86/hvm/ioreq.c        | 1425 +--------------------------------------
>  xen/arch/x86/hvm/stdvga.c       |    2 +-
>  xen/arch/x86/hvm/vmx/vvmx.c     |    3 +-
>  xen/arch/x86/mm.c               |    2 +-
>  xen/arch/x86/mm/shadow/common.c |    2 +-
>  xen/common/Kconfig              |    3 +
>  xen/common/Makefile             |    1 +
>  xen/common/ioreq.c              | 1410 ++++++++++++++++++++++++++++++++++++++

This suggests it was almost the entire file which got moved. It would
be really nice if you could convince git to show the diff here, rather
than removal and addition of 1400 lines.

Additionally I wonder whether what's left in the original file wouldn't
better become inline functions now. If this was done in the previous
patch, the change would be truly a rename then, I think.
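
For illustration only (not part of the patch), the kind of conversion being
suggested would roughly mean turning whatever thin helpers remain in
xen/arch/x86/hvm/ioreq.c into static inlines in asm-x86/hvm/ioreq.h. The
function name below is taken from this series' changelog; its exact signature
and body here are placeholders:

/* Sketch only: a leftover arch hook folded into the header as an inline. */
static inline void arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
{
    /* ... the existing x86-specific teardown would move here ... */
}

With that done, the remaining .c file would end up (nearly) empty and the
move really would reduce to a rename of the code that changes location.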

> --- a/xen/include/asm-x86/hvm/ioreq.h
> +++ b/xen/include/asm-x86/hvm/ioreq.h
> @@ -19,41 +19,12 @@
>  #ifndef __ASM_X86_HVM_IOREQ_H__
>  #define __ASM_X86_HVM_IOREQ_H__
>  
> -bool hvm_io_pending(struct vcpu *v);
> -bool handle_hvm_io_completion(struct vcpu *v);
> -bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
> +#include <asm/hvm/emulate.h>
> +#include <asm/hvm/hvm.h>
> +#include <asm/hvm/vmx/vmx.h>

Are all three really needed here? Especially the last one strikes me as
odd.

> --- /dev/null
> +++ b/xen/include/xen/ioreq.h
> @@ -0,0 +1,82 @@
> +/*
> + * ioreq.h: Hardware virtual machine assist interface definitions.
> + *
> + * Copyright (c) 2016 Citrix Systems Inc.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + * You should have received a copy of the GNU General Public License along with
> + * this program; If not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef __IOREQ_H__
> +#define __IOREQ_H__

__XEN_IOREQ_H__ please.

> +#include <xen/sched.h>
> +
> +#include <asm/hvm/ioreq.h>

Is this include really needed here (i.e. by the code further down in
the file, and hence by everyone including this header), or rather
just in a few specific .c files?

> +#define GET_IOREQ_SERVER(d, id) \
> +    (d)->arch.hvm.ioreq_server.server[id]

arch.hvm.* feels like a layering violation when used in this header.

Jan
Oleksandr Tyshchenko Sept. 21, 2020, 7:02 p.m. UTC | #2
On 14.09.20 17:17, Jan Beulich wrote:

Hi Jan

> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>> ---
>>   MAINTAINERS                     |    8 +-
>>   xen/arch/x86/Kconfig            |    1 +
>>   xen/arch/x86/hvm/dm.c           |    2 +-
>>   xen/arch/x86/hvm/emulate.c      |    2 +-
>>   xen/arch/x86/hvm/hvm.c          |    2 +-
>>   xen/arch/x86/hvm/io.c           |    2 +-
>>   xen/arch/x86/hvm/ioreq.c        | 1425 +--------------------------------------
>>   xen/arch/x86/hvm/stdvga.c       |    2 +-
>>   xen/arch/x86/hvm/vmx/vvmx.c     |    3 +-
>>   xen/arch/x86/mm.c               |    2 +-
>>   xen/arch/x86/mm/shadow/common.c |    2 +-
>>   xen/common/Kconfig              |    3 +
>>   xen/common/Makefile             |    1 +
>>   xen/common/ioreq.c              | 1410 ++++++++++++++++++++++++++++++++++++++
> This suggests it was almost the entire file which got moved. It would
> be really nice if you could convince git to show the diff here, rather
> than removal and addition of 1400 lines.
>
> Additionally I wonder whether what's left in the original file wouldn't
> better become inline functions now. If this was done in the previous
> patch, the change would be truly a rename then, I think.
Last time, when trying to make something inline in the arch files for the RFC
series (I don't remember exactly what it was for), I got completely stuck
with build issues due to header (inter-?)dependencies, which I failed to
resolve. Anyway, I got your point and will try to experiment with that.


>
>> --- a/xen/include/asm-x86/hvm/ioreq.h
>> +++ b/xen/include/asm-x86/hvm/ioreq.h
>> @@ -19,41 +19,12 @@
>>   #ifndef __ASM_X86_HVM_IOREQ_H__
>>   #define __ASM_X86_HVM_IOREQ_H__
>>   
>> -bool hvm_io_pending(struct vcpu *v);
>> -bool handle_hvm_io_completion(struct vcpu *v);
>> -bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
>> +#include <asm/hvm/emulate.h>
>> +#include <asm/hvm/hvm.h>
>> +#include <asm/hvm/vmx/vmx.h>
> Are all three really needed here? Especially the last one strikes me as
> odd.

We can leave only #include <asm/hvm/emulate.h> here and move #include 
<asm/hvm/vmx/vmx.h> to x86/hvm/ioreq.c.
Also #include <asm/hvm/hvm.h> could be dropped.


>
>> --- /dev/null
>> +++ b/xen/include/xen/ioreq.h
>> @@ -0,0 +1,82 @@
>> +/*
>> + * ioreq.h: Hardware virtual machine assist interface definitions.
>> + *
>> + * Copyright (c) 2016 Citrix Systems Inc.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License along with
>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef __IOREQ_H__
>> +#define __IOREQ_H__
> __XEN_IOREQ_H__ please.

ok


>
>> +#include <xen/sched.h>
>> +
>> +#include <asm/hvm/ioreq.h>
> Is this include really needed here (i.e. by the code further down in
> the file, and hence by everyone including this header), or rather
> just in a few specific .c files?
I think just a few specific .c files, which are x86/hvm/ioreq.c and
common/ioreq.c now, and several other files later on (x86/hvm/dm.c,
arm/io.c, etc).
Shall I include that header in those files instead?


>
>> +#define GET_IOREQ_SERVER(d, id) \
>> +    (d)->arch.hvm.ioreq_server.server[id]
> arch.hvm.* feels like a layering violation when used in this header.
Got it. The only reason GET_IOREQ_SERVER is here is the inline
get_ioreq_server(). I will make it non-inline and move both to
common/ioreq.c.
I assume this layering violation issue also applies to
hvm_domain_has_ioreq_server() introduced in
[PATCH V1 11/16] xen/ioreq: Introduce hvm_domain_has_ioreq_server()
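
A minimal sketch of the move described above, assuming the existing bounds
check and the MAX_NR_IOREQ_SERVERS limit carry over unchanged (note Jan's
reply below: this alone does not remove the layering violation, it merely
moves the arch.hvm reference into common code):

/* common/ioreq.c -- sketch only */
static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
                                                 unsigned int id)
{
    if ( id >= MAX_NR_IOREQ_SERVERS )
        return NULL;

    return GET_IOREQ_SERVER(d, id);
}
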
Jan Beulich Sept. 22, 2020, 6:33 a.m. UTC | #3
On 21.09.2020 21:02, Oleksandr wrote:
> On 14.09.20 17:17, Jan Beulich wrote:
>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>> --- /dev/null
>>> +++ b/xen/include/xen/ioreq.h
>>> @@ -0,0 +1,82 @@
>>> +/*
>>> + * ioreq.h: Hardware virtual machine assist interface definitions.
>>> + *
>>> + * Copyright (c) 2016 Citrix Systems Inc.
>>> + *
>>> + * This program is free software; you can redistribute it and/or modify it
>>> + * under the terms and conditions of the GNU General Public License,
>>> + * version 2, as published by the Free Software Foundation.
>>> + *
>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>> + * more details.
>>> + *
>>> + * You should have received a copy of the GNU General Public License along with
>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>> + */
>>> +
>>> +#ifndef __IOREQ_H__
>>> +#define __IOREQ_H__
>> __XEN_IOREQ_H__ please.
> 
> ok
> 
> 
>>
>>> +#include <xen/sched.h>
>>> +
>>> +#include <asm/hvm/ioreq.h>
>> Is this include really needed here (i.e. by the code further down in
>> the file, and hence by everyone including this header), or rather
>> just in a few specific .c files?
> I think, just in few specific .c files. Which are x86/hvm/ioreq.c and 
> common/ioreq.c now and several other files later on (x86/hvm/dm.c, 
> arm/io.c, etc)
> Shall I include that header in these files instead?

Yes please, and please take this as a common guideline. While
private headers are often used to include things needed by all
of the (typically few) users of the header, non-private ones
shouldn't create unnecessary dependencies on other headers. As
you've said further up - you did run into hard to resolve
header dependencies yourself, and the practice of including
headers without strict need is one of the reasons of such
problems.

>>> +#define GET_IOREQ_SERVER(d, id) \
>>> +    (d)->arch.hvm.ioreq_server.server[id]
>> arch.hvm.* feels like a layering violation when used in this header.
> Got it. The only reason why GET_IOREQ_SERVER is here is inline 
> get_ioreq_server(). I will make it non-inline and move both to 
> common/ioreq.c.

Which won't make the layering violation go away. It's still
common rather than per-arch code then. As suggested elsewhere,
I think the whole ioreq_server struct wants to move into
struct domain itself, perhaps inside a new #ifdef (iirc one of
the patches introduces a suitable Kconfig option). This goes
alongside my suggestion to drop the "hvm" prefixes and infixes
from involved function names.

Jan
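
A rough sketch of the direction suggested above; the Kconfig symbol and the
field layout are illustrative only, not what the series actually introduces:

/* Sketch only -- IOREQ server state hanging directly off struct domain. */
struct domain {
    /* ... existing common fields ... */
#ifdef CONFIG_IOREQ_SERVER            /* hypothetical option name */
    struct {
        struct ioreq_server *server[MAX_NR_IOREQ_SERVERS];
        spinlock_t           lock;
    } ioreq_server;
#endif
    /* ... */
};

/* The accessor then no longer reaches into arch.hvm: */
#define GET_IOREQ_SERVER(d, id) ((d)->ioreq_server.server[id])
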
Oleksandr Tyshchenko Sept. 22, 2020, 9:58 a.m. UTC | #4
On 22.09.20 09:33, Jan Beulich wrote:

Hi Jan

> On 21.09.2020 21:02, Oleksandr wrote:
>> On 14.09.20 17:17, Jan Beulich wrote:
>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>> --- /dev/null
>>>> +++ b/xen/include/xen/ioreq.h
>>>> @@ -0,0 +1,82 @@
>>>> +/*
>>>> + * ioreq.h: Hardware virtual machine assist interface definitions.
>>>> + *
>>>> + * Copyright (c) 2016 Citrix Systems Inc.
>>>> + *
>>>> + * This program is free software; you can redistribute it and/or modify it
>>>> + * under the terms and conditions of the GNU General Public License,
>>>> + * version 2, as published by the Free Software Foundation.
>>>> + *
>>>> + * This program is distributed in the hope it will be useful, but WITHOUT
>>>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>>>> + * more details.
>>>> + *
>>>> + * You should have received a copy of the GNU General Public License along with
>>>> + * this program; If not, see <http://www.gnu.org/licenses/>.
>>>> + */
>>>> +
>>>> +#ifndef __IOREQ_H__
>>>> +#define __IOREQ_H__
>>> __XEN_IOREQ_H__ please.
>> ok
>>
>>
>>>> +#include <xen/sched.h>
>>>> +
>>>> +#include <asm/hvm/ioreq.h>
>>> Is this include really needed here (i.e. by the code further down in
>>> the file, and hence by everyone including this header), or rather
>>> just in a few specific .c files?
>> I think, just in few specific .c files. Which are x86/hvm/ioreq.c and
>> common/ioreq.c now and several other files later on (x86/hvm/dm.c,
>> arm/io.c, etc)
>> Shall I include that header in these files instead?
> Yes please, and please take this as a common guideline. While
> private headers are often used to include things needed by all
> of the (typically few) users of the header, non-private ones
> shouldn't create unnecessary dependencies on other headers. As
> you've said further up - you did run into hard to resolve
> header dependencies yourself, and the practice of including
> headers without strict need is one of the reasons of such
> problems.

Got it.


>
>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>> arch.hvm.* feels like a layering violation when used in this header.
>> Got it. The only reason why GET_IOREQ_SERVER is here is inline
>> get_ioreq_server(). I will make it non-inline and move both to
>> common/ioreq.c.
> Which won't make the layering violation go away. It's still
> common rather than per-arch code then. As suggested elsewhere,
> I think the whole ioreq_server struct wants to move into
> struct domain itself, perhaps inside a new #ifdef (iirc one of
> the patches introduces a suitable Kconfig option).
Well, your advice regarding ioreq_server sounds reasonable, but the common
ioreq.c will still have other *arch.hvm.* accesses for both vcpu and domain.
So it looks like the other involved structs should be moved into the
*common* struct domain/vcpu itself, correct? Some of them could be moved
easily since they contain the same fields (arch.hvm.ioreq_gfn), but some
couldn't, and they seem to require pulling a lot of changes into the Xen
code (arch.hvm.params, arch.hvm.hvm_io), I am afraid.
Or did I miss something?


> This goes
> alongside my suggestion to drop the "hvm" prefixes and infixes
> from involved function names.
Well, I assume this request, as well as the request above, should be
addressed in follow-up patches, as we want to keep the code movement in the
current patch an (almost) verbatim copy.
Am I correct?
Jan Beulich Sept. 22, 2020, 10:54 a.m. UTC | #5
On 22.09.2020 11:58, Oleksandr wrote:
> On 22.09.20 09:33, Jan Beulich wrote:
>> On 21.09.2020 21:02, Oleksandr wrote:
>>> On 14.09.20 17:17, Jan Beulich wrote:
>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>>> arch.hvm.* feels like a layering violation when used in this header.
>>> Got it. The only reason why GET_IOREQ_SERVER is here is inline
>>> get_ioreq_server(). I will make it non-inline and move both to
>>> common/ioreq.c.
>> Which won't make the layering violation go away. It's still
>> common rather than per-arch code then. As suggested elsewhere,
>> I think the whole ioreq_server struct wants to move into
>> struct domain itself, perhaps inside a new #ifdef (iirc one of
>> the patches introduces a suitable Kconfig option).
> Well, your advise regarding ioreq_server sounds reasonable, but the 
> common ioreq.c
> still will have other *arch.hvm.* for both vcpu and domain. So looks 
> like other involved structs should be moved
> into *common* struct domain/vcpu itself, correct? Some of them could be 
> moved easily since contain the same fields (arch.hvm.ioreq_gfn),
> but some of them couldn't and seems to require to pull a lot of changes 
> to the Xen code (arch.hvm.params, arch.hvm.hvm_io), I am afraid.
> Or I missed something?

arch.hvm.params, iirc, is an x86 concept, and hence would need
abstracting away anyway. I expect this will be a common pattern:
Either you want things to become generic (structure fields
living directly in struct domain, or at least not under arch.hvm),
or things need abstracting for per-arch handling.

>> This goes
>> alongside my suggestion to drop the "hvm" prefixes and infixes
>> from involved function names.
> Well, I assume this request as well as the request above should be 
> addressed in the follow-up patches, as we want to keep the code movement 
> in current patch as (almost) verbatim copy,
> Am I correct?

The renaming could imo be done before or after the move, but within
a single series. Doing it (or some of it) during the move may be
acceptable, but this primarily depends on the overall effect on the
patch that this would have. I.e. the patch better wouldn't become
gigantic just because all the renaming gets done in one go, and it's
hundreds of places that need touching.

Jan
Oleksandr Tyshchenko Sept. 22, 2020, 3:05 p.m. UTC | #6
On 22.09.20 13:54, Jan Beulich wrote:

Hi Jan

> On 22.09.2020 11:58, Oleksandr wrote:
>> On 22.09.20 09:33, Jan Beulich wrote:
>>> On 21.09.2020 21:02, Oleksandr wrote:
>>>> On 14.09.20 17:17, Jan Beulich wrote:
>>>>> On 10.09.2020 22:21, Oleksandr Tyshchenko wrote:
>>>>>> +#define GET_IOREQ_SERVER(d, id) \
>>>>>> +    (d)->arch.hvm.ioreq_server.server[id]
>>>>> arch.hvm.* feels like a layering violation when used in this header.
>>>> Got it. The only reason why GET_IOREQ_SERVER is here is inline
>>>> get_ioreq_server(). I will make it non-inline and move both to
>>>> common/ioreq.c.
>>> Which won't make the layering violation go away. It's still
>>> common rather than per-arch code then. As suggested elsewhere,
>>> I think the whole ioreq_server struct wants to move into
>>> struct domain itself, perhaps inside a new #ifdef (iirc one of
>>> the patches introduces a suitable Kconfig option).
>> Well, your advise regarding ioreq_server sounds reasonable, but the
>> common ioreq.c
>> still will have other *arch.hvm.* for both vcpu and domain. So looks
>> like other involved structs should be moved
>> into *common* struct domain/vcpu itself, correct? Some of them could be
>> moved easily since contain the same fields (arch.hvm.ioreq_gfn),
>> but some of them couldn't and seems to require to pull a lot of changes
>> to the Xen code (arch.hvm.params, arch.hvm.hvm_io), I am afraid.
>> Or I missed something?
> arch.hvm.params, iirc, is an x86 concept, and hence would need
> abstracting away anyway. I expect this will be common pattern:
> Either you want things to become generic (structure fields
> living directly in struct domain, or at least not under arch.hvm),
> or things need abstracting for per-arch handling.
Got it.

Let me please clarify one more question.
In order to avoid the layering violation in the current patch we could apply
a combined approach.

1. *arch.hvm.ioreq_gfn* and *arch.hvm.ioreq_server*: Both structs go 
into common struct domain.

2. *arch.hvm.params*: The two functions that use it
(hvm_alloc_legacy_ioreq_gfn/hvm_free_legacy_ioreq_gfn) either go into
arch code completely, or a specific macro is used in common code:

    #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])

   I would prefer the macro to moving the functions to arch code (they are
identical and should remain in sync).

3. *arch.hvm.hvm_io*: We could also use the following:

    #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
    #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)

    This way struct hvm_vcpu_io won't be used in common code either.

    Are #2 and #3 appropriate to go with?


A dirty, non-tested patch (which applies on top of the whole series and
targets Arm only) shows how it could look.


diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
index 2e85ea7..5894bdab 100644
--- a/xen/common/ioreq.c
+++ b/xen/common/ioreq.c
@@ -194,7 +194,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
  bool handle_hvm_io_completion(struct vcpu *v)
  {
      struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    ioreq_t io_req = ioreq_get_io_req(v);
      struct hvm_ioreq_server *s;
      struct hvm_ioreq_vcpu *sv;
      enum hvm_io_completion io_completion;
@@ -209,14 +209,14 @@ bool handle_hvm_io_completion(struct vcpu *v)
      if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
          return false;

-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+    io_req.state = hvm_ioreq_needs_completion(&io_req) ?
          STATE_IORESP_READY : STATE_IOREQ_NONE;

      msix_write_completion(v);
      vcpu_end_shutdown_deferral(v);

-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
+    io_completion = ioreq_get_io_completion(v);
+    ioreq_get_io_completion(v) = HVMIO_no_completion;

      switch ( io_completion )
      {
@@ -227,8 +227,8 @@ bool handle_hvm_io_completion(struct vcpu *v)
          return ioreq_handle_complete_mmio();

      case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
+        return handle_pio(io_req.addr, io_req.size,
+                          io_req.dir);

      default:
          return arch_handle_hvm_io_completion(io_completion);
@@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
      for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
      {
          if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
-            return _gfn(d->arch.hvm.params[i]);
+            return _gfn(ioreq_get_params(d, i));
      }

      return INVALID_GFN;
@@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,

      for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
      {
-        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
+        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
               break;
      }
      if ( i > HVM_PARAM_BUFIOREQ_PFN )
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index 0e3ef20..ff761f5 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -21,6 +21,8 @@ struct hvm_domain
      uint64_t              params[HVM_NR_PARAMS];
  };

+#define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])
+
  #ifdef CONFIG_ARM_64
  enum domain_type {
      DOMAIN_32BIT,
@@ -120,6 +122,9 @@ struct hvm_vcpu_io {
      unsigned long       mmio_gpfn;
  };

+#define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
+#define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
+
  struct arch_vcpu
  {
      struct {


>
>>> This goes
>>> alongside my suggestion to drop the "hvm" prefixes and infixes
>>> from involved function names.
>> Well, I assume this request as well as the request above should be
>> addressed in the follow-up patches, as we want to keep the code movement
>> in current patch as (almost) verbatim copy,
>> Am I correct?
> The renaming could imo be done before or after the move, but within
> a single series. Doing it (or some of it) during the move may be
> acceptable, but this primarily depends on the overall effect on the
> patch that this would have. I.e. the patch better wouldn't become
> gigantic just because all the renaming gets done in one go, and it's
> hundreds of places that need touching.

Got it as well.

Thank you for the explanation.
Jan Beulich Sept. 22, 2020, 3:52 p.m. UTC | #7
On 22.09.2020 17:05, Oleksandr wrote:
> 2. *arch.hvm.params*: Two functions that use it 
> (hvm_alloc_legacy_ioreq_gfn/hvm_free_legacy_ioreq_gfn) either go into 
> arch code completely or
>      specific macro is used in common code:
> 
>     #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])

If Arm has the concept of params, then perhaps. But I didn't think
Arm does ...

>     I would prefer macro than moving functions to arch code (which are 
> equal and should remain in sync).

Yes, if the rest of the code is identical, I agree it's better to
merely abstract away small pieces like this one.

> 3. *arch.hvm.hvm_io*: We could also use the following:
> 
>     #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>     #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
> 
>     This way struct hvm_vcpu_io won't be used in common code as well.

But if Arm needs similar field, why keep them in arch.hvm.hvm_io?

> --- a/xen/common/ioreq.c
> +++ b/xen/common/ioreq.c
> @@ -194,7 +194,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu 
> *sv, ioreq_t *p)
>   bool handle_hvm_io_completion(struct vcpu *v)
>   {
>       struct domain *d = v->domain;
> -    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
> +    ioreq_t io_req = ioreq_get_io_req(v);
>       struct hvm_ioreq_server *s;
>       struct hvm_ioreq_vcpu *sv;
>       enum hvm_io_completion io_completion;
> @@ -209,14 +209,14 @@ bool handle_hvm_io_completion(struct vcpu *v)
>       if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
>           return false;
> 
> -    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
> +    io_req.state = hvm_ioreq_needs_completion(&io_req) ?
>           STATE_IORESP_READY : STATE_IOREQ_NONE;

This is unlikely to be correct - you're now updating an on-stack
copy of the ioreq_t instead of what vio points at.
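
For clarity, the fix is to take the address of the ioreq_t rather than a
copy, along these lines (macro name as in the proposal above):

    /* Point at the real ioreq_t so the state update is visible. */
    ioreq_t *io_req = &ioreq_get_io_req(v);

    io_req->state = hvm_ioreq_needs_completion(io_req) ?
        STATE_IORESP_READY : STATE_IOREQ_NONE;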

>       msix_write_completion(v);
>       vcpu_end_shutdown_deferral(v);
> 
> -    io_completion = vio->io_completion;
> -    vio->io_completion = HVMIO_no_completion;
> +    io_completion = ioreq_get_io_completion(v);
> +    ioreq_get_io_completion(v) = HVMIO_no_completion;

I think it's at least odd to have an lvalue with this kind of a
name. Perhaps want to drop "get" if it's really meant to be used
like this.

> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct 
> hvm_ioreq_server *s)
>       for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>       {
>           if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
> -            return _gfn(d->arch.hvm.params[i]);
> +            return _gfn(ioreq_get_params(d, i));
>       }
> 
>       return INVALID_GFN;
> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct 
> hvm_ioreq_server *s,
> 
>       for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>       {
> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>                break;
>       }
>       if ( i > HVM_PARAM_BUFIOREQ_PFN )

And these two are needed by Arm? Shouldn't Arm exclusively use
the new model, via acquire_resource?

Jan
Oleksandr Tyshchenko Sept. 23, 2020, 12:28 p.m. UTC | #8
On 22.09.20 18:52, Jan Beulich wrote:

Hi Jan

> On 22.09.2020 17:05, Oleksandr wrote:
>> 2. *arch.hvm.params*: Two functions that use it
>> (hvm_alloc_legacy_ioreq_gfn/hvm_free_legacy_ioreq_gfn) either go into
>> arch code completely or
>>       specific macro is used in common code:
>>
>>      #define ioreq_get_params(d, i) ((d)->arch.hvm.params[i])
> If Arm has the concept of params, then perhaps. But I didn't think
> Arm does ...
I think it has, to some degree: there is handling of
HVMOP_set_param/HVMOP_get_param and
there is also code to set up HVM_PARAM_CALLBACK_IRQ.


>
>>      I would prefer macro than moving functions to arch code (which are
>> equal and should remain in sync).
> Yes, if the rest of the code is identical, I agree it's better to
> merely abstract away small pieces like this one.

ok


>
>> 3. *arch.hvm.hvm_io*: We could also use the following:
>>
>>      #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>>      #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
>>
>>      This way struct hvm_vcpu_io won't be used in common code as well.
> But if Arm needs similar field, why keep them in arch.hvm.hvm_io?
Yes, Arm needs *some* of the fields, but not all of them as x86 has.
For example, Arm needs only the following (at least in the context of
this series):

+struct hvm_vcpu_io {
+    /* I/O request in flight to device model. */
+    enum hvm_io_completion io_completion;
+    ioreq_t                io_req;
+
+    /*
+     * HVM emulation:
+     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
+     *  The latter is known to be an MMIO frame (not RAM).
+     *  This translation is only valid for accesses as per @mmio_access.
+     */
+    struct npfec        mmio_access;
+    unsigned long       mmio_gla;
+    unsigned long       mmio_gpfn;
+};

But for x86 the number of fields is quite a bit bigger. If they were
applicable to both archs in the same way (as is the case with the
ioreq_server struct), I would move it to the common struct domain. I didn't
come up with a better idea than abstracting accesses to the two fields used
in common ioreq.c with macros.


>
>> --- a/xen/common/ioreq.c
>> +++ b/xen/common/ioreq.c
>> @@ -194,7 +194,7 @@ static bool hvm_wait_for_io(struct hvm_ioreq_vcpu
>> *sv, ioreq_t *p)
>>    bool handle_hvm_io_completion(struct vcpu *v)
>>    {
>>        struct domain *d = v->domain;
>> -    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
>> +    ioreq_t io_req = ioreq_get_io_req(v);
>>        struct hvm_ioreq_server *s;
>>        struct hvm_ioreq_vcpu *sv;
>>        enum hvm_io_completion io_completion;
>> @@ -209,14 +209,14 @@ bool handle_hvm_io_completion(struct vcpu *v)
>>        if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
>>            return false;
>>
>> -    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
>> +    io_req.state = hvm_ioreq_needs_completion(&io_req) ?
>>            STATE_IORESP_READY : STATE_IOREQ_NONE;
> This is unlikely to be correct - you're now updating an on-stack
> copy of the ioreq_t instead of what vio points at.
Oh, thank you for pointing this out, I should have used ioreq_t *io_req =
&ioreq_get_io_req(v);
I don't like ioreq_get_io_req much; probably ioreq_req would sound a
little bit better?


>
>>        msix_write_completion(v);
>>        vcpu_end_shutdown_deferral(v);
>>
>> -    io_completion = vio->io_completion;
>> -    vio->io_completion = HVMIO_no_completion;
>> +    io_completion = ioreq_get_io_completion(v);
>> +    ioreq_get_io_completion(v) = HVMIO_no_completion;
> I think it's at least odd to have an lvalue with this kind of a
> name. Perhaps want to drop "get" if it's really meant to be used
> like this.

ok


>
>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>> hvm_ioreq_server *s)
>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>        {
>>            if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>> -            return _gfn(d->arch.hvm.params[i]);
>> +            return _gfn(ioreq_get_params(d, i));
>>        }
>>
>>        return INVALID_GFN;
>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>> hvm_ioreq_server *s,
>>
>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>        {
>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>                 break;
>>        }
>>        if ( i > HVM_PARAM_BUFIOREQ_PFN )
> And these two are needed by Arm? Shouldn't Arm exclusively use
> the new model, via acquire_resource?
I dropped the HVMOP plumbing on Arm, as was requested. Only the acquire
interface should be used.
This code is not supposed to be called on Arm, but it is part of the
common code, so we need to find a way to abstract away *arch.hvm.params*.
Am I correct?
Jan Beulich Sept. 24, 2020, 10:58 a.m. UTC | #9
On 23.09.2020 14:28, Oleksandr wrote:
> On 22.09.20 18:52, Jan Beulich wrote:
>> On 22.09.2020 17:05, Oleksandr wrote:
>>> 3. *arch.hvm.hvm_io*: We could also use the following:
>>>
>>>      #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>>>      #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
>>>
>>>      This way struct hvm_vcpu_io won't be used in common code as well.
>> But if Arm needs similar field, why keep them in arch.hvm.hvm_io?
> Yes, Arm needs the "some" fields, but not "all of them" as x86 has.
> For example Arm needs only the following (at least in the context of 
> this series):
> 
> +struct hvm_vcpu_io {
> +    /* I/O request in flight to device model. */
> +    enum hvm_io_completion io_completion;
> +    ioreq_t                io_req;
> +
> +    /*
> +     * HVM emulation:
> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
> +     *  The latter is known to be an MMIO frame (not RAM).
> +     *  This translation is only valid for accesses as per @mmio_access.
> +     */
> +    struct npfec        mmio_access;
> +    unsigned long       mmio_gla;
> +    unsigned long       mmio_gpfn;
> +};
> 
> But for x86 the number of fields is quite bigger. If they were in same 
> way applicable for both archs (as what we have with ioreq_server struct)
> I would move it to the common domain. I didn't think of a better idea 
> than just abstracting accesses to these (used in common ioreq.c) two 
> fields by macro.

I'm surprised Arm would need all three of the last fields that you
mention. Both here and ...

>>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>>> hvm_ioreq_server *s)
>>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>        {
>>>            if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>>> -            return _gfn(d->arch.hvm.params[i]);
>>> +            return _gfn(ioreq_get_params(d, i));
>>>        }
>>>
>>>        return INVALID_GFN;
>>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>>> hvm_ioreq_server *s,
>>>
>>>        for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>        {
>>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>>                 break;
>>>        }
>>>        if ( i > HVM_PARAM_BUFIOREQ_PFN )
>> And these two are needed by Arm? Shouldn't Arm exclusively use
>> the new model, via acquire_resource?
> I dropped HVMOP plumbing on Arm as it was requested. Only acquire 
> interface should be used.
> This code is not supposed to be called on Arm, but it is a part of 
> common code and we need to find a way how to abstract away *arch.hvm.params*

... here I wonder whether you aren't moving more pieces to common
code than are actually arch-independent. I think a prereq step
missing so far is to clearly identify which pieces of the code
are arch-independent, and work towards abstracting away all of the
arch-dependent ones.

Jan
Oleksandr Tyshchenko Sept. 24, 2020, 3:38 p.m. UTC | #10
On 24.09.20 13:58, Jan Beulich wrote:

Hi Jan

> On 23.09.2020 14:28, Oleksandr wrote:
>> On 22.09.20 18:52, Jan Beulich wrote:
>>> On 22.09.2020 17:05, Oleksandr wrote:
>>>> 3. *arch.hvm.hvm_io*: We could also use the following:
>>>>
>>>>       #define ioreq_get_io_completion(v) ((v)->arch.hvm.hvm_io.io_completion)
>>>>       #define ioreq_get_io_req(v) ((v)->arch.hvm.hvm_io.io_req)
>>>>
>>>>       This way struct hvm_vcpu_io won't be used in common code as well.
>>> But if Arm needs similar field, why keep them in arch.hvm.hvm_io?
>> Yes, Arm needs the "some" fields, but not "all of them" as x86 has.
>> For example Arm needs only the following (at least in the context of
>> this series):
>>
>> +struct hvm_vcpu_io {
>> +    /* I/O request in flight to device model. */
>> +    enum hvm_io_completion io_completion;
>> +    ioreq_t                io_req;
>> +
>> +    /*
>> +     * HVM emulation:
>> +     *  Linear address @mmio_gla maps to MMIO physical frame @mmio_gpfn.
>> +     *  The latter is known to be an MMIO frame (not RAM).
>> +     *  This translation is only valid for accesses as per @mmio_access.
>> +     */
>> +    struct npfec        mmio_access;
>> +    unsigned long       mmio_gla;
>> +    unsigned long       mmio_gpfn;
>> +};
>>
>> But for x86 the number of fields is quite bigger. If they were in same
>> way applicable for both archs (as what we have with ioreq_server struct)
>> I would move it to the common domain. I didn't think of a better idea
>> than just abstracting accesses to these (used in common ioreq.c) two
>> fields by macro.
> I'm surprised Arm would need all the three last fields that you
> mention. Both here and ...
>
>>>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>>>> hvm_ioreq_server *s)
>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>         {
>>>>             if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>>>> -            return _gfn(d->arch.hvm.params[i]);
>>>> +            return _gfn(ioreq_get_params(d, i));
>>>>         }
>>>>
>>>>         return INVALID_GFN;
>>>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>>>> hvm_ioreq_server *s,
>>>>
>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>         {
>>>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>>>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>>>                  break;
>>>>         }
>>>>         if ( i > HVM_PARAM_BUFIOREQ_PFN )
>>> And these two are needed by Arm? Shouldn't Arm exclusively use
>>> the new model, via acquire_resource?
>> I dropped HVMOP plumbing on Arm as it was requested. Only acquire
>> interface should be used.
>> This code is not supposed to be called on Arm, but it is a part of
>> common code and we need to find a way how to abstract away *arch.hvm.params*
> ... here I wonder whether you aren't moving more pieces to common
> code than are actually arch-independent. I think a prereq step
> missing so far is to clearly identify which pieces of the code
> are arch-independent, and work towards abstracting away all of the
> arch-dependent ones.
Unfortunately, not all things are clear and obvious from the very beginning.
I have to admit I didn't even realize earlier that *arch.hvm.* usage in
the common code is a layering violation.
Hopefully it is clear now, as are the steps to avoid it in the future.

...


I saw your advice (but haven't answered there yet) regarding splitting
struct hvm_vcpu_io in
[PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM
features. I think it makes sense.
The only remaining bit I would like to clarify here is *arch.hvm.params*.
Do we really want to move the HVM params field to the common code rather
than abstracting it away with the proposed macro? Although it stores a few
IOREQ params, it is not (completely) IOREQ stuff and is specific to the
architecture.
Jan Beulich Sept. 24, 2020, 3:51 p.m. UTC | #11
On 24.09.2020 17:38, Oleksandr wrote:
> On 24.09.20 13:58, Jan Beulich wrote:
>> On 23.09.2020 14:28, Oleksandr wrote:
>>> On 22.09.20 18:52, Jan Beulich wrote:
>>>> On 22.09.2020 17:05, Oleksandr wrote:
>>>>> @@ -247,7 +247,7 @@ static gfn_t hvm_alloc_legacy_ioreq_gfn(struct
>>>>> hvm_ioreq_server *s)
>>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>>         {
>>>>>             if ( !test_and_clear_bit(i, &d->ioreq_gfn.legacy_mask) )
>>>>> -            return _gfn(d->arch.hvm.params[i]);
>>>>> +            return _gfn(ioreq_get_params(d, i));
>>>>>         }
>>>>>
>>>>>         return INVALID_GFN;
>>>>> @@ -279,7 +279,7 @@ static bool hvm_free_legacy_ioreq_gfn(struct
>>>>> hvm_ioreq_server *s,
>>>>>
>>>>>         for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
>>>>>         {
>>>>> -        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
>>>>> +        if ( gfn_eq(gfn, _gfn(ioreq_get_params(d, i))) )
>>>>>                  break;
>>>>>         }
>>>>>         if ( i > HVM_PARAM_BUFIOREQ_PFN )
>>>> And these two are needed by Arm? Shouldn't Arm exclusively use
>>>> the new model, via acquire_resource?
>>> I dropped HVMOP plumbing on Arm as it was requested. Only acquire
>>> interface should be used.
>>> This code is not supposed to be called on Arm, but it is a part of
>>> common code and we need to find a way how to abstract away *arch.hvm.params*
>> ... here I wonder whether you aren't moving more pieces to common
>> code than are actually arch-independent. I think a prereq step
>> missing so far is to clearly identify which pieces of the code
>> are arch-independent, and work towards abstracting away all of the
>> arch-dependent ones.
> Unfortunately, not all things are clear and obvious from the very beginning.
> I have to admit, I didn't even imagine earlier that *arch.hvm.* usage in 
> the common code is a layering violation issue.
> Hopefully, now it is clear as well as the steps to avoid it in future.
> 
> ...
> 
> 
> I saw your advise (but haven't answered yet there) regarding splitting 
> struct hvm_vcpu_io in
> [PATCH V1 09/16] arm/ioreq: Introduce arch specific bits for IOREQ/DM 
> features. I think, it makes sense.
> The only remaining bits I would like to clarify here is 
> *arch.hvm.params*. Should we really want to move HVM params field to the 
> common code
> rather than abstracting away by proposed macro?

I don't think I suggested doing so. In fact I recall having voiced
my expectation that Arm wouldn't use this at all. So yes, I agree
this better wouldn't be moved out of arch.hvm, but instead accesses
be abstracted by another means.

Jan
Julien Grall Sept. 24, 2020, 6:01 p.m. UTC | #12
On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> +{
> +    unsigned int prev_state = STATE_IOREQ_NONE;
> +    unsigned int state = p->state;
> +    uint64_t data = ~0;
> +
> +    smp_rmb();
> +
> +    /*
> +     * The only reason we should see this condition be false is when an
> +     * emulator dying races with I/O being requested.
> +     */
> +    while ( likely(state != STATE_IOREQ_NONE) )
> +    {
> +        if ( unlikely(state < prev_state) )
> +        {
> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> +                     prev_state, state);
> +            sv->pending = false;
> +            domain_crash(sv->vcpu->domain);
> +            return false; /* bail */
> +        }
> +
> +        switch ( prev_state = state )
> +        {
> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> +            p->state = STATE_IOREQ_NONE;
> +            data = p->data;
> +            break;
> +
> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> +        case STATE_IOREQ_INPROCESS:
> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> +                                      ({ state = p->state;
> +                                         smp_rmb();
> +                                         state != prev_state; }));
> +            continue;

As I pointed out previously [1], this helper was implemented with the 
expectation that wait_on_xen_event_channel() will not return if the vCPU 
got rescheduled.

However, this assumption doesn't hold on Arm.

I can see two solutions:
    1) Re-execute the caller
    2) Prevent an IOREQ from disappearing until the loop finishes.

@Paul any opinions?

Cheers,

[1] 
https://lore.kernel.org/xen-devel/6bfc3920-8f29-188c-cff4-2b99dabe166f@xen.org/
Paul Durrant Sept. 25, 2020, 8:19 a.m. UTC | #13
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 24 September 2020 19:01
> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; xen-devel@lists.xenproject.org
> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Roger Pau
> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun Nakajima <jun.nakajima@intel.com>;
> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien Grall <julien.grall@arm.com>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> 
> 
> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> > +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> > +{
> > +    unsigned int prev_state = STATE_IOREQ_NONE;
> > +    unsigned int state = p->state;
> > +    uint64_t data = ~0;
> > +
> > +    smp_rmb();
> > +
> > +    /*
> > +     * The only reason we should see this condition be false is when an
> > +     * emulator dying races with I/O being requested.
> > +     */
> > +    while ( likely(state != STATE_IOREQ_NONE) )
> > +    {
> > +        if ( unlikely(state < prev_state) )
> > +        {
> > +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
> > +                     prev_state, state);
> > +            sv->pending = false;
> > +            domain_crash(sv->vcpu->domain);
> > +            return false; /* bail */
> > +        }
> > +
> > +        switch ( prev_state = state )
> > +        {
> > +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> > +            p->state = STATE_IOREQ_NONE;
> > +            data = p->data;
> > +            break;
> > +
> > +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
> > +        case STATE_IOREQ_INPROCESS:
> > +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> > +                                      ({ state = p->state;
> > +                                         smp_rmb();
> > +                                         state != prev_state; }));
> > +            continue;
> 
> As I pointed out previously [1], this helper was implemented with the
> expectation that wait_on_xen_event_channel() will not return if the vCPU
> got rescheduled.
> 
> However, this assumption doesn't hold on Arm.
> 
> I can see two solution:
>     1) Re-execute the caller
>     2) Prevent an IOREQ to disappear until the loop finish.
> 
> @Paul any opinions?

The ioreq control plane is largely predicated on there being no pending I/O when the state of a server is modified, and it is assumed that domain_pause() is sufficient to achieve this. If that assumption doesn't hold then we need additional synchronization.

  Paul

> 
> Cheers,
> 
> [1]
> https://lore.kernel.org/xen-devel/6bfc3920-8f29-188c-cff4-2b99dabe166f@xen.org/
> 
> 
> --
> Julien Grall
Oleksandr Tyshchenko Sept. 30, 2020, 1:39 p.m. UTC | #14
Hi Julien

On 25.09.20 11:19, Paul Durrant wrote:
>> -----Original Message-----
>> From: Julien Grall <julien@xen.org>
>> Sent: 24 September 2020 19:01
>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; xen-devel@lists.xenproject.org
>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew Cooper <andrew.cooper3@citrix.com>;
>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson <ian.jackson@eu.citrix.com>; Jan Beulich
>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei Liu <wl@xen.org>; Roger Pau
>> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun Nakajima <jun.nakajima@intel.com>;
>> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien Grall <julien.grall@arm.com>
>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
>>
>>
>>
>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>> +{
>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
>>> +    unsigned int state = p->state;
>>> +    uint64_t data = ~0;
>>> +
>>> +    smp_rmb();
>>> +
>>> +    /*
>>> +     * The only reason we should see this condition be false is when an
>>> +     * emulator dying races with I/O being requested.
>>> +     */
>>> +    while ( likely(state != STATE_IOREQ_NONE) )
>>> +    {
>>> +        if ( unlikely(state < prev_state) )
>>> +        {
>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
>>> +                     prev_state, state);
>>> +            sv->pending = false;
>>> +            domain_crash(sv->vcpu->domain);
>>> +            return false; /* bail */
>>> +        }
>>> +
>>> +        switch ( prev_state = state )
>>> +        {
>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
>>> +            p->state = STATE_IOREQ_NONE;
>>> +            data = p->data;
>>> +            break;
>>> +
>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
>>> +        case STATE_IOREQ_INPROCESS:
>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
>>> +                                      ({ state = p->state;
>>> +                                         smp_rmb();
>>> +                                         state != prev_state; }));
>>> +            continue;
>> As I pointed out previously [1], this helper was implemented with the
>> expectation that wait_on_xen_event_channel() will not return if the vCPU
>> got rescheduled.
>>
>> However, this assumption doesn't hold on Arm.
>>
>> I can see two solution:
>>      1) Re-execute the caller
>>      2) Prevent an IOREQ to disappear until the loop finish.
>>
>> @Paul any opinions?
> The ioreq control plane is largely predicated on there being no pending I/O when the state of a server is modified, and it is assumed that domain_pause() is sufficient to achieve this. If that assumption doesn't hold then we need additional synchronization.
>
>    Paul
>
May I please clarify whether the concern still stands (given what was said
above) and whether we need additional synchronization on Arm?
Julien Grall Sept. 30, 2020, 5:47 p.m. UTC | #15
Hi,

On 30/09/2020 14:39, Oleksandr wrote:
> 
> Hi Julien
> 
> On 25.09.20 11:19, Paul Durrant wrote:
>>> -----Original Message-----
>>> From: Julien Grall <julien@xen.org>
>>> Sent: 24 September 2020 19:01
>>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>; 
>>> xen-devel@lists.xenproject.org
>>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew 
>>> Cooper <andrew.cooper3@citrix.com>;
>>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson 
>>> <ian.jackson@eu.citrix.com>; Jan Beulich
>>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei 
>>> Liu <wl@xen.org>; Roger Pau
>>> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun 
>>> Nakajima <jun.nakajima@intel.com>;
>>> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien 
>>> Grall <julien.grall@arm.com>
>>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
>>>
>>>
>>>
>>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
>>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
>>>> +{
>>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
>>>> +    unsigned int state = p->state;
>>>> +    uint64_t data = ~0;
>>>> +
>>>> +    smp_rmb();
>>>> +
>>>> +    /*
>>>> +     * The only reason we should see this condition be false is 
>>>> when an
>>>> +     * emulator dying races with I/O being requested.
>>>> +     */
>>>> +    while ( likely(state != STATE_IOREQ_NONE) )
>>>> +    {
>>>> +        if ( unlikely(state < prev_state) )
>>>> +        {
>>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition 
>>>> %u -> %u\n",
>>>> +                     prev_state, state);
>>>> +            sv->pending = false;
>>>> +            domain_crash(sv->vcpu->domain);
>>>> +            return false; /* bail */
>>>> +        }
>>>> +
>>>> +        switch ( prev_state = state )
>>>> +        {
>>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
>>>> +            p->state = STATE_IOREQ_NONE;
>>>> +            data = p->data;
>>>> +            break;
>>>> +
>>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> 
>>>> IORESP_READY */
>>>> +        case STATE_IOREQ_INPROCESS:
>>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
>>>> +                                      ({ state = p->state;
>>>> +                                         smp_rmb();
>>>> +                                         state != prev_state; }));
>>>> +            continue;
>>> As I pointed out previously [1], this helper was implemented with the
>>> expectation that wait_on_xen_event_channel() will not return if the vCPU
>>> got rescheduled.
>>>
>>> However, this assumption doesn't hold on Arm.
>>>
>>> I can see two solution:
>>>      1) Re-execute the caller
>>>      2) Prevent an IOREQ to disappear until the loop finish.
>>>
>>> @Paul any opinions?
>> The ioreq control plane is largely predicated on there being no 
>> pending I/O when the state of a server is modified, and it is assumed 
>> that domain_pause() is sufficient to achieve this. If that assumption 
>> doesn't hold then we need additional synchronization.

I don't think this assumption even holds on x86, because domain_pause()
will not wait for I/O to finish.

On x86, the context switch will reset the stack and therefore 
wait_on_xen_event_channel() is not going to return. Instead, 
handle_hvm_io_completion() will be called from the tail callback in 
context_switch(). get_pending_vcpu() would return NULL as the IOREQ 
server disappeared. It is not clear, though, whether the vCPU will
continue to run (or not).

Did I miss anything?

Regarding the fix itself, I am not sure what sort of synchronization we
can do. Are you suggesting we wait for the I/O to complete? If so, how
do we handle the case where the IOREQ server died?

> May I please clarify whether a concern still stands (with what was said 
> above) and we need an additional synchronization on Arm?

Yes the concern is still there (See above).

Cheers,
Paul Durrant Oct. 1, 2020, 6:59 a.m. UTC | #16
> -----Original Message-----
> From: Julien Grall <julien@xen.org>
> Sent: 30 September 2020 18:48
> To: Oleksandr <olekstysh@gmail.com>; xen-devel@lists.xenproject.org
> Cc: paul@xen.org; 'Oleksandr Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Andrew Cooper'
> <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson'
> <ian.jackson@eu.citrix.com>; 'Jan Beulich' <jbeulich@suse.com>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>; 'Jun
> Nakajima' <jun.nakajima@intel.com>; 'Kevin Tian' <kevin.tian@intel.com>; 'Tim Deegan' <tim@xen.org>;
> 'Julien Grall' <julien.grall@arm.com>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> Hi,
> 
> On 30/09/2020 14:39, Oleksandr wrote:
> >
> > Hi Julien
> >
> > On 25.09.20 11:19, Paul Durrant wrote:
> >>> -----Original Message-----
> >>> From: Julien Grall <julien@xen.org>
> >>> Sent: 24 September 2020 19:01
> >>> To: Oleksandr Tyshchenko <olekstysh@gmail.com>;
> >>> xen-devel@lists.xenproject.org
> >>> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>; Andrew
> >>> Cooper <andrew.cooper3@citrix.com>;
> >>> George Dunlap <george.dunlap@citrix.com>; Ian Jackson
> >>> <ian.jackson@eu.citrix.com>; Jan Beulich
> >>> <jbeulich@suse.com>; Stefano Stabellini <sstabellini@kernel.org>; Wei
> >>> Liu <wl@xen.org>; Roger Pau
> >>> Monné <roger.pau@citrix.com>; Paul Durrant <paul@xen.org>; Jun
> >>> Nakajima <jun.nakajima@intel.com>;
> >>> Kevin Tian <kevin.tian@intel.com>; Tim Deegan <tim@xen.org>; Julien
> >>> Grall <julien.grall@arm.com>
> >>> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> >>>
> >>>
> >>>
> >>> On 10/09/2020 21:21, Oleksandr Tyshchenko wrote:
> >>>> +static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
> >>>> +{
> >>>> +    unsigned int prev_state = STATE_IOREQ_NONE;
> >>>> +    unsigned int state = p->state;
> >>>> +    uint64_t data = ~0;
> >>>> +
> >>>> +    smp_rmb();
> >>>> +
> >>>> +    /*
> >>>> +     * The only reason we should see this condition be false is
> >>>> when an
> >>>> +     * emulator dying races with I/O being requested.
> >>>> +     */
> >>>> +    while ( likely(state != STATE_IOREQ_NONE) )
> >>>> +    {
> >>>> +        if ( unlikely(state < prev_state) )
> >>>> +        {
> >>>> +            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition
> >>>> %u -> %u\n",
> >>>> +                     prev_state, state);
> >>>> +            sv->pending = false;
> >>>> +            domain_crash(sv->vcpu->domain);
> >>>> +            return false; /* bail */
> >>>> +        }
> >>>> +
> >>>> +        switch ( prev_state = state )
> >>>> +        {
> >>>> +        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
> >>>> +            p->state = STATE_IOREQ_NONE;
> >>>> +            data = p->data;
> >>>> +            break;
> >>>> +
> >>>> +        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} ->
> >>>> IORESP_READY */
> >>>> +        case STATE_IOREQ_INPROCESS:
> >>>> +            wait_on_xen_event_channel(sv->ioreq_evtchn,
> >>>> +                                      ({ state = p->state;
> >>>> +                                         smp_rmb();
> >>>> +                                         state != prev_state; }));
> >>>> +            continue;
> >>> As I pointed out previously [1], this helper was implemented with the
> >>> expectation that wait_on_xen_event_channel() will not return if the vCPU
> >>> got rescheduled.
> >>>
> >>> However, this assumption doesn't hold on Arm.
> >>>
> >>> I can see two solutions:
> >>>      1) Re-execute the caller
> >>>      2) Prevent an IOREQ from disappearing until the loop finishes.
> >>>
> >>> @Paul any opinions?
> >> The ioreq control plane is largely predicated on there being no
> >> pending I/O when the state of a server is modified, and it is assumed
> >> that domain_pause() is sufficient to achieve this. If that assumption
> >> doesn't hold then we need additional synchronization.
> 
> I don't think this assumption even holds on x86 because domain_pause()
> will not wait for I/O to finish.
> 
> On x86, the context switch will reset the stack and therefore
> wait_on_xen_event_channel() is not going to return. Instead,
> handle_hvm_io_completion() will be called from the tail callback in
> context_switch(). get_pending_vcpu() would return NULL as the IOREQ
> server disappeared. It is not clear, though, whether the vCPU will
> continue to run (or not).
> 
> Did I miss anything?
> 
> Regarding the fix itself, I am not sure what sort of synchronization we
> can do. Are you suggesting that we wait for the I/O to complete? If so, how
> do we handle the case where the IOREQ server has died?
> 

s/IOREQ server/emulator, but that is a good point. If domain_pause() did wait for I/O to complete then this would always have been a problem, so, with hindsight, it should have been obvious that this was not the case.

Digging back, it looks like things would probably have been ok before 125833f5f1f0 "x86: fix ioreq-server event channel vulnerability", because wait_on_xen_event_channel() and the loop condition above it did not dereference anything that would disappear with IOREQ server destruction (they used the shared page, which at that point was always a magic page and hence part of the target domain's memory). So things have probably been broken since 2014.

To fix the problem I think it is sufficient that we go back to a wait loop that can tolerate the IOREQ server disappearing between iterations and deal with that as a completed emulation (albeit returning all-f's for reads and sinking writes); a rough sketch of the idea follows below.

  Paul
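
For illustration only, a simplified sketch of such a tolerant loop
(barriers omitted, function shape and naming assumed; this is not the
patch that was eventually written):

static bool wait_for_io_tolerant(struct vcpu *v)
{
    uint64_t data = ~0;            /* reads observe all-ones if the server vanishes */
    struct hvm_ioreq_server *s = NULL;
    struct hvm_ioreq_vcpu *sv;
    ioreq_t *p;

    /* Redo the lookup on every iteration: the server may have gone away. */
    while ( (sv = get_pending_vcpu(v, &s)) != NULL )
    {
        p = get_ioreq(s, v);

        switch ( p->state )
        {
        case STATE_IORESP_READY:   /* the emulator answered */
            data = p->data;
            p->state = STATE_IOREQ_NONE;
            sv->pending = false;
            break;

        case STATE_IOREQ_READY:
        case STATE_IOREQ_INPROCESS:
            /* On Arm this may return early if the vCPU was rescheduled. */
            wait_on_xen_event_channel(sv->ioreq_evtchn,
                                      p->state != STATE_IOREQ_READY &&
                                      p->state != STATE_IOREQ_INPROCESS);
            break;

        default:
            domain_crash(v->domain);
            return false;
        }
    }

    /* Completed normally, or the server disappeared mid-request. */
    p = &v->arch.hvm.hvm_io.io_req;
    if ( hvm_ioreq_needs_completion(p) )
        p->data = data;

    return true;
}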

> > May I please clarify whether a concern still stands (with what was said
> > above) and whether we need additional synchronization on Arm?
> 
> Yes, the concern is still there (see above).
> 
> Cheers,
> 
> --
> Julien Grall
Jan Beulich Oct. 1, 2020, 8:49 a.m. UTC | #17
On 30.09.2020 19:47, Julien Grall wrote:
> Regarding the fix itself, I am not sure what sort of synchronization we 
> can do. Are you suggesting that we wait for the I/O to complete? If so, how 
> do we handle the case where the IOREQ server has died?

In simple cases retrying the entire request may be an option. However,
if the server died after some parts of a multi-part operation were
done already, I guess the resulting loss of state is bad enough to
warrant crashing the guest. This shouldn't be much different from e.g.
a device disappearing from a bare metal system - any partial I/O done
to/from it will leave the machine in an unpredictable state, which it
may be too difficult to recover from without rebooting. (Of course,
staying with this analogy, it may also be okay to simply consider
the operation "complete", leaving it to the guest to recover. The
main issue on the hypervisor side then would be to ensure we don't
expose any uninitialized [due to not having got written to] data to
the guest.)

Jan
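
Again only as a sketch (the helper and the "partially_done" flag are
hypothetical, assumed to be tracked elsewhere), the two options could
look roughly like:

static bool ioreq_server_gone(struct vcpu *v, ioreq_t *p, bool partially_done)
{
    /*
     * A multi-part operation that was already partially carried out has
     * lost state we cannot reconstruct: give up on the guest.
     */
    if ( partially_done )
    {
        domain_crash(v->domain);
        return false;
    }

    /*
     * Otherwise consider the request complete.  Reads observe all-ones,
     * so no uninitialized data is exposed to the guest; writes are sunk.
     */
    if ( p->dir == IOREQ_READ )
        p->data = ~0;
    p->state = STATE_IOREQ_NONE;

    return true;
}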
Paul Durrant Oct. 1, 2020, 8:50 a.m. UTC | #18
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 01 October 2020 09:49
> To: Julien Grall <julien@xen.org>
> Cc: Oleksandr <olekstysh@gmail.com>; xen-devel@lists.xenproject.org; paul@xen.org; 'Oleksandr
> Tyshchenko' <oleksandr_tyshchenko@epam.com>; 'Andrew Cooper' <andrew.cooper3@citrix.com>; 'George
> Dunlap' <george.dunlap@citrix.com>; 'Ian Jackson' <ian.jackson@eu.citrix.com>; 'Stefano Stabellini'
> <sstabellini@kernel.org>; 'Wei Liu' <wl@xen.org>; 'Roger Pau Monné' <roger.pau@citrix.com>; 'Jun
> Nakajima' <jun.nakajima@intel.com>; 'Kevin Tian' <kevin.tian@intel.com>; 'Tim Deegan' <tim@xen.org>;
> 'Julien Grall' <julien.grall@arm.com>
> Subject: Re: [PATCH V1 02/16] xen/ioreq: Make x86's IOREQ feature common
> 
> On 30.09.2020 19:47, Julien Grall wrote:
> > Regarding the fix itself, I am not sure what sort of synchronization we
> > can do. Are you suggesting that we wait for the I/O to complete? If so, how
> > do we handle the case where the IOREQ server has died?
> 
> In simple cases retrying the entire request may be an option. However,
> if the server died after some parts of a multi-part operation were
> done already, I guess the resulting loss of state is bad enough to
> warrant crashing the guest. This shouldn't be much different from e.g.
> a device disappearing from a bare metal system - any partial I/O done
> to/from it will leave the machine in an unpredictable state, which it
> may be too difficult to recover from without rebooting. (Of course,
> staying with this analogy, it may also be okay to simply consider
> the operation "complete", leaving it to the guest to recover. The
> main issue on the hypervisor side then would be to ensure we don't
> expose any uninitialized [due to not having got written to] data to
> the guest.)
> 

I'll try to take a look today and come up with a patch.

  Paul
diff mbox series

Patch

diff --git a/MAINTAINERS b/MAINTAINERS
index 33fe513..72ba472 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -333,6 +333,13 @@  X:	xen/drivers/passthrough/vtd/
 X:	xen/drivers/passthrough/device_tree.c
 F:	xen/include/xen/iommu.h
 
+I/O EMULATION (IOREQ)
+M:	Paul Durrant <paul@xen.org>
+S:	Supported
+F:	xen/common/ioreq.c
+F:	xen/include/xen/ioreq.h
+F:	xen/include/public/hvm/ioreq.h
+
 KCONFIG
 M:	Doug Goldstein <cardoe@cardoe.com>
 S:	Supported
@@ -549,7 +556,6 @@  F:	xen/arch/x86/hvm/ioreq.c
 F:	xen/include/asm-x86/hvm/emulate.h
 F:	xen/include/asm-x86/hvm/io.h
 F:	xen/include/asm-x86/hvm/ioreq.h
-F:	xen/include/public/hvm/ioreq.h
 
 X86 MEMORY MANAGEMENT
 M:	Jan Beulich <jbeulich@suse.com>
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index a636a4b..f5a9f87 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -91,6 +91,7 @@  config PV_LINEAR_PT
 
 config HVM
 	def_bool !PV_SHIM_EXCLUSIVE
+	select IOREQ_SERVER
 	prompt "HVM support"
 	---help---
 	  Interfaces to support HVM domains.  HVM domains require hardware
diff --git a/xen/arch/x86/hvm/dm.c b/xen/arch/x86/hvm/dm.c
index 9930d68..5ce484a 100644
--- a/xen/arch/x86/hvm/dm.c
+++ b/xen/arch/x86/hvm/dm.c
@@ -17,12 +17,12 @@ 
 #include <xen/event.h>
 #include <xen/guest_access.h>
 #include <xen/hypercall.h>
+#include <xen/ioreq.h>
 #include <xen/nospec.h>
 #include <xen/sched.h>
 
 #include <asm/hap.h>
 #include <asm/hvm/cacheattr.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/shadow.h>
 
 #include <xsm/xsm.h>
diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 8b4e73a..39bdf8d 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -10,6 +10,7 @@ 
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/sched.h>
 #include <xen/paging.h>
@@ -20,7 +21,6 @@ 
 #include <asm/xstate.h>
 #include <asm/hvm/emulate.h>
 #include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/monitor.h>
 #include <asm/hvm/trace.h>
 #include <asm/hvm/support.h>
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index a9d1685..498e0e0 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -20,6 +20,7 @@ 
 
 #include <xen/ctype.h>
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/lib.h>
 #include <xen/trace.h>
 #include <xen/sched.h>
@@ -64,7 +65,6 @@ 
 #include <asm/hvm/trace.h>
 #include <asm/hvm/nestedhvm.h>
 #include <asm/hvm/monitor.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/viridian.h>
 #include <asm/hvm/vm_event.h>
 #include <asm/altp2m.h>
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index 724ab44..14f8c89 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -19,6 +19,7 @@ 
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/mm.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
@@ -35,7 +36,6 @@ 
 #include <asm/shadow.h>
 #include <asm/p2m.h>
 #include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vpt.h>
 #include <asm/hvm/vpic.h>
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index d912655..102b758 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -16,1086 +16,39 @@ 
  * this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
-#include <xen/ctype.h>
-#include <xen/init.h>
-#include <xen/lib.h>
-#include <xen/trace.h>
-#include <xen/sched.h>
-#include <xen/irq.h>
-#include <xen/softirq.h>
 #include <xen/domain.h>
-#include <xen/event.h>
-#include <xen/paging.h>
-#include <xen/vpci.h>
+#include <xen/ioreq.h>
 
-#include <asm/hvm/emulate.h>
-#include <asm/hvm/hvm.h>
-#include <asm/hvm/ioreq.h>
-#include <asm/hvm/vmx/vmx.h>
-
-#include <public/hvm/ioreq.h>
-#include <public/hvm/params.h>
-
-static void set_ioreq_server(struct domain *d, unsigned int id,
-                             struct hvm_ioreq_server *s)
-{
-    ASSERT(id < MAX_NR_IOREQ_SERVERS);
-    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
-
-    d->arch.hvm.ioreq_server.server[id] = s;
-}
-
-#define GET_IOREQ_SERVER(d, id) \
-    (d)->arch.hvm.ioreq_server.server[id]
-
-static struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
-                                                 unsigned int id)
-{
-    if ( id >= MAX_NR_IOREQ_SERVERS )
-        return NULL;
-
-    return GET_IOREQ_SERVER(d, id);
-}
-
-/*
- * Iterate over all possible ioreq servers.
- *
- * NOTE: The iteration is backwards such that more recently created
- *       ioreq servers are favoured in hvm_select_ioreq_server().
- *       This is a semantic that previously existed when ioreq servers
- *       were held in a linked list.
- */
-#define FOR_EACH_IOREQ_SERVER(d, id, s) \
-    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
-        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
-            continue; \
-        else
-
-static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
-{
-    shared_iopage_t *p = s->ioreq.va;
-
-    ASSERT((v == current) || !vcpu_runnable(v));
-    ASSERT(p != NULL);
-
-    return &p->vcpu_ioreq[v->vcpu_id];
-}
-
-static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
-                                               struct hvm_ioreq_server **srvp)
-{
-    struct domain *d = v->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct hvm_ioreq_vcpu *sv;
-
-        list_for_each_entry ( sv,
-                              &s->ioreq_vcpu_list,
-                              list_entry )
-        {
-            if ( sv->vcpu == v && sv->pending )
-            {
-                if ( srvp )
-                    *srvp = s;
-                return sv;
-            }
-        }
-    }
-
-    return NULL;
-}
-
-bool hvm_io_pending(struct vcpu *v)
-{
-    return get_pending_vcpu(v, NULL);
-}
-
-static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
-{
-    unsigned int prev_state = STATE_IOREQ_NONE;
-    unsigned int state = p->state;
-    uint64_t data = ~0;
-
-    smp_rmb();
-
-    /*
-     * The only reason we should see this condition be false is when an
-     * emulator dying races with I/O being requested.
-     */
-    while ( likely(state != STATE_IOREQ_NONE) )
-    {
-        if ( unlikely(state < prev_state) )
-        {
-            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
-                     prev_state, state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        switch ( prev_state = state )
-        {
-        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
-            p->state = STATE_IOREQ_NONE;
-            data = p->data;
-            break;
-
-        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
-        case STATE_IOREQ_INPROCESS:
-            wait_on_xen_event_channel(sv->ioreq_evtchn,
-                                      ({ state = p->state;
-                                         smp_rmb();
-                                         state != prev_state; }));
-            continue;
-
-        default:
-            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
-            sv->pending = false;
-            domain_crash(sv->vcpu->domain);
-            return false; /* bail */
-        }
-
-        break;
-    }
-
-    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
-    if ( hvm_ioreq_needs_completion(p) )
-        p->data = data;
-
-    sv->pending = false;
-
-    return true;
-}
-
-bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
-{
-    switch ( io_completion )
-    {
-    case HVMIO_realmode_completion:
-    {
-        struct hvm_emulate_ctxt ctxt;
-
-        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
-        vmx_realmode_emulate_one(&ctxt);
-        hvm_emulate_writeback(&ctxt);
-
-        break;
-    }
-
-    default:
-        ASSERT_UNREACHABLE();
-        break;
-    }
-
-    return true;
-}
-
-bool handle_hvm_io_completion(struct vcpu *v)
-{
-    struct domain *d = v->domain;
-    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
-    struct hvm_ioreq_server *s;
-    struct hvm_ioreq_vcpu *sv;
-    enum hvm_io_completion io_completion;
-
-    if ( has_vpci(d) && vpci_process_pending(v) )
-    {
-        raise_softirq(SCHEDULE_SOFTIRQ);
-        return false;
-    }
-
-    sv = get_pending_vcpu(v, &s);
-    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
-        return false;
-
-    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
-        STATE_IORESP_READY : STATE_IOREQ_NONE;
-
-    msix_write_completion(v);
-    vcpu_end_shutdown_deferral(v);
-
-    io_completion = vio->io_completion;
-    vio->io_completion = HVMIO_no_completion;
-
-    switch ( io_completion )
-    {
-    case HVMIO_no_completion:
-        break;
-
-    case HVMIO_mmio_completion:
-        return handle_mmio();
-
-    case HVMIO_pio_completion:
-        return handle_pio(vio->io_req.addr, vio->io_req.size,
-                          vio->io_req.dir);
-
-    default:
-        return arch_handle_hvm_io_completion(io_completion);
-    }
-
-    return true;
-}
-
-static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1);
-
-    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
-    {
-        if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) )
-            return _gfn(d->arch.hvm.params[i]);
-    }
-
-    return INVALID_GFN;
-}
-
-static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ )
-    {
-        if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) )
-            return _gfn(d->arch.hvm.ioreq_gfn.base + i);
-    }
-
-    /*
-     * If we are out of 'normal' GFNs then we may still have a 'legacy'
-     * GFN available.
-     */
-    return hvm_alloc_legacy_ioreq_gfn(s);
-}
-
-static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
-                                      gfn_t gfn)
-{
-    struct domain *d = s->target;
-    unsigned int i;
-
-    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
-    {
-        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
-             break;
-    }
-    if ( i > HVM_PARAM_BUFIOREQ_PFN )
-        return false;
-
-    set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask);
-    return true;
-}
-
-static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
-{
-    struct domain *d = s->target;
-    unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
-
-    ASSERT(!gfn_eq(gfn, INVALID_GFN));
-
-    if ( !hvm_free_legacy_ioreq_gfn(s, gfn) )
-    {
-        ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8);
-        set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
-    }
-}
-
-static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return;
-
-    destroy_ring_for_helper(&iorp->va, iorp->page);
-    iorp->page = NULL;
-
-    hvm_free_ioreq_gfn(s, iorp->gfn);
-    iorp->gfn = INVALID_GFN;
-}
-
-static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    int rc;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a page has already been allocated (which will happen on
-         * demand if hvm_get_ioreq_server_frame() is called), then
-         * mapping a guest frame is not permitted.
-         */
-        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    if ( d->is_dying )
-        return -EINVAL;
-
-    iorp->gfn = hvm_alloc_ioreq_gfn(s);
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return -ENOMEM;
-
-    rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
-                                 &iorp->va);
-
-    if ( rc )
-        hvm_unmap_ioreq_gfn(s, buf);
-
-    return rc;
-}
-
-static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page;
-
-    if ( iorp->page )
-    {
-        /*
-         * If a guest frame has already been mapped (which may happen
-         * on demand if hvm_get_ioreq_server_info() is called), then
-         * allocating a page is not permitted.
-         */
-        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
-            return -EPERM;
-
-        return 0;
-    }
-
-    page = alloc_domheap_page(s->target, MEMF_no_refcount);
-
-    if ( !page )
-        return -ENOMEM;
-
-    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
-    {
-        /*
-         * The domain can't possibly know about this page yet, so failure
-         * here is a clear indication of something fishy going on.
-         */
-        domain_crash(s->emulator);
-        return -ENODATA;
-    }
-
-    iorp->va = __map_domain_page_global(page);
-    if ( !iorp->va )
-        goto fail;
-
-    iorp->page = page;
-    clear_page(iorp->va);
-    return 0;
-
- fail:
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-
-    return -ENOMEM;
-}
-
-static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    struct page_info *page = iorp->page;
-
-    if ( !page )
-        return;
-
-    iorp->page = NULL;
-
-    unmap_domain_page_global(iorp->va);
-    iorp->va = NULL;
-
-    put_page_alloc_ref(page);
-    put_page_and_type(page);
-}
-
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
-{
-    const struct hvm_ioreq_server *s;
-    unsigned int id;
-    bool found = false;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
-        {
-            found = true;
-            break;
-        }
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return found;
-}
-
-static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return;
-
-    if ( guest_physmap_remove_page(d, iorp->gfn,
-                                   page_to_mfn(iorp->page), 0) )
-        domain_crash(d);
-    clear_page(iorp->va);
-}
-
-static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
-{
-    struct domain *d = s->target;
-    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
-    int rc;
-
-    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
-        return 0;
-
-    clear_page(iorp->va);
-
-    rc = guest_physmap_add_page(d, iorp->gfn,
-                                page_to_mfn(iorp->page), 0);
-    if ( rc == 0 )
-        paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
-
-    return rc;
-}
-
-static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
-                                    struct hvm_ioreq_vcpu *sv)
-{
-    ASSERT(spin_is_locked(&s->lock));
-
-    if ( s->ioreq.va != NULL )
-    {
-        ioreq_t *p = get_ioreq(s, sv->vcpu);
-
-        p->vp_eport = sv->ioreq_evtchn;
-    }
-}
-
-#define HANDLE_BUFIOREQ(s) \
-    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
-
-static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
-                                     struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-    int rc;
-
-    sv = xzalloc(struct hvm_ioreq_vcpu);
-
-    rc = -ENOMEM;
-    if ( !sv )
-        goto fail1;
-
-    spin_lock(&s->lock);
-
-    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
-                                         s->emulator->domain_id, NULL);
-    if ( rc < 0 )
-        goto fail2;
-
-    sv->ioreq_evtchn = rc;
-
-    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-    {
-        rc = alloc_unbound_xen_event_channel(v->domain, 0,
-                                             s->emulator->domain_id, NULL);
-        if ( rc < 0 )
-            goto fail3;
-
-        s->bufioreq_evtchn = rc;
-    }
-
-    sv->vcpu = v;
-
-    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
-
-    if ( s->enabled )
-        hvm_update_ioreq_evtchn(s, sv);
-
-    spin_unlock(&s->lock);
-    return 0;
-
- fail3:
-    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
- fail2:
-    spin_unlock(&s->lock);
-    xfree(sv);
-
- fail1:
-    return rc;
-}
-
-static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
-                                         struct vcpu *v)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu != v )
-            continue;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-        break;
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv, *next;
-
-    spin_lock(&s->lock);
-
-    list_for_each_entry_safe ( sv,
-                               next,
-                               &s->ioreq_vcpu_list,
-                               list_entry )
-    {
-        struct vcpu *v = sv->vcpu;
-
-        list_del(&sv->list_entry);
-
-        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
-            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
-
-        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
-
-        xfree(sv);
-    }
-
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
-{
-    int rc;
-
-    rc = hvm_map_ioreq_gfn(s, false);
-
-    if ( !rc && HANDLE_BUFIOREQ(s) )
-        rc = hvm_map_ioreq_gfn(s, true);
-
-    if ( rc )
-        hvm_unmap_ioreq_gfn(s, false);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
-{
-    hvm_unmap_ioreq_gfn(s, true);
-    hvm_unmap_ioreq_gfn(s, false);
-}
-
-static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
-{
-    int rc;
-
-    rc = hvm_alloc_ioreq_mfn(s, false);
-
-    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
-        rc = hvm_alloc_ioreq_mfn(s, true);
-
-    if ( rc )
-        hvm_free_ioreq_mfn(s, false);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
-{
-    hvm_free_ioreq_mfn(s, true);
-    hvm_free_ioreq_mfn(s, false);
-}
-
-static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
-{
-    unsigned int i;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-        rangeset_destroy(s->range[i]);
-}
-
-static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
-                                            ioservid_t id)
-{
-    unsigned int i;
-    int rc;
-
-    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-    {
-        char *name;
-
-        rc = asprintf(&name, "ioreq_server %d %s", id,
-                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
-                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
-                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
-                      "");
-        if ( rc )
-            goto fail;
-
-        s->range[i] = rangeset_new(s->target, name,
-                                   RANGESETF_prettyprint_hex);
-
-        xfree(name);
-
-        rc = -ENOMEM;
-        if ( !s->range[i] )
-            goto fail;
-
-        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
-    }
-
-    return 0;
-
- fail:
-    hvm_ioreq_server_free_rangesets(s);
-
-    return rc;
-}
-
-static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
-{
-    struct hvm_ioreq_vcpu *sv;
-
-    spin_lock(&s->lock);
-
-    if ( s->enabled )
-        goto done;
-
-    hvm_remove_ioreq_gfn(s, false);
-    hvm_remove_ioreq_gfn(s, true);
-
-    s->enabled = true;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-        hvm_update_ioreq_evtchn(s, sv);
-
-  done:
-    spin_unlock(&s->lock);
-}
-
-static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
-{
-    spin_lock(&s->lock);
-
-    if ( !s->enabled )
-        goto done;
-
-    hvm_add_ioreq_gfn(s, true);
-    hvm_add_ioreq_gfn(s, false);
-
-    s->enabled = false;
-
- done:
-    spin_unlock(&s->lock);
-}
-
-static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
-                                 struct domain *d, int bufioreq_handling,
-                                 ioservid_t id)
-{
-    struct domain *currd = current->domain;
-    struct vcpu *v;
-    int rc;
-
-    s->target = d;
-
-    get_knownalive_domain(currd);
-    s->emulator = currd;
-
-    spin_lock_init(&s->lock);
-    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
-    spin_lock_init(&s->bufioreq_lock);
-
-    s->ioreq.gfn = INVALID_GFN;
-    s->bufioreq.gfn = INVALID_GFN;
-
-    rc = hvm_ioreq_server_alloc_rangesets(s, id);
-    if ( rc )
-        return rc;
-
-    s->bufioreq_handling = bufioreq_handling;
-
-    for_each_vcpu ( d, v )
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail_add;
-    }
-
-    return 0;
-
- fail_add:
-    hvm_ioreq_server_remove_all_vcpus(s);
-    hvm_ioreq_server_unmap_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-    return rc;
-}
-
-static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
-{
-    ASSERT(!s->enabled);
-    hvm_ioreq_server_remove_all_vcpus(s);
-
-    /*
-     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
-     *       hvm_ioreq_server_free_pages() in that order.
-     *       This is because the former will do nothing if the pages
-     *       are not mapped, leaving the page to be freed by the latter.
-     *       However if the pages are mapped then the former will set
-     *       the page_info pointer to NULL, meaning the latter will do
-     *       nothing.
-     */
-    hvm_ioreq_server_unmap_pages(s);
-    hvm_ioreq_server_free_pages(s);
-
-    hvm_ioreq_server_free_rangesets(s);
-
-    put_domain(s->emulator);
-}
-
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int i;
-    int rc;
-
-    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
-        return -EINVAL;
-
-    s = xzalloc(struct hvm_ioreq_server);
-    if ( !s )
-        return -ENOMEM;
-
-    domain_pause(d);
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
-    {
-        if ( !GET_IOREQ_SERVER(d, i) )
-            break;
-    }
-
-    rc = -ENOSPC;
-    if ( i >= MAX_NR_IOREQ_SERVERS )
-        goto fail;
-
-    /*
-     * It is safe to call set_ioreq_server() prior to
-     * hvm_ioreq_server_init() since the target domain is paused.
-     */
-    set_ioreq_server(d, i, s);
-
-    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
-    if ( rc )
-    {
-        set_ioreq_server(d, i, NULL);
-        goto fail;
-    }
-
-    if ( id )
-        *id = i;
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    return 0;
-
- fail:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    domain_unpause(d);
-
-    xfree(s);
-    return rc;
-}
-
-/* Called when target domain is paused */
-int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
-{
-    return p2m_set_ioreq_server(s->target, 0, s);
-}
-
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    arch_hvm_destroy_ioreq_server(s);
-
-    hvm_ioreq_server_disable(s);
-
-    /*
-     * It is safe to call hvm_ioreq_server_deinit() prior to
-     * set_ioreq_server() since the target domain is paused.
-     */
-    hvm_ioreq_server_deinit(s);
-    set_ioreq_server(d, id, NULL);
-
-    domain_unpause(d);
-
-    xfree(s);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    if ( ioreq_gfn || bufioreq_gfn )
-    {
-        rc = hvm_ioreq_server_map_pages(s);
-        if ( rc )
-            goto out;
-    }
-
-    if ( ioreq_gfn )
-        *ioreq_gfn = gfn_x(s->ioreq.gfn);
-
-    if ( HANDLE_BUFIOREQ(s) )
-    {
-        if ( bufioreq_gfn )
-            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
-
-        if ( bufioreq_port )
-            *bufioreq_port = s->bufioreq_evtchn;
-    }
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    ASSERT(is_hvm_domain(d));
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    rc = hvm_ioreq_server_alloc_pages(s);
-    if ( rc )
-        goto out;
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
 
-    switch ( idx )
+bool arch_handle_hvm_io_completion(enum hvm_io_completion io_completion)
+{
+    switch ( io_completion )
     {
-    case XENMEM_resource_ioreq_server_frame_bufioreq:
-        rc = -ENOENT;
-        if ( !HANDLE_BUFIOREQ(s) )
-            goto out;
-
-        *mfn = page_to_mfn(s->bufioreq.page);
-        rc = 0;
-        break;
+    case HVMIO_realmode_completion:
+    {
+        struct hvm_emulate_ctxt ctxt;
 
-    case XENMEM_resource_ioreq_server_frame_ioreq(0):
-        *mfn = page_to_mfn(s->ioreq.page);
-        rc = 0;
-        break;
+        hvm_emulate_init_once(&ctxt, NULL, guest_cpu_user_regs());
+        vmx_realmode_emulate_one(&ctxt);
+        hvm_emulate_writeback(&ctxt);
 
-    default:
-        rc = -EINVAL;
         break;
     }
 
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end)
-{
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
     default:
-        r = NULL;
+        ASSERT_UNREACHABLE();
         break;
     }
 
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -EEXIST;
-    if ( rangeset_overlaps_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_add_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
+    return true;
 }
 
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end)
+/* Called when target domain is paused */
+int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s)
 {
-    struct hvm_ioreq_server *s;
-    struct rangeset *r;
-    int rc;
-
-    if ( start > end )
-        return -EINVAL;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    switch ( type )
-    {
-    case XEN_DMOP_IO_RANGE_PORT:
-    case XEN_DMOP_IO_RANGE_MEMORY:
-    case XEN_DMOP_IO_RANGE_PCI:
-        r = s->range[type];
-        break;
-
-    default:
-        r = NULL;
-        break;
-    }
-
-    rc = -EINVAL;
-    if ( !r )
-        goto out;
-
-    rc = -ENOENT;
-    if ( !rangeset_contains_range(r, start, end) )
-        goto out;
-
-    rc = rangeset_remove_range(r, start, end);
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
+    return p2m_set_ioreq_server(s->target, 0, s);
 }
 
 /*
@@ -1146,116 +99,6 @@  int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
     return rc;
 }
 
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled)
-{
-    struct hvm_ioreq_server *s;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    s = get_ioreq_server(d, id);
-
-    rc = -ENOENT;
-    if ( !s )
-        goto out;
-
-    rc = -EPERM;
-    if ( s->emulator != current->domain )
-        goto out;
-
-    domain_pause(d);
-
-    if ( enabled )
-        hvm_ioreq_server_enable(s);
-    else
-        hvm_ioreq_server_disable(s);
-
-    domain_unpause(d);
-
-    rc = 0;
-
- out:
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-    return rc;
-}
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-    int rc;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        rc = hvm_ioreq_server_add_vcpu(s, v);
-        if ( rc )
-            goto fail;
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return 0;
-
- fail:
-    while ( ++id != MAX_NR_IOREQ_SERVERS )
-    {
-        s = GET_IOREQ_SERVER(d, id);
-
-        if ( !s )
-            continue;
-
-        hvm_ioreq_server_remove_vcpu(s, v);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    return rc;
-}
-
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-        hvm_ioreq_server_remove_vcpu(s, v);
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
-void hvm_destroy_all_ioreq_servers(struct domain *d)
-{
-    struct hvm_ioreq_server *s;
-    unsigned int id;
-
-    arch_hvm_ioreq_destroy(d);
-
-    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
-
-    /* No need to domain_pause() as the domain is being torn down */
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        hvm_ioreq_server_disable(s);
-
-        /*
-         * It is safe to call hvm_ioreq_server_deinit() prior to
-         * set_ioreq_server() since the target domain is being destroyed.
-         */
-        hvm_ioreq_server_deinit(s);
-        set_ioreq_server(d, id, NULL);
-
-        xfree(s);
-    }
-
-    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
-}
-
 int hvm_get_ioreq_server_range_type(struct domain *d,
                                     ioreq_t *p,
                                     uint8_t *type,
@@ -1303,233 +146,6 @@  int hvm_get_ioreq_server_range_type(struct domain *d,
     return 0;
 }
 
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p)
-{
-    struct hvm_ioreq_server *s;
-    uint8_t type;
-    uint64_t addr;
-    unsigned int id;
-
-    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
-        return NULL;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        struct rangeset *r;
-
-        if ( !s->enabled )
-            continue;
-
-        r = s->range[type];
-
-        switch ( type )
-        {
-            unsigned long start, end;
-
-        case XEN_DMOP_IO_RANGE_PORT:
-            start = addr;
-            end = start + p->size - 1;
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_MEMORY:
-            start = hvm_mmio_first_byte(p);
-            end = hvm_mmio_last_byte(p);
-
-            if ( rangeset_contains_range(r, start, end) )
-                return s;
-
-            break;
-
-        case XEN_DMOP_IO_RANGE_PCI:
-            if ( rangeset_contains_singleton(r, addr >> 32) )
-            {
-                p->type = IOREQ_TYPE_PCI_CONFIG;
-                p->addr = addr;
-                return s;
-            }
-
-            break;
-        }
-    }
-
-    return NULL;
-}
-
-static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_page *iorp;
-    buffered_iopage_t *pg;
-    buf_ioreq_t bp = { .data = p->data,
-                       .addr = p->addr,
-                       .type = p->type,
-                       .dir = p->dir };
-    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
-    int qw = 0;
-
-    /* Ensure buffered_iopage fits in a page */
-    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
-
-    iorp = &s->bufioreq;
-    pg = iorp->va;
-
-    if ( !pg )
-        return IOREQ_IO_UNHANDLED;
-
-    /*
-     * Return 0 for the cases we can't deal with:
-     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
-     *  - we cannot buffer accesses to guest memory buffers, as the guest
-     *    may expect the memory buffer to be synchronously accessed
-     *  - the count field is usually used with data_is_ptr and since we don't
-     *    support data_is_ptr we do not waste space for the count field either
-     */
-    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
-        return 0;
-
-    switch ( p->size )
-    {
-    case 1:
-        bp.size = 0;
-        break;
-    case 2:
-        bp.size = 1;
-        break;
-    case 4:
-        bp.size = 2;
-        break;
-    case 8:
-        bp.size = 3;
-        qw = 1;
-        break;
-    default:
-        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
-        return IOREQ_IO_UNHANDLED;
-    }
-
-    spin_lock(&s->bufioreq_lock);
-
-    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
-         (IOREQ_BUFFER_SLOT_NUM - qw) )
-    {
-        /* The queue is full: send the iopacket through the normal path. */
-        spin_unlock(&s->bufioreq_lock);
-        return IOREQ_IO_UNHANDLED;
-    }
-
-    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
-
-    if ( qw )
-    {
-        bp.data = p->data >> 32;
-        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
-    }
-
-    /* Make the ioreq_t visible /before/ write_pointer. */
-    smp_wmb();
-    pg->ptrs.write_pointer += qw ? 2 : 1;
-
-    /* Canonicalize read/write pointers to prevent their overflow. */
-    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
-            qw++ < IOREQ_BUFFER_SLOT_NUM &&
-            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
-    {
-        union bufioreq_pointers old = pg->ptrs, new;
-        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
-
-        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
-        cmpxchg(&pg->ptrs.full, old.full, new.full);
-    }
-
-    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
-    spin_unlock(&s->bufioreq_lock);
-
-    return IOREQ_IO_HANDLED;
-}
-
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered)
-{
-    struct vcpu *curr = current;
-    struct domain *d = curr->domain;
-    struct hvm_ioreq_vcpu *sv;
-
-    ASSERT(s);
-
-    if ( buffered )
-        return hvm_send_buffered_ioreq(s, proto_p);
-
-    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-        return IOREQ_IO_RETRY;
-
-    list_for_each_entry ( sv,
-                          &s->ioreq_vcpu_list,
-                          list_entry )
-    {
-        if ( sv->vcpu == curr )
-        {
-            evtchn_port_t port = sv->ioreq_evtchn;
-            ioreq_t *p = get_ioreq(s, curr);
-
-            if ( unlikely(p->state != STATE_IOREQ_NONE) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
-                        p->state);
-                break;
-            }
-
-            if ( unlikely(p->vp_eport != port) )
-            {
-                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
-                        p->vp_eport);
-                break;
-            }
-
-            proto_p->state = STATE_IOREQ_NONE;
-            proto_p->vp_eport = port;
-            *p = *proto_p;
-
-            prepare_wait_on_xen_event_channel(port);
-
-            /*
-             * Following happens /after/ blocking and setting up ioreq
-             * contents. prepare_wait_on_xen_event_channel() is an implicit
-             * barrier.
-             */
-            p->state = STATE_IOREQ_READY;
-            notify_via_xen_event_channel(d, port);
-
-            sv->pending = true;
-            return IOREQ_IO_RETRY;
-        }
-    }
-
-    return IOREQ_IO_UNHANDLED;
-}
-
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
-{
-    struct domain *d = current->domain;
-    struct hvm_ioreq_server *s;
-    unsigned int id, failed = 0;
-
-    FOR_EACH_IOREQ_SERVER(d, id, s)
-    {
-        if ( !s->enabled )
-            continue;
-
-        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
-            failed++;
-    }
-
-    return failed;
-}
-
 static int hvm_access_cf8(
     int dir, unsigned int port, unsigned int bytes, uint32_t *val)
 {
@@ -1552,13 +168,6 @@  void arch_hvm_ioreq_destroy(struct domain *d)
 
 }
 
-void hvm_ioreq_init(struct domain *d)
-{
-    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
-
-    arch_hvm_ioreq_init(d);
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/stdvga.c b/xen/arch/x86/hvm/stdvga.c
index e267513..fd7cadb 100644
--- a/xen/arch/x86/hvm/stdvga.c
+++ b/xen/arch/x86/hvm/stdvga.c
@@ -27,10 +27,10 @@ 
  *  can have side effects.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/sched.h>
 #include <xen/domain_page.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/support.h>
 #include <xen/numa.h>
 #include <xen/paging.h>
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1e51689..50e4e6e 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -19,10 +19,11 @@ 
  *
  */
 
+#include <xen/ioreq.h>
+
 #include <asm/types.h>
 #include <asm/mtrr.h>
 #include <asm/p2m.h>
-#include <asm/hvm/ioreq.h>
 #include <asm/hvm/vmx/vmx.h>
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 638f6bf..776d2b6 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -100,6 +100,7 @@ 
  */
 
 #include <xen/init.h>
+#include <xen/ioreq.h>
 #include <xen/kernel.h>
 #include <xen/lib.h>
 #include <xen/mm.h>
@@ -141,7 +142,6 @@ 
 #include <asm/io_apic.h>
 #include <asm/pci.h>
 #include <asm/guest.h>
-#include <asm/hvm/ioreq.h>
 
 #include <asm/hvm/grant_table.h>
 #include <asm/pv/domain.h>
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index 7c7204f..3893579 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -20,6 +20,7 @@ 
  * along with this program; If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <xen/ioreq.h>
 #include <xen/types.h>
 #include <xen/mm.h>
 #include <xen/trace.h>
@@ -34,7 +35,6 @@ 
 #include <asm/current.h>
 #include <asm/flushtlb.h>
 #include <asm/shadow.h>
-#include <asm/hvm/ioreq.h>
 #include <xen/numa.h>
 #include "private.h"
 
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 15e3b79..fb6fb51 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -139,6 +139,9 @@  config HYPFS_CONFIG
 	  Disable this option in case you want to spare some memory or you
 	  want to hide the .config contents from dom0.
 
+config IOREQ_SERVER
+	bool
+
 config KEXEC
 	bool "kexec support"
 	default y
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 06881d0..8df2b6e 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -16,6 +16,7 @@  obj-$(CONFIG_GRANT_TABLE) += grant_table.o
 obj-y += guestcopy.o
 obj-bin-y += gunzip.init.o
 obj-$(CONFIG_HYPFS) += hypfs.o
+obj-$(CONFIG_IOREQ_SERVER) += ioreq.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/ioreq.c b/xen/common/ioreq.c
new file mode 100644
index 0000000..5017617
--- /dev/null
+++ b/xen/common/ioreq.c
@@ -0,0 +1,1410 @@ 
+/*
+ * common/ioreq.c: hardware virtual machine I/O emulation
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <xen/ctype.h>
+#include <xen/domain.h>
+#include <xen/domain_page.h>
+#include <xen/event.h>
+#include <xen/init.h>
+#include <xen/ioreq.h>
+#include <xen/irq.h>
+#include <xen/lib.h>
+#include <xen/paging.h>
+#include <xen/sched.h>
+#include <xen/softirq.h>
+#include <xen/trace.h>
+#include <xen/vpci.h>
+
+#include <public/hvm/dm_op.h>
+#include <public/hvm/ioreq.h>
+#include <public/hvm/params.h>
+
+static void set_ioreq_server(struct domain *d, unsigned int id,
+                             struct hvm_ioreq_server *s)
+{
+    ASSERT(id < MAX_NR_IOREQ_SERVERS);
+    ASSERT(!s || !d->arch.hvm.ioreq_server.server[id]);
+
+    d->arch.hvm.ioreq_server.server[id] = s;
+}
+
+/*
+ * Iterate over all possible ioreq servers.
+ *
+ * NOTE: The iteration is backwards such that more recently created
+ *       ioreq servers are favoured in hvm_select_ioreq_server().
+ *       This is a semantic that previously existed when ioreq servers
+ *       were held in a linked list.
+ */
+#define FOR_EACH_IOREQ_SERVER(d, id, s) \
+    for ( (id) = MAX_NR_IOREQ_SERVERS; (id) != 0; ) \
+        if ( !(s = GET_IOREQ_SERVER(d, --(id))) ) \
+            continue; \
+        else
+
+static ioreq_t *get_ioreq(struct hvm_ioreq_server *s, struct vcpu *v)
+{
+    shared_iopage_t *p = s->ioreq.va;
+
+    ASSERT((v == current) || !vcpu_runnable(v));
+    ASSERT(p != NULL);
+
+    return &p->vcpu_ioreq[v->vcpu_id];
+}
+
+static struct hvm_ioreq_vcpu *get_pending_vcpu(const struct vcpu *v,
+                                               struct hvm_ioreq_server **srvp)
+{
+    struct domain *d = v->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct hvm_ioreq_vcpu *sv;
+
+        list_for_each_entry ( sv,
+                              &s->ioreq_vcpu_list,
+                              list_entry )
+        {
+            if ( sv->vcpu == v && sv->pending )
+            {
+                if ( srvp )
+                    *srvp = s;
+                return sv;
+            }
+        }
+    }
+
+    return NULL;
+}
+
+bool hvm_io_pending(struct vcpu *v)
+{
+    return get_pending_vcpu(v, NULL);
+}
+
+static bool hvm_wait_for_io(struct hvm_ioreq_vcpu *sv, ioreq_t *p)
+{
+    unsigned int prev_state = STATE_IOREQ_NONE;
+    unsigned int state = p->state;
+    uint64_t data = ~0;
+
+    smp_rmb();
+
+    /*
+     * The only reason we should see this condition be false is when an
+     * emulator dying races with I/O being requested.
+     */
+    while ( likely(state != STATE_IOREQ_NONE) )
+    {
+        if ( unlikely(state < prev_state) )
+        {
+            gdprintk(XENLOG_ERR, "Weird HVM ioreq state transition %u -> %u\n",
+                     prev_state, state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        switch ( prev_state = state )
+        {
+        case STATE_IORESP_READY: /* IORESP_READY -> NONE */
+            p->state = STATE_IOREQ_NONE;
+            data = p->data;
+            break;
+
+        case STATE_IOREQ_READY:  /* IOREQ_{READY,INPROCESS} -> IORESP_READY */
+        case STATE_IOREQ_INPROCESS:
+            wait_on_xen_event_channel(sv->ioreq_evtchn,
+                                      ({ state = p->state;
+                                         smp_rmb();
+                                         state != prev_state; }));
+            continue;
+
+        default:
+            gdprintk(XENLOG_ERR, "Weird HVM iorequest state %u\n", state);
+            sv->pending = false;
+            domain_crash(sv->vcpu->domain);
+            return false; /* bail */
+        }
+
+        break;
+    }
+
+    p = &sv->vcpu->arch.hvm.hvm_io.io_req;
+    if ( hvm_ioreq_needs_completion(p) )
+        p->data = data;
+
+    sv->pending = false;
+
+    return true;
+}
+
+bool handle_hvm_io_completion(struct vcpu *v)
+{
+    struct domain *d = v->domain;
+    struct hvm_vcpu_io *vio = &v->arch.hvm.hvm_io;
+    struct hvm_ioreq_server *s;
+    struct hvm_ioreq_vcpu *sv;
+    enum hvm_io_completion io_completion;
+
+    if ( has_vpci(d) && vpci_process_pending(v) )
+    {
+        raise_softirq(SCHEDULE_SOFTIRQ);
+        return false;
+    }
+
+    sv = get_pending_vcpu(v, &s);
+    if ( sv && !hvm_wait_for_io(sv, get_ioreq(s, v)) )
+        return false;
+
+    vio->io_req.state = hvm_ioreq_needs_completion(&vio->io_req) ?
+        STATE_IORESP_READY : STATE_IOREQ_NONE;
+
+    msix_write_completion(v);
+    vcpu_end_shutdown_deferral(v);
+
+    io_completion = vio->io_completion;
+    vio->io_completion = HVMIO_no_completion;
+
+    switch ( io_completion )
+    {
+    case HVMIO_no_completion:
+        break;
+
+    case HVMIO_mmio_completion:
+        return handle_mmio();
+
+    case HVMIO_pio_completion:
+        return handle_pio(vio->io_req.addr, vio->io_req.size,
+                          vio->io_req.dir);
+
+    default:
+        return arch_handle_hvm_io_completion(io_completion);
+    }
+
+    return true;
+}
+
+static gfn_t hvm_alloc_legacy_ioreq_gfn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    BUILD_BUG_ON(HVM_PARAM_BUFIOREQ_PFN != HVM_PARAM_IOREQ_PFN + 1);
+
+    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
+    {
+        if ( !test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask) )
+            return _gfn(d->arch.hvm.params[i]);
+    }
+
+    return INVALID_GFN;
+}
+
+static gfn_t hvm_alloc_ioreq_gfn(struct hvm_ioreq_server *s)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    for ( i = 0; i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8; i++ )
+    {
+        if ( test_and_clear_bit(i, &d->arch.hvm.ioreq_gfn.mask) )
+            return _gfn(d->arch.hvm.ioreq_gfn.base + i);
+    }
+
+    /*
+     * If we are out of 'normal' GFNs then we may still have a 'legacy'
+     * GFN available.
+     */
+    return hvm_alloc_legacy_ioreq_gfn(s);
+}
+
+static bool hvm_free_legacy_ioreq_gfn(struct hvm_ioreq_server *s,
+                                      gfn_t gfn)
+{
+    struct domain *d = s->target;
+    unsigned int i;
+
+    for ( i = HVM_PARAM_IOREQ_PFN; i <= HVM_PARAM_BUFIOREQ_PFN; i++ )
+    {
+        if ( gfn_eq(gfn, _gfn(d->arch.hvm.params[i])) )
+             break;
+    }
+    if ( i > HVM_PARAM_BUFIOREQ_PFN )
+        return false;
+
+    set_bit(i, &d->arch.hvm.ioreq_gfn.legacy_mask);
+    return true;
+}
+
+static void hvm_free_ioreq_gfn(struct hvm_ioreq_server *s, gfn_t gfn)
+{
+    struct domain *d = s->target;
+    unsigned int i = gfn_x(gfn) - d->arch.hvm.ioreq_gfn.base;
+
+    ASSERT(!gfn_eq(gfn, INVALID_GFN));
+
+    if ( !hvm_free_legacy_ioreq_gfn(s, gfn) )
+    {
+        ASSERT(i < sizeof(d->arch.hvm.ioreq_gfn.mask) * 8);
+        set_bit(i, &d->arch.hvm.ioreq_gfn.mask);
+    }
+}
+
+static void hvm_unmap_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return;
+
+    destroy_ring_for_helper(&iorp->va, iorp->page);
+    iorp->page = NULL;
+
+    hvm_free_ioreq_gfn(s, iorp->gfn);
+    iorp->gfn = INVALID_GFN;
+}
+
+static int hvm_map_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a page has already been allocated (which will happen on
+         * demand if hvm_get_ioreq_server_frame() is called), then
+         * mapping a guest frame is not permitted.
+         */
+        if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    if ( d->is_dying )
+        return -EINVAL;
+
+    iorp->gfn = hvm_alloc_ioreq_gfn(s);
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return -ENOMEM;
+
+    rc = prepare_ring_for_helper(d, gfn_x(iorp->gfn), &iorp->page,
+                                 &iorp->va);
+
+    if ( rc )
+        hvm_unmap_ioreq_gfn(s, buf);
+
+    return rc;
+}
+
+static int hvm_alloc_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page;
+
+    if ( iorp->page )
+    {
+        /*
+         * If a guest frame has already been mapped (which may happen
+         * on demand if hvm_get_ioreq_server_info() is called), then
+         * allocating a page is not permitted.
+         */
+        if ( !gfn_eq(iorp->gfn, INVALID_GFN) )
+            return -EPERM;
+
+        return 0;
+    }
+
+    page = alloc_domheap_page(s->target, MEMF_no_refcount);
+
+    if ( !page )
+        return -ENOMEM;
+
+    if ( !get_page_and_type(page, s->target, PGT_writable_page) )
+    {
+        /*
+         * The domain can't possibly know about this page yet, so failure
+         * here is a clear indication of something fishy going on.
+         */
+        domain_crash(s->emulator);
+        return -ENODATA;
+    }
+
+    iorp->va = __map_domain_page_global(page);
+    if ( !iorp->va )
+        goto fail;
+
+    iorp->page = page;
+    clear_page(iorp->va);
+    return 0;
+
+ fail:
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+
+    return -ENOMEM;
+}
+
+static void hvm_free_ioreq_mfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    struct page_info *page = iorp->page;
+
+    if ( !page )
+        return;
+
+    iorp->page = NULL;
+
+    unmap_domain_page_global(iorp->va);
+    iorp->va = NULL;
+
+    put_page_alloc_ref(page);
+    put_page_and_type(page);
+}
+
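+/*
+ * Descriptive note (added): report whether the given page backs the
+ * synchronous or buffered ioreq ring of any registered server.
+ */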
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page)
+{
+    const struct hvm_ioreq_server *s;
+    unsigned int id;
+    bool found = false;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( (s->ioreq.page == page) || (s->bufioreq.page == page) )
+        {
+            found = true;
+            break;
+        }
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return found;
+}
+
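+/*
+ * Descriptive note (added): hide the ioreq page from the guest's physmap
+ * for as long as the server is enabled, so the guest cannot tamper with
+ * in-flight requests.
+ */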
+static void hvm_remove_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return;
+
+    if ( guest_physmap_remove_page(d, iorp->gfn,
+                                   page_to_mfn(iorp->page), 0) )
+        domain_crash(d);
+    clear_page(iorp->va);
+}
+
+static int hvm_add_ioreq_gfn(struct hvm_ioreq_server *s, bool buf)
+{
+    struct domain *d = s->target;
+    struct hvm_ioreq_page *iorp = buf ? &s->bufioreq : &s->ioreq;
+    int rc;
+
+    if ( gfn_eq(iorp->gfn, INVALID_GFN) )
+        return 0;
+
+    clear_page(iorp->va);
+
+    rc = guest_physmap_add_page(d, iorp->gfn,
+                                page_to_mfn(iorp->page), 0);
+    if ( rc == 0 )
+        paging_mark_pfn_dirty(d, _pfn(gfn_x(iorp->gfn)));
+
+    return rc;
+}
+
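+/*
+ * Descriptive note (added): publish the vCPU's event channel port in its
+ * ioreq slot (vp_eport) so the emulator knows which port corresponds to
+ * this vCPU.
+ */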
+static void hvm_update_ioreq_evtchn(struct hvm_ioreq_server *s,
+                                    struct hvm_ioreq_vcpu *sv)
+{
+    ASSERT(spin_is_locked(&s->lock));
+
+    if ( s->ioreq.va != NULL )
+    {
+        ioreq_t *p = get_ioreq(s, sv->vcpu);
+
+        p->vp_eport = sv->ioreq_evtchn;
+    }
+}
+
+#define HANDLE_BUFIOREQ(s) \
+    ((s)->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF)
+
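+/*
+ * Descriptive note (added): register a vCPU with a server - allocate its
+ * per-vCPU tracking structure and an unbound event channel for the
+ * emulator; vCPU0 additionally sets up the buffered-ioreq event channel
+ * if buffering is enabled.
+ */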
+static int hvm_ioreq_server_add_vcpu(struct hvm_ioreq_server *s,
+                                     struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+    int rc;
+
+    sv = xzalloc(struct hvm_ioreq_vcpu);
+
+    rc = -ENOMEM;
+    if ( !sv )
+        goto fail1;
+
+    spin_lock(&s->lock);
+
+    rc = alloc_unbound_xen_event_channel(v->domain, v->vcpu_id,
+                                         s->emulator->domain_id, NULL);
+    if ( rc < 0 )
+        goto fail2;
+
+    sv->ioreq_evtchn = rc;
+
+    if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+    {
+        rc = alloc_unbound_xen_event_channel(v->domain, 0,
+                                             s->emulator->domain_id, NULL);
+        if ( rc < 0 )
+            goto fail3;
+
+        s->bufioreq_evtchn = rc;
+    }
+
+    sv->vcpu = v;
+
+    list_add(&sv->list_entry, &s->ioreq_vcpu_list);
+
+    if ( s->enabled )
+        hvm_update_ioreq_evtchn(s, sv);
+
+    spin_unlock(&s->lock);
+    return 0;
+
+ fail3:
+    free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+ fail2:
+    spin_unlock(&s->lock);
+    xfree(sv);
+
+ fail1:
+    return rc;
+}
+
+static void hvm_ioreq_server_remove_vcpu(struct hvm_ioreq_server *s,
+                                         struct vcpu *v)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu != v )
+            continue;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+        break;
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_remove_all_vcpus(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv, *next;
+
+    spin_lock(&s->lock);
+
+    list_for_each_entry_safe ( sv,
+                               next,
+                               &s->ioreq_vcpu_list,
+                               list_entry )
+    {
+        struct vcpu *v = sv->vcpu;
+
+        list_del(&sv->list_entry);
+
+        if ( v->vcpu_id == 0 && HANDLE_BUFIOREQ(s) )
+            free_xen_event_channel(v->domain, s->bufioreq_evtchn);
+
+        free_xen_event_channel(v->domain, sv->ioreq_evtchn);
+
+        xfree(sv);
+    }
+
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_map_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_map_ioreq_gfn(s, false);
+
+    if ( !rc && HANDLE_BUFIOREQ(s) )
+        rc = hvm_map_ioreq_gfn(s, true);
+
+    if ( rc )
+        hvm_unmap_ioreq_gfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_unmap_pages(struct hvm_ioreq_server *s)
+{
+    hvm_unmap_ioreq_gfn(s, true);
+    hvm_unmap_ioreq_gfn(s, false);
+}
+
+static int hvm_ioreq_server_alloc_pages(struct hvm_ioreq_server *s)
+{
+    int rc;
+
+    rc = hvm_alloc_ioreq_mfn(s, false);
+
+    if ( !rc && (s->bufioreq_handling != HVM_IOREQSRV_BUFIOREQ_OFF) )
+        rc = hvm_alloc_ioreq_mfn(s, true);
+
+    if ( rc )
+        hvm_free_ioreq_mfn(s, false);
+
+    return rc;
+}
+
+static void hvm_ioreq_server_free_pages(struct hvm_ioreq_server *s)
+{
+    hvm_free_ioreq_mfn(s, true);
+    hvm_free_ioreq_mfn(s, false);
+}
+
+static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s)
+{
+    unsigned int i;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+        rangeset_destroy(s->range[i]);
+}
+
+static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
+                                            ioservid_t id)
+{
+    unsigned int i;
+    int rc;
+
+    for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
+    {
+        char *name;
+
+        rc = asprintf(&name, "ioreq_server %d %s", id,
+                      (i == XEN_DMOP_IO_RANGE_PORT) ? "port" :
+                      (i == XEN_DMOP_IO_RANGE_MEMORY) ? "memory" :
+                      (i == XEN_DMOP_IO_RANGE_PCI) ? "pci" :
+                      "");
+        if ( rc )
+            goto fail;
+
+        s->range[i] = rangeset_new(s->target, name,
+                                   RANGESETF_prettyprint_hex);
+
+        xfree(name);
+
+        rc = -ENOMEM;
+        if ( !s->range[i] )
+            goto fail;
+
+        rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+    }
+
+    return 0;
+
+ fail:
+    hvm_ioreq_server_free_rangesets(s);
+
+    return rc;
+}
+
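+/*
+ * Descriptive note (added): enabling a server removes its ioreq pages
+ * from the guest physmap and publishes the event channel ports to all
+ * registered vCPUs; disabling puts the pages back.
+ */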
+static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s)
+{
+    struct hvm_ioreq_vcpu *sv;
+
+    spin_lock(&s->lock);
+
+    if ( s->enabled )
+        goto done;
+
+    hvm_remove_ioreq_gfn(s, false);
+    hvm_remove_ioreq_gfn(s, true);
+
+    s->enabled = true;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+        hvm_update_ioreq_evtchn(s, sv);
+
+  done:
+    spin_unlock(&s->lock);
+}
+
+static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s)
+{
+    spin_lock(&s->lock);
+
+    if ( !s->enabled )
+        goto done;
+
+    hvm_add_ioreq_gfn(s, true);
+    hvm_add_ioreq_gfn(s, false);
+
+    s->enabled = false;
+
+ done:
+    spin_unlock(&s->lock);
+}
+
+static int hvm_ioreq_server_init(struct hvm_ioreq_server *s,
+                                 struct domain *d, int bufioreq_handling,
+                                 ioservid_t id)
+{
+    struct domain *currd = current->domain;
+    struct vcpu *v;
+    int rc;
+
+    s->target = d;
+
+    get_knownalive_domain(currd);
+    s->emulator = currd;
+
+    spin_lock_init(&s->lock);
+    INIT_LIST_HEAD(&s->ioreq_vcpu_list);
+    spin_lock_init(&s->bufioreq_lock);
+
+    s->ioreq.gfn = INVALID_GFN;
+    s->bufioreq.gfn = INVALID_GFN;
+
+    rc = hvm_ioreq_server_alloc_rangesets(s, id);
+    if ( rc )
+        return rc;
+
+    s->bufioreq_handling = bufioreq_handling;
+
+    for_each_vcpu ( d, v )
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail_add;
+    }
+
+    return 0;
+
+ fail_add:
+    hvm_ioreq_server_remove_all_vcpus(s);
+    hvm_ioreq_server_unmap_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+    return rc;
+}
+
+static void hvm_ioreq_server_deinit(struct hvm_ioreq_server *s)
+{
+    ASSERT(!s->enabled);
+    hvm_ioreq_server_remove_all_vcpus(s);
+
+    /*
+     * NOTE: It is safe to call both hvm_ioreq_server_unmap_pages() and
+     *       hvm_ioreq_server_free_pages() in that order.
+     *       This is because the former will do nothing if the pages
+     *       are not mapped, leaving the page to be freed by the latter.
+     *       However if the pages are mapped then the former will set
+     *       the page_info pointer to NULL, meaning the latter will do
+     *       nothing.
+     */
+    hvm_ioreq_server_unmap_pages(s);
+    hvm_ioreq_server_free_pages(s);
+
+    hvm_ioreq_server_free_rangesets(s);
+
+    put_domain(s->emulator);
+}
+
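+/*
+ * Descriptive note (added): register a new server in the first free slot
+ * on behalf of the calling emulator domain.  The target domain is paused
+ * across registration.
+ */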
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int i;
+    int rc;
+
+    if ( bufioreq_handling > HVM_IOREQSRV_BUFIOREQ_ATOMIC )
+        return -EINVAL;
+
+    s = xzalloc(struct hvm_ioreq_server);
+    if ( !s )
+        return -ENOMEM;
+
+    domain_pause(d);
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    for ( i = 0; i < MAX_NR_IOREQ_SERVERS; i++ )
+    {
+        if ( !GET_IOREQ_SERVER(d, i) )
+            break;
+    }
+
+    rc = -ENOSPC;
+    if ( i >= MAX_NR_IOREQ_SERVERS )
+        goto fail;
+
+    /*
+     * It is safe to call set_ioreq_server() prior to
+     * hvm_ioreq_server_init() since the target domain is paused.
+     */
+    set_ioreq_server(d, i, s);
+
+    rc = hvm_ioreq_server_init(s, d, bufioreq_handling, i);
+    if ( rc )
+    {
+        set_ioreq_server(d, i, NULL);
+        goto fail;
+    }
+
+    if ( id )
+        *id = i;
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    return 0;
+
+ fail:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    domain_unpause(d);
+
+    xfree(s);
+    return rc;
+}
+
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    arch_hvm_destroy_ioreq_server(s);
+
+    hvm_ioreq_server_disable(s);
+
+    /*
+     * It is safe to call hvm_ioreq_server_deinit() prior to
+     * set_ioreq_server() since the target domain is paused.
+     */
+    hvm_ioreq_server_deinit(s);
+    set_ioreq_server(d, id, NULL);
+
+    domain_unpause(d);
+
+    xfree(s);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    if ( ioreq_gfn || bufioreq_gfn )
+    {
+        rc = hvm_ioreq_server_map_pages(s);
+        if ( rc )
+            goto out;
+    }
+
+    if ( ioreq_gfn )
+        *ioreq_gfn = gfn_x(s->ioreq.gfn);
+
+    if ( HANDLE_BUFIOREQ(s) )
+    {
+        if ( bufioreq_gfn )
+            *bufioreq_gfn = gfn_x(s->bufioreq.gfn);
+
+        if ( bufioreq_port )
+            *bufioreq_port = s->bufioreq_evtchn;
+    }
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    ASSERT(is_hvm_domain(d));
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    rc = hvm_ioreq_server_alloc_pages(s);
+    if ( rc )
+        goto out;
+
+    switch ( idx )
+    {
+    case XENMEM_resource_ioreq_server_frame_bufioreq:
+        rc = -ENOENT;
+        if ( !HANDLE_BUFIOREQ(s) )
+            goto out;
+
+        *mfn = page_to_mfn(s->bufioreq.page);
+        rc = 0;
+        break;
+
+    case XENMEM_resource_ioreq_server_frame_ioreq(0):
+        *mfn = page_to_mfn(s->ioreq.page);
+        rc = 0;
+        break;
+
+    default:
+        rc = -EINVAL;
+        break;
+    }
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -EEXIST;
+    if ( rangeset_overlaps_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_add_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end)
+{
+    struct hvm_ioreq_server *s;
+    struct rangeset *r;
+    int rc;
+
+    if ( start > end )
+        return -EINVAL;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    switch ( type )
+    {
+    case XEN_DMOP_IO_RANGE_PORT:
+    case XEN_DMOP_IO_RANGE_MEMORY:
+    case XEN_DMOP_IO_RANGE_PCI:
+        r = s->range[type];
+        break;
+
+    default:
+        r = NULL;
+        break;
+    }
+
+    rc = -EINVAL;
+    if ( !r )
+        goto out;
+
+    rc = -ENOENT;
+    if ( !rangeset_contains_range(r, start, end) )
+        goto out;
+
+    rc = rangeset_remove_range(r, start, end);
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled)
+{
+    struct hvm_ioreq_server *s;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    s = get_ioreq_server(d, id);
+
+    rc = -ENOENT;
+    if ( !s )
+        goto out;
+
+    rc = -EPERM;
+    if ( s->emulator != current->domain )
+        goto out;
+
+    domain_pause(d);
+
+    if ( enabled )
+        hvm_ioreq_server_enable(s);
+    else
+        hvm_ioreq_server_disable(s);
+
+    domain_unpause(d);
+
+    rc = 0;
+
+ out:
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+    return rc;
+}
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+    int rc;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        rc = hvm_ioreq_server_add_vcpu(s, v);
+        if ( rc )
+            goto fail;
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return 0;
+
+ fail:
+    while ( ++id != MAX_NR_IOREQ_SERVERS )
+    {
+        s = GET_IOREQ_SERVER(d, id);
+
+        if ( !s )
+            continue;
+
+        hvm_ioreq_server_remove_vcpu(s, v);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    return rc;
+}
+
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+        hvm_ioreq_server_remove_vcpu(s, v);
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
+void hvm_destroy_all_ioreq_servers(struct domain *d)
+{
+    struct hvm_ioreq_server *s;
+    unsigned int id;
+
+    arch_hvm_ioreq_destroy(d);
+
+    spin_lock_recursive(&d->arch.hvm.ioreq_server.lock);
+
+    /* No need to domain_pause() as the domain is being torn down */
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        hvm_ioreq_server_disable(s);
+
+        /*
+         * It is safe to call hvm_ioreq_server_deinit() prior to
+         * set_ioreq_server() since the target domain is being destroyed.
+         */
+        hvm_ioreq_server_deinit(s);
+        set_ioreq_server(d, id, NULL);
+
+        xfree(s);
+    }
+
+    spin_unlock_recursive(&d->arch.hvm.ioreq_server.lock);
+}
+
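+/*
+ * Descriptive note (added): pick the server whose port/memory/PCI
+ * rangeset covers this access.  PCI config space hits are rewritten to
+ * IOREQ_TYPE_PCI_CONFIG with the decoded address.
+ */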
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p)
+{
+    struct hvm_ioreq_server *s;
+    uint8_t type;
+    uint64_t addr;
+    unsigned int id;
+
+    if ( hvm_get_ioreq_server_range_type(d, p, &type, &addr) )
+        return NULL;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        struct rangeset *r;
+
+        if ( !s->enabled )
+            continue;
+
+        r = s->range[type];
+
+        switch ( type )
+        {
+            unsigned long start, end;
+
+        case XEN_DMOP_IO_RANGE_PORT:
+            start = addr;
+            end = start + p->size - 1;
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_MEMORY:
+            start = hvm_mmio_first_byte(p);
+            end = hvm_mmio_last_byte(p);
+
+            if ( rangeset_contains_range(r, start, end) )
+                return s;
+
+            break;
+
+        case XEN_DMOP_IO_RANGE_PCI:
+            if ( rangeset_contains_singleton(r, addr >> 32) )
+            {
+                p->type = IOREQ_TYPE_PCI_CONFIG;
+                p->addr = addr;
+                return s;
+            }
+
+            break;
+        }
+    }
+
+    return NULL;
+}
+
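+/*
+ * Descriptive note (added): try to queue a request on the server's
+ * buffered (asynchronous) ring; returns IOREQ_IO_UNHANDLED when the ring
+ * is unavailable or full, so the caller can fall back to the synchronous
+ * path.
+ */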
+static int hvm_send_buffered_ioreq(struct hvm_ioreq_server *s, ioreq_t *p)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_page *iorp;
+    buffered_iopage_t *pg;
+    buf_ioreq_t bp = { .data = p->data,
+                       .addr = p->addr,
+                       .type = p->type,
+                       .dir = p->dir };
+    /* Timeoffset sends 64b data, but no address. Use two consecutive slots. */
+    int qw = 0;
+
+    /* Ensure buffered_iopage fits in a page */
+    BUILD_BUG_ON(sizeof(buffered_iopage_t) > PAGE_SIZE);
+
+    iorp = &s->bufioreq;
+    pg = iorp->va;
+
+    if ( !pg )
+        return IOREQ_IO_UNHANDLED;
+
+    /*
+     * Return 0 for the cases we can't deal with:
+     *  - 'addr' is only a 20-bit field, so we cannot address beyond 1MB
+     *  - we cannot buffer accesses to guest memory buffers, as the guest
+     *    may expect the memory buffer to be synchronously accessed
+     *  - the count field is usually used with data_is_ptr and since we don't
+     *    support data_is_ptr we do not waste space for the count field either
+     */
+    if ( (p->addr > 0xffffful) || p->data_is_ptr || (p->count != 1) )
+        return 0;
+
+    switch ( p->size )
+    {
+    case 1:
+        bp.size = 0;
+        break;
+    case 2:
+        bp.size = 1;
+        break;
+    case 4:
+        bp.size = 2;
+        break;
+    case 8:
+        bp.size = 3;
+        qw = 1;
+        break;
+    default:
+        gdprintk(XENLOG_WARNING, "unexpected ioreq size: %u\n", p->size);
+        return IOREQ_IO_UNHANDLED;
+    }
+
+    spin_lock(&s->bufioreq_lock);
+
+    if ( (pg->ptrs.write_pointer - pg->ptrs.read_pointer) >=
+         (IOREQ_BUFFER_SLOT_NUM - qw) )
+    {
+        /* The queue is full: send the iopacket through the normal path. */
+        spin_unlock(&s->bufioreq_lock);
+        return IOREQ_IO_UNHANDLED;
+    }
+
+    pg->buf_ioreq[pg->ptrs.write_pointer % IOREQ_BUFFER_SLOT_NUM] = bp;
+
+    if ( qw )
+    {
+        bp.data = p->data >> 32;
+        pg->buf_ioreq[(pg->ptrs.write_pointer+1) % IOREQ_BUFFER_SLOT_NUM] = bp;
+    }
+
+    /* Make the ioreq_t visible /before/ write_pointer. */
+    smp_wmb();
+    pg->ptrs.write_pointer += qw ? 2 : 1;
+
+    /* Canonicalize read/write pointers to prevent their overflow. */
+    while ( (s->bufioreq_handling == HVM_IOREQSRV_BUFIOREQ_ATOMIC) &&
+            qw++ < IOREQ_BUFFER_SLOT_NUM &&
+            pg->ptrs.read_pointer >= IOREQ_BUFFER_SLOT_NUM )
+    {
+        union bufioreq_pointers old = pg->ptrs, new;
+        unsigned int n = old.read_pointer / IOREQ_BUFFER_SLOT_NUM;
+
+        new.read_pointer = old.read_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        new.write_pointer = old.write_pointer - n * IOREQ_BUFFER_SLOT_NUM;
+        cmpxchg(&pg->ptrs.full, old.full, new.full);
+    }
+
+    notify_via_xen_event_channel(d, s->bufioreq_evtchn);
+    spin_unlock(&s->bufioreq_lock);
+
+    return IOREQ_IO_HANDLED;
+}
+
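+/*
+ * Descriptive note (added): forward a request to the given server - copy
+ * it into the calling vCPU's ioreq slot, mark it STATE_IOREQ_READY and
+ * notify the emulator, then return IOREQ_IO_RETRY so the vCPU waits for
+ * completion (buffered requests are diverted to the ring above).
+ */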
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered)
+{
+    struct vcpu *curr = current;
+    struct domain *d = curr->domain;
+    struct hvm_ioreq_vcpu *sv;
+
+    ASSERT(s);
+
+    if ( buffered )
+        return hvm_send_buffered_ioreq(s, proto_p);
+
+    if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
+        return IOREQ_IO_RETRY;
+
+    list_for_each_entry ( sv,
+                          &s->ioreq_vcpu_list,
+                          list_entry )
+    {
+        if ( sv->vcpu == curr )
+        {
+            evtchn_port_t port = sv->ioreq_evtchn;
+            ioreq_t *p = get_ioreq(s, curr);
+
+            if ( unlikely(p->state != STATE_IOREQ_NONE) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
+                        p->state);
+                break;
+            }
+
+            if ( unlikely(p->vp_eport != port) )
+            {
+                gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
+                        p->vp_eport);
+                break;
+            }
+
+            proto_p->state = STATE_IOREQ_NONE;
+            proto_p->vp_eport = port;
+            *p = *proto_p;
+
+            prepare_wait_on_xen_event_channel(port);
+
+            /*
+             * Following happens /after/ blocking and setting up ioreq
+             * contents. prepare_wait_on_xen_event_channel() is an implicit
+             * barrier.
+             */
+            p->state = STATE_IOREQ_READY;
+            notify_via_xen_event_channel(d, port);
+
+            sv->pending = true;
+            return IOREQ_IO_RETRY;
+        }
+    }
+
+    return IOREQ_IO_UNHANDLED;
+}
+
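+/*
+ * Descriptive note (added): send the request to every enabled server;
+ * returns the number of servers that failed to accept it.
+ */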
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered)
+{
+    struct domain *d = current->domain;
+    struct hvm_ioreq_server *s;
+    unsigned int id, failed = 0;
+
+    FOR_EACH_IOREQ_SERVER(d, id, s)
+    {
+        if ( !s->enabled )
+            continue;
+
+        if ( hvm_send_ioreq(s, p, buffered) == IOREQ_IO_UNHANDLED )
+            failed++;
+    }
+
+    return failed;
+}
+
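+/*
+ * Descriptive note (added): per-domain initialisation - set up the
+ * ioreq server lock plus any arch-specific state.
+ */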
+void hvm_ioreq_init(struct domain *d)
+{
+    spin_lock_init(&d->arch.hvm.ioreq_server.lock);
+
+    arch_hvm_ioreq_init(d);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/include/asm-x86/hvm/ioreq.h b/xen/include/asm-x86/hvm/ioreq.h
index 151b92b..dec1e71 100644
--- a/xen/include/asm-x86/hvm/ioreq.h
+++ b/xen/include/asm-x86/hvm/ioreq.h
@@ -19,41 +19,12 @@ 
 #ifndef __ASM_X86_HVM_IOREQ_H__
 #define __ASM_X86_HVM_IOREQ_H__
 
-bool hvm_io_pending(struct vcpu *v);
-bool handle_hvm_io_completion(struct vcpu *v);
-bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+#include <asm/hvm/emulate.h>
+#include <asm/hvm/hvm.h>
+#include <asm/hvm/vmx/vmx.h>
 
-int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
-                            ioservid_t *id);
-int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
-int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
-                              unsigned long *ioreq_gfn,
-                              unsigned long *bufioreq_gfn,
-                              evtchn_port_t *bufioreq_port);
-int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
-                               unsigned long idx, mfn_t *mfn);
-int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
-                                     uint32_t type, uint64_t start,
-                                     uint64_t end);
-int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
-                                         uint32_t type, uint64_t start,
-                                         uint64_t end);
 int hvm_map_mem_type_to_ioreq_server(struct domain *d, ioservid_t id,
                                      uint32_t type, uint32_t flags);
-int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
-                               bool enabled);
-
-int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
-void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
-void hvm_destroy_all_ioreq_servers(struct domain *d);
-
-struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
-                                                 ioreq_t *p);
-int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
-                   bool buffered);
-unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
-
-void hvm_ioreq_init(struct domain *d);
 
 int arch_hvm_destroy_ioreq_server(struct hvm_ioreq_server *s);
 
diff --git a/xen/include/xen/ioreq.h b/xen/include/xen/ioreq.h
new file mode 100644
index 0000000..f846034
--- /dev/null
+++ b/xen/include/xen/ioreq.h
@@ -0,0 +1,82 @@ 
+/*
+ * ioreq.h: Hardware virtual machine assist interface definitions.
+ *
+ * Copyright (c) 2016 Citrix Systems Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef __IOREQ_H__
+#define __IOREQ_H__
+
+#include <xen/sched.h>
+
+#include <asm/hvm/ioreq.h>
+
+#define GET_IOREQ_SERVER(d, id) \
+    (d)->arch.hvm.ioreq_server.server[id]
+
+static inline struct hvm_ioreq_server *get_ioreq_server(const struct domain *d,
+                                                        unsigned int id)
+{
+    if ( id >= MAX_NR_IOREQ_SERVERS )
+        return NULL;
+
+    return GET_IOREQ_SERVER(d, id);
+}
+
+bool hvm_io_pending(struct vcpu *v);
+bool handle_hvm_io_completion(struct vcpu *v);
+bool is_ioreq_server_page(struct domain *d, const struct page_info *page);
+
+int hvm_create_ioreq_server(struct domain *d, int bufioreq_handling,
+                            ioservid_t *id);
+int hvm_destroy_ioreq_server(struct domain *d, ioservid_t id);
+int hvm_get_ioreq_server_info(struct domain *d, ioservid_t id,
+                              unsigned long *ioreq_gfn,
+                              unsigned long *bufioreq_gfn,
+                              evtchn_port_t *bufioreq_port);
+int hvm_get_ioreq_server_frame(struct domain *d, ioservid_t id,
+                               unsigned long idx, mfn_t *mfn);
+int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
+                                     uint32_t type, uint64_t start,
+                                     uint64_t end);
+int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
+                                         uint32_t type, uint64_t start,
+                                         uint64_t end);
+int hvm_set_ioreq_server_state(struct domain *d, ioservid_t id,
+                               bool enabled);
+
+int hvm_all_ioreq_servers_add_vcpu(struct domain *d, struct vcpu *v);
+void hvm_all_ioreq_servers_remove_vcpu(struct domain *d, struct vcpu *v);
+void hvm_destroy_all_ioreq_servers(struct domain *d);
+
+struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
+                                                 ioreq_t *p);
+int hvm_send_ioreq(struct hvm_ioreq_server *s, ioreq_t *proto_p,
+                   bool buffered);
+unsigned int hvm_broadcast_ioreq(ioreq_t *p, bool buffered);
+
+void hvm_ioreq_init(struct domain *d);
+
+#endif /* __IOREQ_H__ */
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */