diff mbox

[v6,1/4] x86/ioreq server: Add HVMOP to map guest ram with p2m_ioreq_server to an ioreq server.

Message ID CAFLBxZa48fX5Bt39Xnfyerchdb18E4Fis=qDv6K8-sXt+k+YRQ@mail.gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

George Dunlap Sept. 22, 2016, 11:32 a.m. UTC
On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>
>
> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>
>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>> wrote:
>>>>
>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>
>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>> only support the emulation of write operations. And it can be
>>>>> further extended to support the emulation of read ones if an
>>>>> ioreq server has such requirement in the future.
>>>>>
>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>>> triggering this new HVMOP, with ioreq server id set to the current
>>>>> owner's and flags parameter set to 0.
>>>>>
>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>> are only supported for HVMs with HAP enabled.
>>>>>
>>>>> Also note that only after one ioreq server claims its ownership
>>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
>>>>> be allowed.
>>>>>
>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>> ---
>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>
>>>>> changes in v6:
>>>>>     - Clarify logic in hvmemul_do_io().
>>>>>     - Use recursive lock for ioreq server lock.
>>>>>     - Remove debug print when mapping ioreq server.
>>>>>     - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>     - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>     - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>       to/from HVMMEM_ram_rw are permitted.
>>>>>     - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>       to avoid the race condition when a vm exit happens on a write-
>>>>>       protected page, just to find the ioreq server has been unmapped
>>>>>       already.
>>>>>     - Introduce a seperate patch to delay the release of p2m
>>>>>       lock to avoid the race condition.
>>>>>     - Introduce a seperate patch to handle the read-modify-write
>>>>>       operations on a write protected page.
>>>>>
>>>> Why do we need to do this?  Won't the default case just DTRT if it finds
>>>> that the ioreq server has been unmapped?
>>>
>>>
>>> Well, patch 4 will either mark the remaining p2m_ioreq_server entries as
>>> "recal" or
>>> reset to p2m_ram_rw directly. So my understanding is that we do not wish
>>> to
>>> see a ept violation due to a p2m_ioreq_server access after the ioreq
>>> server
>>> is unmapped.
>>> Yet without this domain_pause/unpause() pair, VM accesses may trigger an
>>> ept
>>> violation
>>> during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to
>>> find
>>> the ioreq
>>> server is NULL. Then we would have to provide handlers which just do the
>>> copy to/from
>>> actions for the VM. This seems awkward to me.
>>
>> So the race you're worried about is this:
>>
>> 1. Guest fault happens
>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>> 3. guest  finds no ioreq server present
>>
>> I think in that case the easiest thing to do would be to simply assume
>> there was a race and re-execute the instruction.  Is that not possible
>> for some reason?
>>
>>   -George
>
>
> Thanks for your reply, George. :)
> Two reasons I'd like to use the domain_pause/unpause() to avoid the race
> condition:
>
> 1>  Like my previous explanation, in the read-modify-write scenario, the
> ioreq server will
> be NULL for the read emulation. But in such case, hypervisor will not
> discard this trap, instead
> it is supposed to do the copy work for the read access. So it would be
> difficult for hypervisor
> to decide if the ioreq server was detached due to a race condition, or if
> the ioreq server should
> be a NULL because we are emulating a read operation first for a
> read-modify-write instruction.

Wouldn't a patch like the attached work (applied on top of the whole series)?

 -George

Comments

Yu Zhang Sept. 22, 2016, 4:02 p.m. UTC | #1
On 9/22/2016 7:32 PM, George Dunlap wrote:
> On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang <yu.c.zhang@linux.intel.com> wrote:
>>
>> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>> wrote:
>>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>>> only support the emulation of write operations. And it can be
>>>>>> further extended to support the emulation of read ones if an
>>>>>> ioreq server has such requirement in the future.
>>>>>>
>>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>>>> triggering this new HVMOP, with ioreq server id set to the current
>>>>>> owner's and flags parameter set to 0.
>>>>>>
>>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>>> are only supported for HVMs with HAP enabled.
>>>>>>
>>>>>> Also note that only after one ioreq server claims its ownership
>>>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
>>>>>> be allowed.
>>>>>>
>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>>> ---
>>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>>
>>>>>> changes in v6:
>>>>>>      - Clarify logic in hvmemul_do_io().
>>>>>>      - Use recursive lock for ioreq server lock.
>>>>>>      - Remove debug print when mapping ioreq server.
>>>>>>      - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>>      - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>>      - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>>        to/from HVMMEM_ram_rw are permitted.
>>>>>>      - Add domain_pause/unpause() in hvm_map_mem_type_to_ioreq_server()
>>>>>>        to avoid the race condition when a vm exit happens on a write-
>>>>>>        protected page, just to find the ioreq server has been unmapped
>>>>>>        already.
>>>>>>      - Introduce a seperate patch to delay the release of p2m
>>>>>>        lock to avoid the race condition.
>>>>>>      - Introduce a seperate patch to handle the read-modify-write
>>>>>>        operations on a write protected page.
>>>>>>
>>>>> Why do we need to do this?  Won't the default case just DTRT if it finds
>>>>> that the ioreq server has been unmapped?
>>>>
>>>> Well, patch 4 will either mark the remaining p2m_ioreq_server entries as
>>>> "recal" or
>>>> reset to p2m_ram_rw directly. So my understanding is that we do not wish
>>>> to
>>>> see a ept violation due to a p2m_ioreq_server access after the ioreq
>>>> server
>>>> is unmapped.
>>>> Yet without this domain_pause/unpause() pair, VM accesses may trigger an
>>>> ept
>>>> violation
>>>> during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to
>>>> find
>>>> the ioreq
>>>> server is NULL. Then we would have to provide handlers which just do the
>>>> copy to/from
>>>> actions for the VM. This seems awkward to me.
>>> So the race you're worried about is this:
>>>
>>> 1. Guest fault happens
>>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>>> 3. guest  finds no ioreq server present
>>>
>>> I think in that case the easiest thing to do would be to simply assume
>>> there was a race and re-execute the instruction.  Is that not possible
>>> for some reason?
>>>
>>>    -George
>>
>> Thanks for your reply, George. :)
>> Two reasons I'd like to use the domain_pause/unpause() to avoid the race
>> condition:
>>
>> 1>  Like my previous explanation, in the read-modify-write scenario, the
>> ioreq server will
>> be NULL for the read emulation. But in such case, hypervisor will not
>> discard this trap, instead
>> it is supposed to do the copy work for the read access. So it would be
>> difficult for hypervisor
>> to decide if the ioreq server was detached due to a race condition, or if
>> the ioreq server should
>> be a NULL because we are emulating a read operation first for a
>> read-modify-write instruction.
> Wouldn't a patch like the attached work (applied on top of the whole series)?

Thanks for your patch, George. I think it should work for 1>. But we 
still have the dead lock
problem.  :)

BTW, do you think a domain_pause will cause any new problem?

B.R.
Yu

>   -George
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
George Dunlap Sept. 23, 2016, 10:35 a.m. UTC | #2
On 22/09/16 17:02, Yu Zhang wrote:
> 
> 
> On 9/22/2016 7:32 PM, George Dunlap wrote:
>> On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang
>> <yu.c.zhang@linux.intel.com> wrote:
>>>
>>> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>> wrote:
>>>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>>>> only support the emulation of write operations. And it can be
>>>>>>> further extended to support the emulation of read ones if an
>>>>>>> ioreq server has such requirement in the future.
>>>>>>>
>>>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>>>>> triggering this new HVMOP, with ioreq server id set to the current
>>>>>>> owner's and flags parameter set to 0.
>>>>>>>
>>>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>>>> are only supported for HVMs with HAP enabled.
>>>>>>>
>>>>>>> Also note that only after one ioreq server claims its ownership
>>>>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
>>>>>>> be allowed.
>>>>>>>
>>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>>>> ---
>>>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>>>
>>>>>>> changes in v6:
>>>>>>>      - Clarify logic in hvmemul_do_io().
>>>>>>>      - Use recursive lock for ioreq server lock.
>>>>>>>      - Remove debug print when mapping ioreq server.
>>>>>>>      - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>>>      - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>>>      - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>>>        to/from HVMMEM_ram_rw are permitted.
>>>>>>>      - Add domain_pause/unpause() in
>>>>>>> hvm_map_mem_type_to_ioreq_server()
>>>>>>>        to avoid the race condition when a vm exit happens on a
>>>>>>> write-
>>>>>>>        protected page, just to find the ioreq server has been
>>>>>>> unmapped
>>>>>>>        already.
>>>>>>>      - Introduce a seperate patch to delay the release of p2m
>>>>>>>        lock to avoid the race condition.
>>>>>>>      - Introduce a seperate patch to handle the read-modify-write
>>>>>>>        operations on a write protected page.
>>>>>>>
>>>>>> Why do we need to do this?  Won't the default case just DTRT if it
>>>>>> finds
>>>>>> that the ioreq server has been unmapped?
>>>>>
>>>>> Well, patch 4 will either mark the remaining p2m_ioreq_server
>>>>> entries as
>>>>> "recal" or
>>>>> reset to p2m_ram_rw directly. So my understanding is that we do not
>>>>> wish
>>>>> to
>>>>> see a ept violation due to a p2m_ioreq_server access after the ioreq
>>>>> server
>>>>> is unmapped.
>>>>> Yet without this domain_pause/unpause() pair, VM accesses may
>>>>> trigger an
>>>>> ept
>>>>> violation
>>>>> during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to
>>>>> find
>>>>> the ioreq
>>>>> server is NULL. Then we would have to provide handlers which just
>>>>> do the
>>>>> copy to/from
>>>>> actions for the VM. This seems awkward to me.
>>>> So the race you're worried about is this:
>>>>
>>>> 1. Guest fault happens
>>>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>>>> 3. guest  finds no ioreq server present
>>>>
>>>> I think in that case the easiest thing to do would be to simply assume
>>>> there was a race and re-execute the instruction.  Is that not possible
>>>> for some reason?
>>>>
>>>>    -George
>>>
>>> Thanks for your reply, George. :)
>>> Two reasons I'd like to use the domain_pause/unpause() to avoid the race
>>> condition:
>>>
>>> 1>  Like my previous explanation, in the read-modify-write scenario, the
>>> ioreq server will
>>> be NULL for the read emulation. But in such case, hypervisor will not
>>> discard this trap, instead
>>> it is supposed to do the copy work for the read access. So it would be
>>> difficult for hypervisor
>>> to decide if the ioreq server was detached due to a race condition,
>>> or if
>>> the ioreq server should
>>> be a NULL because we are emulating a read operation first for a
>>> read-modify-write instruction.
>> Wouldn't a patch like the attached work (applied on top of the whole
>> series)?
> 
> Thanks for your patch, George. I think it should work for 1>. But we
> still have the dead lock
> problem.  :)
> 
> BTW, do you think a domain_pause will cause any new problem?

Well using a "big hammer" like domain_pause in a case like this is
usually indicates that there are other issues that aren't being solved
properly -- for instance, latent deadlocks or unhandled race conditions.
:-)  Leaving those issues around in the codebase but "papered over" by
domain_pause is storing up technical debt that future generations will
inherit and need to untangle.  (Particularly as in this case, there was
no comment *in the code* explaining what problems the domain_pause was
there to solve, so anyone wanting to try to remove it would need to just
figure out.)

domain_pause is relatively expensive to do (since you have to spin
waiting for all the vcpus to finish running) and completely stops the
domain from handling interrupts or anything for an arbitrary amount of
time.  As long as it doesn't happen often, the cost shouldn't be a major
issue; and as long as the domain_pause is short (less than 100ms), then
it shouldn't cause any more problems than running on a fairly busy
system should cause.

On the other hand, if every time we ran into a tricky situation we just
did a domain pause rather than solving the root issue, pretty soon the
domain would be paused several times per second.  The performance would
plummet, and fixing it would be a nightmare because you'd have hundreds
of undocumented issues to try to understand and fix.

So: the domain_pause itself isn't terrible (although it's better to
avoid it if we can); what's more of a problem is the potential issues
that it's hiding.  These issues can add up, so it's important to push
back and ask "why do we need this and can we solve it a different way"
pro-actively, as patches come in, rather than waiting until it becomes
an issue.

 -George
Yu Zhang Sept. 26, 2016, 6:57 a.m. UTC | #3
On 9/23/2016 6:35 PM, George Dunlap wrote:
> On 22/09/16 17:02, Yu Zhang wrote:
>>
>> On 9/22/2016 7:32 PM, George Dunlap wrote:
>>> On Thu, Sep 22, 2016 at 10:12 AM, Yu Zhang
>>> <yu.c.zhang@linux.intel.com> wrote:
>>>> On 9/21/2016 9:04 PM, George Dunlap wrote:
>>>>> On Fri, Sep 9, 2016 at 6:51 AM, Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>> wrote:
>>>>>>> On 9/2/2016 6:47 PM, Yu Zhang wrote:
>>>>>>>> A new HVMOP - HVMOP_map_mem_type_to_ioreq_server, is added to
>>>>>>>> let one ioreq server claim/disclaim its responsibility for the
>>>>>>>> handling of guest pages with p2m type p2m_ioreq_server. Users
>>>>>>>> of this HVMOP can specify which kind of operation is supposed
>>>>>>>> to be emulated in a parameter named flags. Currently, this HVMOP
>>>>>>>> only support the emulation of write operations. And it can be
>>>>>>>> further extended to support the emulation of read ones if an
>>>>>>>> ioreq server has such requirement in the future.
>>>>>>>>
>>>>>>>> For now, we only support one ioreq server for this p2m type, so
>>>>>>>> once an ioreq server has claimed its ownership, subsequent calls
>>>>>>>> of the HVMOP_map_mem_type_to_ioreq_server will fail. Users can also
>>>>>>>> disclaim the ownership of guest ram pages with p2m_ioreq_server, by
>>>>>>>> triggering this new HVMOP, with ioreq server id set to the current
>>>>>>>> owner's and flags parameter set to 0.
>>>>>>>>
>>>>>>>> Note both HVMOP_map_mem_type_to_ioreq_server and p2m_ioreq_server
>>>>>>>> are only supported for HVMs with HAP enabled.
>>>>>>>>
>>>>>>>> Also note that only after one ioreq server claims its ownership
>>>>>>>> of p2m_ioreq_server, will the p2m type change to p2m_ioreq_server
>>>>>>>> be allowed.
>>>>>>>>
>>>>>>>> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
>>>>>>>> Signed-off-by: Yu Zhang <yu.c.zhang@linux.intel.com>
>>>>>>>> Acked-by: Tim Deegan <tim@xen.org>
>>>>>>>> ---
>>>>>>>> Cc: Paul Durrant <paul.durrant@citrix.com>
>>>>>>>> Cc: Jan Beulich <jbeulich@suse.com>
>>>>>>>> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>>>>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>>>>>>>> Cc: Jun Nakajima <jun.nakajima@intel.com>
>>>>>>>> Cc: Kevin Tian <kevin.tian@intel.com>
>>>>>>>> Cc: Tim Deegan <tim@xen.org>
>>>>>>>>
>>>>>>>> changes in v6:
>>>>>>>>       - Clarify logic in hvmemul_do_io().
>>>>>>>>       - Use recursive lock for ioreq server lock.
>>>>>>>>       - Remove debug print when mapping ioreq server.
>>>>>>>>       - Clarify code in ept_p2m_type_to_flags() for consistency.
>>>>>>>>       - Remove definition of P2M_IOREQ_HANDLE_WRITE_ACCESS.
>>>>>>>>       - Add comments for HVMMEM_ioreq_server to note only changes
>>>>>>>>         to/from HVMMEM_ram_rw are permitted.
>>>>>>>>       - Add domain_pause/unpause() in
>>>>>>>> hvm_map_mem_type_to_ioreq_server()
>>>>>>>>         to avoid the race condition when a vm exit happens on a
>>>>>>>> write-
>>>>>>>>         protected page, just to find the ioreq server has been
>>>>>>>> unmapped
>>>>>>>>         already.
>>>>>>>>       - Introduce a seperate patch to delay the release of p2m
>>>>>>>>         lock to avoid the race condition.
>>>>>>>>       - Introduce a seperate patch to handle the read-modify-write
>>>>>>>>         operations on a write protected page.
>>>>>>>>
>>>>>>> Why do we need to do this?  Won't the default case just DTRT if it
>>>>>>> finds
>>>>>>> that the ioreq server has been unmapped?
>>>>>> Well, patch 4 will either mark the remaining p2m_ioreq_server
>>>>>> entries as
>>>>>> "recal" or
>>>>>> reset to p2m_ram_rw directly. So my understanding is that we do not
>>>>>> wish
>>>>>> to
>>>>>> see a ept violation due to a p2m_ioreq_server access after the ioreq
>>>>>> server
>>>>>> is unmapped.
>>>>>> Yet without this domain_pause/unpause() pair, VM accesses may
>>>>>> trigger an
>>>>>> ept
>>>>>> violation
>>>>>> during the hvmop hypercall(hvm_map_mem_type_to_ioreq_server), just to
>>>>>> find
>>>>>> the ioreq
>>>>>> server is NULL. Then we would have to provide handlers which just
>>>>>> do the
>>>>>> copy to/from
>>>>>> actions for the VM. This seems awkward to me.
>>>>> So the race you're worried about is this:
>>>>>
>>>>> 1. Guest fault happens
>>>>> 2. ioreq server calls map_mem_type_to_ioreq_server, unhooking
>>>>> 3. guest  finds no ioreq server present
>>>>>
>>>>> I think in that case the easiest thing to do would be to simply assume
>>>>> there was a race and re-execute the instruction.  Is that not possible
>>>>> for some reason?
>>>>>
>>>>>     -George
>>>> Thanks for your reply, George. :)
>>>> Two reasons I'd like to use the domain_pause/unpause() to avoid the race
>>>> condition:
>>>>
>>>> 1>  Like my previous explanation, in the read-modify-write scenario, the
>>>> ioreq server will
>>>> be NULL for the read emulation. But in such case, hypervisor will not
>>>> discard this trap, instead
>>>> it is supposed to do the copy work for the read access. So it would be
>>>> difficult for hypervisor
>>>> to decide if the ioreq server was detached due to a race condition,
>>>> or if
>>>> the ioreq server should
>>>> be a NULL because we are emulating a read operation first for a
>>>> read-modify-write instruction.
>>> Wouldn't a patch like the attached work (applied on top of the whole
>>> series)?
>> Thanks for your patch, George. I think it should work for 1>. But we
>> still have the dead lock
>> problem.  :)
>>
>> BTW, do you think a domain_pause will cause any new problem?
> Well using a "big hammer" like domain_pause in a case like this is
> usually indicates that there are other issues that aren't being solved
> properly -- for instance, latent deadlocks or unhandled race conditions.
> :-)  Leaving those issues around in the codebase but "papered over" by
> domain_pause is storing up technical debt that future generations will
> inherit and need to untangle.  (Particularly as in this case, there was
> no comment *in the code* explaining what problems the domain_pause was
> there to solve, so anyone wanting to try to remove it would need to just
> figure out.)
>
> domain_pause is relatively expensive to do (since you have to spin
> waiting for all the vcpus to finish running) and completely stops the
> domain from handling interrupts or anything for an arbitrary amount of
> time.  As long as it doesn't happen often, the cost shouldn't be a major
> issue; and as long as the domain_pause is short (less than 100ms), then
> it shouldn't cause any more problems than running on a fairly busy
> system should cause.
>
> On the other hand, if every time we ran into a tricky situation we just
> did a domain pause rather than solving the root issue, pretty soon the
> domain would be paused several times per second.  The performance would
> plummet, and fixing it would be a nightmare because you'd have hundreds
> of undocumented issues to try to understand and fix.
>
> So: the domain_pause itself isn't terrible (although it's better to
> avoid it if we can); what's more of a problem is the potential issues
> that it's hiding.  These issues can add up, so it's important to push
> back and ask "why do we need this and can we solve it a different way"
> pro-actively, as patches come in, rather than waiting until it becomes
> an issue.
>
>   -George

Thanks for your thorough explanation on this point. And I agree. :)

I have given a proposal to solve this deadlock issue in another mail. 
Nested locks are
key to the potential deadlock, and I believe they are not necessary in 
this case, so using
these locks sequentially could be our way out.

And as to domain pause, I can remove them if we can accept the 
possibility when an
ept violation happens, yet to find the ioreq server has already been 
unmapped(discarding
this operation).

B.R.
Yu
diff mbox

Patch

From 4e4b14641cd94c0c9fe64606b329cdbbf9c6a92b Mon Sep 17 00:00:00 2001
From: George Dunlap <george.dunlap@citrix.com>
Date: Thu, 22 Sep 2016 12:30:26 +0100
Subject: [PATCH] x86/hvm: Handle both ioreq detach races and read-modify-write
 instructions

Compile-tested only.

Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
 xen/arch/x86/hvm/emulate.c | 68 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 56 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 564c117..120ef86 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -214,40 +214,84 @@  static int hvmemul_do_io(
         break;
     case X86EMUL_UNHANDLEABLE:
     {
+        /* 
+         * Xen isn't emulating the instruction internally, so see if
+         * there's an ioreq server that can handle it.  Rules:
+         *
+         * - PIO and "normal" mmio run through
+         * hvm_select_ioreq_server() to choose the ioreq server by
+         * range.  If no server is found, the access is ignored.
+         *
+         * - p2m_ioreq_server accesses are handled by the current
+         * ioreq_server for the domain, but there are some corner
+         * cases:
+         *
+         *   - If the domain ioreq_server is NULL, assume this is a
+         *   race between unlooking the ioreq server and
+         *   p2m_type_change and re-try the instruction.
+         *
+         *   - If the IOREQ_MEM_ACCESS_WRITE flag is not set, treat it
+         *   like a normal PIO or MMIO that doesn't have an ioreq
+         *   server (i.e., by ignoring it).
+         *
+         *   - If the accesss is a read, this must be part of a
+         *   read-modify-write instruction; emulate the read so that we have
+         *   it
+         */
+
         struct hvm_ioreq_server *s = NULL;
         p2m_type_t p2mt = p2m_invalid;
-
+        
         if ( is_mmio )
         {
             unsigned long gmfn = paddr_to_pfn(addr);
 
             (void) get_gfn_query_unlocked(currd, gmfn, &p2mt);
 
-            if ( p2mt == p2m_ioreq_server && dir == IOREQ_WRITE )
+            if ( p2mt == p2m_ioreq_server )
             {
                 unsigned int flags;
 
                 s = p2m_get_ioreq_server(currd, &flags);
+
+                /* 
+                 * If p2mt is ioreq_server but ioreq_server is NULL,
+                 * we probably lost a race with p2m_type_change; just
+                 * retry the access.
+                 */
+                if ( s == NULL )
+                {
+                    rc = X86EMUL_RETRY;
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                /* 
+                 * This is part of a read-modify-write instruction.
+                 * Emulate the read part so we have the value cached.
+                 */
+                if ( dir == IOREQ_READ )
+                {
+                    rc = hvm_process_io_intercept(&mem_handler, &p);
+                    vio->io_req.state = STATE_IOREQ_NONE;
+                    break;
+                }
+
+                
                 if ( !(flags & XEN_HVMOP_IOREQ_MEM_ACCESS_WRITE) )
+                {
                     s = NULL;
+                }
             }
         }
 
         if ( !s && p2mt != p2m_ioreq_server )
             s = hvm_select_ioreq_server(currd, &p);
 
-        /* If there is no suitable backing DM, just ignore accesses */
         if ( !s )
         {
-            /*
-             * For p2m_ioreq_server pages accessed with read-modify-write
-             * instructions, we provide a read handler to copy the data to
-             * the buffer.
-             */
-            if ( p2mt == p2m_ioreq_server )
-                rc = hvm_process_io_intercept(&mem_handler, &p);
-            else
-                rc = hvm_process_io_intercept(&null_handler, &p);
+            /* If there is no suitable backing DM, just ignore accesses */
+            rc = hvm_process_io_intercept(&null_handler, &p);
             vio->io_req.state = STATE_IOREQ_NONE;
         }
         else
-- 
2.1.4