
[PULL,3/3] vhost-user: Attempt to fix a race with set_mem_table.

Message ID 1470842980-32481-4-git-send-email-mst@redhat.com (mailing list archive)
State New, archived

Commit Message

Michael S. Tsirkin Aug. 10, 2016, 3:30 p.m. UTC
From: Prerna Saxena <prerna.saxena@nutanix.com>

The set_mem_table command currently does not seek a reply. Hence, there is
no easy way for a remote application to notify QEMU when it has finished
setting up memory, or if there were errors doing so.

As an example:
(1) QEMU sends a SET_MEM_TABLE to the backend (e.g., a vhost-user net
application). SET_MEM_TABLE does not require a reply according to the spec.
(2) QEMU commits the memory to the guest.
(3) The guest issues an I/O operation over a new memory region which was configured in (1).
(4) The application has not yet remapped the memory, but it sees the I/O request.
(5) The application cannot satisfy the request because it does not know about those GPAs.

While a guaranteed fix would require a protocol extension (committed separately),
a best-effort workaround for existing applications is to send a GET_FEATURES
message before completing the vhost_user_set_mem_table() call.
Since GET_FEATURES requires a reply, an application that processes vhost-user
messages synchronously would probably have completed the SET_MEM_TABLE before replying.

Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 hw/virtio/vhost-user.c | 127 ++++++++++++++++++++++++++-----------------------
 1 file changed, 67 insertions(+), 60 deletions(-)

Comments

Fam Zheng Aug. 12, 2016, 6:38 a.m. UTC | #1
On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> From: Prerna Saxena <prerna.saxena@nutanix.com>
> 
> The set_mem_table command currently does not seek a reply. Hence, there is
> no easy way for a remote application to notify to QEMU when it finished
> setting up memory, or if there were errors doing so.
> 
> As an example:
> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> application). SET_MEM_TABLE does not require a reply according to the spec.
> (2) Qemu commits the memory to the guest.
> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> (4) The application has not yet remapped the memory, but it sees the I/O request.
> (5) The application cannot satisfy the request because it does not know about those GPAs.
> 
> While a guaranteed fix would require a protocol extension (committed separately),
> a best-effort workaround for existing applications is to send a GET_FEATURES
> message before completing the vhost_user_set_mem_table() call.
> Since GET_FEATURES requires a reply, an application that processes vhost-user
> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> 
> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Sporadic hangs are seen with test-vhost-user after this patch:

https://travis-ci.org/qemu/qemu/builds

Reverting seems to fix it for me.

Is this a known problem?

Fam
Prerna Saxena Aug. 12, 2016, 7:16 a.m. UTC | #2
On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:

>On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>> 
>> The set_mem_table command currently does not seek a reply. Hence, there is
>> no easy way for a remote application to notify to QEMU when it finished
>> setting up memory, or if there were errors doing so.
>> 
>> As an example:
>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> application). SET_MEM_TABLE does not require a reply according to the spec.
>> (2) Qemu commits the memory to the guest.
>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>> 
>> While a guaranteed fix would require a protocol extension (committed separately),
>> a best-effort workaround for existing applications is to send a GET_FEATURES
>> message before completing the vhost_user_set_mem_table() call.
>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>> 
>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
>Sporadic hangs are seen with test-vhost-user after this patch:
>
>https://travis-ci.org/qemu/qemu/builds
>
>Reverting seems to fix it for me.
>
>Is this a known problem?
>
>Fam

Hi Fam,
Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass in my CentOS 6 environment, so I missed this.
I am setting up the docker test env to repro this, but I think I can guess the problem:

In tests/vhost-user-test.c: 

static void chr_read(void *opaque, const uint8_t *buf, int size)
{
..[snip]..

    case VHOST_USER_SET_MEM_TABLE:
       /* received the mem table */
       memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
       s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));


       /* signal the test that it can continue */
       g_cond_signal(&s->data_cond);
       break;
..[snip]..
}


The test seems to be marked complete as soon as mem_table is copied.
However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation within QEMU. SET_MEM_TABLE now sends out a new message, GET_FEATURES, and the call is only completed once it receives features from the remote application (or the test framework, as is the case here).
While the test itself can be modified (do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too? If so, reverting it is possibly better.
What confuses me is why it doesn’t fail all the time, but only about 20% to 30% of the time, as Fam reports.

Thoughts : Michael, Fam, MarcAndre ?

Regards,
Prerna
Marc-André Lureau Aug. 12, 2016, 7:20 a.m. UTC | #3
Hi

----- Original Message -----
> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
> may break existing vhost applications too ? If so, reverting it possibly
> better.
> What confuses me is why it doesn’t fail all the time, but only about 20% to
> 30% time as Fam reports.
> 
> Thoughts : Michael, Fam, MarcAndre ?

Indeed, I didn't ack that patch in the first place for that kind of reason, so I would revert it.

thanks
Peter Maydell Aug. 12, 2016, 12:01 p.m. UTC | #4
On 12 August 2016 at 08:20, Marc-André Lureau <mlureau@redhat.com> wrote:
> Hi
>
> ----- Original Message -----
>> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
>> may break existing vhost applications too ? If so, reverting it possibly
>> better.
>> What confuses me is why it doesn’t fail all the time, but only about 20% to
>> 30% time as Fam reports.
>>
>> Thoughts : Michael, Fam, MarcAndre ?
>
> Indeed, I didn't ack that patch in the first place for that kind of reasons, so I would revert it.


If somebody would like to send a revert-patch to the list I'll apply
it to master (please cc me as I suspect the mailing list server is
down at the moment...) I've been seeing these failures in the
build tests I run for merges.

thanks
-- PMM
Michael S. Tsirkin Aug. 12, 2016, 3:47 p.m. UTC | #5
On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > sent a follow-up response to GET_FEATURES), I am now wondering if this patch
> > may break existing vhost applications too ? If so, reverting it possibly
> > better.
> > What confuses me is why it doesn’t fail all the time, but only about 20% to
> > 30% time as Fam reports.
> > 
> > Thoughts : Michael, Fam, MarcAndre ?
> 
> Indeed, I didn't ack that patch in the first place for that kind of reasons, so I would revert it.
> 
> thanks

I guess that's the safest thing to do for 2.7.
At least that's not any worse than 2.6.
I still think it's a good idea long term and test should be fixed,
but let's revert for now.
Michael S. Tsirkin Aug. 12, 2016, 3:49 p.m. UTC | #6
On Fri, Aug 12, 2016 at 01:01:16PM +0100, Peter Maydell wrote:
> On 12 August 2016 at 08:20, Marc-André Lureau <mlureau@redhat.com> wrote:
> > Hi
> >
> > ----- Original Message -----
> >> sent a follow-up response to GET_FEATURES), I am now wondering if this patch
> >> may break existing vhost applications too ? If so, reverting it possibly
> >> better.
> >> What confuses me is why it doesn’t fail all the time, but only about 20% to
> >> 30% time as Fam reports.
> >>
> >> Thoughts : Michael, Fam, MarcAndre ?
> >
> > Indeed, I didn't ack that patch in the first place for that kind of reasons, so I would revert it.
> 
> 
> If somebody would like to send a revert-patch to the list I'll apply
> it to master (please cc me as I suspect the mailing list server is
> down at the moment...) I've been seeing these failures in the
> build tests I run for merges.
> 
> thanks
> -- PMM

Will do right now.
Marc-André Lureau Aug. 12, 2016, 3:54 p.m. UTC | #7
Hi

----- Original Message -----
> On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > Hi
> > 
> > ----- Original Message -----
> > > sent a follow-up response to GET_FEATURES), I am now wondering if this
> > > patch
> > > may break existing vhost applications too ? If so, reverting it possibly
> > > better.
> > > What confuses me is why it doesn’t fail all the time, but only about 20%
> > > to
> > > 30% time as Fam reports.
> > > 
> > > Thoughts : Michael, Fam, MarcAndre ?
> > 
> > Indeed, I didn't ack that patch in the first place for that kind of
> > reasons, so I would revert it.
> > 
> > thanks
> 
> I guess that's the safest thing to do for 2.7.
> At least that's not any worse than 2.6.
> I still think it's a good idea long term and test should be fixed,
> but let's revert for now.
> 

What about other backends that may have similar expectations from the protocol?

This patch is a hack, there is no reason to have it upstream. The solution is provided with the REPLY_ACK patch.
Michael S. Tsirkin Aug. 12, 2016, 9:12 p.m. UTC | #8
On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > sent a follow-up response to GET_FEATURES), I am now wondering if this
> > > > patch
> > > > may break existing vhost applications too ? If so, reverting it possibly
> > > > better.
> > > > What confuses me is why it doesn’t fail all the time, but only about 20%
> > > > to
> > > > 30% time as Fam reports.
> > > > 
> > > > Thoughts : Michael, Fam, MarcAndre ?
> > > 
> > > Indeed, I didn't ack that patch in the first place for that kind of
> > > reasons, so I would revert it.
> > > 
> > > thanks
> > 
> > I guess that's the safest thing to do for 2.7.
> > At least that's not any worse than 2.6.
> > I still think it's a good idea long term and test should be fixed,
> > but let's revert for now.
> > 
> 
> What about other backends that may have similar expectations from the protocol.
> 
> This patch is a hack, there is no reason to have it upstream.

The reason is to avoid crashes with existing backends.

> The solution is provided with the REPLY_ACK patch.

It needs a backend update though.

But the issue is old, it's not a regression. I think we lose nothing
by pushing the work-around out until after 2.7.
Marc-André Lureau Aug. 13, 2016, 6:13 a.m. UTC | #9
Hi

----- Original Message -----
> On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> > Hi
> > 
> > ----- Original Message -----
> > > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > > Hi
> > > > 
> > > > ----- Original Message -----
> > > > > sent a follow-up response to GET_FEATURES), I am now wondering if
> > > > > this
> > > > > patch
> > > > > may break existing vhost applications too ? If so, reverting it
> > > > > possibly
> > > > > better.
> > > > > What confuses me is why it doesn’t fail all the time, but only about
> > > > > 20%
> > > > > to
> > > > > 30% time as Fam reports.
> > > > > 
> > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > 
> > > > Indeed, I didn't ack that patch in the first place for that kind of
> > > > reasons, so I would revert it.
> > > > 
> > > > thanks
> > > 
> > > I guess that's the safest thing to do for 2.7.
> > > At least that's not any worse than 2.6.
> > > I still think it's a good idea long term and test should be fixed,
> > > but let's revert for now.
> > > 
> > 
> > What about other backends that may have similar expectations from the
> > protocol.
> > 
> > This patch is a hack, there is no reason to have it upstream.
> 
> The reason is to avoid crashes with existing backends.

Which backend? I had a similar issue, it wasn't about crashes, and Prerna didn't mention crashes either, but anyway there is no guarantee that adding a GET_FEATURES message will solve it...


> > The solution is provided with the REPLY_ACK patch.
> 
> It needs a backend update though.
> 
> But the issue is old, it's not a regression. I think we lose nothing
> by pushing the work-around out until after 2.7.
> 
> --
> MST
>
Michael S. Tsirkin Aug. 14, 2016, 2:30 a.m. UTC | #10
On Sat, Aug 13, 2016 at 02:13:46AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > > > Hi
> > > > > 
> > > > > ----- Original Message -----
> > > > > > sent a follow-up response to GET_FEATURES), I am now wondering if
> > > > > > this
> > > > > > patch
> > > > > > may break existing vhost applications too ? If so, reverting it
> > > > > > possibly
> > > > > > better.
> > > > > > What confuses me is why it doesn’t fail all the time, but only about
> > > > > > 20%
> > > > > > to
> > > > > > 30% time as Fam reports.
> > > > > > 
> > > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > > 
> > > > > Indeed, I didn't ack that patch in the first place for that kind of
> > > > > reasons, so I would revert it.
> > > > > 
> > > > > thanks
> > > > 
> > > > I guess that's the safest thing to do for 2.7.
> > > > At least that's not any worse than 2.6.
> > > > I still think it's a good idea long term and test should be fixed,
> > > > but let's revert for now.
> > > > 
> > > 
> > > What about other backends that may have similar expectations from the
> > > protocol.
> > > 
> > > This patch is a hack, there is no reason to have it upstream.
> > 
> > The reason is to avoid crashes with existing backends.
> 
> Which backend? I had a similar issue, it wasn't about crashes, and Prerna didn't mention crashes either, but anyway there is not guarantee that adding a GET_FEATURES message will solve it...
> 

Not guaranteed, but it helps with all those I'm aware of.

> > > The solution is provided with the REPLY_ACK patch.
> > 
> > It needs a backend update though.
> > 
> > But the issue is old, it's not a regression. I think we lose nothing
> > by pushing the work-around out until after 2.7.
> > 
> > --
> > MST
> >
Michael S. Tsirkin Aug. 14, 2016, 2:44 a.m. UTC | #11
On Sat, Aug 13, 2016 at 02:13:46AM -0400, Marc-André Lureau wrote:
> Hi
> 
> ----- Original Message -----
> > On Fri, Aug 12, 2016 at 11:54:54AM -0400, Marc-André Lureau wrote:
> > > Hi
> > > 
> > > ----- Original Message -----
> > > > On Fri, Aug 12, 2016 at 03:20:56AM -0400, Marc-André Lureau wrote:
> > > > > Hi
> > > > > 
> > > > > ----- Original Message -----
> > > > > > sent a follow-up response to GET_FEATURES), I am now wondering if
> > > > > > this
> > > > > > patch
> > > > > > may break existing vhost applications too ? If so, reverting it
> > > > > > possibly
> > > > > > better.
> > > > > > What confuses me is why it doesn’t fail all the time, but only about
> > > > > > 20%
> > > > > > to
> > > > > > 30% time as Fam reports.
> > > > > > 
> > > > > > Thoughts : Michael, Fam, MarcAndre ?
> > > > > 
> > > > > Indeed, I didn't ack that patch in the first place for that kind of
> > > > > reasons, so I would revert it.
> > > > > 
> > > > > thanks
> > > > 
> > > > I guess that's the safest thing to do for 2.7.
> > > > At least that's not any worse than 2.6.
> > > > I still think it's a good idea long term and test should be fixed,
> > > > but let's revert for now.
> > > > 
> > > 
> > > What about other backends that may have similar expectations from the
> > > protocol.
> > > 
> > > This patch is a hack, there is no reason to have it upstream.
> > 
> > The reason is to avoid crashes with existing backends.
> 
> Which backend? I had a similar issue, it wasn't about crashes, and Prerna didn't mention crashes either, but anyway there is not guarantee that adding a GET_FEATURES message will solve it...
> 


So what bothers me the most about dropping this is that
there are no backends with the new feature implemented
at the moment.

The GET_FEATURES hack at least makes it easy to test
with existing backends ...




> > > The solution is provided with the REPLY_ACK patch.
> > 
> > It needs a backend update though.
> > 
> > But the issue is old, it's not a regression. I think we lose nothing
> > by pushing the work-around out until after 2.7.
> > 
> > --
> > MST
> >
Michael S. Tsirkin Aug. 14, 2016, 2:51 a.m. UTC | #12
On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> 
> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> 
> 
> 
> 
> 
> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> >> From: Prerna Saxena <prerna.saxena@nutanix.com>
> >> 
> >> The set_mem_table command currently does not seek a reply. Hence, there is
> >> no easy way for a remote application to notify to QEMU when it finished
> >> setting up memory, or if there were errors doing so.
> >> 
> >> As an example:
> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> >> application). SET_MEM_TABLE does not require a reply according to the spec.
> >> (2) Qemu commits the memory to the guest.
> >> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> >> (4) The application has not yet remapped the memory, but it sees the I/O request.
> >> (5) The application cannot satisfy the request because it does not know about those GPAs.
> >> 
> >> While a guaranteed fix would require a protocol extension (committed separately),
> >> a best-effort workaround for existing applications is to send a GET_FEATURES
> >> message before completing the vhost_user_set_mem_table() call.
> >> Since GET_FEATURES requires a reply, an application that processes vhost-user
> >> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> >> 
> >> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> >Sporadic hangs are seen with test-vhost-user after this patch:
> >
> >https://travis-ci.org/qemu/qemu/builds
> >
> >Reverting seems to fix it for me.
> >
> >Is this a known problem?
> >
> >Fam
> 
> Hi Fam,
> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
> I am setting up the docker test env to repro this, but I think I can guess the problem :
> 
> In tests/vhost-user-test.c: 
> 
> static void chr_read(void *opaque, const uint8_t *buf, int size)
> {
> ..[snip]..
> 
>     case VHOST_USER_SET_MEM_TABLE:
>        /* received the mem table */
>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> 
> 
>        /* signal the test that it can continue */
>        g_cond_signal(&s->data_cond);
>        break;
> ..[snip]..
> }
> 
> 
> The test seems to be marked complete as soon as mem_table is copied. 
> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)

Hmm but why does it matter that data_cond is woken up?


> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.

What bothers me is that the new feature might cause the same
issue once we enable it in the test.

How about a patch to tests/vhost-user-test.c adding the new
protocol feature? I would be quite interested to see what
is going on with it.


> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports. 

And succeeds every time on my systems :(

> 
> Thoughts : Michael, Fam, MarcAndre ?
> 
> Regards,
> Prerna
Prerna Saxena Aug. 14, 2016, 9:42 a.m. UTC | #13
On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:

>On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>> 
>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>> 
>> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>> >> From: Prerna Saxena <prerna.saxena@nutanix.com>
>> >> 
>> >> The set_mem_table command currently does not seek a reply. Hence, there is
>> >> no easy way for a remote application to notify to QEMU when it finished
>> >> setting up memory, or if there were errors doing so.
>> >> 
>> >> As an example:
>> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>> >> application). SET_MEM_TABLE does not require a reply according to the spec.
>> >> (2) Qemu commits the memory to the guest.
>> >> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>> >> (4) The application has not yet remapped the memory, but it sees the I/O request.
>> >> (5) The application cannot satisfy the request because it does not know about those GPAs.
>> >> 
>> >> While a guaranteed fix would require a protocol extension (committed separately),
>> >> a best-effort workaround for existing applications is to send a GET_FEATURES
>> >> message before completing the vhost_user_set_mem_table() call.
>> >> Since GET_FEATURES requires a reply, an application that processes vhost-user
>> >> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>> >> 
>> >> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>> >
>> >Sporadic hangs are seen with test-vhost-user after this patch:
>> >
>> >https://travis-ci.org/qemu/qemu/builds
>> >
>> >Reverting seems to fix it for me.
>> >
>> >Is this a known problem?
>> >
>> >Fam
>> 
>> Hi Fam,
>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>> 
>> In tests/vhost-user-test.c: 
>> 
>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>> {
>> ..[snip]..
>> 
>>     case VHOST_USER_SET_MEM_TABLE:
>>        /* received the mem table */
>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>> 
>>        /* signal the test that it can continue */
>>        g_cond_signal(&s->data_cond);
>>        break;
>> ..[snip]..
>> }
>> 
>> The test seems to be marked complete as soon as mem_table is copied. 
>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>
>Hmm but why does it matter that data_cond is woken up?

Michael, sorry, I didn’t quite understand that. Could you pls explain ?

>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>
>What bothers me is that the new feature might cause the same
>issue once we enable it in the test.

No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.

>How about a patch to tests/vhost-user-test.c adding the new
>protocol feature? I would be quite interested to see what
>is going on with it.

Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.

>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports. 
>
>And succeeds every time on my systems :(

+1 to that :( I have had no luck repro’ing it.

>> Thoughts : Michael, Fam, MarcAndre ?
>> 
>> Regards,
>>

Prerna
Michael S. Tsirkin Aug. 14, 2016, 9:39 p.m. UTC | #14
On Sun, Aug 14, 2016 at 09:42:11AM +0000, Prerna Saxena wrote:
> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> 
> >On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
> >> 
> >> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
> >> 
> >> 
> >> 
> >> 
> >> 
> >> >On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
> >> >> From: Prerna Saxena <prerna.saxena@nutanix.com>
> >> >> 
> >> >> The set_mem_table command currently does not seek a reply. Hence, there is
> >> >> no easy way for a remote application to notify to QEMU when it finished
> >> >> setting up memory, or if there were errors doing so.
> >> >> 
> >> >> As an example:
> >> >> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
> >> >> application). SET_MEM_TABLE does not require a reply according to the spec.
> >> >> (2) Qemu commits the memory to the guest.
> >> >> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
> >> >> (4) The application has not yet remapped the memory, but it sees the I/O request.
> >> >> (5) The application cannot satisfy the request because it does not know about those GPAs.
> >> >> 
> >> >> While a guaranteed fix would require a protocol extension (committed separately),
> >> >> a best-effort workaround for existing applications is to send a GET_FEATURES
> >> >> message before completing the vhost_user_set_mem_table() call.
> >> >> Since GET_FEATURES requires a reply, an application that processes vhost-user
> >> >> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
> >> >> 
> >> >> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
> >> >> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >> >> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >> >
> >> >Sporadic hangs are seen with test-vhost-user after this patch:
> >> >
> >> >https://travis-ci.org/qemu/qemu/builds
> >> >
> >> >Reverting seems to fix it for me.
> >> >
> >> >Is this a known problem?
> >> >
> >> >Fam
> >> 
> >> Hi Fam,
> >> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
> >> I am setting up the docker test env to repro this, but I think I can guess the problem :
> >> 
> >> In tests/vhost-user-test.c: 
> >> 
> >> static void chr_read(void *opaque, const uint8_t *buf, int size)
> >> {
> >> ..[snip]..
> >> 
> >>     case VHOST_USER_SET_MEM_TABLE:
> >>        /* received the mem table */
> >>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
> >>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
> >> 
> >> 
> >>        /* signal the test that it can continue */
> >>        g_cond_signal(&s->data_cond);
> >>        break;
> >> ..[snip]..
> >> }
> >> 
> >> 
> >> The test seems to be marked complete as soon as mem_table is copied. 
> >> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
> >
> >Hmm but why does it matter that data_cond is woken up?
> 
> Michael, sorry, I didn’t quite understand that. Could you pls explain ?


We do g_cond_signal but I don't see where it prevents the
test from responding to GET_FEATURES. Except if the test exited
signaling success - but then why do we care?


> >
> >
> >> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
> >
> >What bothers me is that the new feature might cause the same
> >issue once we enable it in the test.
> 
> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.

Absolutely - this reduces the risk - but how do we know that whatever
problem causes the test failures is not there with the new feature?


> >
> >How about a patch to tests/vhost-user-test.c adding the new
> >protocol feature? I would be quite interested to see what
> >is going on with it.
> 
> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.

If that passes on travis, then we'll be more confident -
after all it is the travis failures that cause us to
reconsider the work-around.

> >
> >
> >> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports. 
> >
> >And succeeds every time on my systems :(
> 
> +1 to that :( I have had no luck repro’ing it.
> 
> >
> >> 
> >> Thoughts : Michael, Fam, MarcAndre ?
> >> 
> >> Regards,
> >>
> 
> Prerna
Peter Maydell Aug. 15, 2016, 10:20 a.m. UTC | #15
On 12 August 2016 at 16:49, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Aug 12, 2016 at 01:01:16PM +0100, Peter Maydell wrote:
>> If somebody would like to send a revert-patch to the list I'll apply
>> it to master (please cc me as I suspect the mailing list server is
>> down at the moment...) I've been seeing these failures in the
>> build tests I run for merges.

> Will do right now.

Ping. This is blocking everything else for rc3 right now :-(
If I don't see a patch in the next few hours I'll just revert it
directly...

thanks
-- PMM
Maxime Coquelin Aug. 31, 2016, 11:19 a.m. UTC | #16
On 08/14/2016 11:42 AM, Prerna Saxena wrote:
> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
>
>> On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>>>
>>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>>>
>>>
>>>
>>>
>>>
>>>> On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>>>>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>
>>>>> The set_mem_table command currently does not seek a reply. Hence, there is
>>>>> no easy way for a remote application to notify to QEMU when it finished
>>>>> setting up memory, or if there were errors doing so.
>>>>>
>>>>> As an example:
>>>>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>>>>> application). SET_MEM_TABLE does not require a reply according to the spec.
>>>>> (2) Qemu commits the memory to the guest.
>>>>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>>>>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>>>>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>>>>>
>>>>> While a guaranteed fix would require a protocol extension (committed separately),
>>>>> a best-effort workaround for existing applications is to send a GET_FEATURES
>>>>> message before completing the vhost_user_set_mem_table() call.
>>>>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>>>>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>>>>>
>>>>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>
>>>> Sporadic hangs are seen with test-vhost-user after this patch:
>>>>
>>>> https://travis-ci.org/qemu/qemu/builds
>>>>
>>>> Reverting seems to fix it for me.
>>>>
>>>> Is this a known problem?
>>>>
>>>> Fam
>>>
>>> Hi Fam,
>>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>>>
>>> In tests/vhost-user-test.c:
>>>
>>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>>> {
>>> ..[snip]..
>>>
>>>     case VHOST_USER_SET_MEM_TABLE:
>>>        /* received the mem table */
>>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>>>
>>>
>>>        /* signal the test that it can continue */
>>>        g_cond_signal(&s->data_cond);
>>>        break;
>>> ..[snip]..
>>> }
>>>
>>>
>>> The test seems to be marked complete as soon as mem_table is copied.
>>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>>
>> Hmm but why does it matter that data_cond is woken up?
>
> Michael, sorry, I didn’t quite understand that. Could you pls explain ?
>
>>
>>
>>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>>
>> What bothers me is that the new feature might cause the same
>> issue once we enable it in the test.
>
> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
>
>>
>> How about a patch to tests/vhost-user-test.c adding the new
>> protocol feature? I would be quite interested to see what
>> is going on with it.
>
> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
>
>>
>>
>>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
>>
>> And succeeds every time on my systems :(
>
> +1 to that :( I have had no luck repro’ing it.
>
>>
>>>
>>> Thoughts : Michael, Fam, MarcAndre ?

I have managed to reproduce the hang by adding some debug prints into
vhost_user_get_features().

With these prints, the issue is reproducible quite easily.
Another way to reproduce it in one shot is to run vhost-user-test
under strace with the follow-forks option (-f).

The debug prints at vhost_user_get_features() entry and exit show
that we never return from this function when the hang happens.
An strace of the Qemu instance shows that its thread keeps retrying
to receive the GET_FEATURE reply:

write(1, "vhost_user_get_features IN: \n", 29) = 29
sendmsg(11, {msg_name=NULL, msg_namelen=0,
         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
...
recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0

The reason is that vhost-user-test never replies to Qemu,
because its thread handling the GET_FEATURES command is waiting for
the s->data_mutex lock.
This lock is held by the other vhost-user-test thread, executing
read_guest_mem().

The lock is never released because that thread is blocked in the read
syscall while read_guest_mem() is doing the readl().

This is because, on the Qemu side, the thread polling the qtest socket
is waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
that mutex is held by the thread trying to get the GET_FEATURE reply
(the TCG one).

So here is the deadlock.

That said, I don't see a clean way to solve this.
Any thoughts?

Regards,
Maxime
Michael S. Tsirkin Sept. 1, 2016, 1:46 p.m. UTC | #17
On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
> ..[snip]..
> 
> So here is the deadlock.
> 
> That said, I don't see a clean way to solve this.
> Any thoughts?
> 
> Regards,
> Maxime

My thought is that we really need to do what I said:
avoid doing GET_FEATURES (and setting reply_ack)
on the first set_mem, and I quote:

	OK this all looks very reasonable (and I do like patch 1 too)
	but there's one source of waste here: we do not need to
	synchronize when we set up device the first time
	when hdev->memory_changed is false.

	I think we should test that and skip synch in both patches
	unless  hdev->memory_changed is set.

with that change test will start passing.
Maxime Coquelin Sept. 2, 2016, 8:57 a.m. UTC | #18
On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
> ..[snip]..
>
> My thought is that we really need to do what I said:
> avoid doing GET_FEATURES (and setting reply_ack)
> on the first set_mem, and I quote:
>
> 	OK this all looks very reasonable (and I do like patch 1 too)
> 	but there's one source of waste here: we do not need to
> 	synchronize when we set up device the first time
> 	when hdev->memory_changed is false.
>
> 	I think we should test that and skip synch in both patches
> 	unless  hdev->memory_changed is set.
>
> with that change test will start passing.

Actually, it looks like memory_changed is true even at the first
SET_MEM_TABLE request.

Thanks,
Maxime
Michael S. Tsirkin Sept. 2, 2016, 5:29 p.m. UTC | #19
On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote:
> ..[snip]..
> 
> Actually, it looks like memory_changed is true even at first
> SET_MEM_TABLE request.
> 
> Thanks,
> Maxime

Let's add another flag then? What we care about is that it's not
the first time we specify translations for a given address.
Maxime Coquelin Sept. 5, 2016, 1:06 p.m. UTC | #20
On 09/02/2016 07:29 PM, Michael S. Tsirkin wrote:
> On Fri, Sep 02, 2016 at 10:57:17AM +0200, Maxime Coquelin wrote:
>>
>>
>> On 09/01/2016 03:46 PM, Michael S. Tsirkin wrote:
>>> On Wed, Aug 31, 2016 at 01:19:47PM +0200, Maxime Coquelin wrote:
>>>>
>>>>
>>>> On 08/14/2016 11:42 AM, Prerna Saxena wrote:
>>>>> On 14/08/16 8:21 am, "Michael S. Tsirkin" <mst@redhat.com> wrote:
>>>>>
>>>>>
>>>>>> On Fri, Aug 12, 2016 at 07:16:34AM +0000, Prerna Saxena wrote:
>>>>>>>
>>>>>>> On 12/08/16 12:08 pm, "Fam Zheng" <famz@redhat.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On Wed, 08/10 18:30, Michael S. Tsirkin wrote:
>>>>>>>>> From: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>>>>>
>>>>>>>>> The set_mem_table command currently does not seek a reply. Hence, there is
>>>>>>>>> no easy way for a remote application to notify to QEMU when it finished
>>>>>>>>> setting up memory, or if there were errors doing so.
>>>>>>>>>
>>>>>>>>> As an example:
>>>>>>>>> (1) Qemu sends a SET_MEM_TABLE to the backend (eg, a vhost-user net
>>>>>>>>> application). SET_MEM_TABLE does not require a reply according to the spec.
>>>>>>>>> (2) Qemu commits the memory to the guest.
>>>>>>>>> (3) Guest issues an I/O operation over a new memory region which was configured on (1).
>>>>>>>>> (4) The application has not yet remapped the memory, but it sees the I/O request.
>>>>>>>>> (5) The application cannot satisfy the request because it does not know about those GPAs.
>>>>>>>>>
>>>>>>>>> While a guaranteed fix would require a protocol extension (committed separately),
>>>>>>>>> a best-effort workaround for existing applications is to send a GET_FEATURES
>>>>>>>>> message before completing the vhost_user_set_mem_table() call.
>>>>>>>>> Since GET_FEATURES requires a reply, an application that processes vhost-user
>>>>>>>>> messages synchronously would probably have completed the SET_MEM_TABLE before replying.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Prerna Saxena <prerna.saxena@nutanix.com>
>>>>>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>>
>>>>>>>> Sporadic hangs are seen with test-vhost-user after this patch:
>>>>>>>>
>>>>>>>> https://travis-ci.org/qemu/qemu/builds
>>>>>>>>
>>>>>>>> Reverting seems to fix it for me.
>>>>>>>>
>>>>>>>> Is this a known problem?
>>>>>>>>
>>>>>>>> Fam
>>>>>>>
>>>>>>> Hi Fam,
>>>>>>> Thanks for reporting the sporadic hangs. I had seen ‘make check’ pass on my Centos 6 environment, so missed this.
>>>>>>> I am setting up the docker test env to repro this, but I think I can guess the problem :
>>>>>>>
>>>>>>> In tests/vhost-user-test.c:
>>>>>>>
>>>>>>> static void chr_read(void *opaque, const uint8_t *buf, int size)
>>>>>>> {
>>>>>>> ..[snip]..
>>>>>>>
>>>>>>>     case VHOST_USER_SET_MEM_TABLE:
>>>>>>>        /* received the mem table */
>>>>>>>        memcpy(&s->memory, &msg.payload.memory, sizeof(msg.payload.memory));
>>>>>>>        s->fds_num = qemu_chr_fe_get_msgfds(chr, s->fds, G_N_ELEMENTS(s->fds));
>>>>>>>
>>>>>>>
>>>>>>>        /* signal the test that it can continue */
>>>>>>>        g_cond_signal(&s->data_cond);
>>>>>>>        break;
>>>>>>> ..[snip]..
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> The test seems to be marked complete as soon as mem_table is copied.
>>>>>>> However, this patch 3/3 changes the behaviour of the SET_MEM_TABLE vhost command implementation with qemu. SET_MEM_TABLE now sends out a new message GET_FEATURES, and the call is only completed once it receives features from the remote application. (or the test framework, as is the case here.)
>>>>>>
>>>>>> Hmm but why does it matter that data_cond is woken up?
>>>>>
>>>>> Michael, sorry, I didn’t quite understand that. Could you pls explain ?
>>>>>
>>>>>>
>>>>>>
>>>>>>> While the test itself can be modified (Do not signal completion until we’ve sent a follow-up response to GET_FEATURES), I am now wondering if this patch may break existing vhost applications too ? If so, reverting it possibly better.
>>>>>>
>>>>>> What bothers me is that the new feature might cause the same
>>>>>> issue once we enable it in the test.
>>>>>
>>>>> No it wont. The new feature is a protocol extension, and only works if it has been negotiated with. If not negotiated, that part of code is never executed.
>>>>>
>>>>>>
>>>>>> How about a patch to tests/vhost-user-test.c adding the new
>>>>>> protocol feature? I would be quite interested to see what
>>>>>> is going on with it.
>>>>>
>>>>> Yes that can be done. But you can see that the protocol extension patch will not change the behaviour of the _existing_ test.
>>>>>
>>>>>>
>>>>>>
>>>>>>> What confuses me is why it doesn’t fail all the time, but only about 20% to 30% time as Fam reports.
>>>>>>
>>>>>> And succeeds every time on my systems :(
>>>>>
>>>>> +1 to that :( I have had no luck repro’ing it.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> Thoughts : Michael, Fam, MarcAndre ?
>>>>
>>>> I have managed to reproduce the hang by adding some debug prints into
>>>> vhost_user_get_features().
>>>>
>>>> Doing this the issue is reproducible quite easily.
>>>> Another way to reproduce it in one shot is to strace (with following
>>>> forks option) vhost-user-test execution.
>>>>
>>>> So, by adding debug prints at vhost_user_get_features() entry and exit,
>>>> we can see we never return from this function when the hang happens.
>>>> Strace of the Qemu instance shows that its thread keeps retrying to
>>>> receive the GET_FEATURES reply:
>>>>
>>>> write(1, "vhost_user_get_features IN: \n", 29) = 29
>>>> sendmsg(11, {msg_name=NULL, msg_namelen=0,
>>>>         msg_iov=[{iov_base="\1\0\0\0\1\0\0\0\0\0\0\0", iov_len=12}],
>>>>         msg_iovlen=1, msg_controllen=0, msg_flags=0}, 0) = 12
>>>> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
>>>> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
>>>> ...
>>>> recvmsg(11, {msg_namelen=0}, MSG_CMSG_CLOEXEC) = -1 EAGAIN
>>>> nanosleep({0, 100000}, 0x7fff29f8dd70)  = 0
>>>>
>>>> The reason is that vhost-user-test never replies to Qemu,
>>>> because its thread handling the GET_FEATURES command is waiting for
>>>> the s->data_mutex lock.
>>>> This lock is held by the other vhost-user-test thread, executing
>>>> read_guest_mem().
>>>>
>>>> The lock is never released because the thread is blocked in read
>>>> syscall, when read_guest_mem() is doing the readl().
>>>>
>>>> This is because on Qemu side, the thread polling the qtest socket is
>>>> waiting for the qemu_global_mutex (in os_host_main_loop_wait()), but
>>>> the mutex is held by the thread trying to get the GET_FEATURE reply
>>>> (the TCG one).
>>>>
>>>> So here is the deadlock.
>>>>
>>>> That said, I don't see a clean way to solve this.
>>>> Any thoughts?
>>>>
>>>> Regards,
>>>> Maxime
>>>
>>> My thought is that we really need to do what I said:
>>> avoid doing GET_FEATURES (and setting reply_ack)
>>> on the first set_mem, and I quote:
>>>
>>> 	OK this all looks very reasonable (and I do like patch 1 too)
>>> 	but there's one source of waste here: we do not need to
>>> 	synchronize when we set up device the first time
>>> 	when hdev->memory_changed is false.
>>>
>>> 	I think we should test that and skip synch in both patches
>>> 	unless  hdev->memory_changed is set.
>>>
>>> with that change test will start passing.
>>
>> Actually, it looks like memory_changed is true even at first
>> SET_MEM_TABLE request.
>>
>> Thanks,
>> Maxime
>
> Let's add another flag then? What we care about is that it's not
> the first time we specify translations for a given address.

I added a dedicated flag that skips the sync on two conditions:
  1. First set_mem_table call
  2. If only new regions are added

It solves the hang seen with the vhost-user-test app, and I think the
patch makes sense.

But IMHO the problem is deeper than that, and we could under some
conditions still hang when running in TCG mode.
Imagine Qemu sends another "GET_FEATURES" request after the
set_mem_table, and vhost-user-test's read_guest_mem() is executed just
before this second call (let's say it was not scheduled for some time).

In this case, the read_guest_mem() thread owns the data_mutex and starts
doing readl() calls. On the Qemu side, as we are sending an update of the
mem table, we own the qemu_global_mutex, and the deadlock happens again:
  - Vhost-user-test
    * read_guest_mem() thread: blocked in readl(), waiting for Qemu to
handle it (TCG mode only), owning the data_mutex lock.
    * Command handler thread: received the GET_FEATURES event, but waits
for data_mutex ownership to handle it.

  - Qemu
    * FDs polling thread: waits for qemu_global_mutex ownership, to be
able to handle the readl() request from vhost-user-test.
    * TCG thread: owns the qemu_global_mutex, and polls to receive the
GET_FEATURES reply.

Maybe the GET_FEATURES case is not realistic, but what about
GET_VRING_BASE, which gets called by vhost_net_stop()?

Thanks in advance,
Maxime
Michael S. Tsirkin Sept. 6, 2016, 2:22 a.m. UTC | #21
On Mon, Sep 05, 2016 at 03:06:09PM +0200, Maxime Coquelin wrote:
> ..[snip]..
>
> Maybe the GET_FEATURES case is not realistic, but what about
> GET_VRING_BASE, which gets called by vhost_net_stop()?
> 
> Thanks in advance,
> Maxime

I think most applications expect to be able to handle GET_VRING_BASE
at any time.
If an application isn't ready to handle GET_VRING_BASE at any time,
we won't be able to stop the device.

This is not the case for this test but that's because it's
just a unit test, so incomplete.

Patch

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index b57454a..1a7d53c 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -263,66 +263,6 @@  static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
     return 0;
 }
 
-static int vhost_user_set_mem_table(struct vhost_dev *dev,
-                                    struct vhost_memory *mem)
-{
-    int fds[VHOST_MEMORY_MAX_NREGIONS];
-    int i, fd;
-    size_t fd_num = 0;
-    bool reply_supported = virtio_has_feature(dev->protocol_features,
-                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
-
-    VhostUserMsg msg = {
-        .request = VHOST_USER_SET_MEM_TABLE,
-        .flags = VHOST_USER_VERSION,
-    };
-
-    if (reply_supported) {
-        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
-    }
-
-    for (i = 0; i < dev->mem->nregions; ++i) {
-        struct vhost_memory_region *reg = dev->mem->regions + i;
-        ram_addr_t offset;
-        MemoryRegion *mr;
-
-        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
-        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
-                                     &offset);
-        fd = memory_region_get_fd(mr);
-        if (fd > 0) {
-            msg.payload.memory.regions[fd_num].userspace_addr = reg->userspace_addr;
-            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
-            msg.payload.memory.regions[fd_num].guest_phys_addr = reg->guest_phys_addr;
-            msg.payload.memory.regions[fd_num].mmap_offset = offset;
-            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
-            fds[fd_num++] = fd;
-        }
-    }
-
-    msg.payload.memory.nregions = fd_num;
-
-    if (!fd_num) {
-        error_report("Failed initializing vhost-user memory map, "
-                     "consider using -object memory-backend-file share=on");
-        return -1;
-    }
-
-    msg.size = sizeof(msg.payload.memory.nregions);
-    msg.size += sizeof(msg.payload.memory.padding);
-    msg.size += fd_num * sizeof(VhostUserMemoryRegion);
-
-    if (vhost_user_write(dev, &msg, fds, fd_num) < 0) {
-        return -1;
-    }
-
-    if (reply_supported) {
-        return process_message_reply(dev, msg.request);
-    }
-
-    return 0;
-}
-
 static int vhost_user_set_vring_addr(struct vhost_dev *dev,
                                      struct vhost_vring_addr *addr)
 {
@@ -537,6 +477,73 @@  static int vhost_user_get_features(struct vhost_dev *dev, uint64_t *features)
     return vhost_user_get_u64(dev, VHOST_USER_GET_FEATURES, features);
 }
 
+static int vhost_user_set_mem_table(struct vhost_dev *dev,
+                                    struct vhost_memory *mem)
+{
+    int fds[VHOST_MEMORY_MAX_NREGIONS];
+    int i, fd;
+    size_t fd_num = 0;
+    uint64_t features;
+    bool reply_supported = virtio_has_feature(dev->protocol_features,
+                                              VHOST_USER_PROTOCOL_F_REPLY_ACK);
+
+    VhostUserMsg msg = {
+        .request = VHOST_USER_SET_MEM_TABLE,
+        .flags = VHOST_USER_VERSION,
+    };
+
+    if (reply_supported) {
+        msg.flags |= VHOST_USER_NEED_REPLY_MASK;
+    }
+
+    for (i = 0; i < dev->mem->nregions; ++i) {
+        struct vhost_memory_region *reg = dev->mem->regions + i;
+        ram_addr_t offset;
+        MemoryRegion *mr;
+
+        assert((uintptr_t)reg->userspace_addr == reg->userspace_addr);
+        mr = memory_region_from_host((void *)(uintptr_t)reg->userspace_addr,
+                                     &offset);
+        fd = memory_region_get_fd(mr);
+        if (fd > 0) {
+            msg.payload.memory.regions[fd_num].userspace_addr
+                                             = reg->userspace_addr;
+            msg.payload.memory.regions[fd_num].memory_size  = reg->memory_size;
+            msg.payload.memory.regions[fd_num].guest_phys_addr
+                                             = reg->guest_phys_addr;
+            msg.payload.memory.regions[fd_num].mmap_offset = offset;
+            assert(fd_num < VHOST_MEMORY_MAX_NREGIONS);
+            fds[fd_num++] = fd;
+        }
+    }
+
+    msg.payload.memory.nregions = fd_num;
+
+    if (!fd_num) {
+        error_report("Failed initializing vhost-user memory map, "
+                     "consider using -object memory-backend-file share=on");
+        return -1;
+    }
+
+    msg.size = sizeof(msg.payload.memory.nregions);
+    msg.size += sizeof(msg.payload.memory.padding);
+    msg.size += fd_num * sizeof(VhostUserMemoryRegion);
+
+    vhost_user_write(dev, &msg, fds, fd_num);
+
+    if (reply_supported) {
+        return process_message_reply(dev, msg.request);
+    } else {
+        /* Note: It is (yet) unknown when the client application has finished
+         * remapping the GPA.
+         * Attempt to prevent a race by sending a command that requires a reply.
+         */
+        vhost_user_get_features(dev, &features);
+    }
+
+    return 0;
+}
+
 static int vhost_user_set_owner(struct vhost_dev *dev)
 {
     VhostUserMsg msg = {