common/gnttab: Process softirqs while dumping grant tables

Message ID 1554481990-7569-1-git-send-email-andrew.cooper3@citrix.com (mailing list archive)
State New, archived

Commit Message

Andrew Cooper April 5, 2019, 4:33 p.m. UTC
OSSTests upgrade to Jessie has identified that with a sufficiently large grant
table, a watchdog timeout can occur.

http://logs.test-lab.xenproject.org/osstest/logs/134399/test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow/serial-chardonnay0.log

Reported-by: Ian Jackson <Ian.Jackson@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Julien Grall <julien.grall@arm.com>
CC: Ian Jackson <Ian.Jackson@citrix.com>
---
 xen/common/grant_table.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Julien Grall April 5, 2019, 4:35 p.m. UTC | #1
Hi Andrew,

On 05/04/2019 17:33, Andrew Cooper wrote:
> OSSTests upgrade to Jessie has identified that with a sufficiently large grant
> table, a watchdog timeout can occur.
> 
> http://logs.test-lab.xenproject.org/osstest/logs/134399/test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow/serial-chardonnay0.log

OSSTest logs usually disappear after a few days. Would it be possible to copy the
relevant part of the log instead?

Cheers,

> 
> Reported-by: Ian Jackson <Ian.Jackson@citrix.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Wei Liu <wei.liu2@citrix.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Stefano Stabellini <sstabellini@kernel.org>
> CC: Julien Grall <julien.grall@arm.com>
> CC: Ian Jackson <Ian.Jackson@citrix.com>
> ---
>   xen/common/grant_table.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
> index 80728ea..344b3ee 100644
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -3956,6 +3956,9 @@ static void gnttab_usage_print(struct domain *rd)
>           uint16_t status;
>           uint64_t frame;
>   
> +        if ( !(ref & 31) )
> +            process_pending_softirqs();
> +
>           act = active_entry_acquire(gt, ref);
>           if ( !act->pin )
>           {
>
Andrew Cooper April 5, 2019, 4:38 p.m. UTC | #2
On 05/04/2019 17:35, Julien Grall wrote:
> Hi Andrew,
>
> On 05/04/2019 17:33, Andrew Cooper wrote:
>> OSSTests upgrade to Jessie has identified that with a sufficiently
>> large grant
>> table, a watchdog timeout can occur.
>>
>> http://logs.test-lab.xenproject.org/osstest/logs/134399/test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow/serial-chardonnay0.log
>>
>
> OSSTest logs usually disappear after a few days. Would it be possible to
> copy the relevant part of the log instead?

Can do.  Something like:

(XEN) gnttab_usage_print_all [ key 'g' pressed
Apr  4 20:51:42.779992 (XEN)       -------- active --------       -------- shared --------
Apr  4 20:51:42.780081 (XEN) [ref] localdom mfn      pin          localdom gmfn     flags
Apr  4 20:51:42.791855 (XEN) grant-table for remote d0 (v1)
Apr  4 20:51:42.791915 (XEN)   1 frames (64 max), 11 maptrack frames (1024 max)
Apr  4 20:51:42.803911 (XEN) no active grant table entries
Apr  4 20:51:42.803975 (XEN)       -------- active --------       -------- shared --------
Apr  4 20:51:42.804034 (XEN) [ref] localdom mfn      pin          localdom gmfn     flags
Apr  4 20:51:42.815923 (XEN) grant-table for remote d1 (v1)
Apr  4 20:51:42.816040 (XEN)   7 frames (32 max), 0 maptrack frames (1024 max)
Apr  4 20:51:42.827834 (XEN) [0x000]      0 0x27e306 0x00000002          0 0x0fefff 0x19
Apr  4 20:51:42.827877 (XEN) [0x009]      0 0x22d9a1 0x00000001          0 0x02f3a1 0x19
Apr  4 20:51:42.827905 (XEN) [0x00a]      0 0x22d99f 0x00000001          0 0x02f39f 0x19
<snip masses of output>
Apr  4 20:51:49.814701 (XEN) [0x8af]      0 0x232f4f 0x00000001          0 0x029d4f 0x19
Apr  4 20:51:49.826827 (XEN) [0x8b0]      0 0x232f4e 0x00000001          0 0x029(XEN) Watchdog timer detects that CPU0 is stuck!
Apr  4 20:51:49.826903 (XEN) ----[ Xen-4.13-unstable  x86_64  debug=y   Not tainted ]----


~Andrew
Jan Beulich April 8, 2019, 9:34 a.m. UTC | #3
>>> On 05.04.19 at 18:33, <andrew.cooper3@citrix.com> wrote:
> OSSTests upgrade to Jessie has identified that with a sufficiently large grant
> table, a watchdog timeout can occur.

How's this dependent on the precise distro version? I.e. is there
something that makes the logging slower now?

> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -3956,6 +3956,9 @@ static void gnttab_usage_print(struct domain *rd)
>          uint16_t status;
>          uint64_t frame;
>  
> +        if ( !(ref & 31) )
> +            process_pending_softirqs();

I think this is both risky and overly eager: Risky because of happening
with a lock held, and overly eager because no output may have been
done at all between any two such calls. That notwithstanding
Acked-by: Jan Beulich <jbeulich@suse.com>
on the basis that it is an improvement, and the risk is - afaict - a latent
one only.

Jan
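
The "overly eager" remark above can be illustrated with a small userspace sketch. It is not part of the patch and was not proposed in this thread; it only compares how often a poll would happen when keyed on the ref index (as in the patch) versus keyed on the number of lines actually printed. The names polls_by_ref, polls_by_output and lines_per_poll are hypothetical, the pin array stands in for act->pin, and the sketch assumes that entries with a zero pin produce no output.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Poll on every 32nd ref, whether or not anything was printed
 * (mirrors the check in the patch).
 */
static unsigned int polls_by_ref(const unsigned int *pin, size_t nr)
{
    unsigned int polls = 0;
    size_t ref;

    for ( ref = 0; ref < nr; ref++ )
    {
        if ( !(ref & 31) )
            polls++;            /* stands in for process_pending_softirqs() */

        if ( !pin[ref] )
            continue;           /* inactive entry: assumed to print nothing */

        /* a line of output would be printed here */
    }

    return polls;
}

/*
 * Hypothetical alternative: poll only after every lines_per_poll
 * lines of output have been produced.
 */
static unsigned int polls_by_output(const unsigned int *pin, size_t nr,
                                    unsigned int lines_per_poll)
{
    unsigned int polls = 0, printed = 0;
    size_t ref;

    assert(lines_per_poll != 0);

    for ( ref = 0; ref < nr; ref++ )
    {
        if ( !pin[ref] )
            continue;           /* nothing printed, so no poll is accounted */

        /* a line of output would be printed here */
        if ( ++printed % lines_per_poll == 0 )
            polls++;            /* stands in for process_pending_softirqs() */
    }

    return polls;
}
```

With 64 entries that are all inactive, polls_by_ref() counts two polls while polls_by_output() counts none; with 64 active entries both count two. The sketch says nothing about the locking concern, which is a separate point.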
Wei Liu April 8, 2019, 9:41 a.m. UTC | #4
On Fri, Apr 05, 2019 at 05:33:10PM +0100, Andrew Cooper wrote:
> OSSTests upgrade to Jessie has identified that with a sufficiently large grant

I think you meant "upgrade to Stretch" here.

Wei.
Julien Grall April 8, 2019, 5:21 p.m. UTC | #5
Hi,

On 4/5/19 5:38 PM, Andrew Cooper wrote:
> On 05/04/2019 17:35, Julien Grall wrote:
>> On 05/04/2019 17:33, Andrew Cooper wrote:
>> OSSTest logs usually disappear after a few days. Would it be possible to
>> copy the relevant part of the log instead?
> 
> Can do.  Something like:
> 
> (XEN) gnttab_usage_print_all [ key 'g' pressed
> Apr  4 20:51:42.779992 (XEN)       -------- active --------       -------- shared --------
> Apr  4 20:51:42.780081 (XEN) [ref] localdom mfn      pin          localdom gmfn     flags
> Apr  4 20:51:42.791855 (XEN) grant-table for remote d0 (v1)
> Apr  4 20:51:42.791915 (XEN)   1 frames (64 max), 11 maptrack frames (1024 max)
> Apr  4 20:51:42.803911 (XEN) no active grant table entries
> Apr  4 20:51:42.803975 (XEN)       -------- active --------       -------- shared --------
> Apr  4 20:51:42.804034 (XEN) [ref] localdom mfn      pin          localdom gmfn     flags
> Apr  4 20:51:42.815923 (XEN) grant-table for remote d1 (v1)
> Apr  4 20:51:42.816040 (XEN)   7 frames (32 max), 0 maptrack frames (1024 max)
> Apr  4 20:51:42.827834 (XEN) [0x000]      0 0x27e306 0x00000002          0 0x0fefff 0x19
> Apr  4 20:51:42.827877 (XEN) [0x009]      0 0x22d9a1 0x00000001          0 0x02f3a1 0x19
> Apr  4 20:51:42.827905 (XEN) [0x00a]      0 0x22d99f 0x00000001          0 0x02f39f 0x19
> <snip masses of output>
> Apr  4 20:51:49.814701 (XEN) [0x8af]      0 0x232f4f 0x00000001          0 0x029d4f 0x19
> Apr  4 20:51:49.826827 (XEN) [0x8b0]      0 0x232f4e 0x00000001          0 0x029(XEN) Watchdog timer detects that CPU0 is stuck!
> Apr  4 20:51:49.826903 (XEN) ----[ Xen-4.13-unstable  x86_64  debug=y   Not tainted ]----


This would work for me.

Cheers,
Patch

diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 80728ea..344b3ee 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -3956,6 +3956,9 @@  static void gnttab_usage_print(struct domain *rd)
         uint16_t status;
         uint64_t frame;
 
+        if ( !(ref & 31) )
+            process_pending_softirqs();
+
         act = active_entry_acquire(gt, ref);
         if ( !act->pin )
         {
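
The added check, !(ref & 31), is true whenever the low five bits of ref are zero, that is at ref 0, 32, 64 and so on, so the loop polls once per block of 32 entries. Below is a minimal userspace sketch of that pattern, not Xen code. The names walk_refs and process_pending_softirqs_stub are hypothetical, and the stub only counts calls; it does none of the work of the real process_pending_softirqs().

```c
#include <assert.h>

/* Hypothetical stand-in for process_pending_softirqs(): counts calls only. */
static unsigned int softirq_polls;

static void process_pending_softirqs_stub(void)
{
    softirq_polls++;
}

/*
 * Walk nr_refs entries and poll whenever (ref & mask) == 0.
 * With mask == 31 this polls at ref 0, 32, 64, ...
 * Returns the number of polls made.
 */
static unsigned int walk_refs(unsigned int nr_refs, unsigned int mask)
{
    unsigned int ref;

    /* The masking trick needs mask to be of the form 2^n - 1. */
    assert((mask & (mask + 1)) == 0);

    softirq_polls = 0;

    for ( ref = 0; ref != nr_refs; ref++ )
    {
        if ( !(ref & mask) )
            process_pending_softirqs_stub();

        /* per-entry work (acquire, print, release) would go here */
    }

    return softirq_polls;
}
```

For example, walk_refs(32, 31) polls once and walk_refs(33, 31) polls twice, because ref 32 starts a new block.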