[10/16] SUPPORT.md: Add Debugging, analysis, crash post-mortem

Message ID 20171113154126.13038-10-george.dunlap@citrix.com
State New, archived

Commit Message

George Dunlap Nov. 13, 2017, 3:41 p.m. UTC
Signed-off-by: George Dunlap <george.dunlap@citrix.com>
---
CC: Ian Jackson <ian.jackson@citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Jan Beulich <jbeulich@suse.com>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Konrad Wilk <konrad.wilk@oracle.com>
CC: Tim Deegan <tim@xen.org>
---
 SUPPORT.md | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

Comments

Jan Beulich Nov. 21, 2017, 8:48 a.m. UTC | #1
>>> On 13.11.17 at 16:41, <george.dunlap@citrix.com> wrote:
> --- a/SUPPORT.md
> +++ b/SUPPORT.md
> @@ -152,6 +152,35 @@ Output of information in machine-parseable JSON format
>  
>      Status: Supported, Security support external
>  
> +## Debugging, analysis, and crash post-mortem
> +
> +### gdbsx
> +
> +    Status, x86: Supported
> +
> +Debugger to debug ELF guests
> +
> +### Soft-reset for PV guests
> +
> +    Status: Supported
> +    
> +Soft-reset allows a new kernel to start 'from scratch' with a fresh VM state, 
> +but with all the memory from the previous state of the VM intact.
> +This is primarily designed to allow "crash kernels", 
> +which can do core dumps of memory to help with debugging in the event of a crash.
> +
> +### xentrace
> +
> +    Status, x86: Supported
> +
> +Tool to capture Xen trace buffer data
> +
> +### gcov
> +
> +    Status: Supported, Not security supported

I agree with excluding security support here, but why wouldn't the
same be the case for gdbsx and xentrace?

Jan
George Dunlap Nov. 21, 2017, 6:19 p.m. UTC | #2
On 11/21/2017 08:48 AM, Jan Beulich wrote:
>>>> On 13.11.17 at 16:41, <george.dunlap@citrix.com> wrote:
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -152,6 +152,35 @@ Output of information in machine-parseable JSON format
>>  
>>      Status: Supported, Security support external
>>  
>> +## Debugging, analysis, and crash post-mortem
>> +
>> +### gdbsx
>> +
>> +    Status, x86: Supported
>> +
>> +Debugger to debug ELF guests
>> +
>> +### Soft-reset for PV guests
>> +
>> +    Status: Supported
>> +    
>> +Soft-reset allows a new kernel to start 'from scratch' with a fresh VM state, 
>> +but with all the memory from the previous state of the VM intact.
>> +This is primarily designed to allow "crash kernels", 
>> +which can do core dumps of memory to help with debugging in the event of a crash.
>> +
>> +### xentrace
>> +
>> +    Status, x86: Supported
>> +
>> +Tool to capture Xen trace buffer data
>> +
>> +### gcov
>> +
>> +    Status: Supported, Not security supported
> 
> I agree with excluding security support here, but why wouldn't the
> same be the case for gdbsx and xentrace?

From my initial post:

---

gdbsx security support: Someone may want to debug an untrusted guest,
so I think we should say 'yes' here.

xentrace: Users may want to trace guests in production environments,
so I think we should say 'yes'.

gcov: No good reason to run a gcov hypervisor in a production
environment.  May be ways for a rogue guest to DoS.

---

xentrace I would argue for security support; I've asked customers to
send me xentrace data as part of analysis before.  I also know enough
about it that I'm reasonably confident the risk of an attack vector is
pretty low.

I don't have a strong opinion on gdbsx; I'd call it 'supported', but if
you think we need to exclude it from security support I'm happy with
that as well.

 -George
Ian Jackson Nov. 21, 2017, 7:05 p.m. UTC | #3
George Dunlap writes ("Re: [PATCH 10/16] SUPPORT.md: Add Debugging, analysis, crash post-mortem"):
> gdbsx security support: Someone may want to debug an untrusted guest,
> so I think we should say 'yes' here.

I think running gdb on a potentially hostile program is foolish.

> I don't have a strong opinion on gdbsx; I'd call it 'supported', but if
> you think we need to exclude it from security support I'm happy with
> that as well.

gdbsx itself is probably simple enough to be fine but I would rather
not call it security supported because that might encourage people to
use it with gdb.

If someone wants to use gdbsx with something that's not gdb then they
might want to ask us to revisit that.

Ian.
Andrew Cooper Nov. 21, 2017, 7:21 p.m. UTC | #4
On 21/11/17 19:05, Ian Jackson wrote:
> George Dunlap writes ("Re: [PATCH 10/16] SUPPORT.md: Add Debugging, analysis, crash post-mortem"):
>> gdbsx security support: Someone may want to debug an untrusted guest,
>> so I think we should say 'yes' here.
> I think running gdb on a potentially hostile program is foolish.
>
>> I don't have a strong opinion on gdbsx; I'd call it 'supported', but if
>> you think we need to exclude it from security support I'm happy with
>> that as well.
> gdbsx itself is probably simple enough to be fine but I would rather
> not call it security supported because that might encourage people to
> use it with gdb.
>
> If someone wants to use gdbsx with something that's not gdb then they
> might want to ask us to revisit that.

If gdbsx chooses (or gets tricked into using) DOMID_XEN, then it gets
arbitrary read/write access over hypervisor virtual address space, due
to the behaviour of the hypercalls it uses.

As a tool, it mostly functions (there are some rather sharp corners
which I've not gotten time to fix so far), but it is definitely not
something I would trust in a hostile environment.

~Andrew
George Dunlap Nov. 22, 2017, 10:51 a.m. UTC | #5
On 11/21/2017 07:21 PM, Andrew Cooper wrote:
> On 21/11/17 19:05, Ian Jackson wrote:
>> George Dunlap writes ("Re: [PATCH 10/16] SUPPORT.md: Add Debugging, analysis, crash post-mortem"):
>>> gdbsx security support: Someone may want to debug an untrusted guest,
>>> so I think we should say 'yes' here.
>> I think running gdb on a potentially hostile program is foolish.
>>
>>> I don't have a strong opinion on gdbsx; I'd call it 'supported', but if
>>> you think we need to exclude it from security support I'm happy with
>>> that as well.
>> gdbsx itself is probably simple enough to be fine but I would rather
>> not call it security supported because that might encourage people to
>> use it with gdb.
>>
>> If someone wants to use gdbsx with something that's not gdb then they
>> might want to ask us to revisit that.
> 
> If gdbsx chooses (or gets tricked into using) DOMID_XEN, then it gets
> arbitrary read/write access over hypervisor virtual address space, due
> to the behaviour of the hypercalls it uses.
> 
> As a tool, it mostly functions (there are some rather sharp corners
> which I've not gotten time to fix so far), but it is definitely not
> something I would trust in a hostile environment.

Right -- "not security supported" it is. :-)

 -George
Jan Beulich Nov. 22, 2017, 11:15 a.m. UTC | #6
>>> On 21.11.17 at 19:19, <george.dunlap@citrix.com> wrote:
> xentrace I would argue for security support; I've asked customers to
> send me xentrace data as part of analysis before.  I also know enough
> about it that I'm reasonably confident the risk of an attack vector is
> pretty low.

Knowing pretty little about xentrace, I will trust you here. What I
was afraid of is that anything adding overhead can have unintended
side effects, all the more so with the (as I understand it) huge
amounts of data this may produce.

> I don't have a strong opinion on gdbsx; I'd call it 'supported', but if
> you think we need to exclude it from security support I'm happy with
> that as well.

Looks like on another sub-thread it was meanwhile already agreed
to mark it not security supported.

Jan
George Dunlap Nov. 22, 2017, 5:06 p.m. UTC | #7
On 11/22/2017 11:15 AM, Jan Beulich wrote:
>>>> On 21.11.17 at 19:19, <george.dunlap@citrix.com> wrote:
>> xentrace I would argue for security support; I've asked customers to
>> send me xentrace data as part of analysis before.  I also know enough
>> about it that I'm reasonably confident the risk of an attack vector is
>> pretty low.
> 
> Knowing pretty little about xentrace, I will trust you here. What I
> was afraid of is that anything adding overhead can have unintended
> side effects, all the more so with the (as I understand it) huge
> amounts of data this may produce.

The data is fundamentally limited by the size of the in-hypervisor
buffers.  Once those are full, the trace overhead shouldn't be
significantly different from running with tracing disabled.  And
regardless of how big the buffers are, the total amount of trace data
is limited by the throughput of the dom0-based xentrace process
writing to disk.  If the throughput of that process is (say) 50MB/s,
then the "steady state" of trace creation will be capped at 50MB/s one
way or another; or, at the very most, at the rate a single processor
can copy data out of the in-hypervisor buffers.

Back when I was using xentrace heavily, I regularly hit this limit, and
never had any stability issues.

I suppose with faster disks (SSDs?  SAN on a 40Gb/s NIC?) this limit
will be higher, but I still have trouble believing that it would be
significantly more dangerous than, say, any other kind of domain 0
logging.

I mean, there may be something I'm missing; but I've just spent 10
minutes or so trying to brainstorm ways that an attacker could cause
problems on the system, and other than "fill the buffers with junk so
that the admin can't find what she's looking for", I couldn't come up
with anything.  Any other flaws should be no more likely than from any
other feature we expose to guests.

 -George
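[Editorial aside: George's steady-state argument can be sketched with a
little arithmetic. The buffer size, event-generation rate, and writer
throughput below are illustrative assumptions, not measured Xen figures.]

```python
# Illustrative model of the point above: once the in-hypervisor trace
# buffers are full, the sustained capture rate is capped by the rate at
# which the dom0 xentrace process drains them to disk.
# All numbers are made up for illustration, not real Xen figures.

def steady_state_rate(gen_rate_mb_s, drain_rate_mb_s):
    """Sustained trace capture rate once buffers are full (MB/s)."""
    return min(gen_rate_mb_s, drain_rate_mb_s)

def time_to_fill(buffer_mb, gen_rate_mb_s, drain_rate_mb_s):
    """Seconds until the buffers fill; None if they never fill."""
    net = gen_rate_mb_s - drain_rate_mb_s
    if net <= 0:
        return None  # the writer keeps up; buffers never fill
    return buffer_mb / net

# A guest spamming trace events at 200 MB/s against a 50 MB/s writer:
assert steady_state_rate(200, 50) == 50        # capped by the writer
assert time_to_fill(64, 200, 50) == 64 / 150   # buffers fill quickly
# A quiet workload the writer can keep up with:
assert steady_state_rate(30, 50) == 30
assert time_to_fill(64, 30, 50) is None
```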

Patch
diff mbox

diff --git a/SUPPORT.md b/SUPPORT.md
index 8235336c41..bd83c81557 100644
--- a/SUPPORT.md
+++ b/SUPPORT.md
@@ -152,6 +152,35 @@  Output of information in machine-parseable JSON format
 
     Status: Supported, Security support external
 
+## Debugging, analysis, and crash post-mortem
+
+### gdbsx
+
+    Status, x86: Supported
+
+Debugger to debug ELF guests
+
+### Soft-reset for PV guests
+
+    Status: Supported
+    
+Soft-reset allows a new kernel to start 'from scratch' with a fresh VM state, 
+but with all the memory from the previous state of the VM intact.
+This is primarily designed to allow "crash kernels", 
+which can do core dumps of memory to help with debugging in the event of a crash.
+
+### xentrace
+
+    Status, x86: Supported
+
+Tool to capture Xen trace buffer data
+
+### gcov
+
+    Status: Supported, Not security supported
+
+Export hypervisor coverage data suitable for analysis by gcov or lcov.
+
 ## Memory Management
 
 ### Memory Ballooning
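[Editorial aside: the soft-reset behaviour documented in the patch is
typically enabled via the guest's crash action in its xl configuration.
A minimal sketch follows; the guest name and kernel path are
illustrative, not taken from the thread.]

```
# Illustrative xl guest config fragment (name and paths are examples):
name = "pv-guest"
kernel = "/boot/vmlinuz-guest"

# On crash, perform a soft reset: restart the kernel with a fresh VM
# state but the previous VM's memory preserved, so a crash kernel can
# take a core dump of it.
on_crash = "soft-reset"
```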