diff mbox series

9p: prevent read overrun in protocol dump tracepoint

Message ID 20231202030410.61047-1-inwardvessel@gmail.com (mailing list archive)
State Superseded
Headers show
Series 9p: prevent read overrun in protocol dump tracepoint | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

JP Kobryn Dec. 2, 2023, 3:04 a.m. UTC
An out of bounds read can occur within the tracepoint 9p_protocol_dump().
In the fast assign, there is a memcpy that uses a constant size of 32
(macro definition as P9_PROTO_DUMP_SZ). When the copy is invoked, the
source buffer is not guaranteed match this size. It was found that in some
cases the source buffer size is less than 32, resulting in a read that
overruns.

The size of the source buffer seems to be known at the time of the
tracepoint being invoked. The allocations happen within p9_fcall_init(),
where the capacity field is set to the allocated size of the payload
buffer. This patch tries to fix the overrun by using the minimum of that
field (size of source buffer) and the size of destination buffer when
performing the copy.

A repro can be performed by different ways. The simplest might just be
mounting a shared filesystem (between host and guest vm) using the plan
9 protocol while the tracepoint is enabled.

mount -t 9p -o trans=virtio <mount_tag> <mount_path>

The bpftrace program below can be used to show the out of bounds read.
Note that a recent version of bpftrace is needed for the raw tracepoint
support. The script was tested using v0.19.0.

/* from include/net/9p/9p.h */
struct p9_fcall {
    u32 size;
    u8 id;
    u16 tag;
    size_t offset;
    size_t capacity;
    struct kmem_cache *cache;
    u8 *sdata;
    bool zc;
};

tracepoint:9p:9p_protocol_dump
{
    /* out of bounds read can happen when this tracepoint is enabled */
}

rawtracepoint:9p_protocol_dump
{
    $pdu = (struct p9_fcall *)arg1;
    $dump_sz = (uint64)32;

    if ($dump_sz > $pdu->capacity) {
        printf("reading %zu bytes from src buffer of %zu bytes\n",
            $dump_sz, $pdu->capacity);
    }
}

Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
---
 include/trace/events/9p.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Dominique Martinet Dec. 2, 2023, 4:35 a.m. UTC | #1
JP Kobryn wrote on Fri, Dec 01, 2023 at 07:04:10PM -0800:
> An out of bounds read can occur within the tracepoint 9p_protocol_dump().
> In the fast assign, there is a memcpy that uses a constant size of 32
> (macro definition as P9_PROTO_DUMP_SZ). When the copy is invoked, the
> source buffer is not guaranteed match this size. It was found that in some
> cases the source buffer size is less than 32, resulting in a read that
> overruns.
> 
> The size of the source buffer seems to be known at the time of the
> tracepoint being invoked. The allocations happen within p9_fcall_init(),
> where the capacity field is set to the allocated size of the payload
> buffer. This patch tries to fix the overrun by using the minimum of that
> field (size of source buffer) and the size of destination buffer when
> performing the copy.

Good catch; this is a regression due to a semi-recent optimization in
commit 60ece0833b6c ("net/9p: allocate appropriate reduced message
buffers")
For some reason I thought we rounded up small messages alloc to 4k but
I've just confirmed we don't, so these overruns are quite frequent.
I'll add the fixes tag and cc to stable if there's no other comment.

Thanks!

> 
> A repro can be performed by different ways. The simplest might just be
> mounting a shared filesystem (between host and guest vm) using the plan
> 9 protocol while the tracepoint is enabled.
> 
> mount -t 9p -o trans=virtio <mount_tag> <mount_path>
> 
> The bpftrace program below can be used to show the out of bounds read.
> Note that a recent version of bpftrace is needed for the raw tracepoint
> support. The script was tested using v0.19.0.
> 
> /* from include/net/9p/9p.h */
> struct p9_fcall {
>     u32 size;
>     u8 id;
>     u16 tag;
>     size_t offset;
>     size_t capacity;
>     struct kmem_cache *cache;
>     u8 *sdata;
>     bool zc;
> };
> 
> tracepoint:9p:9p_protocol_dump
> {
>     /* out of bounds read can happen when this tracepoint is enabled */
> }
> 
> rawtracepoint:9p_protocol_dump
> {
>     $pdu = (struct p9_fcall *)arg1;
>     $dump_sz = (uint64)32;
> 
>     if ($dump_sz > $pdu->capacity) {
>         printf("reading %zu bytes from src buffer of %zu bytes\n",
>             $dump_sz, $pdu->capacity);
>     }
> }
> 
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> ---
>  include/trace/events/9p.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
> index 4dfa6d7f83ba..8690a7086252 100644
> --- a/include/trace/events/9p.h
> +++ b/include/trace/events/9p.h
> @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
>  		    __entry->clnt   =  clnt;
>  		    __entry->type   =  pdu->id;
>  		    __entry->tag    =  pdu->tag;
> -		    memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> +		    memcpy(__entry->line, pdu->sdata,
> +				min(pdu->capacity, P9_PROTO_DUMP_SZ));
>  		    ),
>  	    TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
>  		      (unsigned long)__entry->clnt, show_9p_op(__entry->type),
Dominique Martinet Dec. 2, 2023, 7:19 a.m. UTC | #2
asmadeus@codewreck.org wrote on Sat, Dec 02, 2023 at 01:35:18PM +0900:
> > diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
> > index 4dfa6d7f83ba..8690a7086252 100644
> > --- a/include/trace/events/9p.h
> > +++ b/include/trace/events/9p.h
> > @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
> >  		    __entry->clnt   =  clnt;
> >  		    __entry->type   =  pdu->id;
> >  		    __entry->tag    =  pdu->tag;
> > -		    memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> > +		    memcpy(__entry->line, pdu->sdata,
> > +				min(pdu->capacity, P9_PROTO_DUMP_SZ));

Building with W=1 yields a warning:
./include/linux/minmax.h:21:35: warning: comparison of distinct pointer types lacks a cast
...
./include/trace/events/9p.h:189:33: note: in expansion of macro ‘min’
  189 |                                 min(pdu->capacity, P9_PROTO_DUMP_SZ));

I've updated the patch to:
+				min_t(size_t, pdu->capacity, P9_PROTO_DUMP_SZ));

and pushed to my -next branch:
https://github.com/martinetd/linux/commits/9p-next
Christian Schoenebeck Dec. 2, 2023, 1:05 p.m. UTC | #3
On Saturday, December 2, 2023 5:35:18 AM CET asmadeus@codewreck.org wrote:
> JP Kobryn wrote on Fri, Dec 01, 2023 at 07:04:10PM -0800:
> > An out of bounds read can occur within the tracepoint 9p_protocol_dump().
> > In the fast assign, there is a memcpy that uses a constant size of 32
> > (macro definition as P9_PROTO_DUMP_SZ). When the copy is invoked, the
> > source buffer is not guaranteed match this size. It was found that in some
> > cases the source buffer size is less than 32, resulting in a read that
> > overruns.
> > 
> > The size of the source buffer seems to be known at the time of the
> > tracepoint being invoked. The allocations happen within p9_fcall_init(),
> > where the capacity field is set to the allocated size of the payload
> > buffer. This patch tries to fix the overrun by using the minimum of that
> > field (size of source buffer) and the size of destination buffer when
> > performing the copy.
> 
> Good catch; this is a regression due to a semi-recent optimization in
> commit 60ece0833b6c ("net/9p: allocate appropriate reduced message
> buffers")

Indeed, didn't have this one on screen! Thanks!

> For some reason I thought we rounded up small messages alloc to 4k but
> I've just confirmed we don't, so these overruns are quite frequent.
> I'll add the fixes tag and cc to stable if there's no other comment.

Yeah, in p9_msg_buf_size() [net/9p/protocol.c] the smallest allocation size
for message types known to be small (at compile-time) is hard coded to 4k.

However for all variable-size message types the size is calculated at runtime
exactly as needed for that particular message being sent. So these 9p message
types can trigger this case (<32). They are currently never rounded up.

[...]
> > diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
> > index 4dfa6d7f83ba..8690a7086252 100644
> > --- a/include/trace/events/9p.h
> > +++ b/include/trace/events/9p.h
> > @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
> >  		    __entry->clnt   =  clnt;
> >  		    __entry->type   =  pdu->id;
> >  		    __entry->tag    =  pdu->tag;
> > -		    memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> > +		    memcpy(__entry->line, pdu->sdata,
> > +				min(pdu->capacity, P9_PROTO_DUMP_SZ));
> >  		    ),
> >  	    TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> >  		      (unsigned long)__entry->clnt, show_9p_op(__entry->type),

AFAICS __entry is a local variable on stack, and array __entry->line not
intialized with zeros, i.e. the dump would contain trash at the end. Maybe
prepending memset() before memcpy()?

/Christian
Steven Rostedt Dec. 3, 2023, 1:14 a.m. UTC | #4
On Sat, 02 Dec 2023 14:05:24 +0100
Christian Schoenebeck <linux_oss@crudebyte.com> wrote:

> > > --- a/include/trace/events/9p.h
> > > +++ b/include/trace/events/9p.h
> > > @@ -185,7 +185,8 @@ TRACE_EVENT(9p_protocol_dump,
> > >  		    __entry->clnt   =  clnt;
> > >  		    __entry->type   =  pdu->id;
> > >  		    __entry->tag    =  pdu->tag;
> > > -		    memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
> > > +		    memcpy(__entry->line, pdu->sdata,
> > > +				min(pdu->capacity, P9_PROTO_DUMP_SZ));
> > >  		    ),
> > >  	    TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> > >  		      (unsigned long)__entry->clnt, show_9p_op(__entry->type),  
> 
> AFAICS __entry is a local variable on stack, and array __entry->line not
> intialized with zeros, i.e. the dump would contain trash at the end. Maybe
> prepending memset() before memcpy()?

__entry is a macro that points into the ring buffer that gets allocated
before this is called. TRACE_EVENT() has a __dynamic_array() field that
can handle variable length arrays. What you can do is turn this into
something like:

TRACE_EVENT(9p_protocol_dump,
            TP_PROTO(struct p9_client *clnt, struct p9_fcall *pdu),

            TP_ARGS(clnt, pdu),

            TP_STRUCT__entry(
                    __field(    void *,         clnt                            )
                    __field(    __u8,           type                            )
                    __field(    __u16,          tag                             )
                    __dynamic_array(unsigned char,  line, min(pdu->capacity, P9_PROTO_DUMP_SZ) )
                    ),

            TP_fast_assign(
                    __entry->clnt   =  clnt;
                    __entry->type   =  pdu->id;
                    __entry->tag    =  pdu->tag;
                    memcpy(__get_dynamic_array(line), pdu->sdata,
			   min(pdu->capacity, P9_PROTO_DUMP_SZ));
                    ),
            TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
                      (unsigned long)__entry->clnt, show_9p_op(__entry->type),
                      __entry->tag, 0, __get_dynamic_array(line), 16,
		      __get_dynamic_array(line) + 16)
 );

-- Steve
Dominique Martinet Dec. 3, 2023, 1:33 a.m. UTC | #5
Steven Rostedt wrote on Sat, Dec 02, 2023 at 08:14:09PM -0500:
> > AFAICS __entry is a local variable on stack, and array __entry->line not
> > intialized with zeros, i.e. the dump would contain trash at the end. Maybe
> > prepending memset() before memcpy()?

Well spotted!
Now I'm thinking about it we weren't initializing the source buffer
either back when we had (>32) msize allocations, so these already had
been printing garbage, but might as well get this sorted out while we're
here.

> __entry is a macro that points into the ring buffer that gets allocated
> before this is called. TRACE_EVENT() has a __dynamic_array() field that
> can handle variable length arrays. What you can do is turn this into
> something like:
> 
> TRACE_EVENT(9p_protocol_dump,
>             TP_PROTO(struct p9_client *clnt, struct p9_fcall *pdu),
> 
>             TP_ARGS(clnt, pdu),
> 
>             TP_STRUCT__entry(
>                     __field(    void *,         clnt                            )
>                     __field(    __u8,           type                            )
>                     __field(    __u16,          tag                             )
>                     __dynamic_array(unsigned char,  line, min(pdu->capacity, P9_PROTO_DUMP_SZ) )
>                     ),
> 
>             TP_fast_assign(
>                     __entry->clnt   =  clnt;
>                     __entry->type   =  pdu->id;
>                     __entry->tag    =  pdu->tag;
>                     memcpy(__get_dynamic_array(line), pdu->sdata,
> 			   min(pdu->capacity, P9_PROTO_DUMP_SZ));
>                     ),
>             TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
>                       (unsigned long)__entry->clnt, show_9p_op(__entry->type),
>                       __entry->tag, 0, __get_dynamic_array(line), 16,
> 		      __get_dynamic_array(line) + 16)

This was just printing garbage in the previous version but %16ph with a
dynamic alloc would be out of range (even the start of the next buffer,
_get_dynamic_array(line) + 16, can be out of range)

Also, for custom tracepoints e.g. bpftrace the program needs to know how
many bytes can be read safely even if it's just for dumping -- unless
dynamic_array is a "fat pointer" that conveys its own size?
(Sorry didn't take the time to check)

So I see two ways forward:
 - We can give up on the 16 bytes split here, add the size in one of the
fields, and print with %*ph using that size.
 - Or just give up and zero the tail; I'm surprised there's no "memcpy
up to x bytes and zero up to y bytes if required" helper but Christian's
suggestion of always doing memset first is probably not that bad
performance-wise if someone's dumping these out already.

I don't have a hard preference here, what do you think?
Steven Rostedt Dec. 3, 2023, 4:15 a.m. UTC | #6
On Sun, 3 Dec 2023 10:33:32 +0900
Dominique Martinet <asmadeus@codewreck.org> wrote:


> >             TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
> >                       (unsigned long)__entry->clnt, show_9p_op(__entry->type),
> >                       __entry->tag, 0, __get_dynamic_array(line), 16,
> > 		      __get_dynamic_array(line) + 16)  
> 
> This was just printing garbage in the previous version but %16ph with a
> dynamic alloc would be out of range (even the start of the next buffer,
> _get_dynamic_array(line) + 16, can be out of range)
> 
> Also, for custom tracepoints e.g. bpftrace the program needs to know how
> many bytes can be read safely even if it's just for dumping -- unless
> dynamic_array is a "fat pointer" that conveys its own size?
> (Sorry didn't take the time to check)

Yes, there's also a __get_dynamic_array_len(line) that will return the
allocated length of the line. Is that what you need?

-- Steve
Dominique Martinet Dec. 3, 2023, 5:32 a.m. UTC | #7
Steven Rostedt wrote on Sat, Dec 02, 2023 at 11:15:24PM -0500:
> > Also, for custom tracepoints e.g. bpftrace the program needs to know how
> > many bytes can be read safely even if it's just for dumping -- unless
> > dynamic_array is a "fat pointer" that conveys its own size?
> > (Sorry didn't take the time to check)
> 
> Yes, there's also a __get_dynamic_array_len(line) that will return the
> allocated length of the line. Is that what you need?

Yes, thanks! So the lower two bytes of the field are its position in
the entry and the higher two bytes its size; ok.
It doesn't look like bpftrace has any helper for it but that can
probably be sorted out if someone wants to dump data there.


Let's update the event to use a dynamic array and have printk fomrat to
use %*ph with that length.

JP Kobryn, does that sound good to you? I'm not sure what you were
trying to do in the first place.
Do you want to send a v2 or shall I?
JP Kobryn Dec. 4, 2023, 4:20 p.m. UTC | #8
On Sun, Dec 03, 2023 at 02:32:15PM +0900, Dominique Martinet wrote:
> Steven Rostedt wrote on Sat, Dec 02, 2023 at 11:15:24PM -0500:
> > > Also, for custom tracepoints e.g. bpftrace the program needs to know how
> > > many bytes can be read safely even if it's just for dumping -- unless
> > > dynamic_array is a "fat pointer" that conveys its own size?
> > > (Sorry didn't take the time to check)
> > 
> > Yes, there's also a __get_dynamic_array_len(line) that will return the
> > allocated length of the line. Is that what you need?
> 
> Yes, thanks! So the lower two bytes of the field are its position in
> the entry and the higher two bytes its size; ok.
> It doesn't look like bpftrace has any helper for it but that can
> probably be sorted out if someone wants to dump data there.
> 
> 
> Let's update the event to use a dynamic array and have printk fomrat to
> use %*ph with that length.
> 
> JP Kobryn, does that sound good to you? I'm not sure what you were
> trying to do in the first place.
> Do you want to send a v2 or shall I?

Sounds good. I'll send out a v2. Thanks Steve for recommending the
dynamic array macros.

JP
> 
> -- 
> Dominique Martinet | Asmadeus
diff mbox series

Patch

diff --git a/include/trace/events/9p.h b/include/trace/events/9p.h
index 4dfa6d7f83ba..8690a7086252 100644
--- a/include/trace/events/9p.h
+++ b/include/trace/events/9p.h
@@ -185,7 +185,8 @@  TRACE_EVENT(9p_protocol_dump,
 		    __entry->clnt   =  clnt;
 		    __entry->type   =  pdu->id;
 		    __entry->tag    =  pdu->tag;
-		    memcpy(__entry->line, pdu->sdata, P9_PROTO_DUMP_SZ);
+		    memcpy(__entry->line, pdu->sdata,
+				min(pdu->capacity, P9_PROTO_DUMP_SZ));
 		    ),
 	    TP_printk("clnt %lu %s(tag = %d)\n%.3x: %16ph\n%.3x: %16ph\n",
 		      (unsigned long)__entry->clnt, show_9p_op(__entry->type),