Message ID | 20200513160038.2482415-15-hch@lst.de (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [01/18] maccess: unexport probe_kernel_write and probe_user_write | expand |
On Wed, May 13, 2020 at 9:01 AM Christoph Hellwig <hch@lst.de> wrote: > > + arch_kernel_read(dst, src, type, err_label); \ I'm wondering if (a) we shouldn't expose this as an interface in general (b) it wouldn't be named differently.. The reason for (a) is that several users of the "copy_from_kernel_nofault()" interfaces just seem to want a single access from kernel mode. The reason for (b) is that if we do expose this as a normal interface, it shouldn't be called "arch_kernel_read", and it should have the same semantics as "get_user_unsafe()". IOW, maybe we should simply do exactly that: have a "get_kernel_nofault()" thing that looks exactly like unsafe_get_user(). On x86, it would basically be identical to unsafe_get_user(). And on architectures that only have the copy function, you'd just have a fallback something like this: #define get_kernel_nofault(dst, src, err_label) do { \ typeof (*src) __gkn_result; \ if (probe_kernel_read(&__gkn_result, src) < 0) \ goto err_label; \ (dst) = __gkn_result; \ } while (0) and now the people who want to read a single kernel word can just do get_kernel_nofault(n, untrusted_pointer, error); and they're done. And some day - when we get reliably "asm goto" wiith outputs - that "get_kernel_fault()" will literally be a single instruction asm with the proper exception handler marker, the way "put_user_unsafe()" already works (and the way "put_kernel_nofault()" would already work if it does the above). Linus
On Wed, May 13, 2020 at 12:36:18PM -0700, Linus Torvalds wrote: > > + arch_kernel_read(dst, src, type, err_label); \ > > I'm wondering if > > (a) we shouldn't expose this as an interface in general We do export something like it, currently it is called probe_kernel_address, and the last patch renames it to get_kernel_nofault. However it is implemented as a wrapper around probe_kernel_address / copy_from_kernel_nofault and thus not quite as efficient and without the magic goto semantics. > (b) it wouldn't be named differently.. It probably should with all the renaming..
On Wed, May 13, 2020 at 12:40 PM Christoph Hellwig <hch@lst.de> wrote: > > We do export something like it, currently it is called > probe_kernel_address, and the last patch renames it to > get_kernel_nofault. However it is implemented as a wrapper > around probe_kernel_address / copy_from_kernel_nofault and thus > not quite as efficient and without the magic goto semantics. Looking at the current users of "probe_kernel_read()", it looks like it's almost mostly things that just want a single byte or word. It's not 100% that: we definitely do several things that want the "copy" semantics vs the "get" semantics: on the x86 side we have CALL_INSN_SIZE and MAX_INSN_SIZE, and the ldttss_desc. But the bulk of them do seem to be a single value. I don't know if performance really matters here, but to me the whole "most users seem to want to read a single value" is what makes me think that maybe that should be the primary model, rather than have the copy model be the primary one and then we implement the single value case (badly) with a copy. It probably doesn't matter that much. I certainly wouldn't hold this series up over it - it can be a future thing. Linus
On Wed, May 13, 2020 at 12:48:54PM -0700, Linus Torvalds wrote: > Looking at the current users of "probe_kernel_read()", it looks like > it's almost mostly things that just want a single byte or word. > > It's not 100% that: we definitely do several things that want the > "copy" semantics vs the "get" semantics: on the x86 side we have > CALL_INSN_SIZE and MAX_INSN_SIZE, and the ldttss_desc. > > But the bulk of them do seem to be a single value. > > I don't know if performance really matters here, but to me the whole > "most users seem to want to read a single value" is what makes me > think that maybe that should be the primary model, rather than have > the copy model be the primary one and then we implement the single > value case (badly) with a copy. > > It probably doesn't matter that much. I certainly wouldn't hold this > series up over it - it can be a future thing. I can make the get_kernel_nofault implementation suck a little less :) Note that the arch helper (we could call it unsafe_get_kernel_nofault) we still need to have a pagefault_disable / pagefault_enable pair around the calls. So maybe keep the get_kernel_nofault interface as-is (without the goto label), and prepare the arch helpers for being used similar to unsafe_get_user once all architectures are converted. And I can throw in a few patches to convert callers from the copy semantics to the get semantics.
Hi Christoph, On Wed, 13 May 2020 18:00:34 +0200 Christoph Hellwig <hch@lst.de> wrote: > Provide alternative versions of probe_kernel_read, probe_kernel_write > and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead > use arch hooks that are modelled after unsafe_{get,put}_user to access > kernel memory in an exception safe way. This patch seems to introduce new implementation of probe_kernel_read/write() and strncpy_from_kernel_unsafe(), but also drops copy_from/to_kernel_nofault() and strncpy_from_kernel_nofault() if HAVE_ARCH_PROBE_KERNEL is defined. In the result, this cause a link error with BPF and kprobe events. BTW, what is the difference of *_unsafe() and *_nofault()? (maybe we make those to *_nofault() finally?) Thank you, > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > mm/maccess.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 76 insertions(+) > > diff --git a/mm/maccess.c b/mm/maccess.c > index 9773e2253b495..e9efe2f98e34a 100644 > --- a/mm/maccess.c > +++ b/mm/maccess.c > @@ -12,6 +12,81 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src, > return true; > } > > +#ifdef HAVE_ARCH_PROBE_KERNEL > + > +#define probe_kernel_read_loop(dst, src, len, type, err_label) \ > + while (len >= sizeof(type)) { \ > + arch_kernel_read(dst, src, type, err_label); \ > + dst += sizeof(type); \ > + src += sizeof(type); \ > + len -= sizeof(type); \ > + } > + > +long probe_kernel_read(void *dst, const void *src, size_t size) > +{ > + if (!probe_kernel_read_allowed(dst, src, size)) > + return -EFAULT; > + > + pagefault_disable(); > + probe_kernel_read_loop(dst, src, size, u64, Efault); > + probe_kernel_read_loop(dst, src, size, u32, Efault); > + probe_kernel_read_loop(dst, src, size, u16, Efault); > + probe_kernel_read_loop(dst, src, size, u8, Efault); > + pagefault_enable(); > + return 0; > +Efault: > + pagefault_enable(); > + return -EFAULT; > +} > +EXPORT_SYMBOL_GPL(probe_kernel_read); > + > +#define probe_kernel_write_loop(dst, src, len, type, err_label) \ > + while (len >= sizeof(type)) { \ > + arch_kernel_write(dst, src, type, err_label); \ > + dst += sizeof(type); \ > + src += sizeof(type); \ > + len -= sizeof(type); \ > + } > + > +long probe_kernel_write(void *dst, const void *src, size_t size) > +{ > + pagefault_disable(); > + probe_kernel_write_loop(dst, src, size, u64, Efault); > + probe_kernel_write_loop(dst, src, size, u32, Efault); > + probe_kernel_write_loop(dst, src, size, u16, Efault); > + probe_kernel_write_loop(dst, src, size, u8, Efault); > + pagefault_enable(); > + return 0; > +Efault: > + pagefault_enable(); > + return -EFAULT; > +} > + > +long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count) > +{ > + const void *src = unsafe_addr; > + > + if (unlikely(count <= 0)) > + return 0; > + if (!probe_kernel_read_allowed(dst, unsafe_addr, count)) > + return -EFAULT; > + > + pagefault_disable(); > + do { > + arch_kernel_read(dst, src, u8, Efault); > + dst++; > + src++; > + } while (dst[-1] && src - unsafe_addr < count); > + pagefault_enable(); > + > + dst[-1] = '\0'; > + return src - unsafe_addr; > +Efault: > + pagefault_enable(); > + dst[-1] = '\0'; > + return -EFAULT; > +} > +#else /* HAVE_ARCH_PROBE_KERNEL */ > /** > * probe_kernel_read(): safely attempt to read from kernel-space > * @dst: pointer to the buffer that shall take the data > @@ -114,6 +189,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count) > > return ret ? -EFAULT : src - unsafe_addr; > } > +#endif /* HAVE_ARCH_PROBE_KERNEL */ > > /** > * probe_user_read(): safely attempt to read from a user-space location > -- > 2.26.2 >
On Sat, May 16, 2020 at 12:42:59PM +0900, Masami Hiramatsu wrote: > > Provide alternative versions of probe_kernel_read, probe_kernel_write > > and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead > > use arch hooks that are modelled after unsafe_{get,put}_user to access > > kernel memory in an exception safe way. > > This patch seems to introduce new implementation of probe_kernel_read/write() > and strncpy_from_kernel_unsafe(), but also drops copy_from/to_kernel_nofault() > and strncpy_from_kernel_nofault() if HAVE_ARCH_PROBE_KERNEL is defined. > In the result, this cause a link error with BPF and kprobe events. That was just a bug as I didn't commit the changes to switch everything to _nofault and remove _unsafe entirely, sorry.
diff --git a/mm/maccess.c b/mm/maccess.c index 9773e2253b495..e9efe2f98e34a 100644 --- a/mm/maccess.c +++ b/mm/maccess.c @@ -12,6 +12,81 @@ bool __weak probe_kernel_read_allowed(void *dst, const void *unsafe_src, return true; } +#ifdef HAVE_ARCH_PROBE_KERNEL + +#define probe_kernel_read_loop(dst, src, len, type, err_label) \ + while (len >= sizeof(type)) { \ + arch_kernel_read(dst, src, type, err_label); \ + dst += sizeof(type); \ + src += sizeof(type); \ + len -= sizeof(type); \ + } + +long probe_kernel_read(void *dst, const void *src, size_t size) +{ + if (!probe_kernel_read_allowed(dst, src, size)) + return -EFAULT; + + pagefault_disable(); + probe_kernel_read_loop(dst, src, size, u64, Efault); + probe_kernel_read_loop(dst, src, size, u32, Efault); + probe_kernel_read_loop(dst, src, size, u16, Efault); + probe_kernel_read_loop(dst, src, size, u8, Efault); + pagefault_enable(); + return 0; +Efault: + pagefault_enable(); + return -EFAULT; +} +EXPORT_SYMBOL_GPL(probe_kernel_read); + +#define probe_kernel_write_loop(dst, src, len, type, err_label) \ + while (len >= sizeof(type)) { \ + arch_kernel_write(dst, src, type, err_label); \ + dst += sizeof(type); \ + src += sizeof(type); \ + len -= sizeof(type); \ + } + +long probe_kernel_write(void *dst, const void *src, size_t size) +{ + pagefault_disable(); + probe_kernel_write_loop(dst, src, size, u64, Efault); + probe_kernel_write_loop(dst, src, size, u32, Efault); + probe_kernel_write_loop(dst, src, size, u16, Efault); + probe_kernel_write_loop(dst, src, size, u8, Efault); + pagefault_enable(); + return 0; +Efault: + pagefault_enable(); + return -EFAULT; +} + +long strncpy_from_kernel_unsafe(char *dst, const void *unsafe_addr, long count) +{ + const void *src = unsafe_addr; + + if (unlikely(count <= 0)) + return 0; + if (!probe_kernel_read_allowed(dst, unsafe_addr, count)) + return -EFAULT; + + pagefault_disable(); + do { + arch_kernel_read(dst, src, u8, Efault); + dst++; + src++; + } while (dst[-1] && src - unsafe_addr < count); + pagefault_enable(); + + dst[-1] = '\0'; + return src - unsafe_addr; +Efault: + pagefault_enable(); + dst[-1] = '\0'; + return -EFAULT; +} +#else /* HAVE_ARCH_PROBE_KERNEL */ /** * probe_kernel_read(): safely attempt to read from kernel-space * @dst: pointer to the buffer that shall take the data @@ -114,6 +189,7 @@ long strncpy_from_kernel_nofault(char *dst, const void *unsafe_addr, long count) return ret ? -EFAULT : src - unsafe_addr; } +#endif /* HAVE_ARCH_PROBE_KERNEL */ /** * probe_user_read(): safely attempt to read from a user-space location
Provide alternative versions of probe_kernel_read, probe_kernel_write and strncpy_from_kernel_unsafe that don't need set_fs magic, but instead use arch hooks that are modelled after unsafe_{get,put}_user to access kernel memory in an exception safe way. Signed-off-by: Christoph Hellwig <hch@lst.de> --- mm/maccess.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 76 insertions(+)