[git,pull] device mapper changes for 6.1

Message ID	Y07SYs98z5VNxdZq@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-block-owner@kernel.org> Date: Tue, 18 Oct 2022 12:20:50 -0400 From: Mike Snitzer <snitzer@kernel.org> To: Linus Torvalds <torvalds@linux-foundation.org> Cc: dm-devel@redhat.com, linux-block@vger.kernel.org, Alasdair G Kergon <agk@redhat.com>, Genjian Zhang <zhanggenjian@kylinos.cn>, Jiangshan Yi <yijiangshan@kylinos.cn>, Jilin Yuan <yuanjilin@cdjrlc.com>, Mikulas Patocka <mpatocka@redhat.com>, Milan Broz <gmazyland@gmail.com>, Nathan Huckleberry <nhuck@google.com>, Nikos Tsironis <ntsironis@arrikto.com>, Shaomin Deng <dengshaomin@cdjrlc.com> Subject: [git pull] device mapper changes for 6.1 Message-ID: <Y07SYs98z5VNxdZq@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Precedence: bulk
Series	[git,pull] device mapper changes for 6.1 \| expand [git,pull] device mapper changes for 6.1

Mike Snitzer Oct. 18, 2022, 4:20 p.m. UTC

Hi Linus,

I missed sending the DM changes during the 6.1 merge window. Slipped my
mind largely due to there not being anything super urgent or more
elaborate this cycle.

But there is one additional stable@ fix from Mikulas that I'll send
separately since it requires the recently introduced test_bit_acquire().

The following changes since commit 1c23f9e627a7b412978b4e852793c5e3c3efc555:

  Linux 6.0-rc2 (2022-08-21 17:32:54 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git tags/for-6.1/dm-changes

for you to fetch changes up to a871fb26aba8911ea313dc7bd28f3e788a80fdb4:

  dm clone: Fix typo in block_device format specifier (2022-10-04 19:00:22 -0400)

Please pull, thanks.
Mike

----------------------------------------------------------------
- DM core replace DMWARN with DMERR or DMCRIT for fatal errors.

- Enhance DM ioctl interface to allow returning an error string to
  userspace. Depends on exporting is_vmalloc_or_module_addr() to allow
  DM core to conditionally free memory allocated with kasprintf().

- Enable WQ_HIGHPRI on DM verity target's verify_wq.

- Add documentation for DM verity's try_verify_in_tasklet option.

- Various typo and redundant word fixes in code and/or comments.

----------------------------------------------------------------
Genjian Zhang (1):
      dm: remove unnecessary assignment statement in alloc_dev()

Jiangshan Yi (1):
      dm raid: fix typo in analyse_superblocks code comment

Jilin Yuan (1):
      dm raid: delete the redundant word 'that' in comment

Mikulas Patocka (4):
      dm: change from DMWARN to DMERR or DMCRIT for fatal errors
      dm ioctl: add an option to return an error string to userspace
      mm: export is_vmalloc_or_module_addr
      dm: support allocating error strings to enhance errors returned to userspace

Milan Broz (1):
      dm verity: Add documentation for try_verify_in_tasklet option

Nathan Huckleberry (1):
      dm verity: enable WQ_HIGHPRI on verify_wq

Nikos Tsironis (1):
      dm clone: Fix typo in block_device format specifier

Shaomin Deng (1):
      dm cache: delete the redundant word 'each' in comment

 Documentation/admin-guide/device-mapper/verity.rst |   4 +
 drivers/md/dm-cache-policy.h                       |   2 +-
 drivers/md/dm-clone-target.c                       |   2 +-
 drivers/md/dm-ioctl.c                              | 125 +++++++++++-------
 drivers/md/dm-raid.c                               |   4 +-
 drivers/md/dm-rq.c                                 |   4 +-
 drivers/md/dm-stats.c                              |   2 +-
 drivers/md/dm-table.c                              | 139 +++++++++++----------
 drivers/md/dm-verity-target.c                      |  18 +--
 drivers/md/dm.c                                    |   9 +-
 include/linux/device-mapper.h                      |  18 ++-
 include/uapi/linux/dm-ioctl.h                      |  14 ++-
 mm/vmalloc.c                                       |   1 +
 13 files changed, 205 insertions(+), 137 deletions(-)

Christoph Hellwig Oct. 18, 2022, 6:17 p.m. UTC | #1

On Tue, Oct 18, 2022 at 12:20:50PM -0400, Mike Snitzer wrote:
> 
> - Enhance DM ioctl interface to allow returning an error string to
>   userspace. Depends on exporting is_vmalloc_or_module_addr() to allow
>   DM core to conditionally free memory allocated with kasprintf().

That really does not sound like a good idea at all.  And it does not
seem to have any MM or core maintainer signoffs.

Mike Snitzer Oct. 18, 2022, 6:48 p.m. UTC | #2

On Tue, Oct 18 2022 at  2:17P -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Tue, Oct 18, 2022 at 12:20:50PM -0400, Mike Snitzer wrote:
> > 
> > - Enhance DM ioctl interface to allow returning an error string to
> >   userspace. Depends on exporting is_vmalloc_or_module_addr() to allow
> >   DM core to conditionally free memory allocated with kasprintf().
> 
> That really does not sound like a good idea at all.  And it does not
> seem to have any MM or core maintainer signoffs.

Sorry, I should've solicited proper review more loudly.

But if you have a specific concern with how DM is looking to use
is_vmalloc_or_module_addr() please say what that is.

Mike

Linus Torvalds Oct. 18, 2022, 6:54 p.m. UTC | #3

On Tue, Oct 18, 2022 at 11:17 AM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Oct 18, 2022 at 12:20:50PM -0400, Mike Snitzer wrote:
> >
> > - Enhance DM ioctl interface to allow returning an error string to
> >   userspace. Depends on exporting is_vmalloc_or_module_addr() to allow
> >   DM core to conditionally free memory allocated with kasprintf().
>
> That really does not sound like a good idea at all.  And it does not
> seem to have any MM or core maintainer signoffs.

I wouldn't worry about maintainer sign-offs just for exporting a
helper function, but I agree with the whole concept being a complete
disaster and not a good idea at all.

Use errno.

It really is that simple. Strings have been discussed before, and they
are simply not a good idea. If your interface is so complicated that
you think errors need some textual explanation, your interface is
probably garbage.

Strings also have allocation issues (as you found out), and have
serious localization issues.

Yes, we do a lot of strings in the kernel in the form of dmesg, and we
have the rule that we simply don't localize. But that's dmesg. It's
for special stuff, not some interface.

And equally importantly, some really small detail in the kernel really
has *NO* business making up new error models of its own. You may think
that the DM ioctl's are a big and important deal, but realistically,
it's just an odd corner of the world that very very few people care
about, and they can use the same error numbers that EVERYBODY ELSE HAS
BEEN USING FOR SIX DECADES!

Don't reinvent something that works - badly.

I think we have one major interface that is string-based (apart from
the obvious pathname ones and the strings passed to 'execve()').

It's 'mount()' (and now fsconfig() etc), and it's string-based mainly
because it has that nasty "arbitrary things that different filesystem
may need for configuration"). And it has some nasty logging model
associated with it too for output.

But no, we absolutely do *not* want to emulate that particular horror
anywhere else.

If you think some errors are really important and hard to understand,
maybe you can just log them with a ratelimited pr_info() or something.

           Linus

Linus Torvalds Oct. 18, 2022, 7:19 p.m. UTC | #4

On Tue, Oct 18, 2022 at 11:54 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> But no, we absolutely do *not* want to emulate that particular horror
> anywhere else.

Side note: if DM people go "Hmm, a lot of our management really does
have the exact same issues as 'mount()' and friends, and doing the
same things as mount does with the whole string interface and logging
sounds like a good idea", I want to nip that in the bud.

It's most definitely *not* a good idea. The mount thing is nasty, it's
just that we've always had the odd string interface, and it's just
grown from "const void *data" to be a whole complicated set of context
handling.

So don't even think about duplicating that thing.

Now, if some "inspired" person then thinks that instead of duplicating
it, you really would want to do device mapper *as* a filesystem and
actually using the fsconfig() model directly and natively, that is at
least conceptually not necessarily wrong. At what point does a
"translate disk blocks and munge contents" turn from "map devices into
other devices" to a "map devices into a filesystem"? We've had loop
devices forever, and they already show how filesystems and block
devices can be a mixed concept.

And no, I'm not seriously suggesting that as a "do this instead".

I'm just saying that from an interface standpoint, we _do_ have an
interface that is kind of doing this, and that is already an area
where a lot of people think there is a lot of commonalities (ie a
number of filesystems end up doing their own device mapping
internally, and people used to say "layering violation - please use dm
for that" before they got used/resigned to it because the filesystem
people really wanted to control the mapping).

In the absence of that kind of unification, just use 'errno'.

               Linus

Mikulas Patocka Oct. 18, 2022, 8:28 p.m. UTC | #5

On Tue, 18 Oct 2022, Linus Torvalds wrote:

> On Tue, Oct 18, 2022 at 11:17 AM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Oct 18, 2022 at 12:20:50PM -0400, Mike Snitzer wrote:
> > >
> > > - Enhance DM ioctl interface to allow returning an error string to
> > >   userspace. Depends on exporting is_vmalloc_or_module_addr() to allow
> > >   DM core to conditionally free memory allocated with kasprintf().
> >
> > That really does not sound like a good idea at all.  And it does not
> > seem to have any MM or core maintainer signoffs.
> 
> I wouldn't worry about maintainer sign-offs just for exporting a
> helper function, but I agree with the whole concept being a complete
> disaster and not a good idea at all.
> 
> Use errno.
> 
> It really is that simple. Strings have been discussed before, and they
> are simply not a good idea. If your interface is so complicated that
> you think errors need some textual explanation, your interface is
> probably garbage.
> 
> Strings also have allocation issues (as you found out), and have
> serious localization issues.
> 
> Yes, we do a lot of strings in the kernel in the form of dmesg, and we
> have the rule that we simply don't localize. But that's dmesg. It's
> for special stuff, not some interface.
> 
> And equally importantly, some really small detail in the kernel really
> has *NO* business making up new error models of its own. You may think
> that the DM ioctl's are a big and important deal, but realistically,
> it's just an odd corner of the world that very very few people care
> about, and they can use the same error numbers that EVERYBODY ELSE HAS
> BEEN USING FOR SIX DECADES!
> 
> Don't reinvent something that works - badly.
> 
> I think we have one major interface that is string-based (apart from
> the obvious pathname ones and the strings passed to 'execve()').
> 
> It's 'mount()' (and now fsconfig() etc), and it's string-based mainly
> because it has that nasty "arbitrary things that different filesystem
> may need for configuration"). And it has some nasty logging model
> associated with it too for output.
> 
> But no, we absolutely do *not* want to emulate that particular horror
> anywhere else.
> 
> If you think some errors are really important and hard to understand,
> maybe you can just log them with a ratelimited pr_info() or something.

This is what we currently do.

>            Linus

The error string is not intended to be parsed by userspace (I agree that 
parsing the error string is a horrible idea, but this is not going to 
happen). It is intended to be displayed to the user by tools such as 
cryptsetup or integritysetup. The tool can't read the log, extract 
messages from it and display them.

With "just use errno", the user sees messages like "device-mapper: reload 
ioctl on test (254:0) failed: No such file or directory" and it's not much 
useful because it doesn't tell what went wrong.

Try to type "grep -r 'ti->error = ' drivers/md/|wc -l". There are 480 
distinct error messages generated by device mapper. You can't map each of 
them to a unique errno number.

BTW. we were talking about replacing device mapper version numbers with 
feature bitmaps and people preferred textual lists of features instead of 
bitmaps (because the bitmap will overflow when you have more than 64 
features). Do you oppose to this too? Do you prefer a 64-bit feature 
bitmap or a string with comma-separated list of features?

Mikulas

Mike Snitzer Oct. 18, 2022, 9:13 p.m. UTC | #6

On Tue, Oct 18 2022 at  3:19P -0400,
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Tue, Oct 18, 2022 at 11:54 AM Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > But no, we absolutely do *not* want to emulate that particular horror
> > anywhere else.
> 
> Side note: if DM people go "Hmm, a lot of our management really does
> have the exact same issues as 'mount()' and friends, and doing the
> same things as mount does with the whole string interface and logging
> sounds like a good idea", I want to nip that in the bud.
> 
> It's most definitely *not* a good idea. The mount thing is nasty, it's
> just that we've always had the odd string interface, and it's just
> grown from "const void *data" to be a whole complicated set of context
> handling.
> 
> So don't even think about duplicating that thing.
> 
> Now, if some "inspired" person then thinks that instead of duplicating
> it, you really would want to do device mapper *as* a filesystem and
> actually using the fsconfig() model directly and natively, that is at
> least conceptually not necessarily wrong. At what point does a
> "translate disk blocks and munge contents" turn from "map devices into
> other devices" to a "map devices into a filesystem"? We've had loop
> devices forever, and they already show how filesystems and block
> devices can be a mixed concept.
> 
> And no, I'm not seriously suggesting that as a "do this instead".
> 
> I'm just saying that from an interface standpoint, we _do_ have an
> interface that is kind of doing this, and that is already an area
> where a lot of people think there is a lot of commonalities (ie a
> number of filesystems end up doing their own device mapping
> internally, and people used to say "layering violation - please use dm
> for that" before they got used/resigned to it because the filesystem
> people really wanted to control the mapping).
> 
> In the absence of that kind of unification, just use 'errno'.

Mikulas touches on why why using errno isn't useful for DM. And for
DM's device stacking it is hard to know which error spewed to dmesg
(via DMERR, DMCRIT, DMINFO, etc) is relevant to a particular admin
interface issuing the DM ioctl.

So the idea to send the (hopefully useful) error string back up to the
relevant userspace consumer was one task that seemed needed (based on
Milan Broz's various complaints against DM.. which sprang from your
regular remainder that DM's version numbers are broken and need to be
removed/replaced).

Making DM errors more useful to the endpoints causing them was dealt
with head-on with a couple small changes in this pull request. I
didn't think sending useful error strings to userspace was such a
contested design point.

All said, we'll have a look at dealing with your suggested fsconfig
unification (but it seems really awkward on the surface, maybe we can
at least distill out some subset that is common).

Mike

Milan Broz Oct. 18, 2022, 10:11 p.m. UTC | #7

On 10/18/22 23:13, Mike Snitzer wrote:
...
>> In the absence of that kind of unification, just use 'errno'.
> 
> Mikulas touches on why why using errno isn't useful for DM. And for
> DM's device stacking it is hard to know which error spewed to dmesg
> (via DMERR, DMCRIT, DMINFO, etc) is relevant to a particular admin
> interface issuing the DM ioctl.
> 
> So the idea to send the (hopefully useful) error string back up to the
> relevant userspace consumer was one task that seemed needed (based on
> Milan Broz's various complaints against DM.. which sprang from your
> regular remainder that DM's version numbers are broken and need to be
> removed/replaced).

Well, when you mention my complaints, I think we are moving from one extreme
to another.

For the error reporting - we use errno values in libcryptsetup everywhere,
so if DM ioctls (through libdevmapper we use) returns proper errno, this is
the minimal solution that helps here.
The problem is that ioctl() are often just translated to -EINVAL.
(Or lost in libdevmapper compatibility layers.)

 From the dm-crypt/verity/integrity perspective, ENOMEM, ENODEV (bad block device),
ENOTSUP/ENOENT (for crypto algs not available), EIO for IO error,
EILSEQ for data integrity failure is the basic what we need.
(I perhaps forgot some, I can go through the code if you need it.)

As a bonus, if DM ioctl() returns fixed string that describes user-friendly error
(like: "keysize not compatible" or so) that's all we need
(ti->error strings are already in DM targets).

I have never asked for dynamically allocated error strings in kernel
and I do not know Mikulas' motivation to implement it.

Milan

Linus Torvalds Oct. 18, 2022, 10:17 p.m. UTC | #8

On Tue, Oct 18, 2022 at 1:29 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> The error string is not intended to be parsed by userspace (I agree that
> parsing the error string is a horrible idea, but this is not going to
> happen).

I am happy we agree on that fundamental issue.

But it's also why error strings are a HORRIBLE HORRIBLE idea.

They are literally worse than just plain 'errno', exactly because user
space MUST NOT parse them.

They are worse because:

 - user space will parse them anyway, for localization reasons, so the
whole design is garbage

 - user space that correctly doesn't parse them cannot use them for
any helpful thing apart from just displaying them, which makes them
actively worse than having a number that you *can* make actual
decision on.

In other words, user space either can violate the fundamental rule of
"don't parse it", or they are actively weaker than then errno numbers
we already have.

Either way, they are worse. See?

>  It is intended to be displayed to the user by tools such as
> cryptsetup or integritysetup. The tool can't read the log, extract
> messages from it and display them.

Bullshit.

The tools could just use the error number and together with the
operation that failed, make a very good assumption on what went wrong.

And even when that assumption isn't some 100% "this is the reason",
the tool can easily print out helpful hints, like "This is often
because of Xyz".

And guess what? With an errno, the tool can do this MUCH BETTER.

It can localize the error messages.

It can do different things for different error messages.

And it will work with old kernels too.

> With "just use errno", the user sees messages like "device-mapper: reload
> ioctl on test (254:0) failed: No such file or directory" and it's not much
> useful because it doesn't tell what went wrong.

Again, I call bullshit.

You are saying "the tools isn't helpful, so let's change the kernel interface".

And that's just plain stupid.

if the tool isn't helpful, then FIX THE TOOL.

It's that simple.

The fact is, dm isn't special. We use 'errno' absolutely everywhere
else. What makes dm so special that the dm tools can't deal well with
them?

Look at the profile tools (just to give an example of a tool that is
in the kernel tree, so i can grep for it). Sometimes it does just

                if (aio_errno != EINTR)
                        pr_err("failed to write perf data, error: %m\n");

and prints that error string. But sometimes it does things like

                if (errno == EPERM) {
                        pr_err("Permission error mapping pages.\n"
                               "Consider increasing "
                               "/proc/sys/kernel/perf_event_mlock_kb,\n"
                               "or try again with a smaller value of
-m/--mmap_pages.\n"
                               "(current value: %u,%u)\n",
                               opts->mmap_pages, opts->auxtrace_mmap_pages);

because the tool isn't garbage.

You are basically saying that the kernel should generate those strings.

And I'm saying you are completely wrong, and that no, I will not pull
this kind of silly interface, because it's an actively horribly broken
garbage interface.

Furthermore, I'm telling you that you need to really *understand* that
no, device-mapper isn't some super-special thing that magically should
do string errors.

This is not something worth discussing. This is something where you
need to just realize that you are wrong.

End of story.

                  Linus

Linus Torvalds Oct. 18, 2022, 10:18 p.m. UTC | #9

On Tue, Oct 18, 2022 at 3:11 PM Milan Broz <gmazyland@gmail.com> wrote:
>
> The problem is that ioctl() are often just translated to -EINVAL.

Oh, that's absolutely a real problem.

Don't use one single error number.

           Linus

Jason A. Donenfeld Oct. 19, 2022, 6:18 a.m. UTC | #10

On Tue, Oct 18, 2022 at 11:54:49AM -0700, Linus Torvalds wrote:
> I think we have one major interface that is string-based (apart from
> the obvious pathname ones and the strings passed to 'execve()').
> 
> It's 'mount()' (and now fsconfig() etc), and it's string-based mainly
> because it has that nasty "arbitrary things that different filesystem
> may need for configuration"). And it has some nasty logging model
> associated with it too for output.
> 
> But no, we absolutely do *not* want to emulate that particular horror
> anywhere else.

This might ruin your day, but FYI, Netlink

   [...did you already run off screaming at the mention of Netlink?...]

Netlink has the whole "extack" extended acknowledgement mechanism, in
which netlink replies can and do contain error strings. Grep the kernel
for NL_SET_ERR_MSG and you'll see a bunch of these. (And as you
suggested, I wouldn't be surprised if some bad userspaces parse them.)

Jason

Christoph Hellwig Oct. 20, 2022, 8:16 a.m. UTC | #11

On Tue, Oct 18, 2022 at 02:48:01PM -0400, Mike Snitzer wrote:
> > That really does not sound like a good idea at all.  And it does not
> > seem to have any MM or core maintainer signoffs.
> 
> Sorry, I should've solicited proper review more loudly.
> 
> But if you have a specific concern with how DM is looking to use
> is_vmalloc_or_module_addr() please say what that is.

If I understand the patch correct it tries to use it to detect if
a string is a string global.  Besides being nasty API abuse I can't
see how this would even work if DM is built in.

Christoph Hellwig Oct. 20, 2022, 8:17 a.m. UTC | #12

On Tue, Oct 18, 2022 at 11:54:49AM -0700, Linus Torvalds wrote:
> I wouldn't worry about maintainer sign-offs just for exporting a
> helper function, but I agree with the whole concept being a complete
> disaster and not a good idea at all.

It's not just a a "helper", it is internal magic for KASAN and low-level
code patching.  No driver has any business checking if something is a
module text/data address, and I'm fairly sure how they are using it is
actually wrong if DM is built in.

FYI, I also agree that the concept is a disaster as well.

Mikulas Patocka Oct. 20, 2022, 11:44 a.m. UTC | #13

On Thu, 20 Oct 2022, Christoph Hellwig wrote:

> On Tue, Oct 18, 2022 at 02:48:01PM -0400, Mike Snitzer wrote:
> > > That really does not sound like a good idea at all.  And it does not
> > > seem to have any MM or core maintainer signoffs.
> > 
> > Sorry, I should've solicited proper review more loudly.
> > 
> > But if you have a specific concern with how DM is looking to use
> > is_vmalloc_or_module_addr() please say what that is.
> 
> If I understand the patch correct it tries to use it to detect if
> a string is a string global.  Besides being nasty API abuse I can't
> see how this would even work if DM is built in.

It works for built-in DM.

You have "kfree_const" that detects if the string points to to .rodata 
with "is_kernel_rodata". Unfortunatelly, is_kernel_rodata doesn't detect 
if the string points to some modules's rodata, so we need an extra check 
for that.

So, the logic is:
if (!is_vmalloc_or_module_addr(ptr) && !is_kernel_rodata(ptr)) kfree(ptr);

I thought about modifying is_kernel_rodata to detect module's rodata as 
well, but it wouldn't work because kstrdup_const uses it and there would 
be crash possibility:
ptr = kstrdup_const(modules_rodata); unload_module(); use "ptr";

Mikulas

[git,pull] device mapper changes for 6.1

Pull-request

Message

Comments