dm-verity: restart or panic on an I/O error

Message ID	cad3f86f-09bf-bc5f-c086-fc623ad223f4@redhat.com (mailing list archive)
State	Accepted, archived
Delegated to:	Mikulas Patocka
Headers	show Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4A6B1224D7 for <dm-devel@lists.linux.dev>; Tue, 24 Sep 2024 13:18:41 +0000 (UTC) Date: Tue, 24 Sep 2024 15:18:29 +0200 (CEST) From: Mikulas Patocka <mpatocka@redhat.com> To: Maxim Suhanov <dfirblog@gmail.com> cc: Alasdair G Kergon <agk@redhat.com>, Greg KH <gregkh@linuxfoundation.org>, Mike Snitzer <snitzer@kernel.org>, security@kernel.org, dm-devel@lists.linux.dev Subject: [PATCH] dm-verity: restart or panic on an I/O error Message-ID: <cad3f86f-09bf-bc5f-c086-fc623ad223f4@redhat.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII
Series	dm-verity: restart or panic on an I/O error \| expand dm-verity: restart or panic on an I/O error

Mikulas Patocka Sept. 24, 2024, 1:18 p.m. UTC

Maxim Suhanov reported that dm-verity doesn't crash if an I/O error 
happens. In theory, this could be used to subvert security, because an 
attacker can create sectors that return error with the Write Uncorrectable 
command. Some programs may misbehave if they have to deal with EIO.

This commit fixes dm-verity, so that if "panic_on_corruption" or
"restart_on_corruption" was specified and an I/O error happens, the
machine will panic or restart.

This commit also changes kernel_restart to emergency_restart -
kernel_restart calls reboot notifiers and these reboot notifiers may wait
for the bio that failed. emergency_restart doesn't call the notifiers.

Reported-by: Maxim Suhanov <dfirblog@gmail.com>
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org

---
 drivers/md/dm-verity-target.c |   15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Sami Tolvanen Sept. 24, 2024, 5:43 p.m. UTC | #1

Hi Mikulas,

On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
> happens. In theory, this could be used to subvert security, because an
> attacker can create sectors that return error with the Write Uncorrectable
> command. Some programs may misbehave if they have to deal with EIO.

I seem to recall that this was intentional. We used to restart/panic
on I/O errors with FEC enabled, but the behavior was changed in commit
2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
errors"). Akilesh, do you remember what exactly was the issue here?

> This commit fixes dm-verity, so that if "panic_on_corruption" or
> "restart_on_corruption" was specified and an I/O error happens, the
> machine will panic or restart.
>
> This commit also changes kernel_restart to emergency_restart -
> kernel_restart calls reboot notifiers and these reboot notifiers may wait
> for the bio that failed. emergency_restart doesn't call the notifiers.
>
> Reported-by: Maxim Suhanov <dfirblog@gmail.com>
> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> Cc: stable@vger.kernel.org
>
> ---
>  drivers/md/dm-verity-target.c |   15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> Index: linux-2.6/drivers/md/dm-verity-target.c
> ===================================================================
> --- linux-2.6.orig/drivers/md/dm-verity-target.c        2024-09-23 17:48:08.000000000 +0200
> +++ linux-2.6/drivers/md/dm-verity-target.c     2024-09-24 11:34:08.000000000 +0200
> @@ -273,7 +273,7 @@ out:
>                 return 0;
>
>         if (v->mode == DM_VERITY_MODE_RESTART)
> -               kernel_restart("dm-verity device corrupted");
> +               emergency_restart();

Can we still log the reason for the restart? I remember some folks
used to rely on the "dm-verity device corrupted" message in the kernel
log.

Sami

Mikulas Patocka Sept. 24, 2024, 6:10 p.m. UTC | #2

On Tue, 24 Sep 2024, Sami Tolvanen wrote:

> Hi Mikulas,
> 
> On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
> > happens. In theory, this could be used to subvert security, because an
> > attacker can create sectors that return error with the Write Uncorrectable
> > command. Some programs may misbehave if they have to deal with EIO.
> 
> I seem to recall that this was intentional. We used to restart/panic
> on I/O errors with FEC enabled, but the behavior was changed in commit
> 2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
> errors"). Akilesh, do you remember what exactly was the issue here?

> > Index: linux-2.6/drivers/md/dm-verity-target.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-verity-target.c        2024-09-23 17:48:08.000000000 +0200
> > +++ linux-2.6/drivers/md/dm-verity-target.c     2024-09-24 11:34:08.000000000 +0200
> > @@ -273,7 +273,7 @@ out:
> >                 return 0;
> >
> >         if (v->mode == DM_VERITY_MODE_RESTART)
> > -               kernel_restart("dm-verity device corrupted");
> > +               emergency_restart();
> 
> Can we still log the reason for the restart? I remember some folks
> used to rely on the "dm-verity device corrupted" message in the kernel
> log.
> 
> Sami

OK. I've applied this on the top of my patch.

Mikulas



diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index 88012845bced..c9451df72c5a 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -272,8 +272,10 @@ static int verity_handle_err(struct dm_verity *v, enum verity_block_type type,
 	if (v->mode == DM_VERITY_MODE_LOGGING)
 		return 0;
 
-	if (v->mode == DM_VERITY_MODE_RESTART)
+	if (v->mode == DM_VERITY_MODE_RESTART) {
+		pr_emerg("dm-verity device corrupted\n");
 		emergency_restart();
+	}
 
 	if (v->mode == DM_VERITY_MODE_PANIC)
 		panic("dm-verity device corrupted");
@@ -602,11 +604,13 @@ static void verity_finish_io(struct dm_verity_io *io, blk_status_t status)
 			DMERR_LIMIT("%s has error: %s", v->data_dev->name,
 					blk_status_to_str(status));
 
-		if (v->mode == DM_VERITY_MODE_RESTART)
+		if (v->mode == DM_VERITY_MODE_RESTART) {
+			pr_emerg("dm-verity device corrupted\n");
 			emergency_restart();
+		}
 
 		if (v->mode == DM_VERITY_MODE_PANIC)
-			panic("dm-verity device has I/O error");
+			panic("dm-verity device corrupted");
 	}
 
 	bio_endio(bio);

Akilesh Kailash Sept. 24, 2024, 6:23 p.m. UTC | #3

On Tue, Sep 24, 2024 at 10:44 AM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> Hi Mikulas,
>
> On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> > Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
> > happens. In theory, this could be used to subvert security, because an
> > attacker can create sectors that return error with the Write Uncorrectable
> > command. Some programs may misbehave if they have to deal with EIO.
>
> I seem to recall that this was intentional. We used to restart/panic
> on I/O errors with FEC enabled, but the behavior was changed in commit
> 2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
> errors"). Akilesh, do you remember what exactly was the issue here?
>

I looked at the history and the original intent and the idea stems
from this patch:
https://lore.kernel.org/dm-devel/b004e7c7-f795-77ed-19b9-983785780e92@gmail.com/T/#mec4df1ba3f3cb63846875fb2bfc1f8b3100f31f1

I think the EIO should be propagated back up the stack; I think this
will trigger panic
even for genuine I/O errors.

> > This commit fixes dm-verity, so that if "panic_on_corruption" or
> > "restart_on_corruption" was specified and an I/O error happens, the
> > machine will panic or restart.
> >
> > This commit also changes kernel_restart to emergency_restart -
> > kernel_restart calls reboot notifiers and these reboot notifiers may wait
> > for the bio that failed. emergency_restart doesn't call the notifiers.
> >
> > Reported-by: Maxim Suhanov <dfirblog@gmail.com>
> > Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
> > Cc: stable@vger.kernel.org
> >
> > ---
> >  drivers/md/dm-verity-target.c |   15 ++++++++++++++-
> >  1 file changed, 14 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/drivers/md/dm-verity-target.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-verity-target.c        2024-09-23 17:48:08.000000000 +0200
> > +++ linux-2.6/drivers/md/dm-verity-target.c     2024-09-24 11:34:08.000000000 +0200
> > @@ -273,7 +273,7 @@ out:
> >                 return 0;
> >
> >         if (v->mode == DM_VERITY_MODE_RESTART)
> > -               kernel_restart("dm-verity device corrupted");
> > +               emergency_restart();
>
> Can we still log the reason for the restart? I remember some folks
> used to rely on the "dm-verity device corrupted" message in the kernel
> log.
>
> Sami

Mikulas Patocka Sept. 24, 2024, 6:36 p.m. UTC | #4

On Tue, 24 Sep 2024, Akilesh Kailash wrote:

> On Tue, Sep 24, 2024 at 10:44 AM Sami Tolvanen <samitolvanen@google.com> wrote:
> >
> > Hi Mikulas,
> >
> > On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > > Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
> > > happens. In theory, this could be used to subvert security, because an
> > > attacker can create sectors that return error with the Write Uncorrectable
> > > command. Some programs may misbehave if they have to deal with EIO.
> >
> > I seem to recall that this was intentional. We used to restart/panic
> > on I/O errors with FEC enabled, but the behavior was changed in commit
> > 2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
> > errors"). Akilesh, do you remember what exactly was the issue here?
> >
> 
> I looked at the history and the original intent and the idea stems
> from this patch:
> https://lore.kernel.org/dm-devel/b004e7c7-f795-77ed-19b9-983785780e92@gmail.com/T/#mec4df1ba3f3cb63846875fb2bfc1f8b3100f31f1
> 
> I think the EIO should be propagated back up the stack; I think this
> will trigger panic
> even for genuine I/O errors.

Maxim Suhanov's point is that you should trigger panic for genuine I/O 
errors. Otherwise the attacker could corrupt some sectors and let some 
program receive EIO and the EIO could confuse the program to do something 
insecure (for example, to not load a configuration file with security 
policy).

Mikulas

Akilesh Kailash Sept. 24, 2024, 8:54 p.m. UTC | #5

On Tue, Sep 24, 2024 at 11:45 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
>
> On Tue, 24 Sep 2024, Akilesh Kailash wrote:
>
> > On Tue, Sep 24, 2024 at 10:44 AM Sami Tolvanen <samitolvanen@google.com> wrote:
> > >
> > > Hi Mikulas,
> > >
> > > On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >
> > > > Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
> > > > happens. In theory, this could be used to subvert security, because an
> > > > attacker can create sectors that return error with the Write Uncorrectable
> > > > command. Some programs may misbehave if they have to deal with EIO.
> > >
> > > I seem to recall that this was intentional. We used to restart/panic
> > > on I/O errors with FEC enabled, but the behavior was changed in commit
> > > 2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
> > > errors"). Akilesh, do you remember what exactly was the issue here?
> > >
> >
> > I looked at the history and the original intent and the idea stems
> > from this patch:
> > https://lore.kernel.org/dm-devel/b004e7c7-f795-77ed-19b9-983785780e92@gmail.com/T/#mec4df1ba3f3cb63846875fb2bfc1f8b3100f31f1
> >
> > I think the EIO should be propagated back up the stack; I think this
> > will trigger panic
> > even for genuine I/O errors.
>
> Maxim Suhanov's point is that you should trigger panic for genuine I/O
> errors. Otherwise the attacker could corrupt some sectors and let some
> program receive EIO and the EIO could confuse the program to do something
> insecure (for example, to not load a configuration file with security
> policy).
>

Would it be ok to trigger the panic only when the system isn't shutting down?
During shutdown, we may end up with EIO requests from the underlying
driver and this may
inadvertently trigger the panic.

verity_is_system_shutting_down() should already have that logic to
skip during the shutdown path.


> Mikulas

Eric Biggers Sept. 24, 2024, 10:49 p.m. UTC | #6

On Tue, Sep 24, 2024 at 03:18:29PM +0200, Mikulas Patocka wrote:
> Maxim Suhanov reported that dm-verity doesn't crash if an I/O error 
> happens. In theory, this could be used to subvert security, because an 
> attacker can create sectors that return error with the Write Uncorrectable 
> command. Some programs may misbehave if they have to deal with EIO.
> 
> This commit fixes dm-verity, so that if "panic_on_corruption" or
> "restart_on_corruption" was specified and an I/O error happens, the
> machine will panic or restart.

How does this interact with FEC, which can correct for I/O errors on the
underlying media by reconstructing the data?

> @@ -596,6 +596,19 @@ static void verity_finish_io(struct dm_v
>  	if (!static_branch_unlikely(&use_bh_wq_enabled) || !io->in_bh)
>  		verity_fec_finish_io(io);
>  
> +	if (unlikely(status != BLK_STS_OK) && unlikely(!(bio->bi_opf & REQ_RAHEAD))) {
> +		if (v->mode == DM_VERITY_MODE_RESTART ||
> +		    v->mode == DM_VERITY_MODE_PANIC)
> +			DMERR_LIMIT("%s has error: %s", v->data_dev->name,
> +					blk_status_to_str(status));
> +
> +		if (v->mode == DM_VERITY_MODE_RESTART)
> +			emergency_restart();
> +
> +		if (v->mode == DM_VERITY_MODE_PANIC)
> +			panic("dm-verity device has I/O error");

The blk_status_t code that is checked above is usually the one that
verity_verify_io() produced, not the original one.  This could be an error from
a cause other than an I/O error on the underlying media, like data corruption
detected by dm-verity, or an error from the crypto API.

Maybe what you are trying to do is really "panic if an I/O error would be
returned to the layer above", not "panic if an I/O error occurred on the
underlying media"?  If so, this needs to be clearly explained.

- Eric

Milan Broz Sept. 25, 2024, 5:48 a.m. UTC | #7

On 9/24/24 8:36 PM, Mikulas Patocka wrote:
> On Tue, 24 Sep 2024, Akilesh Kailash wrote:
> 
>> On Tue, Sep 24, 2024 at 10:44 AM Sami Tolvanen <samitolvanen@google.com> wrote:
>>>
>>> Hi Mikulas,
>>>
>>> On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>>>>
>>>> Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
>>>> happens. In theory, this could be used to subvert security, because an
>>>> attacker can create sectors that return error with the Write Uncorrectable
>>>> command. Some programs may misbehave if they have to deal with EIO.
>>>
>>> I seem to recall that this was intentional. We used to restart/panic
>>> on I/O errors with FEC enabled, but the behavior was changed in commit
>>> 2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
>>> errors"). Akilesh, do you remember what exactly was the issue here?
>>>
>>
>> I looked at the history and the original intent and the idea stems
>> from this patch:
>> https://lore.kernel.org/dm-devel/b004e7c7-f795-77ed-19b9-983785780e92@gmail.com/T/#mec4df1ba3f3cb63846875fb2bfc1f8b3100f31f1
>>
>> I think the EIO should be propagated back up the stack; I think this
>> will trigger panic
>> even for genuine I/O errors.
> 
> Maxim Suhanov's point is that you should trigger panic for genuine I/O
> errors. Otherwise the attacker could corrupt some sectors and let some
> program receive EIO and the EIO could confuse the program to do something
> insecure (for example, to not load a configuration file with security
> policy).

This is a very strange reasoning. I can say that restarting on an IO error
(that can happen in normal situations) could cause another security issue,
such as DoS. EIO is not a data integrity error; it can happen even higher
in the storage stack... and the application should handle it anyway.

Mikulas, could you please describe some scenarios where it causes
security issues? Using bad policy from a file on IO error looks like
a bug in userspace code, not a kernel security issue.

Milan

Maxim Suhanov Sept. 25, 2024, 6:09 a.m. UTC | #8

Hello.

> This is a very strange reasoning. I can say that restarting on an IO error
> (that can happen in normal situations) could cause another security issue,
> such as DoS. EIO is not a data integrity error; it can happen even higher
> in the storage stack... and the application should handle it anyway.

In the default mode of operation, there should be no panic/reboot on
an I/O error.

The issue and the proposed patch affect the non-default modes:
panic_on_corruption and restart_on_corruption.
Users relying on one of those two modes expect the system to crash if
something goes wrong [1]. So, DoS is expected here. Returning I/O
errors to the reader isn't something that is expected here (see [1]).

The issue allows the attacker to "downgrade" the "panic_on_corruption"
and "restart_on_corruption" modes to the default one by creating
unreadable sectors on the underlying media (e.g., through the Write
Uncorrectable command).

References:
1. https://github.com/flatcar/Flatcar/issues/422

Milan Broz Sept. 25, 2024, 6:35 a.m. UTC | #9

On 9/25/24 8:09 AM, Maxim Suhanov wrote:
> Hello.
> 
>> This is a very strange reasoning. I can say that restarting on an IO error
>> (that can happen in normal situations) could cause another security issue,
>> such as DoS. EIO is not a data integrity error; it can happen even higher
>> in the storage stack... and the application should handle it anyway.
> 
> In the default mode of operation, there should be no panic/reboot on
> an I/O error.
> 
> The issue and the proposed patch affect the non-default modes:
> panic_on_corruption and restart_on_corruption.
> Users relying on one of those two modes expect the system to crash if
> something goes wrong [1]. So, DoS is expected here. Returning I/O
> errors to the reader isn't something that is expected here (see [1]).
> 
> The issue allows the attacker to "downgrade" the "panic_on_corruption"
> and "restart_on_corruption" modes to the default one by creating
> unreadable sectors on the underlying media (e.g., through the Write
> Uncorrectable command).

Sorry, but I do agree with this.

Panic/Restart should happen on data corruption (or failed Reed-Solomon
FEC data recovery fail). IO error (and other errors that basically translates
to this) can mean something else and userspace application must process
it in error path (think about network failure).

This is how is the dm-verity flag is documented:

restart_on_corruption
     Restart the system when a corrupted block is discovered. This option is
     not compatible with ignore_corruption and requires user space support to
     avoid restart loops.

It says nothing about restart on other error, that can come from crypto, network
or whatever. You are redefining the policy here.

Milan


> 
> References:
> 1. https://github.com/flatcar/Flatcar/issues/422

Mikulas Patocka Sept. 25, 2024, 9:15 a.m. UTC | #10

On Tue, 24 Sep 2024, Eric Biggers wrote:

> On Tue, Sep 24, 2024 at 03:18:29PM +0200, Mikulas Patocka wrote:
> > Maxim Suhanov reported that dm-verity doesn't crash if an I/O error 
> > happens. In theory, this could be used to subvert security, because an 
> > attacker can create sectors that return error with the Write Uncorrectable 
> > command. Some programs may misbehave if they have to deal with EIO.
> > 
> > This commit fixes dm-verity, so that if "panic_on_corruption" or
> > "restart_on_corruption" was specified and an I/O error happens, the
> > machine will panic or restart.
> 
> How does this interact with FEC, which can correct for I/O errors on the
> underlying media by reconstructing the data?

It tries to do FEC, if it succeeds, it continues, if it doesn't succeed, 
it panics or restarts.

> > @@ -596,6 +596,19 @@ static void verity_finish_io(struct dm_v
> >  	if (!static_branch_unlikely(&use_bh_wq_enabled) || !io->in_bh)
> >  		verity_fec_finish_io(io);
> >  
> > +	if (unlikely(status != BLK_STS_OK) && unlikely(!(bio->bi_opf & REQ_RAHEAD))) {
> > +		if (v->mode == DM_VERITY_MODE_RESTART ||
> > +		    v->mode == DM_VERITY_MODE_PANIC)
> > +			DMERR_LIMIT("%s has error: %s", v->data_dev->name,
> > +					blk_status_to_str(status));
> > +
> > +		if (v->mode == DM_VERITY_MODE_RESTART)
> > +			emergency_restart();
> > +
> > +		if (v->mode == DM_VERITY_MODE_PANIC)
> > +			panic("dm-verity device has I/O error");
> 
> The blk_status_t code that is checked above is usually the one that
> verity_verify_io() produced, not the original one.  This could be an error from
> a cause other than an I/O error on the underlying media, like data corruption
> detected by dm-verity, or an error from the crypto API.
> 
> Maybe what you are trying to do is really "panic if an I/O error would be
> returned to the layer above", not "panic if an I/O error occurred on the
> underlying media"?  If so, this needs to be clearly explained.
> 
> - Eric

The function verity_finish_io is called when we are going to return the 
bio to the upper layer. So, the semantics is "panic if an I/O error would 
be returned to the layer above".

Mikulas

Mikulas Patocka Sept. 26, 2024, 2:55 p.m. UTC | #11

On Tue, 24 Sep 2024, Akilesh Kailash wrote:

> On Tue, Sep 24, 2024 at 11:45 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> >
> > On Tue, 24 Sep 2024, Akilesh Kailash wrote:
> >
> > > On Tue, Sep 24, 2024 at 10:44 AM Sami Tolvanen <samitolvanen@google.com> wrote:
> > > >
> > > > Hi Mikulas,
> > > >
> > > > On Tue, Sep 24, 2024 at 6:18 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > > >
> > > > > Maxim Suhanov reported that dm-verity doesn't crash if an I/O error
> > > > > happens. In theory, this could be used to subvert security, because an
> > > > > attacker can create sectors that return error with the Write Uncorrectable
> > > > > command. Some programs may misbehave if they have to deal with EIO.
> > > >
> > > > I seem to recall that this was intentional. We used to restart/panic
> > > > on I/O errors with FEC enabled, but the behavior was changed in commit
> > > > 2c0468e054c0 ("dm verity: skip redundant verity_handle_err() on I/O
> > > > errors"). Akilesh, do you remember what exactly was the issue here?
> > > >
> > >
> > > I looked at the history and the original intent and the idea stems
> > > from this patch:
> > > https://lore.kernel.org/dm-devel/b004e7c7-f795-77ed-19b9-983785780e92@gmail.com/T/#mec4df1ba3f3cb63846875fb2bfc1f8b3100f31f1
> > >
> > > I think the EIO should be propagated back up the stack; I think this
> > > will trigger panic
> > > even for genuine I/O errors.
> >
> > Maxim Suhanov's point is that you should trigger panic for genuine I/O
> > errors. Otherwise the attacker could corrupt some sectors and let some
> > program receive EIO and the EIO could confuse the program to do something
> > insecure (for example, to not load a configuration file with security
> > policy).
> >
> 
> Would it be ok to trigger the panic only when the system isn't shutting down?
> During shutdown, we may end up with EIO requests from the underlying
> driver and this may
> inadvertently trigger the panic.
> 
> verity_is_system_shutting_down() should already have that logic to
> skip during the shutdown path.

OK.

I've added this logic.

Mikulas

Sami Tolvanen Sept. 26, 2024, 5:32 p.m. UTC | #12

On Tue, Sep 24, 2024 at 11:35 PM Milan Broz <gmazyland@gmail.com> wrote:
>
> On 9/25/24 8:09 AM, Maxim Suhanov wrote:
> > Hello.
> >
> >> This is a very strange reasoning. I can say that restarting on an IO error
> >> (that can happen in normal situations) could cause another security issue,
> >> such as DoS. EIO is not a data integrity error; it can happen even higher
> >> in the storage stack... and the application should handle it anyway.
> >
> > In the default mode of operation, there should be no panic/reboot on
> > an I/O error.
> >
> > The issue and the proposed patch affect the non-default modes:
> > panic_on_corruption and restart_on_corruption.
> > Users relying on one of those two modes expect the system to crash if
> > something goes wrong [1]. So, DoS is expected here. Returning I/O
> > errors to the reader isn't something that is expected here (see [1]).
> >
> > The issue allows the attacker to "downgrade" the "panic_on_corruption"
> > and "restart_on_corruption" modes to the default one by creating
> > unreadable sectors on the underlying media (e.g., through the Write
> > Uncorrectable command).
>
> Sorry, but I do agree with this.
>
> Panic/Restart should happen on data corruption (or failed Reed-Solomon
> FEC data recovery fail). IO error (and other errors that basically translates
> to this) can mean something else and userspace application must process
> it in error path (think about network failure).
>
> This is how is the dm-verity flag is documented:
>
> restart_on_corruption
>      Restart the system when a corrupted block is discovered. This option is
>      not compatible with ignore_corruption and requires user space support to
>      avoid restart loops.
>
> It says nothing about restart on other error, that can come from crypto, network
> or whatever. You are redefining the policy here.

I thought about this a bit more, and I agree with Milan. I/O errors
can be temporary and applications should be expected to handle them.
IMO we should trigger a restart/panic only if the underlying device
has corrupted data. If there's a use case for restarting on I/O errors
too, there should be a separate flag for that.

Android devices, which I assume are the largest users of this
functionality, are expected to switch dm-verity partitions back to
normal mode after the first restart is triggered, and userspace is
therefore expected to handle I/O errors gracefully anyway.

https://source.android.com/docs/security/features/verifiedboot/boot-flow

Sami

Maxim Suhanov Sept. 26, 2024, 6:41 p.m. UTC | #13

> I thought about this a bit more, and I agree with Milan. I/O errors
> can be temporary and applications should be expected to handle them.

Are we sure that I/O errors always reach a usermode reader? E.g., in
the file system metadata corruption case (like EIO for a file system
driver becoming ENOENT for an application).

> Android devices, which I assume are the largest users of this
> functionality, are expected to switch dm-verity partitions back to
> normal mode after the first restart is triggered, and userspace is
> therefore expected to handle I/O errors gracefully anyway.

And there is a warning displayed:
"When booting in eio mode, the device shows an error screen informing
the user that [...] the device might not function correctly."

https://source.android.com/docs/security/features/verifiedboot/verified-boot#handling-verification-errors

Mikulas Patocka Sept. 26, 2024, 8:44 p.m. UTC | #14

On Thu, 26 Sep 2024, Sami Tolvanen wrote:

> On Tue, Sep 24, 2024 at 11:35 PM Milan Broz <gmazyland@gmail.com> wrote:
> >
> > On 9/25/24 8:09 AM, Maxim Suhanov wrote:
> > > Hello.
> > >
> > >> This is a very strange reasoning. I can say that restarting on an IO error
> > >> (that can happen in normal situations) could cause another security issue,
> > >> such as DoS. EIO is not a data integrity error; it can happen even higher
> > >> in the storage stack... and the application should handle it anyway.
> > >
> > > In the default mode of operation, there should be no panic/reboot on
> > > an I/O error.
> > >
> > > The issue and the proposed patch affect the non-default modes:
> > > panic_on_corruption and restart_on_corruption.
> > > Users relying on one of those two modes expect the system to crash if
> > > something goes wrong [1]. So, DoS is expected here. Returning I/O
> > > errors to the reader isn't something that is expected here (see [1]).
> > >
> > > The issue allows the attacker to "downgrade" the "panic_on_corruption"
> > > and "restart_on_corruption" modes to the default one by creating
> > > unreadable sectors on the underlying media (e.g., through the Write
> > > Uncorrectable command).
> >
> > Sorry, but I do agree with this.
> >
> > Panic/Restart should happen on data corruption (or failed Reed-Solomon
> > FEC data recovery fail). IO error (and other errors that basically translates
> > to this) can mean something else and userspace application must process
> > it in error path (think about network failure).
> >
> > This is how is the dm-verity flag is documented:
> >
> > restart_on_corruption
> >      Restart the system when a corrupted block is discovered. This option is
> >      not compatible with ignore_corruption and requires user space support to
> >      avoid restart loops.
> >
> > It says nothing about restart on other error, that can come from crypto, network
> > or whatever. You are redefining the policy here.
> 
> I thought about this a bit more, and I agree with Milan. I/O errors
> can be temporary and applications should be expected to handle them.
> IMO we should trigger a restart/panic only if the underlying device
> has corrupted data. If there's a use case for restarting on I/O errors
> too, there should be a separate flag for that.

See for example openssh, the function read_config_file_depth. There is:

while (getline(&line, &linesize, f) != -1) {
	... process_config_line_depth
}
free(line);
fclose(f)
if (bad_options > 0)
	fatal("%s: terminating, %d bad configuration options",
		filename, bad_options);A
return 1;

So, the function doesn't distinguish between error and eof. If reading the 
config file returns -EIO, the function exits with 1 as if the file was 
empty.

If an attacker can trigger -EIO on some file in /etc/ssh/sshd_config.d/, 
openssh would behave as if the file didn't exist. If the file contains 
some security-related settings, they are lost.

Another example - if the iptables binary can't be executed due to -EIO, 
would the system erroneously boot with active networking without firewall? 
Do the boot scripts really propagate the iptables error so that no network 
devices are activated if iptables failed? I don't know how systemd handles 
this, but the old sysvinit scripts didn't seem to propagate the error.

I understand that some users want to prevent applications from seeing 
-EIO. Testing the whole system for -EIO is hard.

Mikulas

Sami Tolvanen Sept. 27, 2024, 2:54 p.m. UTC | #15

On Thu, Sep 26, 2024 at 1:44 PM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
>
> On Thu, 26 Sep 2024, Sami Tolvanen wrote:
>
> > On Tue, Sep 24, 2024 at 11:35 PM Milan Broz <gmazyland@gmail.com> wrote:
> > >
> > > On 9/25/24 8:09 AM, Maxim Suhanov wrote:
> > > > Hello.
> > > >
> > > >> This is a very strange reasoning. I can say that restarting on an IO error
> > > >> (that can happen in normal situations) could cause another security issue,
> > > >> such as DoS. EIO is not a data integrity error; it can happen even higher
> > > >> in the storage stack... and the application should handle it anyway.
> > > >
> > > > In the default mode of operation, there should be no panic/reboot on
> > > > an I/O error.
> > > >
> > > > The issue and the proposed patch affect the non-default modes:
> > > > panic_on_corruption and restart_on_corruption.
> > > > Users relying on one of those two modes expect the system to crash if
> > > > something goes wrong [1]. So, DoS is expected here. Returning I/O
> > > > errors to the reader isn't something that is expected here (see [1]).
> > > >
> > > > The issue allows the attacker to "downgrade" the "panic_on_corruption"
> > > > and "restart_on_corruption" modes to the default one by creating
> > > > unreadable sectors on the underlying media (e.g., through the Write
> > > > Uncorrectable command).
> > >
> > > Sorry, but I do agree with this.
> > >
> > > Panic/Restart should happen on data corruption (or failed Reed-Solomon
> > > FEC data recovery fail). IO error (and other errors that basically translates
> > > to this) can mean something else and userspace application must process
> > > it in error path (think about network failure).
> > >
> > > This is how is the dm-verity flag is documented:
> > >
> > > restart_on_corruption
> > >      Restart the system when a corrupted block is discovered. This option is
> > >      not compatible with ignore_corruption and requires user space support to
> > >      avoid restart loops.
> > >
> > > It says nothing about restart on other error, that can come from crypto, network
> > > or whatever. You are redefining the policy here.
> >
> > I thought about this a bit more, and I agree with Milan. I/O errors
> > can be temporary and applications should be expected to handle them.
> > IMO we should trigger a restart/panic only if the underlying device
> > has corrupted data. If there's a use case for restarting on I/O errors
> > too, there should be a separate flag for that.
>
> See for example openssh, the function read_config_file_depth. There is:
>
> while (getline(&line, &linesize, f) != -1) {
>         ... process_config_line_depth
> }
> free(line);
> fclose(f)
> if (bad_options > 0)
>         fatal("%s: terminating, %d bad configuration options",
>                 filename, bad_options);A
> return 1;
>
> So, the function doesn't distinguish between error and eof. If reading the
> config file returns -EIO, the function exits with 1 as if the file was
> empty.

Sounds like OpenSSH's threat model doesn't include an attacker who can
trigger arbitrary I/O errors. If you want dm-verity to protect against
this, why not add a new restart_on_errors flag instead of changing the
semantics of the restart_on_corruption flag and risk breaking existing
users?

Sami

Mikulas Patocka Sept. 30, 2024, 10:59 a.m. UTC | #16

On Fri, 27 Sep 2024, Sami Tolvanen wrote:

> > See for example openssh, the function read_config_file_depth. There is:
> >
> > while (getline(&line, &linesize, f) != -1) {
> >         ... process_config_line_depth
> > }
> > free(line);
> > fclose(f)
> > if (bad_options > 0)
> >         fatal("%s: terminating, %d bad configuration options",
> >                 filename, bad_options);A
> > return 1;
> >
> > So, the function doesn't distinguish between error and eof. If reading the
> > config file returns -EIO, the function exits with 1 as if the file was
> > empty.
> 
> Sounds like OpenSSH's threat model doesn't include an attacker who can
> trigger arbitrary I/O errors. If you want dm-verity to protect against
> this, why not add a new restart_on_errors flag instead of changing the
> semantics of the restart_on_corruption flag and risk breaking existing
> users?
> 
> Sami

The dm-verity behavior was reported as a security bug, so by default, it 
should behave in the secure way - i.e. restart or panic on I/O error.

Do you intend to use dm-verity in Android and ChromeOS in the less-secure 
way where it returns -EIO? Have you audited the Android and ChromeOS 
codebase so that -EIO can't cause security breach? If yes, I can make a 
configuration switch for you that will enable the old behavior.

Mikulas

Will Drewry Sept. 30, 2024, 12:58 p.m. UTC | #17

On Mon, Sep 30, 2024 at 6:00 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
>
> On Fri, 27 Sep 2024, Sami Tolvanen wrote:
>
> > > See for example openssh, the function read_config_file_depth. There is:
> > >
> > > while (getline(&line, &linesize, f) != -1) {
> > >         ... process_config_line_depth
> > > }
> > > free(line);
> > > fclose(f)
> > > if (bad_options > 0)
> > >         fatal("%s: terminating, %d bad configuration options",
> > >                 filename, bad_options);A
> > > return 1;
> > >
> > > So, the function doesn't distinguish between error and eof. If reading the
> > > config file returns -EIO, the function exits with 1 as if the file was
> > > empty.
> >
> > Sounds like OpenSSH's threat model doesn't include an attacker who can
> > trigger arbitrary I/O errors. If you want dm-verity to protect against
> > this, why not add a new restart_on_errors flag instead of changing the
> > semantics of the restart_on_corruption flag and risk breaking existing
> > users?
> >
> > Sami
>
> The dm-verity behavior was reported as a security bug, so by default, it
> should behave in the secure way - i.e. restart or panic on I/O error.
>
> Do you intend to use dm-verity in Android and ChromeOS in the less-secure
> way where it returns -EIO? Have you audited the Android and ChromeOS
> codebase so that -EIO can't cause security breach? If yes, I can make a
> configuration switch for you that will enable the old behavior.

tl;dr don't change the default behavior, but adding a reboot-on-eio is nice.

I'll dig through the archives, but I think this was discussed back when we
first worked on upstreaming (and at linux plumbers).

IO errors are _outside_ dm-verity's threat model. dm-verity verifies that a
block that is read in matches (or can be rebuilt via FEC to) the integrity hash
that it can safely authenticate back to its root hash. Historically, systems
running on raw nand often had to deal with -EIOs in the field that were
transient or could be worked around in other layers of the system.

However, even if a block is read and cannot be authenticated, dm-verity is
not violating any security invariants by not rebooting -- it just must not pass
along invalid data as valid data.  Rebooting on invalid data and rebooting on
EIO are system-wide policies which may or may not relate to the security
stance of the system.

wrt this thread, a flag which enables reboot-on-eio seems like a nice feature,
but it shouldn't be conflated with a global security stance or improvement.
Just because someone reports something as a security bug doesn't make it
a security bug -- it just means the behavior didn't match their expectations
and may have a security impact on _their_ deployment.

Both CrOS and Android deploy dm-verity with awarness that the underlying
block device stack is attack surface regardless of whether the device
mapper layer reboots on EIO or not.  Unless there is a specific security
problem in the EIO handling code paths or subsequent dm/dm-verity
behavior, then at best it's a late symptom of a problem.

If this behavior change does happen, please make sure dm-verity is still
useful without forced reboots on EIO.

thanks and hth!
will

Mikulas Patocka Sept. 30, 2024, 4:27 p.m. UTC | #18

On Mon, 30 Sep 2024, Will Drewry wrote:

> > The dm-verity behavior was reported as a security bug, so by default, it
> > should behave in the secure way - i.e. restart or panic on I/O error.
> >
> > Do you intend to use dm-verity in Android and ChromeOS in the less-secure
> > way where it returns -EIO? Have you audited the Android and ChromeOS
> > codebase so that -EIO can't cause security breach? If yes, I can make a
> > configuration switch for you that will enable the old behavior.
> 
> tl;dr don't change the default behavior, but adding a reboot-on-eio is nice.

OK, so I can revert it if you want it.

I'd like to ask - there is another change in that patch - I changed
	kernel_restart("dm-verity device corrupted");
to
	pr_emerg("dm-verity device corrupted\n");
	emergency_restart();

Because kernel_restart calls reboot notifiers and they may in theory wait 
for the bio that caused the restart, resulting in deadlock.

Do you want to have this part of the patch reverted too?

Mikulas

Will Drewry Sept. 30, 2024, 5:10 p.m. UTC | #19

On Mon, Sep 30, 2024 at 11:27 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
>
>
> On Mon, 30 Sep 2024, Will Drewry wrote:
>
> > > The dm-verity behavior was reported as a security bug, so by default, it
> > > should behave in the secure way - i.e. restart or panic on I/O error.
> > >
> > > Do you intend to use dm-verity in Android and ChromeOS in the less-secure
> > > way where it returns -EIO? Have you audited the Android and ChromeOS
> > > codebase so that -EIO can't cause security breach? If yes, I can make a
> > > configuration switch for you that will enable the old behavior.
> >
> > tl;dr don't change the default behavior, but adding a reboot-on-eio is nice.
>
> OK, so I can revert it if you want it.

That or an argument so the old behavior can remain for those who are using it
(I can send a patch for that if it's easier too).

> I'd like to ask - there is another change in that patch - I changed
>         kernel_restart("dm-verity device corrupted");
> to
>         pr_emerg("dm-verity device corrupted\n");
>         emergency_restart();
>
> Because kernel_restart calls reboot notifiers and they may in theory wait
> for the bio that caused the restart, resulting in deadlock.
>
> Do you want to have this part of the patch reverted too?

IMHO, that's a good change!  If the policy is to restart on corruption, then
it makes sense to avoid the reboot notifiers.

Thanks!
will

Sami Tolvanen Sept. 30, 2024, 6:07 p.m. UTC | #20

On Mon, Sep 30, 2024 at 10:10 AM Will Drewry <wad@chromium.org> wrote:
>
> On Mon, Sep 30, 2024 at 11:27 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> >
> >
> >
> > On Mon, 30 Sep 2024, Will Drewry wrote:
> >
> > > > The dm-verity behavior was reported as a security bug, so by default, it
> > > > should behave in the secure way - i.e. restart or panic on I/O error.
> > > >
> > > > Do you intend to use dm-verity in Android and ChromeOS in the less-secure
> > > > way where it returns -EIO? Have you audited the Android and ChromeOS
> > > > codebase so that -EIO can't cause security breach? If yes, I can make a
> > > > configuration switch for you that will enable the old behavior.
> > >
> > > tl;dr don't change the default behavior, but adding a reboot-on-eio is nice.
> >
> > OK, so I can revert it if you want it.
>
> That or an argument so the old behavior can remain for those who are using it
> (I can send a patch for that if it's easier too).

Thanks for chiming in on this, Will! I still think that adding a
separate restart_on_eio (or similar) flag would be the best solution.

> > I'd like to ask - there is another change in that patch - I changed
> >         kernel_restart("dm-verity device corrupted");
> > to
> >         pr_emerg("dm-verity device corrupted\n");
> >         emergency_restart();
> >
> > Because kernel_restart calls reboot notifiers and they may in theory wait
> > for the bio that caused the restart, resulting in deadlock.
> >
> > Do you want to have this part of the patch reverted too?
>
> IMHO, that's a good change!  If the policy is to restart on corruption, then
> it makes sense to avoid the reboot notifiers.

While I agree that this sounds good in principle, devices that use the
restart feature typically need to pass the reboot reason to a PMU, for
example, and it looks like Android devices depend on reboot notifiers
for this. Here's a recent Pixel implementation that checks for the
"dm-verity device corrupted" command as an example:

https://android.googlesource.com/kernel/google-modules/power/reset/+/refs/heads/android-gs-comet-6.1-android15/pixel-zuma-reboot.c#81

Sami

Will Drewry Sept. 30, 2024, 6:55 p.m. UTC | #21

On Mon, Sep 30, 2024 at 1:07 PM Sami Tolvanen <samitolvanen@google.com> wrote:
>
> On Mon, Sep 30, 2024 at 10:10 AM Will Drewry <wad@chromium.org> wrote:
> >
> > On Mon, Sep 30, 2024 at 11:27 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > >
> > >
> > >
> > > On Mon, 30 Sep 2024, Will Drewry wrote:
> > >
> > > > > The dm-verity behavior was reported as a security bug, so by default, it
> > > > > should behave in the secure way - i.e. restart or panic on I/O error.
> > > > >
> > > > > Do you intend to use dm-verity in Android and ChromeOS in the less-secure
> > > > > way where it returns -EIO? Have you audited the Android and ChromeOS
> > > > > codebase so that -EIO can't cause security breach? If yes, I can make a
> > > > > configuration switch for you that will enable the old behavior.
> > > >
> > > > tl;dr don't change the default behavior, but adding a reboot-on-eio is nice.
> > >
> > > OK, so I can revert it if you want it.
> >
> > That or an argument so the old behavior can remain for those who are using it
> > (I can send a patch for that if it's easier too).
>
> Thanks for chiming in on this, Will! I still think that adding a
> separate restart_on_eio (or similar) flag would be the best solution.
>
> > > I'd like to ask - there is another change in that patch - I changed
> > >         kernel_restart("dm-verity device corrupted");
> > > to
> > >         pr_emerg("dm-verity device corrupted\n");
> > >         emergency_restart();
> > >
> > > Because kernel_restart calls reboot notifiers and they may in theory wait
> > > for the bio that caused the restart, resulting in deadlock.
> > >
> > > Do you want to have this part of the patch reverted too?
> >
> > IMHO, that's a good change!  If the policy is to restart on corruption, then
> > it makes sense to avoid the reboot notifiers.
>
> While I agree that this sounds good in principle, devices that use the
> restart feature typically need to pass the reboot reason to a PMU, for
> example, and it looks like Android devices depend on reboot notifiers
> for this. Here's a recent Pixel implementation that checks for the
> "dm-verity device corrupted" command as an example:
>
> https://android.googlesource.com/kernel/google-modules/power/reset/+/refs/heads/android-gs-comet-6.1-android15/pixel-zuma-reboot.c#81

Good catch! (It's a shame it's using text as the API!) I totally
spaced on that, and depending on where the corruption is, you mind end
up in a boot loop.

Mikulas Patocka Oct. 1, 2024, 9:11 a.m. UTC | #22

On Mon, 30 Sep 2024, Will Drewry wrote:

> On Mon, Sep 30, 2024 at 1:07 PM Sami Tolvanen <samitolvanen@google.com> wrote:
> >
> > On Mon, Sep 30, 2024 at 10:10 AM Will Drewry <wad@chromium.org> wrote:
> > >
> > > On Mon, Sep 30, 2024 at 11:27 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
> > > >
> > > >
> > > >
> > > > On Mon, 30 Sep 2024, Will Drewry wrote:
> > > >
> > > > > > The dm-verity behavior was reported as a security bug, so by default, it
> > > > > > should behave in the secure way - i.e. restart or panic on I/O error.
> > > > > >
> > > > > > Do you intend to use dm-verity in Android and ChromeOS in the less-secure
> > > > > > way where it returns -EIO? Have you audited the Android and ChromeOS
> > > > > > codebase so that -EIO can't cause security breach? If yes, I can make a
> > > > > > configuration switch for you that will enable the old behavior.
> > > > >
> > > > > tl;dr don't change the default behavior, but adding a reboot-on-eio is nice.
> > > >
> > > > OK, so I can revert it if you want it.
> > >
> > > That or an argument so the old behavior can remain for those who are using it
> > > (I can send a patch for that if it's easier too).
> >
> > Thanks for chiming in on this, Will! I still think that adding a
> > separate restart_on_eio (or similar) flag would be the best solution.
> >
> > > > I'd like to ask - there is another change in that patch - I changed
> > > >         kernel_restart("dm-verity device corrupted");
> > > > to
> > > >         pr_emerg("dm-verity device corrupted\n");
> > > >         emergency_restart();
> > > >
> > > > Because kernel_restart calls reboot notifiers and they may in theory wait
> > > > for the bio that caused the restart, resulting in deadlock.
> > > >
> > > > Do you want to have this part of the patch reverted too?
> > >
> > > IMHO, that's a good change!  If the policy is to restart on corruption, then
> > > it makes sense to avoid the reboot notifiers.
> >
> > While I agree that this sounds good in principle, devices that use the
> > restart feature typically need to pass the reboot reason to a PMU, for
> > example, and it looks like Android devices depend on reboot notifiers
> > for this. Here's a recent Pixel implementation that checks for the
> > "dm-verity device corrupted" command as an example:
> >
> > https://android.googlesource.com/kernel/google-modules/power/reset/+/refs/heads/android-gs-comet-6.1-android15/pixel-zuma-reboot.c#81
> 
> Good catch! (It's a shame it's using text as the API!) I totally
> spaced on that, and depending on where the corruption is, you mind end
> up in a boot loop.

If I add that 'reboot-on-eio' flag, should it also restart the kernel with 
kernel_restart("dm-verity device corrupted")? Or, should it use a 
different string?

Mikulas

Milan Broz Oct. 1, 2024, 10:22 a.m. UTC | #23

On 10/1/24 11:11 AM, Mikulas Patocka wrote:
...
> 
> If I add that 'reboot-on-eio' flag, should it also restart the kernel with
> kernel_restart("dm-verity device corrupted")? Or, should it use a
> different string?

If we are already here revisiting it, maybe think if there are more errors that
should be catch (anything from crypto API?).
The name then should be different (no hardcoded "eio").

Also, isn't better to use just one flag and create something like
   restart_on_corruption_and_error
   panic_on_corruption_and_error
(that includes on_corruption AND additional error).

This will save us some flag combination checking in userspace :-)
(As I think restart on error without restart on corruption makes no sense.)

Milan

Sami Tolvanen Oct. 1, 2024, 6:32 p.m. UTC | #24

On Tue, Oct 1, 2024 at 2:12 AM Mikulas Patocka <mpatocka@redhat.com> wrote:
>
> On Mon, 30 Sep 2024, Will Drewry wrote:
>
> > On Mon, Sep 30, 2024 at 1:07 PM Sami Tolvanen <samitolvanen@google.com> wrote:
> > >
> > > While I agree that this sounds good in principle, devices that use the
> > > restart feature typically need to pass the reboot reason to a PMU, for
> > > example, and it looks like Android devices depend on reboot notifiers
> > > for this. Here's a recent Pixel implementation that checks for the
> > > "dm-verity device corrupted" command as an example:
> > >
> > > https://android.googlesource.com/kernel/google-modules/power/reset/+/refs/heads/android-gs-comet-6.1-android15/pixel-zuma-reboot.c#81
> >
> > Good catch! (It's a shame it's using text as the API!) I totally
> > spaced on that, and depending on where the corruption is, you mind end
> > up in a boot loop.
>
> If I add that 'reboot-on-eio' flag, should it also restart the kernel with
> kernel_restart("dm-verity device corrupted")? Or, should it use a
> different string?

I think it could be a different string since there are no existing
users of this flag and the restart isn't actually caused by
corruption, but I don't have strong feelings about this.

Sami

dm-verity: restart or panic on an I/O error

Commit Message

Comments

Patch