diff mbox

block: partitions: efi: Always check for alternative GPT at end of drive

Message ID 1461632806-5946-1-git-send-email-jwerner@chromium.org (mailing list archive)
State New, archived
Headers show

Commit Message

Julius Werner April 26, 2016, 1:06 a.m. UTC
The GUID Partiton Table layout maintains two synonymous partition tables
on a block device, one starting in sector 1 and one in the very last
sectors of the block device. This is useful if one of the tables gets
accidentally corrupted (e.g. through a partial write because of an
unexpected power loss).

Linux normally only boots if the primary GPT is valid. It will not even
try to find the alternative GPT to an invalid primary one unless the
"gpt" command line option forces more aggressive detection. This doesn't
really make any sense... if the "gpt" option is not set, the code
validates the protective or hybrid MBR in sector 0 anyway before it even
starts looking for the actual GPTs. If we get to the point where a valid
proctective or hybrid MBR was found but the primary GPT was not found
(valid), checking the alternative GPT is our best bet: we know that this
block device is meant to use GPT (because any other partitioning system
would've presumably overwritten sector 0), and we know that if the
alternative GPT is valid it should contain more accurate information
than parsing the protective/hybrid MBR with msdos_partition() would
yield (which would otherwise be what happens next).

Signed-off-by: Julius Werner <jwerner@chromium.org>
---
 block/partitions/efi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Karel Zak April 26, 2016, 10:20 a.m. UTC | #1
On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
> The GUID Partiton Table layout maintains two synonymous partition tables
> on a block device, one starting in sector 1 and one in the very last
> sectors of the block device. This is useful if one of the tables gets
> accidentally corrupted (e.g. through a partial write because of an
> unexpected power loss).
> 
> Linux normally only boots if the primary GPT is valid. It will not even
> try to find the alternative GPT to an invalid primary one unless the
> "gpt" command line option forces more aggressive detection. This doesn't
> really make any sense... if the "gpt" option is not set, the code
> validates the protective or hybrid MBR in sector 0 anyway before it even
> starts looking for the actual GPTs. If we get to the point where a valid
> proctective or hybrid MBR was found but the primary GPT was not found
> (valid), checking the alternative GPT is our best bet: we know that this
> block device is meant to use GPT (because any other partitioning system
> would've presumably overwritten sector 0), and we know that if the
> alternative GPT is valid it should contain more accurate information
> than parsing the protective/hybrid MBR with msdos_partition() would
> yield (which would otherwise be what happens next).

I guess "force_gpt" (and "gpt" on kernel command line) exists to force
users to think and care about a reason why the device has unreadable
(broken) primary GPT header.

It seems like bad (and dangerous) idea to silently ignore corrupted 
primary GTP header and boot from such device.

And note that alternative GPT header and the end of the device is a
just guess. The proper location of the alternative header is specified
with-in primary header (pgpt->alternate_lba). The header at the end of
the device (as used for "force_gpt") is a fallback solution only.

    Karel
Austin S. Hemmelgarn April 26, 2016, 2:38 p.m. UTC | #2
On 2016-04-25 21:06, Julius Werner wrote:
> The GUID Partiton Table layout maintains two synonymous partition tables
> on a block device, one starting in sector 1 and one in the very last
> sectors of the block device. This is useful if one of the tables gets
> accidentally corrupted (e.g. through a partial write because of an
> unexpected power loss).
>
> Linux normally only boots if the primary GPT is valid. It will not even
> try to find the alternative GPT to an invalid primary one unless the
> "gpt" command line option forces more aggressive detection. This doesn't
> really make any sense... if the "gpt" option is not set, the code
> validates the protective or hybrid MBR in sector 0 anyway before it even
> starts looking for the actual GPTs. If we get to the point where a valid
> proctective or hybrid MBR was found but the primary GPT was not found
> (valid), checking the alternative GPT is our best bet: we know that this
> block device is meant to use GPT (because any other partitioning system
> would've presumably overwritten sector 0), and we know that if the
> alternative GPT is valid it should contain more accurate information
> than parsing the protective/hybrid MBR with msdos_partition() would
> yield (which would otherwise be what happens next).
At the absolute minimum, we should be logging (at least at a warning 
level) that we had to fall back the the backup GPT.  If somebody is 
dealing with a disk that had a torn write to the primary GPT, that's one 
thing, but this could also be caused by any number of other problems 
(hardware issues, malicious intent, etc), and we need to log that we 
detected corrupted data.
>
> Signed-off-by: Julius Werner <jwerner@chromium.org>
> ---
>   block/partitions/efi.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/partitions/efi.c b/block/partitions/efi.c
> index 26cb624..0d4ca8e 100644
> --- a/block/partitions/efi.c
> +++ b/block/partitions/efi.c
> @@ -625,7 +625,7 @@ static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
>   		good_agpt = is_gpt_valid(state,
>   					 le64_to_cpu(pgpt->alternate_lba),
>   					 &agpt, &aptes);
> -        if (!good_agpt && force_gpt)
> +        if (!good_agpt)
>                   good_agpt = is_gpt_valid(state, lastlba, &agpt, &aptes);
>
>           /* The obviously unsuccessful case */
>

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Davidlohr Bueso April 26, 2016, 6:10 p.m. UTC | #3
On Tue, 26 Apr 2016, Austin S. Hemmelgarn wrote:

>At the absolute minimum, we should be logging (at least at a warning 
>level) that we had to fall back the the backup GPT.  If somebody is 
>dealing with a disk that had a torn write to the primary GPT, that's 
>one thing, but this could also be caused by any number of other 
>problems (hardware issues, malicious intent, etc), and we need to log 
>that we detected corrupted data.

We already complain about corrupted primary gpt (at a warn level), and
there's also plenty of verbosity about differences between primary and
backup (see compare_gpts()), or are you referring to something else?

Thanks,
Davidlohr
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Davidlohr Bueso April 26, 2016, 6:33 p.m. UTC | #4
On Tue, 26 Apr 2016, Karel Zak wrote:

>On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
>> The GUID Partiton Table layout maintains two synonymous partition tables
>> on a block device, one starting in sector 1 and one in the very last
>> sectors of the block device. This is useful if one of the tables gets
>> accidentally corrupted (e.g. through a partial write because of an
>> unexpected power loss).
>>
>> Linux normally only boots if the primary GPT is valid. It will not even
>> try to find the alternative GPT to an invalid primary one unless the
>> "gpt" command line option forces more aggressive detection. This doesn't
>> really make any sense... if the "gpt" option is not set, the code
>> validates the protective or hybrid MBR in sector 0 anyway before it even
>> starts looking for the actual GPTs. If we get to the point where a valid
>> proctective or hybrid MBR was found but the primary GPT was not found
>> (valid), checking the alternative GPT is our best bet: we know that this

'best bet' in a kernel is not enough :) Which is why userland tools can fix
and/or do any sort of crazy stuff with the backup and recover the primary etc etc.

>> block device is meant to use GPT (because any other partitioning system
>> would've presumably overwritten sector 0), and we know that if the
>> alternative GPT is valid it should contain more accurate information
>> than parsing the protective/hybrid MBR with msdos_partition() would
>> yield (which would otherwise be what happens next).

>I guess "force_gpt" (and "gpt" on kernel command line) exists to force
>users to think and care about a reason why the device has unreadable
>(broken) primary GPT header.

Yes, from find_valid_gpt():

 * If the Primary GPT header is not valid, the Alternate GPT header
 * is not checked unless the 'gpt' kernel command line option is passed.
 * This protects against devices which misreport their size, and forces
 * the user to decide to use the Alternate GPT.

... so users are at least forced in some way to think about this.

>It seems like bad (and dangerous) idea to silently ignore corrupted
>primary GTP header and boot from such device.

Yeah, there's no way in hell I trust a backup gpt in kernel space.
We simply have no way of distinguishing between good and bad devices.

>And note that alternative GPT header and the end of the device is a
>just guess. The proper location of the alternative header is specified
>with-in primary header (pgpt->alternate_lba). The header at the end of
>the device (as used for "force_gpt") is a fallback solution only.

And this only illustrates the ambiguity of the backup.

Thanks,
Davidlohr
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Austin S. Hemmelgarn April 26, 2016, 7:27 p.m. UTC | #5
On 2016-04-26 14:10, Davidlohr Bueso wrote:
> On Tue, 26 Apr 2016, Austin S. Hemmelgarn wrote:
>
>> At the absolute minimum, we should be logging (at least at a warning
>> level) that we had to fall back the the backup GPT.  If somebody is
>> dealing with a disk that had a torn write to the primary GPT, that's
>> one thing, but this could also be caused by any number of other
>> problems (hardware issues, malicious intent, etc), and we need to log
>> that we detected corrupted data.
>
> We already complain about corrupted primary gpt (at a warn level), and
> there's also plenty of verbosity about differences between primary and
> backup (see compare_gpts()), or are you referring to something else?
Ah, you're right, somehow I had missed this.

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julius Werner April 26, 2016, 8:13 p.m. UTC | #6
>> I guess "force_gpt" (and "gpt" on kernel command line) exists to force
>> users to think and care about a reason why the device has unreadable
>> (broken) primary GPT header.
>
>
> Yes, from find_valid_gpt():
>
> * If the Primary GPT header is not valid, the Alternate GPT header
> * is not checked unless the 'gpt' kernel command line option is passed.
> * This protects against devices which misreport their size, and forces
> * the user to decide to use the Alternate GPT.
>
> ... so users are at least forced in some way to think about this.

Are you sure that not recognizing the device at all (rather than
spitting a warning and using the still valid information we have, as
it would do with my patch) is better for the user? This does not just
apply to the root device, after all... GPTs on any (possibly external)
medium will not be recognized if the primary table is broken. If a
novice user just wants to read a slightly corrupted USB stick on his
Ubuntu desktop, he has to dig through dmesg to find the reason it
doesn't auto-mount, then figure out somehow that he has to add "gpt"
to his command line (which the error messages don't allude to in any
way), and reboot the whole system to make it apply.

I agree that broken GPTs should be fixed, of course. But unless you
want the kernel to fix them itself, the only option is to spew a
warning and let some userspace tool look for that. With this patch,
any distro would be free to add a simple 'if dmesg | grep "Primary GPT
is invalid"; then parted --whatever-you-need-to-pass-to-fix-it; fi'
init script. But without it, you're going to throw the user to a grub
prompt or worse (e.g. what if it's a phone or something?), possibly
cutting off his only source of internet access and expecting him to
figure out the magic option he needs to make it work again on his own.

Another argument is that GPTs mostly break on update... disks
(usually) don't just corrupt themselves randomly. If your primary GPT
is broken but the secondary is valid, you're probably fine until you
try to repartition the device again, at which point any good
partitioning tool will detect and fix the problem before it let's you
make any more changes anyway.

>> It seems like bad (and dangerous) idea to silently ignore corrupted
>> primary GTP header and boot from such device.
>
>
> Yeah, there's no way in hell I trust a backup gpt in kernel space.
> We simply have no way of distinguishing between good and bad devices.

I'm not sure what you're getting at here. The alternative GPT is just
as trustworthy and as protected by checksums as the primary. If it's
there, and it's valid, and you had a valid PMBR, I don't see how
there's any realistic chance that this is not a GPT device.

>> And note that alternative GPT header and the end of the device is a
>> just guess. The proper location of the alternative header is specified
>> with-in primary header (pgpt->alternate_lba). The header at the end of
>> the device (as used for "force_gpt") is a fallback solution only.
>
> And this only illustrates the ambiguity of the backup.

The GPT specification mandates the alternative GPT header to be in the
last sector on disk. That is the only proper and valid location...
pgpt->alternative_lba is just a redundant aid to double-check it. If
it's not there (maybe because you dd'ed a disk onto a
differently-sized disk), that is just plain incorrect. Sure, it
happens, but that's fine... in that case this code just won't find a
valid GPT at the end and fail. But if it does find a GPT, and it is
valid, I really don't see why we shouldn't just use it.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Elliott, Robert (Servers) April 26, 2016, 8:34 p.m. UTC | #7
> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Davidlohr Bueso
> Sent: Tuesday, April 26, 2016 1:34 PM
> To: Karel Zak <kzak@redhat.com>
> Cc: Julius Werner <jwerner@chromium.org>; linux-efi@vger.kernel.org;
> linux-kernel@vger.kernel.org; linux-block@vger.kernel.org; Gwendal
> Grignou <gwendal@chromium.org>; Doug Anderson <dianders@chromium.org>
> Subject: Re: [PATCH] block: partitions: efi: Always check for
> alternative GPT at end of drive
> 
> On Tue, 26 Apr 2016, Karel Zak wrote:
> 
> >On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
> >> The GUID Partiton Table layout maintains two synonymous partition
> >> tables on a block device, one starting in sector 1 and one in the
> >> very last sectors of the block device. This is useful if one of
> >> the tables gets
> >> accidentally corrupted (e.g. through a partial write because of an
> >> unexpected power loss).
> >>
> >> Linux normally only boots if the primary GPT is valid. It will not
> >> even try to find the alternative GPT to an invalid primary one
> >> unless the "gpt" command line option forces more aggressive
> >> detection. This doesn't
> >> really make any sense... if the "gpt" option is not set, the code
> >> validates the protective or hybrid MBR in sector 0 anyway before
> >> it even starts looking for the actual GPTs. If we get to the point
> >> where a valid proctective or hybrid MBR was found but the primary
> >> GPT was not found (valid), checking the alternative GPT is our 
> >> best bet: we know that this
> 
> 'best bet' in a kernel is not enough :) Which is why userland tools
> can fix and/or do any sort of crazy stuff with the backup and recover
> the primary etc etc.

Drive blocks go bad; the redundant GPTs are there to let the
system keep booting and running if that happens.

Rewriting the bad GPTs is what should require user intervention.

> 
> >> block device is meant to use GPT (because any other partitioning
> system
> >> would've presumably overwritten sector 0), and we know that if the
> >> alternative GPT is valid it should contain more accurate
> information
> >> than parsing the protective/hybrid MBR with msdos_partition()
> would
> >> yield (which would otherwise be what happens next).
> 
> >I guess "force_gpt" (and "gpt" on kernel command line) exists to
> >force users to think and care about a reason why the device has
> >unreadable (broken) primary GPT header.
> 
> Yes, from find_valid_gpt():
> 
>  * If the Primary GPT header is not valid, the Alternate GPT header
>  * is not checked unless the 'gpt' kernel command line option is
> passed.
>  * This protects against devices which misreport their size, and
> forces
>  * the user to decide to use the Alternate GPT.
> 
> ... so users are at least forced in some way to think about this.
> 
> >It seems like bad (and dangerous) idea to silently ignore corrupted
> >primary GTP header and boot from such device.
> 
> Yeah, there's no way in hell I trust a backup gpt in kernel space.
> We simply have no way of distinguishing between good and bad devices.
>
> >And note that alternative GPT header and the end of the device is a
> >just guess. The proper location of the alternative header is
> >specified with-in primary header (pgpt->alternate_lba). The header
> >at the end of
> >the device (as used for "force_gpt") is a fallback solution only.
> 
> And this only illustrates the ambiguity of the backup.

The UEFI specification is not ambiguous - you should always look
for the backup GPT Header at the last LBA:

"Two GPT Header structures are stored on the device: the primary
and the backup. The primary GPT Header must be located in LBA 1
(i.e., the second logical block), and the backup GPT Header must
be located in the last LBA of the device."

If the primary GPT Header is corrupted (e.g., CRC is bad), you
cannot trust any fields in it, including the Alternate LBA field.
The Alternate LBA field is there to help you tolerate failures 
while growing or shrinking the block device size (not important
for individual physical drives, but an issue for logical drives
presented by RAID controllers).

--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Davidlohr Bueso April 26, 2016, 9:15 p.m. UTC | #8
On Tue, 26 Apr 2016, Elliott, Robert (Persistent Memory) wrote:

>The UEFI specification is not ambiguous - you should always look
>for the backup GPT Header at the last LBA:
>
>"Two GPT Header structures are stored on the device: the primary
>and the backup. The primary GPT Header must be located in LBA 1
>(i.e., the second logical block), and the backup GPT Header must
>be located in the last LBA of the device."
>
>If the primary GPT Header is corrupted (e.g., CRC is bad), you
>cannot trust any fields in it, including the Alternate LBA field.

I'm well aware of this, and is really what I meant with 'ambiguous'
(which was ambiguous in itself). All I'm arguing is that I don't see
a solid reason to change the default, when we have (and have had for
a long time) the gpt param alternative.

>The Alternate LBA field is there to help you tolerate failures
>while growing or shrinking the block device size (not important
>for individual physical drives, but an issue for logical drives
>presented by RAID controllers).

I have nothing against the agpt, just pass a boot param and voila,
you can use it. This is not some sort of recent regression we are
talking about here. How is this such a burden all of a sudden?

Thanks,
Davidlohr
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gwendal Grignou April 26, 2016, 9:51 p.m. UTC | #9
On Tue, Apr 26, 2016 at 2:15 PM, Davidlohr Bueso <dave@stgolabs.net> wrote:
> On Tue, 26 Apr 2016, Elliott, Robert (Persistent Memory) wrote:
>

>
> I have nothing against the agpt, just pass a boot param and voila,
> you can use it. This is not some sort of recent regression we are
> talking about here. How is this such a burden all of a sudden?
Julius and I were looking at the code when we spotted the issue.

As Julius said, "just pass a boot param", is not easy on certain
machines, like phone. It is not user friendly either.
The system won't boot at all, a typical user will have to do a full
reinstall to fix the error.

Running with forced_gpt always on is not a solution either, because SD
card formatted with a plain MBR over an older image with a GPT won't
be mounted properly: that newer MBR will be ignored.

Gwendal.
>
> Thanks,
> Davidlohr
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ard Biesheuvel April 27, 2016, 6 a.m. UTC | #10
On 26 April 2016 at 22:34, Elliott, Robert (Persistent Memory)
<elliott@hpe.com> wrote:
>
>
>> -----Original Message-----
>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>> owner@vger.kernel.org] On Behalf Of Davidlohr Bueso
>> Sent: Tuesday, April 26, 2016 1:34 PM
>> To: Karel Zak <kzak@redhat.com>
>> Cc: Julius Werner <jwerner@chromium.org>; linux-efi@vger.kernel.org;
>> linux-kernel@vger.kernel.org; linux-block@vger.kernel.org; Gwendal
>> Grignou <gwendal@chromium.org>; Doug Anderson <dianders@chromium.org>
>> Subject: Re: [PATCH] block: partitions: efi: Always check for
>> alternative GPT at end of drive
>>
>> On Tue, 26 Apr 2016, Karel Zak wrote:
>>
>> >On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
>> >> The GUID Partiton Table layout maintains two synonymous partition
>> >> tables on a block device, one starting in sector 1 and one in the
>> >> very last sectors of the block device. This is useful if one of
>> >> the tables gets
>> >> accidentally corrupted (e.g. through a partial write because of an
>> >> unexpected power loss).
>> >>
>> >> Linux normally only boots if the primary GPT is valid. It will not
>> >> even try to find the alternative GPT to an invalid primary one
>> >> unless the "gpt" command line option forces more aggressive
>> >> detection. This doesn't
>> >> really make any sense... if the "gpt" option is not set, the code
>> >> validates the protective or hybrid MBR in sector 0 anyway before
>> >> it even starts looking for the actual GPTs. If we get to the point
>> >> where a valid proctective or hybrid MBR was found but the primary
>> >> GPT was not found (valid), checking the alternative GPT is our
>> >> best bet: we know that this
>>
>> 'best bet' in a kernel is not enough :) Which is why userland tools
>> can fix and/or do any sort of crazy stuff with the backup and recover
>> the primary etc etc.
>
> Drive blocks go bad; the redundant GPTs are there to let the
> system keep booting and running if that happens.
>
> Rewriting the bad GPTs is what should require user intervention.
>
>>
>> >> block device is meant to use GPT (because any other partitioning
>> system
>> >> would've presumably overwritten sector 0), and we know that if the
>> >> alternative GPT is valid it should contain more accurate
>> information
>> >> than parsing the protective/hybrid MBR with msdos_partition()
>> would
>> >> yield (which would otherwise be what happens next).
>>
>> >I guess "force_gpt" (and "gpt" on kernel command line) exists to
>> >force users to think and care about a reason why the device has
>> >unreadable (broken) primary GPT header.
>>
>> Yes, from find_valid_gpt():
>>
>>  * If the Primary GPT header is not valid, the Alternate GPT header
>>  * is not checked unless the 'gpt' kernel command line option is
>> passed.
>>  * This protects against devices which misreport their size, and
>> forces
>>  * the user to decide to use the Alternate GPT.
>>
>> ... so users are at least forced in some way to think about this.
>>
>> >It seems like bad (and dangerous) idea to silently ignore corrupted
>> >primary GTP header and boot from such device.
>>
>> Yeah, there's no way in hell I trust a backup gpt in kernel space.
>> We simply have no way of distinguishing between good and bad devices.
>>
>> >And note that alternative GPT header and the end of the device is a
>> >just guess. The proper location of the alternative header is
>> >specified with-in primary header (pgpt->alternate_lba). The header
>> >at the end of
>> >the device (as used for "force_gpt") is a fallback solution only.
>>
>> And this only illustrates the ambiguity of the backup.
>
> The UEFI specification is not ambiguous - you should always look
> for the backup GPT Header at the last LBA:
>
> "Two GPT Header structures are stored on the device: the primary
> and the backup. The primary GPT Header must be located in LBA 1
> (i.e., the second logical block), and the backup GPT Header must
> be located in the last LBA of the device."
>
> If the primary GPT Header is corrupted (e.g., CRC is bad), you
> cannot trust any fields in it, including the Alternate LBA field.
> The Alternate LBA field is there to help you tolerate failures
> while growing or shrinking the block device size (not important
> for individual physical drives, but an issue for logical drives
> presented by RAID controllers).
>

What the UEFI spec stipulates is not really relevant for the kernel.
So the firmware must use the backup GPT if the CRC of the primary one
indicates that it is corrupted, fine. Once we are in the kernel, the
policy is currently different, which makes sense since we are not only
mounting the boot device, but other block devices as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Austin S. Hemmelgarn April 27, 2016, 12:59 p.m. UTC | #11
On 2016-04-27 02:00, Ard Biesheuvel wrote:
> On 26 April 2016 at 22:34, Elliott, Robert (Persistent Memory)
> <elliott@hpe.com> wrote:
>>
>>
>>> -----Original Message-----
>>> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
>>> owner@vger.kernel.org] On Behalf Of Davidlohr Bueso
>>> Sent: Tuesday, April 26, 2016 1:34 PM
>>> To: Karel Zak <kzak@redhat.com>
>>> Cc: Julius Werner <jwerner@chromium.org>; linux-efi@vger.kernel.org;
>>> linux-kernel@vger.kernel.org; linux-block@vger.kernel.org; Gwendal
>>> Grignou <gwendal@chromium.org>; Doug Anderson <dianders@chromium.org>
>>> Subject: Re: [PATCH] block: partitions: efi: Always check for
>>> alternative GPT at end of drive
>>>
>>> On Tue, 26 Apr 2016, Karel Zak wrote:
>>>
>>>> On Mon, Apr 25, 2016 at 06:06:46PM -0700, Julius Werner wrote:
>>>>> The GUID Partiton Table layout maintains two synonymous partition
>>>>> tables on a block device, one starting in sector 1 and one in the
>>>>> very last sectors of the block device. This is useful if one of
>>>>> the tables gets
>>>>> accidentally corrupted (e.g. through a partial write because of an
>>>>> unexpected power loss).
>>>>>
>>>>> Linux normally only boots if the primary GPT is valid. It will not
>>>>> even try to find the alternative GPT to an invalid primary one
>>>>> unless the "gpt" command line option forces more aggressive
>>>>> detection. This doesn't
>>>>> really make any sense... if the "gpt" option is not set, the code
>>>>> validates the protective or hybrid MBR in sector 0 anyway before
>>>>> it even starts looking for the actual GPTs. If we get to the point
>>>>> where a valid proctective or hybrid MBR was found but the primary
>>>>> GPT was not found (valid), checking the alternative GPT is our
>>>>> best bet: we know that this
>>>
>>> 'best bet' in a kernel is not enough :) Which is why userland tools
>>> can fix and/or do any sort of crazy stuff with the backup and recover
>>> the primary etc etc.
>>
>> Drive blocks go bad; the redundant GPTs are there to let the
>> system keep booting and running if that happens.
>>
>> Rewriting the bad GPTs is what should require user intervention.
>>
>>>
>>>>> block device is meant to use GPT (because any other partitioning
>>> system
>>>>> would've presumably overwritten sector 0), and we know that if the
>>>>> alternative GPT is valid it should contain more accurate
>>> information
>>>>> than parsing the protective/hybrid MBR with msdos_partition()
>>> would
>>>>> yield (which would otherwise be what happens next).
>>>
>>>> I guess "force_gpt" (and "gpt" on kernel command line) exists to
>>>> force users to think and care about a reason why the device has
>>>> unreadable (broken) primary GPT header.
>>>
>>> Yes, from find_valid_gpt():
>>>
>>>   * If the Primary GPT header is not valid, the Alternate GPT header
>>>   * is not checked unless the 'gpt' kernel command line option is
>>> passed.
>>>   * This protects against devices which misreport their size, and
>>> forces
>>>   * the user to decide to use the Alternate GPT.
>>>
>>> ... so users are at least forced in some way to think about this.
>>>
>>>> It seems like bad (and dangerous) idea to silently ignore corrupted
>>>> primary GTP header and boot from such device.
>>>
>>> Yeah, there's no way in hell I trust a backup gpt in kernel space.
>>> We simply have no way of distinguishing between good and bad devices.
>>>
>>>> And note that alternative GPT header and the end of the device is a
>>>> just guess. The proper location of the alternative header is
>>>> specified with-in primary header (pgpt->alternate_lba). The header
>>>> at the end of
>>>> the device (as used for "force_gpt") is a fallback solution only.
>>>
>>> And this only illustrates the ambiguity of the backup.
>>
>> The UEFI specification is not ambiguous - you should always look
>> for the backup GPT Header at the last LBA:
>>
>> "Two GPT Header structures are stored on the device: the primary
>> and the backup. The primary GPT Header must be located in LBA 1
>> (i.e., the second logical block), and the backup GPT Header must
>> be located in the last LBA of the device."
>>
>> If the primary GPT Header is corrupted (e.g., CRC is bad), you
>> cannot trust any fields in it, including the Alternate LBA field.
>> The Alternate LBA field is there to help you tolerate failures
>> while growing or shrinking the block device size (not important
>> for individual physical drives, but an issue for logical drives
>> presented by RAID controllers).
>>
>
> What the UEFI spec stipulates is not really relevant for the kernel.
> So the firmware must use the backup GPT if the CRC of the primary one
> indicates that it is corrupted, fine. Once we are in the kernel, the
> policy is currently different, which makes sense since we are not only
> mounting the boot device, but other block devices as well.
>
No, it is relevant considering that it's the authoritative standard for 
the GPT format.  Sure, we have to deal with other block devices.  The 
fact is though, we currently refuse to do anything with a disk that has 
a corrupted primary GPT, but a valid secondary.  I agree that the user 
needs to be notified somehow that something is wrong, but refusing to 
work is not a user friendly behavior, and doesn't really give much 
specific information about what's wrong (keep in mind, most typical 
desktop users won't look at kernel logs, and a lot of people using 
embedded devices can't).

For what it's worth, Windows 7 and newer will properly read partitions 
on a disk with a corrupt primary GPT and a valid secondary, and I'd be 
willing to bet that OS X does so as well.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Karel Zak April 27, 2016, 3:09 p.m. UTC | #12
On Tue, Apr 26, 2016 at 02:51:01PM -0700, Gwendal Grignou wrote:
> Julius and I were looking at the code when we spotted the issue.
> 
> As Julius said, "just pass a boot param", is not easy on certain
> machines, like phone. It is not user friendly either.
> The system won't boot at all, a typical user will have to do a full
> reinstall to fix the error.

And how typical user will fix the problem with corrupted primary
header after successful boot? I mean, use alternative header (without
force_gpt) is a good idea if we know that user will not ignore the
problem. The current solution is to force user to do anything.

It would be nice to have support for this issue in userspace 
and switch for example to single user mode, or so...

I'm also have doubts that printk() is the best way how to report 
this issue to userspace if we want to trigger any action :-)

    Karel
Doug Anderson April 27, 2016, 3:45 p.m. UTC | #13
Hi,

On Wed, Apr 27, 2016 at 8:09 AM, Karel Zak <kzak@redhat.com> wrote:
> On Tue, Apr 26, 2016 at 02:51:01PM -0700, Gwendal Grignou wrote:
>> Julius and I were looking at the code when we spotted the issue.
>>
>> As Julius said, "just pass a boot param", is not easy on certain
>> machines, like phone. It is not user friendly either.
>> The system won't boot at all, a typical user will have to do a full
>> reinstall to fix the error.
>
> And how typical user will fix the problem with corrupted primary
> header after successful boot? I mean, use alternative header (without
> force_gpt) is a good idea if we know that user will not ignore the
> problem. The current solution is to force user to do anything.
>
> It would be nice to have support for this issue in userspace
> and switch for example to single user mode, or so...
>
> I'm also have doubts that printk() is the best way how to report
> this issue to userspace if we want to trigger any action :-)

Presumably:

* We ended up in this situation because something (auto update,
partition resizer, etc) was making changes to the GPT and got
interrupted (power cycle, crash, etc).

* The something that was making changes will run again after the
system boots up.

* The something that was making changes will presumably be smart
enough to figure out how to fix things up.

* If we can't even boot up, that something has no chance...


...or, if we ended up in this situation because a cosmic ray hit our
storage device and corrupted the primary GPT then I'd say that we
should keep using the alternate and not care that userspace won't do
anything to fix it.  Hopefully cosmic rays are super rare and the
changes of a second one hitting and killing the secondary GPT are slim
to none.  If you're using this disk in outer space and cosmic rays are
common, presumably you've got some massive ECC going on and maybe you
even have userspace that expects periodic sector failures and knows
how to handle it.


-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Julius Werner April 27, 2016, 9:44 p.m. UTC | #14
On Wed, Apr 27, 2016 at 8:09 AM, Karel Zak <kzak@redhat.com> wrote:
> On Tue, Apr 26, 2016 at 02:51:01PM -0700, Gwendal Grignou wrote:
>> Julius and I were looking at the code when we spotted the issue.
>>
>> As Julius said, "just pass a boot param", is not easy on certain
>> machines, like phone. It is not user friendly either.
>> The system won't boot at all, a typical user will have to do a full
>> reinstall to fix the error.
>
> And how typical user will fix the problem with corrupted primary
> header after successful boot? I mean, use alternative header (without
> force_gpt) is a good idea if we know that user will not ignore the
> problem. The current solution is to force user to do anything.
>
> It would be nice to have support for this issue in userspace
> and switch for example to single user mode, or so...
>
> I'm also have doubts that printk() is the best way how to report
> this issue to userspace if we want to trigger any action :-)

Holding the whole system hostage and forcing manual action is *not*
user-friendly behavior. Linux is no longer just something for hobbyist
hackers to install on their converted Windows machines at home... it
is a mature, modern operating system kernel used on a wide range of
devices (server farms, phones, embedded systems, etc.) and it should
behave like one. Not all of these platforms necessarily make it easy
for the user to drop into grub and add some command line parameters,
and it's the kernel's job to provide a suitable environment for all of
them so that policy decisions can be left to userspace.

So yes, userspace should resolve this problem, but in order to do that
you need to allow userspace to boot first! dmesg is one suitable way
to communicate the problem, and there are others which I wouldn't be
opposed to either, but no matter which channel we choose the kernel
still has to continue booting to allow the rest of the OS to deal with
it. Whether to ignore, silently repair or fail to boot from a
corrupted primary GPT is a policy decision and it should be made in
user space... if you need to retain the current behavior, it's easy to
add an init script that greps for GPT warnings and hangs to your
distro.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/block/partitions/efi.c b/block/partitions/efi.c
index 26cb624..0d4ca8e 100644
--- a/block/partitions/efi.c
+++ b/block/partitions/efi.c
@@ -625,7 +625,7 @@  static int find_valid_gpt(struct parsed_partitions *state, gpt_header **gpt,
 		good_agpt = is_gpt_valid(state,
 					 le64_to_cpu(pgpt->alternate_lba),
 					 &agpt, &aptes);
-        if (!good_agpt && force_gpt)
+        if (!good_agpt)
                 good_agpt = is_gpt_valid(state, lastlba, &agpt, &aptes);
 
         /* The obviously unsuccessful case */