efi: tpm: Avoid READ_ONCE() for accessing the event log

Message ID 20230109095948.2471205-1-ardb@kernel.org (mailing list archive)
State New, archived
Series efi: tpm: Avoid READ_ONCE() for accessing the event log

Commit Message

Ard Biesheuvel Jan. 9, 2023, 9:59 a.m. UTC
Nathan reports that recent kernels built with LTO will crash when doing
EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
misaligned load from the TPM event log, which is annotated with
READ_ONCE(), and under LTO, this gets translated into an LDAR instruction
which does not tolerate misaligned accesses.

Interestingly, this does not happen when booting the same kernel
straight from the UEFI shell, and so the fact that the event log may
appear misaligned in memory may be caused by a bug in GRUB or SHIM.

However, using READ_ONCE() to access firmware tables is slightly unusual
in any case, and here, we only need to ensure that 'event' is not
dereferenced again after it gets unmapped, so a compiler barrier should
be sufficient, and works around the reported issue.

Cc: <stable@vger.kernel.org>
Cc: Peter Jones <pjones@redhat.com>
Cc: Jarkko Sakkinen <jarkko@kernel.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Reported-by: Nathan Chancellor <nathan@kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1782
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 include/linux/tpm_eventlog.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
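
Editorial illustration (not part of the posted patch): the commit message relies on the fact that a plain load tolerates a misaligned address where an acquire load does not. A minimal userspace sketch of the same idea, copying fields out of a possibly misaligned record into locals, might look like the following. Every name here (demo_event_head, demo_read_u32, demo_snapshot) is hypothetical, and memcpy() is used as the portable userspace way to express an alignment-agnostic load; the kernel patch itself simply uses plain assignments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a packed firmware record header. */
struct demo_event_head {
	uint32_t pcr_idx;
	uint32_t event_type;
	uint32_t count;
} __attribute__((packed));

/*
 * Read a 32-bit value from an address that may not be 4-byte aligned.
 * memcpy() makes no alignment assumption, so the compiler has to emit
 * an access that is valid for any address.
 */
static uint32_t demo_read_u32(const void *p)
{
	uint32_t v;

	assert(p != NULL);
	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Save the two fields in caller-provided locals so that later code
 * never needs to touch the record again.
 */
static void demo_snapshot(const uint8_t *log, size_t offset,
			  uint32_t *event_type, uint32_t *count)
{
	const uint8_t *rec = log + offset;

	*event_type = demo_read_u32(rec + offsetof(struct demo_event_head, event_type));
	*count = demo_read_u32(rec + offsetof(struct demo_event_head, count));
}
```

Calling demo_snapshot() with an odd offset exercises the misaligned case; the values come back intact because no aligned access is ever assumed.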

Comments

Will Deacon Jan. 9, 2023, 3:10 p.m. UTC | #1
On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> Nathan reports that recent kernels built with LTO will crash when doing
> EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> misaligned load from the TPM event log, which is annotated with
> READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> which does not tolerate misaligned accesses.

Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned
address is pretty sketchy, but if this ends up tripping lots of folks up
then I suppose we could use a plain load and a DMB LD as an alternative.
It's likely to be more expensive in the LDAPR case, though.

> Interestingly, this does not happen when booting the same kernel
> straight from the UEFI shell, and so the fact that the event log may
> appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> 
> However, using READ_ONCE() to access firmware tables is slightly unusual
> in any case, and here, we only need to ensure that 'event' is not
> dereferenced again after it gets unmapped, so a compiler barrier should
> be sufficient, and works around the reported issue.
> 
> Cc: <stable@vger.kernel.org>
> Cc: Peter Jones <pjones@redhat.com>
> Cc: Jarkko Sakkinen <jarkko@kernel.org>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  include/linux/tpm_eventlog.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
> index 20c0ff54b7a0d313..0abcc85904cba874 100644
> --- a/include/linux/tpm_eventlog.h
> +++ b/include/linux/tpm_eventlog.h
> @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
>  	 * The loop below will unmap these fields if the log is larger than
>  	 * one page, so save them here for reference:
>  	 */
> -	count = READ_ONCE(event->count);
> -	event_type = READ_ONCE(event->event_type);
> +	count = event->count;
> +	event_type = event->event_type;
> +
> +	barrier();

It would be handy to have a comment here, but when I started thinking about
what that would say, it occurred to me that the unmap operation should
already have a barrier inside it due to the TLB invalidation, so I'm not
sure why this is needed at all.

Will

Ard Biesheuvel Jan. 9, 2023, 3:20 p.m. UTC | #2
On Mon, 9 Jan 2023 at 16:11, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> > Nathan reports that recent kernels built with LTO will crash when doing
> > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> > misaligned load from the TPM event log, which is annotated with
> > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> > which does not tolerate misaligned accesses.
>
> Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned
> address is pretty sketchy, but if this ends up tripping lots of folks up
> then I suppose we could use a plain load and a DMB LD as an alternative.
> It's likely to be more expensive in the LDAPR case, though.
>

Yeah, I am not suggesting that we change READ_ONCE(), but this case
was definitely not taken into account at the time.

> > Interestingly, this does not happen when booting the same kernel
> > straight from the UEFI shell, and so the fact that the event log may
> > appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> >
> > However, using READ_ONCE() to access firmware tables is slightly unusual
> > in any case, and here, we only need to ensure that 'event' is not
> > dereferenced again after it gets unmapped, so a compiler barrier should
> > be sufficient, and works around the reported issue.
> >
> > Cc: <stable@vger.kernel.org>
> > Cc: Peter Jones <pjones@redhat.com>
> > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> > Reported-by: Nathan Chancellor <nathan@kernel.org>
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  include/linux/tpm_eventlog.h | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
> > index 20c0ff54b7a0d313..0abcc85904cba874 100644
> > --- a/include/linux/tpm_eventlog.h
> > +++ b/include/linux/tpm_eventlog.h
> > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
> >        * The loop below will unmap these fields if the log is larger than
> >        * one page, so save them here for reference:
> >        */
> > -     count = READ_ONCE(event->count);
> > -     event_type = READ_ONCE(event->event_type);
> > +     count = event->count;
> > +     event_type = event->event_type;
> > +
> > +     barrier();
>
> It would be handy to have a comment here, but when I started thinking about
> what that would say, it occurred to me that the unmap operation should
> already have a barrier inside it due to the TLB invalidation, so I'm not
> sure why this is needed at all.
>

This is purely to prevent the compiler from accessing count or
event_type by reloading it from the event pointer, in case it runs out
of registers.

Perhaps this is unlikely to occur, given that the kernel uses
-fno-strict-aliasing, and so any store occurring after this
READ_ONCE() could potentially affect the result of accessing
event->count or event->event_type.

Will Deacon Jan. 9, 2023, 3:34 p.m. UTC | #3
On Mon, Jan 09, 2023 at 04:20:34PM +0100, Ard Biesheuvel wrote:
> On Mon, 9 Jan 2023 at 16:11, Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> > > Nathan reports that recent kernels built with LTO will crash when doing
> > > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> > > misaligned load from the TPM event log, which is annotated with
> > > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> > > which does not tolerate misaligned accesses.
> >
> > Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned
> > address is pretty sketchy, but if this ends up tripping lots of folks up
> > then I suppose we could use a plain load and a DMB LD as an alternative.
> > It's likely to be more expensive in the LDAPR case, though.
> >
> 
> Yeah, I am not suggesting that we change READ_ONCE(), but this case
> was definitely not taken into account at the time.

Indeed, and it looks like the architecture added SCTLR_ELx.nAA to toggle
this behaviour, although it was only added in 8.4 with FEAT_LSE2.

> > > Interestingly, this does not happen when booting the same kernel
> > > straight from the UEFI shell, and so the fact that the event log may
> > > appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> > >
> > > However, using READ_ONCE() to access firmware tables is slightly unusual
> > > in any case, and here, we only need to ensure that 'event' is not
> > > dereferenced again after it gets unmapped, so a compiler barrier should
> > > be sufficient, and works around the reported issue.
> > >
> > > Cc: <stable@vger.kernel.org>
> > > Cc: Peter Jones <pjones@redhat.com>
> > > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > > Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> > > Reported-by: Nathan Chancellor <nathan@kernel.org>
> > > Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  include/linux/tpm_eventlog.h | 6 ++++--
> > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
> > > index 20c0ff54b7a0d313..0abcc85904cba874 100644
> > > --- a/include/linux/tpm_eventlog.h
> > > +++ b/include/linux/tpm_eventlog.h
> > > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
> > >        * The loop below will unmap these fields if the log is larger than
> > >        * one page, so save them here for reference:
> > >        */
> > > -     count = READ_ONCE(event->count);
> > > -     event_type = READ_ONCE(event->event_type);
> > > +     count = event->count;
> > > +     event_type = event->event_type;
> > > +
> > > +     barrier();
> >
> > It would be handy to have a comment here, but when I started thinking about
> > what that would say, it occurred to me that the unmap operation should
> > already have a barrier inside it due to the TLB invalidation, so I'm not
> > sure why this is needed at all.
> >
> 
> This is purely to prevent the compiler from accessing count or
> event_type by reloading it from the event pointer, in case it runs out
> of registers.

But that reload would only be a problem if the event has been unmapped, no?
Given that the unmapping code has a barrier() and the unmapped page is not
explicitly referenced, then I don't see the issue.

Will

Ard Biesheuvel Jan. 9, 2023, 3:43 p.m. UTC | #4
On Mon, 9 Jan 2023 at 16:34, Will Deacon <will@kernel.org> wrote:
>
> On Mon, Jan 09, 2023 at 04:20:34PM +0100, Ard Biesheuvel wrote:
> > On Mon, 9 Jan 2023 at 16:11, Will Deacon <will@kernel.org> wrote:
> > >
> > > On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> > > > Nathan reports that recent kernels built with LTO will crash when doing
> > > > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> > > > misaligned load from the TPM event log, which is annotated with
> > > > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> > > > which does not tolerate misaligned accesses.
> > >
> > > Interesting, that's a funny change in behaviour. READ_ONCE() of an unaligned
> > > address is pretty sketchy, but if this ends up tripping lots of folks up
> > > then I suppose we could use a plain load and a DMB LD as an alternative.
> > > It's likely to be more expensive in the LDAPR case, though.
> > >
> >
> > Yeah, I am not suggesting that we change READ_ONCE(), but this case
> > was definitely not taken into account at the time.
>
> Indeed, and it looks like the architecture added SCTLR_ELx.nAA to toggle
> this behaviour, although it was only added in 8.4 with FEAT_LSE2.
>
> > > > Interestingly, this does not happen when booting the same kernel
> > > > straight from the UEFI shell, and so the fact that the event log may
> > > > appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> > > >
> > > > However, using READ_ONCE() to access firmware tables is slightly unusual
> > > > in any case, and here, we only need to ensure that 'event' is not
> > > > dereferenced again after it gets unmapped, so a compiler barrier should
> > > > be sufficient, and works around the reported issue.
> > > >
> > > > Cc: <stable@vger.kernel.org>
> > > > Cc: Peter Jones <pjones@redhat.com>
> > > > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > > > Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> > > > Reported-by: Nathan Chancellor <nathan@kernel.org>
> > > > Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> > > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > > ---
> > > >  include/linux/tpm_eventlog.h | 6 ++++--
> > > >  1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
> > > > index 20c0ff54b7a0d313..0abcc85904cba874 100644
> > > > --- a/include/linux/tpm_eventlog.h
> > > > +++ b/include/linux/tpm_eventlog.h
> > > > @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
> > > >        * The loop below will unmap these fields if the log is larger than
> > > >        * one page, so save them here for reference:
> > > >        */
> > > > -     count = READ_ONCE(event->count);
> > > > -     event_type = READ_ONCE(event->event_type);
> > > > +     count = event->count;
> > > > +     event_type = event->event_type;
> > > > +
> > > > +     barrier();
> > >
> > > It would be handy to have a comment here, but when I started thinking about
> > > what that would say, it occurred to me that the unmap operation should
> > > already have a barrier inside it due to the TLB invalidation, so I'm not
> > > sure why this is needed at all.
> > >
> >
> > This is purely to prevent the compiler from accessing count or
> > event_type by reloading it from the event pointer, in case it runs out
> > of registers.
>
> But that reload would only be a problem if the event has been unmapped, no?
> Given that the unmapping code has a barrier() and the unmapped page is not
> explicitly referenced, then I don't see the issue.
>

Fair point. Looking at the history, it was I who suggested the
READ_ONCE() here in addition to the changes to lift the explicit event
dereferences out of the loop, but it indeed seems unlikely that there
is any way the compiler could decide that it can dereference event
again to grab these quantities.

https://lore.kernel.org/all/20190826153028.32639-1-pjones@redhat.com/T/#u

So I'll drop the barrier() from this patch.

Thanks,
Ard.

Nathan Chancellor Jan. 9, 2023, 5:48 p.m. UTC | #5
On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> Nathan reports that recent kernels built with LTO will crash when doing
> EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> misaligned load from the TPM event log, which is annotated with
> READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> which does not tolerate misaligned accesses.
> 
> Interestingly, this does not happen when booting the same kernel
> straight from the UEFI shell, and so the fact that the event log may
> appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> 
> However, using READ_ONCE() to access firmware tables is slightly unusual
> in any case, and here, we only need to ensure that 'event' is not
> dereferenced again after it gets unmapped, so a compiler barrier should
> be sufficient, and works around the reported issue.
> 
> Cc: <stable@vger.kernel.org>
> Cc: Peter Jones <pjones@redhat.com>
> Cc: Jarkko Sakkinen <jarkko@kernel.org>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

Based on the thread, I tested this patch without barrier() and my
machine boots up just fine now with an LTO kernel. Thanks a lot for the
analysis and fix!

Tested-by: Nathan Chancellor <nathan@kernel.org>

> ---
>  include/linux/tpm_eventlog.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
> index 20c0ff54b7a0d313..0abcc85904cba874 100644
> --- a/include/linux/tpm_eventlog.h
> +++ b/include/linux/tpm_eventlog.h
> @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
>  	 * The loop below will unmap these fields if the log is larger than
>  	 * one page, so save them here for reference:
>  	 */
> -	count = READ_ONCE(event->count);
> -	event_type = READ_ONCE(event->event_type);
> +	count = event->count;
> +	event_type = event->event_type;
> +
> +	barrier();
>  
>  	/* Verify that it's the log header */
>  	if (event_header->pcr_idx != 0 ||
> -- 
> 2.39.0
>

Ard Biesheuvel Jan. 9, 2023, 5:50 p.m. UTC | #6
On Mon, 9 Jan 2023 at 18:48, Nathan Chancellor <nathan@kernel.org> wrote:
>
> On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> > Nathan reports that recent kernels built with LTO will crash when doing
> > EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> > misaligned load from the TPM event log, which is annotated with
> > READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> > which does not tolerate misaligned accesses.
> >
> > Interestingly, this does not happen when booting the same kernel
> > straight from the UEFI shell, and so the fact that the event log may
> > appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> >
> > However, using READ_ONCE() to access firmware tables is slightly unusual
> > in any case, and here, we only need to ensure that 'event' is not
> > dereferenced again after it gets unmapped, so a compiler barrier should
> > be sufficient, and works around the reported issue.
> >
> > Cc: <stable@vger.kernel.org>
> > Cc: Peter Jones <pjones@redhat.com>
> > Cc: Jarkko Sakkinen <jarkko@kernel.org>
> > Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> > Reported-by: Nathan Chancellor <nathan@kernel.org>
> > Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> Based on the thread, I tested this patch without barrier() and my
> machine boots up just fine now with an LTO kernel. Thanks a lot for the
> analysis and fix!
>
> Tested-by: Nathan Chancellor <nathan@kernel.org>
>

Thanks. I've queued this up as an EFI fix.

Jarkko Sakkinen Jan. 20, 2023, 11:22 p.m. UTC | #7
On Mon, Jan 09, 2023 at 10:59:48AM +0100, Ard Biesheuvel wrote:
> Nathan reports that recent kernels built with LTO will crash when doing
> EFI boot using Fedora's GRUB and SHIM. The culprit turns out to be a
> misaligned load from the TPM event log, which is annotated with
> READ_ONCE(), and under LTO, this gets translated into a LDAR instruction
> which does not tolerate misaligned accesses.
> 
> Interestingly, this does not happen when booting the same kernel
> straight from the UEFI shell, and so the fact that the event log may
> appear misaligned in memory may be caused by a bug in GRUB or SHIM.
> 
> However, using READ_ONCE() to access firmware tables is slightly unusual
> in any case, and here, we only need to ensure that 'event' is not
> dereferenced again after it gets unmapped, so a compiler barrier should
> be sufficient, and works around the reported issue.
> 
> Cc: <stable@vger.kernel.org>
> Cc: Peter Jones <pjones@redhat.com>
> Cc: Jarkko Sakkinen <jarkko@kernel.org>
> Cc: Matthew Garrett <mjg59@srcf.ucam.org>
> Reported-by: Nathan Chancellor <nathan@kernel.org>
> Link: https://github.com/ClangBuiltLinux/linux/issues/1782
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  include/linux/tpm_eventlog.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
> index 20c0ff54b7a0d313..0abcc85904cba874 100644
> --- a/include/linux/tpm_eventlog.h
> +++ b/include/linux/tpm_eventlog.h
> @@ -198,8 +198,10 @@ static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
>  	 * The loop below will unmap these fields if the log is larger than
>  	 * one page, so save them here for reference:
>  	 */
> -	count = READ_ONCE(event->count);
> -	event_type = READ_ONCE(event->event_type);
> +	count = event->count;
> +	event_type = event->event_type;
> +
> +	barrier();
>  
>  	/* Verify that it's the log header */
>  	if (event_header->pcr_idx != 0 ||
> -- 
> 2.39.0
> 

Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>

BR, Jarkko

Patch

diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h
index 20c0ff54b7a0d313..0abcc85904cba874 100644
--- a/include/linux/tpm_eventlog.h
+++ b/include/linux/tpm_eventlog.h
@@ -198,8 +198,10 @@  static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *ev
 	 * The loop below will unmap these fields if the log is larger than
 	 * one page, so save them here for reference:
 	 */
-	count = READ_ONCE(event->count);
-	event_type = READ_ONCE(event->event_type);
+	count = event->count;
+	event_type = event->event_type;
+
+	barrier();
 
 	/* Verify that it's the log header */
 	if (event_header->pcr_idx != 0 ||
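
Editorial illustration (not part of the patch): the hunk above follows a save-then-unmap pattern, which can be sketched in userspace C roughly as below. This is only a sketch under stated assumptions. demo_barrier() mirrors the usual GCC/Clang form of a compiler-only barrier (an empty asm statement with a "memory" clobber), and demo_map()/demo_unmap() are hypothetical stand-ins built on malloc()/free(), not kernel APIs. Note that the thread concluded the barrier is unnecessary here and that it would be dropped; it appears in the sketch only to show what the posted version does.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Compiler-only barrier (GCC/Clang extension): emits no instruction,
 * but the compiler may not cache memory values across it or move
 * memory accesses from one side to the other.
 */
#define demo_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical record layout used only for this sketch. */
struct demo_head {
	uint32_t pcr_idx;
	uint32_t event_type;
	uint32_t count;
};

/* Hypothetical stand-in for establishing a temporary mapping. */
static struct demo_head *demo_map(uint32_t event_type, uint32_t count)
{
	struct demo_head *h = malloc(sizeof(*h));

	if (!h)
		return NULL;
	h->pcr_idx = 0;
	h->event_type = event_type;
	h->count = count;
	return h;
}

/* Hypothetical stand-in for tearing the mapping down again. */
static void demo_unmap(struct demo_head *h)
{
	/* Best-effort poison; the compiler is free to elide this store. */
	memset(h, 0xff, sizeof(*h));
	free(h);
}

/*
 * Save both fields while the record is still mapped, then unmap it and
 * compute the result from the saved copies only. Returns 0 if the
 * allocation fails.
 */
static uint32_t demo_sum_after_unmap(uint32_t event_type, uint32_t count)
{
	struct demo_head *event = demo_map(event_type, count);
	uint32_t saved_count, saved_type;

	if (!event)
		return 0;

	saved_count = event->count;
	saved_type = event->event_type;

	demo_barrier();

	demo_unmap(event);

	return saved_count + saved_type;
}
```

After demo_unmap() returns, only the local copies are read, which is the property the in-code comment in the hunk ("save them here for reference") is describing.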