diff mbox

[RFC,2/4] cpu: advertise CPU features over udev in a generic way

Message ID 1383844657-17487-3-git-send-email-ard.biesheuvel@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Ard Biesheuvel Nov. 7, 2013, 5:17 p.m. UTC
This patch implements a generic modalias 'cpu:feature:...' which
enables CPU feature flag based module loading in a generic way.
All the arch needs to do is enable CONFIG_ARCH_HAS_CPU_AUTOPROBE
and export a u32 called 'cpu_features'. (What each bit actually
means is irrelevant on this level.)

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 drivers/base/cpu.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

Comments

Dave Martin Nov. 7, 2013, 7:33 p.m. UTC | #1
On Thu, Nov 07, 2013 at 06:17:35PM +0100, Ard Biesheuvel wrote:
> This patch implements a generic modalias 'cpu:feature:...' which
> enables CPU feature flag based module loading in a generic way.
> All the arch needs to do is enable CONFIG_ARCH_HAS_CPU_AUTOPROBE
> and export a u32 called 'cpu_features'. (What each bit actually
> means is irrelevant on this level.)

There seems to be an assumption here that a module is either a pure CPU
accelerator, or it is completely independent of CPU features.

I'm not sure that this is true.  A CPU feature isn't a "device".
Rather, it's a property of code (which might be a driver for something
different -- maybe we have a hardware crypto accelerator where key
scheduling must be done in software.  It's still a driver for the
crypto engine, but we might have different implementations of the key
scheduling, based on the CPU features avaiable).

It's also not obvious why we should blindly load all CPU-feature-
dependent helper modules on bgoot, regardless of whether the module(s)
that use them are being loaded.

Maybe the amount of CPU feature dependent code is small enough that
we don't really care about such subtleties, though.


It's also not clear how different optimised modules for the same
thing would get prioritised.  Suppose there we have v5E and NEON
optimised AES helper modules?  Both those CPU features are avaiable,
but which module should we load?

If all candidate modules get loaded, which one actually gets used?
Does the load order matter?

> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  drivers/base/cpu.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
> index 49c6f4b..a661d31 100644
> --- a/drivers/base/cpu.c
> +++ b/drivers/base/cpu.c
> @@ -276,6 +276,30 @@ static void cpu_device_release(struct device *dev)
>  }
>  
>  #ifdef CONFIG_ARCH_HAS_CPU_AUTOPROBE
> +ssize_t print_cpu_modalias(struct device *dev,
> +			   struct device_attribute *attr,
> +			   char *buf)
> +{
> +	extern u32 __weak cpu_features;

Why is this __weak?  Surely CONFIG_ARCH_HAS_CPU_AUTOPROBE=y makes no
sense if the arch code does not define either cpu_features or
arch_print_cpu_modalias()?  The build should be made to fail in that
case...

> +	ssize_t n;
> +	int i;
> +	u32 f;
> +
> +	/*
> +	 * With 32 features maximum (taking 3 bytes each to print), we don't
> +	 * need to worry about overrunning the PAGE_SIZE sized buffer.
> +	 */
> +	n = sprintf(buf, "cpu:feature:");
> +	for (f = cpu_features, i = 0; f; f >>= 1, i++)
> +		if (f & 1)
> +			n += sprintf(&buf[n], ",%02X", i);

Why can't this overflow buf?

modalias matching is pretty much based on string matching, so I wonder
whether we could use the human-readable feature strings instead.
Those are already a stable ABI.  Relying on numbers unnecessarily
encrypts the udev/modprobe config.

Otherwise, "%02X" seems to place an arbitrary limit of 256 features.
I'm not sure that padding these numbers to a particular width is
advantageous for the parser.

> +	buf[n++] = '\n';
> +	return n;
> +}
> +
> +ssize_t __attribute__((weak, alias("print_cpu_modalias")))
> +arch_print_cpu_modalias(struct device *, struct device_attribute *, char *);
> +

If an implementation of arch_print_cpu_modalias() is linked with this,
won't that result in the print_cpu_modalias() defined here just being
included as dead code?

i.e., we knowingly link into the kernel some code that the build-time
configuration tells us is dead.

Maybe I'm misunderstanding things here, but I think this weak-symbol
stuff is mainly useful when shipping a binary blob in which people can
override certain symbols at link time.

We build vmlinux in one go, so I'm not sure that's appropriate here (?)

>  static int cpu_uevent(struct device *dev, struct kobj_uevent_env *env)
>  {
>  	char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
> -- 
> 1.8.3.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Ard Biesheuvel Nov. 7, 2013, 8 p.m. UTC | #2
> On 7 nov. 2013, at 20:33, Dave Martin <Dave.Martin@arm.com> wrote:
> 
>> On Thu, Nov 07, 2013 at 06:17:35PM +0100, Ard Biesheuvel wrote:
>> This patch implements a generic modalias 'cpu:feature:...' which
>> enables CPU feature flag based module loading in a generic way.
>> All the arch needs to do is enable CONFIG_ARCH_HAS_CPU_AUTOPROBE
>> and export a u32 called 'cpu_features'. (What each bit actually
>> means is irrelevant on this level.)
> 
> There seems to be an assumption here that a module is either a pure CPU
> accelerator, or it is completely independent of CPU features.
> 
> I'm not sure that this is true.  A CPU feature isn't a "device".
> Rather, it's a property of code (which might be a driver for something
> different -- maybe we have a hardware crypto accelerator where key
> scheduling must be done in software.  It's still a driver for the
> crypto engine, but we might have different implementations of the key
> scheduling, based on the CPU features avaiable).
> 

The use case I am targeting is dedicated instructions for AES, CRC, SHA1, SHA2, etc, all of which -on arm64- can be independently enabled by the implementer.  A single distro image should run as closely to optimal as possible right out of the box, especially in these cases.

> It's also not obvious why we should blindly load all CPU-feature-
> dependent helper modules on bgoot, regardless of whether the module(s)
> that use them are being loaded.
> 
> Maybe the amount of CPU feature dependent code is small enough that
> we don't really care about such subtleties, though.
> 
> It's also not clear how different optimised modules for the same
> thing would get prioritised.  Suppose there we have v5E and NEON
> optimised AES helper modules?  Both those CPU features are avaiable,
> but which module should we load?
> 
> If all candidate modules get loaded, which one actually gets used?
> Does the load order matter?
> 

In the cryptoapi case, each algorithm also has a priority assigned to it, which should be sufficient to break these kinds of ties. Other code exists (xor, raid6) that does a quick boot time benchmark of all available options. In general, though, this is undefined, so I agree that associating random bits of NEON or FP code with the CPU feature bit may make little sense, especially on v8 (which has feature bits for mandatory extensions like FP and ASIMD)
Ard Biesheuvel Nov. 7, 2013, 8:55 p.m. UTC | #3
On 7 November 2013 20:33, Dave Martin <Dave.Martin@arm.com> wrote:
> On Thu, Nov 07, 2013 at 06:17:35PM +0100, Ard Biesheuvel wrote:
>> This patch implements a generic modalias 'cpu:feature:...' which
>> enables CPU feature flag based module loading in a generic way.
>> All the arch needs to do is enable CONFIG_ARCH_HAS_CPU_AUTOPROBE
>> and export a u32 called 'cpu_features'. (What each bit actually
>> means is irrelevant on this level.)

[ ... ]

> Why is this __weak?  Surely CONFIG_ARCH_HAS_CPU_AUTOPROBE=y makes no
> sense if the arch code does not define either cpu_features or
> arch_print_cpu_modalias()?  The build should be made to fail in that
> case...
>

If you define the latter but not the former, you will still get a link
error. But indeed, getting this stuff right is the main point of this
RFC series.
Just putting a '#define arch_print_cpu_modalias
arch_print_cpu_modalias' in the right place and testing for it here is
indeed a lot better.

>> +     ssize_t n;
>> +     int i;
>> +     u32 f;
>> +
>> +     /*
>> +      * With 32 features maximum (taking 3 bytes each to print), we don't
>> +      * need to worry about overrunning the PAGE_SIZE sized buffer.
>> +      */
>> +     n = sprintf(buf, "cpu:feature:");
>> +     for (f = cpu_features, i = 0; f; f >>= 1, i++)
>> +             if (f & 1)
>> +                     n += sprintf(&buf[n], ",%02X", i);
>
> Why can't this overflow buf?
>

Because there are only so many bits you can shift out of a u32 before
it becomes zero.

> modalias matching is pretty much based on string matching, so I wonder
> whether we could use the human-readable feature strings instead.
> Those are already a stable ABI.  Relying on numbers unnecessarily
> encrypts the udev/modprobe config.
>

Introducing symbolic names would add an arch specific dependency to
the host tool that is used to generate the .ko metadata from the
custom ELF sections. This being a port of a similar feature on x86,
the comments in scripts/mod/file2alias.c were helpful in figuring out
that that was deemed too complicated at the time. And we don't resolve
PCI or USB VIDs to vendor names either, as this is just module
metadata for machine matching.

> Otherwise, "%02X" seems to place an arbitrary limit of 256 features.
> I'm not sure that padding these numbers to a particular width is
> advantageous for the parser.
>

Using a u32 already brings it down to 32, but it is sufficient for now
and I think changing it later if necessary should be doable.

>> +     buf[n++] = '\n';
>> +     return n;
>> +}
>> +
>> +ssize_t __attribute__((weak, alias("print_cpu_modalias")))
>> +arch_print_cpu_modalias(struct device *, struct device_attribute *, char *);
>> +
>
> If an implementation of arch_print_cpu_modalias() is linked with this,
> won't that result in the print_cpu_modalias() defined here just being
> included as dead code?
>
> i.e., we knowingly link into the kernel some code that the build-time
> configuration tells us is dead.
>

Absolutely true. See the above.

> Maybe I'm misunderstanding things here, but I think this weak-symbol
> stuff is mainly useful when shipping a binary blob in which people can
> override certain symbols at link time.
>
> We build vmlinux in one go, so I'm not sure that's appropriate here (?)
>

I agree, and (as I pointed out in my cover letter) I was prepared for
people perhaps having problems with that.
But yes, it seems just #defining arch_print_cpu_modalias in the
appropriate place and #ifndef'ing on it seems much bettter.
diff mbox

Patch

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 49c6f4b..a661d31 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -276,6 +276,30 @@  static void cpu_device_release(struct device *dev)
 }
 
 #ifdef CONFIG_ARCH_HAS_CPU_AUTOPROBE
+ssize_t print_cpu_modalias(struct device *dev,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	extern u32 __weak cpu_features;
+	ssize_t n;
+	int i;
+	u32 f;
+
+	/*
+	 * With 32 features maximum (taking 3 bytes each to print), we don't
+	 * need to worry about overrunning the PAGE_SIZE sized buffer.
+	 */
+	n = sprintf(buf, "cpu:feature:");
+	for (f = cpu_features, i = 0; f; f >>= 1, i++)
+		if (f & 1)
+			n += sprintf(&buf[n], ",%02X", i);
+	buf[n++] = '\n';
+	return n;
+}
+
+ssize_t __attribute__((weak, alias("print_cpu_modalias")))
+arch_print_cpu_modalias(struct device *, struct device_attribute *, char *);
+
 static int cpu_uevent(struct device *dev, struct kobj_uevent_env *env)
 {
 	char *buf = kzalloc(PAGE_SIZE, GFP_KERNEL);