Message ID | 1607686144-2604-1-git-send-email-TonyWWang-oc@zhaoxin.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Herbert Xu |
Headers | show |
Series | crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs | expand |
On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, When > crc32c-intel and crc32c-generic are both registered, system will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. This case expect to use crc32c-generic driver for > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > CPUs support from crc32c-intel. > > Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> > --- > arch/x86/crypto/crc32c-intel_glue.c | 8 ++++++++ > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c > index feccb52..6dafdae 100644 > --- a/arch/x86/crypto/crc32c-intel_glue.c > +++ b/arch/x86/crypto/crc32c-intel_glue.c > @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); > > static int __init crc32c_intel_mod_init(void) > { > + struct cpuinfo_x86 *c = &boot_cpu_data; > + > if (!x86_match_cpu(crc32c_cpu_id)) > return -ENODEV; > + > + if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) { > + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) > + return -ENODEV; > + } Egads, why can't you use that x86_match_cpu() above, and also this really wants a comment on why you're excluding these chips. Also, since (IIRC) ZHAOXIN is basically AND, shouldn't they also be listed? That is; write it like: m = x86_match_cpu(crc32_cpu_id); if (!m || !m->data) return -ENODEV; That way you can have positive and negative matches in the array (obviously the existing FEATURE test would need data=1 and be last).
On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, When > crc32c-intel and crc32c-generic are both registered, system will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. This case expect to use crc32c-generic driver for > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > CPUs support from crc32c-intel. > > Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> Does this mean that the performance of the crc32c instruction on those CPUs is actually slower than a regular C implementation? That's very weird. - Eric
On 11/12/2020 21:00, Peter Zijlstra wrote: > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. This case expect to use crc32c-generic driver for >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin >> CPUs support from crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >> --- >> arch/x86/crypto/crc32c-intel_glue.c | 8 ++++++++ >> 1 file changed, 8 insertions(+) >> >> diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c >> index feccb52..6dafdae 100644 >> --- a/arch/x86/crypto/crc32c-intel_glue.c >> +++ b/arch/x86/crypto/crc32c-intel_glue.c >> @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); >> >> static int __init crc32c_intel_mod_init(void) >> { >> + struct cpuinfo_x86 *c = &boot_cpu_data; >> + >> if (!x86_match_cpu(crc32c_cpu_id)) >> return -ENODEV; >> + >> + if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) { >> + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) >> + return -ENODEV; >> + } > > Egads, why can't you use that x86_match_cpu() above, and also this > really wants a comment on why you're excluding these chips. When doing lmbench3 Create and Delete file test on partitions with ext4 enabling metadata checksum, found using crc32c-generic driver could get about 20% performance gain than using the driver crc32c-intel on these chips. Also, since > (IIRC) ZHAOXIN is basically AND, shouldn't they also be listed? > > That is; write it like: > > m = x86_match_cpu(crc32_cpu_id); > if (!m || !m->data) > return -ENODEV; > > That way you can have positive and negative matches in the array > (obviously the existing FEATURE test would need data=1 and be last). > . > Lot thanks for you suggestion, will list these chips in crc32c_cpu_id and use x86_match_cpu: static const struct x86_cpu_id crc32c_cpu_id[] = { + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1), X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), {} }; @@ -228,8 +234,10 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { - if (!x86_match_cpu(crc32c_cpu_id)) + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id); + if (!m || m->driver_data) return -ENODEV; sincerely TonyWWangoc
On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@kernel.org> wrote: > > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > > On platforms with Zhaoxin CPUs supporting this X86 feature, When > > crc32c-intel and crc32c-generic are both registered, system will > > use crc32c-intel because its .cra_priority is greater than > > crc32c-generic. This case expect to use crc32c-generic driver for > > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > > CPUs support from crc32c-intel. > > > > Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> > > Does this mean that the performance of the crc32c instruction on those CPUs is > actually slower than a regular C implementation? That's very weird. > This driver does not use CRC instructions, but carryless multiplication and aggregation. So I suppose the pclmulqdq instruction triggers some pathological performance limitation here. That means the crct10dif driver probably needs the same treatment.
On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel <ardb@kernel.org> wrote: > > On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@kernel.org> wrote: > > > > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > > > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > > > On platforms with Zhaoxin CPUs supporting this X86 feature, When > > > crc32c-intel and crc32c-generic are both registered, system will > > > use crc32c-intel because its .cra_priority is greater than > > > crc32c-generic. This case expect to use crc32c-generic driver for > > > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > > > CPUs support from crc32c-intel. > > > > > > Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> > > > > Does this mean that the performance of the crc32c instruction on those CPUs is > > actually slower than a regular C implementation? That's very weird. > > > > This driver does not use CRC instructions, but carryless > multiplication and aggregation. So I suppose the pclmulqdq instruction > triggers some pathological performance limitation here. > Just noticed it uses both crc instructions and pclmulqdq instructions. Sorry for the noise. > That means the crct10dif driver probably needs the same treatment. Tony, can you confirm that the problem is in the CRC instructions and not in the PCLMULQDQ code path that supersedes it when available?
On 12/12/2020 01:43, Eric Biggers wrote: > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. This case expect to use crc32c-generic driver for >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin >> CPUs support from crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> > > Does this mean that the performance of the crc32c instruction on those CPUs is > actually slower than a regular C implementation? That's very weird. > From the lmbench3 Create and Delete file test on those chips, I think yes. sincerely Tony
On 12/12/2020 18:54, Ard Biesheuvel wrote: > On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel <ardb@kernel.org> wrote: >> >> On Fri, 11 Dec 2020 at 20:07, Eric Biggers <ebiggers@kernel.org> wrote: >>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When >>>> crc32c-intel and crc32c-generic are both registered, system will >>>> use crc32c-intel because its .cra_priority is greater than >>>> crc32c-generic. This case expect to use crc32c-generic driver for >>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin >>>> CPUs support from crc32c-intel. >>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >>> >>> Does this mean that the performance of the crc32c instruction on those CPUs is >>> actually slower than a regular C implementation? That's very weird. >>> >> >> This driver does not use CRC instructions, but carryless >> multiplication and aggregation. So I suppose the pclmulqdq instruction >> triggers some pathological performance limitation here. >> > > Just noticed it uses both crc instructions and pclmulqdq instructions. > Sorry for the noise. > >> That means the crct10dif driver probably needs the same treatment. > > Tony, can you confirm that the problem is in the CRC instructions and > not in the PCLMULQDQ code path that supersedes it when available? CRC instructions. sincerely Tony
On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: > On 12/12/2020 01:43, Eric Biggers wrote: > > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > >> On platforms with Zhaoxin CPUs supporting this X86 feature, When > >> crc32c-intel and crc32c-generic are both registered, system will > >> use crc32c-intel because its .cra_priority is greater than > >> crc32c-generic. This case expect to use crc32c-generic driver for > >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > >> CPUs support from crc32c-intel. > >> > >> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> > > > > Does this mean that the performance of the crc32c instruction on those CPUs is > > actually slower than a regular C implementation? That's very weird. > > > > From the lmbench3 Create and Delete file test on those chips, I think yes. > Did you try measuring the performance of the hashing itself, and not some higher-level filesystem operations? - Eric
On 15/12/2020 04:41, Eric Biggers wrote: > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >> On 12/12/2020 01:43, Eric Biggers wrote: >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When >>>> crc32c-intel and crc32c-generic are both registered, system will >>>> use crc32c-intel because its .cra_priority is greater than >>>> crc32c-generic. This case expect to use crc32c-generic driver for >>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin >>>> CPUs support from crc32c-intel. >>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >>> >>> Does this mean that the performance of the crc32c instruction on those CPUs is >>> actually slower than a regular C implementation? That's very weird. >>> >> >> From the lmbench3 Create and Delete file test on those chips, I think yes. >> > > Did you try measuring the performance of the hashing itself, and not some > higher-level filesystem operations? > Yes. Was testing on these Zhaoxin CPUs, the result is that with the same input value the generic C implementation takes fewer time than the crc32c instruction implementation. Sincerely Tony
On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: > > On 15/12/2020 04:41, Eric Biggers wrote: > > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: > >> On 12/12/2020 01:43, Eric Biggers wrote: > >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > >>>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When > >>>> crc32c-intel and crc32c-generic are both registered, system will > >>>> use crc32c-intel because its .cra_priority is greater than > >>>> crc32c-generic. This case expect to use crc32c-generic driver for > >>>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > >>>> CPUs support from crc32c-intel. > >>>> > >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> > >>> > >>> Does this mean that the performance of the crc32c instruction on those CPUs is > >>> actually slower than a regular C implementation? That's very weird. > >>> > >> > >> From the lmbench3 Create and Delete file test on those chips, I think yes. > >> > > > > Did you try measuring the performance of the hashing itself, and not some > > higher-level filesystem operations? > > > > Yes. Was testing on these Zhaoxin CPUs, the result is that with the same > input value the generic C implementation takes fewer time than the > crc32c instruction implementation. > And that is really "working as intended"? Why do these CPUs even declare that they support the crc32c instruction, when it is so slow? Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, etc.) that these CPUs similarly declare support for but they are uselessly slow? - Eric
On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers <ebiggers@kernel.org> wrote: >On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >> >> On 15/12/2020 04:41, Eric Biggers wrote: >> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >> >> On 12/12/2020 01:43, Eric Biggers wrote: >> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >> >>>> The driver crc32c-intel match CPUs supporting >X86_FEATURE_XMM4_2. >> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> >>>> crc32c-intel and crc32c-generic are both registered, system will >> >>>> use crc32c-intel because its .cra_priority is greater than >> >>>> crc32c-generic. This case expect to use crc32c-generic driver >for >> >>>> some Zhaoxin CPUs to get performance gain, So remove these >Zhaoxin >> >>>> CPUs support from crc32c-intel. >> >>>> >> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >> >>> >> >>> Does this mean that the performance of the crc32c instruction on >those CPUs is >> >>> actually slower than a regular C implementation? That's very >weird. >> >>> >> >> >> >> From the lmbench3 Create and Delete file test on those chips, I >think yes. >> >> >> > >> > Did you try measuring the performance of the hashing itself, and >not some >> > higher-level filesystem operations? >> > >> >> Yes. Was testing on these Zhaoxin CPUs, the result is that with the >same >> input value the generic C implementation takes fewer time than the >> crc32c instruction implementation. >> > >And that is really "working as intended"? These CPU's crc32c instruction is not working as intended. Why do these CPUs even >declare that >they support the crc32c instruction, when it is so slow? > The presence of crc32c and some other instructions supports are enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok except the crc32c instruction. >Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >etc.) that >these CPUs similarly declare support for but they are uselessly slow? No. Sincerely Tonyw > >- Eric
On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote: >On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers ><ebiggers@kernel.org> wrote: >>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >>> >>> On 15/12/2020 04:41, Eric Biggers wrote: >>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >>> >> On 12/12/2020 01:43, Eric Biggers wrote: >>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >>> >>>> The driver crc32c-intel match CPUs supporting >>X86_FEATURE_XMM4_2. >>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, >When >>> >>>> crc32c-intel and crc32c-generic are both registered, system >will >>> >>>> use crc32c-intel because its .cra_priority is greater than >>> >>>> crc32c-generic. This case expect to use crc32c-generic driver >>for >>> >>>> some Zhaoxin CPUs to get performance gain, So remove these >>Zhaoxin >>> >>>> CPUs support from crc32c-intel. >>> >>>> >>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >>> >>> >>> >>> Does this mean that the performance of the crc32c instruction on >>those CPUs is >>> >>> actually slower than a regular C implementation? That's very >>weird. >>> >>> >>> >> >>> >> From the lmbench3 Create and Delete file test on those chips, I >>think yes. >>> >> >>> > >>> > Did you try measuring the performance of the hashing itself, and >>not some >>> > higher-level filesystem operations? >>> > >>> >>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the >>same >>> input value the generic C implementation takes fewer time than the >>> crc32c instruction implementation. >>> >> >>And that is really "working as intended"? > >These CPU's crc32c instruction is not working as intended. > > Why do these CPUs even >>declare that >>they support the crc32c instruction, when it is so slow? >> > >The presence of crc32c and some other instructions supports are >enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >except the crc32c instruction. > >>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >>etc.) that >>these CPUs similarly declare support for but they are uselessly slow? > >No. > >Sincerely >Tonyw > >> >>- Eric Then the right thing to do is to disable the CPUID bit in the vendor-specific startup code.
On December 22, 2020 3:27:33 AM GMT+08:00, hpa@zytor.com wrote: >On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote: >>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers >><ebiggers@kernel.org> wrote: >>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >>>> >>>> On 15/12/2020 04:41, Eric Biggers wrote: >>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >>>> >> On 12/12/2020 01:43, Eric Biggers wrote: >>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >>>> >>>> The driver crc32c-intel match CPUs supporting >>>X86_FEATURE_XMM4_2. >>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, >>When >>>> >>>> crc32c-intel and crc32c-generic are both registered, system >>will >>>> >>>> use crc32c-intel because its .cra_priority is greater than >>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver >>>for >>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these >>>Zhaoxin >>>> >>>> CPUs support from crc32c-intel. >>>> >>>> >>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >>>> >>> >>>> >>> Does this mean that the performance of the crc32c instruction >on >>>those CPUs is >>>> >>> actually slower than a regular C implementation? That's very >>>weird. >>>> >>> >>>> >> >>>> >> From the lmbench3 Create and Delete file test on those chips, I >>>think yes. >>>> >> >>>> > >>>> > Did you try measuring the performance of the hashing itself, and >>>not some >>>> > higher-level filesystem operations? >>>> > >>>> >>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the >>>same >>>> input value the generic C implementation takes fewer time than the >>>> crc32c instruction implementation. >>>> >>> >>>And that is really "working as intended"? >> >>These CPU's crc32c instruction is not working as intended. >> >> Why do these CPUs even >>>declare that >>>they support the crc32c instruction, when it is so slow? >>> >> >>The presence of crc32c and some other instructions supports are >>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >>except the crc32c instruction. >> >>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >>>etc.) that >>>these CPUs similarly declare support for but they are uselessly slow? >> >>No. >> >>Sincerely >>Tonyw >> >>> >>>- Eric > >Then the right thing to do is to disable the CPUID bit in the >vendor-specific startup code. This way makes these CPUs do not support all instruction sets enumerated by CPUID.01:ECX[SSE4.2]. While only crc32c instruction is slow, just expect the crc32c-intel driver do not match these CPUs. Sincerely Tonyw
On December 21, 2020 7:01:39 PM PST, tonywwang-oc@zhaoxin.com wrote: >On December 22, 2020 3:27:33 AM GMT+08:00, hpa@zytor.com wrote: >>On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote: >>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers >>><ebiggers@kernel.org> wrote: >>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >>>>> >>>>> On 15/12/2020 04:41, Eric Biggers wrote: >>>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >>>>> >> On 12/12/2020 01:43, Eric Biggers wrote: >>>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc >wrote: >>>>> >>>> The driver crc32c-intel match CPUs supporting >>>>X86_FEATURE_XMM4_2. >>>>> >>>> On platforms with Zhaoxin CPUs supporting this X86 feature, >>>When >>>>> >>>> crc32c-intel and crc32c-generic are both registered, system >>>will >>>>> >>>> use crc32c-intel because its .cra_priority is greater than >>>>> >>>> crc32c-generic. This case expect to use crc32c-generic driver >>>>for >>>>> >>>> some Zhaoxin CPUs to get performance gain, So remove these >>>>Zhaoxin >>>>> >>>> CPUs support from crc32c-intel. >>>>> >>>> >>>>> >>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >>>>> >>> >>>>> >>> Does this mean that the performance of the crc32c instruction >>on >>>>those CPUs is >>>>> >>> actually slower than a regular C implementation? That's very >>>>weird. >>>>> >>> >>>>> >> >>>>> >> From the lmbench3 Create and Delete file test on those chips, I >>>>think yes. >>>>> >> >>>>> > >>>>> > Did you try measuring the performance of the hashing itself, and >>>>not some >>>>> > higher-level filesystem operations? >>>>> > >>>>> >>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with >the >>>>same >>>>> input value the generic C implementation takes fewer time than the >>>>> crc32c instruction implementation. >>>>> >>>> >>>>And that is really "working as intended"? >>> >>>These CPU's crc32c instruction is not working as intended. >>> >>> Why do these CPUs even >>>>declare that >>>>they support the crc32c instruction, when it is so slow? >>>> >>> >>>The presence of crc32c and some other instructions supports are >>>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >>>except the crc32c instruction. >>> >>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, >AVX, >>>>etc.) that >>>>these CPUs similarly declare support for but they are uselessly >slow? >>> >>>No. >>> >>>Sincerely >>>Tonyw >>> >>>> >>>>- Eric >> >>Then the right thing to do is to disable the CPUID bit in the >>vendor-specific startup code. > >This way makes these CPUs do not support all instruction sets >enumerated >by CPUID.01:ECX[SSE4.2]. >While only crc32c instruction is slow, just expect the crc32c-intel >driver do not >match these CPUs. > >Sincerely >Tonyw Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over.
On 22/12/2020 12:54, hpa@zytor.com wrote: > On December 21, 2020 7:01:39 PM PST, tonywwang-oc@zhaoxin.com wrote: >> On December 22, 2020 3:27:33 AM GMT+08:00, hpa@zytor.com wrote: >>> On December 20, 2020 6:46:25 PM PST, tonywwang-oc@zhaoxin.com wrote: >>>> On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers >>>> <ebiggers@kernel.org> wrote: >>>>> On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >>>>>> >>>>>> On 15/12/2020 04:41, Eric Biggers wrote: >>>>>>> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >>>>>>>> On 12/12/2020 01:43, Eric Biggers wrote: >>>>>>>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc >> wrote: >>>>>>>>>> The driver crc32c-intel match CPUs supporting >>>>> X86_FEATURE_XMM4_2. >>>>>>>>>> On platforms with Zhaoxin CPUs supporting this X86 feature, >>>> When >>>>>>>>>> crc32c-intel and crc32c-generic are both registered, system >>>> will >>>>>>>>>> use crc32c-intel because its .cra_priority is greater than >>>>>>>>>> crc32c-generic. This case expect to use crc32c-generic driver >>>>> for >>>>>>>>>> some Zhaoxin CPUs to get performance gain, So remove these >>>>> Zhaoxin >>>>>>>>>> CPUs support from crc32c-intel. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> >>>>>>>>> >>>>>>>>> Does this mean that the performance of the crc32c instruction >>> on >>>>> those CPUs is >>>>>>>>> actually slower than a regular C implementation? That's very >>>>> weird. >>>>>>>>> >>>>>>>> >>>>>>>> From the lmbench3 Create and Delete file test on those chips, I >>>>> think yes. >>>>>>>> >>>>>>> >>>>>>> Did you try measuring the performance of the hashing itself, and >>>>> not some >>>>>>> higher-level filesystem operations? >>>>>>> >>>>>> >>>>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with >> the >>>>> same >>>>>> input value the generic C implementation takes fewer time than the >>>>>> crc32c instruction implementation. >>>>>> >>>>> >>>>> And that is really "working as intended"? >>>> >>>> These CPU's crc32c instruction is not working as intended. >>>> >>>> Why do these CPUs even >>>>> declare that >>>>> they support the crc32c instruction, when it is so slow? >>>>> >>>> >>>> The presence of crc32c and some other instructions supports are >>>> enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >>>> except the crc32c instruction. >>>> >>>>> Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, >> AVX, >>>>> etc.) that >>>>> these CPUs similarly declare support for but they are uselessly >> slow? >>>> >>>> No. >>>> >>>> Sincerely >>>> Tonyw >>>> >>>>> >>>>> - Eric >>> >>> Then the right thing to do is to disable the CPUID bit in the >>> vendor-specific startup code. >> >> This way makes these CPUs do not support all instruction sets >> enumerated >> by CPUID.01:ECX[SSE4.2]. >> While only crc32c instruction is slow, just expect the crc32c-intel >> driver do not >> match these CPUs. >> >> Sincerely >> Tonyw > > Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over. > Thanks for your suggestion. Have send new patch set. Sincerely Tonyw
diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c index feccb52..6dafdae 100644 --- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { + struct cpuinfo_x86 *c = &boot_cpu_data; + if (!x86_match_cpu(crc32c_cpu_id)) return -ENODEV; + + if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) { + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) + return -ENODEV; + } + #ifdef CONFIG_X86_64 if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { alg.update = crc32c_pcl_intel_update;
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, When crc32c-intel and crc32c-generic are both registered, system will use crc32c-intel because its .cra_priority is greater than crc32c-generic. This case expect to use crc32c-generic driver for some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> --- arch/x86/crypto/crc32c-intel_glue.c | 8 ++++++++ 1 file changed, 8 insertions(+)