[0/3] crypto: x86/chacha20 - AVX-512VL block functions

Message ID 20181120163050.22251-1-martin@strongswan.org (mailing list archive)

Message

Martin Willi Nov. 20, 2018, 4:30 p.m. UTC
In the quest to push the limits of chacha20 encryption for both IPsec
and WireGuard, this small series adds AVX-512VL block functions. The VL
variant works on 256-bit ymm registers but, unlike AVX2, can benefit
from the new AVX-512 instructions.
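
For context, a minimal sketch of the ChaCha quarter round as specified in
RFC 7539, written in plain Python rather than assembly. It only shows the
scalar operations (32-bit add, XOR, rotate) that the block functions
vectorize. Rotates are one place where an instruction set can matter: AVX2
has no packed rotate, while AVX-512 adds one (vprold). Which instructions
the patches actually rely on is not stated in this cover letter, so treat
that link as an assumption. The names rotl32 and quarter_round are
illustrative.

```python
MASK32 = 0xFFFFFFFF  # ChaCha operates on 32-bit words


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def quarter_round(a: int, b: int, c: int, d: int):
    """One ChaCha quarter round on four 32-bit words (RFC 7539, 2.1)."""
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 16)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 12)
    a = (a + b) & MASK32
    d = rotl32(d ^ a, 8)
    c = (c + d) & MASK32
    b = rotl32(b ^ c, 7)
    return a, b, c, d


# Test vector from RFC 7539, section 2.1.1
result = quarter_round(0x11111111, 0x01020304, 0x9B8D6F43, 0x01234567)
print([hex(word) for word in result])
```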

Compared to the AVX2 version, these block functions bring an overall
speed improvement of ~20% across encryption lengths. Below are the tcrypt
results for additional block sizes in kOps/s: the current AVX2 code
path, the new AVX-512VL code path, and Zinc's AVX2 and AVX-512VL code
paths for comparison. All numbers are from a Xeon Platinum 8168 (2.7GHz).

These numbers result in a very nice chart, available at:
  https://download.strongswan.org/misc/chacha-avx-512vl.svg

                     zinc   zinc
 len   avx2  512vl   avx2  512vl
   8   5719   5672   5468   5612
  16   5675   5627   5355   5621
  24   5687   5601   5322   5633
  32   5667   5622   5244   5564
  40   5603   5582   5337   5578
  48   5638   5539   5400   5556
  56   5624   5566   5375   5482
  64   5590   5573   5352   5531
  72   4841   5467   3365   3457
  80   5316   5761   3310   3381
  88   4798   5470   3239   3343
  96   5324   5723   3197   3281
 104   4819   5460   3155   3232
 112   5266   5749   3020   3195
 120   4776   5391   2959   3145
 128   5291   5723   3398   3489
 136   4122   4837   3321   3423
 144   4507   5057   3247   3389
 152   4139   4815   3233   3329
 160   4482   5043   3159   3256
 168   4142   4766   3131   3224
 176   4506   5028   3073   3162
 184   4119   4772   3010   3109
 192   4499   5016   3402   3502
 200   4127   4766   3329   3448
 208   4452   5012   3276   3371
 216   4128   4744   3243   3334
 224   4484   5008   3203   3298
 232   4103   4772   3141   3237
 240   4458   4963   3115   3217
 248   4121   4751   3085   3177
 256   4461   4987   3364   4046
 264   3406   4282   3270   4006
 272   3408   4287   3207   3961
 280   3371   4271   3203   3825
 288   3625   4301   3129   3751
 296   3402   4283   3093   3688
 304   3401   4247   3062   3637
 312   3382   4282   2995   3614
 320   3611   4279   3305   4070
 328   3386   4260   3276   3968
 336   3369   4288   3171   3929
 344   3389   4289   3134   3847
 352   3609   4266   3127   3720
 360   3355   4252   3076   3692
 368   3387   4264   3048   3650
 376   3387   4238   2967   3553
 384   3568   4265   3277   4035
 392   3369   4262   3299   3973
 400   3362   4235   3239   3899
 408   3352   4269   3196   3843
 416   3585   4243   3127   3736
 424   3364   4216   3092   3672
 432   3341   4246   3067   3628
 440   3353   4235   3018   3593
 448   3538   4245   3327   4035
 456   3322   4244   3275   3900
 464   3340   4237   3212   3880
 472   3330   4242   3054   3802
 480   3530   4234   3078   3707
 488   3337   4228   3094   3664
 496   3330   4223   3015   3591
 504   3317   4214   3002   3517
 512   3531   4197   3339   4016
 520   2511   3101   2030   2682
 528   2627   3087   2027   2641
 536   2508   3102   2001   2601
 544   2638   3090   1964   2564
 552   2494   3077   1962   2516
 560   2625   3064   1941   2515
 568   2500   3086   1922   2493
 576   2611   3074   2050   2689
 584   2482   3062   2041   2680
 592   2595   3074   2026   2644
 600   2470   3060   1985   2595
 608   2581   3039   1961   2555
 616   2478   3062   1956   2521
 624   2587   3066   1930   2493
 632   2457   3053   1923   2486
 640   2581   3050   2059   2712
 648   2296   2839   2024   2655
 656   2389   2845   2019   2642
 664   2292   2842   2002   2610
 672   2404   2838   1959   2537
 680   2273   2827   1956   2527
 688   2389   2840   1938   2510
 696   2280   2837   1911   2463
 704   2370   2819   2055   2702
 712   2277   2834   2029   2663
 720   2369   2829   2020   2625
 728   2255   2820   2001   2600
 736   2373   2819   1958   2543
 744   2269   2827   1956   2524
 752   2364   2817   1937   2492
 760   2270   2805   1909   2483
 768   2378   2820   2050   2696
 776   2053   2700   2002   2643
 784   2066   2693   1922   2640
 792   2065   2703   1928   2602
 800   2138   2706   1962   2535
 808   2065   2679   1938   2528
 816   2063   2699   1929   2500
 824   2053   2676   1915   2468
 832   2149   2692   2036   2693
 840   2055   2689   2024   2659
 848   2049   2689   2006   2610
 856   2057   2702   1979   2585
 864   2144   2703   1960   2547
 872   2047   2685   1945   2501
 880   2055   2683   1902   2497
 888   2060   2689   1897   2478
 896   2139   2693   2023   2663
 904   2049   2686   1970   2644
 912   2055   2688   1925   2621
 920   2047   2685   1911   2572
 928   2114   2695   1907   2545
 936   2055   2681   1927   2492
 944   2055   2693   1930   2478
 952   2042   2688   1909   2471
 960   2136   2682   2014   2672
 968   2054   2687   1999   2626
 976   2040   2682   1982   2598
 984   2055   2687   1943   2569
 992   2138   2694   1884   2522
1000   2036   2681   1929   2506
1008   2052   2676   1926   2475
1016   2050   2686   1889   2430
1024   2125   2670   2039   2656
1032   1717   2175   1470   1995
1040   1768   2186   1456   1983
1048   1704   2185   1451   1950
1056   1770   2176   1410   1927
1064   1710   2178   1418   1918
1072   1753   2168   1394   1892
1080   1696   2170   1400   1892
1088   1761   2174   1472   2014
1096   1681   2158   1464   1968
1104   1746   2172   1457   1978
1112   1689   2167   1445   1955
1120   1738   2160   1431   1919
1128   1689   2155   1428   1915
1136   1747   2169   1415   1899
1144   1678   2161   1403   1881
1152   1749   2159   1474   2007
1160   1601   2050   1470   1991
1168   1648   2057   1461   1969
1176   1605   2043   1439   1948
1184   1654   2057   1428   1926
1192   1595   2051   1427   1899
1200   1647   2036   1419   1902
1208   1598   2048   1402   1888
1216   1643   2053   1471   1991
1224   1595   2043   1469   1987
1232   1649   2048   1456   1971
1240   1599   2040   1436   1939
1248   1644   2042   1433   1918
1256   1602   2045   1424   1900
1264   1648   2048   1413   1878
1272   1591   2034   1401   1878
1280   1649   2044   1475   2002
1288   1493   1984   1461   1972
1296   1484   1971   1438   1962
1304   1490   1985   1443   1947
1312   1535   1987   1425   1913
1320   1481   1965   1410   1901
1328   1493   1984   1407   1900
1336   1493   1979   1396   1882
1344   1526   1980   1465   1988
1352   1492   1970   1463   1983
1360   1487   1974   1452   1966
1368   1481   1977   1439   1937
1376   1535   1970   1428   1915
1384   1489   1973   1417   1905
1392   1483   1974   1415   1881
1400   1485   1963   1403   1882
1408   1523   1976   1466   1988
1416   1477   1969   1459   1964
1424   1487   1975   1455   1966
1432   1488   1972   1438   1941
1440   1518   1958   1432   1908
1448   1484   1972   1421   1905
1456   1485   1973   1398   1888
1464   1476   1962   1399   1870
1472   1530   1975   1471   1998
1480   1478   1967   1452   1979
1488   1478   1963   1453   1947
1496   1477   1963   1438   1930
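
As a rough cross-check of the ~20% figure, the sketch below computes the
relative gain of the 512vl column over the avx2 column for a few rows
copied from the table above. The choice of rows and the helper name
speedup_percent are illustrative assumptions, not part of the series.

```python
# (len, avx2, 512vl) in kOps/s, copied from the table above
SAMPLES = [
    (64, 5590, 5573),
    (256, 4461, 4987),
    (512, 3531, 4197),
    (1024, 2125, 2670),
    (1496, 1477, 1963),
]


def speedup_percent(old: int, new: int) -> float:
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for length, avx2, avx512vl in SAMPLES:
    print(f"{length:5d}  {speedup_percent(avx2, avx512vl):+6.1f}%")
```

On these rows the gain is not uniform: there is none at 64 bytes, and it
exceeds 30% at 1496 bytes.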


Martin Willi (3):
  crypto: x86/chacha20 - Add a 8-block AVX-512VL variant
  crypto: x86/chacha20 - Add a 2-block AVX-512VL variant
  crypto: x86/chacha20 - Add a 4-block AVX-512VL variant

 arch/x86/crypto/Makefile                   |   5 +
 arch/x86/crypto/chacha20-avx512vl-x86_64.S | 839 +++++++++++++++++++++
 arch/x86/crypto/chacha20_glue.c            |  40 +
 3 files changed, 884 insertions(+)
 create mode 100644 arch/x86/crypto/chacha20-avx512vl-x86_64.S
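
A ChaCha20 block is 64 bytes, and the larger drops in the table fall
right after lengths that are a multiple of that size (for example 64 to
72, 256 to 264, 512 to 520 and 1024 to 1032). The sketch below only counts
how many blocks a given length needs. It says nothing about how the glue
code chooses between the 2-, 4- and 8-block variants, which this cover
letter does not describe. The helper name blocks_needed is illustrative.

```python
CHACHA_BLOCK_SIZE = 64  # bytes of keystream produced per ChaCha20 block


def blocks_needed(length: int) -> int:
    """Number of 64-byte blocks required to cover `length` bytes."""
    return -(-length // CHACHA_BLOCK_SIZE)  # ceiling division


for length in (64, 72, 256, 264, 512, 520, 1024, 1032):
    print(f"{length:5d} bytes -> {blocks_needed(length):2d} block(s)")
```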

Comments

Herbert Xu Nov. 29, 2018, 10:13 a.m. UTC | #1
On Tue, Nov 20, 2018 at 05:30:47PM +0100, Martin Willi wrote:
> In the quest for pushing the limits of chacha20 encryption for both IPsec
> and Wireguard, this small series adds AVX-512VL block functions. The VL
> variant works on 256-bit ymm registers, but compared to AVX2 can benefit
> from the new instructions.
> 

All applied.  Thanks.