Message ID | 20240131115823.541317-1-mschmidt@redhat.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] ice: fix unaligned access in ice_create_lag_recipe | expand |
Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >new_rcp->recipe_bitmap was written to as if it were an aligned bitmap. >It is an 8-byte array, but aligned only to 4. >Use put_unaligned to set its value. > >Additionally, values in ice commands are typically in little-endian. >I assume the recipe bitmap should be too, so use the *_le64 conversion. >I don't have a big-endian system with ice to test this. > >I tested that the driver does not crash when probing on aarch64 anymore, >which is good enough for me. I don't know if the LAG feature actually >works. > >This is what the crash looked like without the fix: >[ 17.599009] Unable to handle kernel paging request at virtual address ffff07ff9c6dc004 >[ 17.599011] Mem abort info: >[ 17.599011] ESR = 0x0000000096000021 >[ 17.599012] EC = 0x25: DABT (current EL), IL = 32 bits >[ 17.599013] SET = 0, FnV = 0 >[ 17.599014] EA = 0, S1PTW = 0 >[ 17.599014] FSC = 0x21: alignment fault >[ 17.599015] Data abort info: >[ 17.599016] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 >[ 17.599016] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 >[ 17.599017] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 >[ 17.599019] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080dd6bd0000 >[ 17.599020] [ffff07ff9c6dc004] pgd=1800083fffacf003, p4d=1800083fffacf003, pud=1800083ffface003, pmd=1800083fff9ea003, pte=006808001c6dcf07 >[ 17.599025] Internal error: Oops: 0000000096000021 [#1] SMP >[ 17.599027] Modules linked in: crct10dif_ce ghash_ce sha2_ce sha256_arm64 mlx5_core sha1_ce sbsa_gwdt ice(+) nvme nvme_core mlxfw igb tls nvme_common psample i2c_algo_bit gnss pci_hyperv_intf i2c_designware_platform i2c_designware_core xgene_hwmon dm_mirror dm_region_hash dm_log dm_mod >[ 17.599043] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.14.0-407.el9.aarch64 #1 >[ 17.599044] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F31L (SCP: 2.10.20220531) 09/29/2022 >[ 17.599046] Workqueue: events work_for_cpu_fn >[ 17.599051] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >[ 17.599053] pc : ice_create_lag_recipe.constprop.0+0xbc/0x11c [ice] >[ 17.599091] lr : ice_create_lag_recipe.constprop.0+0x54/0x11c [ice] >[ 17.599121] sp : ffff8000084a3c50 >[ 17.599122] x29: ffff8000084a3c50 x28: ffffabc4a6790f00 x27: ffffabc4a6200fa0 >[ 17.599124] x26: ffff07ff809e5c34 x25: ffff083e5f41980d x24: ffff07ff8610a0c0 >[ 17.599126] x23: 0000000000000000 x22: ffff07ff9fe894c0 x21: ffff07ffb771a460 >[ 17.599128] x20: ffff07ff9c6dc000 x19: 0000000000000000 x18: 0000000000000014 >[ 17.599130] x17: 00000000c3142fa2 x16: 000000007e77e163 x15: 0000000018c66856 >[ 17.599132] x14: 00000000b8afd426 x13: 000000007e8b3b19 x12: 000000004a34fdf7 >[ 17.599134] x11: 00000000a7cb2fcc x10: 00000000ffffff8a x9 : 0000000000000000 >[ 17.599136] x8 : 0000002000000005 x7 : 0000000000000001 x6 : ffffabc487a054d8 >[ 17.599138] x5 : ffff07ff9c6dc004 x4 : 000000000000000a x3 : 0000000000000000 >[ 17.599140] x2 : 0000000000000000 x1 : 0000000000000400 x0 : ffff07ff9c6dc004 >[ 17.599142] Call trace: >[ 17.599143] ice_create_lag_recipe.constprop.0+0xbc/0x11c [ice] >[ 17.599172] ice_init_lag+0xcc/0x22c [ice] >[ 17.599201] ice_init_features+0x160/0x2b4 [ice] >[ 17.599230] ice_probe+0x2d0/0x30c [ice] >[ 17.599258] local_pci_probe+0x58/0xb0 >[ 17.599262] work_for_cpu_fn+0x20/0x30 >[ 17.599264] process_one_work+0x1e4/0x4c0 >[ 17.599266] worker_thread+0x220/0x450 >[ 17.599268] kthread+0xe8/0xf4 >[ 17.599270] ret_from_fork+0x10/0x20 >[ 17.599273] Code: 380044a4 f800429f 8b000ca0 d503201f (f821301f) >[ 17.599274] ---[ end trace 168d79e2ecf9f7e3 ]--- >[ 17.599275] Kernel panic - not syncing: Oops: Fatal exception >[ 17.893321] SMP: stopping secondary CPUs >[ 17.897374] Kernel Offset: 0x2bc49c400000 from 0xffff800008000000 >[ 17.903453] PHYS_OFFSET: 0x80000000 >[ 17.906928] CPU features: 0x0,00000001,70028143,1041720b >[ 17.912226] Memory Limit: none >[ 17.915268] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- > >Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") >Signed-off-by: Michal Schmidt <mschmidt@redhat.com> >--- > drivers/net/ethernet/intel/ice/ice_lag.c | 4 +--- > 1 file changed, 1 insertion(+), 3 deletions(-) > >diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c >index 2a25323105e5..d4848f6fe919 100644 >--- a/drivers/net/ethernet/intel/ice/ice_lag.c >+++ b/drivers/net/ethernet/intel/ice/ice_lag.c >@@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw *hw, u16 *rid, > new_rcp->content.act_ctrl_fwd_priority = prio; > new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; > new_rcp->recipe_indx = *rid; >- bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >- ICE_MAX_NUM_RECIPES); >- set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >+ put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); Looks like there might be another incorrect bitmap usage for this in ice_add_sw_recipe(). Care to fix it there as well? Otherwise, the patch looks fine. > > err = ice_aq_add_recipe(hw, new_rcp, 1, NULL); > if (err) >-- >2.43.0 > >
From: Jiri Pirko <jiri@resnulli.us> Date: Wed, 31 Jan 2024 13:17:44 +0100 > Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >> new_rcp->recipe_bitmap was written to as if it were an aligned bitmap. >> It is an 8-byte array, but aligned only to 4. >> Use put_unaligned to set its value. >> >> Additionally, values in ice commands are typically in little-endian. >> I assume the recipe bitmap should be too, so use the *_le64 conversion. >> I don't have a big-endian system with ice to test this. >> >> I tested that the driver does not crash when probing on aarch64 anymore, >> which is good enough for me. I don't know if the LAG feature actually >> works. >> >> This is what the crash looked like without the fix: >> [ 17.599009] Unable to handle kernel paging request at virtual address ffff07ff9c6dc004 >> [ 17.599011] Mem abort info: >> [ 17.599011] ESR = 0x0000000096000021 >> [ 17.599012] EC = 0x25: DABT (current EL), IL = 32 bits >> [ 17.599013] SET = 0, FnV = 0 >> [ 17.599014] EA = 0, S1PTW = 0 >> [ 17.599014] FSC = 0x21: alignment fault >> [ 17.599015] Data abort info: >> [ 17.599016] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 >> [ 17.599016] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 >> [ 17.599017] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 >> [ 17.599019] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080dd6bd0000 >> [ 17.599020] [ffff07ff9c6dc004] pgd=1800083fffacf003, p4d=1800083fffacf003, pud=1800083ffface003, pmd=1800083fff9ea003, pte=006808001c6dcf07 >> [ 17.599025] Internal error: Oops: 0000000096000021 [#1] SMP >> [ 17.599027] Modules linked in: crct10dif_ce ghash_ce sha2_ce sha256_arm64 mlx5_core sha1_ce sbsa_gwdt ice(+) nvme nvme_core mlxfw igb tls nvme_common psample i2c_algo_bit gnss pci_hyperv_intf i2c_designware_platform i2c_designware_core xgene_hwmon dm_mirror dm_region_hash dm_log dm_mod >> [ 17.599043] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.14.0-407.el9.aarch64 #1 >> [ 17.599044] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F31L (SCP: 2.10.20220531) 09/29/2022 >> [ 17.599046] Workqueue: events work_for_cpu_fn >> [ 17.599051] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) >> [ 17.599053] pc : ice_create_lag_recipe.constprop.0+0xbc/0x11c [ice] >> [ 17.599091] lr : ice_create_lag_recipe.constprop.0+0x54/0x11c [ice] >> [ 17.599121] sp : ffff8000084a3c50 >> [ 17.599122] x29: ffff8000084a3c50 x28: ffffabc4a6790f00 x27: ffffabc4a6200fa0 >> [ 17.599124] x26: ffff07ff809e5c34 x25: ffff083e5f41980d x24: ffff07ff8610a0c0 >> [ 17.599126] x23: 0000000000000000 x22: ffff07ff9fe894c0 x21: ffff07ffb771a460 >> [ 17.599128] x20: ffff07ff9c6dc000 x19: 0000000000000000 x18: 0000000000000014 >> [ 17.599130] x17: 00000000c3142fa2 x16: 000000007e77e163 x15: 0000000018c66856 >> [ 17.599132] x14: 00000000b8afd426 x13: 000000007e8b3b19 x12: 000000004a34fdf7 >> [ 17.599134] x11: 00000000a7cb2fcc x10: 00000000ffffff8a x9 : 0000000000000000 >> [ 17.599136] x8 : 0000002000000005 x7 : 0000000000000001 x6 : ffffabc487a054d8 >> [ 17.599138] x5 : ffff07ff9c6dc004 x4 : 000000000000000a x3 : 0000000000000000 >> [ 17.599140] x2 : 0000000000000000 x1 : 0000000000000400 x0 : ffff07ff9c6dc004 >> [ 17.599142] Call trace: >> [ 17.599143] ice_create_lag_recipe.constprop.0+0xbc/0x11c [ice] >> [ 17.599172] ice_init_lag+0xcc/0x22c [ice] >> [ 17.599201] ice_init_features+0x160/0x2b4 [ice] >> [ 17.599230] ice_probe+0x2d0/0x30c [ice] >> [ 17.599258] local_pci_probe+0x58/0xb0 >> [ 17.599262] work_for_cpu_fn+0x20/0x30 >> [ 17.599264] process_one_work+0x1e4/0x4c0 >> [ 17.599266] worker_thread+0x220/0x450 >> [ 17.599268] kthread+0xe8/0xf4 >> [ 17.599270] ret_from_fork+0x10/0x20 >> [ 17.599273] Code: 380044a4 f800429f 8b000ca0 d503201f (f821301f) >> [ 17.599274] ---[ end trace 168d79e2ecf9f7e3 ]--- >> [ 17.599275] Kernel panic - not syncing: Oops: Fatal exception >> [ 17.893321] SMP: stopping secondary CPUs >> [ 17.897374] Kernel Offset: 0x2bc49c400000 from 0xffff800008000000 >> [ 17.903453] PHYS_OFFSET: 0x80000000 >> [ 17.906928] CPU features: 0x0,00000001,70028143,1041720b >> [ 17.912226] Memory Limit: none >> [ 17.915268] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- >> >> Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") >> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> >> --- >> drivers/net/ethernet/intel/ice/ice_lag.c | 4 +--- >> 1 file changed, 1 insertion(+), 3 deletions(-) >> >> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c >> index 2a25323105e5..d4848f6fe919 100644 >> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw *hw, u16 *rid, >> new_rcp->content.act_ctrl_fwd_priority = prio; >> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >> new_rcp->recipe_indx = *rid; >> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >> - ICE_MAX_NUM_RECIPES); >> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); > > Looks like there might be another incorrect bitmap usage for this in > ice_add_sw_recipe(). Care to fix it there as well? Those are already fixed in one switchdev series and will be sent to IWL soon. I believe this patch would also make no sense after it's sent. > > Otherwise, the patch looks fine. > > >> >> err = ice_aq_add_recipe(hw, new_rcp, 1, NULL); >> if (err) >> -- >> 2.43.0 Thanks, Olek
On 1/31/24 17:59, Alexander Lobakin wrote: > From: Jiri Pirko <jiri@resnulli.us> > Date: Wed, 31 Jan 2024 13:17:44 +0100 > >> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c >>> index 2a25323105e5..d4848f6fe919 100644 >>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw *hw, u16 *rid, >>> new_rcp->content.act_ctrl_fwd_priority = prio; >>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>> new_rcp->recipe_indx = *rid; >>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>> - ICE_MAX_NUM_RECIPES); >>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >> >> Looks like there might be another incorrect bitmap usage for this in >> ice_add_sw_recipe(). Care to fix it there as well? > > Those are already fixed in one switchdev series and will be sent to IWL > soon. > I believe this patch would also make no sense after it's sent. Hi Alexander, When will the series be sent? The bug causes a kernel panic. Will the series target net.git? Michal
From: Michal Schmidt <mschmidt@redhat.com> Date: Thu, 1 Feb 2024 19:40:17 +0100 > On 1/31/24 17:59, Alexander Lobakin wrote: >> From: Jiri Pirko <jiri@resnulli.us> >> Date: Wed, 31 Jan 2024 13:17:44 +0100 >> >>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c >>>> b/drivers/net/ethernet/intel/ice/ice_lag.c >>>> index 2a25323105e5..d4848f6fe919 100644 >>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw >>>> *hw, u16 *rid, >>>> new_rcp->content.act_ctrl_fwd_priority = prio; >>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>>> new_rcp->recipe_indx = *rid; >>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>>> - ICE_MAX_NUM_RECIPES); >>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >>> >>> Looks like there might be another incorrect bitmap usage for this in >>> ice_add_sw_recipe(). Care to fix it there as well? >> >> Those are already fixed in one switchdev series and will be sent to IWL >> soon. >> I believe this patch would also make no sense after it's sent. > > Hi Alexander, > When will the series be sent? > The bug causes a kernel panic. Will the series target net.git? The global fix is here: [0] It's targeting net-next. I don't know what the best way here would be. Target net instead or pick your fix to net and then fix it properly in net-next? > Michal Thanks, Olek
From: Alexander Lobakin <aleksander.lobakin@intel.com> Date: Fri, 2 Feb 2024 13:39:28 +0100 > From: Michal Schmidt <mschmidt@redhat.com> > Date: Thu, 1 Feb 2024 19:40:17 +0100 > >> On 1/31/24 17:59, Alexander Lobakin wrote: >>> From: Jiri Pirko <jiri@resnulli.us> >>> Date: Wed, 31 Jan 2024 13:17:44 +0100 >>> >>>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>> b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>> index 2a25323105e5..d4848f6fe919 100644 >>>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw >>>>> *hw, u16 *rid, >>>>> new_rcp->content.act_ctrl_fwd_priority = prio; >>>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>>>> new_rcp->recipe_indx = *rid; >>>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>>>> - ICE_MAX_NUM_RECIPES); >>>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >>>> >>>> Looks like there might be another incorrect bitmap usage for this in >>>> ice_add_sw_recipe(). Care to fix it there as well? >>> >>> Those are already fixed in one switchdev series and will be sent to IWL >>> soon. >>> I believe this patch would also make no sense after it's sent. >> >> Hi Alexander, >> When will the series be sent? >> The bug causes a kernel panic. Will the series target net.git? > > The global fix is here: [0] > It's targeting net-next. > > I don't know what the best way here would be. Target net instead or pick > your fix to net and then fix it properly in net-next? Sorry, forgot to paste the link :clownface: [0] https://lore.kernel.org/intel-wired-lan/20240130025146.30265-2-steven.zou@intel.com Thanks, Olek
Fri, Feb 02, 2024 at 01:40:18PM CET, aleksander.lobakin@intel.com wrote: >From: Alexander Lobakin <aleksander.lobakin@intel.com> >Date: Fri, 2 Feb 2024 13:39:28 +0100 > >> From: Michal Schmidt <mschmidt@redhat.com> >> Date: Thu, 1 Feb 2024 19:40:17 +0100 >> >>> On 1/31/24 17:59, Alexander Lobakin wrote: >>>> From: Jiri Pirko <jiri@resnulli.us> >>>> Date: Wed, 31 Jan 2024 13:17:44 +0100 >>>> >>>>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>> b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>> index 2a25323105e5..d4848f6fe919 100644 >>>>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw >>>>>> *hw, u16 *rid, >>>>>> new_rcp->content.act_ctrl_fwd_priority = prio; >>>>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>>>>> new_rcp->recipe_indx = *rid; >>>>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>>>>> - ICE_MAX_NUM_RECIPES); >>>>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>>>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >>>>> >>>>> Looks like there might be another incorrect bitmap usage for this in >>>>> ice_add_sw_recipe(). Care to fix it there as well? >>>> >>>> Those are already fixed in one switchdev series and will be sent to IWL >>>> soon. >>>> I believe this patch would also make no sense after it's sent. >>> >>> Hi Alexander, >>> When will the series be sent? >>> The bug causes a kernel panic. Will the series target net.git? >> >> The global fix is here: [0] >> It's targeting net-next. >> >> I don't know what the best way here would be. Target net instead or pick >> your fix to net and then fix it properly in net-next? > >Sorry, forgot to paste the link :clownface: > >[0] >https://lore.kernel.org/intel-wired-lan/20240130025146.30265-2-steven.zou@intel.com Just put this into -net, no? I don't see why not. > >Thanks, >Olek
On Fri, Feb 02, 2024 at 01:40:18PM +0100, Alexander Lobakin wrote: > From: Alexander Lobakin <aleksander.lobakin@intel.com> > Date: Fri, 2 Feb 2024 13:39:28 +0100 > > > From: Michal Schmidt <mschmidt@redhat.com> > > Date: Thu, 1 Feb 2024 19:40:17 +0100 > > > >> On 1/31/24 17:59, Alexander Lobakin wrote: > >>> From: Jiri Pirko <jiri@resnulli.us> > >>> Date: Wed, 31 Jan 2024 13:17:44 +0100 > >>> > >>>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: > >>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c > >>>>> b/drivers/net/ethernet/intel/ice/ice_lag.c > >>>>> index 2a25323105e5..d4848f6fe919 100644 > >>>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c > >>>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c > >>>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw > >>>>> *hw, u16 *rid, > >>>>> new_rcp->content.act_ctrl_fwd_priority = prio; > >>>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; > >>>>> new_rcp->recipe_indx = *rid; > >>>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, > >>>>> - ICE_MAX_NUM_RECIPES); > >>>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); > >>>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); > >>>> > >>>> Looks like there might be another incorrect bitmap usage for this in > >>>> ice_add_sw_recipe(). Care to fix it there as well? > >>> > >>> Those are already fixed in one switchdev series and will be sent to IWL > >>> soon. > >>> I believe this patch would also make no sense after it's sent. > >> > >> Hi Alexander, > >> When will the series be sent? > >> The bug causes a kernel panic. Will the series target net.git? > > > > The global fix is here: [0] > > It's targeting net-next. > > > > I don't know what the best way here would be. Target net instead or pick > > your fix to net and then fix it properly in net-next? > > Sorry, forgot to paste the link :clownface: IMHO 1/2 should go to net. Then you would have to wait for it to got accepted and get merged to -next and then you come back with 2/2. You know the deal. > > [0] > https://lore.kernel.org/intel-wired-lan/20240130025146.30265-2-steven.zou@intel.com > > Thanks, > Olek >
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Date: Fri, 2 Feb 2024 14:00:03 +0100 > On Fri, Feb 02, 2024 at 01:40:18PM +0100, Alexander Lobakin wrote: >> From: Alexander Lobakin <aleksander.lobakin@intel.com> >> Date: Fri, 2 Feb 2024 13:39:28 +0100 >> >>> From: Michal Schmidt <mschmidt@redhat.com> >>> Date: Thu, 1 Feb 2024 19:40:17 +0100 >>> >>>> On 1/31/24 17:59, Alexander Lobakin wrote: >>>>> From: Jiri Pirko <jiri@resnulli.us> >>>>> Date: Wed, 31 Jan 2024 13:17:44 +0100 >>>>> >>>>>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>> b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>> index 2a25323105e5..d4848f6fe919 100644 >>>>>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw >>>>>>> *hw, u16 *rid, >>>>>>> new_rcp->content.act_ctrl_fwd_priority = prio; >>>>>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>>>>>> new_rcp->recipe_indx = *rid; >>>>>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>>>>>> - ICE_MAX_NUM_RECIPES); >>>>>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>>>>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >>>>>> >>>>>> Looks like there might be another incorrect bitmap usage for this in >>>>>> ice_add_sw_recipe(). Care to fix it there as well? >>>>> >>>>> Those are already fixed in one switchdev series and will be sent to IWL >>>>> soon. >>>>> I believe this patch would also make no sense after it's sent. >>>> >>>> Hi Alexander, >>>> When will the series be sent? >>>> The bug causes a kernel panic. Will the series target net.git? >>> >>> The global fix is here: [0] >>> It's targeting net-next. >>> >>> I don't know what the best way here would be. Target net instead or pick >>> your fix to net and then fix it properly in net-next? >> >> Sorry, forgot to paste the link :clownface: > > IMHO 1/2 should go to net. Then you would have to wait for it to got > accepted and get merged to -next and then you come back with 2/2. You know > the deal. Agree! Hi Steve, Could you please send the first patch from your series to net instead of net-next? (and add "Fixes:" tag with the blamed commit) > >> >> [0] >> https://lore.kernel.org/intel-wired-lan/20240130025146.30265-2-steven.zou@intel.com >> >> Thanks, >> Olek Thanks, Olek
On 2/2/2024 9:01 PM, Alexander Lobakin wrote: > From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> > Date: Fri, 2 Feb 2024 14:00:03 +0100 > >> On Fri, Feb 02, 2024 at 01:40:18PM +0100, Alexander Lobakin wrote: >>> From: Alexander Lobakin <aleksander.lobakin@intel.com> >>> Date: Fri, 2 Feb 2024 13:39:28 +0100 >>> >>>> From: Michal Schmidt <mschmidt@redhat.com> >>>> Date: Thu, 1 Feb 2024 19:40:17 +0100 >>>> >>>>> On 1/31/24 17:59, Alexander Lobakin wrote: >>>>>> From: Jiri Pirko <jiri@resnulli.us> >>>>>> Date: Wed, 31 Jan 2024 13:17:44 +0100 >>>>>> >>>>>>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>>>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>> b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>> index 2a25323105e5..d4848f6fe919 100644 >>>>>>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw >>>>>>>> *hw, u16 *rid, >>>>>>>> new_rcp->content.act_ctrl_fwd_priority = prio; >>>>>>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>>>>>>> new_rcp->recipe_indx = *rid; >>>>>>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>>>>>>> - ICE_MAX_NUM_RECIPES); >>>>>>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>>>>>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >>>>>>> Looks like there might be another incorrect bitmap usage for this in >>>>>>> ice_add_sw_recipe(). Care to fix it there as well? >>>>>> Those are already fixed in one switchdev series and will be sent to IWL >>>>>> soon. >>>>>> I believe this patch would also make no sense after it's sent. >>>>> Hi Alexander, >>>>> When will the series be sent? >>>>> The bug causes a kernel panic. Will the series target net.git? >>>> The global fix is here: [0] >>>> It's targeting net-next. >>>> >>>> I don't know what the best way here would be. Target net instead or pick >>>> your fix to net and then fix it properly in net-next? >>> Sorry, forgot to paste the link :clownface: >> IMHO 1/2 should go to net. Then you would have to wait for it to got >> accepted and get merged to -next and then you come back with 2/2. You know >> the deal. > Agree! > > Hi Steve, > > Could you please send the first patch from your series to net instead of > net-next? > > (and add "Fixes:" tag with the blamed commit) Hi Olek, Sure, I will do it soon. > >>> [0] >>> https://lore.kernel.org/intel-wired-lan/20240130025146.30265-2-steven.zou@intel.com >>> >>> Thanks, >>> Olek > Thanks, > Olek
On 2/7/2024 8:44 AM, Zou, Steven wrote: > > On 2/2/2024 9:01 PM, Alexander Lobakin wrote: >> From: Maciej Fijalkowski <maciej.fijalkowski@intel.com> >> Date: Fri, 2 Feb 2024 14:00:03 +0100 >> >>> On Fri, Feb 02, 2024 at 01:40:18PM +0100, Alexander Lobakin wrote: >>>> From: Alexander Lobakin <aleksander.lobakin@intel.com> >>>> Date: Fri, 2 Feb 2024 13:39:28 +0100 >>>> >>>>> From: Michal Schmidt <mschmidt@redhat.com> >>>>> Date: Thu, 1 Feb 2024 19:40:17 +0100 >>>>> >>>>>> On 1/31/24 17:59, Alexander Lobakin wrote: >>>>>>> From: Jiri Pirko <jiri@resnulli.us> >>>>>>> Date: Wed, 31 Jan 2024 13:17:44 +0100 >>>>>>> >>>>>>>> Wed, Jan 31, 2024 at 12:58:23PM CET, mschmidt@redhat.com wrote: >>>>>>>>> diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>>> b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>>> index 2a25323105e5..d4848f6fe919 100644 >>>>>>>>> --- a/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>>> +++ b/drivers/net/ethernet/intel/ice/ice_lag.c >>>>>>>>> @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct >>>>>>>>> ice_hw >>>>>>>>> *hw, u16 *rid, >>>>>>>>> new_rcp->content.act_ctrl_fwd_priority = prio; >>>>>>>>> new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; >>>>>>>>> new_rcp->recipe_indx = *rid; >>>>>>>>> - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, >>>>>>>>> - ICE_MAX_NUM_RECIPES); >>>>>>>>> - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); >>>>>>>>> + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); >>>>>>>> Looks like there might be another incorrect bitmap usage for >>>>>>>> this in >>>>>>>> ice_add_sw_recipe(). Care to fix it there as well? >>>>>>> Those are already fixed in one switchdev series and will be sent >>>>>>> to IWL >>>>>>> soon. >>>>>>> I believe this patch would also make no sense after it's sent. >>>>>> Hi Alexander, >>>>>> When will the series be sent? >>>>>> The bug causes a kernel panic. Will the series target net.git? >>>>> The global fix is here: [0] >>>>> It's targeting net-next. >>>>> >>>>> I don't know what the best way here would be. Target net instead >>>>> or pick >>>>> your fix to net and then fix it properly in net-next? >>>> Sorry, forgot to paste the link :clownface: >>> IMHO 1/2 should go to net. Then you would have to wait for it to got >>> accepted and get merged to -next and then you come back with 2/2. >>> You know >>> the deal. >> Agree! >> >> Hi Steve, >> >> Could you please send the first patch from your series to net instead of >> net-next? >> >> (and add "Fixes:" tag with the blamed commit) > > Hi Olek, > Sure, I will do it soon. Hi team, The patch has been rebased and submitted on: https://lore.kernel.org/intel-wired-lan/20240207014959.24012-1-steven.zou@intel.com/ https://patchwork.ozlabs.org/project/intel-wired-lan/patch/20240207014959.24012-1-steven.zou@intel.com/ Thanks, Steven > >> >>>> [0] >>>> https://lore.kernel.org/intel-wired-lan/20240130025146.30265-2-steven.zou@intel.com >>>> >>>> >>>> Thanks, >>>> Olek >> Thanks, >> Olek
diff --git a/drivers/net/ethernet/intel/ice/ice_lag.c b/drivers/net/ethernet/intel/ice/ice_lag.c index 2a25323105e5..d4848f6fe919 100644 --- a/drivers/net/ethernet/intel/ice/ice_lag.c +++ b/drivers/net/ethernet/intel/ice/ice_lag.c @@ -1829,9 +1829,7 @@ static int ice_create_lag_recipe(struct ice_hw *hw, u16 *rid, new_rcp->content.act_ctrl_fwd_priority = prio; new_rcp->content.rid = *rid | ICE_AQ_RECIPE_ID_IS_ROOT; new_rcp->recipe_indx = *rid; - bitmap_zero((unsigned long *)new_rcp->recipe_bitmap, - ICE_MAX_NUM_RECIPES); - set_bit(*rid, (unsigned long *)new_rcp->recipe_bitmap); + put_unaligned_le64(BIT_ULL(*rid), new_rcp->recipe_bitmap); err = ice_aq_add_recipe(hw, new_rcp, 1, NULL); if (err)
new_rcp->recipe_bitmap was written to as if it were an aligned bitmap. It is an 8-byte array, but aligned only to 4. Use put_unaligned to set its value. Additionally, values in ice commands are typically in little-endian. I assume the recipe bitmap should be too, so use the *_le64 conversion. I don't have a big-endian system with ice to test this. I tested that the driver does not crash when probing on aarch64 anymore, which is good enough for me. I don't know if the LAG feature actually works. This is what the crash looked like without the fix: [ 17.599009] Unable to handle kernel paging request at virtual address ffff07ff9c6dc004 [ 17.599011] Mem abort info: [ 17.599011] ESR = 0x0000000096000021 [ 17.599012] EC = 0x25: DABT (current EL), IL = 32 bits [ 17.599013] SET = 0, FnV = 0 [ 17.599014] EA = 0, S1PTW = 0 [ 17.599014] FSC = 0x21: alignment fault [ 17.599015] Data abort info: [ 17.599016] ISV = 0, ISS = 0x00000021, ISS2 = 0x00000000 [ 17.599016] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 17.599017] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 17.599019] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000080dd6bd0000 [ 17.599020] [ffff07ff9c6dc004] pgd=1800083fffacf003, p4d=1800083fffacf003, pud=1800083ffface003, pmd=1800083fff9ea003, pte=006808001c6dcf07 [ 17.599025] Internal error: Oops: 0000000096000021 [#1] SMP [ 17.599027] Modules linked in: crct10dif_ce ghash_ce sha2_ce sha256_arm64 mlx5_core sha1_ce sbsa_gwdt ice(+) nvme nvme_core mlxfw igb tls nvme_common psample i2c_algo_bit gnss pci_hyperv_intf i2c_designware_platform i2c_designware_core xgene_hwmon dm_mirror dm_region_hash dm_log dm_mod [ 17.599043] CPU: 0 PID: 18 Comm: kworker/0:1 Not tainted 5.14.0-407.el9.aarch64 #1 [ 17.599044] Hardware name: GIGABYTE R272-P31-00/MP32-AR1-00, BIOS F31L (SCP: 2.10.20220531) 09/29/2022 [ 17.599046] Workqueue: events work_for_cpu_fn [ 17.599051] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 17.599053] pc : ice_create_lag_recipe.constprop.0+0xbc/0x11c [ice] [ 17.599091] lr : ice_create_lag_recipe.constprop.0+0x54/0x11c [ice] [ 17.599121] sp : ffff8000084a3c50 [ 17.599122] x29: ffff8000084a3c50 x28: ffffabc4a6790f00 x27: ffffabc4a6200fa0 [ 17.599124] x26: ffff07ff809e5c34 x25: ffff083e5f41980d x24: ffff07ff8610a0c0 [ 17.599126] x23: 0000000000000000 x22: ffff07ff9fe894c0 x21: ffff07ffb771a460 [ 17.599128] x20: ffff07ff9c6dc000 x19: 0000000000000000 x18: 0000000000000014 [ 17.599130] x17: 00000000c3142fa2 x16: 000000007e77e163 x15: 0000000018c66856 [ 17.599132] x14: 00000000b8afd426 x13: 000000007e8b3b19 x12: 000000004a34fdf7 [ 17.599134] x11: 00000000a7cb2fcc x10: 00000000ffffff8a x9 : 0000000000000000 [ 17.599136] x8 : 0000002000000005 x7 : 0000000000000001 x6 : ffffabc487a054d8 [ 17.599138] x5 : ffff07ff9c6dc004 x4 : 000000000000000a x3 : 0000000000000000 [ 17.599140] x2 : 0000000000000000 x1 : 0000000000000400 x0 : ffff07ff9c6dc004 [ 17.599142] Call trace: [ 17.599143] ice_create_lag_recipe.constprop.0+0xbc/0x11c [ice] [ 17.599172] ice_init_lag+0xcc/0x22c [ice] [ 17.599201] ice_init_features+0x160/0x2b4 [ice] [ 17.599230] ice_probe+0x2d0/0x30c [ice] [ 17.599258] local_pci_probe+0x58/0xb0 [ 17.599262] work_for_cpu_fn+0x20/0x30 [ 17.599264] process_one_work+0x1e4/0x4c0 [ 17.599266] worker_thread+0x220/0x450 [ 17.599268] kthread+0xe8/0xf4 [ 17.599270] ret_from_fork+0x10/0x20 [ 17.599273] Code: 380044a4 f800429f 8b000ca0 d503201f (f821301f) [ 17.599274] ---[ end trace 168d79e2ecf9f7e3 ]--- [ 17.599275] Kernel panic - not syncing: Oops: Fatal exception [ 17.893321] SMP: stopping secondary CPUs [ 17.897374] Kernel Offset: 0x2bc49c400000 from 0xffff800008000000 [ 17.903453] PHYS_OFFSET: 0x80000000 [ 17.906928] CPU features: 0x0,00000001,70028143,1041720b [ 17.912226] Memory Limit: none [ 17.915268] ---[ end Kernel panic - not syncing: Oops: Fatal exception ]--- Fixes: 1e0f9881ef79 ("ice: Flesh out implementation of support for SRIOV on bonded interface") Signed-off-by: Michal Schmidt <mschmidt@redhat.com> --- drivers/net/ethernet/intel/ice/ice_lag.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)