Message ID | 20230803202010.37149-1-nnac123@linux.ibm.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,1/5] ibmvnic: Enforce stronger sanity checks on login response | expand |
On Thu, Aug 03, 2023 at 03:20:06PM -0500, Nick Child wrote: > Ensure that all offsets in a login response buffer are within the size > of the allocated response buffer. Any offsets or lengths that surpass > the allocation are likely the result of an incomplete response buffer. > In these cases, a full reset is necessary. > > When attempting to login, the ibmvnic device will allocate a response > buffer and pass a reference to the VIOS. The VIOS will then send the > ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled > with data. If the ibmvnic device does not get a response in 20 seconds, > the old buffer is freed and a new login request is sent. With 2 > outstanding requests, any LOGIN_RSP CRQ's could be for the older > login request. If this is the case then the login response buffer (which > is for the newer login request) could be incomplete and contain invalid > data. Therefore, we must enforce strict sanity checks on the response > buffer values. > > Testing has shown that the `off_rxadd_buff_size` value is filled in last > by the VIOS and will be the smoking gun for these circumstances. > > Until VIOS can implement a mechanism for tracking outstanding response > buffers and a method for mapping a LOGIN_RSP CRQ to a particular login > response buffer, the best ibmvnic can do in this situation is perform a > full reset. > > Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") > Signed-off-by: Nick Child <nnac123@linux.ibm.com> Reviewed-by: Simon Horman <horms@kernel.org> > --- > > Hello! > This patchset is all relevant to recent bugs which came up regarding > the ibmvnic login process. Specifically, when this process times out. > > ibmvnic devices are virtual devices which need to "login" to a physical > NIC at the end of its initialization process. This invloves sending a > command to the VIOS (virtual input output server, essentially the server > that this client is logging into) requesting it to fill out a DMA mapped > repsonse buffer. Once done, the VIOS sends a response informing the > client that the buffer has been filled with data. > > If the VIOS does not send a response in 20 seconds then the client tries > again. If this happens then several bugs can occur. This is usually due > to the fact that there are more than one outstanding requests and no > mechanism for mapping a response CRQ to a given response buffer. Until > that mechanism is created, this patchset aims to harden this timeout > recovery process so that the device does not get stuck in an inopperable > state. This sort of information really belongs in a cover letter for the patchset. And in any case, it's nice to have a cover letter if there is more than one patch in the series. > drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++++++++ > 1 file changed, 18 insertions(+) > > diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c > index 763d613adbcc..996f8037c266 100644 > --- a/drivers/net/ethernet/ibm/ibmvnic.c > +++ b/drivers/net/ethernet/ibm/ibmvnic.c > @@ -5397,6 +5397,7 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, > int num_rx_pools; > u64 *size_array; > int i; > + u32 rsp_len; nit: It's preferred in Networking code to arrange local variables in reverse xmas tree order - longest line to shortest. I know this file doesn't follow that very closely. But still, it would be slightly nicer, at least in this case. ...
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 763d613adbcc..996f8037c266 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -5397,6 +5397,7 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, int num_rx_pools; u64 *size_array; int i; + u32 rsp_len; /* CHECK: Test/set of login_pending does not need to be atomic * because only ibmvnic_tasklet tests/clears this. @@ -5447,6 +5448,23 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq, ibmvnic_reset(adapter, VNIC_RESET_FATAL); return -EIO; } + + rsp_len = be32_to_cpu(login_rsp->len); + if (be32_to_cpu(login->login_rsp_len) < rsp_len || + rsp_len <= be32_to_cpu(login_rsp->off_txsubm_subcrqs) || + rsp_len <= be32_to_cpu(login_rsp->off_rxadd_subcrqs) || + rsp_len <= be32_to_cpu(login_rsp->off_rxadd_buff_size) || + rsp_len <= be32_to_cpu(login_rsp->off_supp_tx_desc)) { + /* This can happen if a login request times out and there are + * 2 outstanding login requests sent, the LOGIN_RSP crq + * could have been for the older login request. So we are + * parsing the newer response buffer which may be incomplete + */ + dev_err(dev, "FATAL: Login rsp offsets/lengths invalid\n"); + ibmvnic_reset(adapter, VNIC_RESET_FATAL); + return -EIO; + } + size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) + be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size)); /* variable buffer sizes are not supported, so just read the
Ensure that all offsets in a login response buffer are within the size of the allocated response buffer. Any offsets or lengths that surpass the allocation are likely the result of an incomplete response buffer. In these cases, a full reset is necessary. When attempting to login, the ibmvnic device will allocate a response buffer and pass a reference to the VIOS. The VIOS will then send the ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled with data. If the ibmvnic device does not get a response in 20 seconds, the old buffer is freed and a new login request is sent. With 2 outstanding requests, any LOGIN_RSP CRQ's could be for the older login request. If this is the case then the login response buffer (which is for the newer login request) could be incomplete and contain invalid data. Therefore, we must enforce strict sanity checks on the response buffer values. Testing has shown that the `off_rxadd_buff_size` value is filled in last by the VIOS and will be the smoking gun for these circumstances. Until VIOS can implement a mechanism for tracking outstanding response buffers and a method for mapping a LOGIN_RSP CRQ to a particular login response buffer, the best ibmvnic can do in this situation is perform a full reset. Fixes: dff515a3e71d ("ibmvnic: Harden device login requests") Signed-off-by: Nick Child <nnac123@linux.ibm.com> --- Hello! This patchset is all relevant to recent bugs which came up regarding the ibmvnic login process. Specifically, when this process times out. ibmvnic devices are virtual devices which need to "login" to a physical NIC at the end of its initialization process. This invloves sending a command to the VIOS (virtual input output server, essentially the server that this client is logging into) requesting it to fill out a DMA mapped repsonse buffer. Once done, the VIOS sends a response informing the client that the buffer has been filled with data. If the VIOS does not send a response in 20 seconds then the client tries again. If this happens then several bugs can occur. This is usually due to the fact that there are more than one outstanding requests and no mechanism for mapping a response CRQ to a given response buffer. Until that mechanism is created, this patchset aims to harden this timeout recovery process so that the device does not get stuck in an inopperable state. drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)