[net,1/5] ibmvnic: Enforce stronger sanity checks on login response

Message ID: 20230803202010.37149-1-nnac123@linux.ibm.com
State: Superseded
Delegated to: Netdev Maintainers
Series: [net,1/5] ibmvnic: Enforce stronger sanity checks on login response

Checks

Context                         Check    Description
netdev/series_format            warning  Series does not have a cover letter
netdev/tree_selection           success  Clearly marked for net
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers           fail     1 blamed authors not CCed: davem@davemloft.net; 9 maintainers not CCed: mpe@ellerman.id.au kuba@kernel.org npiggin@gmail.com christophe.leroy@csgroup.eu davem@davemloft.net ricklind@linux.ibm.com pabeni@redhat.com edumazet@google.com linuxppc-dev@lists.ozlabs.org
netdev/build_clang              success  Errors and warnings before: 9 this patch: 9
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 9 this patch: 9
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 30 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Nick Child Aug. 3, 2023, 8:20 p.m. UTC
Ensure that all offsets in a login response buffer are within the size
of the allocated response buffer. Any offsets or lengths that surpass
the allocation are likely the result of an incomplete response buffer.
In these cases, a full reset is necessary.

When attempting to log in, the ibmvnic device will allocate a response
buffer and pass a reference to the VIOS. The VIOS will then send the
ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled
with data. If the ibmvnic device does not get a response in 20 seconds,
the old buffer is freed and a new login request is sent. With 2
outstanding requests, any LOGIN_RSP CRQ could be for the older
login request. If this is the case then the login response buffer (which
is for the newer login request) could be incomplete and contain invalid
data. Therefore, we must enforce strict sanity checks on the response
buffer values.

Testing has shown that the `off_rxadd_buff_size` value is filled in last
by the VIOS and will be the smoking gun for these circumstances.

Until VIOS can implement a mechanism for tracking outstanding response
buffers and a method for mapping a LOGIN_RSP CRQ to a particular login
response buffer, the best ibmvnic can do in this situation is perform a
full reset.

Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
---

Hello!
This patchset addresses recent bugs in the ibmvnic login process,
specifically those that occur when this process times out.

ibmvnic devices are virtual devices which need to "log in" to a physical
NIC at the end of their initialization process. This involves sending a
command to the VIOS (Virtual I/O Server, essentially the server
that this client is logging into) requesting it to fill out a DMA-mapped
response buffer. Once done, the VIOS sends a response informing the
client that the buffer has been filled with data.

If the VIOS does not send a response in 20 seconds then the client tries
again. If this happens then several bugs can occur. This is usually
because there is more than one outstanding request and no
mechanism for mapping a response CRQ to a given response buffer. Until
that mechanism is created, this patchset aims to harden this timeout
recovery process so that the device does not get stuck in an inoperable
state.

 drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
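
For illustration, the check described in the commit message can be modeled
in userspace roughly as follows. This is a minimal sketch, not the driver
code: `struct login_rsp_model` and `login_rsp_is_sane()` are hypothetical
names, only the fields compared by the patch are modeled, and all values
are assumed to be already converted to host byte order (the driver does
that with be32_to_cpu()).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the login response header.
 * Only the fields compared by the patch are modeled, and they are
 * assumed to hold host-endian values already.
 */
struct login_rsp_model {
	uint32_t len;                 /* response length reported by the VIOS */
	uint32_t off_txsubm_subcrqs;  /* offsets from the start of the buffer */
	uint32_t off_rxadd_subcrqs;
	uint32_t off_rxadd_buff_size;
	uint32_t off_supp_tx_desc;
};

/* Return true only if the reported length fits inside the allocation
 * (alloc_len) and every offset lies strictly below the reported length.
 * This mirrors the condition in the patch, inverted: the patch resets
 * when any of these comparisons fails.
 */
bool login_rsp_is_sane(const struct login_rsp_model *rsp, uint32_t alloc_len)
{
	if (alloc_len < rsp->len)
		return false;

	return rsp->off_txsubm_subcrqs < rsp->len &&
	       rsp->off_rxadd_subcrqs < rsp->len &&
	       rsp->off_rxadd_buff_size < rsp->len &&
	       rsp->off_supp_tx_desc < rsp->len;
}
```

A caller would treat a false return the way the patch treats a failed
check: log the error, request a full reset, and return an error code.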

Comments

Simon Horman Aug. 5, 2023, 7:18 a.m. UTC | #1
On Thu, Aug 03, 2023 at 03:20:06PM -0500, Nick Child wrote:
> Ensure that all offsets in a login response buffer are within the size
> of the allocated response buffer. Any offsets or lengths that surpass
> the allocation are likely the result of an incomplete response buffer.
> In these cases, a full reset is necessary.
> 
> When attempting to log in, the ibmvnic device will allocate a response
> buffer and pass a reference to the VIOS. The VIOS will then send the
> ibmvnic device a LOGIN_RSP CRQ to signal that the buffer has been filled
> with data. If the ibmvnic device does not get a response in 20 seconds,
> the old buffer is freed and a new login request is sent. With 2
> outstanding requests, any LOGIN_RSP CRQ could be for the older
> login request. If this is the case then the login response buffer (which
> is for the newer login request) could be incomplete and contain invalid
> data. Therefore, we must enforce strict sanity checks on the response
> buffer values.
> 
> Testing has shown that the `off_rxadd_buff_size` value is filled in last
> by the VIOS and will be the smoking gun for these circumstances.
> 
> Until VIOS can implement a mechanism for tracking outstanding response
> buffers and a method for mapping a LOGIN_RSP CRQ to a particular login
> response buffer, the best ibmvnic can do in this situation is perform a
> full reset.
> 
> Fixes: dff515a3e71d ("ibmvnic: Harden device login requests")
> Signed-off-by: Nick Child <nnac123@linux.ibm.com>

Reviewed-by: Simon Horman <horms@kernel.org>

> ---
> 
> Hello!
> This patchset addresses recent bugs in the ibmvnic login process,
> specifically those that occur when this process times out.
> 
> ibmvnic devices are virtual devices which need to "log in" to a physical
> NIC at the end of their initialization process. This involves sending a
> command to the VIOS (Virtual I/O Server, essentially the server
> that this client is logging into) requesting it to fill out a DMA-mapped
> response buffer. Once done, the VIOS sends a response informing the
> client that the buffer has been filled with data.
> 
> If the VIOS does not send a response in 20 seconds then the client tries
> again. If this happens then several bugs can occur. This is usually
> because there is more than one outstanding request and no
> mechanism for mapping a response CRQ to a given response buffer. Until
> that mechanism is created, this patchset aims to harden this timeout
> recovery process so that the device does not get stuck in an inoperable
> state.

This sort of information really belongs in a cover letter for the patchset.
And in any case, it's nice to have a cover letter if there is more
than one patch in the series.

>  drivers/net/ethernet/ibm/ibmvnic.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
> index 763d613adbcc..996f8037c266 100644
> --- a/drivers/net/ethernet/ibm/ibmvnic.c
> +++ b/drivers/net/ethernet/ibm/ibmvnic.c
> @@ -5397,6 +5397,7 @@ static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq,
>  	int num_rx_pools;
>  	u64 *size_array;
>  	int i;
> +	u32 rsp_len;

nit: It's preferred in Networking code to arrange local variables in
     reverse xmas tree order - longest line to shortest.

     I know this file doesn't follow that very closely.
     But still, it would be slightly nicer, at least in this case.
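
As an illustration of the ordering suggested above, here is a minimal
userspace sketch. The function `sum_sizes()` and the local `total` are
hypothetical and exist only to give the locals a purpose; the `u32`/`u64`
typedefs stand in for the kernel types. The other three locals are the
ones visible in the hunk, with `rsp_len` placed by line length rather
than appended after `int i`.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel's fixed-width types. */
typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical helper: sums entries of a u64 array, stopping before any
 * entry that would extend past len_bytes. Only the declaration order is
 * the point here: longest declaration line first, shortest last.
 */
u64 sum_sizes(u64 *sizes, int count, u32 len_bytes)
{
	int num_rx_pools;
	u64 *size_array;
	u64 total = 0;
	u32 rsp_len;
	int i;

	num_rx_pools = count;
	size_array = sizes;
	rsp_len = len_bytes;

	for (i = 0; i < num_rx_pools; i++) {
		/* Stop if entry i would end beyond the reported length. */
		if ((u32)(i + 1) * sizeof(u64) > rsp_len)
			break;
		total += size_array[i];
	}

	return total;
}
```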

...

Patch

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 763d613adbcc..996f8037c266 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -5397,6 +5397,7 @@  static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq,
 	int num_rx_pools;
 	u64 *size_array;
 	int i;
+	u32 rsp_len;
 
 	/* CHECK: Test/set of login_pending does not need to be atomic
 	 * because only ibmvnic_tasklet tests/clears this.
@@ -5447,6 +5448,23 @@  static int handle_login_rsp(union ibmvnic_crq *login_rsp_crq,
 		ibmvnic_reset(adapter, VNIC_RESET_FATAL);
 		return -EIO;
 	}
+
+	rsp_len = be32_to_cpu(login_rsp->len);
+	if (be32_to_cpu(login->login_rsp_len) < rsp_len ||
+	    rsp_len <= be32_to_cpu(login_rsp->off_txsubm_subcrqs) ||
+	    rsp_len <= be32_to_cpu(login_rsp->off_rxadd_subcrqs) ||
+	    rsp_len <= be32_to_cpu(login_rsp->off_rxadd_buff_size) ||
+	    rsp_len <= be32_to_cpu(login_rsp->off_supp_tx_desc)) {
+		/* This can happen if a login request times out and there are
+		 * 2 outstanding login requests sent, the LOGIN_RSP crq
+		 * could have been for the older login request. So we are
+		 * parsing the newer response buffer which may be incomplete
+		 */
+		dev_err(dev, "FATAL: Login rsp offsets/lengths invalid\n");
+		ibmvnic_reset(adapter, VNIC_RESET_FATAL);
+		return -EIO;
+	}
+
 	size_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
 		be32_to_cpu(adapter->login_rsp_buf->off_rxadd_buff_size));
 	/* variable buffer sizes are not supported, so just read the