From patchwork Tue Jan 30 19:38:25 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dai Ngo X-Patchwork-Id: 13537937 Received: from mx0b-00069f02.pphosted.com (mx0b-00069f02.pphosted.com [205.220.177.32]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2A80D7B3EC for ; Tue, 30 Jan 2024 19:38:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=205.220.177.32 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706643534; cv=none; b=pGn5JEc/umsL47FRwAaTMevk3T9kiRx7cUTU+S7B1kcuXtVYq4XmhbRPk7JXR/hMMf8ub2EZ8+Br2IzNavXKdaPVOQFgrcIRAo2im9oLSod+HLcMeEhfozpZ4RbrO0bq7J6JU9BgRxPWv5WxMejxk/sp76zuwaEIppTwVew6vvE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706643534; c=relaxed/simple; bh=9MYA6V2gU7AHNQgLkhr65hPnycylgt6TBZ4qL58HX/c=; h=From:To:Subject:Date:Message-Id; b=MzF/w8mxP5orgsXP6qtloY/fUKrc7mDri+UZJ894eQ8ilQglddVcZmhzKqGk37d62mJeMrH/eWSDvOsoiRDxtCWWWW6sk66D7vOI6KEQtk0AOxfDMg1zL8qRteGF7nhjsdFyvnAIfmFdZNkKj2j2q0MGEvWt1NZpHZ0qbsSaOz4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=oracle.com; spf=pass smtp.mailfrom=oracle.com; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b=Ms3nGWke; arc=none smtp.client-ip=205.220.177.32 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=oracle.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=oracle.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="Ms3nGWke" Received: from pps.filterd (m0246630.ppops.net [127.0.0.1]) by mx0b-00069f02.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 40UHkFtO030403 for ; Tue, 30 Jan 2024 19:38:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : subject : date : message-id; s=corp-2023-11-20; bh=jFyIgWO3dOimRNFuWPzZEd5wN0UaVsyKBO9L81hG5QE=; b=Ms3nGWke6AJ3txkIJ4QbE1LdqXjIn6KOIrlnamzdXuP3ngpY9yXOyqdEDv+JVgJorI33 4zLYgzqhnqGiTTgH2c5B+yThvK2ZTmqL+w8xu2Sw3lPggcBmjnOuJc14zgyBLLQ6dKru pklKD9B8ETurAGYGHsGOuwimzPrup1mwZacFFXxBdvEQF1aDIs9TY6we6M+Zwe+ElxYE freyrnxa+2SsHo1gLfCKKznOFZ8W53i7DnE5DUxI1tacykkB0GNP4pUYRf/vFSKgaI57 jxNef2oJnWKdk2QwAoGXIC5L6XVQUKgjuhqsTLPAERDW27+RUEFhYK7nErzq/Tp+ECE4 TA== Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.appoci.oracle.com [147.154.114.232]) by mx0b-00069f02.pphosted.com (PPS) with ESMTPS id 3vvr8efr6q-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Tue, 30 Jan 2024 19:38:50 +0000 Received: from pps.filterd (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (8.17.1.19/8.17.1.19) with ESMTP id 40UIXFOH031449 for ; Tue, 30 Jan 2024 19:38:38 GMT Received: from pps.reinject (localhost [127.0.0.1]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTPS id 3vvr97xqgv-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK) for ; Tue, 30 Jan 2024 19:38:38 +0000 Received: from phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 40UJcbCA028891 for ; Tue, 30 Jan 2024 19:38:37 GMT Received: from ca-common-hq.us.oracle.com (ca-common-hq.us.oracle.com [10.211.9.209]) by phxpaimrmta02.imrmtpd1.prodappphxaev1.oraclevcn.com (PPS) with ESMTP id 3vvr97xqgk-1; Tue, 30 Jan 2024 19:38:37 +0000 From: Dai Ngo To: linux-nfs@vger.kernel.org Subject: [PATCH 1/1] SUNRPC: increase size of rpc_wait_queue.qlen from unsigned short to unsigned int Date: Tue, 30 Jan 2024 11:38:25 -0800 Message-Id: <1706643505-31122-1-git-send-email-dai.ngo@oracle.com> X-Mailer: git-send-email 1.8.3.1 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1011,Hydra:6.0.619,FMLib:17.11.176.26 definitions=2024-01-30_09,2024-01-30_01,2023-05-22_02 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 bulkscore=0 malwarescore=0 mlxscore=0 mlxlogscore=999 spamscore=0 adultscore=0 suspectscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2311290000 definitions=main-2401300146 X-Proofpoint-ORIG-GUID: 6RfuCsy0Baf6f1Z4RHlTgkMIDsxKMK0D X-Proofpoint-GUID: 6RfuCsy0Baf6f1Z4RHlTgkMIDsxKMK0D Precedence: bulk X-Mailing-List: linux-nfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: When the NFS client is under extreme load the rpc_wait_queue.qlen counter can be overflowed. Here is an instant of the backlog queue overflow in a real world environment shown by drgn helper: rpc_task_stats(rpc_clnt): ------------------------- rpc_clnt: 0xffff92b65d2bae00 rpc_xprt: 0xffff9275db64f000 Queue: sending[64887] pending[524] backlog[30441] binding[0] XMIT task: 0xffff925c6b1d8e98 WRITE: 750654 __dta_call_status_580: 65463 __dta_call_transmit_status_579: 1 call_reserveresult: 685189 nfs_client_init_is_complete: 1 COMMIT: 584 call_reserveresult: 573 __dta_call_status_580: 11 ACCESS: 1 __dta_call_status_580: 1 GETATTR: 10 __dta_call_status_580: 4 call_reserveresult: 6 751249 tasks for server 111.222.333.444 Total tasks: 751249 count_rpc_wait_queues(xprt): ---------------------------- **** rpc_xprt: 0xffff9275db64f000 num_reqs: 65511 wait_queue: xprt_binding[0] cnt: 0 wait_queue: xprt_binding[1] cnt: 0 wait_queue: xprt_binding[2] cnt: 0 wait_queue: xprt_binding[3] cnt: 0 rpc_wait_queue[xprt_binding].qlen: 0 maxpriority: 0 wait_queue: xprt_sending[0] cnt: 0 wait_queue: xprt_sending[1] cnt: 64887 wait_queue: xprt_sending[2] cnt: 0 wait_queue: xprt_sending[3] cnt: 0 rpc_wait_queue[xprt_sending].qlen: 64887 maxpriority: 3 wait_queue: xprt_pending[0] cnt: 524 wait_queue: xprt_pending[1] cnt: 0 wait_queue: xprt_pending[2] cnt: 0 wait_queue: xprt_pending[3] cnt: 0 rpc_wait_queue[xprt_pending].qlen: 524 maxpriority: 0 wait_queue: xprt_backlog[0] cnt: 0 wait_queue: xprt_backlog[1] cnt: 685801 wait_queue: xprt_backlog[2] cnt: 0 wait_queue: xprt_backlog[3] cnt: 0 rpc_wait_queue[xprt_backlog].qlen: 30441 maxpriority: 3 [task cnt mismatch] There is no effect on operations when this overflow occurs. However it causes confusion when trying to diagnose the performance problem. Signed-off-by: Dai Ngo Reviewed-by: Jeff Layton --- include/linux/sunrpc/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 8ada7dc802d3..8f9bee0e21c3 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -186,7 +186,7 @@ struct rpc_wait_queue { unsigned char maxpriority; /* maximum priority (0 if queue is not a priority queue) */ unsigned char priority; /* current priority */ unsigned char nr; /* # tasks remaining for cookie */ - unsigned short qlen; /* total # tasks waiting in queue */ + unsigned int qlen; /* total # tasks waiting in queue */ struct rpc_timer timer_list; #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) || IS_ENABLED(CONFIG_TRACEPOINTS) const char * name;