From patchwork Mon Feb 15 12:29:48 2021
X-Patchwork-Submitter: "Wenzel, Marco"
X-Patchwork-Id: 12088107
X-Patchwork-Delegate: kuba@kernel.org
From: "Wenzel, Marco"
To: George McCollister
CC: "netdev@vger.kernel.org"
Subject: AW: HSR/PRP sequence counter issue with Cisco Redbox
Date: Mon, 15 Feb 2021 12:29:48 +0000
Message-ID: <11291f9b05764307b660049e2290dd10@EXCH-SVR2013.eberle.local>
References: <69ec2fd1a9a048e8b3305a4bc36aad01@EXCH-SVR2013.eberle.local>
X-Mailing-List: netdev@vger.kernel.org

> On Wed, Jan 27, 2021 at 6:32 AM Wenzel, Marco wrote:
> >
> > Hi,
> >
> > we have figured out an issue with the current PRP driver when trying to
> > communicate with Cisco IE 2000 industrial Ethernet switches in Redbox
> > mode.
> > The Cisco always resets the HSR/PRP sequence counter to "1" at low
> > traffic (<= 1 frame in 400 ms). It can be reproduced by a simple ICMP echo
> > request with 1 s interval between a Linux box running with PRP and a VDAN
> > behind the Cisco Redbox. The Linux box then always receives frames with
> > sequence counter "1" and drops them. The behavior is not configurable at
> > the Cisco Redbox.
> >
> > I fixed it by ignoring sequence counters with value "1" at the sequence
> > counter check in hsr_register_frame_out ():
> >
> > diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
> > index 5c97de459905..630c238e81f0 100644
> > --- a/net/hsr/hsr_framereg.c
> > +++ b/net/hsr/hsr_framereg.c
> > @@ -411,7 +411,7 @@ void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
> >  int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
> >  			   u16 sequence_nr)
> >  {
> > -	if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]))
> > +	if (seq_nr_before_or_eq(sequence_nr,
> > +	    node->seq_out[port->type]) && (sequence_nr != 1))
> >  		return 1;
> >
> >  	node->seq_out[port->type] = sequence_nr;
> >
> >
> > Do you think this could be a solution? Should this patch be officially
> > applied in order to avoid other users running into these communication
> > issues?
>
> This isn't the correct way to solve the problem. IEC 62439-3 defines
> EntryForgetTime as "Time after which an entry is removed from the duplicate
> table" with a value of 400ms and states devices should usually be configured
> to keep entries in the table for a much shorter time. hsr_framereg.c needs to
> be reworked to handle this according to the specification.

Sorry for the delay; I did not have time to take a closer look at the problem
until now.

My suggestion for the EntryForgetTime feature is the following: a time_out
array (one entry per port type) is added to struct hsr_node, and
hsr_register_frame_out() stores the current time in it whenever a frame is
accepted.
If the last stored time is more than EntryForgetTime (400 ms) in the past,
the sequence number check is skipped and the frame is accepted. This approach
works fine with the Cisco IE 2000, and I think it handles sequence numbers
the way IEC 62439-3 defines.

Regards,
Marco Wenzel

> >
> > Thanks
> > Marco Wenzel
>
> Regards,
> George McCollister

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 5c97de459905..a97bffbd2581 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -164,8 +164,10 @@ static struct hsr_node *hsr_add_node(struct hsr_priv *hsr,
 	 * as initialization. (0 could trigger an spurious ring error warning).
 	 */
 	now = jiffies;
-	for (i = 0; i < HSR_PT_PORTS; i++)
+	for (i = 0; i < HSR_PT_PORTS; i++) {
 		new_node->time_in[i] = now;
+		new_node->time_out[i] = now;
+	}
 	for (i = 0; i < HSR_PT_PORTS; i++)
 		new_node->seq_out[i] = seq_out;
 
@@ -411,9 +413,12 @@ void hsr_register_frame_in(struct hsr_node *node, struct hsr_port *port,
 int hsr_register_frame_out(struct hsr_port *port, struct hsr_node *node,
 			   u16 sequence_nr)
 {
-	if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]))
+	if (seq_nr_before_or_eq(sequence_nr, node->seq_out[port->type]) &&
+	    time_is_after_jiffies(node->time_out[port->type] + msecs_to_jiffies(HSR_ENTRY_FORGET_TIME))) {
 		return 1;
+	}
 
+	node->time_out[port->type] = jiffies;
 	node->seq_out[port->type] = sequence_nr;
 	return 0;
 }
diff --git a/net/hsr/hsr_framereg.h b/net/hsr/hsr_framereg.h
index 86b43f539f2c..d9628e7a5f05 100644
--- a/net/hsr/hsr_framereg.h
+++ b/net/hsr/hsr_framereg.h
@@ -75,6 +75,7 @@ struct hsr_node {
 	enum hsr_port_type	addr_B_port;
 	unsigned long		time_in[HSR_PT_PORTS];
 	bool			time_in_stale[HSR_PT_PORTS];
+	unsigned long		time_out[HSR_PT_PORTS];
 	/* if the node is a SAN */
 	bool			san_a;
 	bool			san_b;
diff --git a/net/hsr/hsr_main.h b/net/hsr/hsr_main.h
index 7dc92ce5a134..f79ca55d6986 100644
--- a/net/hsr/hsr_main.h
+++ b/net/hsr/hsr_main.h
@@ -21,6 +21,7 @@
 #define HSR_LIFE_CHECK_INTERVAL		 2000 /* ms */
 #define HSR_NODE_FORGET_TIME		60000 /* ms */
 #define HSR_ANNOUNCE_INTERVAL		  100 /* ms */
+#define HSR_ENTRY_FORGET_TIME		  400 /* ms */
 
 /* By how much may slave1 and slave2 timestamps of latest received frame from
  * each node differ before we notify of communication problem?
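
For anyone who wants to try the duplicate check outside the kernel, the
logic of the patched hsr_register_frame_out() can be modelled in userspace
roughly as below. This is only a sketch under assumptions: struct dup_entry,
register_frame_out(), seq_before_or_eq() and example_cisco_reset() are
hypothetical names, time is a plain millisecond counter instead of jiffies,
and the sequence comparison is a simplified version of the kernel's
seq_nr_before_or_eq() (it ignores the half-range corner case).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors HSR_ENTRY_FORGET_TIME from the patch (milliseconds). */
#define ENTRY_FORGET_TIME_MS 400u

/* Hypothetical stand-in for one port's seq_out/time_out in struct hsr_node. */
struct dup_entry {
	uint16_t seq_out;  /* sequence number of the last accepted frame */
	uint32_t time_out; /* time in ms when that frame was accepted */
};

/* Simplified, wraparound-safe "a is before or equal to b" for 16-bit
 * sequence numbers (assumption: not the kernel helper itself).
 */
static bool seq_before_or_eq(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(b - a) >= 0;
}

/* Returns true if the frame must be dropped as a duplicate. */
bool register_frame_out(struct dup_entry *e, uint16_t seq, uint32_t now_ms)
{
	/* Unsigned subtraction stays correct when the ms counter wraps. */
	bool entry_fresh = (uint32_t)(now_ms - e->time_out) < ENTRY_FORGET_TIME_MS;

	/* Only trust the stored sequence number while the entry is fresh. */
	if (seq_before_or_eq(seq, e->seq_out) && entry_fresh)
		return true;

	e->time_out = now_ms;
	e->seq_out = seq;
	return false;
}

/* Replays the low-traffic case from this thread: the sender restarts its
 * sequence counter at 1 after more than 400 ms of silence.
 */
void example_cisco_reset(void)
{
	struct dup_entry e = { .seq_out = 100, .time_out = 0 };

	assert(!register_frame_out(&e, 101, 10));  /* new frame: accepted */
	assert(register_frame_out(&e, 101, 12));   /* copy from other LAN: dropped */
	assert(!register_frame_out(&e, 1, 1012));  /* counter reset, entry stale: accepted */
	assert(register_frame_out(&e, 1, 1013));   /* copy of the reset frame: dropped */
}
```

Calling example_cisco_reset() from a test program exercises the case
described above. Without the entry_fresh term the third call would be
treated as a duplicate, which is the drop seen with the Cisco Redbox.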