[v3,net-next,11/14] af_unix: Assign a unique index to SCC.

Message ID 20240223214003.17369-12-kuniyu@amazon.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series af_unix: Rework GC.

Checks

Context                         Check    Description
netdev/series_format            success  Posting correctly formatted
netdev/tree_selection           success  Clearly marked for net-next
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 982 this patch: 982
netdev/build_tools              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           success  CCed 5 of 5 maintainers
netdev/build_clang              success  Errors and warnings before: 958 this patch: 958
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 1003 this patch: 1003
netdev/checkpatch               warning  WARNING: line length of 91 exceeds 80 columns
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-02-27--03-00 (tests: 1456)

Commit Message

Kuniyuki Iwashima Feb. 23, 2024, 9:40 p.m. UTC
The definition of the lowlink in Tarjan's algorithm is the
smallest index of a vertex that is reachable with at most one
back-edge in the SCC.  This is not useful for a cross-edge.

If we start traversing from A in the following graph, the final
lowlink of D is 3.  The cross-edge here is the one from D to C.

  A -> B -> D   D = (4, 3)  (index, lowlink)
  ^    |    |   C = (3, 1)
  |    V    |   B = (2, 1)
  `--- C <--'   A = (1, 1)

This is because the lowlink of D is updated with the index of C.

In the following patch, we detect a dead SCC by checking two
conditions for each vertex.

  1) vertex has no edge directed to another SCC (no bridge)
  2) vertex's out_degree is the same as the refcount of its file

If 1) is false, there is a receiver of all fds of the SCC and
its ancestor SCC.

To evaluate 1), we need to assign a unique index to each SCC and
assign it to all vertices in the SCC.

This patch changes the lowlink update logic for cross-edge so
that in the example above, the lowlink of D is updated with the
lowlink of C.

  A -> B -> D   D = (4, 1)  (index, lowlink)
  ^    |    |   C = (3, 1)
  |    V    |   B = (2, 1)
  `--- C <--'   A = (1, 1)

Then, all vertices in the same SCC have the same lowlink, and we
can quickly find a bridge connecting to a different SCC, if one exists.

However, it is no longer called lowlink, so we rename it to
scc_index.  (It's sometimes called lowpoint.)

Also, we add a global variable to hold the last index used in DFS
so that we do not reset the initial index in each DFS.

This patch could be squashed into the SCC detection patch but is
split out deliberately for anyone wondering why lowlink is not used
as it is in the original Tarjan's algorithm and many reference
implementations.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
---
 include/net/af_unix.h |  2 +-
 net/unix/garbage.c    | 15 ++++++++-------
 2 files changed, 9 insertions(+), 8 deletions(-)
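
To make the two diagrams in the commit message concrete, here is a minimal user-space sketch of a recursive Tarjan walk over the same four-vertex graph. It is not the kernel code: the names (`walk`, `run_from_a`, `successors`, the `cross_edge_uses_lowlink` flag) are made up for illustration, and the start index of 1 is chosen only to match the diagrams. The flag switches the update for an edge to an already-visited, still-on-stack vertex between the classic rule (use the successor's index) and the rule this patch introduces (use the successor's lowlink).

```c
#include <stdbool.h>
#include <string.h>

enum { A, B, C, D, NR_VERTICES };

struct vertex {
	unsigned long index;	/* 0 means "not visited yet" */
	unsigned long lowlink;
	bool on_stack;
};

/* Successor lists, -1 terminated.  B lists C before D so C gets index 3. */
static const int successors[NR_VERTICES][3] = {
	[A] = { B, -1 },
	[B] = { C, D, -1 },
	[C] = { A, -1 },
	[D] = { C, -1 },
};

static struct vertex vertices[NR_VERTICES];
static int stack[NR_VERTICES];
static int stack_depth;
static unsigned long next_index;

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static void walk(int u, bool cross_edge_uses_lowlink)
{
	struct vertex *v = &vertices[u];

	v->index = v->lowlink = next_index++;
	stack[stack_depth++] = u;
	v->on_stack = true;

	for (const int *s = successors[u]; *s >= 0; s++) {
		struct vertex *next = &vertices[*s];

		if (!next->index) {
			/* Tree edge: recurse, then inherit the child's lowlink. */
			walk(*s, cross_edge_uses_lowlink);
			v->lowlink = min_ul(v->lowlink, next->lowlink);
		} else if (next->on_stack) {
			/* Back-edge or cross-edge to a vertex still on the stack. */
			v->lowlink = min_ul(v->lowlink,
					    cross_edge_uses_lowlink ?
					    next->lowlink : next->index);
		}
	}

	/* v is the root of an SCC: pop the whole SCC off the stack. */
	if (v->index == v->lowlink) {
		int w;

		do {
			w = stack[--stack_depth];
			vertices[w].on_stack = false;
		} while (w != u);
	}
}

/* Run one DFS from A and copy out the final lowlink of every vertex. */
static void run_from_a(bool cross_edge_uses_lowlink,
		       unsigned long lowlink[NR_VERTICES])
{
	memset(vertices, 0, sizeof(vertices));
	stack_depth = 0;
	next_index = 1;

	walk(A, cross_edge_uses_lowlink);

	for (int i = 0; i < NR_VERTICES; i++)
		lowlink[i] = vertices[i].lowlink;
}
```

With `cross_edge_uses_lowlink` set to false, D ends with lowlink 3 as in the first diagram; with it set to true, all four vertices end with 1 as in the second.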

Comments

Paolo Abeni Feb. 27, 2024, 11:19 a.m. UTC | #1
On Fri, 2024-02-23 at 13:40 -0800, Kuniyuki Iwashima wrote:
> diff --git a/include/net/af_unix.h b/include/net/af_unix.h
> index ec040caaa4b5..696d997a5ac9 100644
> --- a/include/net/af_unix.h
> +++ b/include/net/af_unix.h
> @@ -36,7 +36,7 @@ struct unix_vertex {
>  	struct list_head scc_entry;
>  	unsigned long out_degree;
>  	unsigned long index;
> -	unsigned long lowlink;
> +	unsigned long scc_index;
>  };
>  
>  struct unix_edge {
> diff --git a/net/unix/garbage.c b/net/unix/garbage.c
> index 1d9a0498dec5..0eb1610c96d7 100644
> --- a/net/unix/garbage.c
> +++ b/net/unix/garbage.c
> @@ -308,18 +308,18 @@ static bool unix_scc_cyclic(struct list_head *scc)
>  
>  static LIST_HEAD(unix_visited_vertices);
>  static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
> +static unsigned long unix_vertex_last_index = UNIX_VERTEX_INDEX_START;
>  
>  static void __unix_walk_scc(struct unix_vertex *vertex)
>  {
> -	unsigned long index = UNIX_VERTEX_INDEX_START;
>  	LIST_HEAD(vertex_stack);
>  	struct unix_edge *edge;
>  	LIST_HEAD(edge_stack);
>  
>  next_vertex:
> -	vertex->index = index;
> -	vertex->lowlink = index;
> -	index++;
> +	vertex->index = unix_vertex_last_index;
> +	vertex->scc_index = unix_vertex_last_index;
> +	unix_vertex_last_index++;
>  
>  	list_add(&vertex->scc_entry, &vertex_stack);
>  
> @@ -342,13 +342,13 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
>  
>  			vertex = edge->predecessor->vertex;
>  
> -			vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink);
> +			vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index);
>  		} else if (next_vertex->index != unix_vertex_grouped_index) {
> -			vertex->lowlink = min(vertex->lowlink, next_vertex->index);
> +			vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index);

I guess the above will break when unix_vertex_last_index wraps around,
or am I low on coffee? (I guess there is no such thing as enough
coffee to let me review this whole series at once ;)

Can we expect a wrap-around on a host with a (surprisingly very) long
uptime?

Thanks,

Paolo
Kuniyuki Iwashima Feb. 28, 2024, 3:05 a.m. UTC | #2
From: Paolo Abeni <pabeni@redhat.com>
Date: Tue, 27 Feb 2024 12:19:40 +0100
> I guess the above will break when unix_vertex_last_index wraps around,
> or am I low on coffee? (I guess there is no such thing as enough
> coffee to let me review this whole series at once ;)
>
> Can we expect a wrap-around on a host with a (surprisingly very) long
> uptime?

Then, the number of inflight AF_UNIX sockets is at least 2^64 - 1.
After this series, struct unix_sock is 1024 bytes, so... the host
would have roughly

  2^10 * 2^64 == 2^74 bytes == 2^34 TiB == 17179869184 TiB

memory!

So, we need not expect a wrap around :)
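
The arithmetic above can be checked with a small sketch. Since 2^74 does not fit in a 64-bit integer, it works on exponents rather than on byte counts. The helper names are hypothetical; the 1024-byte size of struct unix_sock is taken from the message above.

```c
/*
 * log2 of the memory needed to hold 2^index_bits sockets of
 * 2^sock_size_log2 bytes each.
 */
static unsigned int log2_total_bytes(unsigned int index_bits,
				     unsigned int sock_size_log2)
{
	return index_bits + sock_size_log2;
}

/* Convert a power-of-two byte count (>= 2^40) to TiB; 1 TiB == 2^40 bytes. */
static unsigned long long log2_bytes_to_tib(unsigned int log2_bytes)
{
	return 1ULL << (log2_bytes - 40);
}
```

For a 64-bit index and 2^10-byte sockets, `log2_total_bytes(64, 10)` is 74 and `log2_bytes_to_tib(74)` is 17179869184.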
Paolo Abeni Feb. 28, 2024, 7:49 a.m. UTC | #3
On Tue, 2024-02-27 at 19:05 -0800, Kuniyuki Iwashima wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Tue, 27 Feb 2024 12:19:40 +0100
> > I guess the above will break when unix_vertex_last_index wraps around,
> > or am I low on coffee? (I guess there is no such thing as enough
> > coffee to let me review this whole series at once ;)
> >
> > Can we expect a wrap-around on a host with a (surprisingly very) long
> > uptime?
> 
> Then, the number of inflight AF_UNIX sockets is at least 2^64 - 1.

Isn't the "unix_vertex_last_index" value preserved across consecutive gc
runs? I thought we could reach wrap-around after a lot of gc runs...

Cheers, 

Paolo
Kuniyuki Iwashima Feb. 28, 2024, 4:25 p.m. UTC | #4
From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 28 Feb 2024 08:49:46 +0100
> On Tue, 2024-02-27 at 19:05 -0800, Kuniyuki Iwashima wrote:
> > From: Paolo Abeni <pabeni@redhat.com>
> > Date: Tue, 27 Feb 2024 12:19:40 +0100
> > > I guess the above will break when unix_vertex_last_index wraps around,
> > > or am I low on coffee? (I guess there is no such thing as enough
> > > coffee to let me review this whole series at once ;)
> > >
> > > Can we expect a wrap-around on a host with a (surprisingly very) long
> > > uptime?
> > 
> > Then, the number of inflight AF_UNIX sockets is at least 2^64 - 1.
> 
> Isn't the "unix_vertex_last_index" value preserved across consecutive gc
> runs? I thought we could reach wrap-around after a lot of gc runs...

It's preserved across consecutive DFS walks within a single gc run,
but unix_walk_scc() always resets it.  So, if it wrapped, there
would have to be too many sockets.

I used unix_vertex_last_index elsewhere in the initial draft,
but now a local variable could be better here.
Paolo Abeni Feb. 28, 2024, 5:51 p.m. UTC | #5
On Wed, 2024-02-28 at 08:25 -0800, Kuniyuki Iwashima wrote:
> From: Paolo Abeni <pabeni@redhat.com>
> Date: Wed, 28 Feb 2024 08:49:46 +0100
> > On Tue, 2024-02-27 at 19:05 -0800, Kuniyuki Iwashima wrote:
> > > From: Paolo Abeni <pabeni@redhat.com>
> > > Date: Tue, 27 Feb 2024 12:19:40 +0100
> > > > I guess the above will break when unix_vertex_last_index wraps around,
> > > > or am I low on coffee? (I guess there is no such thing as enough
> > > > coffee to let me review this whole series at once ;)
> > > >
> > > > Can we expect a wrap-around on a host with a (surprisingly very) long
> > > > uptime?
> > > 
> > > Then, the number of inflight AF_UNIX sockets is at least 2^64 - 1.
> > 
> > Isn't the "unix_vertex_last_index" value preserved across consecutive gc
> > runs? I thought we could reach wrap-around after a lot of gc runs...
> 
> It's preserved across consecutive DFS walks within a single gc run,
> but unix_walk_scc() always resets it.  So, if it wrapped, there
> would have to be too many sockets.

Ah, I missed that point. No wrap-around problem then!

> I used unix_vertex_last_index elsewhere in the initial draft,
> but now a local variable could be better here.

You could bundle the index, hitlist, etc. in a single struct (gs_state
or whatever) and pass around a single argument, if that helps.

Cheers,

Paolo
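
One way to read the suggestion above is sketched below, as an assumption rather than what the series ends up doing. All names (`struct gc_state`, `gc_state_init`, `gc_state_next_index`, `gc_run`, `VERTEX_INDEX_START`) are hypothetical, and `nr_visited` is only a stand-in for other per-run state such as the hitlist. The point it illustrates is the one settled in the thread: the index counter is reset at the start of every gc run, so within a run it can only grow by the number of vertices visited.

```c
#include <stddef.h>

/* Hypothetical stand-in for the patch's UNIX_VERTEX_INDEX_START. */
#define VERTEX_INDEX_START 1UL

/* Per-run state passed around as a single argument instead of globals. */
struct gc_state {
	unsigned long last_index;	/* next index to hand out in this run */
	unsigned long nr_visited;	/* placeholder for other per-run state */
};

static void gc_state_init(struct gc_state *st)
{
	st->last_index = VERTEX_INDEX_START;
	st->nr_visited = 0;
}

static unsigned long gc_state_next_index(struct gc_state *st)
{
	st->nr_visited++;
	return st->last_index++;
}

/*
 * One gc run over n vertices: the state lives on the stack, so every run
 * starts again from VERTEX_INDEX_START.  Returns the number of vertices
 * that were given an index.
 */
static unsigned long gc_run(unsigned long *indices, size_t n)
{
	struct gc_state st;

	gc_state_init(&st);

	for (size_t i = 0; i < n; i++)
		indices[i] = gc_state_next_index(&st);

	return st.nr_visited;
}
```

Calling `gc_run()` twice on the same array hands out the same indices both times, which is why the counter cannot accumulate across runs.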
Patch

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index ec040caaa4b5..696d997a5ac9 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -36,7 +36,7 @@  struct unix_vertex {
 	struct list_head scc_entry;
 	unsigned long out_degree;
 	unsigned long index;
-	unsigned long lowlink;
+	unsigned long scc_index;
 };
 
 struct unix_edge {
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 1d9a0498dec5..0eb1610c96d7 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -308,18 +308,18 @@  static bool unix_scc_cyclic(struct list_head *scc)
 
 static LIST_HEAD(unix_visited_vertices);
 static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
+static unsigned long unix_vertex_last_index = UNIX_VERTEX_INDEX_START;
 
 static void __unix_walk_scc(struct unix_vertex *vertex)
 {
-	unsigned long index = UNIX_VERTEX_INDEX_START;
 	LIST_HEAD(vertex_stack);
 	struct unix_edge *edge;
 	LIST_HEAD(edge_stack);
 
 next_vertex:
-	vertex->index = index;
-	vertex->lowlink = index;
-	index++;
+	vertex->index = unix_vertex_last_index;
+	vertex->scc_index = unix_vertex_last_index;
+	unix_vertex_last_index++;
 
 	list_add(&vertex->scc_entry, &vertex_stack);
 
@@ -342,13 +342,13 @@  static void __unix_walk_scc(struct unix_vertex *vertex)
 
 			vertex = edge->predecessor->vertex;
 
-			vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink);
+			vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index);
 		} else if (next_vertex->index != unix_vertex_grouped_index) {
-			vertex->lowlink = min(vertex->lowlink, next_vertex->index);
+			vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index);
 		}
 	}
 
-	if (vertex->index == vertex->lowlink) {
+	if (vertex->index == vertex->scc_index) {
 		struct list_head scc;
 
 		__list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
@@ -371,6 +371,7 @@  static void __unix_walk_scc(struct unix_vertex *vertex)
 
 static void unix_walk_scc(void)
 {
+	unix_vertex_last_index = UNIX_VERTEX_INDEX_START;
 	unix_graph_maybe_cyclic = false;
 
 	while (!list_empty(&unix_unvisited_vertices)) {
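
The commit message says the per-SCC index exists so that the following patch can evaluate condition 1), "vertex has no edge directed to another SCC". That patch is not shown here, so the following is only a sketch of how such a check could look, assuming every vertex already carries the scc_index of its SCC. The struct is a simplified, hypothetical one with a plain successor array, not the kernel's struct unix_vertex and struct unix_edge.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vertex; not the kernel's struct unix_vertex. */
struct vertex {
	unsigned long scc_index;	/* same value for all vertices of one SCC */
	const struct vertex **succ;	/* targets of the outgoing edges */
	size_t nr_succ;
};

/* True if some outgoing edge of v leads into a different SCC. */
static bool vertex_has_bridge(const struct vertex *v)
{
	for (size_t i = 0; i < v->nr_succ; i++) {
		if (v->succ[i]->scc_index != v->scc_index)
			return true;
	}

	return false;
}
```

With a shared scc_index the check is a single comparison per edge, which is what "quickly find the bridge" in the commit message refers to.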