mbox series

[RFC,bpf-next,v1,0/8] Pinning bpf objects outside bpffs

Message ID 20220106215059.2308931-1-haoluo@google.com (mailing list archive)
Headers show
Series Pinning bpf objects outside bpffs | expand

Message

Hao Luo Jan. 6, 2022, 9:50 p.m. UTC
Bpffs is a pseudo file system that persists bpf objects. Previously
bpf objects can only be pinned in bpffs, this patchset extends pinning
to allow bpf objects to be pinned (or exposed) to other file systems.

In particular, this patchset allows pinning bpf objects in kernfs. This
creates a new file entry in the kernfs file system and the created file
is able to reference the bpf object. By doing so, bpf can be used to
customize the file's operations, such as seq_show.

As a concrete usecase of this feature, this patchset introduces a
simple new program type called 'bpf_view', which can be used to format
a seq file by a kernel object's state. By pinning a bpf_view program
into a cgroup directory, userspace is able to read the cgroup's state
from file in a format defined by the bpf program.

Different from bpffs, kernfs doesn't have a callback when a kernfs node
is freed, which is problem if we allow the kernfs node to hold an extra
reference of the bpf object, because there is no chance to dec the
object's refcnt. Therefore the kernfs node created by pinning doesn't
hold reference of the bpf object. The lifetime of the kernfs node
depends on the lifetime of the bpf object. Rather than "pinning in
kernfs", it is "exposing to kernfs". We require the bpf object to be
pinned in bpffs first before it can be pinned in kernfs. When the
object is unpinned from bpffs, their kernfs nodes will be removed
automatically. This somehow treats a pinned bpf object as a persistent
"device".

We rely on fsnotify to monitor the inode events in bpffs. A new function
bpf_watch_inode() is introduced. It allows registering a callback
function at inode destruction. For the kernfs case, a callback that
removes kernfs node is registered at the destruction of bpffs inodes.
For other file systems such as sockfs, bpf_watch_inode() can monitor the
destruction of sockfs inodes and the created file entry can hold the bpf
object's reference. In this case, it is truly "pinning".

File operations other than seq_show can also be implemented using bpf.
For example, bpf may be of help for .poll and .mmap in kernfs.

Patch organization:
 - patch 1/8 and 2/8 are preparations. 1/8 implements bpf_watch_inode();
   2/8 records bpffs inode in bpf object.
 - patch 3/8 and 4/8 implement generic logic for creating bpf backed
   kernfs file.
 - patch 5/8 and 6/8 add a new program type for formatting output.
 - patch 7/8 implements cgroup seq_show operation using bpf.
 - patch 8/8 adds selftest.

Hao Luo (8):
  bpf: Support pinning in non-bpf file system.
  bpf: Record back pointer to the inode in bpffs
  bpf: Expose bpf object in kernfs
  bpf: Support removing kernfs entries
  bpf: Introduce a new program type bpf_view.
  libbpf: Support of bpf_view prog type.
  bpf: Add seq_show operation for bpf in cgroupfs
  selftests/bpf: Test exposing bpf objects in kernfs

 include/linux/bpf.h                           |   9 +-
 include/uapi/linux/bpf.h                      |   2 +
 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/bpf_view.c                         | 190 ++++++++++++++
 kernel/bpf/bpf_view.h                         |  25 ++
 kernel/bpf/inode.c                            | 219 ++++++++++++++--
 kernel/bpf/inode.h                            |  54 ++++
 kernel/bpf/kernfs_node.c                      | 165 ++++++++++++
 kernel/bpf/syscall.c                          |   3 +
 kernel/bpf/verifier.c                         |   6 +
 kernel/trace/bpf_trace.c                      |  12 +-
 tools/include/uapi/linux/bpf.h                |   2 +
 tools/lib/bpf/libbpf.c                        |  21 ++
 .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
 .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
 15 files changed, 995 insertions(+), 32 deletions(-)
 create mode 100644 kernel/bpf/bpf_view.c
 create mode 100644 kernel/bpf/bpf_view.h
 create mode 100644 kernel/bpf/inode.h
 create mode 100644 kernel/bpf/kernfs_node.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
 create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c

Comments

Stanislav Fomichev Jan. 6, 2022, 11:02 p.m. UTC | #1
On 01/06, Hao Luo wrote:
> Bpffs is a pseudo file system that persists bpf objects. Previously
> bpf objects can only be pinned in bpffs, this patchset extends pinning
> to allow bpf objects to be pinned (or exposed) to other file systems.

> In particular, this patchset allows pinning bpf objects in kernfs. This
> creates a new file entry in the kernfs file system and the created file
> is able to reference the bpf object. By doing so, bpf can be used to
> customize the file's operations, such as seq_show.

> As a concrete usecase of this feature, this patchset introduces a
> simple new program type called 'bpf_view', which can be used to format
> a seq file by a kernel object's state. By pinning a bpf_view program
> into a cgroup directory, userspace is able to read the cgroup's state
> from file in a format defined by the bpf program.

> Different from bpffs, kernfs doesn't have a callback when a kernfs node
> is freed, which is problem if we allow the kernfs node to hold an extra
> reference of the bpf object, because there is no chance to dec the
> object's refcnt. Therefore the kernfs node created by pinning doesn't
> hold reference of the bpf object. The lifetime of the kernfs node
> depends on the lifetime of the bpf object. Rather than "pinning in
> kernfs", it is "exposing to kernfs". We require the bpf object to be
> pinned in bpffs first before it can be pinned in kernfs. When the
> object is unpinned from bpffs, their kernfs nodes will be removed
> automatically. This somehow treats a pinned bpf object as a persistent
> "device".

> We rely on fsnotify to monitor the inode events in bpffs. A new function
> bpf_watch_inode() is introduced. It allows registering a callback
> function at inode destruction. For the kernfs case, a callback that
> removes kernfs node is registered at the destruction of bpffs inodes.
> For other file systems such as sockfs, bpf_watch_inode() can monitor the
> destruction of sockfs inodes and the created file entry can hold the bpf
> object's reference. In this case, it is truly "pinning".

> File operations other than seq_show can also be implemented using bpf.
> For example, bpf may be of help for .poll and .mmap in kernfs.

This looks awesome!

One thing I don't understand is: why did go through the pinning
interface VS regular attach/detach? IOW, why not allow regular
sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
(which, in turn, creates the kernfs nodes). Seems like this way you can drop
the requirement on the object being pinned in the bpffs first?
Yonghong Song Jan. 7, 2022, 12:30 a.m. UTC | #2
On 1/6/22 1:50 PM, Hao Luo wrote:
> Bpffs is a pseudo file system that persists bpf objects. Previously
> bpf objects can only be pinned in bpffs, this patchset extends pinning
> to allow bpf objects to be pinned (or exposed) to other file systems.
> 
> In particular, this patchset allows pinning bpf objects in kernfs. This
> creates a new file entry in the kernfs file system and the created file
> is able to reference the bpf object. By doing so, bpf can be used to
> customize the file's operations, such as seq_show.
> 
> As a concrete usecase of this feature, this patchset introduces a
> simple new program type called 'bpf_view', which can be used to format
> a seq file by a kernel object's state. By pinning a bpf_view program
> into a cgroup directory, userspace is able to read the cgroup's state
> from file in a format defined by the bpf program.
> 
> Different from bpffs, kernfs doesn't have a callback when a kernfs node
> is freed, which is problem if we allow the kernfs node to hold an extra
> reference of the bpf object, because there is no chance to dec the
> object's refcnt. Therefore the kernfs node created by pinning doesn't
> hold reference of the bpf object. The lifetime of the kernfs node
> depends on the lifetime of the bpf object. Rather than "pinning in
> kernfs", it is "exposing to kernfs". We require the bpf object to be
> pinned in bpffs first before it can be pinned in kernfs. When the
> object is unpinned from bpffs, their kernfs nodes will be removed
> automatically. This somehow treats a pinned bpf object as a persistent
> "device".

During our initial discussion for bpf_iter, we even proposed to
put cat-able files under /proc/ system. But there are some concerns
that /proc/ system holds stable APIs so people can rely on its format 
etc. Not sure kernfs here has such requirement or not.

I understand directly put files in respective target directories
(e.g., cgroup) helps. But you can also create directory hierarchy
in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
better organized.

Also regarding new program type bpf_view, I think we can reuse
bpf_iter infrastructure. E.g., we already can customize bpf_iter
for a particular map. We can certainly customize bpf_iter targeting
a particular cgroup.

> 
> We rely on fsnotify to monitor the inode events in bpffs. A new function
> bpf_watch_inode() is introduced. It allows registering a callback
> function at inode destruction. For the kernfs case, a callback that
> removes kernfs node is registered at the destruction of bpffs inodes.
> For other file systems such as sockfs, bpf_watch_inode() can monitor the
> destruction of sockfs inodes and the created file entry can hold the bpf
> object's reference. In this case, it is truly "pinning".
> 
> File operations other than seq_show can also be implemented using bpf.
> For example, bpf may be of help for .poll and .mmap in kernfs.
> 
> Patch organization:
>   - patch 1/8 and 2/8 are preparations. 1/8 implements bpf_watch_inode();
>     2/8 records bpffs inode in bpf object.
>   - patch 3/8 and 4/8 implement generic logic for creating bpf backed
>     kernfs file.
>   - patch 5/8 and 6/8 add a new program type for formatting output.
>   - patch 7/8 implements cgroup seq_show operation using bpf.
>   - patch 8/8 adds selftest.
> 
> Hao Luo (8):
>    bpf: Support pinning in non-bpf file system.
>    bpf: Record back pointer to the inode in bpffs
>    bpf: Expose bpf object in kernfs
>    bpf: Support removing kernfs entries
>    bpf: Introduce a new program type bpf_view.
>    libbpf: Support of bpf_view prog type.
>    bpf: Add seq_show operation for bpf in cgroupfs
>    selftests/bpf: Test exposing bpf objects in kernfs
> 
>   include/linux/bpf.h                           |   9 +-
>   include/uapi/linux/bpf.h                      |   2 +
>   kernel/bpf/Makefile                           |   2 +-
>   kernel/bpf/bpf_view.c                         | 190 ++++++++++++++
>   kernel/bpf/bpf_view.h                         |  25 ++
>   kernel/bpf/inode.c                            | 219 ++++++++++++++--
>   kernel/bpf/inode.h                            |  54 ++++
>   kernel/bpf/kernfs_node.c                      | 165 ++++++++++++
>   kernel/bpf/syscall.c                          |   3 +
>   kernel/bpf/verifier.c                         |   6 +
>   kernel/trace/bpf_trace.c                      |  12 +-
>   tools/include/uapi/linux/bpf.h                |   2 +
>   tools/lib/bpf/libbpf.c                        |  21 ++
>   .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
>   .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
>   15 files changed, 995 insertions(+), 32 deletions(-)
>   create mode 100644 kernel/bpf/bpf_view.c
>   create mode 100644 kernel/bpf/bpf_view.h
>   create mode 100644 kernel/bpf/inode.h
>   create mode 100644 kernel/bpf/kernfs_node.c
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
>   create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c
>
Hao Luo Jan. 7, 2022, 6:59 p.m. UTC | #3
On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
>
> On 01/06, Hao Luo wrote:
> > Bpffs is a pseudo file system that persists bpf objects. Previously
> > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > to allow bpf objects to be pinned (or exposed) to other file systems.
>
> > In particular, this patchset allows pinning bpf objects in kernfs. This
> > creates a new file entry in the kernfs file system and the created file
> > is able to reference the bpf object. By doing so, bpf can be used to
> > customize the file's operations, such as seq_show.
>
> > As a concrete usecase of this feature, this patchset introduces a
> > simple new program type called 'bpf_view', which can be used to format
> > a seq file by a kernel object's state. By pinning a bpf_view program
> > into a cgroup directory, userspace is able to read the cgroup's state
> > from file in a format defined by the bpf program.
>
> > Different from bpffs, kernfs doesn't have a callback when a kernfs node
> > is freed, which is problem if we allow the kernfs node to hold an extra
> > reference of the bpf object, because there is no chance to dec the
> > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > hold reference of the bpf object. The lifetime of the kernfs node
> > depends on the lifetime of the bpf object. Rather than "pinning in
> > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > pinned in bpffs first before it can be pinned in kernfs. When the
> > object is unpinned from bpffs, their kernfs nodes will be removed
> > automatically. This somehow treats a pinned bpf object as a persistent
> > "device".
>
> > We rely on fsnotify to monitor the inode events in bpffs. A new function
> > bpf_watch_inode() is introduced. It allows registering a callback
> > function at inode destruction. For the kernfs case, a callback that
> > removes kernfs node is registered at the destruction of bpffs inodes.
> > For other file systems such as sockfs, bpf_watch_inode() can monitor the
> > destruction of sockfs inodes and the created file entry can hold the bpf
> > object's reference. In this case, it is truly "pinning".
>
> > File operations other than seq_show can also be implemented using bpf.
> > For example, bpf may be of help for .poll and .mmap in kernfs.
>
> This looks awesome!
>
> One thing I don't understand is: why did go through the pinning
> interface VS regular attach/detach? IOW, why not allow regular
> sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> (which, in turn, creates the kernfs nodes). Seems like this way you can drop
> the requirement on the object being pinned in the bpffs first?

Thanks Stan.

Yeah, the attach/detach approach is definitely another option. IIUC,
in comparison to pinning, does attach/detach only work for cgroups?
Pinning may be used on other file systems, sockfs, sysfs or resctrl.
But I don't know whether this generality is welcome and implementing
seq_show is the only concrete use case I can think of right now. If
people think the ability of creating files in other subsystems is not
good, I'd be happy to take a look at the attach/detach approach and
that may be the right way.
Stanislav Fomichev Jan. 7, 2022, 7:25 p.m. UTC | #4
On 01/07, Hao Luo wrote:
> On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> >
> > On 01/06, Hao Luo wrote:
> > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > to allow bpf objects to be pinned (or exposed) to other file systems.
> >
> > > In particular, this patchset allows pinning bpf objects in kernfs.  
> This
> > > creates a new file entry in the kernfs file system and the created  
> file
> > > is able to reference the bpf object. By doing so, bpf can be used to
> > > customize the file's operations, such as seq_show.
> >
> > > As a concrete usecase of this feature, this patchset introduces a
> > > simple new program type called 'bpf_view', which can be used to format
> > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > into a cgroup directory, userspace is able to read the cgroup's state
> > > from file in a format defined by the bpf program.
> >
> > > Different from bpffs, kernfs doesn't have a callback when a kernfs  
> node
> > > is freed, which is problem if we allow the kernfs node to hold an  
> extra
> > > reference of the bpf object, because there is no chance to dec the
> > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > hold reference of the bpf object. The lifetime of the kernfs node
> > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > automatically. This somehow treats a pinned bpf object as a persistent
> > > "device".
> >
> > > We rely on fsnotify to monitor the inode events in bpffs. A new  
> function
> > > bpf_watch_inode() is introduced. It allows registering a callback
> > > function at inode destruction. For the kernfs case, a callback that
> > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > For other file systems such as sockfs, bpf_watch_inode() can monitor  
> the
> > > destruction of sockfs inodes and the created file entry can hold the  
> bpf
> > > object's reference. In this case, it is truly "pinning".
> >
> > > File operations other than seq_show can also be implemented using bpf.
> > > For example, bpf may be of help for .poll and .mmap in kernfs.
> >
> > This looks awesome!
> >
> > One thing I don't understand is: why did go through the pinning
> > interface VS regular attach/detach? IOW, why not allow regular
> > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > (which, in turn, creates the kernfs nodes). Seems like this way you can  
> drop
> > the requirement on the object being pinned in the bpffs first?

> Thanks Stan.

> Yeah, the attach/detach approach is definitely another option. IIUC,
> in comparison to pinning, does attach/detach only work for cgroups?

attach has target_fd argument that, in theory, can be whatever. We can
add support for different fd types.

> Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> But I don't know whether this generality is welcome and implementing
> seq_show is the only concrete use case I can think of right now. If
> people think the ability of creating files in other subsystems is not
> good, I'd be happy to take a look at the attach/detach approach and
> that may be the right way.

The reason I started thinking about attach/detach is because of clunky
unlink that you have to do (aka echo "rm" > file). IMO, having standard
attach/detach is a much more clear. But I might be missing some
complexity associated with non-cgroup filesystems.
Hao Luo Jan. 7, 2022, 8:43 p.m. UTC | #5
On Thu, Jan 6, 2022 at 4:30 PM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 1/6/22 1:50 PM, Hao Luo wrote:
> > Bpffs is a pseudo file system that persists bpf objects. Previously
> > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > to allow bpf objects to be pinned (or exposed) to other file systems.
> >
> > In particular, this patchset allows pinning bpf objects in kernfs. This
> > creates a new file entry in the kernfs file system and the created file
> > is able to reference the bpf object. By doing so, bpf can be used to
> > customize the file's operations, such as seq_show.
> >
> > As a concrete usecase of this feature, this patchset introduces a
> > simple new program type called 'bpf_view', which can be used to format
> > a seq file by a kernel object's state. By pinning a bpf_view program
> > into a cgroup directory, userspace is able to read the cgroup's state
> > from file in a format defined by the bpf program.
> >
> > Different from bpffs, kernfs doesn't have a callback when a kernfs node
> > is freed, which is problem if we allow the kernfs node to hold an extra
> > reference of the bpf object, because there is no chance to dec the
> > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > hold reference of the bpf object. The lifetime of the kernfs node
> > depends on the lifetime of the bpf object. Rather than "pinning in
> > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > pinned in bpffs first before it can be pinned in kernfs. When the
> > object is unpinned from bpffs, their kernfs nodes will be removed
> > automatically. This somehow treats a pinned bpf object as a persistent
> > "device".

Thanks Yonghong for the feedback.

>
> During our initial discussion for bpf_iter, we even proposed to
> put cat-able files under /proc/ system. But there are some concerns
> that /proc/ system holds stable APIs so people can rely on its format
> etc. Not sure kernfs here has such requirement or not.
>
> I understand directly put files in respective target directories
> (e.g., cgroup) helps. But you can also create directory hierarchy
> in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
> better organized.
>

I thought about this. I think the problem is that you need to
simultaneously manage two hierarchies now. You may have
synchronization problems in bpffs to deal with. For example, what if
the target cgroup is being removed while there is an on-going 'cat' on
the bpf program. I haven't given much thought in this direction. This
patchset doesn't deal with this problem, because kernfs has already
handled these synchronizations.

> Also regarding new program type bpf_view, I think we can reuse
> bpf_iter infrastructure. E.g., we already can customize bpf_iter
> for a particular map. We can certainly customize bpf_iter targeting
> a particular cgroup.
>

Right, that's what I was thinking. During the bpf office hour when I
initially proposed the idea, Alexei suggested creating a new program
type instead of reusing bpf_iter. The reason I remember was that iter
is for iterating a set of objects. Even for dumping a particular map,
it is iterating the entries in a map. While what I wanted to achieve
here is printing for a particular cgroup, not iterating something.
Maybe, we should reuse the infrastructure of bpf_iter (attach, target
registration, etc) but having a different prog type? I do copy-pasted
many code from bpf_iter for bpf_view. I haven't put too much thought
there as I would like to get feedbacks on the idea in general in this
first version of RFC.

> >
> > We rely on fsnotify to monitor the inode events in bpffs. A new function
> > bpf_watch_inode() is introduced. It allows registering a callback
> > function at inode destruction. For the kernfs case, a callback that
> > removes kernfs node is registered at the destruction of bpffs inodes.
> > For other file systems such as sockfs, bpf_watch_inode() can monitor the
> > destruction of sockfs inodes and the created file entry can hold the bpf
> > object's reference. In this case, it is truly "pinning".
> >
> > File operations other than seq_show can also be implemented using bpf.
> > For example, bpf may be of help for .poll and .mmap in kernfs.
> >
> > Patch organization:
> >   - patch 1/8 and 2/8 are preparations. 1/8 implements bpf_watch_inode();
> >     2/8 records bpffs inode in bpf object.
> >   - patch 3/8 and 4/8 implement generic logic for creating bpf backed
> >     kernfs file.
> >   - patch 5/8 and 6/8 add a new program type for formatting output.
> >   - patch 7/8 implements cgroup seq_show operation using bpf.
> >   - patch 8/8 adds selftest.
> >
> > Hao Luo (8):
> >    bpf: Support pinning in non-bpf file system.
> >    bpf: Record back pointer to the inode in bpffs
> >    bpf: Expose bpf object in kernfs
> >    bpf: Support removing kernfs entries
> >    bpf: Introduce a new program type bpf_view.
> >    libbpf: Support of bpf_view prog type.
> >    bpf: Add seq_show operation for bpf in cgroupfs
> >    selftests/bpf: Test exposing bpf objects in kernfs
> >
> >   include/linux/bpf.h                           |   9 +-
> >   include/uapi/linux/bpf.h                      |   2 +
> >   kernel/bpf/Makefile                           |   2 +-
> >   kernel/bpf/bpf_view.c                         | 190 ++++++++++++++
> >   kernel/bpf/bpf_view.h                         |  25 ++
> >   kernel/bpf/inode.c                            | 219 ++++++++++++++--
> >   kernel/bpf/inode.h                            |  54 ++++
> >   kernel/bpf/kernfs_node.c                      | 165 ++++++++++++
> >   kernel/bpf/syscall.c                          |   3 +
> >   kernel/bpf/verifier.c                         |   6 +
> >   kernel/trace/bpf_trace.c                      |  12 +-
> >   tools/include/uapi/linux/bpf.h                |   2 +
> >   tools/lib/bpf/libbpf.c                        |  21 ++
> >   .../selftests/bpf/prog_tests/pinning_kernfs.c | 245 ++++++++++++++++++
> >   .../selftests/bpf/progs/pinning_kernfs.c      |  72 +++++
> >   15 files changed, 995 insertions(+), 32 deletions(-)
> >   create mode 100644 kernel/bpf/bpf_view.c
> >   create mode 100644 kernel/bpf/bpf_view.h
> >   create mode 100644 kernel/bpf/inode.h
> >   create mode 100644 kernel/bpf/kernfs_node.c
> >   create mode 100644 tools/testing/selftests/bpf/prog_tests/pinning_kernfs.c
> >   create mode 100644 tools/testing/selftests/bpf/progs/pinning_kernfs.c
> >
Yonghong Song Jan. 10, 2022, 5:30 p.m. UTC | #6
On 1/7/22 12:43 PM, Hao Luo wrote:
> On Thu, Jan 6, 2022 at 4:30 PM Yonghong Song <yhs@fb.com> wrote:
>>
>>
>>
>> On 1/6/22 1:50 PM, Hao Luo wrote:
>>> Bpffs is a pseudo file system that persists bpf objects. Previously
>>> bpf objects can only be pinned in bpffs, this patchset extends pinning
>>> to allow bpf objects to be pinned (or exposed) to other file systems.
>>>
>>> In particular, this patchset allows pinning bpf objects in kernfs. This
>>> creates a new file entry in the kernfs file system and the created file
>>> is able to reference the bpf object. By doing so, bpf can be used to
>>> customize the file's operations, such as seq_show.
>>>
>>> As a concrete usecase of this feature, this patchset introduces a
>>> simple new program type called 'bpf_view', which can be used to format
>>> a seq file by a kernel object's state. By pinning a bpf_view program
>>> into a cgroup directory, userspace is able to read the cgroup's state
>>> from file in a format defined by the bpf program.
>>>
>>> Different from bpffs, kernfs doesn't have a callback when a kernfs node
>>> is freed, which is problem if we allow the kernfs node to hold an extra
>>> reference of the bpf object, because there is no chance to dec the
>>> object's refcnt. Therefore the kernfs node created by pinning doesn't
>>> hold reference of the bpf object. The lifetime of the kernfs node
>>> depends on the lifetime of the bpf object. Rather than "pinning in
>>> kernfs", it is "exposing to kernfs". We require the bpf object to be
>>> pinned in bpffs first before it can be pinned in kernfs. When the
>>> object is unpinned from bpffs, their kernfs nodes will be removed
>>> automatically. This somehow treats a pinned bpf object as a persistent
>>> "device".
> 
> Thanks Yonghong for the feedback.
> 
>>
>> During our initial discussion for bpf_iter, we even proposed to
>> put cat-able files under /proc/ system. But there are some concerns
>> that /proc/ system holds stable APIs so people can rely on its format
>> etc. Not sure kernfs here has such requirement or not.
>>
>> I understand directly put files in respective target directories
>> (e.g., cgroup) helps. But you can also create directory hierarchy
>> in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
>> better organized.
>>
> 
> I thought about this. I think the problem is that you need to
> simultaneously manage two hierarchies now. You may have
> synchronization problems in bpffs to deal with. For example, what if
> the target cgroup is being removed while there is an on-going 'cat' on
> the bpf program. I haven't given much thought in this direction. This
> patchset doesn't deal with this problem, because kernfs has already
> handled these synchronizations.

If the file is going to pinned inside /sys/fs/cgroup, which arguably is 
indeed a better place, maybe ask cgroup maintainer's opinion?

> 
>> Also regarding new program type bpf_view, I think we can reuse
>> bpf_iter infrastructure. E.g., we already can customize bpf_iter
>> for a particular map. We can certainly customize bpf_iter targeting
>> a particular cgroup.
>>
> 
> Right, that's what I was thinking. During the bpf office hour when I
> initially proposed the idea, Alexei suggested creating a new program
> type instead of reusing bpf_iter. The reason I remember was that iter
> is for iterating a set of objects. Even for dumping a particular map,
> it is iterating the entries in a map. While what I wanted to achieve
> here is printing for a particular cgroup, not iterating something.
> Maybe, we should reuse the infrastructure of bpf_iter (attach, target
> registration, etc) but having a different prog type? I do copy-pasted
> many code from bpf_iter for bpf_view. I haven't put too much thought
> there as I would like to get feedbacks on the idea in general in this
> first version of RFC.

Sorry I am not aware of this bpf_view discussion. It is okay for me.
But it would be great if we can avoid lots of code duplication.

> 
>>>
>>> We rely on fsnotify to monitor the inode events in bpffs. A new function
>>> bpf_watch_inode() is introduced. It allows registering a callback
>>> function at inode destruction. For the kernfs case, a callback that
>>> removes kernfs node is registered at the destruction of bpffs inodes.
>>> For other file systems such as sockfs, bpf_watch_inode() can monitor the
>>> destruction of sockfs inodes and the created file entry can hold the bpf
>>> object's reference. In this case, it is truly "pinning".
>>>
>>> File operations other than seq_show can also be implemented using bpf.
>>> For example, bpf may be of help for .poll and .mmap in kernfs.
>>>
[...]
Hao Luo Jan. 10, 2022, 6:55 p.m. UTC | #7
On Fri, Jan 7, 2022 at 11:25 AM <sdf@google.com> wrote:
>
> On 01/07, Hao Luo wrote:
> > On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> > >
> > > On 01/06, Hao Luo wrote:
> > > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > > to allow bpf objects to be pinned (or exposed) to other file systems.
> > >
> > > > In particular, this patchset allows pinning bpf objects in kernfs.
> > This
> > > > creates a new file entry in the kernfs file system and the created
> > file
> > > > is able to reference the bpf object. By doing so, bpf can be used to
> > > > customize the file's operations, such as seq_show.
> > >
> > > > As a concrete usecase of this feature, this patchset introduces a
> > > > simple new program type called 'bpf_view', which can be used to format
> > > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > > into a cgroup directory, userspace is able to read the cgroup's state
> > > > from file in a format defined by the bpf program.
> > >
> > > > Different from bpffs, kernfs doesn't have a callback when a kernfs
> > node
> > > > is freed, which is problem if we allow the kernfs node to hold an
> > extra
> > > > reference of the bpf object, because there is no chance to dec the
> > > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > > hold reference of the bpf object. The lifetime of the kernfs node
> > > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > > automatically. This somehow treats a pinned bpf object as a persistent
> > > > "device".
> > >
> > > > We rely on fsnotify to monitor the inode events in bpffs. A new
> > function
> > > > bpf_watch_inode() is introduced. It allows registering a callback
> > > > function at inode destruction. For the kernfs case, a callback that
> > > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > > For other file systems such as sockfs, bpf_watch_inode() can monitor
> > the
> > > > destruction of sockfs inodes and the created file entry can hold the
> > bpf
> > > > object's reference. In this case, it is truly "pinning".
> > >
> > > > File operations other than seq_show can also be implemented using bpf.
> > > > For example, bpf may be of help for .poll and .mmap in kernfs.
> > >
> > > This looks awesome!
> > >
> > > One thing I don't understand is: why did go through the pinning
> > > interface VS regular attach/detach? IOW, why not allow regular
> > > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > > (which, in turn, creates the kernfs nodes). Seems like this way you can
> > drop
> > > the requirement on the object being pinned in the bpffs first?
>
> > Thanks Stan.
>
> > Yeah, the attach/detach approach is definitely another option. IIUC,
> > in comparison to pinning, does attach/detach only work for cgroups?
>
> attach has target_fd argument that, in theory, can be whatever. We can
> add support for different fd types.
>

I see. With attach API, are we also able to specify some attributes
for the attachment? For example, a property that we may want is: let
descendent cgroups inherit their parent cgroup's programs.

> > Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> > But I don't know whether this generality is welcome and implementing
> > seq_show is the only concrete use case I can think of right now. If
> > people think the ability of creating files in other subsystems is not
> > good, I'd be happy to take a look at the attach/detach approach and
> > that may be the right way.
>
> The reason I started thinking about attach/detach is because of clunky
> unlink that you have to do (aka echo "rm" > file). IMO, having standard
> attach/detach is a much more clear. But I might be missing some
> complexity associated with non-cgroup filesystems.

Oh, I see. Looks good. Let me play with it before sending the next version.
Hao Luo Jan. 10, 2022, 6:56 p.m. UTC | #8
On Mon, Jan 10, 2022 at 9:30 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 1/7/22 12:43 PM, Hao Luo wrote:
> > On Thu, Jan 6, 2022 at 4:30 PM Yonghong Song <yhs@fb.com> wrote:
> >>
> >>
> >>
> >> On 1/6/22 1:50 PM, Hao Luo wrote:
> >>> Bpffs is a pseudo file system that persists bpf objects. Previously
> >>> bpf objects can only be pinned in bpffs, this patchset extends pinning
> >>> to allow bpf objects to be pinned (or exposed) to other file systems.
> >>>
> >>> In particular, this patchset allows pinning bpf objects in kernfs. This
> >>> creates a new file entry in the kernfs file system and the created file
> >>> is able to reference the bpf object. By doing so, bpf can be used to
> >>> customize the file's operations, such as seq_show.
> >>>
> >>> As a concrete usecase of this feature, this patchset introduces a
> >>> simple new program type called 'bpf_view', which can be used to format
> >>> a seq file by a kernel object's state. By pinning a bpf_view program
> >>> into a cgroup directory, userspace is able to read the cgroup's state
> >>> from file in a format defined by the bpf program.
> >>>
> >>> Different from bpffs, kernfs doesn't have a callback when a kernfs node
> >>> is freed, which is problem if we allow the kernfs node to hold an extra
> >>> reference of the bpf object, because there is no chance to dec the
> >>> object's refcnt. Therefore the kernfs node created by pinning doesn't
> >>> hold reference of the bpf object. The lifetime of the kernfs node
> >>> depends on the lifetime of the bpf object. Rather than "pinning in
> >>> kernfs", it is "exposing to kernfs". We require the bpf object to be
> >>> pinned in bpffs first before it can be pinned in kernfs. When the
> >>> object is unpinned from bpffs, their kernfs nodes will be removed
> >>> automatically. This somehow treats a pinned bpf object as a persistent
> >>> "device".
> >
> > Thanks Yonghong for the feedback.
> >
> >>
> >> During our initial discussion for bpf_iter, we even proposed to
> >> put cat-able files under /proc/ system. But there are some concerns
> >> that /proc/ system holds stable APIs so people can rely on its format
> >> etc. Not sure kernfs here has such requirement or not.
> >>
> >> I understand directly put files in respective target directories
> >> (e.g., cgroup) helps. But you can also create directory hierarchy
> >> in bpffs. This can make a bunch of cgroup-stat-dumping bpf programs
> >> better organized.
> >>
> >
> > I thought about this. I think the problem is that you need to
> > simultaneously manage two hierarchies now. You may have
> > synchronization problems in bpffs to deal with. For example, what if
> > the target cgroup is being removed while there is an on-going 'cat' on
> > the bpf program. I haven't given much thought in this direction. This
> > patchset doesn't deal with this problem, because kernfs has already
> > handled these synchronizations.
>
> If the file is going to pinned inside /sys/fs/cgroup, which arguably is
> indeed a better place, maybe ask cgroup maintainer's opinion?
>

Yes, will do.

I don't know if pinning in other file systems other than cgroup has
any use. If not, I can tailor this patchset for cgroup only. Would be
much simplified.

> >
> >> Also regarding new program type bpf_view, I think we can reuse
> >> bpf_iter infrastructure. E.g., we already can customize bpf_iter
> >> for a particular map. We can certainly customize bpf_iter targeting
> >> a particular cgroup.
> >>
> >
> > Right, that's what I was thinking. During the bpf office hour when I
> > initially proposed the idea, Alexei suggested creating a new program
> > type instead of reusing bpf_iter. The reason I remember was that iter
> > is for iterating a set of objects. Even for dumping a particular map,
> > it is iterating the entries in a map. While what I wanted to achieve
> > here is printing for a particular cgroup, not iterating something.
> > Maybe, we should reuse the infrastructure of bpf_iter (attach, target
> > registration, etc) but having a different prog type? I do copy-pasted
> > many code from bpf_iter for bpf_view. I haven't put too much thought
> > there as I would like to get feedbacks on the idea in general in this
> > first version of RFC.
>
> Sorry I am not aware of this bpf_view discussion. It is okay for me.
> But it would be great if we can avoid lots of code duplication.
>

No problem. I totally agree.


> >
> >>>
> >>> We rely on fsnotify to monitor the inode events in bpffs. A new function
> >>> bpf_watch_inode() is introduced. It allows registering a callback
> >>> function at inode destruction. For the kernfs case, a callback that
> >>> removes kernfs node is registered at the destruction of bpffs inodes.
> >>> For other file systems such as sockfs, bpf_watch_inode() can monitor the
> >>> destruction of sockfs inodes and the created file entry can hold the bpf
> >>> object's reference. In this case, it is truly "pinning".
> >>>
> >>> File operations other than seq_show can also be implemented using bpf.
> >>> For example, bpf may be of help for .poll and .mmap in kernfs.
> >>>
> [...]
Stanislav Fomichev Jan. 10, 2022, 7:22 p.m. UTC | #9
On Mon, Jan 10, 2022 at 10:56 AM Hao Luo <haoluo@google.com> wrote:
>
> On Fri, Jan 7, 2022 at 11:25 AM <sdf@google.com> wrote:
> >
> > On 01/07, Hao Luo wrote:
> > > On Thu, Jan 6, 2022 at 3:03 PM <sdf@google.com> wrote:
> > > >
> > > > On 01/06, Hao Luo wrote:
> > > > > Bpffs is a pseudo file system that persists bpf objects. Previously
> > > > > bpf objects can only be pinned in bpffs, this patchset extends pinning
> > > > > to allow bpf objects to be pinned (or exposed) to other file systems.
> > > >
> > > > > In particular, this patchset allows pinning bpf objects in kernfs.
> > > This
> > > > > creates a new file entry in the kernfs file system and the created
> > > file
> > > > > is able to reference the bpf object. By doing so, bpf can be used to
> > > > > customize the file's operations, such as seq_show.
> > > >
> > > > > As a concrete usecase of this feature, this patchset introduces a
> > > > > simple new program type called 'bpf_view', which can be used to format
> > > > > a seq file by a kernel object's state. By pinning a bpf_view program
> > > > > into a cgroup directory, userspace is able to read the cgroup's state
> > > > > from file in a format defined by the bpf program.
> > > >
> > > > > Different from bpffs, kernfs doesn't have a callback when a kernfs
> > > node
> > > > > is freed, which is problem if we allow the kernfs node to hold an
> > > extra
> > > > > reference of the bpf object, because there is no chance to dec the
> > > > > object's refcnt. Therefore the kernfs node created by pinning doesn't
> > > > > hold reference of the bpf object. The lifetime of the kernfs node
> > > > > depends on the lifetime of the bpf object. Rather than "pinning in
> > > > > kernfs", it is "exposing to kernfs". We require the bpf object to be
> > > > > pinned in bpffs first before it can be pinned in kernfs. When the
> > > > > object is unpinned from bpffs, their kernfs nodes will be removed
> > > > > automatically. This somehow treats a pinned bpf object as a persistent
> > > > > "device".
> > > >
> > > > > We rely on fsnotify to monitor the inode events in bpffs. A new
> > > function
> > > > > bpf_watch_inode() is introduced. It allows registering a callback
> > > > > function at inode destruction. For the kernfs case, a callback that
> > > > > removes kernfs node is registered at the destruction of bpffs inodes.
> > > > > For other file systems such as sockfs, bpf_watch_inode() can monitor
> > > the
> > > > > destruction of sockfs inodes and the created file entry can hold the
> > > bpf
> > > > > object's reference. In this case, it is truly "pinning".
> > > >
> > > > > File operations other than seq_show can also be implemented using bpf.
> > > > > For example, bpf may be of help for .poll and .mmap in kernfs.
> > > >
> > > > This looks awesome!
> > > >
> > > > One thing I don't understand is: why did go through the pinning
> > > > interface VS regular attach/detach? IOW, why not allow regular
> > > > sys_bpf(BPF_PROG_ATTACH, prog_id, cgroup_id) and attach to the cgroup
> > > > (which, in turn, creates the kernfs nodes). Seems like this way you can
> > > drop
> > > > the requirement on the object being pinned in the bpffs first?
> >
> > > Thanks Stan.
> >
> > > Yeah, the attach/detach approach is definitely another option. IIUC,
> > > in comparison to pinning, does attach/detach only work for cgroups?
> >
> > attach has target_fd argument that, in theory, can be whatever. We can
> > add support for different fd types.
> >
>
> I see. With attach API, are we also able to specify some attributes
> for the attachment? For example, a property that we may want is: let
> descendent cgroups inherit their parent cgroup's programs.

There are already flags like these: BPF_F_ALLOW_OVERRIDE, maybe you
can rely on them? But you can always add more bit flags if needed.

> > > Pinning may be used on other file systems, sockfs, sysfs or resctrl.
> > > But I don't know whether this generality is welcome and implementing
> > > seq_show is the only concrete use case I can think of right now. If
> > > people think the ability of creating files in other subsystems is not
> > > good, I'd be happy to take a look at the attach/detach approach and
> > > that may be the right way.
> >
> > The reason I started thinking about attach/detach is because of clunky
> > unlink that you have to do (aka echo "rm" > file). IMO, having standard
> > attach/detach is a much more clear. But I might be missing some
> > complexity associated with non-cgroup filesystems.
>
> Oh, I see. Looks good. Let me play with it before sending the next version.
Alexei Starovoitov Jan. 11, 2022, 3:33 a.m. UTC | #10
On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> 
> I see. With attach API, are we also able to specify some attributes
> for the attachment? For example, a property that we may want is: let
> descendent cgroups inherit their parent cgroup's programs.

Plenty of interesting ideas in this thread. Thanks for kicking it off.
Maybe we should move it to office hours?
The back and forth over email can take some time.
It sounds to me that "let descendents inherit" is a mandatory feature.
In that sense "allow attach in kernfs" is not a feature. It feels that
it's creating more problems for the design.
Creating a "catable" file inside cgroup directory that descedents inherit
with the same name is a cgroup specific feature.
Inherit or not can be a flag, but the inheritance needs to be designed
from the start.

echo "rm" is not pretty. 
fsnotify feels a bit hacky.
Maybe pinning in cgroupfs will avoid both issues?
We can have normal unlink implemented there.

The attach bpf_sys cmd as-is won't work. It needs a name at least.
That will make it look like obj_pin cmd. So probably better to make
obj_pin work when path is inside cgroupfs and use file_flags for
inherit or not.
The patch 8 gives a glimpse of how the bpf prog will look like.
Can you make it more realistic?
Do you need to walk cgroup children? Or all processes in a cgroup?
Will we need css_for_each_descendant() as a bpf helper?
Stanislav Fomichev Jan. 11, 2022, 5:06 p.m. UTC | #11
On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> >
> > I see. With attach API, are we also able to specify some attributes
> > for the attachment? For example, a property that we may want is: let
> > descendent cgroups inherit their parent cgroup's programs.
>
> Plenty of interesting ideas in this thread. Thanks for kicking it off.
> Maybe we should move it to office hours?
> The back and forth over email can take some time.
> It sounds to me that "let descendents inherit" is a mandatory feature.
> In that sense "allow attach in kernfs" is not a feature. It feels that
> it's creating more problems for the design.
> Creating a "catable" file inside cgroup directory that descedents inherit
> with the same name is a cgroup specific feature.
> Inherit or not can be a flag, but the inheritance needs to be designed
> from the start.
>
> echo "rm" is not pretty.
> fsnotify feels a bit hacky.
> Maybe pinning in cgroupfs will avoid both issues?
> We can have normal unlink implemented there.

With unlink we can do chown and let regular users call unlink.
So maybe it's actually more flexible vs syscall detach, although no
idea why the users would do that.

> The attach bpf_sys cmd as-is won't work. It needs a name at least.

Can we use prog_name in attach? Or, if it's limited by
BPF_OBJ_NAME_LEN, can we extract function name from btf? Or is it too
magic to derive the path from the program name?
obj_pin+unlink is a good alternative. One thing I'm not certain is:
what happens when I call unlink on some of the inherited nodes? (i.e.,
do I need to have a flag to say "unlink this one including
children/parent"? probably not and returning error is fine?)
Agree with your summary that the inheritance story needs more thought :-)
Hao Luo Jan. 11, 2022, 6:20 p.m. UTC | #12
On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> >
> > I see. With attach API, are we also able to specify some attributes
> > for the attachment? For example, a property that we may want is: let
> > descendent cgroups inherit their parent cgroup's programs.
>
> Plenty of interesting ideas in this thread. Thanks for kicking it off.
> Maybe we should move it to office hours?
> The back and forth over email can take some time.

No problem. Requested a time on Thursday (1/13/22).

> It sounds to me that "let descendents inherit" is a mandatory feature.
> In that sense "allow attach in kernfs" is not a feature. It feels that
> it's creating more problems for the design.
> Creating a "catable" file inside cgroup directory that descedents inherit
> with the same name is a cgroup specific feature.
> Inherit or not can be a flag, but the inheritance needs to be designed
> from the start.
>
> echo "rm" is not pretty.
> fsnotify feels a bit hacky.
> Maybe pinning in cgroupfs will avoid both issues?
> We can have normal unlink implemented there.
>
> The attach bpf_sys cmd as-is won't work. It needs a name at least.
> That will make it look like obj_pin cmd. So probably better to make
> obj_pin work when path is inside cgroupfs and use file_flags for
> inherit or not.
> The patch 8 gives a glimpse of how the bpf prog will look like.
> Can you make it more realistic?
> Do you need to walk cgroup children? Or all processes in a cgroup?
> Will we need css_for_each_descendant() as a bpf helper?
Song Liu Jan. 12, 2022, 6:55 p.m. UTC | #13
On Tue, Jan 11, 2022 at 10:20 AM Hao Luo <haoluo@google.com> wrote:
>
> On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> > >
> > > I see. With attach API, are we also able to specify some attributes
> > > for the attachment? For example, a property that we may want is: let
> > > descendent cgroups inherit their parent cgroup's programs.
> >
> > Plenty of interesting ideas in this thread. Thanks for kicking it off.
> > Maybe we should move it to office hours?
> > The back and forth over email can take some time.
>
> No problem. Requested a time on Thursday (1/13/22).

CC Tejun, who might be interested in this discussion.

@Tejun, FYI the office hour is tomorrow 9am PST, via zoom:

  https://fb.zoom.us/j/91157637485

Thanks,
Song
Hao Luo Jan. 12, 2022, 7:19 p.m. UTC | #14
On Wed, Jan 12, 2022 at 10:55 AM Song Liu <song@kernel.org> wrote:
>
> On Tue, Jan 11, 2022 at 10:20 AM Hao Luo <haoluo@google.com> wrote:
> >
> > On Mon, Jan 10, 2022 at 7:33 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Mon, Jan 10, 2022 at 10:55:54AM -0800, Hao Luo wrote:
> > > >
> > > > I see. With attach API, are we also able to specify some attributes
> > > > for the attachment? For example, a property that we may want is: let
> > > > descendent cgroups inherit their parent cgroup's programs.
> > >
> > > Plenty of interesting ideas in this thread. Thanks for kicking it off.
> > > Maybe we should move it to office hours?
> > > The back and forth over email can take some time.
> >
> > No problem. Requested a time on Thursday (1/13/22).
>
> CC Tejun, who might be interested in this discussion.
>
> @Tejun, FYI the office hour is tomorrow 9am PST, via zoom:
>
>   https://fb.zoom.us/j/91157637485
>

Thanks Song, let me resend the whole patchset to include Tejun and cc
linux-kernel.

> Thanks,
> Song