mbox series

[00/10,RFC] Simple IPC Mechanism

Message ID pull.766.git.1610465492.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Simple IPC Mechanism | expand

Message

Linus Arver via GitGitGadget Jan. 12, 2021, 3:31 p.m. UTC
This series introduces a multi-threaded IPC mechanism called "Simple IPC".
This is a library-layer feature to make it easy to create very long running
daemon/service applications and for unrelated Git commands to communicate
with them. Communication uses pkt-line messaging over a Windows named pipe
or Unix domain socket.

On the server side, Simple IPC implements a (platform-specific) connection
listener and worker thread-pool to accept and handle a series of client
connections. The server functionality is completely hidden behind the
ipc_server_run() and ipc_server_run_async() APIs. The daemon/service
application only needs to define an application-specific callback to handle
client requests.

Note that Simple IPC is completely unrelated to the long running process
feature (described in sub-process.h) where the lifetime of a "sub-process"
child is bound to that of the invoking parent process and communication
occurs over the child's stdin/stdout.

Simple IPC will serve as a basis for a future builtin FSMonitor daemon
feature.

Jeff Hostetler (7):
  pkt-line: use stack rather than static buffer in packet_write_gently()
  simple-ipc: design documentation for new IPC mechanism
  simple-ipc: add win32 implementation
  unix-socket: create gentle version of unix_stream_listen()
  unix-socket: add no-chdir option to unix_stream_listen_gently()
  simple-ipc: add t/helper/test-simple-ipc and t0052
  simple-ipc: add Unix domain socket implementation

Johannes Schindelin (3):
  pkt-line: (optionally) libify the packet readers
  pkt-line: optionally skip the flush packet in
    write_packetized_from_buf()
  pkt-line: accept additional options in read_packetized_to_strbuf()

 Documentation/technical/api-simple-ipc.txt |   31 +
 Makefile                                   |    8 +
 compat/simple-ipc/ipc-shared.c             |   28 +
 compat/simple-ipc/ipc-unix-socket.c        | 1093 ++++++++++++++++++++
 compat/simple-ipc/ipc-win32.c              |  723 +++++++++++++
 config.mak.uname                           |    2 +
 contrib/buildsystems/CMakeLists.txt        |    6 +
 convert.c                                  |    4 +-
 pkt-line.c                                 |   30 +-
 pkt-line.h                                 |   13 +-
 simple-ipc.h                               |  221 ++++
 t/helper/test-simple-ipc.c                 |  485 +++++++++
 t/helper/test-tool.c                       |    1 +
 t/helper/test-tool.h                       |    1 +
 t/t0052-simple-ipc.sh                      |  129 +++
 unix-socket.c                              |   58 +-
 unix-socket.h                              |    9 +
 17 files changed, 2828 insertions(+), 14 deletions(-)
 create mode 100644 Documentation/technical/api-simple-ipc.txt
 create mode 100644 compat/simple-ipc/ipc-shared.c
 create mode 100644 compat/simple-ipc/ipc-unix-socket.c
 create mode 100644 compat/simple-ipc/ipc-win32.c
 create mode 100644 simple-ipc.h
 create mode 100644 t/helper/test-simple-ipc.c
 create mode 100755 t/t0052-simple-ipc.sh


base-commit: 71ca53e8125e36efbda17293c50027d31681a41f
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-766%2Fjeffhostetler%2Fsimple-ipc-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-766/jeffhostetler/simple-ipc-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/766

Comments

Ævar Arnfjörð Bjarmason Jan. 12, 2021, 4:50 p.m. UTC | #1
On Tue, Jan 12 2021, Jeff Hostetler via GitGitGadget wrote:

> This series introduces a multi-threaded IPC mechanism called "Simple IPC".
> This is a library-layer feature to make it easy to create very long running
> daemon/service applications and for unrelated Git commands to communicate
> with them. Communication uses pkt-line messaging over a Windows named pipe
> or Unix domain socket.
>
> On the server side, Simple IPC implements a (platform-specific) connection
> listener and worker thread-pool to accept and handle a series of client
> connections. The server functionality is completely hidden behind the
> ipc_server_run() and ipc_server_run_async() APIs. The daemon/service
> application only needs to define an application-specific callback to handle
> client requests.
>
> Note that Simple IPC is completely unrelated to the long running process
> feature (described in sub-process.h) where the lifetime of a "sub-process"
> child is bound to that of the invoking parent process and communication
> occurs over the child's stdin/stdout.
>
> Simple IPC will serve as a basis for a future builtin FSMonitor daemon
> feature.

I only skimmed this so far. In the past we had a git-read-cache--daemon
-> git-index-helper[1] -> watchman. The last iteration of that seems to
be the [3] re-roll from Ben Peart in 2017. I used/tested that for a
while and had some near-production use-cases of it.

How does this new series relate to that past work (if at all), and (not
having re-read the old threads) were there reasons those old patch
serieses weren't merged in that are addressed here, mitigated etc?

1. https://lore.kernel.org/git/1402406665-27988-1-git-send-email-pclouds@gmail.com/
2. https://lore.kernel.org/git/1457548582-28302-1-git-send-email-dturner@twopensource.com/
3. https://lore.kernel.org/git/20170518201333.13088-1-benpeart@microsoft.com/
4. https://lore.kernel.org/git/87bmhfwmqa.fsf@evledraar.gmail.com/
Jeff Hostetler Jan. 12, 2021, 6:25 p.m. UTC | #2
On 1/12/21 11:50 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Tue, Jan 12 2021, Jeff Hostetler via GitGitGadget wrote:
> 
>> This series introduces a multi-threaded IPC mechanism called "Simple IPC".
>> This is a library-layer feature to make it easy to create very long running
>> daemon/service applications and for unrelated Git commands to communicate
>> with them. Communication uses pkt-line messaging over a Windows named pipe
>> or Unix domain socket.
>>
>> On the server side, Simple IPC implements a (platform-specific) connection
>> listener and worker thread-pool to accept and handle a series of client
>> connections. The server functionality is completely hidden behind the
>> ipc_server_run() and ipc_server_run_async() APIs. The daemon/service
>> application only needs to define an application-specific callback to handle
>> client requests.
>>
>> Note that Simple IPC is completely unrelated to the long running process
>> feature (described in sub-process.h) where the lifetime of a "sub-process"
>> child is bound to that of the invoking parent process and communication
>> occurs over the child's stdin/stdout.
>>
>> Simple IPC will serve as a basis for a future builtin FSMonitor daemon
>> feature.
> 
> I only skimmed this so far. In the past we had a git-read-cache--daemon
> -> git-index-helper[1] -> watchman. The last iteration of that seems to
> be the [3] re-roll from Ben Peart in 2017. I used/tested that for a
> while and had some near-production use-cases of it.
> 
> How does this new series relate to that past work (if at all), and (not
> having re-read the old threads) were there reasons those old patch
> serieses weren't merged in that are addressed here, mitigated etc?
> 
> 1. https://lore.kernel.org/git/1402406665-27988-1-git-send-email-pclouds@gmail.com/
> 2. https://lore.kernel.org/git/1457548582-28302-1-git-send-email-dturner@twopensource.com/
> 3. https://lore.kernel.org/git/20170518201333.13088-1-benpeart@microsoft.com/
> 4. https://lore.kernel.org/git/87bmhfwmqa.fsf@evledraar.gmail.com/
> 

I'm starting with the model used by the existing FSMonitor feature
that Ben Peart and Kevin Willford added to Git.

Item [3] looks to be an earlier draft of that effort.  The idea there
was to add the fsmonitor hook that could talk to a daemon like Watchman
and quickly update the in-memory cache-entry flags without the need to
lstat() and similarly update the untracked-cache.  An index extension
was added to remember the last fsmonitor response processed.

Currently in Git, we have a fsmonitor hook (usually a perl script) that
talks to Watchman and translates the Watchman response back into
something that the Git client can understand.  This comes back as a
list of files that have changed since some timestamp (or in V2, relative
to some daemon-specific token).

Items [1,2] are not related to that.  That was a different effort to
quickly fetch a read-only copy of an already-parsed index via shared
memory.  In the last version I saw, there were 2 daemons.  index-helper
kept a fresh view of the index in shared memory and could give it to
the Git client.  The client could just mmap the pre-parsed index and
avoid calling `read_index()`.  Index-helper would drive Watchman to
keep track of cache-entries as they changed and handle the lstat's.

I'm not familiar with [4] (and I only quickly scanned it).  There are
several ideas for finding slow spots while reading the index.  I don't
want to go into all of them, but several are obsolete now.  They didn't
contribute to the current effort.


The Simple IPC series (and a soon to be submitted fsmonitor--daemon
series) are intended to be a follow on to FSMonitor effort that is
currently in Git.

1. Build a git-native daemon to watch the file system and avoid needing
a third-party tool.  This doesn't preclude the use of Watchman, but
having a builtin tool might simplify engineering support costs when
deploying to a large team.

2. Use direct IPC between the Git command and the daemon to avoid the
expense of the Hook API (which is expensive on Windows).

3. Make the daemon Git-aware.  For example, it might want to pre-filter
ignored files.  (This might not be present in V1.  And we might extend
the daemon to do more of this as we improve performance.)

Jeff
Junio C Hamano Jan. 12, 2021, 8:01 p.m. UTC | #3
"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This series introduces a multi-threaded IPC mechanism called "Simple IPC".
> This is a library-layer feature to make it easy to create very long running
> daemon/service applications and for unrelated Git commands to communicate
> with them. Communication uses pkt-line messaging over a Windows named pipe
> or Unix domain socket.
>
> On the server side, Simple IPC implements a (platform-specific) connection
> listener and worker thread-pool to accept and handle a series of client
> connections. The server functionality is completely hidden behind the
> ipc_server_run() and ipc_server_run_async() APIs. The daemon/service
> application only needs to define an application-specific callback to handle
> client requests.
>
> Note that Simple IPC is completely unrelated to the long running process
> feature (described in sub-process.h) where the lifetime of a "sub-process"
> child is bound to that of the invoking parent process and communication
> occurs over the child's stdin/stdout.
>
> Simple IPC will serve as a basis for a future builtin FSMonitor daemon
> feature.

What kind of security implications does this bring into the system?

Can a Simple IPC daemon be connected by any client?  How does the
daemon know that the other side is authorized to make requests?
When a git binary acting as client connect to whatever happens to be
listening to the well-known location, how does it know if the other
side is the daemon it wanted to talk to and not a malicious MITM or
other impersonator?

Or is this to be only used for "this is meant to be used to give
read-only access to data that is public anyway" kind of daemon,
e.g. "git://" transport that serves clones and fetches?

Or is this meant to be used on client machines where all the
processes are assumed to be working for the end user, so it is OK to
declare that anything will go (good old DOS mental model?)

I know at the Mechanism level we do not yet know how it will be
used, but we cannot retrofit sufficient security, so it would be
necessary to know answers to these questions.

Thanks.
Jeff Hostetler Jan. 12, 2021, 11:25 p.m. UTC | #4
On 1/12/21 3:01 PM, Junio C Hamano wrote:
> "Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> This series introduces a multi-threaded IPC mechanism called "Simple IPC".
>> This is a library-layer feature to make it easy to create very long running
>> daemon/service applications and for unrelated Git commands to communicate
>> with them. Communication uses pkt-line messaging over a Windows named pipe
>> or Unix domain socket.
>>
>> On the server side, Simple IPC implements a (platform-specific) connection
>> listener and worker thread-pool to accept and handle a series of client
>> connections. The server functionality is completely hidden behind the
>> ipc_server_run() and ipc_server_run_async() APIs. The daemon/service
>> application only needs to define an application-specific callback to handle
>> client requests.
>>
>> Note that Simple IPC is completely unrelated to the long running process
>> feature (described in sub-process.h) where the lifetime of a "sub-process"
>> child is bound to that of the invoking parent process and communication
>> occurs over the child's stdin/stdout.
>>
>> Simple IPC will serve as a basis for a future builtin FSMonitor daemon
>> feature.
> 
> What kind of security implications does this bring into the system?
> 
> Can a Simple IPC daemon be connected by any client?  How does the
> daemon know that the other side is authorized to make requests?
> When a git binary acting as client connect to whatever happens to be
> listening to the well-known location, how does it know if the other
> side is the daemon it wanted to talk to and not a malicious MITM or
> other impersonator?
> 
> Or is this to be only used for "this is meant to be used to give
> read-only access to data that is public anyway" kind of daemon,
> e.g. "git://" transport that serves clones and fetches?
> 
> Or is this meant to be used on client machines where all the
> processes are assumed to be working for the end user, so it is OK to
> declare that anything will go (good old DOS mental model?)
> 
> I know at the Mechanism level we do not yet know how it will be
> used, but we cannot retrofit sufficient security, so it would be
> necessary to know answers to these questions.
> 
> Thanks.
> 

Good questions.

Yes, this is a local-only mechanism.  A local-only named pipe on
Windows and a Unix domain socket on Unix.  In both cases the daemon
creates the pipe/socket as the foreground user (since the daemon
process will be implicitly started by the first Git command that
needs to talk to it).  Later client process try to open the pipe/socket
with RW access if they can.

On Windows a local named pipe is created by the server side.  It rejects
remote connections.  I did not put an ACL, so it should inherit the
system default which grants the user RW access (since the daemon is
implicitly started by the first foreground client command that needs
to talk to it.)  Other users in the user's group and the anonymous
user should have R but not W access to it, so they could not be able
to connect.  The name pipe is kept in the local Named Pipe File System
(NPFS) as `\\.\pipe\<unique-path>` so it is globally visible on the 
system, but I don't think it is a problem.

On the Unix side, the socket is created inside the .git directory
by the daemon.  Potential clients would have to have access to the
working directory and the .git directory to connect to the socket,
so in normal circumstances they would be able to read everything in
the WD anyway.  So again, I don't think it is a problem.


Alternatively, if a malicious server is started and holds the named
pipe or socket:  the client might be able to talk to it (assuming
that bad server grants access or impersonates the rightful user).
The client might not be able to tell they've been tricked, but at
that point the system is already compromised.

So unless I'm missing something, I think we're OK either way.

Jeff
Junio C Hamano Jan. 13, 2021, 12:13 a.m. UTC | #5
Jeff Hostetler <git@jeffhostetler.com> writes:

> On Windows a local named pipe is created by the server side.  It rejects
> remote connections.  I did not put an ACL, so it should inherit the
> system default which grants the user RW access (since the daemon is
> implicitly started by the first foreground client command that needs
> to talk to it.)  Other users in the user's group and the anonymous
> user should have R but not W access to it, so they could not be able
> to connect.  The name pipe is kept in the local Named Pipe File System
> (NPFS) as `\\.\pipe\<unique-path>` so it is globally visible on the
> system, but I don't think it is a problem.

It is not intuitively obvious why globalluy visible thing is OK to
me, but I'll take your word for it on stuff about Windows.

> On the Unix side, the socket is created inside the .git directory
> by the daemon.  Potential clients would have to have access to the
> working directory and the .git directory to connect to the socket,
> so in normal circumstances they would be able to read everything in
> the WD anyway.  So again, I don't think it is a problem.

OK, yes, writability to .git would automatically mean that
everything is a fair game to those who can talk to the daemon, so
there is no new issue here, as long as the first process that
creates the socket is careful not to loosen the permission.

Thanks.
Jeff Hostetler Jan. 13, 2021, 12:32 a.m. UTC | #6
On 1/12/21 7:13 PM, Junio C Hamano wrote:
> Jeff Hostetler <git@jeffhostetler.com> writes:
> 
>> On Windows a local named pipe is created by the server side.  It rejects
>> remote connections.  I did not put an ACL, so it should inherit the
>> system default which grants the user RW access (since the daemon is
>> implicitly started by the first foreground client command that needs
>> to talk to it.)  Other users in the user's group and the anonymous
>> user should have R but not W access to it, so they could not be able
>> to connect.  The name pipe is kept in the local Named Pipe File System
>> (NPFS) as `\\.\pipe\<unique-path>` so it is globally visible on the
>> system, but I don't think it is a problem.
> 
> It is not intuitively obvious why globalluy visible thing is OK to
> me, but I'll take your word for it on stuff about Windows.

Sorry, that's a quirk of Windows.  Windows has a funky virtual drive
where named pipes are stored -- kind of like a magic directory in /proc
on Linux.  All local named pipes have the "\\.\pipe\" path prefix.

So they are globally visible as a side-effect of that "namespace"
restriction.

> 
>> On the Unix side, the socket is created inside the .git directory
>> by the daemon.  Potential clients would have to have access to the
>> working directory and the .git directory to connect to the socket,
>> so in normal circumstances they would be able to read everything in
>> the WD anyway.  So again, I don't think it is a problem.
> 
> OK, yes, writability to .git would automatically mean that
> everything is a fair game to those who can talk to the daemon, so
> there is no new issue here, as long as the first process that
> creates the socket is careful not to loosen the permission.
> 
> Thanks.
>
Jeff King Jan. 13, 2021, 1:46 p.m. UTC | #7
On Tue, Jan 12, 2021 at 06:25:20PM -0500, Jeff Hostetler wrote:

> On the Unix side, the socket is created inside the .git directory
> by the daemon.  Potential clients would have to have access to the
> working directory and the .git directory to connect to the socket,
> so in normal circumstances they would be able to read everything in
> the WD anyway.  So again, I don't think it is a problem.

Just thinking out loud, here are two potential issues with putting it in
.git that we may have to deal with later:

  - fsmonitor is conceptually a read-only thing (i.e., it would speed up
    "git status", etc). And not knowing much about how it will work, I'd
    guess that is carried through (i.e., even though you may open the
    socket R/W so that you can write requests and read them back, there
    is no operation you can request that will overwrite data). But the
    running user may not have write access to .git.

    As long as we cleanly bail to the non-fsmonitor code paths, I don't
    think it's the end of the world. Those read-only users just won't
    get to use the speedup (and it may even be desirable). They may
    complain, but it is open source so the onus is on them to improve
    it. You will not have made anything worse. :)

  - repositories may be on network filesystems that do not support unix
    sockets.

So it would be nice if there was some way to specify an alternate path
to be used for the socket. Possibly one or both of:

  - a config option to give a root path for sockets, where Git would
    then canonicalize the $GIT_DIR name and use $root/$GIT_DIR for the
    socket. That solves the problem for a given user once for all repos.

  - a config option to say "use this path for the socket". This would be
    per-repo, but is more flexible and possibly less confusing.

One final note: on some systems[1] the permissions on the socket file
itself are ignored. The safe way to protect it is to make sure the
permissions on the surrounding directory are what you want. See
credential-cache's init_socket_directory() for an example.

-Peff

[1] Sorry, I don't remember which systems. This is one of those random
    bits of Unix lore I've carried around for 20 years, and it's
    entirely possible it is simply obsolete at this point.
Ævar Arnfjörð Bjarmason Jan. 13, 2021, 3:48 p.m. UTC | #8
On Wed, Jan 13 2021, Jeff King wrote:

> On Tue, Jan 12, 2021 at 06:25:20PM -0500, Jeff Hostetler wrote:
>
>> On the Unix side, the socket is created inside the .git directory
>> by the daemon.  Potential clients would have to have access to the
>> working directory and the .git directory to connect to the socket,
>> so in normal circumstances they would be able to read everything in
>> the WD anyway.  So again, I don't think it is a problem.
>
> Just thinking out loud, here are two potential issues with putting it in
> .git that we may have to deal with later:
>
>   - fsmonitor is conceptually a read-only thing (i.e., it would speed up
>     "git status", etc). And not knowing much about how it will work, I'd
>     guess that is carried through (i.e., even though you may open the
>     socket R/W so that you can write requests and read them back, there
>     is no operation you can request that will overwrite data). But the
>     running user may not have write access to .git.
>
>     As long as we cleanly bail to the non-fsmonitor code paths, I don't
>     think it's the end of the world. Those read-only users just won't
>     get to use the speedup (and it may even be desirable). They may
>     complain, but it is open source so the onus is on them to improve
>     it. You will not have made anything worse. :)
>
>   - repositories may be on network filesystems that do not support unix
>     sockets.
>
> So it would be nice if there was some way to specify an alternate path
> to be used for the socket. Possibly one or both of:
>
>   - a config option to give a root path for sockets, where Git would
>     then canonicalize the $GIT_DIR name and use $root/$GIT_DIR for the
>     socket. That solves the problem for a given user once for all repos.
>
>   - a config option to say "use this path for the socket". This would be
>     per-repo, but is more flexible and possibly less confusing.
>
> One final note: on some systems[1] the permissions on the socket file
> itself are ignored. The safe way to protect it is to make sure the
> permissions on the surrounding directory are what you want. See
> credential-cache's init_socket_directory() for an example.
>
> -Peff
>
> [1] Sorry, I don't remember which systems. This is one of those random
>     bits of Unix lore I've carried around for 20 years, and it's
>     entirely possible it is simply obsolete at this point.

According to StackExchange lore this seems to have been the case with
4.2 BSD & maybe something obscure like HP/UX:
https://unix.stackexchange.com/questions/83032/which-systems-do-not-honor-socket-read-write-permissions

I'd say it's probably safe to ignore this as a concern for new features
in git in general, and certainly for something like a thing intended for
a watchman-like program which users are likely to only run on modern
OS's.