Message ID | pull.923.v3.git.1625150864.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Builtin FSMonitor Feature | expand |
On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: > Here is V3 of my patch series to add a builtin FSMonitor daemon to Git. I > rebased this series onto v2.32.0. > > V3 addresses most of the previous review comments and things we've learned > from our experimental testing of V2. (A version of V2 was shipped as an > experimental feature in the v2.32.0-based releases of Git for Windows and > VFS for Git.) > > There are still a few items that I need to address, but that list is getting > very short. ... > fsmonitor-fs-listen-win32: stub in backend for Windows > fsmonitor-fs-listen-macos: stub in backend for MacOS I left some light comments on the repo-settings.c part of this to follow up from a previous round. Any other testing of it is stalled by there being no linux backend for it as part of this series. I see from spelunking repos that Johannes had a WIP compat/fsmonitor/linux.c which looks like it could/should mostly work, but the API names all changed since then, and after a short try I gave up on trying to rebase it. I'd really prefer for git not to have features that place free platforms at a disadvantage against proprietary platforms if it can be avoided, and in this case the lack of a Linux backend also means much less widespread testing of the feature among the development community / CI.
On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: > >> Here is V3 of my patch series to add a builtin FSMonitor daemon to Git. I >> rebased this series onto v2.32.0. >> >> V3 addresses most of the previous review comments and things we've learned >> from our experimental testing of V2. (A version of V2 was shipped as an >> experimental feature in the v2.32.0-based releases of Git for Windows and >> VFS for Git.) >> >> There are still a few items that I need to address, but that list is getting >> very short. > > ... >> fsmonitor-fs-listen-win32: stub in backend for Windows >> fsmonitor-fs-listen-macos: stub in backend for MacOS > > I left some light comments on the repo-settings.c part of this to follow > up from a previous round. Thanks. > > Any other testing of it is stalled by there being no linux backend for > it as part of this series. I see from spelunking repos that Johannes had > a WIP compat/fsmonitor/linux.c which looks like it could/should mostly > work, but the API names all changed since then, and after a short try I > gave up on trying to rebase it. The early Linux version was dropped because inotify does not give recursive coverage -- only the requested directory. Using inotify requires adding a watch to each subdirectory (recursively) in the worktree. There's a system limit on the maximum number of watched directories (defaults to 8K IIRC) and that limit is system-wide. Since the whole point was to support large very large repos, using inotify was a non-starter, so I removed the Linux version from our patch series. For example, the first repo I tried it on (outside of the test suite) had 25K subdirectories. I'm told there is a new fanotify API in recent Linux kernels that is a better fit for what we need, but we haven't had time to investigate it yet. > > I'd really prefer for git not to have features that place free platforms > at a disadvantage against proprietary platforms if it can be avoided, > and in this case the lack of a Linux backend also means much less > widespread testing of the feature among the development community / CI. > This feature is always going to have platform-specific components, so the lack of one platform or another should not stop us from discussing it for the platforms that can be supported. And given the size and complexity of the platform-specific code, we should not assume that "just test it on Linux" is sufficient. Yes, there are some common/shared routines/data structures in the daemon, but hard/tricky parts are in the platform layer. Jeff
On Thu, Jul 01 2021, Jeff Hostetler wrote: > On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote: >> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: >> >>> Here is V3 of my patch series to add a builtin FSMonitor daemon to Git. I >>> rebased this series onto v2.32.0. >>> >>> V3 addresses most of the previous review comments and things we've learned >>> from our experimental testing of V2. (A version of V2 was shipped as an >>> experimental feature in the v2.32.0-based releases of Git for Windows and >>> VFS for Git.) >>> >>> There are still a few items that I need to address, but that list is getting >>> very short. >> ... >>> fsmonitor-fs-listen-win32: stub in backend for Windows >>> fsmonitor-fs-listen-macos: stub in backend for MacOS >> I left some light comments on the repo-settings.c part of this to >> follow >> up from a previous round. > > Thanks. > >> Any other testing of it is stalled by there being no linux backend >> for >> it as part of this series. I see from spelunking repos that Johannes had >> a WIP compat/fsmonitor/linux.c which looks like it could/should mostly >> work, but the API names all changed since then, and after a short try I >> gave up on trying to rebase it. > > The early Linux version was dropped because inotify does not give > recursive coverage -- only the requested directory. Using inotify > requires adding a watch to each subdirectory (recursively) in the > worktree. There's a system limit on the maximum number of watched > directories (defaults to 8K IIRC) and that limit is system-wide. > > Since the whole point was to support large very large repos, using > inotify was a non-starter, so I removed the Linux version from our > patch series. For example, the first repo I tried it on (outside > of the test suite) had 25K subdirectories. > > I'm told there is a new fanotify API in recent Linux kernels that > is a better fit for what we need, but we haven't had time to investigate > it yet. That default limit is a bit annoying, but I don't see how it's a blocker in any way. You simply adjust the limit. E.g. I deployed and tested the hook version of inotify (using watchman) in a sizable development environment, and written my own code using the API. This was all before fanotify(7) existed. IIRC I set most of the limits to 2^24 or 2^20. I've used it with really large Git repos, including with David Turner's 2015-04-03-1M-git for testing (`git ls-files | wc -l` on that is around a quarter-million). If you have a humongous repository and don't have root on your own machine you're out of luck. But I think most people who'd use this are either using their own laptop, or e.g. in a corporate setting where administrators would tweak the sysctl limits given the performance advantages (as I did). Once you adjust the limits Linux deals with large trees just fine, it just has overly conservative limits for most things in sysctl. The API is a bit annoying, your event loop needs to run around and add watches. AFAICT you need Linux 5.1 for fanotify(7) to be useful, e.g. Debian stable, RHEL etc. aren't using anything that new. So having an inotify backend as well as possibly a fanotify one would be very useful. And linux.git's docs suggest that the default limit was bumped from 8192 to 1M in v5.10, a really recent kernel, so if you've got that you've also got fanotify. In any case, even if Linux's inotify limit was something hardcoded and impossible to change you could still use such an API to test & debug the basics of this feature on that platform. >> I'd really prefer for git not to have features that place free >> platforms >> at a disadvantage against proprietary platforms if it can be avoided, >> and in this case the lack of a Linux backend also means much less >> widespread testing of the feature among the development community / CI. >> > > This feature is always going to have platform-specific components, > so the lack of one platform or another should not stop us from > discussing it for the platforms that can be supported. (I think per the above that's s/can be/are/) > And given the size and complexity of the platform-specific code, > we should not assume that "just test it on Linux" is sufficient. > Yes, there are some common/shared routines/data structures in the > daemon, but hard/tricky parts are in the platform layer. I think we're talking past each other a bit here. I'm not saying that you can get full or meaningful testing for it on Windows if you test it on Linux, or the other way around. Of course there's platform-specific stuff, although there's also a lot of non-platform-specific stuff, so even a very basic implementation of inotify would make reviwing this easier / give access to more reviewers. I'm saying that I prefer that Git as a free software project not be in the situation of saying the best use-case for a given size/shape of repo is to use Git in combination with proprietary operating systems over freely licensed ones. IOW what the FSF has a policy for GNU projects. Now, I think the FSF probably goes too far in that (famously removing rather obscure font rendering features from Emacs on OSX), but "manage lots of data" is a core feature of git.
On 7/1/21 5:26 PM, Ævar Arnfjörð Bjarmason wrote: > > On Thu, Jul 01 2021, Jeff Hostetler wrote: > >> On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote: >>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: >>> >>>> Here is V3 of my patch series to add a builtin FSMonitor daemon to Git. I >>>> rebased this series onto v2.32.0. >>>> >>>> V3 addresses most of the previous review comments and things we've learned >>>> from our experimental testing of V2. (A version of V2 was shipped as an >>>> experimental feature in the v2.32.0-based releases of Git for Windows and >>>> VFS for Git.) >>>> >>>> There are still a few items that I need to address, but that list is getting >>>> very short. >>> ... >>>> fsmonitor-fs-listen-win32: stub in backend for Windows >>>> fsmonitor-fs-listen-macos: stub in backend for MacOS >>> I left some light comments on the repo-settings.c part of this to >>> follow >>> up from a previous round. >> >> Thanks. >> >>> Any other testing of it is stalled by there being no linux backend >>> for >>> it as part of this series. I see from spelunking repos that Johannes had >>> a WIP compat/fsmonitor/linux.c which looks like it could/should mostly >>> work, but the API names all changed since then, and after a short try I >>> gave up on trying to rebase it. >> >> The early Linux version was dropped because inotify does not give >> recursive coverage -- only the requested directory. Using inotify >> requires adding a watch to each subdirectory (recursively) in the >> worktree. There's a system limit on the maximum number of watched >> directories (defaults to 8K IIRC) and that limit is system-wide. >> >> Since the whole point was to support large very large repos, using >> inotify was a non-starter, so I removed the Linux version from our >> patch series. For example, the first repo I tried it on (outside >> of the test suite) had 25K subdirectories. >> >> I'm told there is a new fanotify API in recent Linux kernels that >> is a better fit for what we need, but we haven't had time to investigate >> it yet. > > That default limit is a bit annoying, but I don't see how it's a blocker > in any way. > > You simply adjust the limit. E.g. I deployed and tested the hook version > of inotify (using watchman) in a sizable development environment, and > written my own code using the API. This was all before fanotify(7) > existed. IIRC I set most of the limits to 2^24 or 2^20. I've used it > with really large Git repos, including with David Turner's > 2015-04-03-1M-git for testing (`git ls-files | wc -l` on that is around > a quarter-million). > > If you have a humongous repository and don't have root on your own > machine you're out of luck. But I think most people who'd use this are > either using their own laptop, or e.g. in a corporate setting where > administrators would tweak the sysctl limits given the performance > advantages (as I did). > > Once you adjust the limits Linux deals with large trees just fine, it > just has overly conservative limits for most things in sysctl. The API > is a bit annoying, your event loop needs to run around and add watches. > > AFAICT you need Linux 5.1 for fanotify(7) to be useful, e.g. Debian > stable, RHEL etc. aren't using anything that new. So having an inotify > backend as well as possibly a fanotify one would be very useful. > > And linux.git's docs suggest that the default limit was bumped from 8192 > to 1M in v5.10, a really recent kernel, so if you've got that you've > also got fanotify. > > In any case, even if Linux's inotify limit was something hardcoded and > impossible to change you could still use such an API to test & debug the > basics of this feature on that platform. Good points. If the inotify limits can be increased into the millions then we can revisit supporting it. I do worry about possible race conditions as we have to scan the worktree and add/delete a watch for each directory, but we don't need to worry about that right now. I do want to have Linux support eventually, but I was saving it for a later effort (and/or looking for volunteers). My IPC-based builtin daemon complements the existing hook-based fsmonitor support that Ben Peart and Kevin Willford added a few years ago. That model (and PERL hook script and Watchman integration) work fine for Linux, so the advantages of a builtin monitor aren't as compelling. For example, on Linux, hook process creation is fast, PERL is fast, and it is easy to just apt-get a tool like Watchman. But on Windows, process creation is slow, NTFS is slow, PERL is available as part of the Git-for-Windows distribution, and installing third-party tools like Watchman onto a collection of enterprise users' machines is a chore, so it made sense of us to pick platforms where the existing hook model has issues and add other platforms later. Besides, this patch series is already at 34 commits. And some of them are quite large. Adding another platform would just make it even larger. Right now I'm more interested in the larger question of whether we WANT to have a builtin fsmonitor and do we like the overall design of what I have here? Picking up a new platform, whether it is Linux, or AIX, or Solaris, or Nonstop, or whatever, should nicely fit into one platform-specific file in compat/fsmonitor and not take that long. > >>> I'd really prefer for git not to have features that place free >>> platforms >>> at a disadvantage against proprietary platforms if it can be avoided, >>> and in this case the lack of a Linux backend also means much less >>> widespread testing of the feature among the development community / CI. >>> >> >> This feature is always going to have platform-specific components, >> so the lack of one platform or another should not stop us from >> discussing it for the platforms that can be supported. > > (I think per the above that's s/can be/are/) > >> And given the size and complexity of the platform-specific code, >> we should not assume that "just test it on Linux" is sufficient. >> Yes, there are some common/shared routines/data structures in the >> daemon, but hard/tricky parts are in the platform layer. > > I think we're talking past each other a bit here. I'm not saying that > you can get full or meaningful testing for it on Windows if you test it > on Linux, or the other way around. > > Of course there's platform-specific stuff, although there's also a lot > of non-platform-specific stuff, so even a very basic implementation of > inotify would make reviwing this easier / give access to more reviewers. > > I'm saying that I prefer that Git as a free software project not be in > the situation of saying the best use-case for a given size/shape of repo > is to use Git in combination with proprietary operating systems over > freely licensed ones. I wouldn't worry about that. Even without Watchman integration, Linux runs things so much faster than anything else it's not funny. If anything, we need things like fsmonitor on Windows to help keep up with Linux. > > IOW what the FSF has a policy for GNU projects. Now, I think the FSF > probably goes too far in that (famously removing rather obscure font > rendering features from Emacs on OSX), but "manage lots of data" is a > core feature of git. > Jeff
Hi Ævar, On Thu, 1 Jul 2021, Ævar Arnfjörð Bjarmason wrote: > On Thu, Jul 01 2021, Jeff Hostetler wrote: > > > On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote: > > > >> Any other testing of it is stalled by there being no linux backend > >> for it as part of this series. I see from spelunking repos that > >> Johannes had a WIP compat/fsmonitor/linux.c which looks like it > >> could/should mostly work, but the API names all changed since then, > >> and after a short try I gave up on trying to rebase it. I am a bit surprised that you gave up so easily, it cannot be that hard if you use `git rebase -i` smartly. But I think there is an even bigger obstacle lurking than just the challenge of rebasing that experimental backend. > > The early Linux version was dropped because inotify does not give > > recursive coverage -- only the requested directory. Using inotify > > requires adding a watch to each subdirectory (recursively) in the > > worktree. There's a system limit on the maximum number of watched > > directories (defaults to 8K IIRC) and that limit is system-wide. > > > > Since the whole point was to support large very large repos, using > > inotify was a non-starter, so I removed the Linux version from our > > patch series. For example, the first repo I tried it on (outside of > > the test suite) had 25K subdirectories. > > > > I'm told there is a new fanotify API in recent Linux kernels that is a > > better fit for what we need, but we haven't had time to investigate it > > yet. > > That default limit is a bit annoying, but I don't see how it's a blocker > in any way. Let me help you to see it. So let's assume that you start FSMonitor-enabled Git, with the default values. What is going to happen if you have any decently-sized worktree? You run out of handles. What then? Throw your hands in the air? Stop working? Report incorrect results? Those are real design challenges, and together with the race problems Jeff mentioned, they pose a much bigger obstacle than the rebasing you mentioned above. > You simply adjust the limit. E.g. I deployed and tested the hook version > of inotify (using watchman) in a sizable development environment, and > written my own code using the API. This was all before fanotify(7) > existed. IIRC I set most of the limits to 2^24 or 2^20. I've used it > with really large Git repos, including with David Turner's > 2015-04-03-1M-git for testing (`git ls-files | wc -l` on that is around > a quarter-million). > > If you have a humongous repository and don't have root on your own > machine you're out of luck. But I think most people who'd use this are > either using their own laptop, or e.g. in a corporate setting where > administrators would tweak the sysctl limits given the performance > advantages (as I did). This conjecture that most people who'd use this are using their own laptop or have a corporate setting where administrators would tweak the sysctl limits according to engineers' wishes strikes me as totally made up from thin air, nothing else. In other words, I find it an incredibly unconvincing argument. I prefer not to address the rest of your mail, as I found it not only a lengthy tangent (basically trying to talk Jeff into adding Linux support in what could have been a much shorter mail), but actively distracting from the already long patch series Jeff presented. It is so long, in fact, that we had to put in an exemption in GitGitGadget because it is already longer than a usually-unreasonable 30 patches. Also, at this point, insisting on Linux support (in so many words) is unhelpful. Let me summarize why I think this is unhelpful: In Git, it is our tradition to develop incrementally, for better or worse. Jeff's effort brought us to a point where we already have Windows and macOS support, i.e. support for the most prevalent development platforms (see e.g. https://insights.stackoverflow.com/survey/2020#technology-developers-primary-operating-systems). We already established multiple obstacles for Linux support, therefore demanding Linux support to be included Right Now would increase the patch series even further, making it even less reviewable, being even less incremental, hold up the current known-to-work-well state, force Jeff to work on something he probably cannot work on right now, and therefore delaying the entire effort even further. Ciao, Johannes
On Mon, Jul 05 2021, Johannes Schindelin wrote: > On Thu, 1 Jul 2021, Ævar Arnfjörð Bjarmason wrote: > >> On Thu, Jul 01 2021, Jeff Hostetler wrote: >> >> > On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote: > [...] >> > The early Linux version was dropped because inotify does not give >> > recursive coverage -- only the requested directory. Using inotify >> > requires adding a watch to each subdirectory (recursively) in the >> > worktree. There's a system limit on the maximum number of watched >> > directories (defaults to 8K IIRC) and that limit is system-wide. >> > >> > Since the whole point was to support large very large repos, using >> > inotify was a non-starter, so I removed the Linux version from our >> > patch series. For example, the first repo I tried it on (outside of >> > the test suite) had 25K subdirectories. >> > >> > I'm told there is a new fanotify API in recent Linux kernels that is a >> > better fit for what we need, but we haven't had time to investigate it >> > yet. >> >> That default limit is a bit annoying, but I don't see how it's a blocker >> in any way. > > Let me help you to see it. > > So let's assume that you start FSMonitor-enabled Git, with the default > values. What is going to happen if you have any decently-sized worktree? > You run out of handles. What then? Throw your hands in the air? Stop > working? Report incorrect results? > > Those are real design challenges, and together with the race problems Jeff > mentioned, they pose a much bigger obstacle than the rebasing you > mentioned above. You report an error and tell the user to raise the limit, and cover this in your install docs. It's what watchman does: https://github.com/facebook/watchman/blob/master/watchman/error_category.cpp#L28-L45 https://facebook.github.io/watchman/docs/install.html#system-specific-preparation >> You simply adjust the limit. E.g. I deployed and tested the hook version >> of inotify (using watchman) in a sizable development environment, and >> written my own code using the API. This was all before fanotify(7) >> existed. IIRC I set most of the limits to 2^24 or 2^20. I've used it >> with really large Git repos, including with David Turner's >> 2015-04-03-1M-git for testing (`git ls-files | wc -l` on that is around >> a quarter-million). >> >> If you have a humongous repository and don't have root on your own >> machine you're out of luck. But I think most people who'd use this are >> either using their own laptop, or e.g. in a corporate setting where >> administrators would tweak the sysctl limits given the performance >> advantages (as I did). > > This conjecture that most people who'd use this are using their own laptop > or have a corporate setting where administrators would tweak the sysctl > limits according to engineers' wishes strikes me as totally made up from > thin air, nothing else. > > In other words, I find it an incredibly unconvincing argument. It's from a sample size of one experience of deploying this in a BigCorp setting. But sure, perhaps things are done differently where you work/have worked. My experience is that even if you're dealing with some BOFHs and e.g. are using shared racked development servers it's generally not an insurmountable task to get some useful software installed, or some system configuration tweaked. In this case we're talking about ~40MB of kernel memory for 1 million dirs IIRC, that coupled with the target audience that benefits most from this probably being deployments that are *painfully* aware of their "git status" slowness... > I prefer not to address the rest of your mail, as I found it not only a > lengthy tangent (basically trying to talk Jeff into adding Linux support > in what could have been a much shorter mail), but actively distracting > from the already long patch series Jeff presented. It is so long, in fact, > that we had to put in an exemption in GitGitGadget because it is already > longer than a usually-unreasonable 30 patches. Also, at this point, > insisting on Linux support (in so many words) is unhelpful. This part of the tread started because Jeff H. claimed upthread: [...]inotify was a non-starter, so I removed the Linux version from our patch series But after I noted that it works just fine, you just need to change some sysctl limits. It seems at this point we're debating whether some installations of Linux have BOFH-y enough administrators that they won't tweak sysctl limits for you. OK, but given that I've run this thing in a production setting it's clearly not a "non-starter". I think it could be useful for a lot of users. I'll reply with more (and hopefully helpful) specifics to Jeff's mail. > Let me summarize why I think this is unhelpful: In Git, it is our > tradition to develop incrementally, for better or worse. Jeff's effort > brought us to a point where we already have Windows and macOS support, > i.e. support for the most prevalent development platforms (see e.g. > https://insights.stackoverflow.com/survey/2020#technology-developers-primary-operating-systems). > We already established multiple obstacles for Linux support, therefore > demanding Linux support to be included Right Now would increase the patch > series even further, making it even less reviewable, being even less > incremental, hold up the current known-to-work-well state, force Jeff to > work on something he probably cannot work on right now, and therefore > delaying the entire effort even further. I think we just disagree. I wouldn't call my opinion "unhelpful" any more than yours. I don't think Git's ever had anything like a major feature (built in, config settings, etc. etc.) contributed by a propriterary OS vendor that works on that vendor's OS, as well as another vendor's propriterary OS, but not comparable free systems Is that less incremental and perhaps less practical? Sure. It's not an entirely practical viewpoint. I work on free software partly for idealistic reasons. I'd prefer if the project I'm working on doesn't give users a carrot to pick proprietary systems over free ones. But ultimately it's not my call, but Junio's.
On Fri, Jul 02 2021, Jeff Hostetler wrote: > On 7/1/21 5:26 PM, Ævar Arnfjörð Bjarmason wrote: >> On Thu, Jul 01 2021, Jeff Hostetler wrote: >> >>> On 7/1/21 1:40 PM, Ævar Arnfjörð Bjarmason wrote: >>>> On Thu, Jul 01 2021, Jeff Hostetler via GitGitGadget wrote: >>>> >>>>> Here is V3 of my patch series to add a builtin FSMonitor daemon to Git. I >>>>> rebased this series onto v2.32.0. >>>>> >>>>> V3 addresses most of the previous review comments and things we've learned >>>>> from our experimental testing of V2. (A version of V2 was shipped as an >>>>> experimental feature in the v2.32.0-based releases of Git for Windows and >>>>> VFS for Git.) >>>>> >>>>> There are still a few items that I need to address, but that list is getting >>>>> very short. >>>> ... >>>>> fsmonitor-fs-listen-win32: stub in backend for Windows >>>>> fsmonitor-fs-listen-macos: stub in backend for MacOS >>>> I left some light comments on the repo-settings.c part of this to >>>> follow >>>> up from a previous round. >>> >>> Thanks. >>> >>>> Any other testing of it is stalled by there being no linux backend >>>> for >>>> it as part of this series. I see from spelunking repos that Johannes had >>>> a WIP compat/fsmonitor/linux.c which looks like it could/should mostly >>>> work, but the API names all changed since then, and after a short try I >>>> gave up on trying to rebase it. >>> >>> The early Linux version was dropped because inotify does not give >>> recursive coverage -- only the requested directory. Using inotify >>> requires adding a watch to each subdirectory (recursively) in the >>> worktree. There's a system limit on the maximum number of watched >>> directories (defaults to 8K IIRC) and that limit is system-wide. >>> >>> Since the whole point was to support large very large repos, using >>> inotify was a non-starter, so I removed the Linux version from our >>> patch series. For example, the first repo I tried it on (outside >>> of the test suite) had 25K subdirectories. >>> >>> I'm told there is a new fanotify API in recent Linux kernels that >>> is a better fit for what we need, but we haven't had time to investigate >>> it yet. >> That default limit is a bit annoying, but I don't see how it's a >> blocker >> in any way. >> You simply adjust the limit. E.g. I deployed and tested the hook >> version >> of inotify (using watchman) in a sizable development environment, and >> written my own code using the API. This was all before fanotify(7) >> existed. IIRC I set most of the limits to 2^24 or 2^20. I've used it >> with really large Git repos, including with David Turner's >> 2015-04-03-1M-git for testing (`git ls-files | wc -l` on that is around >> a quarter-million). >> If you have a humongous repository and don't have root on your own >> machine you're out of luck. But I think most people who'd use this are >> either using their own laptop, or e.g. in a corporate setting where >> administrators would tweak the sysctl limits given the performance >> advantages (as I did). >> Once you adjust the limits Linux deals with large trees just fine, >> it >> just has overly conservative limits for most things in sysctl. The API >> is a bit annoying, your event loop needs to run around and add watches. >> AFAICT you need Linux 5.1 for fanotify(7) to be useful, e.g. Debian >> stable, RHEL etc. aren't using anything that new. So having an inotify >> backend as well as possibly a fanotify one would be very useful. >> And linux.git's docs suggest that the default limit was bumped from >> 8192 >> to 1M in v5.10, a really recent kernel, so if you've got that you've >> also got fanotify. >> In any case, even if Linux's inotify limit was something hardcoded >> and >> impossible to change you could still use such an API to test & debug the >> basics of this feature on that platform. > > Good points. If the inotify limits can be increased into the millions > then we can revisit supporting it. I also replied in the side-thread on this topic: https://lore.kernel.org/git/874kd874qv.fsf@evledraar.gmail.com/ > [...] I do worry about possible race > conditions as we have to scan the worktree and add/delete a watch for > each directory, but we don't need to worry about that right now. In the watchman code there's a scheduleRecrawl() in such cases, but it's not just something that happens on Linux. On Windows it's if ReadDirectoryChangesW() returns ERROR_NOTIFY_ENUM_DIR. This builtin version doesn't check that at all. Here's someone who seems to work at MSFT talking about it: https://devblogs.microsoft.com/oldnewthing/20110812-00/?p=9913 So isn't this an issue on Windows as well? > I do want to have Linux support eventually, but I was saving it for > a later effort (and/or looking for volunteers). OK, upthread you said it was a "non-starter" because of the adjustable limits we've been discussing... > My IPC-based builtin daemon complements the existing hook-based > fsmonitor support that Ben Peart and Kevin Willford added a few years > ago. That model (and PERL hook script and Watchman integration) work > fine for Linux, so the advantages of a builtin monitor aren't as > compelling. > > For example, on Linux, hook process creation is fast, PERL is fast,[...] Did you or Ben ever try to reproduce what I noted at: https://lore.kernel.org/git/CACBZZX5e58bWuf3NdDYTxu2KyZj29hHONzN=rp-7vXd8nURyWQ@mail.gmail.com/ And more recently with reference to that at: https://lore.kernel.org/git/87h7lgfchm.fsf@evledraar.gmail.com/ I.e. when I tested it back then the issue wasn't Perl's performance, it was something between watchman and the user, happening in git. I.e. watchman would return in the single-digit milliseconds, git would take hundreds or thousands of ms. Maybe I missed some intervening analysis, but this didn't have to do with process overhead back then. Also, watchman has a JSON network interface. So for a "native" solution I also wonder if you or Ben tried a solution that involved just talking to that over the local network. The hook/perl script was just a trivial aid to make doing that easier. Wouldn't that be just as fast (or near enough) on Windows? Ben mentioned in the linked thread that there was some talks between Microsoft (or well, the "Git team") and the watchman people. So again, I missed some in-between discussions. I'm just wondering about the design choices... > and it is easy to just apt-get a tool like Watchman. But on > Windows, process creation is slow, NTFS is slow, PERL is available > as part of the Git-for-Windows distribution, and installing third-party > tools like Watchman onto a collection of enterprise users' machines is > a chore, so it made sense of us to pick platforms where the existing > hook model has issues and add other platforms later. If we're trying to do an end-run around package systems being a pain would including the relevant part of watchman in contrib + talking to it over its network interface be a replacement for some/most/all of this series? > Besides, this patch series is already at 34 commits. And some of > them are quite large. Adding another platform would just make it > even larger. > > Right now I'm more interested in the larger question of whether > we WANT to have a builtin fsmonitor and do we like the overall > design of what I have here? Picking up a new platform, whether > it is Linux, or AIX, or Solaris, or Nonstop, or whatever, should > nicely fit into one platform-specific file in compat/fsmonitor > and not take that long. Sure, FWIW I'm not opposed to this being a built-in per-se, or us re-implementing parts of watchman here, and not being able to test this series meaningfully a lot of these questions are probably easy to answer on a supported OS... >>>> I'd really prefer for git not to have features that place free >>>> platforms >>>> at a disadvantage against proprietary platforms if it can be avoided, >>>> and in this case the lack of a Linux backend also means much less >>>> widespread testing of the feature among the development community / CI. >>>> >>> >>> This feature is always going to have platform-specific components, >>> so the lack of one platform or another should not stop us from >>> discussing it for the platforms that can be supported. >> (I think per the above that's s/can be/are/) >> >>> And given the size and complexity of the platform-specific code, >>> we should not assume that "just test it on Linux" is sufficient. >>> Yes, there are some common/shared routines/data structures in the >>> daemon, but hard/tricky parts are in the platform layer. >> I think we're talking past each other a bit here. I'm not saying >> that >> you can get full or meaningful testing for it on Windows if you test it >> on Linux, or the other way around. >> Of course there's platform-specific stuff, although there's also a >> lot >> of non-platform-specific stuff, so even a very basic implementation of >> inotify would make reviwing this easier / give access to more reviewers. >> I'm saying that I prefer that Git as a free software project not be >> in >> the situation of saying the best use-case for a given size/shape of repo >> is to use Git in combination with proprietary operating systems over >> freely licensed ones. > > I wouldn't worry about that. Even without Watchman integration, > Linux runs things so much faster than anything else it's not funny. > > If anything, we need things like fsmonitor on Windows to help keep > up with Linux. Even on Linux if you do a "git status" on a humongous repository it can take 1-2 seconds (see old but still valid linked numbers above), whereas if you've been logging fs events and ask a daemon "what changed since time xyz" it can take <5ms. But that's just been on repositories I've tested. I'd assumed that this part of GVFS wasn't something that could have been fairly easily replaced by having the relevant developers run a Linux distro in vmware or whatever, adn that's been consistent with my own testing on still-big-but-smaller repos than that.
Hi Ævar, On Tue, 6 Jul 2021, Ævar Arnfjörð Bjarmason wrote: > I think we just disagree. I wouldn't call my opinion "unhelpful" any > more than yours. You still misunderstand. This is not about any "opinion" of yours, it is about your delay tactics to make it deliberately difficult to finish this patch series, by raising the bar beyond what is reasonable for a single patch series. And you keep doing it. I would appreciate if you just stopped with all those tangents and long and many replies that do not seem designed to help the patch series stabilize, but do the opposite. Ciao, Johannes
Johannes Schindelin wrote: > In Git, it is our tradition to develop incrementally, for better or > worse. We develop incrementally, but the first version of a big feature has to have as many eyes on it as possible in order to track potential issues in the future. It is very common to look back at an API and say "oh, if only wad thought of that back then". Therefore we should do our best to think about it now. We want to avoid future aw-shucks. > Jeff's effort > brought us to a point where we already have Windows and macOS support, > i.e. support for the most prevalent development platforms (see e.g. > https://insights.stackoverflow.com/survey/2020#technology-developers-primary-operating-systems). > We already established multiple obstacles for Linux support, therefore > demanding Linux support to be included Right Now would increase the patch > series even further, making it even less reviewable, being even less > incremental, hold up the current known-to-work-well state, force Jeff to > work on something he probably cannot work on right now, and therefore > delaying the entire effort even further. This is a red herring. You don't need to send the Linux support as part of the patch series, you can simply provide a branch for the people that want to give it a try. Even if it's not ready for wide use, even if it needs a specific version of Linux, even if it's a proof of concept, you can provide it. The real reason it's not provided is laziness (this is not an attack, laziness is good trait in a programmer, although not always). Cheers.
Johannes Schindelin wrote: > On Tue, 6 Jul 2021, Ævar Arnfjörð Bjarmason wrote: > > > I think we just disagree. I wouldn't call my opinion "unhelpful" any > > more than yours. > > You still misunderstand. This is not about any "opinion" of yours, it is > about your delay tactics to make it deliberately difficult to finish this > patch series, The whole job of a reviewer is to *delay* the inclusion of the patch series being reviewed until it has passed his/her personal standards. Ævar is just doing his job (metaphorically). > by raising the bar beyond what is reasonable for a single patch > series. It is not up to you to determine what is reasonable for a patch series, and given that you have a vested interest you are also not an objective observer. I have hundreds of patches pending review, and I would love for anyone to try find issues with them, even if I ultimately disagreed with their assessment. The Git project doesn't owe your patches any preferential treatment. The review process will take as long as the review process takes. You can't put deadlines on open source projects.