Justin Azoff realized git-annex should have an incremental fsck.

This requires storing the last fsck time of each object.

I would not be strongly opposed to sqlite, but I think there are other places the data could be stored. One possible place is the mode or mtime of the .git/annex/objects/xx/yy/$key directories (the parent directories of where the content is stored). Perhaps the sticky bit could be used to indicate the content has been fsked, and the mtime indicate the time of last fsck. Anything that dropped or put in content would need to clear the sticky bit. --Joey

Basic incremental fsck is done now.

Some enhancements would include:

  • --max-age=30d Once the incremental fsck completes and was started 30 days ago, start a new one.
  • --time-limit --size-limit --file-limit: Limit how long the fsck runs.

Calling this done. The --incremental-schedule option allows scheduling time between incremental fscks. --time-limit is done. I implemented --smallerthan independently. Not clear what --file-limit would be. --Joey

I have a proof of concept written in python.

You can run it and point it the root of an annex or to a subdirectory. In my brief testing it seems to work :-)

the goal would be to have options like

git annex fsck /data/annex --check-older-than 1w --check-for 2h --max-load-avg 0.5
Comments on this page are closed.