What steps will reproduce the problem?

echo "TEST CONTENT" > fileA
cp fileA fileB
git annex add file{A,B}
git annex drop fileA --force
cat fileB

What is the expected output? What do you see instead?

expected:

--> TEST CONTENT

observed:

--> cat: fileB: No such file or directory

What version of git-annex are you using? On what operating system?

git-annex version: 3.20121017

Please provide any additional information below.

I really like git annex's feature of storing the same content only once. But because this happens transparently (i.e. the user does not need to know, nor is he told, that contents are identical, which is very comfortable, of course), the "git annex drop" function is broken: it effectively deletes (seemingly) random files WITHOUT notifying the user.

Possible solution?

One simple solution would be to use the "git annex find" functionality to see which other files use the same content, and NOT delete it if any do.
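The check proposed above can be sketched in plain shell. This is only an illustration under stated assumptions: `files_sharing_key` is a hypothetical helper, not a git-annex command, and it inspects symlink targets directly instead of calling "git annex find". It assumes annexed files are symlinks into .git/annex/objects/ and that the last path component of the link target is the key. It only sees files present in the working tree, and it does not handle file names containing newlines.

```shell
#!/bin/sh
# Hypothetical helper (not part of git-annex): list the other files in the
# working tree whose symlink target ends in the same key as FILE.
# Assumption: annexed files are symlinks, and the basename of the link
# target is the annex key.
files_sharing_key() {
    key=$(basename "$(readlink "$1")")
    # Not a symlink (or unreadable): nothing to compare against.
    [ -n "$key" ] || return 1
    # Walk all symlinks outside .git and compare their keys.
    find . -path ./.git -prune -o -type l -print | sort |
    while IFS= read -r link; do
        # Skip the file we were asked about.
        [ "$link" = "./${1#./}" ] && continue
        if [ "$(basename "$(readlink "$link")")" = "$key" ]; then
            printf '%s\n' "${link#./}"
        fi
    done
}
```

In the first example above, running `files_sharing_key fileA` before the drop should print fileB under these assumptions, which is the signal that dropping fileA would also remove fileB's content.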

But this still leaves a problem:

Consider the following variation of the above example, and assume that "drop" does not delete content that is still in use (i.e. the above solution is implemented).

echo "TEST CONTENT" > fileA
cp fileA fileB
git annex add file{A,B}
git rm fileB
git annex drop fileA --force
git checkout --force
cat fileB

--> cat: fileB: No such file or directory

Here again, the problem is that the user would probably (correct me if I am wrong) expect that fileB still exists, because removing a file and checking it out again is expected not to mess with the annex contents (?). He does not know that the "annex drop fileA" actually dropped fileB's contents, because there was no additional file linking to it. It effectively performed a "git annex dropunused".

We seem to have agreed this is reasonable behavior, and a doc change was done. Do feel free to suggest other doc changes.. done --Joey

You can avoid this by not using a deduplicating backend; for example you can use the WORM backend.

However, the foot shooting actually occurs due to using drop --force, which is explicitly asking git-annex to be unsafe. If you want to be safe, simply don't use --force. If you want to safely delete a file, simply git rm it, and then git annex unused will come along and find content that can safely be removed.

Comment by http://joeyh.name/ Sun Oct 28 17:13:32 2012
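The safe removal path described in the comment above (git rm, then git annex unused) can be wrapped in a small shell function. This is a minimal sketch: `remove_and_list_unused` is a hypothetical name, not a git-annex command, and it deliberately stops before dropping anything so the list can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of git-annex): remove files from the tree
# without --force, then ask git-annex which content is no longer used.
remove_and_list_unused() {
    # Remove the files from the working tree and the index.
    git rm -- "$@" || return 1
    # Print a numbered list of content that no file refers to any more.
    git annex unused
    # Dropping is left as a manual step, e.g.:
    #   git annex dropunused 1
    # where 1 is a number taken from the list printed above.
}
```

Because content shared with another file is still in use, it should not appear in the list, so the deduplication problem from the report does not arise on this path.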

I only used "--force" for demonstration purposes. I could also set

annex.numcopies = 0

which removes the need for "--force". While this setting can be totally reasonable in certain circumstances, it seems very dangerous that completely unrelated files might unintentionally be deleted.

I agree with you that a possible solution could be to not use a deduplicating backend. But my point is that this needs to be either changed or documented, because even if the user can "fix" this by changing his behavior, he will probably only do so AFTER he has lost something.

Instead of changing the program (to include a check), I would at least suggest an addition to "drop"'s documentation:

"drop": keep in mind that on deduplicating backends, you might end up deleting more than one file. To be perfectly safe, use git-rm and git-annex dropunused.

numcopies=0 is inherently unsafe, and unreasonable if you value your data at all. I've added some warnings about it to the man page.
Comment by http://joeyh.name/ Sun Oct 28 23:29:46 2012

Thanks, that's cool. Admittedly, I cannot think of too many scenarios where there are two identical files without the user's knowledge, and there is an even smaller subset of scenarios where one would want to issue a "drop" on (only) one of these due to storage shortages.

By the way, I LOVE git-annex.

PS: I just realized that the same applies to the "move" command.

You're guaranteed to still have at least 1 copy of the file after move though, so you can get it back.
Comment by http://joeyh.name/ Mon Oct 29 00:03:40 2012