Soak up Pipelines with sponge

While migrating my vipe script from Perl to Python, I had the chance to look into more of the programs in the moreutils project. A few of them seemed really useful, but sponge is far and away the most practical utility I've seen. In this post let's explore the issues and applications of the sponge utility.

#Accidentally Truncating Files

sponge addresses a common use case when working with a shell pipeline: modifying files in place. For example:

sed --in-place 's/bar/baz/g' foo.txt

This opens the file foo.txt, replaces every occurrence of bar with baz on each line, and then saves the changes back to foo.txt. By default sed writes its modified output to stdout; the --in-place flag makes it save the changes back to the files it opened instead. Quite often whatever stream editor you're using (sed, awk, perl) has a flag like this to modify a file in place. But what about when this isn't an option? Most beginners try something like this:

cat foo.txt | sed 's/bar/baz/' > foo.txt
Error
But DON'T. This doesn't modify foo.txt in place; it truncates (erases) it.

To explain, we're outputting the contents of foo.txt (with cat foo.txt), editing it with sed (sed 's/bar/baz/'), and finally redirecting the result back to foo.txt (> foo.txt). The issue is that when you redirect to a file with >, your shell opens the file in write mode and truncates it, erasing its previous contents, before the redirected command even runs. By the time cat foo.txt gets around to reading, it's almost always reading from an empty file. In essence you tried to modify a file and instead erased it.
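You can watch this happen safely in a scratch directory. The sketch below uses a throwaway file, demo.txt (a name I made up for the demo), and leaves cat out so the only thing in play is the redirection itself.

```shell
# Work in a scratch directory so nothing real gets clobbered.
workdir=$(mktemp -d)

printf 'bar\n' > "$workdir/demo.txt"
wc -c < "$workdir/demo.txt"    # 4 bytes: "bar" plus a newline

# The shell opens demo.txt for writing (truncating it) while setting up
# the redirection, so sed starts with an already-empty input file.
sed 's/bar/baz/' "$workdir/demo.txt" > "$workdir/demo.txt"

wc -c < "$workdir/demo.txt"    # 0 bytes: the original contents are gone
```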

I'm sad to admit that this issue has bitten me in the back at least twice (>д<). I always try to keep backups and nowadays I avoid redirecting with > because of how dangerous this is. If you have to use the previous approach you should redirect elsewhere and then save the changes back. It also helps to delete any temporary files you make when you're done.

cat foo.txt | sed 's/bar/baz/' > foo2.txt && mv -f foo2.txt foo.txt || rm foo2.txt

This works... but it's needlessly wordy and too cumbersome to type every time you want to do something as simple as edit a file in a pipeline. Luckily, sponge from moreutils gives you a much nicer approach for problems like this.

#Sponge

Like its namesake, sponge is a command that soaks up all of its input before writing it back out. It waits until its input has reached end-of-file and only then writes to the output file, so by that point every earlier command in the pipeline has finished reading. Note that sponge only sees its input stream, not the exit status of the commands before it: if one of them fails, sponge still writes out whatever it received, which may be nothing at all.

Our earlier example becomes:

cat foo.txt | sed 's/bar/baz/' | sponge foo.txt

One extra command in the pipeline and we save ourselves the hassle of managing temporary files by hand.
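If you're curious why this is safe, here's a rough sketch of the same soak-then-write idea in plain shell. It is not how sponge is actually implemented, just an illustration of the ordering; soak and demo.txt are names I made up.

```shell
# soak FILE: read all of stdin first, and only then write it to FILE.
# A hypothetical stand-in for illustration, not sponge's real implementation.
soak() {
  soak_tmp=$(mktemp) || return 1
  # Step 1: drain stdin into a temporary file. FILE is untouched so far,
  # so earlier commands in the pipeline can still read it.
  cat > "$soak_tmp" || { rm -f "$soak_tmp"; return 1; }
  # Step 2: stdin has hit end-of-file, so it is now safe to overwrite FILE.
  cat "$soak_tmp" > "$1"
  soak_status=$?
  rm -f "$soak_tmp"
  return "$soak_status"
}

# Try it on a throwaway file in a scratch directory.
workdir=$(mktemp -d)
printf 'foo bar\n' > "$workdir/demo.txt"
sed 's/bar/baz/' "$workdir/demo.txt" | soak "$workdir/demo.txt"
cat "$workdir/demo.txt"    # foo baz
```

The important part is the ordering: nothing opens the target for writing until the reading side is completely done.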

#When to Not Use Sponge

Frankly, a more important discussion than when to use sponge is when not to use it.

Generally you'll never want to use it when a builtin option will get you the same effect. sed has an option for editing files in place, so you'll rarely want to use sponge with sed. Why do sed 's/bar/baz/' foo.txt | sponge foo.txt when sed -i 's/bar/baz/' foo.txt will suffice?

Of course there are some situations in which it makes sense. The most common is probably when you'd like to build up a complex edit in one go rather than apply commands incrementally. For example:

sed -i 's/bar/baz/' foo.txt
sed -i 's/foo/bag/' foo.txt
sed -i 's/bingle/bangle/' foo.txt
Note
In this case sed can accept multiple expressions. For demonstration purposes I've divided them into 3 separate commands.
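For reference, the single-invocation version might look like the sketch below. I've left out -i and printed to stdout so it behaves the same on GNU and BSD sed; demo.txt is a throwaway file I made up for the demo.

```shell
workdir=$(mktemp -d)
printf 'bar foo bingle\n' > "$workdir/demo.txt"

# One sed process, three expressions, applied in order to each line.
result=$(sed -e 's/bar/baz/' -e 's/foo/bag/' -e 's/bingle/bangle/' "$workdir/demo.txt")
printf '%s\n' "$result"    # baz bag bangle
```

The rest of this section sticks with the three separate commands.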

Any one of these commands could fail, and you end up in the unenviable position of backtracking through them to see which one failed and why. You then have to reset the file to a state where you can apply the command again and get your desired effect. In this case it's pretty straightforward because each is a simple edit and none of them interfere with each other, but for more complex edits this can become a serious headache.

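cmd | sponge file, on the other hand, doesn't touch file until every cmd before sponge in the pipeline has finished, so the whole edit lands in a single step instead of several. One caveat: sponge only sees its input, not the exit status of the commands before it, so it writes out whatever it received even if one of them failed.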

sed 's/bar/baz/' foo.txt |
  sed 's/foo/bag/' |
  sed 's/bingle/bangle/' |
  sponge foo.txt

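Now foo.txt is rewritten once, with all three changes, after the whole pipeline has finished. Keep in mind that sponge writes whatever reaches it, so if an earlier command dies without producing output, foo.txt ends up empty.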
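If you want to be sure a failed edit never reaches the file, one defensive sketch is to check the exit status yourself before handing the result to sponge. The fallback function below is my own addition, only there so the snippet still runs on a machine without moreutils; safe_edit and demo.txt are made-up names.

```shell
# Fall back to a simple stand-in when moreutils isn't installed.
# (Assumption: "read everything, then write" is close enough for this demo.)
if ! command -v sponge >/dev/null 2>&1; then
  sponge() {
    sponge_tmp=$(mktemp) || return 1
    cat > "$sponge_tmp" && cat "$sponge_tmp" > "$1"
    sponge_status=$?
    rm -f "$sponge_tmp"
    return "$sponge_status"
  }
fi

# safe_edit FILE SED_ARGS...: run sed on FILE and only write the result
# back if sed exited successfully. Hypothetical helper for illustration.
safe_edit() {
  safe_file=$1
  shift
  # Note: $(...) drops trailing blank lines, which is fine for a sketch.
  if safe_out=$(sed "$@" "$safe_file"); then
    printf '%s\n' "$safe_out" | sponge "$safe_file"
  else
    echo "edit failed; $safe_file left untouched" >&2
    return 1
  fi
}

workdir=$(mktemp -d)
printf 'bar foo bingle\n' > "$workdir/demo.txt"

# A deliberately broken expression: sed exits non-zero, so the file is kept.
safe_edit "$workdir/demo.txt" -e 's/bar/baz' 2>/dev/null || echo "edit rejected"
cat "$workdir/demo.txt"    # bar foo bingle

# The real edit: all three expressions succeed, so the file is updated.
safe_edit "$workdir/demo.txt" -e 's/bar/baz/' -e 's/foo/bag/' -e 's/bingle/bangle/'
cat "$workdir/demo.txt"    # baz bag bangle
```

Collapsing the three sed calls into one gives a single exit status to check, which keeps the guard simple.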

#Conclusion

This has been a brief but useful introduction to the sponge command. I recommend using it as an opportunity to explore some of the other cool commands in the moreutils project. There are honestly some great utilities (e.g. ifne) which admittedly you'll rarely find a use for, but when you do you'll be grateful they exist.