GNU Bash & The UNIX Philosophy
A lot of developers I talk to don’t really know what bash is. If I call it “the command line”, then they get it. “Oh, yeah, the thing I’ve got to pop open to git push, or npm start, or pip install. Yeah, I get it. Why, what about it?”
To be clear, I have no intention of shaming those folks. For most developers (especially modern ones), that’s all they need to know: it’s a command-line window that they use to do a few things with their software to get it to run. And that’s totally fine.
But man, was I enthralled by bash
when I first started using it. Yes, it’s
“the command line”. Yes, I need it to install libraries for my project. Yes,
every now and then I’d Google how to do something, and find arcane shell voodoo
to solve some problem I had. But I didn’t come to appreciate what bash
really is, or what it can really do for me, until I started diving deeper
into my tech toolbox.
bash is a shell; it’s a piece of software that wraps around your system’s kernel (like a… shell). The kernel is the piece of software that ultimately handles the translation of software instructions to hardware functions. In this way, a shell is the closest human-readable interface to this low-level tooling. A shell can be the “glue language” of an operating system – it can be used to manage how your software tools interact with one another. It’s an incredibly powerful tool that is often overlooked unless you have an explicit use case for it.
bash, specifically, is a GNU-enhanced version of the original Unix shell program, sh. It’s very mature & stable (it was first released in 1989), but still under active development. As with most shells, bash is the name of both the software itself and the programming language used to interface with it. The language is quite feature-complete: it has support for conditionals, iteration (loops), functions, variables, arithmetic operations, and other common features of most programming languages. Shell code gets a lot of flak in an era where languages like Python & Ruby exist (they’re much more readable/expressive/safe), but when used correctly, it can be remarkably powerful and lightweight, given its age.
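To make that concrete, a toy script exercising a few of those features (a function, a conditional, a loop, and a bit of arithmetic) might look something like this:

#!/usr/bin/env bash
# count_to: print the numbers from 1 up to its first argument
count_to() {
  local limit=$1
  for ((i = 1; i <= limit; i++)); do
    echo "counting: $i"
  done
}

# Use the first script argument if one was given; otherwise compute a default.
if [[ $# -gt 0 ]]; then
  count_to "$1"
else
  count_to $(( 2 + 3 ))
fi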
UNIX-alike shells (like bash) were originally designed to adhere to the “UNIX Philosophy”, which is a set of guiding principles for minimalistic, modular software development. While the UNIX Philosophy has drawn some specific criticism over its 40-plus-year existence, the core tenets of what it lays out still hold true today. There’s a lot to digest from writings on the Philosophy itself, but the two pieces that are the most recognizably relevant today are that developers should:
- Write programs that do one thing, and do it well; and
- Write programs that are designed to work together with other, as-yet-unknown programs.
The third most common tenet in the shell ecosystem is that you should “write programs to handle text streams, since text is a universal interface”. This is still true in some software domains, but not all. But it is very relevant in shell environments, as UNIX-alike shells treat everything as strings (in this regard, the shell has the most strongly-typed language in the world!).
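A quick way to see that “everything is a string” in action:

x=2
y=3
echo "$x + $y"      # variables are just text, so this prints: 2 + 3
echo $(( x + y ))   # arithmetic only happens inside an arithmetic context: 5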
Now, if you’ve ever been around any “big data” technologies, you’ve likely at least heard of Hadoop, MapReduce, etc. These are distributed processing toolsets that can make data processing tasks incredibly fast (or even just possible). By distributing workflows across multiple machines, instead of equipping a single machine with many hundreds of gigabytes of RAM, you can get your results faster. But when the data you do have is small enough to fit on a single machine, using these tools can be overkill. Even for data files that are several gigabytes in size, the overhead associated with a distributed processing framework can make it less efficient than an OS-native approach using local tools. That article I linked walks through the particulars of a case study that the author did to demonstrate this.
This is a really, really powerful idea. We’re often so inundated with whatever the latest, buzzwordy tools & frameworks are that we forget that the problems we’re trying to solve aren’t necessarily new; tools already exist to solve the most common, timeless problems. There shouldn’t be a need to pull in several megabytes (or more) of dependencies, or download new, decentralized software just to:
- Run parallel tasks
- Process / parse JSON
- Work with RDBMS data
- Interpolate dynamic input into another set of commands
- And on and on and on (a few quick sketches follow below).
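For instance (the exact tools here are just my favorites, and the files and URLs are placeholders), each of those is roughly a one-liner on a stock UNIX-alike system:

# run tasks in parallel (GNU parallel; xargs -P works too)
cat urls.txt | parallel -j4 'curl -sO {}'

# process / parse JSON (jq)
curl -s 'https://example.org/api' | jq '.items[].name'

# work with RDBMS data (sqlite3)
sqlite3 mydata.db 'SELECT name, total FROM orders LIMIT 10;'

# interpolate dynamic input into another set of commands (xargs)
find . -name '*.log' | xargs -I{} gzip {}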
(I’d also like to continue on this kind of “dependency hell” tangent, specifically regarding the Python 3 standard library and how it’s criminally ignored, but I’ll save that for another time).
Now, those example programs I linked above are not at all an exhaustive list of
the toolbox available to you; I bring those up because they’re some of my
favorites for those specific tasks. But the most wonderful thing about using
these CLI tools in a UNIX-alike environment is that they’re all composable. I
can curl
some REST API to retrieve a JSON response, pipe it to jq
to extract
relevant fields / rebuild the structure, pipe that to parallel
to push each
bit of JSON to AWS S3, and then append each S3
status response to a log to inspect / process for retry later.
That all might look something like this:
curl 'https://some.data.api.org/api' \
| jq -c '.[] | {(.key): .some.nested.value}' \
| parallel -j0 -I{} \
'echo {} | aws s3 cp - s3://somebucket/somekey' \
>> s3.log
Don’t worry if some of that looks like the shell voodoo I mentioned earlier in the post (because it totally is). The point is, each line in that block is being processed by a separate, specific tool, and each of those tools has one clear, well-defined job that it does very well:
- curl gets the data
- jq changes the data
- parallel runs the s3 cp command to upload the data, using as many processes as possible
- >> redirects the terminal output to append to a log file.
And again, all of these tools are either already installed, or are just an
install command away via your OS’ package manager (aws
being the exception;
it’s just supported by these tools). No fuddling with environments, far fewer
version conflicts (these are mostly super-stable software), and an overall
straightforward development experience.
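On a Debian-ish box, for example, that install command looks something like this (package names vary a little between distros):

sudo apt-get install curl jq parallel   # all in the standard repos
pip install awscli                      # the aws CLI is the odd one out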
Note that each of these steps is modular. Perhaps some rework would be in order if you were to swap one tool out for another, but that depends on your program logic and the data format, not the tooling itself. For example, no other change would be necessary to swap curl out for wget (beyond telling wget to write to stdout), or to drop the parallel line entirely (if you just want to run the S3 upload commands in serial). It might look arcane at first, but once the syntax of each tool is understood, I find this to be remarkably beautiful & powerful.
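For instance, a serial variant with wget swapped in might look something like this (same hypothetical API and bucket as before):

# wget replaces curl; a plain while-read loop replaces parallel
wget -qO- 'https://some.data.api.org/api' \
| jq -c '.[] | {(.key): .some.nested.value}' \
| while read -r record; do
    echo "$record" | aws s3 cp - s3://somebucket/somekey
  done \
>> s3.log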
In fact, I love the idea of this approach so much, I’ll be giving a talk on it at the Dev Up conference in St. Louis this October (shameless plug). If you find yourself in the area, come check out the conference!
There’s so much more I could say about bash, the GNU ecosystem of tools, and the UNIX Philosophy, but suffice it to say that if you’ve not spent a little time exploring each of those, I sincerely believe that you’re missing out. Who among us doesn’t like to geek out on vintage crap? You like record players, don’t you?
Footnotes
- An additional benefit of learning tools like bash and other native friends is that maybe you work in an environment where “better” tools aren’t permitted. I’ve been working on a client project recently that has me writing a lot of bash scripts, either for demoing functionality, configuration, integration testing the stack, or promoting code between environments.
- For a light-hearted, but also quite topical, interpretation of the UNIX Philosophy, have a read through the UNIX Koans, which depict the teachings of “Master Foo”.