GNU Bash & The UNIX Philosophy
A lot of developers I talk to don’t really know what bash is. If I call it “the command line”, then they get it. “Oh, yeah, the thing I’ve got to pop open to git push, or npm start, or pip install. Yeah, I get it. Why, what about it?”
To be clear, I have no intention of shaming those folks. For most developers (especially modern ones), that’s all they need to know: it’s a command-line window that they use to do a few things with their software to get it to run. And that’s totally fine.
But man, was I enthralled by bash
when I first started using it. Yes, it’s
“the command line”. Yes, I need it to install libraries for my project. Yes,
every now and then I’d Google how to do something, and find arcane shell voodoo
to solve some problem I had. But I didn’t come to appreciate what bash
really is, or what it can really do for me, until I started diving deeper
into my tech toolbox.
bash is a shell; it’s a piece of software that wraps around your system’s kernel (like a… shell). The kernel is the piece of software that ultimately handles the translation of software instructions to hardware functions. In this way, a shell is the closest human-readable interface to this low-level tooling. A shell can be the “glue language” of an operating system – it can be used to manage how your software tools interact with one another. It’s an incredibly powerful tool that is often overlooked unless you have an explicit use case for it.
bash, specifically, is a GNU-enhanced version of the original Unix shell program, sh. It’s very mature & stable (it was first released in 1989), but still under active development. As with most shells, bash is the name of both the software itself and the programming language used to interface with it. The language is quite feature-complete: it has support for conditionals, iteration (loops), functions, variables, arithmetic operations, and other common features of most programming languages. Shell code gets a lot of flak in an era where languages like Python & Ruby exist (they’re much more readable/expressive/safe), but when used correctly, it can be remarkably powerful and lightweight, given its age.
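To make that concrete, a toy script exercising a few of those features (a function, a conditional, a loop, and a bit of arithmetic) might look something like this:

#!/usr/bin/env bash
# count_to: print the numbers from 1 up to its first argument
count_to() {
  local limit=$1
  for ((i = 1; i <= limit; i++)); do
    echo "counting: $i"
  done
}

# Use the first script argument if one was given; otherwise compute a default.
if [[ $# -gt 0 ]]; then
  count_to "$1"
else
  count_to $(( 2 + 3 ))
fi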
UNIX-alike shells (like bash) were originally designed to adhere to the “UNIX Philosophy”, which is a set of guiding principles for minimalistic, modular software development. While the UNIX Philosophy has drawn some specific criticism over its 40-plus-year existence, the core tenets of what it lays out still hold true today. There’s a lot to digest from writings on the Philosophy itself, but the two pieces that are the most recognizably relevant today are that developers should:
- Write programs that do one thing, and do it well; and
- Write programs that are designed to work together with other, as-yet-unknown programs.
The third most common tenet in the shell ecosystem is that you should “write programs to handle text streams, since text is a universal interface”. This is still true in some software domains, but not all. But it is very relevant in shell environments, as UNIX-alike shells treat everything as strings (in this regard, the shell has the most strongly-typed language in the world!).
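A quick way to see that “everything is a string” in action:

x=2
y=3
echo "$x + $y"      # variables are just text, so this prints: 2 + 3
echo $(( x + y ))   # arithmetic only happens inside an arithmetic context: 5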
Now, if you’ve ever been around any “big data” technologies, you’ve likely at least heard of Hadoop, MapReduce, etc. These are distributed processing toolsets that can make data processing tasks incredibly fast (or even just possible). By distributing workflows across multiple machines, instead of equipping a single machine with many hundreds of gigabytes of RAM, you can get your results faster. But when the data you do have is small enough to fit on a single machine, using these tools can be overkill. Even for data files that are several gigabytes in size, the overhead associated with a distributed processing framework can make it less efficient than an OS-native approach using local tools. That article I linked walks through the particulars of a case study that the author did to demonstrate this.
This is a really, really powerful idea. We’re often so inundated with whatever the latest, buzzwordy tools & frameworks are that we forget that the problems we’re trying to solve aren’t necessarily new; tools already exist to solve the most common, timeless problems. There shouldn’t be a need to pull in several megabytes (or more) of dependencies, or download new, decentralized software just to:
- Run parallel tasks
- Process / parse JSON
- Work with RDBMS data
- Interpolate dynamic input into another set of commands
- And on and on and on (a few quick sketches follow below).
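For instance (the exact tools here are just my favorites, and the files and URLs are placeholders), each of those is roughly a one-liner on a stock UNIX-alike system:

# run tasks in parallel (GNU parallel; xargs -P works too)
cat urls.txt | parallel -j4 'curl -sO {}'

# process / parse JSON (jq)
curl -s 'https://example.org/api' | jq '.items[].name'

# work with RDBMS data (sqlite3)
sqlite3 mydata.db 'SELECT name, total FROM orders LIMIT 10;'

# interpolate dynamic input into another set of commands (xargs)
find . -name '*.log' | xargs -I{} gzip {}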
(I’d also like to continue on this kind of “dependency hell” tangent, specifically regarding the Python 3 standard library and how it’s criminally ignored, but I’ll save that for another time).
Now, those example programs I linked above are not at all an exhaustive list of
the toolbox available to you; I bring those up because they’re some of my
favorites for those specific tasks. But the most wonderful thing about using
these CLI tools in a UNIX-alike environment is that they’re all composable. I
can curl
some REST API to retrieve a JSON response, pipe it to jq
to extract
relevant fields / rebuild the structure, pipe that to parallel
to push each
bit of JSON to AWS S3, and then append each S3
status response to a log to inspect / process for retry later.
That all might look something like this:
curl 'https://some.data.api.org/api' \
| jq -c '.[] | {(.key): .some.nested.value}' \
| parallel -j0 -I{} \
'echo {} | aws s3 cp - s3://somebucket/somekey' \
>> s3.log
Don’t worry if some of that looks like the shell voodoo I mentioned earlier in the post (because it totally is). The point is, each line in that block is being processed by a separate, specific tool, and each of those tools has one clear, well-defined job that it does very well:
- curl gets the data
- jq changes the data
- parallel runs the s3 cp command to upload the data, using as many processes as possible
- >> redirects the terminal output to append to a log file.
And again, all of these tools are either already installed, or are just an
install command away via your OS’ package manager (aws
being the exception;
it’s just supported by these tools). No fuddling with environments, far fewer
version conflicts (these are mostly super-stable software), and an overall
straightforward development experience.
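On a Debian-ish box, for example, that install command looks something like this (package names vary a little between distros):

sudo apt-get install curl jq parallel   # all in the standard repos
pip install awscli                      # the aws CLI is the odd one out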
Note that each of these steps is modular. Perhaps some rework would be in order if you were to swap one tool out for another, but that depends on your program logic and the data format, not the tooling itself. For example, no other change would be necessary to swap curl out for wget (beyond telling wget to write to stdout), or to drop the parallel line entirely (if you just want to run the S3 upload commands in serial). It might look arcane at first, but once the syntax of each tool is understood, I find this to be remarkably beautiful & powerful.
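For instance, a serial variant with wget swapped in might look something like this (same hypothetical API and bucket as before):

# wget replaces curl; a plain while-read loop replaces parallel
wget -qO- 'https://some.data.api.org/api' \
| jq -c '.[] | {(.key): .some.nested.value}' \
| while read -r record; do
    echo "$record" | aws s3 cp - s3://somebucket/somekey
  done \
>> s3.log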
In fact, I love the idea of this approach so much, I’ll be giving a talk on it at the Dev Up conference in St. Louis this October (shameless plug). If you find yourself in the area, come check out the conference!
There’s so much more I could say about bash, the GNU ecosystem of tools, and the UNIX Philosophy, but suffice it to say that if you’ve not spent a little time exploring each of those, I sincerely believe that you’re missing out. Who among us doesn’t like to geek out on vintage crap? You like record players, don’t you?
Footnotes
- An additional benefit of learning tools like bash and other native friends is that maybe you work in an environment where “better” tools aren’t permitted. I’ve been working on a client project recently that has me writing a lot of bash scripts, either for demoing functionality, configuration, integration testing the stack, or promoting code between environments.
- For a light-hearted, but also quite topical, interpretation of the UNIX Philosophy, have a read through the UNIX Koans, which depict the teachings of “Master Foo”.