layout: true
Lecture 3: Unix and You
--- class: center, middle # Week 3 --- # Announcements * Basic 1 is "due" today * Actually due on the 24th * Advanced 1, Basic 2, and Advanced 2 are out * Lecture 2 survey is closing today --- class: center, middle # Unix and You ### Lecture 3 Where I try not to turn this into an OS lecture --- # Overview 1. What is Unix? 2. How does Unix work? 3. Interacting with Unix via Shells (feat. Bash) --- # What is Unix? (review) * Family of operating systems derived from the original AT&T Unix from the '70s * Fun fact: C was developed for use with the original Unix -- * Emphasis on small programs and scripts composed into bigger and bigger systems * Preference for plain-text files for configuration and data -- * Spawned many derivatives and clones: BSD, Solaris, AIX, mac OS, Linux * Became so prevalent in industry and academia that it's been immortalized as a set of standards: __POSIX__ (IEEE 1003) -- * From here on out, whenever I say or write "Unix" and "\*nix" I'm referring to (mostly) POSIX-compliant systems * mac OS is POSIX-certified, while Linux is not --- # What does POSIX mean for us? * We get a neat set of standards! * As long as you follow the standards (and avoid any implementation-specific behavior), your scripts/code should work on other POSIX systems -- ## Examples of POSIX standard things * C POSIX API: headers like `unistd.h`, `fcntl.h`, `pthread.h`, `sys/types.h` * Command line interface and utilities: `cd`, `ls`, `mkdir`, `grep` * [Commands in the specification](https://en.wikipedia.org/wiki/List_of_Unix_commands) * Sort by "Status"; "Mandatory" ones are the useful ones to look at * File paths/names * Directory structure * Environment variables: `USER`, `HOME`, `PATH` --- # Unix philosophy * Write programs that do one thing and do it well. * Write programs to work together. * Write programs to handle text streams, because that is a universal interface.
- Peter Salus,
A Quarter-Century of Unix
(1994)
--- class: center, middle # How does Unix work? ### We're starting from the ground up >:) --- # Components ### Kernel * Software that serves as the intermediary between hardware resources and user applications * Manages hardwares access * Handles things like multi-tasking, security enforcement, file systems, device drivers, launching programs, and more * Present a stable application programming interface (API) for user programs to use in the form of _system calls_ -- ### Applications * Software that users run and interact with * Includes things like Bash, nano, VS Code, Gnome Desktop, `ls` etc. -- #### Hand in hand, these form an overall operating system --- # Unix design * Effectively boils down to processes interacting with files * Program: list of instructions to execute * Process: a running instance of a program -- * __Files serve as a sort of universal interface__ * Processes pass data to each other via a read/write interface --- # Unix processes * Identified by a process ID (PID) * Associated with a user * Has a current working directory -- * Has an associated program _image_: the actual CPU instructions to run * Has memory containing the image and program data like variables --- # Unix processes * File descriptor table * Handles to various resources that have a file interface (read/write/__seek__) * _File descriptors_ are indexes into this table -- * 0: Standard input (`stdin`, `cin`) * 1: Standard output (`stdout`, `cout`) * 2: Standard error (`stderr`, `cerr`) -- * POSIX functions for handling these: `open()`, `close()`, `read()` etc. * Don't confuse them with C stdio functions: `fopen()`, `fclose()`, `fread()` etc. (these are often an abstraction for the POSIX functions) -- * Environment variables * Provide information about the process's environment * `PATH`: directories to find executables in * `PWD`: current working directory * `USER`: user * `HOME`: user's home directory * ...and more --- ## Signals * A way to communicate with processes * `man 7 signal` * `kill` (ignore the name) can signal processes * ^C (Ctrl-C) at a terminal sends SIGINT (interrupt) * ^Z sends SIGTSTP (terminal stop) * Programs can implement handlers for custom behavior * SIGKILL and SIGSTOP can't be handled --- ## Process creation * A process calls `fork()` to make a copy of itself * The process is called the "parent" and the copy is called the "child" * The child is a __perfect copy__ of the parent, except for the `fork()` return value * This includes program variables, program arguments, environment variables\*, etc. -- * The child can call `exec()` functions to load a new program * `man 3 exec` * This wipes the process's memory for the new program's data * Environment variables and file descriptor table is left the same -- * Cool, I'll have it run `execvp("ls", args)` to list the current directory! -- * But what if we parameterized the executable to run? * We have the beginnings of a shell... --- # Unix files * In Unix, everything is a file * Data living on a disk? That's a file * Directories? Those are special kinds of files * Your instance of `vim`? That can be represented by a bunch of files! -- * Unix files represent a _stream of bytes_ that you can read from or write to * Serves as a neat interface * `stdin` and `stdout` are seen as files by your program * __What if we tie the output of one process to the input of another?__ -- * Files have various properties * You can check them with `ls -l` * `r`: read * `w`: write * `x`: execute * These three are often grouped together to form an octal digit (gasp! octal!) * User owner, group owner * `chmod` and `chown` can modify these --- # (Generic) Unix directory structure ### Some normal ones * `/`: root, the beginning of all things * `/bin`: binaries * `/lib`: libraries * `/etc`: configuration files * `/var`: "variable" files, logs and other files that change over time * `/home`: user home directories ### Everything is a file * `/dev`: device files * `/proc`: files that represent runtime OS information --- # Putting them together * It's just processes interacting with other processes and files * Processes create more processes (yes there is a [primordial process](https://en.wikipedia.org/wiki/Init)) * What if we hooked up processes end to end, `stdin` to `stdout`? * We can form a "pipeline" of data processing * What if we tied the output of a process to a file instead of a terminal? -- ### This is the job of a shell --- class: center, middle # Interacting with Unix via Shells ## feat. Bash ### `:(){ :|:& };:` #### Do __NOT__ run this --- # Job control * We're familiar with just launching a process * `$ echo "hello world"` -- * There's other things we can do, like launch it in the background with `&` * `$ echo "hello world" &` -- * ^C (SIGINT) can cause most process to stop * ^Z (SIGTSTP) can cause most processes to suspend -- * `jobs` can list out processes (jobs table) that the shell is managing * `bg` can background a process, yielding the terminal back to the shell * `fg` can foreground a process, giving it active control of the terminal * `bg` and `fg` can index off of the jobs table * `disown` can have the shell give up ownership of a process * The `?` variable holds the exit status of the last command * 0 means success/true * Not 0 means failure/false --- # Stringing together commands * `cmd1 && cmd2` (and) * Run `cmd2` if `cmd1` succeeded * `cmd1 || cmd2` (or) * Run `cmd2` if `cmd1` failed * `cmd1 ; cmd2` * Run `cmd2` after `cmd1` -- * `cmd1 | cmd2` * Connect output of `cmd1` to input of `cmd2` * `cmd1`'s fd 1 -> `cmd2`'s fd 0 * `$ echo "hello" | rev` --- # File redirection * `<`: set file as standard input (fd 0) * `$ cmd1 < read.txt` -- * `>`: set file as standard output, overwrite (fd 1) * `$ cmd1 > somefile.txt` -- * `>>`: set file as standard output, append (fd 1) * `$ cmd1 >> somelog.txt` -- ## General form (brackets mean optional) * `[n]<`: set file as an input for fd _n_ (fd 0 if unspecified) * "input" means that the process can `read()` from this fd * `[n]>`: set file as an output for fd _n_ (fd 1 if unspecified) * "output" means that the process can `write()` to this fd * `2>`: capture `stderr` to a file * `[n]>>`: set file as an output for fd _n_, append mode (fd 1 if unspecified) --- ## Advanced Bash file redirection * `&>`: set file as fd 1 and fd 2, overwrite (`stdout` and `stderr` go to same file) * `&>>`: set file as fd 1 and fd 2, append (`stdout` and `stderr` go to same file) * `[n]<>`: set file as input and output on fd _n_ (fd 0 if unspecified) * `[n]<&digit[-]`: copies fd _digit_ to fd _n_ (0 if unspecified) for input; `-` closes _digit_ * `[n]>&digit[-]`: copies fd _digit_ to fd _n_ (1 if unspecified) for output; `-` closes _digit_ -- * `<<`: "Here document"; given a delimiter, enter data as standard input ```bash $ cat << SOME_DELIM > here are some words > some more words > SOME_DELIM ``` * `<<<`: "Here string"; provide string directly as standard input ```bash $ cat <<< "here's a string!" ``` * Both Here documents and strings will expand variables (coming up) --- # Diving into Bash * Side note: `bash` != `sh` * `bash` has a feature superset over `sh` (kinda like a `vim`/`vi` relationship) * Again, confounded by some systems linking/aliasing `sh` to `bash` * While this is about Bash, many other shells have the same syntax * The horse's mouth: [GNU Bash manual](https://www.gnu.org/software/bash/manual/) * If you like the nitty gritty details it's a great read * These slides summarize major features of Bash -- ## What Bash does * Receive a command from a file or terminal input * Splits it into tokens separated by __white-space__ * Takes into account _"quoting"_ rules * Expands/substitutes special tokens * Perform file redirections (and making sure they don't end up as command args) * Execute command --- ## Finding programs to execute * If the command has a `/` in it, it's treated as a filepath and the file will tried to be executed * `$ somedir/somescript` * `$ ./somescript` * If the command doesn't have a `/`, `PATH` will be searched for a corresponding binary * `$ vim` -> searches `PATH` and finds it at `/usr/bin/vim` * This is why you have to specify `./` to run something in your current directory -- ## Shell built-ins * Some commands are "built-in"/implemented by the shell * These will take precedent over ones in the `PATH` * Some other commands don't make sense outside of a shell * Think about why `cd` is a built-in and not a separate utility * (hint: `fork()` and `exec()`) --- ## Shell and environment variables * Shell variables stored inside the shell _process_ * They're handled by the Bash program itself, stored as program data in the process's memory * Launched commands don't inherit them (what does `exec()` do?) * Set them with `varname=varvalue` * __Meaningful whitespace!__ * `varname = varvalue` is interpreted as "run `varname` with arguments `=` and `varvalue`" -- * You can set _environment_ variables with `export` * `export varname=varvalue` --- ## Command grouping * We discussed before that we can string commands together with `;`, `&&`, `||` * We can also group commands together as a unit, with redirects staying local to them: * `(commands)`: performs _commands_ in a "subshell" (another shell _process_/_instance_; what does this mean for _shell_ variables?) * `{ commands; }`: performs _commands_ in the calling shell instance * __Note__: There has to be spaces around the brackets and a semicolon (or newline or `&`) terminating the _commands_ --- ## Expansion and substitution #### Bash has special characters that will indicate that it should _expand_ or _substitute_ to something in a command -- ### Variable expansion * `$varname` will expand to the value of `varname` * `${varname}`: you can use curly brackets to explicitly draw the boundaries on the variable name * `$ echo ${varname}somestring` vs `$ echo $varnamesomestring` * __Note__: expansions/substitutions will be further split into individual tokens by their white-space -- ### Command substitution (via subshell) * `$(command)` will substitute the output of a _command_ in the brackets * `$(echo hello | rev)` will be substituted with "olleh" --- ### Process substitution * `<(command)` will substitute the _command_ output as a filepath, with the output of _command_ being __readable__ * `>(command)` will substitute the _command_ input as a filepath, with the input of _command_ being __writeable__ * `$ diff <(echo hello) <(echo olleh | rev)` * `diff` takes in two file names, but we're replacing them with command outputs -- ### Arithmetic expansion * `$((expr))` will expand to an evaluated arithmetic expression _expr_ --- ## But wait... * What if I actually wanted to __not__ expand a variable and keep the `$`? * What if I didn't want a variable to be split by white-space? * What if I'm lazy and don't want to escape spaces? --- ## Quoting * Allows you to retain certain characters without Bash expanding them and keep them one string * Common use case is to preserve spaces e.g. for filepaths that have spaces in them (spaces delimit tokens in a command) -- * Single quotes (`'`) preserves __all__ of the characters between them * `$ echo '$HOME'` will output `$HOME` -- * Double quotes (`"`) preserve all characters except: `$`, `\`, and backtick * `$ ls "$HOME/Evil Directory With Spaces"` will list the contents of a directory `/home/jdoe/Evil Directory With Spaces` * __Variables expanded inside of double quotes retain their white-space__ * (without this, that path would've had to have been `$HOME/Evil\ Directory\ With\ Spaces`, using `\` to escape the space characters) -- * Note that when quoting, the quotes don't appear in the program's argument * `$ someutil 'imastring'`: `someutil`'s argv[1] will be `imastring` --- ## Control flow ### `if-elif-else` ```bash # '#' comments out the rest of the line # brackets indicate optional parts if test-commands; then commands [elif more-test-commands; then more-commands] [else alt-commands] fi ``` * _test-commands_ is executed and its __exit status__ is used as the condition * ___0___ = success = "true", everything else is "false" -- * You can put the `if-elif-else` structure on one line! * This applies to the upcoming control flow structures as well --- ### Commands for conditionals #### You can use any commands for conditions, but these constructs should be familiar: * `test expr`: `test` command * Shorthand: `[ expr ]` (remember your spaces! `[` is technically a utility name) * `test $a -eq $b` * `[ $a -eq $b ]` -- * `[[ expr ]]`: Bash conditional * Richer set of operators: `==`, `=`, `!=`, `<`, `>`, among others * __Note__: The symbol operators above operate on strings, thus`<` and `>` operators do lexicographic (i.e. dictionary) comparison; "100" is lexicographically less than "2" since for the first characters "1" comes before "2" * Use specific arithmetic binary operators (_a la_ `test`) if you intend on comparing numeric values -- * `[[ $a == $b ]]` * `[[ $a < $b ]]`: this would evaluate to "true" if a=100, b=2 -- * `(( expr ))`: Bash arithmetic conditional * Evaluates as an arithmetic expression * `(( $a < $b ))`: this would evaluate to "false" if a=100, b=2 --- ### `while` ```bash while test-commands; do commands done ``` * Similarly to `if`, the exit status of _test-commands_ is used as the conditional * Repeats _commands_ until the condition __fails__ -- ### `until` ```bash until test-commands; do commands done ``` * Repeats _commands_ until the condition __succeeds__ --- ### `for` ```bash for var in list; do commands done ``` * _list_ will be __expanded__ and on each iteration _var_ will be set to each member of the list * __Note__: if there is no `in list`, it will implicitly iterate over the argument list (i.e. `$@`) --- ## Functions ```bash func-name () compound-command # or function func-name [()] compound-command # [] for optional parens ``` * A __compound command__ is a __command group__ (`()`, `{}`) or a control flow element (`if-elif-else`, `for`) * Called by invoking them like any other utility, including passing arguments * Arguments can be accessed via `$n`, where _n_ is the argument number * `$@`: list of arguments * `$#`: number of arguments --- ### Examples ```bash hello-world () { if echo "Hello world!"; then echo "This should print" fi } # calling hello-world ``` ```bash function touch-dir for x in $(ls); do touch $x; done # calling touch-dir ``` --- ```bash echo-args () { for x in $@; do echo $x done } # calling echo-args a b c d e f g ``` ```bash divide () { if (( $2 == 0 )); then echo "Error: divide by zero" 1>&2 # the redirection copies stderr to stdout # so when echo outputs to its stdout, it's # really going to stderr else echo $(($1 / $2)) fi } # calling divide 10 2 divide 10 0 ``` --- ## Scripts * As was mentioned a few weeks ago, it's annoying to have to type things/go to the history to repeatedly run some commands * Scripts are just plain-text files with commands in them * __There's no special syntax for scripts: if you can enter the commands in them line by line at the terminal it would work__ * You can treat it as a simple programming language -- * You can specify the interpreter program with a "shebang" on the first line * `#!/bin/bash` * `#!/bin/zsh` * `#!/usr/bin/env python` -- * Arguments work like that of functions: * `$n` __Note__: $0 will refer to the script's name, as per \*nix program argument convention * `$@` * `$#` --- ### __Running__ vs __sourcing__ * _Running_ (executing) a script puts it into its own shell instance; shell variables set _won't_ be visible to the parent shell * `./script.sh` * `bash script.sh` -- * _Sourcing_ a script makes your _current_ shell instance run each command in it; shell variables set _will_ be visible * `source script.sh` * `. script.sh` -- * Think about the nuance here -- * Behavior of `cd` when running a script vs sourcing a script? --- ### __Running__ vs __sourcing__ * Say your shell is currently at `/home/bob` * There's a script called `go-places` with the following contents: ```bash cd /var/log ``` * Q1: Where would your current shell be if you ran `$ bash go-places`? * Q2: Where would your current shell be if you ran `$ source go-places`? --- ### __Running__ vs __sourcing__ * Say your shell is currently at `/home/bob` * There's a script called `go-places` with the following contents: ```bash cd /var/log ``` * Q1: Where would your current shell be if you ran `$ bash go-places`? * A: `/home/bob` * This will create a new Bash instance, which will then perform the `cd`. * This will result in the current shell staying in the current directory, as it never ran `cd` in the first place * Q2: Where would your current shell be if you ran `$ source go-places`? * A: `/var/log` * This will cause the current shell to read in and execute the `cd` * This will result in the current shell changing directories --- class: center, middle # Any other questions?