layout: true
Lecture 3: Unix and You
---
class: center, middle

# Week 3

---

# Announcements

* Basic 2 and Advanced 2 are out
* Lecture 2 survey is closing today
* Have you created and logged into the course server yet?

---
class: center, middle

# Unix and You
### Lecture 3

Where I try not to turn this into an OS lecture

---

# Overview

1. What is Unix?
2. How does Unix work?
3. Interacting with Unix via Shells (feat. Bash)

---

# What is Unix?

* Family of operating systems derived from the original AT&T Unix from the '70s
* Fun fact: C was developed for use with the original Unix

--

* Emphasis on small programs and scripts composed into bigger and bigger systems
* Preference for plain-text files for configuration and data

--

* Spawned many derivatives and clones: BSD, Solaris, AIX, macOS, Linux
* Became so prevalent in industry and academia that it's been immortalized as a set of standards: __POSIX__ (IEEE 1003)

--

* From here on out, whenever I say or write "Unix" or "\*nix", I'm referring to (mostly) POSIX-compliant systems
* macOS is POSIX-certified, while Linux is not

---

# What does POSIX mean for us?

* We get a neat set of standards!
* As long as you follow the standards (and avoid any implementation-specific behavior), your scripts/code should work on other POSIX systems

--

## Examples of POSIX standard things

* C POSIX API: headers like `unistd.h`, `fcntl.h`, `pthread.h`, `sys/types.h`

--

* Command line interface and utilities: `cd`, `ls`, `mkdir`, `grep`
  * [Commands in the specification](https://en.wikipedia.org/wiki/List_of_Unix_commands)
  * Sort by "Status"; the "Mandatory" ones are pretty useful ones to look at

--

* File paths/names
* Directory structure
* Environment variables: `USER`, `HOME`, `PATH`

---

# Unix philosophy

* Write programs that do one thing and do it well.
* Write programs to work together.
* Write programs to handle text streams, because that is a universal interface.
- Peter Salus, _A Quarter-Century of Unix_ (1994)
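---

### The philosophy in action

A minimal sketch (the input text is made up) of small programs composed over a text stream, each doing one job:

```bash
# Top 3 most frequent words: each stage does one small job
printf 'the quick fox the lazy dog the fox\n' |
  tr ' ' '\n' |  # split into one word per line
  sort |         # group identical words together
  uniq -c |      # count each distinct word
  sort -rn |     # most frequent first
  head -n 3      # keep the top 3
```

Swap any stage for another small tool and the rest of the pipeline is unaffected: that is the point of text streams as a universal interface.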
---
class: center, middle

# How does Unix work?
### We're starting from the ground up >:)

---

# Components

### Kernel

* Software that serves as the intermediary between hardware resources and user applications
* Manages hardware resources and access to them
* Handles things like multi-tasking, security enforcement, file systems, device drivers, launching programs, and more
* Presents a stable _application programming interface_ (API) for user programs to use in the form of _system calls_

---

# Components

### Libraries

* Reusable pre-written software you can call upon
* Provide functionality that would be a pain to write every time (e.g. graphics)

--

### Applications

* Software that users run and interact with, or that assists in the background
* Includes things like Bash, nano, VS Code, Gnome Desktop, `ls`, etc.

--

### Hand in hand, these form an overall operating system

---

# Unix design

* Effectively boils down to processes interacting with files
  * Program: a list of instructions to execute
  * Process: a running instance of a program

--

* __Files serve as a sort of universal interface__
  * Processes pass data to each other via a read/write interface

---

# Unix processes

* Identified by a process ID (PID)
* Associated with a user
* Has a current working directory

--

* Has an associated program _image_: the actual CPU instructions to run
* Has memory containing the image and program data like variables

--

* Environment variables
  * Provide information about the process's environment
  * `PATH`: directories to find executables in
  * `PWD`: current working directory
  * `USER`: user
  * `HOME`: user's home directory
  * ...and more

---

# Unix processes

* File descriptor table
  * Handles to various resources that have a file interface (read/write/__seek__)
  * _File descriptors_ are indexes into this table

--

  * 0: Standard input (`stdin`, `cin`)
  * 1: Standard output (`stdout`, `cout`)
  * 2: Standard error (`stderr`, `cerr`)

--

* POSIX functions for handling these: `open()`, `close()`, `read()`, etc.
  * Don't confuse them with C stdio functions: `fopen()`, `fclose()`, `fread()`, etc. (these are often an abstraction over the POSIX functions)

---

## Signals

* A way to communicate with processes
* `man 7 signal`
* `kill` (ignore the name) can signal processes
* ^C (Ctrl-C) at a terminal sends SIGINT (interrupt)
* ^Z sends SIGTSTP (terminal stop)
* Programs can implement handlers for custom behavior
* SIGKILL and SIGSTOP can't be handled

---

### [signals - turnoff.us](http://turnoff.us/geek/signals/)
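---

### Signal handling sketch

A minimal sketch of installing a custom handler with Bash's `trap` builtin; the `cleanup` function name and its message are made up for illustration:

```bash
# Install a handler for SIGINT, then deliver the signal to ourselves
cleanup () {
  echo "Caught SIGINT, cleaning up..."
  exit 0
}
trap cleanup INT

kill -INT $$           # the same signal ^C would send from a terminal
echo "never reached"   # the handler exits before this line runs
```

Trying to `trap` SIGKILL or SIGSTOP the same way has no effect: the kernel acts on those without ever consulting the process.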
---

### [the real reason not to use sigkill - turnoff.us](http://turnoff.us/geek/dont-sigkill/)
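---

### Sending signals with `kill`

A small sketch (using `sleep` as a stand-in for a long-running child) of signalling a background process and reading the resulting exit status:

```bash
sleep 60 &              # a long-running child in the background
pid=$!                  # PID of the most recent background process
kill -TERM "$pid"       # politely ask it to terminate (kill's default signal)
wait "$pid"             # reap the child; wait's status reflects the signal
echo "exit status: $?"  # 128 + 15 (SIGTERM's number) = 143
```

SIGTERM gives the process a chance to clean up; SIGKILL would end it immediately, with no handler ever running.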
---

## Process creation

* A process calls `fork()` to make a copy of itself
  * The original process is called the "parent" and the copy is called the "child"
  * The child is a __perfect copy__ of the parent, except for the `fork()` return value
  * This includes program variables, program arguments, environment variables\*, etc.

--

* The child can call `exec()` functions to load a new program
  * `man 3 exec`
  * This wipes the process's memory for the new program's data
  * Environment variables and the file descriptor table are left the same

--

* Cool, I'll have it run `execvp("ls", args)` to list the current directory!

--

* But what if we parameterized the executable to run?
* We have the beginnings of a shell...

---

# Unix files

* In Unix, everything is a file
  * Data living on a disk? That's a file
  * Directories? Those are special kinds of files
  * Your instance of `vim`? That can be represented by a bunch of files!

--

* Unix files represent a _stream of bytes_ that you can read from or write to
  * Serves as a neat interface
  * `stdin` and `stdout` are seen as files by your program
  * __What if we tie the output of one process to the input of another?__

---

# Unix files

* Files have various properties
  * You can check them with `ls -l`
  * `stat` can give more detailed information
* `r`: read
* `w`: write
* `x`: execute
  * These three are often grouped together to form an octal digit (gasp! octal!)
* User owner, group owner
* `chmod` and `chown` can modify these

---

# (Generic) Unix directory structure

### Some normal ones

* `/`: root, the beginning of all things
* `/bin`: binaries
* `/lib`: libraries
* `/etc`: configuration files
* `/var`: "variable" files, logs and other files that change over time
* `/home`: user home directories

### Everything is a file

* `/dev`: device files
* `/proc`: files that represent runtime OS information

---

# Putting them together

* It's just processes interacting with other processes and files
* Processes create more processes (yes, there is a [primordial process](https://en.wikipedia.org/wiki/Init))
* What if we hooked up processes end to end, `stdin` to `stdout`?
  * We can form a "pipeline" of data processing
* What if we tied the output of a process to a file instead of a terminal?

--

### This is the job of a shell

---
class: center, middle

# Interacting with Unix via Shells
## feat. Bash
### `:(){ :|:& };:`
#### Do __NOT__ run this

---

# Job control

* We're familiar with just launching a process
  * `$ echo "hello world"`

--

* There's other things we can do, like launch it in the background with `&`
  * `$ echo "hello world" &`

--

* ^C (SIGINT) can cause most processes to stop
* ^Z (SIGTSTP) can cause most processes to suspend

---

# Job control

* `jobs` can list out processes (jobs table) that the shell is managing
* `bg` can resume a suspended job in the background, yielding the terminal back to the shell
* `fg` can foreground a job, giving it active control of the terminal
* `bg` and `fg` can index off of the jobs table
* `disown` can have the shell give up ownership of a process
* The `$?` variable holds the __exit status__ of the last command
  * 0 means success/true
  * Not 0 means failure/false

---

## Stringing together commands

* `cmd1 && cmd2`
  * Run `cmd2` if `cmd1` succeeded
  * Like a short-circuiting AND in other languages
* `cmd1 || cmd2`
  * Run `cmd2` if `cmd1` failed
  * Like a short-circuiting OR in other languages
* `cmd1 ; cmd2`
  * Run `cmd2` after `cmd1`, unconditionally

--

* `cmd1 | cmd2`
  * Connect standard output of `cmd1` to standard input of `cmd2`
  * `cmd1`'s fd 1 -> `cmd2`'s fd 0
  * `$ echo "hello" | rev`

---

# File redirection

* `<`: set file as standard input (fd 0)
  * `$ cmd1 < read.txt`

--

* `>`: set file as standard output, overwrite (fd 1)
  * `$ cmd1 > somefile.txt`

--

* `>>`: set file as standard output, append (fd 1)
  * `$ cmd1 >> somelog.txt`

--

### General form (brackets mean optional)

* `[n]<`: set file as an input for fd _n_ (fd 0 if unspecified)
  * "input" means that the process can `read()` from this fd
* `[n]>`: set file as an output for fd _n_ (fd 1 if unspecified)
  * "output" means that the process can `write()` to this fd
  * `2>`: capture `stderr` to a file
* `[n]>>`: set file as an output for fd _n_, append mode (fd 1 if unspecified)

---

## Advanced Bash file redirection

* `&>`: set file as fd 1 and fd 2, overwrite (`stdout` and `stderr` go to the same file)
* `&>>`: set file as fd 1 and fd 2, append (`stdout` and `stderr` go to the same file)
* `[n]<>`: set file as input and output on fd _n_ (fd 0 if unspecified)
* `[n]<&digit[-]`: copies fd _digit_ to fd _n_ (0 if unspecified) for input; `-` closes _digit_
* `[n]>&digit[-]`: copies fd _digit_ to fd _n_ (1 if unspecified) for output; `-` closes _digit_

---

## Advanced Bash file redirection

* `<<`: "Here document"; given a delimiter, enter data as standard input

```bash
$ cat << SOME_DELIM
> here are some words
> some more words
> SOME_DELIM
```

* `<<<`: "Here string"; provide a string directly as standard input

```bash
$ cat <<< "here's a string!"
```

* With this power, no longer will you need to pipe an `echo` to pass in a string! 
  * `echo "some string" | rev`
  * `rev <<< "some string"`
* Here documents and strings will expand variables (coming up)

---

# Diving into Bash

* Side note: `bash` != `sh`
  * `bash` has a feature superset over `sh` (kinda like a `vim`/`vi` relationship)
  * Again, confounded by some systems linking/aliasing `sh` to `bash`
  * (Some systems link/alias `sh` to `dash`)
* While this is about Bash, many other shells have similar, if not the same, syntax
* The horse's mouth: [GNU Bash manual](https://www.gnu.org/software/bash/manual/)
  * If you like the nitty-gritty details, it's a great read
* These slides summarize major features of Bash

---

## What Bash does

1. Receives a command from a file or terminal input
  * `ls -l $HOME > some_file`
2. Splits it into tokens separated by __white-space__
  * Takes into account _"quoting"_ rules
  * `ls`, `-l`, __`$HOME`__, `>`, `some_file`
3. Expands/substitutes special tokens
  * `ls`, `-l`, __`/home/brandon`__, `>`, `some_file`
4. Performs file redirections (making sure they don't end up as command args)
  * `ls`, `-l`, `/home/brandon`; __(set standard output to `some_file`)__
5. 
Executes the command
  * `argc` = 3
  * `argv` = [`"ls"`, `"-l"`, `"/home/brandon"`]
  * Standard output redirected to `some_file`

---

## Finding programs to execute

* If the command has a `/` in it, it's treated as a filepath and Bash will try to execute that file
  * `$ somedir/somescript`
  * `$ ./somescript`
* If the command doesn't have a `/`, `PATH` will be searched for a corresponding binary
  * `$ vim` -> searches `PATH` and finds it at `/usr/bin/vim`
  * This is why you have to specify `./` to run something in your current directory

--

### Shell built-ins

* Some commands are "built-in"/implemented by the shell
  * These take precedence over ones in the `PATH`
* Some other commands don't make sense outside of a shell
  * Think about why `cd` is a built-in and not a separate utility
  * (hint: `fork()` and `exec()`)

---

## Shell and environment variables

* Shell variables are stored inside the shell _process_
  * They're handled by the Bash program itself, stored as program data in the process's memory
  * Launched commands don't inherit them (what does `exec()` do?)
* Set them with `varname=varvalue`
  * __Meaningful whitespace!__
  * `varname = varvalue` is interpreted as "run `varname` with arguments `=` and `varvalue`"

--

* You can set _environment_ variables with `export`
  * `export varname=varvalue`
  * `export existing_variable`
  * Sets a variable to be __exported__ to new processes

---

## Command grouping

* We discussed before that we can string commands together with `;`, `&&`, `||`
* We can also group commands together as a unit, with redirects staying local to them:

--

* `(commands)`: performs _commands_ __in a "subshell"__ (another shell _process_/_instance_; what does this mean for _shell_ variables?) 
--

* `{ commands; }`: performs _commands_ __in the calling shell instance__
  * __Note__: There have to be spaces around the braces and a semicolon (or newline or `&`) terminating the _commands_

---

## Expansion and substitution

#### Bash has special characters that indicate that it should _expand_ or _substitute_ to something in a command

--

### Variable expansion

* `$varname` will expand to the value of `varname`
* `${varname}`: you can use curly brackets to explicitly draw the boundaries of the variable name
  * `$ echo ${varname}somestring` vs `$ echo $varnamesomestring`
* __Note__: expansions/substitutions will be further split into individual tokens by their white-space

--

### Command substitution (via subshell)

* `$(command)` will substitute the output of the _command_ in the brackets
  * `$(echo hello | rev)` will be substituted with "olleh"

---

### Process substitution

* `<(command)` will substitute the _command_ output as a filepath, with the output of _command_ being __readable__
* `>(command)` will substitute the _command_ input as a filepath, with the input of _command_ being __writeable__
* `$ diff <(echo hello) <(echo olleh | rev)`
  * `diff` takes in two file names, but we're replacing them with command outputs

--

### Arithmetic expansion

* `$((expr))` will expand to the evaluated arithmetic expression _expr_

---

### Filename expansion ("glob"/"wildcards")

* The `*`, `?`, and `[` characters tell Bash to perform pattern matches against filenames for a given token/word
  * `*` matches any string
  * `?` matches any single character
  * `[`...`]` matches one of any of the characters enclosed in the brackets
  * There's more fun with this: check the manual
* A token/word with these will expand out to matching filenames

--

* Examples
  * `*` expands to all the files in the current directory
  * `*.md` expands to all files that end in `.md` (`*` matches against anything)
  * `file?.txt` expands to all files that start with `file`, have a single character, then end in 
`.txt`
  * `file[13579].txt` expands to all files that start with `file`, have an odd single-digit number, and end in `.txt`

---

## But wait...

* What if I actually wanted to __not__ expand a variable and keep the `$`?
* What if I didn't want a variable to be split by white-space?
* What if I'm lazy and don't want to escape spaces?

---

## Quoting

* Allows you to retain certain characters without Bash expanding them, and keep them as one string
* A common use case is to preserve spaces, e.g. for filepaths that have spaces in them (spaces delimit tokens in a command)

--

* Single quotes (`'`) preserve __all__ of the characters between them
  * `$ echo '$HOME'` will output `$HOME`

--

* Double quotes (`"`) preserve all characters except: `$`, `\`, and backtick
  * `$ ls "$HOME/Evil Directory With Spaces"` will list the contents of a directory `/home/jdoe/Evil Directory With Spaces`
  * __Variables expanded inside of double quotes retain their white-space__
  * (without this, that path would've had to have been `$HOME/Evil\ Directory\ With\ Spaces`, using `\` to escape the space characters)

--

* Note that when quoting, the quotes don't appear in the program's argument
  * `$ someutil 'imastring'`: `someutil`'s `argv[1]` will be `imastring`

---

## Control flow

### `if-elif-else`

```bash
# '#' comments out the rest of the line
# elif and else are optional parts
if test-commands; then
  commands
elif more-test-commands; then
  more-commands
else
  alt-commands
fi
```

* _test-commands_ is executed and its __exit status__ is used as the condition
* ___0___ = success = "true", everything else is "false"

--

* You can put the `if-elif-else` structure on one line!
  * This applies to the upcoming control flow structures as well

---

### Commands for conditionals

#### You can use any commands for conditions, but these constructs should be familiar:

* `test expr`: `test` command
  * Shorthand: `[ expr ]` (remember your spaces! 
`[` is technically a utility name)
  * `test $a -eq $b`
  * `[ $a -eq $b ]`

---

### Commands for conditionals

#### You can use any commands for conditions, but these constructs should be familiar:

* `[[ expr ]]`: Bash conditional
  * Richer set of operators: `==`, `=`, `!=`, `<`, `>`, among others
  * __Note__: The symbol operators above operate on strings, thus the `<` and `>` operators do lexicographic (i.e. dictionary) comparison; "100" is lexicographically less than "2" since, comparing first characters, "1" comes before "2"
  * Use the specific arithmetic binary operators (_a la_ `test`: e.g. `-lt`) if you intend on comparing numeric values

--

* `[[ $a == $b ]]`
* `[[ $a < $b ]]`: this would evaluate to "true" if a=100, b=2
* `[[ $a -lt $b ]]`: this would evaluate to "false" if a=100, b=2

--

* `((expr))`: Bash arithmetic conditional
  * Evaluates as an arithmetic expression
  * `(($a < $b))`: this would evaluate to "false" if a=100, b=2

---

### `while`

```bash
while test-commands; do
  commands
done
```

* Similarly to `if`, the exit status of _test-commands_ is used as the condition
* Repeats _commands_ until the condition __fails__

--

### `until`

```bash
until test-commands; do
  commands
done
```

* Repeats _commands_ until the condition __succeeds__

---

### `for`

```bash
for var in list; do
  commands
done
```

* Each iteration, _var_ will be set to each member of the _list_
  * _list_ is simply a list of whitespace-delimited strings
  * _list_ will have any necessary expansions performed
* __Note__: if there is no `in list`, it will implicitly iterate over the argument list (i.e. 
`$@`)
* Example lists:
  * `1 2 3 4 5`
  * `$(ls)`
  * `$(seq 1 5)`

---

## Functions

```bash
func-name () compound-command # parens are mandatory
# or
function func-name () compound-command # parens are optional
```

* A __compound command__ is a __command group__ (`()`, `{}`) or a control flow element (`if-elif-else`, `for`)
* Called by invoking them like any other utility, including __passing arguments__
* Arguments can be accessed via `$n`, where _n_ is the argument number
  * `$@`: list of arguments
  * `$#`: number of arguments

---

### Examples

```bash
hello-world () {
  if echo "Hello world!"; then
    echo "This should print"
  fi
}

# calling
hello-world
```

```bash
function touch-dir
  for x in $(ls); do touch $x; done

# calling
touch-dir
```

---

```bash
function echo-args () {
  for x in $@; do
    echo $x
  done
}

# calling
echo-args a b c d e f g
```

```bash
function divide {
  if (( $2 == 0 )); then
    echo "Error: divide by zero" 1>&2
    # 1>&2 makes stdout a copy of stderr, so the echo's
    # output really goes to the caller's stderr
  else
    echo $(($1 / $2))
  fi
}

# calling
divide 10 2
divide 10 0
```

---

## Scripts

* It's annoying to have to type things/go to the history to repeatedly run some commands
* Scripts are just plain-text files with commands in them
* __There's no special syntax for scripts: if you entered the commands in them line by line at the terminal, they would work__

--

* You can specify the interpreter program with a "shebang" on the first line
  * `#!/bin/bash`
  * `#!/bin/zsh`
  * `#!/usr/bin/env python` (for Python, not shell, scripts)

--

* __Arguments work like those of functions__:
  * `$n`
    * __Note__: `$0` will refer to the script's name, as per \*nix program argument convention
  * `$@`
  * `$#`

---

### __Running__ vs __sourcing__

* _Running_ (executing) a script puts it into its own shell instance; shell variables set _won't_ be visible to the parent shell
  * `./script.sh`
  * `bash script.sh`

--

* _Sourcing_ a script makes your _current_ shell instance run each command in it; 
shell variables set _will_ be visible
  * `source script.sh`
  * `. script.sh`

--

* Think about the nuance here

--

* Behavior of `cd` when running a script vs sourcing a script?

---

### __Running__ vs __sourcing__

* Say your shell is currently at `/home/bob`
* There's a script called `go-places` with the following contents:

```bash
cd /var/log
```

* Q1: Where would your current shell be if you ran `$ bash go-places`?
* Q2: Where would your current shell be if you ran `$ source go-places`?

---

### __Running__ vs __sourcing__

* Say your shell is currently at `/home/bob`
* There's a script called `go-places` with the following contents:

```bash
cd /var/log
```

* Q1: Where would your current shell be if you ran `$ bash go-places`?
  * A: `/home/bob`
  * This will create a new Bash instance, which will then perform the `cd`
  * The current shell stays in the current directory, as it never ran `cd` in the first place
* Q2: Where would your current shell be if you ran `$ source go-places`?
  * A: `/var/log`
  * This will cause the current shell to read in and execute the `cd`
  * This will result in the current shell changing directories

---
class: center, middle

# Any other questions?