
Shell Features

This section provides examples and details for each feature that needs to be implemented.

It assumes you have read the grammar of the shell language and have it in mind.

Shell commands

If Statements

The if command includes if, else, and elif.

if true; then
echo toto
else
echo tata
fi
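The example above does not show elif. As a minimal sketch, the three branches can be combined as follows; describe is a hypothetical helper introduced only for illustration, and it relies on the standard [ utility for the comparisons:

```shell
# describe is a hypothetical helper, not part of the subject.
describe() {
    if [ "$1" -lt 0 ]; then
        echo negative
    elif [ "$1" -eq 0 ]; then
        echo zero
    else
        echo positive
    fi
}

describe -3
describe 0
describe 7
```

Only the body of the first condition that succeeds is executed; the else body runs when every condition fails.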

The condition and body can contain any number of commands, separated by a semicolon (;) or a newline (\n). This is a valid if clause:

if echo ACUs; while false; do echo not printed
done
echo are
then
echo the; echo best!; fi
Ending the condition

Conditions and body are defined by the compound_list rule:

compound_list = and_or [';'] {'\n'} ;

Since and_or, and more generally simple_command, is simply a list of words, omitting the semicolon (or newline) before the then keyword makes then part of the condition command.

if echo toto then # 'then' is treated as a plain WORD and passed as an argument
# to the echo command.

if echo toto; then # Writes 'toto' to standard output, then recognizes `then`
# as a keyword that ends the condition.
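The difference can be checked in a reference shell. This sketch captures both outputs in variables (out_word and out_keyword are names chosen for the illustration):

```shell
# Without a separator, 'then' is just another argument of echo.
out_word=$(echo toto then)

# With a separator, 'then' is recognized as a reserved word
# and the body of the if clause is executed.
out_keyword=$(if echo toto; then echo tata; fi)

printf '%s\n' "$out_word"
printf '%s\n' "$out_keyword"
```

The first command prints a single line containing both words, while the second prints the output of the condition followed by the output of the body.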

While and Until Loops

while and until loops work similarly except for one small detail: the condition of the loop is inverted.

  • while loops stop when the condition is false;
  • until loops stop when the condition is true.
compound_list

while and until loops use the compound_list rule for both the condition and the body. They are therefore subject to the same constraint as the if clause: a semicolon or a newline must precede the do keyword for it to be recognized as a keyword.
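A minimal sketch of the inverted condition, assuming arithmetic expansion and the [ utility are available (neither is required by the loops themselves). Both loops count up to 3, one with a condition and the other with its opposite:

```shell
# while: keeps looping as long as the condition succeeds.
i=0
while [ "$i" -lt 3 ]; do
    i=$((i + 1))
done
while_result=$i

# until: keeps looping as long as the condition fails.
j=0
until [ "$j" -ge 3 ]; do
    j=$((j + 1))
done
until_result=$j

echo "$while_result $until_result"
```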

For Loops

Requirements

To be completely functional, for loops require Parameter Expansion to be implemented.

for loops iterate on each WORD listed after the in keyword.

Even with Parameter Expansion, a for loop iterating over the content of a variable will not work as expected: this also requires Field Splitting to be implemented.

Here is the expected behavior without Field Splitting:

42sh$ cat example.sh
VALUES='1 2 3 4 5'
for i in $VALUES; do
echo $i
done
42sh$ ./42sh example.sh
1 2 3 4 5
tip

This can be emulated by setting the IFS environment variable to ''.
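To compare both behaviours in a reference shell, the following sketch counts the number of iterations with the default IFS and with an empty IFS. The variable names are illustrative, and the empty IFS is set inside a command substitution so that it does not leak into the rest of the script:

```shell
VALUES='1 2 3 4 5'

# Default IFS: $VALUES is split into five fields.
split_count=0
for i in $VALUES; do
    split_count=$((split_count + 1))
done

# Empty IFS: no field splitting, the loop sees a single word.
unsplit_count=$(
    IFS=''
    n=0
    for i in $VALUES; do
        n=$((n + 1))
    done
    echo "$n"
)

echo "$split_count $unsplit_count"
```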

Command Blocks

Command blocks are used to explicitly create command lists.

42sh$ { echo a; echo b; } | tr b h
a
h

They work similarly to parentheses in C.

Grammar delimitation

Like the if clause, command blocks use the compound_list rule to describe their list of commands.

Braces are reserved words, so they can also be read as a plain WORD. Command blocks are therefore subject to the same constraint as if clauses: the last command must be followed by a semicolon or a newline for the closing brace to be recognized.
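As a sketch of both points, the commands below compare a grouped and an ungrouped pipeline, then show a closing brace read as a plain word. The variable names are only there to capture the outputs:

```shell
# Grouped: tr receives the output of both echo commands.
grouped=$( { echo a; echo b; } | tr 'a-z' 'A-Z' )

# Ungrouped: only 'echo b' is part of the pipeline.
ungrouped=$( echo a; echo b | tr 'a-z' 'A-Z' )

# No ';' or newline before '}': it is an ordinary argument of echo.
literal=$( echo a } )

printf '%s\n' "$grouped" "$ungrouped" "$literal"
```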

Functions

Functions are defined by a WORD, parentheses and a shell_command following it. The body must only be parsed once, and stored in your shell environment.

Function execution changes the value of special variables, which must be restored when the function returns. Note that additional arguments can be passed without error:

42sh$ f() { echo toto; }
42sh$ f fail rendu;
toto
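The restoration of special variables can be observed with the positional parameters. In this sketch, set -- gives the script its own arguments before the call; inside and after are illustrative variable names:

```shell
# Record the argument count and first argument seen by the function.
f() { inside="$# $1"; }

# Give the calling environment three positional parameters.
set -- outer1 outer2 outer3

f fail rendu

# After the call, the caller's parameters are back in place.
after="$# $1"

echo "inside: $inside"
echo "after: $after"
```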
Grammar ambiguity

Since functions start with a word, they can be confused with the simple_command rule. You may want to implement a lookahead in the lexer to check for parentheses.

Subshells

Like functions, subshells create a new execution environment. Changes made inside this environment must not affect the parent one:

42sh$ a=sh; (a=42; echo -n $a); echo $a
42sh

However, subshells copy the parent environment at creation:

42sh$ f() { echo sh; }
42sh$ VAR=42; (echo -n $VAR; f)
42sh
note

Exiting in a subshell does not stop the execution of the whole program but only the current environment:

42sh$ exit 1 || echo 42 || echo sh
42sh$ (exit 1 || echo 42) || echo sh
sh
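A small sketch that gathers these properties in a script. It captures the values in illustrative variables; note that the command substitution used for capturing is itself executed in a subshell:

```shell
a=sh

# The assignment inside the parentheses is lost when the subshell ends.
inner=$( (a=42; echo "$a") )
outer=$a

# 'exit' only terminates the subshell; the script keeps running and
# can read the exit status on the right-hand side of '||'.
sub_status=0
(exit 3) || sub_status=$?

echo "$inner $outer $sub_status"
```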

Case Statement

Case statements allow for simple pattern matching. One of their particularities is the use of globbing patterns in match cases:

case $input in
first)
echo in first
;;
secon?)
echo in second
;;
*)
echo the rest
esac
tip

You may use fnmatch(3) to compare values.
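To check which values each pattern accepts, the statement can be wrapped in a function. classify is a hypothetical name used only for this sketch:

```shell
# classify is a hypothetical helper wrapping the case statement.
classify() {
    case $1 in
        first)
            echo "in first"
            ;;
        secon?)
            # '?' matches exactly one character.
            echo "in second"
            ;;
        *)
            echo "the rest"
            ;;
    esac
}

classify first
classify second
classify seconds
```

The last call falls through to the default case because secon? only matches six-character words.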

Simple commands

Redirections

Redirection is defined by the following symbols:

  • > or >|: redirection on a write fd/buffer.
  • <: redirection on a read fd/buffer.
  • >& and <&: reuse an opened fd.
  • <>: combination of > and <.
Clobber

The clobber (>|) has a different use case than > when set -C is active. If you do not implement the set builtin, you can consider the two to be equivalent.
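For reference, this is how the two operators differ in a shell that implements set -C. The sketch assumes the mktemp utility is available to create a scratch file; the variable names are illustrative:

```shell
scratch=$(mktemp)
echo old > "$scratch"

set -C   # noclobber: '>' refuses to overwrite an existing file

if ( echo new > "$scratch" ) 2>/dev/null; then
    plain=overwritten
else
    plain=refused
fi

# '>|' overwrites the file even when noclobber is active.
echo new >| "$scratch"

set +C
content=$(cat "$scratch")
rm -f "$scratch"

echo "$plain $content"
```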

As written by the grammar, redirections can be found anywhere in the command:

42sh$ > file1 echo toto # Writes toto into file1
42sh$ echo toto > file2 # Writes toto into file2
42sh$ echo > file3 toto # Writes toto into file3

The redirection symbol can also be preceded by a number: the file descriptor to redirect.
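As a sketch of a numbered redirection, 2> sends standard error to a file while standard output is left untouched. The example assumes mktemp is available; the variable names are illustrative:

```shell
err_file=$(mktemp)

# Only file descriptor 2 is redirected to the file.
out=$( { echo visible; echo hidden >&2; } 2> "$err_file" )
err=$(cat "$err_file")
rm -f "$err_file"

echo "stdout: $out"
echo "stderr: $err"
```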

Read the Prerequisites section or watch the "File Descriptors, Pipes and Redirections" conference for more information.

Pipes

Pipes redirect the content of stdout into the stdin of another command. They can be chained, and are executed from left to right:

42sh$ echo toto | tr o a | cat
# echo toto >outputs> toto
# toto >input> tr o a >outputs> tata
# tata >input> cat >outputs> tata
tata

Read the Prerequisites section or watch the "File Descriptors, Pipes and Redirections" conference for more information.

Negation

Negation reverses the exit status of a command. All non-zero exit codes become zero, and zero becomes one.

This can be applied to any type of command, not only true and false.

Although the negation (!) is written in the pipeline rule, it applies to the exit status of the whole pipeline, not to the first command alone.
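These rules can be checked with a few one-liners. The variable names are illustrative:

```shell
# The pipeline 'true | false' fails, so its negation succeeds.
# If '!' only applied to 'true', the result would be a failure.
! true | false
negated_pipeline=$?

# Any non-zero status becomes 0.
! (exit 42)
negated_nonzero=$?

# Zero becomes 1.
! true
negated_zero=$?

echo "$negated_pipeline $negated_nonzero $negated_zero"
```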

danger

In bash --posix, double negations are optimized and removed to keep the original exit code.

42sh$ cd toto tata
bash: cd: too many arguments
42sh$ echo $?
2
42sh$ ! ! cd toto tata
bash: cd: too many arguments
42sh$ echo $?
2

This specific behaviour is NOT POSIX compliant, and will therefore not be tested.

And/Or Operators

The and (&&) and or (||) operators work like the logical and and or found in other programming languages.

They are evaluated lazily, meaning "only if necessary":

false && echo toto # Does not print anything to standard output
true || echo tata # Also does not print anything
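To see which right-hand sides are actually evaluated, this sketch records each call in a string. mark and trace are hypothetical names used for the illustration:

```shell
trace=''
# mark appends its argument to trace each time it is executed.
mark() { trace="$trace$1"; }

false && mark a   # skipped: the left side already failed
true  || mark b   # skipped: the left side already succeeded
true  && mark c   # executed
false || mark d   # executed

echo "$trace"
```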

Variable Assignment

Variable assignments take the form of a single ASSIGNMENT_WORD, composed of a variable name, an equal sign and its value.

note

Since the assignment must fit in a single word, it cannot contain a space unless the space is quoted.

Assignment words can be found alone or at the start of a command:

42sh$ VAR=toto
42sh$ VAR=42 echo sh

The behaviour differs depending on what follows it:

  • when written alone, the variable is stored in the current shell environment.
42sh$ VAR=toto    # Stores VAR in the variable list of the current shell.
42sh$ echo $VAR # Since VAR is in the variable list, it is expanded to `toto`
toto
  • when written before a command, it is not stored but defined as an environment variable for the command:
42sh$ PWD=toto env | grep PWD
PWD=toto
OLDPWD=/path/to/previous/dir
42sh$ echo $PWD
/path/to/current/dir
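Both cases can be reproduced in a script. DEMO_VAR is a hypothetical variable name, and the child command is an external sh so that it can only see the variable through its environment:

```shell
unset DEMO_VAR

# Alone: the variable is stored in the current shell.
DEMO_VAR=toto
stored=$DEMO_VAR
unset DEMO_VAR

# Before a command: the variable is exported to that command only.
child=$(DEMO_VAR=42 sh -c 'printf "%s" "$DEMO_VAR"')

# The current shell never received the variable.
after=${DEMO_VAR-unset}

echo "$stored $child $after"
```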
Execution and Expansion

Be careful about the difference between execution and expansion. Since expansion is done before execution, sometimes variables will be expanded before their value is set.

For example:

# Since expansion is done before execution, 'echo $VAR' is expanded to 'echo  '
# before VAR is assigned a value (execution).
42sh$ VAR=toto echo $VAR
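The ordering becomes visible when the same variable is read once by the current shell and once by the child command. DEMO_VAR is a hypothetical name; the brackets only make the empty expansion visible:

```shell
unset DEMO_VAR

# "$DEMO_VAR" is expanded by the current shell, before the assignment
# takes effect: echo receives an empty value.
early=$(DEMO_VAR=toto echo "[$DEMO_VAR]")

# The single quotes delay the expansion: the child shell expands the
# variable itself, after receiving it in its environment.
late=$(DEMO_VAR=toto sh -c 'echo "[$DEMO_VAR]"')

echo "$early $late"
```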

Aliases

Aliases allow word substitutions. They require implementing the alias and unalias builtins:

42sh$ alias foo=ls
42sh$ foo # performs an ls
...
note

Alias substitutions are done by the lexer, which means an alias cannot be defined and used in a single command:

42sh$ alias foo=ls; foo
bash: foo: command not found

Here-Document

Here-Documents are redirections that allow multiline input. They work by defining a delimiter, and using this to recognize the end of the input:

42sh$ cat <<21sh # Defines '21sh' as a delimiter.
best project
of the year
21sh # Delimiter encountered, end of the redirection input.
best project
of the year

If the redirection contains a dash (-) before the delimiter, all leading tabs are removed from the input:

42sh$ cat <<21sh
	input with tabs
21sh
	input with tabs
42sh$ cat <<-21sh
	input without tabs
21sh
input without tabs
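Because tab characters are easy to lose when copying an example, the sketch below builds each script with printf (where \t is a tab) and feeds it to sh. EOF is an arbitrary delimiter, and kept and stripped are illustrative names:

```shell
# '<<': the leading tab is part of the input.
kept=$(printf 'cat <<EOF\n\tindented\nEOF\n' | sh)

# '<<-': the leading tab is removed before cat reads the line.
stripped=$(printf 'cat <<-EOF\n\tindented\nEOF\n' | sh)

printf '[%s]\n' "$kept" "$stripped"
```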