Shell Features
This section provides examples and details for each feature that needs to be implemented.
It assumes you have read the grammar of the shell language and have it in mind.
Shell commands
If Statements
The if command includes if, else, and elif.
if true; then
echo toto
else
echo tata
fi
The condition and body can contain any number of commands, separated by a semicolon
(;) or a newline (\n). This is a valid if clause:
if echo ACUs; while false; do echo not printed
done
echo are
then
echo the; echo best!; fi
Conditions and body are defined by the compound_list rule:
compound_list = and_or [';'] {'\n'} ;
Since an and_or, and more generally a simple_command, is simply a list of words,
removing the semicolon (or newline) before the then keyword makes it part of the
condition command.
if echo toto then # 'then' is considered a simple WORD and an argument of the
# echo command.
if echo toto; then # Writes 'toto' to standard output and recognizes `then`
# as the keyword that ends the condition.
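The difference can be observed in any POSIX shell. The following is a minimal sketch; the variable name out is only an assumption made for the example.

```shell
# Without a separator after 'toto', the first 'then' is an ordinary argument
# of echo. The second 'then', placed after the semicolon, is the keyword.
out=$(if echo toto then; then echo yes; fi)
printf '%s\n' "$out"
```

The first line of output is "toto then", which shows that the word was passed to echo instead of closing the condition.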
While and Until Loops
while and until loops work similarly except for one small detail: the
condition of the loop is inverted.
- while loops stop when the condition is false;
- until loops stop when the condition is true.
while and until loops use the compound_list rule for the body and
condition. Therefore, they are subject to the same limitation as the if clause:
a semicolon or a newline must be written before the do keyword for it to be
recognized properly.
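As an illustration of the inverted condition, here is a small sketch in which both loops count to three. The counters i and j are names chosen for this example.

```shell
# 'while' keeps looping as long as the condition succeeds.
i=0
while [ "$i" -lt 3 ]; do
    i=$((i + 1))
done

# 'until' keeps looping as long as the condition fails.
j=0
until [ "$j" -ge 3 ]; do
    j=$((j + 1))
done

echo "$i $j"
```

Both loops run the same number of times because the until condition is the exact negation of the while condition.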
For Loops
To be completely functional, for loops require Parameter Expansion
to be implemented.
for loops iterate on each WORD listed after the in keyword.
Even with Parameter Expansion, for loops iterating on the content of a variable will not work as expected: this also requires Field Splitting to be implemented.
Here is the expected behavior without Field Splitting:
42sh$ cat example.sh
VALUES='1 2 3 4 5'
for i in $VALUES; do
echo $i
done
42sh$ ./42sh example.sh
1 2 3 4 5
This behavior can be reproduced in an existing shell by setting the IFS variable to the empty string ('').
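The following sketch compares both behaviors in an existing POSIX shell. It assumes IFS holds its default value when the script starts; the counter and helper variable names are made up for the example.

```shell
VALUES='1 2 3 4 5'

# Default IFS: $VALUES is split into five fields, so the loop runs five times.
split_count=0
for i in $VALUES; do
    split_count=$((split_count + 1))
done

# Empty IFS: no field splitting, so the loop runs once with the whole string.
saved_ifs=$IFS
IFS=''
unsplit_count=0
for i in $VALUES; do
    unsplit_count=$((unsplit_count + 1))
    unsplit_value=$i
done
IFS=$saved_ifs   # restore the previous value

echo "$split_count $unsplit_count"
```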
Command Blocks
Command blocks are used to explicitly create command lists.
42sh$ { echo a; echo b; } | tr b h
a
h
They work similarly to parentheses in C.
Like the if clause, command blocks use the compound_list rule to describe
their list of commands.
Since braces are reserved words, they can be counted as a WORD.
Command blocks are therefore subject to the same limitation as if clauses: the
last command must have a trailing semicolon or newline for the closing brace to
be recognized.
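The closing brace limitation can be checked with a short sketch. The variable names grouped and literal are assumptions made for this example.

```shell
# The ';' before '}' lets the closing brace be recognized as a reserved word.
grouped=$({ echo a; echo b; } | tr b h)

# Without it, the first '}' is just another argument of echo; the block is
# only closed by the second '}'.
literal=$({ echo a }; })

printf '%s\n' "$grouped" "$literal"
```

The last line printed is "a }", which shows that the first brace was treated as a plain word.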
Functions
Functions are defined by a WORD, parentheses and a shell_command following it.
The body must only be parsed once, and stored in your shell environment.
Function execution changes the value of special variables; these need to be restored after exiting the function. Note that additional arguments can be passed without error:
42sh$ f() { echo toto; }
42sh$ f fail rendu;
toto
Since functions start with a word, they can be confused with the simple_command
rule.
You may want to implement a lookahead in the lexer to check for parentheses.
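The restoration of special variables can be observed with the positional parameters. The sketch below is only an illustration; record_args and the variables it sets are hypothetical names.

```shell
# Hypothetical helper: records the positional parameters it sees while running.
record_args() {
    seen_count=$#
    seen_first=$1
}

set -- outer1 outer2 outer3   # positional parameters of the caller
record_args inner1 inner2

# Once the function has returned, $# and $1 hold the caller's values again.
after_count=$#
after_first=$1

echo "inside: $seen_count $seen_first"
echo "after: $after_count $after_first"
```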
Subshells
Like functions, subshells create a new execution environment. Everything applied to this environment must not be applied to the parent one:
42sh$ a=sh; (a=42; echo -n $a); echo $a
42sh
However, subshells copy the parent environment at creation:
42sh$ f() { echo sh; }
42sh$ VAR=42; (echo -n $VAR; f)
42sh
Exiting in a subshell does not stop the execution of the whole program but only the current environment:
42sh$ exit 1 || echo 42 || echo sh
42sh$ (exit 1 || echo 42) || echo sh
sh
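These properties can be combined in one short sketch. The variable names inner and status are assumptions made for the example.

```shell
a=sh
# The assignment inside the parentheses only affects the subshell's copy of 'a'.
inner=$( (a=42; echo "$a") )
echo "$inner$a"

# 'exit' inside a subshell only ends the subshell; the parent keeps running
# and can read the subshell's exit status.
status=0
(exit 3) || status=$?
echo "$status"
```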
Case Statement
Case statements allow for simple pattern matching. One of their particularities is the use of globbing in match cases:
case $input in
first)
echo in first
;;
secon?)
echo in second
;;
*)
echo the rest
esac
You may use fnmatch(3) to compare values.
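To see the globbing at work, the example above can be wrapped in a function and called with several inputs. This is a sketch; classify is a hypothetical name.

```shell
# Hypothetical helper wrapping the case statement shown above.
classify() {
    case $1 in
        first)
            echo "in first"
            ;;
        secon?)
            echo "in second"
            ;;
        *)
            echo "the rest"
    esac
}

classify first    # exact match
classify second   # '?' matches the final 'd'
classify secont   # '?' matches any single character
classify third    # falls through to '*'
```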
Simple commands
Redirections
Redirection is defined by the following symbols:
- > or >|: redirection on a write fd/buffer.
- <: redirection on a read fd/buffer.
- >& and <&: reuse an opened fd.
- <>: combination of > and <.
The clobber operator (>|) only behaves differently from > when set -C is active.
If you do not implement the set builtin, you can consider the two to be
equivalent.
As written by the grammar, redirections can be found anywhere in the command:
42sh$ > file1 echo toto # Writes toto into file1
42sh$ echo toto > file2 # Writes toto into file2
42sh$ echo > file3 toto # Writes toto into file3
The redirection symbol can also be preceded by a number: the file descriptor to redirect (for example, 2> redirects standard error).
Read the Prerequisites section or watch the "File Descriptors, Pipes and Redirections" conference for more information.
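The sketch below exercises the three positions and the numeric prefix. It assumes a writable temporary directory; the path and the variable names are made up for the example.

```shell
tmp=${TMPDIR:-/tmp}/redir_demo.$$   # scratch directory (assumed writable)
mkdir -p "$tmp"

# The redirection may appear before, after, or in the middle of the command.
> "$tmp/file1" echo toto
echo toto > "$tmp/file2"
echo > "$tmp/file3" toto

# A leading number selects the descriptor: 1 is stdout, 2 is stderr.
{ echo out; echo err >&2; } 1> "$tmp/out" 2> "$tmp/err"

f1=$(cat "$tmp/file1"); f2=$(cat "$tmp/file2"); f3=$(cat "$tmp/file3")
out=$(cat "$tmp/out"); err=$(cat "$tmp/err")
rm -rf "$tmp"

echo "$f1 $f2 $f3 $out $err"
```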
Pipes
Pipes redirect the content of stdout into the stdin of another command. They can be chained, and data flows from left to right:
42sh$ echo toto | tr o a | cat
# echo toto >outputs> toto
# toto >input> tr o a >outputs> tata
# tata >input> cat >outputs> tata
tata
Read the Prerequisites section or watch the "File Descriptors, Pipes and Redirections" conference for more information.
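As a quick check, the chained example can be captured in a variable. The sketch also shows that the exit status of a pipeline is the status of its last command; result and pipe_status are names chosen for the example.

```shell
# Each command's stdout feeds the next command's stdin.
result=$(echo toto | tr o a | cat)
echo "$result"

# The exit status of a pipeline is the status of its last command.
false | true
pipe_status=$?
echo "$pipe_status"
```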
Negation
Negation reverses the exit status of a command. All non-zero exit codes become zero, and zero becomes one.
This can be applied to any type of command, not only true and false.
While the negation (!) is written in the pipeline rule, it is
applied on the result of the pipe and not the first command.
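The following sketch illustrates these rules, including the fact that the negation covers the whole pipeline. The variable names are assumptions made for the example.

```shell
! true
not_true=$?        # 1: zero becomes one

! (exit 42)
not_failure=$?     # 0: any non-zero status becomes zero

# The '!' covers the whole pipeline, whose status comes from 'false',
# the last command. Negating that status gives 0.
! true | false
not_pipeline=$?

echo "$not_true $not_failure $not_pipeline"
```

If the negation only applied to the first command, the last pipeline would return the status of false, which is 1.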
In bash --posix, double negations are optimized and removed to keep the
original exit code.
42sh$ cd toto tata
bash: cd: too many arguments
42sh$ echo $?
2
42sh$ ! ! cd toto tata
bash: cd: too many arguments
42sh$ echo $?
2
This specific behaviour is NOT POSIX compliant, and will therefore not be tested.
And/Or Operators
The and (&&) and or (||) operators work like their counterparts in other
programming languages.
They are evaluated lazily, meaning "only if necessary":
false && echo toto # Does not print anything to standard output
true || echo tata # Also does not print anything
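The lazy evaluation can be traced by recording which right-hand sides actually run. This is a minimal sketch; the variable ran is a name chosen for the example.

```shell
ran=""
false && ran="${ran}A"   # left side failed: right side is skipped
true  || ran="${ran}B"   # left side succeeded: right side is skipped
true  && ran="${ran}C"   # right side is needed, so it runs
false || ran="${ran}D"   # right side is needed, so it runs
echo "$ran"
```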
Variable Assignment
Variable assignments take the form of a single ASSIGNMENT_WORD, composed of a variable name, an equal sign and its value.
Since the assignment must fit in a single word, its value cannot contain spaces unless they are quoted.
Assignment words can be found alone or at the start of a command:
42sh$ VAR=toto
42sh$ VAR=42 echo sh
The behaviour differs based on what follows it:
- when written alone, the variable is stored in the current shell environment.
42sh$ VAR=toto # Stores VAR in the variable list of the current shell.
42sh$ echo $VAR # Since VAR is in the variable list, it is expanded to `toto`
toto
- when written before a command, it is not stored but defined as an environment variable for the command:
42sh$ PWD=toto env | grep PWD
PWD=toto
OLDPWD=/path/to/previous/dir
42sh$ echo $PWD
/path/to/current/dir
Be careful about the difference between execution and expansion. Since expansion is done before execution, sometimes variables will be expanded before their value is set.
For example:
# Since expansion is done before execution, 'echo $VAR' is expanded to 'echo '
# before VAR is assigned a value (execution).
42sh$ VAR=toto echo $VAR
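The sketch below contrasts expansion in the current shell with expansion in a child process. It assumes an sh executable is available in PATH; the variable names are made up for the example.

```shell
unset VAR

# $VAR is expanded (to nothing) before the temporary assignment takes effect.
expanded_early=$(VAR=toto echo "$VAR")

# The child shell expands $VAR itself, after receiving it in its environment.
expanded_in_child=$(VAR=toto sh -c 'echo "$VAR"')

# An assignment placed before a regular command does not persist afterwards.
VAR=toto true
after=${VAR-unset}

echo "[$expanded_early] [$expanded_in_child] [$after]"
```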
Aliases
Aliases allow word substitutions. They require implementing the alias and
unalias builtins:
42sh$ alias foo=ls
42sh$ foo # performs an ls
...
Alias substitutions are done by the lexer, which means that an alias cannot be defined and used in the same command line:
42sh$ alias foo=ls; foo
bash: foo: command not found
Here-Document
Here-Documents are redirections that allow multiline input. They work by defining a delimiter, and using this to recognize the end of the input:
42sh$ cat <<21sh # Defines '21sh' as a delimiter.
best project
of the year
21sh # Delimiter encountered, end of the redirection input.
best project
of the year
If the redirection contains a dash (-) before the delimiter, all leading
tabs are removed from the input:
42sh$ cat <<21sh
	input with tabs
21sh
	input with tabs
42sh$ cat <<-21sh
	input without tabs
21sh
input without tabs
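Because leading tabs are easy to lose when copying an example, the sketch below builds the two here-documents in strings with an explicit tab character and runs them with eval. This approach and all variable names are assumptions made for the illustration.

```shell
tab=$(printf '\t')

# Build the scripts in strings so that the leading tabs are explicit.
plain="cat <<EOF
${tab}input with tabs
EOF
"
stripped="cat <<-EOF
${tab}input without tabs
${tab}EOF
"

kept=$(eval "$plain")        # '<<' keeps the leading tab
removed=$(eval "$stripped")  # '<<-' strips it, even before the delimiter

[ "$kept" = "${tab}input with tabs" ] && echo "tab kept"
[ "$removed" = "input without tabs" ] && echo "tab removed"
```

With <<-, the delimiter line itself may also start with tabs, which allows a here-document to be indented along with the surrounding code.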