Word Expansion
This article gives some examples of shell expansions to help you get started on the project.
When in doubt, refer first to the subject and then to the SCL.
Word expansion is the process by which the shell transforms the content of a WORD by applying a series of operations.
Before executing a command, multiple steps of expansion must be applied to each WORD, in a specific order.
Those steps are specified here in order of execution. For more details on the execution order, see section 2.6 of the SCL.
Each expansion is triggered when a specific set of characters appears. Therefore, not all types of expansion will be applied to each WORD.
Those characters are specified in each section in this page. You can also read the specification of each expansion type in section 2.6 of the SCL.
Tilde expansion
This is a bonus feature.
When a word begins with an unquoted ~ (tilde), all the characters until
the first (optional) unquoted / represent a tilde-prefix.
For example, in the path ~toto/tata.txt, ~toto is the tilde-prefix.
If none of the characters in the tilde-prefix are quoted, the shell treats the
characters following the tilde as a potential user login. The tilde-prefix is
then replaced by the home directory associated with that login. This can be
retrieved with getpwnam(3).
For example, if the user toto is linked to the directory /tmp/toto, here is
the expected behavior:
42sh$ cd ~toto/tata/
42sh$ pwd
/tmp/toto/tata
If the tilde-prefix only contains the tilde, it is replaced by the value of the
HOME variable. For example:
42sh$ echo ~
/home/xavier.login
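The effect of quoting on the tilde can be checked with a short sketch. The variable names and the /tmp/demo_home value are illustrative assumptions; HOME is set by hand only so that the result does not depend on the environment.

```shell
# Assumption: an arbitrary HOME value, chosen only for this demonstration.
HOME=/tmp/demo_home

unquoted=~          # unquoted tilde alone: replaced by the value of HOME
with_path=~/tata    # the tilde-prefix stops at the first unquoted slash
quoted="~"          # quoted tilde: kept as a literal character
escaped=\~          # escaped tilde: kept as a literal character

printf '%s\n' "$unquoted" "$with_path" "$quoted" "$escaped"
```

Only the first two assignments trigger Tilde Expansion; the last two keep the literal character.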
Parameter expansion
The goal of parameter expansion is to replace variables by their values inside a word.
It can be found in 2 forms: $param and ${param} with param the name of the
variable.
While braces are optional, there are 2 cases where they must be used:
- The parameter is a number with more than one digit:
42sh$ cat script.sh
echo $10
42sh$ ./script.sh 11 22 33 44 55 66 77 88 99 42
110 # Expands $1 and adds a 0
42sh$ cat script2.sh
echo ${10}
42sh$ ./script2.sh 11 22 33 44 55 66 77 88 99 42
42
- There are characters that could be part of the variable name following
param:
42sh$ var=toto
42sh$ echo $vartoto # Searches for the variable named 'vartoto'
42sh$ echo ${var}toto
totototo
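Both cases can be reproduced without a script file. This is a small sketch: set -- stands in for the script arguments, and the variable names are illustrative.

```shell
# Simulate the ten arguments passed to the script in the example above.
set -- 11 22 33 44 55 66 77 88 99 42

without_braces=$10      # read as ${1} followed by the character 0
with_braces=${10}       # the tenth positional parameter

var=toto
unset vartoto           # make sure no variable named 'vartoto' exists
glued=$vartoto          # looks up 'vartoto', which is unset: empty result
delimited=${var}toto    # braces end the name after 'var'

printf '%s\n' "$without_braces" "$with_braces" "[$glued]" "$delimited"
```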
Parameter expansion is applied on each WORD of a command before its execution.
42sh$ letter=W
42sh$ echo ${letter}ORD1 ${letter}ORD2 ${letter}ORD3 ${letter}ORD4
# after expansion, the command is 'echo WORD1 WORD2 WORD3 WORD4'
# execution therefore gives the following output:
WORD1 WORD2 WORD3 WORD4
It also applies to the first word of the command:
42sh$ command=echo
42sh$ $command toto
# after expansion, the command is 'echo toto'
# execution therefore gives the following output:
toto
Do not permanently replace the variables in your tokens with their values!
This would break the behavior of for loops, as the value of the variable
changes at each iteration.
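A short loop illustrates why the original token must be kept. This is a sketch only, and the variable names are illustrative.

```shell
result=""
for i in a b c; do
    # The WORD "$result<$i>" is the same token at every iteration,
    # yet it has to expand to a different value each time.
    result="$result<$i>"
done

printf '%s\n' "$result"
```

If the first expansion had overwritten the token, every iteration would reuse the value computed for a.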
Command substitution
The goal of command substitution is to use the output of a command inside a WORD.
It can be found in 2 forms: $(command) and `command`.
In practice, the shell:
- retrieves the command;
- gives it to the Parse-Execution loop and captures the final output. The Parse-Execution loop is the one in charge of calling the parser and execution block;
- removes all the trailing newlines from the output;
- substitutes the command by the output.
42sh$ echo $(echo 42)sh
# Expands to: echo 42sh
42sh
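The removal of trailing newlines can be observed directly. In this sketch, raw_output is a hypothetical helper that prints two lines followed by extra empty lines.

```shell
# Hypothetical helper: prints two lines, then two extra empty lines.
raw_output() { printf 'line1\nline2\n\n\n'; }

captured=$(raw_output)        # all trailing newlines are removed
glued=$(printf '42\n\n')sh    # the rest of the WORD is appended directly

printf '[%s]\n' "$captured" "$glued"
```

Only trailing newlines are removed: the newline between line1 and line2 is kept.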
The command can also contain other forms of expansion:
42sh$ toto=ta
42sh$ echo toto and $(echo ta$(echo $toto))
# Expands to:
# echo toto and $(echo ta$(echo ta))
# echo toto and $(echo tata)
# echo toto and tata
toto and tata
As with parameter expansion, do not modify the string in place, as a command could return a different result each time it is run.
Arithmetic expansion
This is a bonus feature.
The goal of this expansion is to evaluate arithmetic expressions and substitute the result.
It uses the following format: $((expression))
42sh$ echo $((1 + 1))
2
Before evaluating the expression, Parameter Expansion, Command substitution and Quote removal are applied.
42sh$ a=41
42sh$ echo $(($a + 1))
42
The $ before a variable inside an Arithmetic Expansion can be omitted.
However, its presence changes the behavior of the expansion.
When there is no $, the expansion simply evaluates the expression:
42sh$ a=3+2
42sh$ echo $((a * 2)) # Since a = 5, we write the result of 5 * 2
10
When there is a $, we expand the expression before evaluation:
42sh$ a=3+2
42sh$ echo $(($a * 2))
# Expands to 3 + 2 * 2. Because of operator priority, the result is 7
7
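The textual nature of the $ form can be made visible by adding parentheses around the expanded variable. This is a sketch: the grouped form is not part of the examples above, it only shows that the substitution happens before any evaluation.

```shell
a=3+2

textual=$(($a * 2))      # the expression becomes 3+2 * 2
grouped=$((($a) * 2))    # the expression becomes (3+2) * 2

printf '%s\n' "$textual" "$grouped"
```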
Operators
All operators are specified in section 1.1.2 of the SCL.
The ** operator is not specified by the SCL. You can mimic the behavior of
bash --posix.
Field splitting
Field splitting cuts a WORD into smaller WORDs. Only the parts of a WORD that come from an unquoted expansion can be split.
The delimiters are the characters of the IFS variable. By default,
it contains space, tab and newline.
42sh$ val="10 5 20"
42sh$ seq $val
# After parameter expansion, the command is composed of 2 WORDs:
# "seq" and "10 5 20"
# After field splitting, the second WORD is split in three. The final command
# is composed of 4 WORDs:
# "seq", "10", "5" and "20".
# After execution, this prints to stdout:
10
15
20
Each character in the IFS is considered as a separator. When a full word is written, each character of that word can be used as a separator:
42sh$ IFS=toto # 't' or 'o' can be a separator
42sh$ var="10t5o20"
42sh$ seq $var
10
15
20
A word can be split only if it is not between quotes:
42sh$ var="10 5 20"
42sh$ seq "$var"
# Expands to 'seq "10 5 20"'
seq: invalid floating point argument: ‘10 5 20’
Try 'seq --help' for more information.
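Counting the arguments received by a command is a simple way to observe field splitting. In this sketch, count_args is a hypothetical helper, and IFS is saved and restored so the rest of the session is not affected.

```shell
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

val="10 5 20"
unquoted=$(count_args $val)      # split on spaces: three WORDs
quoted=$(count_args "$val")      # quoted: one single WORD

saved_ifs=$IFS
IFS=toto                         # 't' and 'o' become the separators
var="10t5o20"
custom=$(count_args $var)        # split on 't' and 'o': three WORDs
IFS=$saved_ifs

printf '%s\n' "$unquoted" "$quoted" "$custom"
```

Note that the name count_args itself contains t and o but is never split, since it does not come from an expansion.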
Pathname expansion
This is a bonus feature.
This feature expands wildcard characters *, ? and [ into a list of files
matching the pattern.
Each file matching creates a different WORD in the command.
42sh$ ls
a.out tata.txt test.sh toto.txt
42sh$ ls *.txt # Expands to 'ls' 'tata.txt' 'toto.txt'
tata.txt toto.txt
See section 2.13 of the SCL for more information on wildcard characters.
You can check out glob(7) for a little history of globbing, examples and
tricky behavior that they allow, as well as tips on how to use them efficiently.
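The example above can be replayed in a scratch directory. This sketch assumes mktemp(1) is available; the directory and variable names are illustrative.

```shell
# Create a scratch directory holding the same files as in the example.
dir=$(mktemp -d)
touch "$dir/a.out" "$dir/tata.txt" "$dir/test.sh" "$dir/toto.txt"

matched=$(cd "$dir" && printf '%s\n' *.txt)     # one WORD per matching file
literal=$(cd "$dir" && printf '%s\n' '*.txt')   # quoted: no expansion
unmatched=$(cd "$dir" && printf '%s\n' *.md)    # no match: pattern kept as-is

rm -r "$dir"
printf '%s\n' "$matched" "$literal" "$unmatched"
```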
Quote removal
The goal of quote removal is to remove quote characters: single quotes ',
double quotes " and escape \.
Quotes are removed only if they are not themselves quoted. In section 2.2 of the SCL, this is mentioned as "preserving the literal value" of the character.
The escape character quotes only the following character.
42sh$ echo \"'toto'
"toto
Single quotes preserve the value of all characters, even other quotes or the escape character:
42sh$ echo '"""\""""'
"""\""""
Double quotes preserve the value of single quotes, but allow escaping:
42sh$ echo "\"''"
"''
For Quote Removal to work, the lexer needs to include each quote in the WORD. Refer to section 2.3 of the SCL.
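The three examples above can be stored in variables to inspect what remains after Quote Removal. This is a sketch, and the variable names are illustrative.

```shell
escaped=\"'toto'      # the escaped double quote stays, the single quotes go
single='"""\""""'     # everything between single quotes is kept verbatim
double="\"''"         # the escaped double quote and both single quotes stay

printf '%s\n' "$escaped" "$single" "$double"
printf '%s\n' "${#single}" "${#double}"   # lengths after Quote Removal
```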
Effect of quotes during expansion
As explained in section 2.2 of the SCL, single quotes ('), double quotes (")
and the escape character (\) each have a different effect on the
different expansion steps.
Escaping
If a character that would start an expansion is escaped, the expansion is not triggered.
This applies to the tilde (~) for Tilde Expansion, the backquote (`)
for Command Substitution and the dollar sign ($) for Parameter Expansion,
Command Substitution and Arithmetic Expansion.
42sh$ echo \~
~
42sh$ echo \$toto
$toto
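The same check can be done for each character that starts an expansion. This is a sketch; the variable names are illustrative, and toto is given a value only to show that it is never read.

```shell
toto=tata

escaped_tilde=\~              # no Tilde Expansion
escaped_dollar=\$toto         # no Parameter Expansion
escaped_backquote=\`echo\`    # no Command Substitution

printf '%s\n' "$escaped_tilde" "$escaped_dollar" "$escaped_backquote"
```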
Escaped quoting characters are also not removed.
42sh$ echo "\""\\\'
"\'
Single quotes
No expansion at all is performed between single quotes:
42sh$ echo '$toto"$(echo tata)"'
$toto"$(echo tata)"
Double quotes
Between double quotes, only the dollar sign ($), the backquote (`) and, before certain characters, the backslash (\) retain their special meaning.
Therefore, only the Parameter Expansion, Command Substitution and Arithmetic Expansion steps are applied.
If your testsuite is written in shell, use single quotes instead of double quotes. Otherwise, parameter expansion would be called before calling your 42sh.
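The difference between the two kinds of quotes can be summarized with a short sketch. The variable names are illustrative.

```shell
toto=tata

single='$toto $(echo x) $((1 + 1))'   # nothing is expanded
double="$toto $(echo x) $((1 + 1))"   # the three '$' expansions are applied
inert="~ *"                           # no Tilde or Pathname Expansion

printf '%s\n' "$single" "$double" "$inert"
```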
Expansion: a detailed example
In this section, we will apply each step of the expansion on the following command:
42sh$ var=sh
42sh$ letter=e
42sh$ $letter'c'$(echo -e "ho $((20 + 22))"sh"\n\n")
Tilde Expansion
Since there is no tilde in the command, we have no Tilde Expansion to apply.
Parameter Expansion
Only the first dollar sign is considered for parameter expansion, as it is not followed by a parenthesis.
Since this parameter does not contain curly brackets, we need to take the maximum number of characters that can make a variable.
As a quote cannot be part of a variable name, the longest variable name
after the dollar sign is letter (see section 3.235 of the SCL).
$letter is then replaced by the value of the corresponding variable.
After expansion, here is the resulting command:
42sh$ e'c'$(echo -e "ho $((20 + 22))"sh"\n\n")
Command Substitution
Here is the string that applies for Command Substitution: $(echo -e "ho $((20 + 22))"sh"\n\n")
We first start by sending the command between parentheses to the Parse-Execution loop.
Before execution, another step of expansion is triggered. We skip directly to Arithmetic Expansion since there is nothing to be done for the other steps.
Arithmetic Expansion
Only the $((20 + 22)) section registers for this expansion.
Since there is no need for Parameter Expansion, Command substitution and quote removal, we skip those expansion steps and evaluate the expression.
After evaluation and replacement, here is the resulting command:
42sh$ echo -e "ho 42"sh"\n\n"
Field splitting
Since the IFS was not redefined, the field separator is either space, tab or
newline.
For the WORDs echo and -e, no splitting is done since they do not contain
any separator.
For the WORD "ho 42"sh"\n\n", the space is inside double quotes, which
prevents any Field Splitting from being done.
Quote Removal
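In this step, we remove all the unquoted quote characters. Here, this concerns every double quote of the command. The backslashes are kept: inside double quotes, a backslash followed by n has no special meaning and is therefore itself quoted.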
Here is the resulting command:
42sh$ echo -e ho 42sh\n\n
Be careful! Since no field splitting was done, 'ho 42sh\n\n' is made of one single WORD.
Execution
After execution of the final command, we capture the following output:
ho 42sh
After removing the trailing newlines, we obtain the string ho 42sh.
Here is the command at the end of Command Substitution:
42sh$ e'c'ho 42sh
Since no Field Splitting has been done yet, the string e'c'ho 42sh is only
made of ONE single WORD.
Field Splitting
Since the IFS was not redefined, the field separator is either space, tab or
newline.
The WORD contains only 1 space between e'c'ho and 42sh.
Since it is not quoted, that WORD is split into 2 separate WORDs:
e'c'ho and 42sh.
Pathname Expansion
No character in the command registers for Pathname Expansion.
Quote Removal
The only word containing quotes is e'c'ho. Since none of those quotes are
quoted, we simply remove them.
The final command is:
42sh$ echo 42sh
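The whole walkthrough can be checked with a variant of the command. This is a sketch under one assumption: printf replaces echo -e, because the handling of -e by echo differs between shells. The expansion steps are otherwise the same.

```shell
letter=e

# The inner command prints "ho 42sh" followed by two newlines.
# After the trailing newlines are removed, the outer WORD is e'c'ho 42sh,
# which Field Splitting cuts into the two WORDs 'echo' and '42sh'.
result=$($letter'c'$(printf 'ho %ssh\n\n' "$((20 + 22))"))

printf '%s\n' "$result"
```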