Word Expansion

note

This article gives some examples of shell expansions to get you started on the project.

When in doubt, refer first to the subject and then to the SCL.

Word expansion is the process by which the shell transforms the content of a WORD by applying a sequence of operations.

Before executing a command, multiple steps of expansion must be applied to each WORD, in a specific order.

Those steps are specified here in order of execution. For more details on the execution order, see section 2.6 of the SCL.

info

Each expansion is triggered when a specific set of characters appears. Therefore, not all types of expansion will be applied to each WORD.

Those characters are specified in each section on this page. You can also read the specification of each expansion type in section 2.6 of the SCL.

Tilde expansion

note

This is a bonus feature.

When a word begins with an unquoted ~ (tilde), all the characters until the first (optional) unquoted / represent a tilde-prefix. For example, in the path ~toto/tata.txt, ~toto is the tilde-prefix.

If the characters in the tilde-prefix are not quoted, the shell treats the characters following the tilde as a potential login name. The tilde-prefix is then replaced by the initial working directory (the home directory) associated with that login name. This directory can be retrieved with getpwnam(3).

For example, if the user toto is linked to the directory /tmp/toto, here is the expected behavior:

42sh$ cd ~toto/tata/
42sh$ pwd
/tmp/toto/tata

If the tilde-prefix only contains the tilde, it is replaced by the value of the HOME variable. For example:

42sh$ echo ~
/home/xavier.login
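
The behavior above can be sketched in any POSIX shell. The path /tmp/demo_home is a made-up value used only for illustration, and the variable names are hypothetical. The example relies on the fact that the tilde is also expanded on the right-hand side of an assignment, which makes the result easy to capture.

```shell
# Hypothetical home directory, chosen only for this demonstration.
HOME=/tmp/demo_home

bare=~                 # unquoted tilde alone: replaced by the value of HOME
with_path=~/tata.txt   # the tilde-prefix ends at the first unquoted /
quoted='~'             # quoted tilde: no expansion

printf '%s\n' "$bare" "$with_path" "$quoted"
```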

Parameter expansion

The goal of parameter expansion is to replace variables by their values inside a word.

It can be found in 2 forms: $param and ${param}, where param is the name of the variable.

info

While braces are optional, there are 2 cases where they must be used:

  1. The parameter is a number with more than one digit:
42sh$ cat script.sh
echo $10
42sh$ ./script.sh 11 22 33 44 55 66 77 88 99 42
110 # Expands $1 and adds a 0
42sh$ cat script2.sh
echo ${10}
42sh$ ./script2.sh 11 22 33 44 55 66 77 88 99 42
42
  2. There are characters that could be part of the variable name following param:
42sh$ var=toto
42sh$ echo $vartoto # Searches for the variable named 'vartoto'

42sh$ echo ${var}toto
totototo
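
Both cases can be checked without writing a script file. The sketch below uses set -- to define the positional parameters; the variable names holding the results are illustrative and not part of the project.

```shell
# Define ten positional parameters, as if they were passed on the command line.
set -- 11 22 33 44 55 66 77 88 99 42

without_braces="$10"    # parsed as $1 followed by a literal 0
with_braces="${10}"     # the tenth positional parameter

var=toto
unset vartoto           # make sure the longer name is not set
glued="$vartoto"        # looks up 'vartoto', which is unset: empty string
braced="${var}toto"     # looks up 'var', then appends the literal 'toto'

printf '[%s]\n' "$without_braces" "$with_braces" "$glued" "$braced"
```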

Parameter expansion is applied on each WORD of a command before its execution.

42sh$ letter=W
42sh$ echo ${letter}ORD1 ${letter}ORD2 ${letter}ORD3 ${letter}ORD4
# after expansion, the command is 'echo WORD1 WORD2 WORD3 WORD4'
# execution therefore gives the following output:
WORD1 WORD2 WORD3 WORD4

It also applies to the first word of the command:

42sh$ command=echo
42sh$ $command toto
# after expansion, the command is 'echo toto'
# execution therefore gives the following output:
toto
danger

Do not permanently replace the variables in your tokens with their values! Doing so would break for loops: the value of the loop variable changes at each iteration, so the body must be expanded again every time it is executed.
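
To see why the original word must be kept, consider the loop below. The word ${i} in the body has to be expanded again at each iteration. An implementation that overwrote it with its first value would produce 1;1;1; instead. The variable names are illustrative.

```shell
result=""
for i in 1 2 3; do
    # "${i}" is expanded again from the original word on every iteration.
    result="${result}${i};"
done
printf '%s\n' "$result"   # prints 1;2;3;
```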

Command substitution

The goal of command substitution is to use the output of a command inside a WORD.

It can be found in 2 forms: $(command) and `command`.

In practice, the shell:

  1. retrieves the command;
  2. gives it to the Parse-Execution loop and captures the final output. The Parse-Execution loop is the one in charge of calling the parser and execution block;
  3. removes all the trailing newlines from the output;
  4. substitutes the command by the output.
42sh$ echo $(echo 42)sh
# Expands to: echo 42sh
42sh
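
Step 3 is easy to overlook. The following sketch uses printf to produce a known output and shows that only the trailing newlines are removed, while newlines in the middle of the output are kept. The variable names are illustrative.

```shell
# The command prints three trailing newlines; all of them are removed.
trailing=$(printf 'ho 42sh\n\n\n')

# Newlines in the middle of the output are preserved.
inner=$(printf 'a\n\nb\n')
expected_inner='a

b'

printf 'length of trailing: %s\n' "${#trailing}"   # 7 characters
```
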
note

The command can also contain other forms of expansion:

42sh$ toto=ta
42sh$ echo toto and $(echo ta$(echo $toto))
# Expands to:
# echo toto and $(echo ta$(echo ta))
# echo toto and $(echo tata)
# echo toto and tata
toto and tata
danger

As with parameter expansion, do not modify the string in place: a command can return a different result each time it is executed.

Arithmetic expansion

note

This is a bonus feature.

The goal of this expansion is to evaluate arithmetic expressions and substitute the result.

It uses the following format: $((expression))

42sh$ echo $((1 + 1))
2

Before evaluating the expression, Parameter Expansion, Command substitution and Quote removal are applied.

42sh$ a=41
42sh$ echo $(($a + 1))
42
info

The $ before a variable inside an Arithmetic Expansion can be omitted. However, its presence changes the behavior of the expansion.

When there is no $, the expansion simply evaluates the expression:

42sh$ a=3+2
42sh$ echo $((a * 2)) # Since a = 5, we write the result of 5 * 2
10

When there is a $, we expand the expression before evaluation:

42sh$ a=3+2
42sh$ echo $(($a * 2))
# Expands to 3 + 2 * 2. Because of operator priority, the result is 7
7
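
Because $a is replaced textually before evaluation, operator priority applies to the expanded text. The sketch below shows this, together with one way to keep the value grouped by adding parentheses around the expansion. The variable names are illustrative.

```shell
a=3+2

textual=$(($a * 2))      # evaluates 3+2 * 2
grouped=$((($a) * 2))    # evaluates (3+2) * 2

printf '%s %s\n' "$textual" "$grouped"   # prints 7 10
```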

Operators

All operators are specified in section 1.1.2 of the SCL.

danger

The ** operator is not specified by the SCL. You can mimic the behavior of bash --posix.

Field splitting

Field splitting cuts a WORD into smaller WORDs.

The delimiters used for each cut are given by the IFS variable. By default, it contains the space, tab and newline characters.

42sh$ val="10 5 20"
42sh$ seq $val
# After parameter expansion, the command is composed of 2 WORDs:
# "seq" and "10 5 20"
# After field splitting, the second WORD is split in three. The final command
# is composed of 4 WORDs:
# "seq", "10", "5" and "20".
# After execution, this prints to stdout:
10
15
20
info

Each character in the IFS is considered as a separator. When a full word is written, each character of that word can be used as a separator:

42sh$ IFS=toto # 't' or 'o' can be a separator
42sh$ var="10t5o20"
42sh$ seq $var
10
15
20
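
The same splitting can be observed without seq. In this sketch, set -- stores the resulting fields in the positional parameters so that they can be counted; the variable names are illustrative.

```shell
var="10t5o20"

IFS=to          # 't' and 'o' are both separators
set -- $var     # unquoted on purpose: field splitting applies
unset IFS       # back to the default behavior (space, tab, newline)

field_count=$#
first=$1
second=$2
third=$3
printf '%s fields: %s, %s, %s\n' "$field_count" "$first" "$second" "$third"
```
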
note

A word can be split only if it is not between quotes:

42sh$ var="10 5 20"
42sh$ seq "$var"
# Expands to 'seq "10 5 20"'
seq: invalid floating point argument: ‘10 5 20’
Try 'seq --help' for more information.
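
A quick way to check how many WORDs a command receives is to count its arguments. The helper count_args below is hypothetical and only exists for this demonstration.

```shell
# Hypothetical helper: prints how many arguments it received.
count_args() {
    echo "$#"
}

var="10 5 20"
unquoted=$(count_args $var)     # split into 3 WORDs
quoted=$(count_args "$var")     # kept as 1 WORD

printf 'unquoted: %s, quoted: %s\n' "$unquoted" "$quoted"
```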

Pathname expansion

note

This is a bonus feature.

This feature expands wildcard characters *, ? and [ into a list of files matching the pattern.

Each file matching creates a different WORD in the command.

42sh$ ls
a.out tata.txt test.sh toto.txt
42sh$ ls *.txt # Expands to 'ls' 'tata.txt' 'toto.txt'
tata.txt toto.txt

See section 2.13 of the SCL for more information on wildcard characters.
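
The sketch below reproduces the example in a temporary directory and adds two cases: a quoted pattern, and a pattern that matches nothing (which is left unchanged). It assumes that mktemp -d is available on your system; the variable names are illustrative.

```shell
# Work in a fresh temporary directory (mktemp -d is assumed to be available).
dir=$(mktemp -d)
touch "$dir/a.out" "$dir/tata.txt" "$dir/test.sh" "$dir/toto.txt"

matched=$(cd "$dir" && echo *.txt)      # one WORD per matching file
quoted=$(cd "$dir" && echo '*.txt')     # quoted: no expansion
no_match=$(cd "$dir" && echo *.md)      # no match: the pattern is kept as is

rm -r "$dir"
printf '%s\n' "$matched" "$quoted" "$no_match"
```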

tip

You can check out glob(7) for a little history of globbing, examples and tricky behavior that they allow, as well as tips on how to use them efficiently.

Quote removal

The goal of quote removal is to remove quote characters: single quotes ', double quotes " and escape \.

Quotes are removed only if they are not themselves quoted. In section 2.2 of the SCL, this is mentioned as "preserving the literal value" of the character.

The escape character quotes only the following character.

42sh$ echo \"'toto'
"toto

Single quotes preserve the value of all characters, even other quotes or the escape character:

42sh$ echo '"""\""""'
"""\""""

Double quotes preserve the value of single quotes, but allow escaping:

42sh$ echo "\"''"
"''
info

For Quote Removal to work, the lexer needs to include each quote in the WORD. Refer to section 2.3 of the SCL.
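
One consequence is worth checking in your implementation: quote removal only applies to the quote characters that were present in the original WORD. Quote characters that result from an expansion are kept. A minimal sketch, with illustrative variable names:

```shell
literal='toto'          # the quotes are in the original word: removed
var="'toto'"            # the single quotes are part of the value of var
from_expansion=$var     # quotes coming from an expansion are kept

printf '%s %s\n' "$literal" "$from_expansion"   # prints toto 'toto'
```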

Effect of quotes during expansion

As explained in section 2.2 of the SCL, single quotes ('), double quotes (") and the escape character (\) each have a different effect on the expansion steps.

Escaping

If a character that would start an expansion is escaped, the expansion is not triggered.

This applies to the tilde (~) for Tilde Expansion, the backquote (`) for Command Substitution and the dollar sign ($) for Parameter Expansion, Command Substitution and Arithmetic Expansion.

42sh$ echo \~
~
42sh$ echo \$toto
$toto

Escaped quoting characters are also not removed.

42sh$ echo "\""\\\'
"\'

Single quotes

No form of expansion is performed between single quotes:

42sh$ echo '$toto"$(echo tata)"'
$toto"$(echo tata)"

Double quotes

Between double quotes, the dollar sign, the backquote and the backslash (only when followed by $, `, ", \ or a newline) retain their special meaning.

Therefore, only the Parameter Expansion, Command Substitution and Arithmetic Expansion steps are applied.
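
The difference between the two kinds of quotes can be sketched as follows. Note that the tilde and the wildcard are left untouched between double quotes. The variable names are illustrative.

```shell
toto=tata

# Single quotes: nothing is expanded.
single='$toto $(echo hi) $((1 + 1))'

# Double quotes: parameter, command and arithmetic expansion only.
double="$toto $(echo hi) $((1 + 1)) ~ *"

printf '%s\n' "$single" "$double"
```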

tip

If your testsuite is written in shell, use single quotes instead of double quotes. Otherwise, parameter expansion would be called before calling your 42sh.
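
The sketch below illustrates the problem. It uses sh -c as a stand-in for your 42sh binary, which is an assumption made so that the example runs anywhere; the variable names are illustrative.

```shell
toto=outer   # variable belonging to the test script itself

# Single quotes: the tested shell receives the text $toto and expands it itself.
single_quoted=$(sh -c 'toto=inner; echo "$toto"')

# Double quotes: the test script expands $toto before the tested shell starts.
double_quoted=$(sh -c "toto=inner; echo $toto")

printf '%s %s\n' "$single_quoted" "$double_quoted"   # prints inner outer
```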

Expansion: a detailed example

In this section, we will apply each step of the expansion on the following command:

42sh$ var=sh
42sh$ letter=e
42sh$ $letter'c'$(echo -e "ho $((20 + 22))"sh"\n\n")

Tilde Expansion

Since there is no tilde in the command, we have no Tilde Expansion to apply.

Parameter Expansion

Only the first dollar sign is considered for parameter expansion, as it is not followed by a parenthesis.

Since this parameter is not enclosed in curly brackets, we need to take the longest sequence of characters that forms a valid variable name.

As a quote cannot be part of a variable name, the longest variable name after the dollar sign is letter (see section 3.235 of the SCL).

$letter is then replaced by the value of the corresponding variable. After expansion, here is the resulting command:

42sh$ e'c'$(echo -e "ho $((20 + 22))"sh"\n\n")

Command Substitution

Here is the string that applies for Command Substitution: $(echo -e "ho $((20 + 22))"sh"\n\n")

We first start by sending the command between parentheses to the Parse-Execution loop.

Before execution, another step of expansion is triggered. We skip directly to Arithmetic Expansion since there is nothing to be done for the other steps.

Arithmetic Expansion

Only the $((20 + 22)) section registers for this expansion.

Since there is no need for Parameter Expansion, Command substitution and quote removal, we skip those expansion steps and evaluate the expression.

After evaluation and replacement, here is the resulting command:

42sh$ echo -e "ho 42"sh"\n\n"

Field splitting

Since the IFS was not redefined, the field separator is either space, tab or newline.

For the WORDs echo and -e, no splitting is done since they do not contain any separator.

For the WORD "ho 42"sh"\n\n", the space is inside double quotes, which prevents any Field Splitting from being done.

Quote Removal

For this step, we remove all the non-quoted quotes. In this case, that means every quote in the command.

Here is the resulting command:

42sh$ echo -e ho 42sh\n\n
danger

Be careful! Since no field splitting was done, 'ho 42sh\n\n' is made of one single WORD.

Execution

After execution of the final command, we capture the following output:

ho 42sh


After removing the trailing newlines, we obtain the string ho 42sh.

Here is the command at the end of Command Substitution:

42sh$ e'c'ho 42sh
danger

Since no Field Splitting has been done yet, the string e'c'ho 42sh is only made of ONE single WORD.

Field Splitting

Since the IFS was not redefined, the field separator is either space, tab or newline.

The WORD contains only 1 space between e'c'ho and 42sh.

Since it is not quoted, that WORD is split into 2 separate WORDs: e'c'ho and 42sh.

Pathname Expansion

No character in the command registers for Pathname Expansion.

Quote Removal

The only word containing quotes is e'c'ho. Since none of those quotes are quoted, we simply remove them.

The final command is:

42sh$ echo 42sh
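
The whole walkthrough can be checked in a POSIX shell. The sketch below replaces echo -e with printf, because the handling of -e by echo differs between shells. This substitution is an assumption made for portability and is not part of the example above.

```shell
letter=e

# Same structure as the command above, with printf standing in for 'echo -e'.
output=$($letter'c'$(printf "ho $((20 + 22))"sh"\n\n"))

printf '%s\n' "$output"   # prints 42sh
```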