Skip to main content

Shell Architecture

In this article, we will go through multiple notions of the suggested shell architecture. We will particularly delve deeper into the functioning of each part of the architecture and use concrete examples.

Shell Anatomy

42sh Shell Architecture

Parse execute loop

A shell implementation generally starts with this node from where we call the parser module to get an AST and evaluate it (via the execution module).

This node also represents the loop in the sense that while there is still an AST built by the parser from the input (via the lexer and IO Backend evidently) then there is still shell instructions to evaluate.

Example

Take for instance the following grammar:

input =
list '\n'
| list EOF
| '\n'
| EOF
;

list = command { ';' command } [ ';' ] ;
command = WORD { WORD } ;

We want to evaluate the following shell program:

echo 42sh best project; echo yes
cat file.txt

We call the parser module once. The parser module builds the following AST according to the grammar corresponding to the first line of the shell program:

echo 42sh best project; echo yes
cat file.txt
note

This AST is only an example of a way to represent a piece of shell program. There are of course multiple representations possible. Here, the input node is not really necessary.

We can then evaluate this AST, meaning executing the 2 echo commands.

However, we did not finish to evaluate the entire shell program. Indeed, we still have to evaluate the second line of the shell program. This is where the notion of parsing executing loop is important. We want to make a second call to the parser module and then evaluate the returned AST, repeating this process until the end of input.

A second call to the parser in this example would give the following:

echo 42sh best project; echo yes
cat file.txt
caution

As we are evaluating the AST, we are also updating the context (environment variables, last return code...)

Parser - Lexer - IO Backend

Parser, lexer and IO Backend allows to build an AST from an input.

  • The parsing module calls the lexer to get tokens that can be processed according to grammar rules.
  • The lexing module calls the IO Backend to get characters to produce tokens used in the grammar.
  • The IO Backend is an abstraction layer on the different types of input available. Its goal is to get characters from the input whatever the type of input (string, stdin…).
info

Checkout the article on parsing a language in C

Execution

In the case of an AST representing arithmetical operations such as 42 + 42, evaluation of such an AST would be calculating the result. In the same manner, evaluation of an AST representing a shell program corresponds to the execution of it.

The execution module travels your AST by evaluating each node, executing any commands, updating the environment...

Example 1

echo yes

In the case of a simple command, the execution module takes in the corresponding AST, travels any node leading to the command node, executes the command using exec(3), waits for its status code and saves it.

Example 2

if true
then echo "yes"
else echo "no"
fi

In this case, the module should execute only one of the 2 echo commands depending on the if clause. This means that the execution module travels to either the then clause or the else clause.

Updating the execution environment

While evaluation of simple shell commands such as echo Hello World! is fairly simple, evaluation of some other types of nodes have an effect on the following evaluation of other nodes.

TEST=Test.
echo $TEST # Test.

In the above example, line 1 defines the variable TEST and following instructions will be impacted.

Therefore, the execution module must take into account updating the execution environment.

Architecture

In a major project such as 42sh which has several complex modules, adopting a good code architecture is important especially when collaborating in a team.

There are always several ways to structure a project. In this part, we will present a common structure for the 42sh project.

A common and efficient structure would be the following:

.
└── src/
├── ast/
├── execution/
├── io/
├── lexer/
├── parser/
├── utils/
└── 42sh.c
  • ast/, execution/, io/, lexer/, parser/ correspond evidently to each of the module of the shell anatomy.
  • utils/ would be used to store any utility or miscellaneous functions
  • 42sh.c : starting point
caution

Above architecture does not mention any build system related files

42sh.c

The following code sample presents some key actions to do:

  1. Parsing the arguments: check the input type, check any additionnal options (such as a pretty-print option)
  2. Setup the io backend if needed
  3. Parsing execution loop
  4. Return gracefully with the last exit code

parser/

This is where you would write all your functions related to the parsing module.

As you would want to define different functions for different grammar rules, it would be wise to split any implementation into multiple files such as:

.
└── parser/
├── parser.h
├── parser.c
├── parse_and_or.c
├── parse_cmd.c
├── parse_pipe.c
├── parse_redir.c
└── ...

lexer/

This is where you would implement the key components of the lexer which are popping a token and peeking a token.

info

Checkout the article on parsing a language in C to have a more detailed explanation

execution/

This is where you would write all your functions related to the execution module.

At this point, using the execution module implies that you have an AST and you want to evaluate it.

As you would want to define different functions for each type of node of your AST, it would be wise to split any implementation into multiple files such as:

.
└── execution/
├── exec.h
├── exec.c
├── exec_and.c
├── exec_or.c
├── exec_and.C
├── exec_pipe.c
├── exec_redir.c
└── ...
caution

This architecture is presented as an example and is not complete. Some features are not mentionned such as the expansion part.