Shell Architecture
In this article, we will go through multiple notions of the suggested shell architecture. We will particularly delve deeper into the functioning of each part of the architecture and use concrete examples.
Shell Anatomy
Parse-Execute Loop
A shell implementation generally starts with this node from where we call the parser module to get an AST and evaluate it (via the execution module).
This node also represents the loop in the sense that while there is still an AST built by the parser from the input (via the lexer and IO Backend) then there still are shell instructions to evaluate.
Example
Take for instance the following grammar:
input =
list '\n'
| list EOF
| '\n'
| EOF
;
list = command { ';' command } [ ';' ] ;
command = WORD { WORD } ;
We want to evaluate the following shell program:
echo 42sh best project; echo yes
cat file.txt
We call the parser module once. The parser module builds the following AST according to the grammar corresponding to the first line of the shell program:
echo 42sh best project; echo yes
cat file.txt
This AST is only an example of a way to represent a piece of shell program. There are of course multiple possible representations. Here, the input node is not really necessary.
We can then evaluate this AST, meaning executing both echo commands.
However, we did not finish evaluating the entire shell program. Indeed, we still have to evaluate the second line. This is where the notion of the parsing-executing loop is important. We want to make a second call to the parser module and then evaluate the returned AST, repeating this process until the end of input.
A second call to the parser in this example would give the following:
echo 42sh best project; echo yes
cat file.txt
As we are evaluating the AST, we are also updating the context (environment variables, last return code...).
Parser - Lexer - IO Backend
Parser, lexer and IO Backend allow to build an AST from an input.
- The parsing module calls the lexer to get tokens that can be processed according to grammar rules.
- The lexing module calls the IO Backend to get characters to produce tokens used in the grammar.
- The IO Backend is an abstraction layer on the different types of input available. Its goal is to get characters from the input, whatever its type might be (string, stdin…).
Execution
In the case of an AST representing arithmetical operations such as 42 + 42
,
evaluation would be calculating the result. In the same manner,
evaluation of an AST representing a shell program corresponds to its execution.
The execution module travels your AST by evaluating each node, executing any commands, updating the environment...
Example 1
echo yes
In the case of a simple command, the execution module takes in the corresponding AST, travels any node leading to the command node, executes the command using exec(3), waits for its status code and saves it.
Example 2
if true
then echo "yes"
else echo "no"
fi
In this case, the module should execute only one of the two echo commands depending on the if clause. This means that the execution module travels to either the then clause or the else clause.
Updating the Execution Environment
While evaluating simple shell commands such as echo Hello World!
is fairly
simple, evaluation of some other types of nodes have an effect on the
evaluation of other nodes following it.
TEST=Test.
echo $TEST # Test.
In the above example, line 1 defines the variable TEST and following instructions will be impacted.
Therefore, the execution module must take into account updating the execution environment.
Architecture
In a major project such as 42sh which has several complex modules, adopting a good code architecture is important especially when collaborating in a team.
There are always several ways to structure a project. In this part, we will present a common structure for the 42sh project.
A common and efficient structure would be the following:
.
└── src/
├── ast/
├── execution/
├── io/
├── lexer/
├── parser/
├── utils/
└── 42sh.c
ast/
,execution/
,io/
,lexer/
,parser/
correspond to each module of the shell anatomy.utils/
would be used to store any utility or miscellaneous functions.42sh.c
: starting point.
The above architecture does not mention any build system related files.
42sh.c
The following code sample presents some key actions to do:
- Parsing the arguments: check the input type, check any additionnal options (such as a pretty-print option).
- Setup the IO Backend if needed.
- Parsing-Execution loop.
- Return gracefully with the last exit code.
parser/
This is where you would write all your functions related to the parsing module.
As you would want to define different functions for different grammar rules, it would be wise to split the implementation into multiple files such as:
.
└── parser/
├── parser.h
├── parser.c
├── parse_and_or.c
├── parse_cmd.c
├── parse_pipe.c
├── parse_redir.c
└── ...
lexer/
This is where you would implement the key components of the lexer which are popping a token and peeking a token.
Check out the article on parsing a language in C to have a more detailed explanation.
execution/
This is where you would write all your functions related to the execution module.
At this point, using the execution module implies that you have an AST and you want to evaluate it.
As you would want to define different functions for each type of node of your AST, it would be wise to split the implementation into multiple files such as:
.
└── execution/
├── exec.h
├── exec.c
├── exec_and.c
├── exec_or.c
├── exec_and.C
├── exec_pipe.c
├── exec_redir.c
└── ...
This architecture is presented as an example and is not complete. Some features are not mentionned such as the expansion part.