Shell Architecture
In this article, we will go through multiple notions of the suggested shell architecture. We will particularly delve deeper into the functioning of each part of the architecture and use concrete examples.
Shell Anatomy
Parse execute loop
A shell implementation generally starts with this node from where we call the parser module to get an AST and evaluate it (via the execution module).
This node also represents the loop in the sense that while there is still an AST built by the parser from the input (via the lexer and IO Backend evidently) then there is still shell instructions to evaluate.
Example
Take for instance the following grammar:
input =
list '\n'
| list EOF
| '\n'
| EOF
;
list = command { ';' command } [ ';' ] ;
command = WORD { WORD } ;
We want to evaluate the following shell program:
echo 42sh best project; echo yes
cat file.txt
We call the parser module once. The parser module builds the following AST according to the grammar corresponding to the first line of the shell program:
echo 42sh best project; echo yes
cat file.txt
This AST is only an example of a way to represent a piece of shell program. There are of course multiple representations possible. Here, the input node is not really necessary.
We can then evaluate this AST, meaning executing the 2 echo commands.
However, we did not finish to evaluate the entire shell program. Indeed, we still have to evaluate the second line of the shell program. This is where the notion of parsing executing loop is important. We want to make a second call to the parser module and then evaluate the returned AST, repeating this process until the end of input.
A second call to the parser in this example would give the following:
echo 42sh best project; echo yes
cat file.txt
As we are evaluating the AST, we are also updating the context (environment variables, last return code...)
Parser - Lexer - IO Backend
Parser, lexer and IO Backend allows to build an AST from an input.
- The parsing module calls the lexer to get tokens that can be processed according to grammar rules.
- The lexing module calls the IO Backend to get characters to produce tokens used in the grammar.
- The IO Backend is an abstraction layer on the different types of input available. Its goal is to get characters from the input whatever the type of input (string, stdin…).
Checkout the article on parsing a language in C
Execution
In the case of an AST representing arithmetical operations such as 42 + 42
,
evaluation of such an AST would be calculating the result. In the same manner,
evaluation of an AST representing a shell program corresponds to the execution
of it.
The execution module travels your AST by evaluating each node, executing any commands, updating the environment...
Example 1
echo yes
In the case of a simple command, the execution module takes in the corresponding AST, travels any node leading to the command node, executes the command using exec(3), waits for its status code and saves it.
Example 2
if true
then echo "yes"
else echo "no"
fi
In this case, the module should execute only one of the 2 echo commands depending on the if clause. This means that the execution module travels to either the then clause or the else clause.
Updating the execution environment
While evaluation of simple shell commands such as echo Hello World!
is fairly
simple, evaluation of some other types of nodes have an effect on the following
evaluation of other nodes.
TEST=Test.
echo $TEST # Test.
In the above example, line 1 defines the variable TEST and following instructions will be impacted.
Therefore, the execution module must take into account updating the execution environment.
Architecture
In a major project such as 42sh which has several complex modules, adopting a good code architecture is important especially when collaborating in a team.
There are always several ways to structure a project. In this part, we will present a common structure for the 42sh project.
A common and efficient structure would be the following:
.
└── src/
├── ast/
├── execution/
├── io/
├── lexer/
├── parser/
├── utils/
└── 42sh.c
ast/
,execution/
,io/
,lexer/
,parser/
correspond evidently to each of the module of the shell anatomy.utils/
would be used to store any utility or miscellaneous functions42sh.c
: starting point
Above architecture does not mention any build system related files
42sh.c
The following code sample presents some key actions to do:
- Parsing the arguments: check the input type, check any additionnal options (such as a pretty-print option)
- Setup the io backend if needed
- Parsing execution loop
- Return gracefully with the last exit code
parser/
This is where you would write all your functions related to the parsing module.
As you would want to define different functions for different grammar rules, it would be wise to split any implementation into multiple files such as:
.
└── parser/
├── parser.h
├── parser.c
├── parse_and_or.c
├── parse_cmd.c
├── parse_pipe.c
├── parse_redir.c
└── ...
lexer/
This is where you would implement the key components of the lexer which are popping a token and peeking a token.
Checkout the article on parsing a language in C to have a more detailed explanation
execution/
This is where you would write all your functions related to the execution module.
At this point, using the execution module implies that you have an AST and you want to evaluate it.
As you would want to define different functions for each type of node of your AST, it would be wise to split any implementation into multiple files such as:
.
└── execution/
├── exec.h
├── exec.c
├── exec_and.c
├── exec_or.c
├── exec_and.C
├── exec_pipe.c
├── exec_redir.c
└── ...
This architecture is presented as an example and is not complete. Some features are not mentionned such as the expansion part.