Epoll

note

For HTTPd, Epoll is a bonus and is not mandatory. Only attempt its implementation once you are confident in your server's functionality.

danger

This guide is an introduction to epoll. It does not replace the epoll(7) man page or the man pages of the associated functions.

What is Epoll?

epoll(7) is a Linux kernel API for I/O (Input/Output) event notification: it monitors multiple file descriptors to see whether I/O is possible on any of them. It is a common building block for asynchronous programs, particularly servers that need to handle thousands of connections at the same time.

Instead of busy waiting (constantly checking if data is ready, which wastes CPU), epoll allows the kernel to monitor many data sources simultaneously. It then notifies the program only when data is available on one or more sources. This lets the program effectively switch between tasks (like handling other connections) while waiting, making it significantly more efficient and scalable.

Epoll C API

The Epoll C API is accessible with the following header.

#include <sys/epoll.h>

tip

The Epoll API uses bit masking extensively. You should be familiar with this concept before continuing this guide.
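As a quick refresher, the three bitmask operations used throughout this guide can be sketched as follows. The helper names (mask_read_write, mask_has, mask_clear) are hypothetical and exist only for illustration; only the EPOLL* constants come from the epoll API.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical helpers, named for illustration only. */

/* Build a mask: combine flags with bitwise OR. */
static uint32_t mask_read_write(void)
{
    return EPOLLIN | EPOLLOUT;
}

/* Test a flag: AND the mask with the flag and compare with zero. */
static bool mask_has(uint32_t mask, uint32_t flag)
{
    return (mask & flag) != 0;
}

/* Clear a flag: AND the mask with the complement of the flag. */
static uint32_t mask_clear(uint32_t mask, uint32_t flag)
{
    return mask & ~flag;
}
```

Note that a flag is tested with & and not with ==, because several flags can be set in the same mask.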

The Epoll structures

typedef union epoll_data
{
    void *ptr;
    int fd;
    uint32_t u32;
    uint64_t u64;
} epoll_data_t;

struct epoll_event
{
    uint32_t events; /* Epoll events (EPOLLIN / EPOLLOUT / EPOLLET / ...) */
    epoll_data_t data; /* User data variable */
};

The Epoll functions

note

All Epoll functions return -1 on error and set errno to indicate the error. Always check the return value of epoll functions.

int epoll_create1(int flags);

epoll_create1(2) creates a new epoll instance. EPOLL_CLOEXEC is the only available flag. Check the man page for more information.
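A minimal sketch of creating an instance with its return value checked could look like this. The helper name create_epoll_instance is hypothetical, and printing with perror is just one possible way of handling the error.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns a new epoll fd, or -1 after printing the
 * error. The caller releases the instance with close(2). */
static int create_epoll_instance(void)
{
    /* EPOLL_CLOEXEC sets the close-on-exec flag on the new fd. */
    int epollfd = epoll_create1(EPOLL_CLOEXEC);
    if (epollfd == -1)
        perror("epoll_create1");
    return epollfd;
}
```

Passing 0 instead of EPOLL_CLOEXEC, as the later examples on this page do, creates the instance without the close-on-exec flag.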

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *_Nullable event);
  • epfd: the epoll instance
  • op: the operation to perform on fd
  • fd: the file descriptor to add, update or remove
  • event: the events to monitor and the user data to associate with fd

The operations available are the following ones:

  • EPOLL_CTL_ADD: add fd to the interest list
  • EPOLL_CTL_MOD: update the settings associated with fd
  • EPOLL_CTL_DEL: remove fd from the interest list
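The three operations can be sketched on the read end of a pipe, which avoids any socket setup. The helper name demo_epoll_ctl is hypothetical, and the choice of switching to edge-triggered mode in the MOD step is arbitrary.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: runs ADD, MOD and DEL on the read end of a pipe.
 * Returns 0 if every call succeeded, -1 otherwise. */
static int demo_epoll_ctl(void)
{
    int pipefd[2];
    int epollfd;
    int result = -1;
    struct epoll_event event = { 0 };

    if (pipe(pipefd) == -1)
        return -1;
    epollfd = epoll_create1(0);
    if (epollfd == -1)
        goto close_pipe;

    /* ADD: start monitoring the read end for readability. */
    event.events = EPOLLIN;
    event.data.fd = pipefd[0];
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, pipefd[0], &event) == -1)
        goto close_all;

    /* MOD: replace the settings, here by switching to edge-triggered mode. */
    event.events = EPOLLIN | EPOLLET;
    if (epoll_ctl(epollfd, EPOLL_CTL_MOD, pipefd[0], &event) == -1)
        goto close_all;

    /* DEL: stop monitoring. The event argument is ignored and may be NULL. */
    if (epoll_ctl(epollfd, EPOLL_CTL_DEL, pipefd[0], NULL) == -1)
        goto close_all;

    result = 0;

close_all:
    close(epollfd);
close_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

EPOLL_CTL_MOD replaces the whole event, so every flag that should stay active has to be listed again.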

int epoll_wait(int epfd, struct epoll_event events[.maxevents], int maxevents, int timeout);

epoll_wait(2) fills the events parameter with epoll_event entries from the ready list and returns the number of entries filled, or 0 if the timeout expired before any file descriptor became ready.

  • epfd: the epoll instance
  • events: buffer that will be filled with events from the ready list
  • maxevents: the maximum number of events returned, must be greater than zero
  • timeout: time in milliseconds epoll_wait(2) will block; -1 blocks indefinitely and 0 returns immediately
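The effect of the timeout parameter can be sketched with a pipe. The helper name wait_on_pipe and its parameters are hypothetical; it registers the read end of a fresh pipe with EPOLLIN, optionally writes one byte, and returns whatever epoll_wait returned.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: waits up to timeout_ms on the read end of a fresh
 * pipe. If write_first is nonzero, one byte is written before waiting.
 * Returns the value of epoll_wait, or -1 if the setup failed. */
static int wait_on_pipe(int write_first, int timeout_ms)
{
    int pipefd[2];
    struct epoll_event event = { 0 };
    struct epoll_event events[4];
    int ready = -1;

    if (pipe(pipefd) == -1)
        return -1;
    int epollfd = epoll_create1(0);
    if (epollfd != -1)
    {
        event.events = EPOLLIN;
        event.data.fd = pipefd[0];
        if (epoll_ctl(epollfd, EPOLL_CTL_ADD, pipefd[0], &event) != -1
            && (!write_first || write(pipefd[1], "x", 1) == 1))
        {
            /* At most 4 events are returned, the size of the array. */
            ready = epoll_wait(epollfd, events, 4, timeout_ms);
        }
        close(epollfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return ready;
}
```

With an empty pipe, wait_on_pipe(0, 0) returns 0 immediately and wait_on_pipe(0, 50) returns 0 after roughly 50 milliseconds. With a byte already written, wait_on_pipe(1, -1) returns 1 without blocking.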

Simple usage example

danger

For readability purposes, in every example on this page, the return value of every syscall (open, epoll_..., socket, etc.) will not be checked. Do not forget to do it!

if (function(...) == -1)
{
    // Handle error
}

Initializing

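// Return values are not checked for readability purposes.
// The paths are assumed to name FIFOs or devices: epoll_ctl fails with
// EPERM on regular files and directories.
int fd4 = open("file4", O_RDWR, 0);
int fd5 = open("file5", O_RDWR, 0);
int epollfd = epoll_create1(0);
int fd7 = open("file7", O_RDWR, 0);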

As you should know, three file descriptors (stdin, stdout and stderr) are already open when the program starts. Although the result of epoll_create1 is also a file descriptor, it is better to think of it as a container for other fds.

Using the Epoll instance

struct epoll_event event; // Event struct used for epoll_ctl
struct epoll_event events[MAX_EVENTS]; // Event array filled by epoll_wait

// Add FD7 to the interest list
event.data.fd = fd7;
event.events = EPOLLOUT | EPOLLET; // Bitmask flags
epoll_ctl(epollfd, EPOLL_CTL_ADD, fd7, &event);

// Add FD5 to the interest list
event.data.fd = fd5;
epoll_ctl(epollfd, EPOLL_CTL_ADD, fd5, &event);

// Add STDIN to the interest list
event.data.fd = STDIN_FILENO;
event.events = EPOLLIN;
epoll_ctl(epollfd, EPOLL_CTL_ADD, STDIN_FILENO, &event);

info

As you can see, the epoll_event passed by address to epoll_ctl can be reused, because epoll copies it internally during the call.

/*
Blocks until FD5 or FD7 becomes writable or until the STDIN fd is ready
to be read. It fills the events array with the ready events.
*/
int nb_fds_ready = epoll_wait(epollfd, events, MAX_EVENTS, -1);

for (int i = 0; i < nb_fds_ready; i++) // Iterate over the ready list
{
    struct epoll_event ready_event = events[i];
    do_use_fd(ready_event.data.fd);
}

For example, if data is written to STDIN, the file descriptor goes into the ready list. The next call to epoll_wait returns it along with the events that were triggered (mostly EPOLLIN or EPOLLOUT). More information can be found in the epoll_ctl(2) man page.

Edge triggered and Level Triggered

Epoll has two modes:

  • Level Triggered
  • Edge Triggered

Edge Triggered mode can be enabled per file descriptor using the EPOLLET flag.

While the older poll(2) call only supports level-triggered mode, epoll adds a second mode called edge-triggered.

Level-triggered mode is simple: epoll_wait reports a monitored file descriptor for as long as it is ready.

For EPOLLIN, this means as long as data is available to be read; for EPOLLOUT, as long as data can be sent, for example with send(2).

Edge-triggered mode is different: epoll_wait reports a monitored file descriptor only when its state changes, so a file descriptor that stays ready is not reported again.

For EPOLLIN, this means when new data arrives; for EPOLLOUT, when the file descriptor goes from a non-writable to a writable state.
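The difference between the two modes can be observed with a small experiment, sketched here under the assumption that a pipe stands in for a client socket. The helper name count_notifications is hypothetical: it writes one byte, deliberately never reads it, and counts how many of two consecutive epoll_wait calls report the file descriptor.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: writes one byte into a pipe, never reads it, and
 * calls epoll_wait twice with a zero timeout. Returns how many of the two
 * calls reported the read end as ready, or -1 if the setup failed.
 * Pass 0 for level-triggered mode or EPOLLET for edge-triggered mode. */
static int count_notifications(uint32_t extra_flags)
{
    int pipefd[2];
    struct epoll_event event = { 0 };
    struct epoll_event ready_event;
    int count = -1;

    if (pipe(pipefd) == -1)
        return -1;
    int epollfd = epoll_create1(0);
    if (epollfd != -1)
    {
        event.events = EPOLLIN | extra_flags;
        event.data.fd = pipefd[0];
        if (epoll_ctl(epollfd, EPOLL_CTL_ADD, pipefd[0], &event) != -1
            && write(pipefd[1], "x", 1) == 1)
        {
            count = 0;
            for (int i = 0; i < 2; i++)
            {
                /* The byte is left unread on purpose. */
                if (epoll_wait(epollfd, &ready_event, 1, 0) == 1)
                    count++;
            }
        }
        close(epollfd);
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return count;
}
```

In level-triggered mode both calls report the file descriptor, because the unread byte keeps it ready. With EPOLLET only the first call reports it, because no new data arrived between the two calls.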

danger

When you register EPOLLIN | EPOLLOUT | EPOLLET, be careful to check which event was triggered in your event loop. The events field is a bitmask, so test each flag with & rather than comparing with ==.

for (int i = 0; i < nb_fds_ready; i++)
{
    struct epoll_event ready_event = events[i];
    // events is a bitmask: several flags can be set at the same time
    if (ready_event.events & EPOLLIN)
        do_use_fd_read(ready_event.data.fd);
    else if (ready_event.events & EPOLLOUT)
        do_use_fd_write(ready_event.data.fd);
    // Other possible events (EPOLLERR, EPOLLHUP, ...) if needed
}
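Because edge-triggered mode only reports state changes, a handler usually has to consume everything that is available before returning to epoll_wait. The following is a minimal sketch of such a read loop, assuming the file descriptor is nonblocking. The helper names set_nonblocking and drain_fd are hypothetical, and the data is simply discarded where a real server would process it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: sets O_NONBLOCK on fd. Returns -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: reads from a nonblocking fd until it would block.
 * Returns the number of bytes read, or -1 on a real error.
 * Sets *eof to 1 if the other side closed, 0 otherwise. */
static ssize_t drain_fd(int fd, int *eof)
{
    char buffer[4096];
    ssize_t total = 0;

    *eof = 0;
    for (;;)
    {
        ssize_t n = read(fd, buffer, sizeof(buffer));
        if (n > 0)
        {
            total += n; /* A real server would process the data here. */
            continue;
        }
        if (n == 0)
        {
            *eof = 1; /* End of file: the other side closed. */
            return total;
        }
        if (errno == EINTR)
            continue; /* Interrupted by a signal: retry. */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total; /* Nothing left for now: go back to epoll_wait. */
        return -1;
    }
}
```

The same pattern applies to writing: keep sending until send(2) fails with EAGAIN, then wait for the next EPOLLOUT event.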

Epoll for sockets

note

The following example is heavily inspired by the example from the epoll(7) man page.

struct epoll_event event; // Event struct used for epoll_ctl
struct epoll_event events[MAX_EVENTS]; // Event array filled by epoll_wait

int server_socket;

/* Code to set up listening socket, 'server_socket',
(socket(), bind(), listen()) omitted. */

int epollfd = epoll_create1(0);

event.events = EPOLLIN;
event.data.fd = server_socket;

epoll_ctl(epollfd, EPOLL_CTL_ADD, server_socket, &event);

while (true)
{
    int nb_fds_ready = epoll_wait(epollfd, events, MAX_EVENTS, -1);

    for (int i = 0; i < nb_fds_ready; i++) // Iterate over the ready list
    {
        struct epoll_event ready_event = events[i];
        if (ready_event.data.fd == server_socket)
        {
            // A client is trying to connect to our socket

            /* Accept the client
             * Note: `accept4` allows us to directly set the client socket to
             * be nonblocking. */
            int client_socket =
                accept4(server_socket, NULL, NULL, SOCK_NONBLOCK);

            // Choose the event type to handle
            event.events = EPOLLIN | EPOLLOUT | EPOLLET;
            event.data.fd = client_socket;

            // Add the client to the interest list
            epoll_ctl(epollfd, EPOLL_CTL_ADD, client_socket, &event);
        }
        else
        {
            /* An already connected client is ready for transfer.
             * What you do here will depend on what you want and on which
             * events you registered for the client.
             *
             * For example, if you registered EPOLLIN | EPOLLOUT | EPOLLET,
             * you need to check which event was triggered before attempting
             * any reading or writing on the client socket.
             *
             * If you used EPOLLET, you need to read/write until EAGAIN.
             *
             * If you used only EPOLLIN and you now need to send data to a
             * client, you can use epoll_ctl with EPOLL_CTL_MOD to change the
             * client_socket mode.
             *
             * Do not forget to remove the file descriptor from the interest
             * list using epoll_ctl when you have finished dealing with the
             * client.
             */
        }
    }
}

tip

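In these examples, we always used the fd member of the epoll_data(3) union. If you need more information to be linked to your client (e.g., previously sent data), do not hesitate to use the void * pointer instead. Since the members of a union share the same storage, fd and ptr cannot be used at the same time: store the file descriptor inside the structure you point to.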
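Attaching a context through the pointer member could be sketched as follows. The struct client type, its fields, and the register_client helper are hypothetical names chosen for illustration; only epoll_ctl and the epoll_event layout come from the epoll API.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-client state; the field names are illustrative. */
struct client
{
    int fd; /* Kept here because data.fd and data.ptr overlap. */
    size_t bytes_received;
};

/* Hypothetical helper: registers fd with a heap-allocated context.
 * Returns the context, or NULL on error. The caller frees it after
 * removing fd from the interest list. */
static struct client *register_client(int epollfd, int fd)
{
    struct client *client = calloc(1, sizeof(*client));
    if (client == NULL)
        return NULL;
    client->fd = fd;

    struct epoll_event event = { 0 };
    event.events = EPOLLIN;
    event.data.ptr = client; /* epoll_wait hands this pointer back as is. */
    if (epoll_ctl(epollfd, EPOLL_CTL_ADD, fd, &event) == -1)
    {
        free(client);
        return NULL;
    }
    return client;
}
```

In the event loop, the context is then recovered with a cast such as `struct client *c = events[i].data.ptr;`, and the file descriptor is read from `c->fd`.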

danger

You should use either recv(2) or send(2) per epoll event, not both.

Single client connection animation

Below is an animation of the server socket setup and a client connection to our socket.

Multi-client connection animation

Below is an animation of 2 events that are happening at the same time:

  • A new client is trying to connect
  • The first client is sending data