Asynchronous Non-blocking I/O under the hood: poll, select, epoll/kqueue

Why asynchronous non-blocking I/O?

For details, see The C10K problem.

The blocking I/O model with one thread per request has two main problems:

Memory overhead: each thread requires a significant amount of memory:
- Stack memory: on the order of 1 MB or more is reserved per thread (the glibc default on Linux follows `ulimit -s`, commonly 8 MB; pages are only committed once they are touched)
- Kernel memory
CPU overhead from context switching:
- Each switch between threads costs roughly 1-100 microseconds
- Register state has to be saved and restored
- CPU cache and TLB entries get invalidated
- Kernel scheduler overhead

With 10,000 concurrent connections at roughly 1 MB of stack per thread, the overhead can reach about 10 GB of memory, plus thousands of context switches per second.
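
The 10 GB estimate is plain arithmetic. A minimal sketch, assuming one thread per connection and 1 MiB of stack per thread (the function name is made up for illustration):

```python
def stack_reservation_gib(threads: int, stack_bytes: int = 1024 * 1024) -> float:
    """Memory reserved for thread stacks alone, in GiB."""
    return threads * stack_bytes / 1024**3


# 10,000 threads x 1 MiB each, ignoring kernel memory and heap
print(f"{stack_reservation_gib(10_000):.2f} GiB")
```

With an 8 MiB stack per thread the same call reserves eight times as much address space, although only touched pages consume physical RAM.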

File Descriptor (FD): “In Linux, everything is a file”

Before going further, a quick introduction to FDs in Linux:

A file descriptor is an integer that refers to a resource managed by the Linux kernel.

When a process creates a new file or opens a socket connection:

The kernel creates an entry in the global (system-wide) file table
It creates an entry in the process's FD table that points to that global file table entry
It returns the FD number to the process
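
To make this concrete, here is a small Python sketch that asks the kernel for a few descriptors and prints the integers it hands back. The variable names are illustrative, and the exact numbers depend on what the process already has open.

```python
import os
import socket

# A pipe gives us two new descriptors: a read end and a write end.
r, w = os.pipe()
# A socket object wraps a descriptor too; fileno() exposes the integer.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
fds = [r, w, sock.fileno()]
print(fds)  # small integers, usually the lowest unused numbers

# The integer is all the process needs to perform I/O on the resource.
os.write(w, b"hello")
received = os.read(r, 5)

sock.close()
os.close(r)
os.close(w)
```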

Parts of this article are based on The Linux Programming Interface, Chapter 63: Alternative I/O Models.

How Did Linux Blocking I/O System Calls Evolve?

In the early days, I/O on Linux could only be done through blocking calls, like these:

// Read from an FD
ssize_t read(int fd, void *buf, size_t count);

// Write to an FD
ssize_t write(int fd, const void *buf, size_t count);

What is a blocking call? When a process (or thread) issues a read or write request (I/O, pipe, disk…), it is put to sleep after the call until the I/O operation completes. To do anything else in the meantime (for example, handle another request), you have to fork another process or spawn another thread, which leads to the C10K problem.

Servers need to watch a lot of file descriptors

On a server, every time you accept a client connection with the accept system call, a new FD is created to represent that connection to the process.

Most web servers have to handle thousands of client connections at once. The server needs to know when a client sends new data on any of these connections so that it can process the data and respond. That could be done with a for loop like the one below:

# The server sends 1000 queries to a database and waits for the responses to come back
for x in database_query_connections:
    if has_new_response_data(x):
        process_response(x)

The problem with this approach is that it can waste a lot of CPU time. Instead of burning CPU time on asking “is there anything new?”, we can simply tell the Linux kernel: “hey, I'm watching these 1,000 FDs, let me know when any of them has new data!”

Why need select & poll?

Before select/poll existed, a server had two options for handling many connections at once:

Blocking I/O with many threads:
- One thread per connection
- Resource-hungry (memory, CPU context switching)
- Does not scale well to a large number of connections

Non-blocking I/O with busy waiting:

while (1) {
    for (int i = 0; i < n_conns; i++) {
        // Try read, immediately return if no data
        int n = read(conns[i], buf, sizeof(buf));
        if (n > 0) {
            // Process data
        }
    }
}

Wastes CPU cycles
Inefficient when the number of connections is large
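
For readers who want to see what "return immediately if there is no data" looks like in practice, here is a rough Python sketch using a pipe. Python reports EAGAIN/EWOULDBLOCK as BlockingIOError; the variable names are illustrative.

```python
import os

r, w = os.pipe()
os.set_blocking(r, False)  # same effect as setting O_NONBLOCK with fcntl

try:
    os.read(r, 1024)        # nothing has been written yet
    outcome = "data"
except BlockingIOError:     # errno EAGAIN / EWOULDBLOCK
    outcome = "would block"

os.write(w, b"ping")
data = os.read(r, 1024)     # returns right away with the pending bytes

os.close(r)
os.close(w)
```

A busy-waiting server repeats the failing read over and over on every connection, which is where the wasted CPU cycles come from.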

select and poll were created to solve these problems:

They monitor many FDs at once
They block until data is ready
They spend no CPU cycles on busy waiting

Start with `poll` & `select`

These two syscalls are available on virtually every Unix system, whereas epoll and kqueue are OS-specific (Linux uses epoll; BSD-based systems such as macOS use kqueue; the two are fairly similar).

poll and select explained

select and poll are the two basic system calls for monitoring multiple file descriptors. Both let a process check whether an I/O operation can be performed without blocking on any FD in a set.

Select

select is the older syscall, introduced in 4.2BSD:

Uses bitmasks (fd_set) to track FDs
Limits the number of FDs that can be monitored (usually 1024)
The fd_sets must be set up again for every call, because the kernel overwrites them with the result
Can monitor three kinds of events: read, write, and exceptions
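
These properties are easy to observe from Python, whose select.select() is a thin wrapper over the syscall. A minimal sketch with two socket pairs standing in for client connections (all names are illustrative):

```python
import select
import socket

a, b = socket.socketpair()
c, d = socket.socketpair()
b.sendall(b"x")  # only the a/b pair has pending data

watched = [a, c]
# The three lists map to select()'s read, write and exception sets.
readable, writable, exceptional = select.select(watched, [], [], 1.0)
ready_first = readable == [a]

a.recv(1)
# The interest list has to be passed in again on every call.
readable, _, _ = select.select(watched, [], [], 0)
ready_second = list(readable)

for s in (a, b, c, d):
    s.close()
```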

Poll

poll was introduced later to address some of select's limitations:

Uses an array of struct pollfd instead of bitmasks
No hard limit on the number of FDs
The array does not need to be set up again after each call
Offers more event types (POLLRDHUP, POLLPRI, etc.)
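
A comparable sketch for poll, again through Python's select module. The poll object remembers registrations for us, but the underlying pollfd array is still handed to the kernel on every call. Names are illustrative.

```python
import select
import socket

a, b = socket.socketpair()
a_fd = a.fileno()

poller = select.poll()
poller.register(a, select.POLLIN)  # interested in "readable" only

first = poller.poll(0)             # nothing sent yet, so no events
b.sendall(b"x")
events = poller.poll(1000)         # timeout in milliseconds
fd, mask = events[0]               # (file descriptor, revents bitmask)

a.close()
b.close()
```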

Key differences between poll and select

Data structures:
- select: Uses three separate fd_sets for read, write, and exceptions
- poll: Uses one array of struct pollfd with event flags
FD limit:
- select: Limited by FD_SETSIZE (usually 1024)
- poll: No hard limit on the number of FDs
Performance:
- select: The fd_sets must be copied and set up again for every call
- poll: The array can be reused as is; the kernel rewrites revents on every call
Timeout handling:
- select: Uses struct timeval (microsecond resolution)
- poll: Uses milliseconds
Portability:
- select: Available on almost every system
- poll: Missing on some older systems

How to use poll and select

Here are the signatures of the two syscalls (the pselect and ppoll variants additionally take a struct timespec timeout and a signal mask):

// select syscall
int select(int nfds, fd_set *readfds, fd_set *writefds,
           fd_set *exceptfds, struct timeval *timeout);

// poll syscall
int poll(struct pollfd *fds, nfds_t nfds, int timeout);

Important parameter:

timeout:
- Block indefinitely until an event occurs: -1 for poll, a NULL pointer for select
- Non-blocking, return immediately: 0 for poll, a zeroed struct timeval for select
- > 0: block for at most the given amount of time
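
The timeout cases can be checked quickly from Python. In this sketch nothing is ever sent, so every call returns an empty result once its timeout expires. Note that Python's select.select() takes seconds while poll() takes milliseconds; the names are illustrative.

```python
import select
import socket
import time

a, b = socket.socketpair()  # nothing is sent, so `a` never becomes readable

start = time.monotonic()
ready_now, _, _ = select.select([a], [], [], 0)      # 0: check once, return immediately
ready_later, _, _ = select.select([a], [], [], 0.2)  # > 0: wait at most 0.2 seconds
elapsed = time.monotonic() - start

poller = select.poll()
poller.register(a, select.POLLIN)
poll_result = poller.poll(200)  # milliseconds; None or a negative value blocks forever

a.close()
b.close()
```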

Example using select:

fd_set readfds;
struct timeval tv;
int retval;

FD_ZERO(&readfds);
FD_SET(sock_fd, &readfds);
tv.tv_sec = 5;   // 5-second timeout
tv.tv_usec = 0;  // both fields must be initialized

retval = select(sock_fd + 1, &readfds, NULL, NULL, &tv);
if (retval == -1)
    perror("select()");
else if (retval)
    printf("Data is available now.\n");
else
    printf("No data within five seconds.\n");

Example using poll:

struct pollfd fds[1];
int timeout_msecs = 5000;    // 5 seconds timeout
int retval;

fds[0].fd = sock_fd;
fds[0].events = POLLIN;

retval = poll(fds, 1, timeout_msecs);
if (retval == -1)
    perror("poll()");
else if (retval)
    printf("Data is available now.\n");
else
    printf("No data within five seconds.\n");

Comparing poll/select with a plain read

To see the advantages of poll/select, compare these ways of handling multiple connections:

Plain read (blocking):

// Has to block on each connection in turn
for (int i = 0; i < n_conns; i++) {
    char buf[1024];
    int n = read(conns[i], buf, sizeof(buf));  // Blocks until data arrives
    if (n > 0) {
        process_data(buf, n);
    }
}
// Problem: if the first connection has no data,
// the whole loop blocks and no other connection gets served

Non-blocking read with busy waiting:

// Put every socket into non-blocking mode beforehand
// fcntl(conns[i], F_SETFL, fcntl(conns[i], F_GETFL, 0) | O_NONBLOCK);

// Every connection has to be checked over and over
while (1) {
    for (int i = 0; i < n_conns; i++) {
        char buf[1024];
        int n = read(conns[i], buf, sizeof(buf));
        if (n > 0) {
            process_data(buf, n);
        } else if (n < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                // No data ready, move on to the next socket
                continue;
            } else {
                // Handle the error
                handle_error();
            }
        }
    }
    // Problem: burns CPU by constantly checking every socket
    // even though most of them have no data
}

Using select:

fd_set readfds;

while (1) {
    // Rebuild the set of FDs to watch; select() overwrites it on every call
    int max_fd = -1;
    FD_ZERO(&readfds);
    for (int i = 0; i < n_conns; i++) {
        FD_SET(conns[i], &readfds);
        if (conns[i] > max_fd) {
            max_fd = conns[i];
        }
    }
    
    // Block until at least one FD is ready
    int activity = select(max_fd + 1, &readfds, NULL, NULL, NULL);
    
    if (activity < 0) {
        perror("select error");
        break;
    }
    
    // Scan every FD to find out which ones are ready
    for (int i = 0; i < n_conns; i++) {
        if (FD_ISSET(conns[i], &readfds)) {
            char buf[1024];
            int n = read(conns[i], buf, sizeof(buf));  // Will not block
            if (n > 0) {
                process_data(buf, n);
            } else if (n == 0) {
                // Connection closed: close it and drop it from the watch list
                close(conns[i]);
                conns[i] = conns[n_conns - 1];
                n_conns--;
                i--;  // Re-check the entry that was moved into this slot
            } else {
                // Handle the error
                handle_error();
            }
        }
    }
}

Using poll:

struct pollfd fds[MAX_CONNS];

// Set up the array of pollfd structures
for (int i = 0; i < n_conns; i++) {
    fds[i].fd = conns[i];
    fds[i].events = POLLIN;  // Interested in readable events
}

while (1) {
    // Block until at least one FD is ready
    int activity = poll(fds, n_conns, -1);  // -1 = block indefinitely
    
    if (activity < 0) {
        perror("poll error");
        break;
    }
    
    // Scan every FD to find out which ones are ready
    for (int i = 0; i < n_conns; i++) {
        if (fds[i].revents & POLLIN) {
            char buf[1024];
            int n = read(fds[i].fd, buf, sizeof(buf));
            if (n > 0) {
                process_data(buf, n);
            } else if (n == 0) {
                // Connection closed
                close(fds[i].fd);
                fds[i].fd = -1;  // poll() ignores entries with a negative fd
            } else {
                // Handle the error
                handle_error();
            }
        }
    }
}

Advantages of poll/select:

No unnecessary blocking: the call only blocks when no FD is ready
No CPU spent on busy waiting: CPU is only used when there is actual data to process
One thread can handle many connections
Flexible timeout control (a timeout can be passed to poll/select)
Portable across Unix systems

Disadvantages:

The FD set has to be copied between user space and kernel space on every call
The whole FD set has to be scanned on every call, which is O(n)
Limit on the number of FDs (FD_SETSIZE in select, usually 1024)
The return value says how many FDs are ready but not which ones, so the caller still has to scan the whole set to find them

What’s epoll?

epoll was created to address the disadvantages above, especially when handling a large number of simultaneous connections.

epoll is a group of syscalls (epoll_create, epoll_ctl, epoll_wait) that give the Linux kernel a list of FDs to watch and report updates on. Unlike poll/select, epoll can operate in two modes: level-triggered (the default) and edge-triggered.

These are the steps for using epoll:

Call epoll_create to tell the kernel you are about to start epolling; it returns a file descriptor that refers to the new epoll instance
Call epoll_ctl to tell the kernel which FDs you want updates for when they change. You can register many kinds of FDs (pipes, FIFOs, sockets, POSIX message queues, inotify instances, devices, …).
Call epoll_wait to wait for updates on the FDs in your watch list.
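
The three steps can be sketched in a few lines of Python, whose select.epoll object wraps the same syscalls (Linux only). The socket pair stands in for a real client connection and the names are illustrative.

```python
import select
import socket

a, b = socket.socketpair()
a_fd = a.fileno()

ep = select.epoll()                 # step 1: create the epoll instance
ep.register(a_fd, select.EPOLLIN)   # step 2: epoll_ctl(EPOLL_CTL_ADD)

b.sendall(b"hello")
events = ep.poll(1.0)               # step 3: epoll_wait, timeout in seconds
ready_fd, mask = events[0]          # only ready FDs are returned
payload = a.recv(1024)

ep.unregister(a_fd)                 # epoll_ctl(EPOLL_CTL_DEL)
ep.close()
a.close()
b.close()
```

Unlike select and poll, the registration is made once and stays in the kernel until it is removed.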

epoll's improvements:

Better performance

epoll: Uses a callback mechanism instead of scanning. The kernel returns only the FDs that have events, so the cost of a wait depends on the number of ready FDs rather than on the total number N being watched
epoll: The FD list is not copied again on every call

No hard limit on the number of FDs

Can watch tens of thousands of connections without hitting a count limit

State is stored in the kernel

Uses three separate system calls to create, manage, and wait for events (epoll_create, epoll_ctl, epoll_wait)
The FD list does not need to be set up again after each call

Supports both notification modes

Level-triggered and edge-triggered (explained in the next section)

Level-triggered vs Edge-triggered

These are two different modes of notifying I/O events:

Level-triggered (LT):

Notifies repeatedly for as long as the FD is in the ready state
Example: when a socket has data, every wait call reports it until the data has been read completely
Supported by poll/select/epoll
Safer, because events are not missed

Edge-triggered (ET):

Notifies only once, when the FD changes from not-ready to ready
Example: notifies only once when new data arrives on the socket
Of select, poll, and epoll, only epoll supports it
Better performance, but the code has to be written carefully so that no events (data) are missed

Why are two modes needed?

The two modes were designed for different I/O use cases:

Level-triggered came first and suits:
- Simple applications that need high reliability
- Processing data in small chunks, without reading everything at once
- Developers who want to read data only when they need it, without worrying about draining the buffer
- Legacy frameworks that were designed around this model
Edge-triggered came later to address:
- Performance problems when the number of connections is very large
- Reducing the number of unnecessary system calls
- Making zero-copy I/O easier to implement (moving data from the disk or network buffer straight into the socket buffer without copying it through user space)
- Modern async I/O applications

When should you use which mode?

Use level-triggered when:

The application needs to stay simple and easy to maintain
Reliability matters more than performance
Data is processed in small pieces
You do not want to implement complex logic for draining the buffer
You are using legacy frameworks/libraries

Use edge-triggered when:

You need to optimize performance for a large number of connections
You can implement the drain-the-buffer logic correctly
You want to minimize the number of system calls
You want to implement zero-copy I/O

An example that illustrates the difference:

LT mode with 100 bytes in the socket buffer:
- Notify "readable"
- Read 50 bytes
- Still notify "readable" (50 bytes remain)
- Read the remaining 50 bytes
- No more notifications

ET mode with 100 bytes in the socket buffer:
- Notify "readable" (when the data first arrives)
- Read 50 bytes
- No further notification until new data arrives, even though 50 bytes are still unread
- So all the data has to be read on each notification
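
The two scenarios above can be reproduced with a short experiment. This is a sketch using Python's select.epoll (Linux only); the helper name `notifications` is made up for illustration.

```python
import select
import socket


def notifications(flags):
    """Send 100 bytes, read 50, and report whether epoll notifies before and after."""
    a, b = socket.socketpair()
    ep = select.epoll()
    ep.register(a.fileno(), flags)
    b.sendall(b"x" * 100)

    before = bool(ep.poll(0.5))  # data just arrived: both modes report it
    a.recv(50)                   # leave 50 bytes unread
    after = bool(ep.poll(0))     # is the FD reported again?

    ep.close()
    a.close()
    b.close()
    return before, after


level = notifications(select.EPOLLIN)
edge = notifications(select.EPOLLIN | select.EPOLLET)
print("level-triggered:", level)
print("edge-triggered: ", edge)
```

On Linux this should report (True, True) for level-triggered and (True, False) for edge-triggered: the 50 unread bytes are still in the buffer, but edge-triggered mode stays silent until more data arrives.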

Level-triggered notification in detail

Level-triggered (LT) mode works off the state of the file descriptor:

As long as the buffer holds data to read (no matter how much), the FD is considered “readable”
Notifications continue until ALL the data in the buffer has been read
It works like a “level”: while the level is > 0 (data remains), notifications keep coming

In the 100-byte example above:

Initially there are 100 bytes in the buffer → notify “readable”
After reading 50 bytes:
- The buffer still holds 50 unread bytes
- So the FD is still in the “readable” state
- Level-triggered keeps notifying because of that state
Only once all 100 bytes have been read and the buffer is empty do the notifications stop

This is one of the strengths of level-triggered mode:

Safer, because no data is missed
No need to read all the data at once
A good fit for applications that process data in small chunks

Compared with edge-triggered mode:

ET notifies only on a state change (an edge)
It notifies only once when new data arrives
The reader must drain all the data each time (read until EAGAIN), otherwise data can be left behind unnoticed

This is why many frameworks, such as Java NIO (pre NIO.2), chose level-triggered as the default: it is safer and easier to use, even though it may be less efficient than edge-triggered in some cases.

Edge-triggered epoll

To use epoll in edge-triggered mode, we need to:

Set the EPOLLET flag when registering the FD with epoll
Put the FD into non-blocking mode
Read everything in the buffer (until read fails with EAGAIN) every time a notification arrives
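
The "read everything" requirement usually turns into a small drain loop. A minimal Python sketch, where `drain` is a hypothetical helper name and the tiny chunk size is chosen only to force several iterations:

```python
import socket


def drain(sock, chunk_size=16):
    """Read from a non-blocking socket until its buffer is empty.

    Returns (data, peer_closed).
    """
    chunks = []
    while True:
        try:
            chunk = sock.recv(chunk_size)
        except BlockingIOError:   # EAGAIN / EWOULDBLOCK: buffer drained
            return b"".join(chunks), False
        if not chunk:             # zero bytes: the peer closed the connection
            return b"".join(chunks), True
        chunks.append(chunk)


a, b = socket.socketpair()
a.setblocking(False)              # required, or the last recv() would block
b.sendall(b"y" * 100)
data, closed = drain(a)
a.close()
b.close()
```

The two exit conditions are deliberately separate: "would block" means the buffer is empty for now, while a zero-byte read means the connection is finished.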

A practical example

A web server handling requests with epoll:

// Create the epoll instance
int epfd = epoll_create1(0);
struct epoll_event ev, events[MAX_EVENTS];

// Level-triggered mode (the default)
ev.events = EPOLLIN;
ev.data.fd = server_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, server_fd, &ev);

while (1) {
    int nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].data.fd == server_fd) {
            // Accept new connection
            int client_fd = accept(server_fd, NULL, NULL);
            if (client_fd < 0) continue;
            // Add to epoll
            ev.events = EPOLLIN;
            ev.data.fd = client_fd;
            epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
        } else {
            // Level-triggered: one read per notification is enough, because
            // epoll_wait reports the FD again while data remains. Looping on
            // a blocking socket here would stall the whole event loop.
            char buf[1024];
            int n = read(events[i].data.fd, buf, sizeof(buf));
            if (n > 0) {
                process_data(buf, n);
            } else {
                // n == 0: peer closed the connection; n < 0: error
                close(events[i].data.fd);
            }
        }
    }
}

// Edge-triggered mode (server_fd is assumed to be non-blocking already)
ev.events = EPOLLIN | EPOLLET;  // Add the EPOLLET flag
ev.data.fd = server_fd;
epoll_ctl(epfd, EPOLL_CTL_ADD, server_fd, &ev);

while (1) {
    int nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    for (int i = 0; i < nfds; i++) {
        if (events[i].data.fd == server_fd) {
            // One notification may cover several pending connections,
            // so accept until the queue is empty (accept fails with EAGAIN)
            int client_fd;
            while ((client_fd = accept(server_fd, NULL, NULL)) >= 0) {
                // Set non-blocking mode
                fcntl(client_fd, F_SETFL, fcntl(client_fd, F_GETFL, 0) | O_NONBLOCK);
                // Add to epoll
                ev.events = EPOLLIN | EPOLLET;
                ev.data.fd = client_fd;
                epoll_ctl(epfd, EPOLL_CTL_ADD, client_fd, &ev);
            }
        } else {
            // Edge-triggered: notified only once when new data arrives,
            // so keep reading until the buffer is drained
            char buf[1024];
            while (1) {
                int n = read(events[i].data.fd, buf, sizeof(buf));
                if (n > 0) {
                    process_data(buf, n);
                } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                    break;  // All available data has been read
                } else {
                    // n == 0: peer closed the connection; otherwise a real error
                    close(events[i].data.fd);
                    break;
                }
            }
        }
    }
}

Zero copy and epoll

Zero copy is an I/O optimization technique in which data moves directly between kernel buffers (for example from the disk cache to a socket buffer) without being copied through user space. Combined with epoll (in edge-triggered mode), zero copy can significantly reduce the overhead of I/O operations.
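
As a rough illustration of the idea, Python's socket.sendfile() hands a file to the kernel to transmit, using the sendfile(2) syscall where it is available and falling back to ordinary send() calls otherwise. The temporary file and socket pair below are stand-ins for a static file and a client connection.

```python
import os
import socket
import tempfile

payload = os.urandom(4096)

with tempfile.NamedTemporaryFile() as f:
    f.write(payload)
    f.flush()

    a, b = socket.socketpair()
    # The file contents never pass through a Python-level buffer on the
    # sending side when the sendfile(2) path is taken.
    sent = a.sendfile(f, offset=0)
    a.close()

    received = b""
    while True:
        chunk = b.recv(65536)
        if not chunk:
            break
        received += chunk
    b.close()
```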

Some well-known runtimes/frameworks that use zero copy together with epoll:

Netty Framework (Java):
- Uses FileRegion and the native transport (epoll) to implement zero copy
- Often used in web servers to transfer large files
Nginx:
- Uses the sendfile() syscall together with epoll
- Very efficient at serving static files
- Significantly reduces CPU usage and increases throughput

Who uses `epoll`?

Most modern frameworks and runtimes use epoll (on Linux) or kqueue (on BSD/macOS) to implement their event loop and async I/O:

1. Node.js (libuv)

const server = require('http').createServer();

server.on('connection', (socket) => {
    console.log('New connection');
    socket.on('data', (data) => {
        console.log('Received:', data.toString());
    });
});

server.listen(8080);

Under the hood, libuv uses epoll on Linux to monitor socket events.

2. Go Runtime

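package main

import "net"

// handleConnection is a placeholder handler: it just closes the connection.
func handleConnection(conn net.Conn) {
    defer conn.Close()
}

func main() {
    ln, err := net.Listen("tcp", ":8080")
    if err != nil {
        panic(err)
    }
    
    for {
        conn, err := ln.Accept()
        if err != nil {
            continue
        }
        go handleConnection(conn)  // Creates a new goroutine
    }
}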

The Go runtime uses its netpoller (built on epoll on Linux) to implement non-blocking network I/O. When a goroutine performs network I/O that would block:

The runtime registers the FD with epoll
The goroutine is parked (suspended)
When an event arrives, the scheduler wakes the goroutine up
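
The register, park, and wake cycle can be imitated in a few lines of Python with generators. This is only an analogy and not how the Go runtime is implemented: `reader` and `run` are hypothetical names, and a generator's `yield` plays the role of parking a goroutine.

```python
import selectors
import socket


def reader(conn, log):
    """A task that waits for one message on `conn`."""
    yield conn                     # park: wake me when conn is readable
    log.append(conn.recv(1024))


def run(tasks):
    """Run generator-based tasks until all of them finish."""
    sel = selectors.DefaultSelector()
    for task in tasks:
        # Run each task up to its first park and register what it waits on.
        sel.register(next(task), selectors.EVENT_READ, task)
    pending = len(tasks)
    while pending:
        for key, _ in sel.select():
            task = key.data
            sel.unregister(key.fileobj)
            try:
                waiting_on = next(task)   # wake the parked task
            except StopIteration:
                pending -= 1              # the task ran to completion
            else:
                sel.register(waiting_on, selectors.EVENT_READ, task)
    sel.close()


log = []
pairs = [socket.socketpair() for _ in range(2)]
for i, (_, peer) in enumerate(pairs):
    peer.sendall(f"msg-{i}".encode())
run([reader(conn, log) for conn, _ in pairs])
for conn, peer in pairs:
    conn.close()
    peer.close()
```

The task code reads like ordinary blocking code, while a single thread multiplexes all of the tasks underneath.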

3. Nginx Event Loop

events {
    use epoll;  # Explicitly use epoll
    worker_connections 1024;
}

Nginx uses epoll to handle thousands of concurrent connections in each worker process.

4. Modern Java NIO

// NIO (Level-triggered) Example
ServerSocketChannel serverSocket = ServerSocketChannel.open();
serverSocket.bind(new InetSocketAddress(8080));
serverSocket.configureBlocking(false);

Selector selector = Selector.open();
serverSocket.register(selector, SelectionKey.OP_ACCEPT);

while (true) {
    selector.select();
    Set<SelectionKey> selectedKeys = selector.selectedKeys();
    Iterator<SelectionKey> iter = selectedKeys.iterator();
    
    while (iter.hasNext()) {
        SelectionKey key = iter.next();
        iter.remove();
        
        if (key.isAcceptable()) {
            // Accept new connection
            SocketChannel client = serverSocket.accept();
            client.configureBlocking(false);
            client.register(selector, SelectionKey.OP_READ);
        }
        
        if (key.isReadable()) {
            // Level-triggered: Will be notified as long as there's data
            SocketChannel client = (SocketChannel) key.channel();
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            int bytesRead = client.read(buffer);
            if (bytesRead > 0) {
                buffer.flip();
                // Process data...
            } else if (bytesRead == -1) {
                // Peer closed the connection; closing the channel also cancels its key
                client.close();
            }
        }
    }
}

// NIO.2 (completion-based) example with AsynchronousSocketChannel
public class AsyncServer {
    private final AsynchronousServerSocketChannel server;
    
    public AsyncServer() throws IOException {
        server = AsynchronousServerSocketChannel.open()
                .bind(new InetSocketAddress(8080));
    }
    
    public void start() {
        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void attachment) {
                // Accept next connection
                server.accept(null, this);
                
                // The handler is invoked once for each read that completes
                ByteBuffer buffer = ByteBuffer.allocate(1024);
                client.read(buffer, buffer, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer result, ByteBuffer buffer) {
                        if (result > 0) {
                            buffer.flip();
                            // Consume everything this read delivered
                            byte[] data = new byte[buffer.remaining()];
                            buffer.get(data);
                            // Process data...
                            // Setup next read
                            buffer.clear();
                            client.read(buffer, buffer, this);
                        }
                    }
                    
                    @Override
                    public void failed(Throwable exc, ByteBuffer attachment) {
                        try {
                            client.close();
                        } catch (IOException e) {
                            // Handle error
                        }
                    }
                });
            }
            
            @Override
            public void failed(Throwable exc, Void attachment) {
                // Handle error
            }
        });
    }
}

Java NIO (pre NIO.2) uses level-triggered notification through the Selector API. That is, the Selector keeps reporting a channel as ready until all of its data has been read from the buffer.

NIO.2 (since Java 7) introduced AsynchronousSocketChannel, a completion-based model built on CompletionHandler callbacks. It resembles edge-triggered notification in one respect: the callback is invoked only once per read, so the code has to consume all the data that read delivered and then set up the callback for the next read itself (which makes the code more complex and error-prone).

On Linux, the Java NIO Selector uses epoll (JDK 1.5+)

Netty (Netty Native Transport)

In practice, though, most async projects in the Java ecosystem do not use the Java NIO API directly; they use the Netty framework.

Netty underpins most of the well-known async frameworks, such as Play Framework, Spring WebFlux, and Akka. However, instead of relying only on the Java NIO APIs (stdlib), Netty also implements epoll/kqueue support directly through JNI (Java Native Interface, calling the OS syscalls directly). The reasons are:

Better performance:
- Fewer transitions between user space and kernel space
- Avoids the overhead of the Java NIO selector implementation (that is, the older Java NIO)
- Zero-copy transfer is implemented more efficiently
- Less GC pressure thanks to fewer object allocations
Better control:
- Direct access to epoll/kqueue features that Java NIO does not expose
- Can be tuned for each specific platform (Linux/BSD)
- Better handling of edge cases and errors

5. Python Async Frameworks

import asyncio

async def handle_connection(reader, writer):
    data = await reader.read(100)
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(
        handle_connection, '127.0.0.1', 8080)
    await server.serve_forever()

asyncio.run(main())

On Linux, Python asyncio uses epoll through the selectors module.
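
The selectors module can also be used directly. A minimal sketch: DefaultSelector picks the most efficient mechanism the platform offers (epoll on Linux, kqueue on BSD/macOS), and the variable names are illustrative.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
backend = type(sel).__name__  # e.g. EpollSelector on Linux

a, b = socket.socketpair()
sel.register(a, selectors.EVENT_READ, data="conn-a")
b.sendall(b"hi")

ready = [
    (key.data, bool(mask & selectors.EVENT_READ))
    for key, mask in sel.select(timeout=1.0)
]
print(backend, ready)

sel.unregister(a)
sel.close()
a.close()
b.close()
```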

6. Redis

Redis is an in-memory data structure store used as a database, cache, and message broker. Redis uses an event-driven, single-threaded model with I/O multiplexing on top of epoll to achieve high performance.

// Excerpt from the Redis source code (ae_epoll.c)
static int aeApiPoll(aeEventLoop *eventLoop, struct timeval *tvp) {
    aeApiState *state = eventLoop->apidata;
    int retval, numevents = 0;

    retval = epoll_wait(state->epfd, state->events, eventLoop->setsize,
            tvp ? (tvp->tv_sec*1000 + tvp->tv_usec/1000) : -1);
    
    if (retval > 0) {
        int j;
        numevents = retval;
        for (j = 0; j < numevents; j++) {
            int mask = 0;
            struct epoll_event *e = state->events+j;

            if (e->events & EPOLLIN) mask |= AE_READABLE;
            if (e->events & EPOLLOUT) mask |= AE_WRITABLE;
            if (e->events & EPOLLERR) mask |= AE_WRITABLE|AE_READABLE;
            if (e->events & EPOLLHUP) mask |= AE_WRITABLE|AE_READABLE;
            
            eventLoop->fired[j].fd = e->data.fd;
            eventLoop->fired[j].mask = mask;
        }
    }
    return numevents;
}

Redis handles tens of thousands of concurrent connections on a single thread thanks to epoll and the event loop model. This gives Redis very low latency and high throughput without complicating its design with multi-threading.

Performance Benefits

Using epoll brings several important benefits:

Scalability

Can handle thousands of connections with low overhead
Memory usage grows far more slowly with the number of connections than in the thread-per-connection model

Performance

Lower CPU usage thanks to fewer context switches
Lower latency because the full FD list does not have to be scanned

Resource Efficiency

No need to create a thread per connection
Lower kernel-space memory usage