Tag: rnet

  • Pushing Python to 20,000 Requests/Second

    Can you send 20,000 requests per second from a single Python application? That’s 1.2 million a minute and over 1.7 billion a day. Most developers would say no. Python is great, but it isn’t known for that kind of raw network performance.

    I wanted to test that assumption. This article shows how I combined an async Python script with a Rust-based library and deep OS-level tuning to achieve that number. The full code and test setup are available on GitHub.

    If you prefer to watch a video, I made a 3-minute video going over this experiment.

    The Right Tool for the Job: rnet

    Standard Python libraries are great, but for extreme throughput, you need something designed for it. For this test, I used rnet, a Python networking library built on top of the Rust library wreq.

    This hybrid approach gives you the best of both worlds:

    1. Python’s asyncio: Easy to write and manage high-level concurrent logic.
    2. Rust’s Performance: The underlying network and TLS operations are handled by blazing-fast, memory-safe Rust code.

    A key advantage of rnet is its robust TLS configuration, which is effective at bypassing Web Application Firewalls (WAFs) like Cloudflare that often challenge standard Python clients.
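
    In practice, using rnet looks much like any modern async HTTP client. Here is a minimal sketch (my own, not from the repo), assuming the Client/Impersonate interface from rnet’s documentation; exact variant names vary between rnet releases:

    Python

    # Minimal rnet usage sketch; the Impersonate variant below is
    # illustrative (pick one that your installed rnet version exposes).
    import asyncio
    from rnet import Client, Impersonate

    async def main():
        # Impersonating a real browser's TLS fingerprint is what helps
        # slip past WAF challenges aimed at default Python clients.
        client = Client(impersonate=Impersonate.Chrome131)
        resp = await client.get("https://example.com")
        print(resp.status)

    asyncio.run(main())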

    The Code: A Simple Async Worker

    The client script itself is straightforward. It uses asyncio to create a pool of concurrent workers that send requests as fast as possible. The main logic involves creating rnet clients and gathering the tasks.

    Python

    # From send_request/rnet_test.py
    
    async def run_load_test(wid, clients, counter, total_requests):
        # url is a module-level constant in the full script
        local_success = 0
        local_fail = 0
        local_statuses = {}  # status code -> count
        local_errors = {}    # error type -> count
        i = 0
    
        while i < total_requests:
            try:
                # Round-robin each request across the client pool
                resp = await clients[i % len(clients)].get(url)
                # ... process response status into local_statuses ...
                local_success += 1
            except Exception as e:
                # ... record the error in local_errors ...
                local_fail += 1
            finally:
                i += 1
        return [local_success, local_fail, local_statuses, local_errors]
    
    # ... main function sets up asyncio loop and runs the test ...
    

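    The main function, elided above, is standard asyncio fan-out: split the request budget across workers and gather their results. A hedged sketch of that shape (the worker and client counts, and the counter plumbing, are illustrative rather than copied from the repo):

    Python

    # Illustrative fan-out: N workers share a pool of rnet clients and
    # split the total request budget evenly between them.
    import asyncio
    from rnet import Client

    async def main(total_requests=1_000_000, num_workers=512, num_clients=32):
        clients = [Client() for _ in range(num_clients)]
        per_worker = total_requests // num_workers
        results = await asyncio.gather(
            *(run_load_test(wid, clients, None, per_worker)  # counter elided here
              for wid in range(num_workers))
        )
        print(f"successes: {sum(r[0] for r in results)}")

    asyncio.run(main())
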
    But the code is only a small part of the story. The real performance gains come from tuning the machines themselves.

    The Secret Sauce: OS and Server Tuning

    You cannot achieve this level of concurrency with default system settings. Your OS will start dropping connections long before your code breaks a sweat. Both the client machine sending the requests and the server receiving them needed significant tuning.

    Client-Side Tuning

    This script configures the client machine to handle a massive number of outgoing connections.

    client/tune_server.sh

    1. Increase Max Open Files: Every network connection is a file descriptor. The default limit (usually 1024) is far too low, so we raise it to 65,536 (see the sanity check after this list).

       Bash

       # Set hard and soft nofile limits
       echo "* soft nofile 65536" >> /etc/security/limits.conf
       echo "* hard nofile 65536" >> /etc/security/limits.conf

    2. Expand Ephemeral Port Range: This increases the number of ports available for outgoing connections, preventing port exhaustion.

       Bash

       # Set ephemeral port range for high outbound connections
       echo "net.ipv4.ip_local_port_range = 1024 65535" >> /etc/sysctl.conf

    3. Enable TCP TIME_WAIT Reuse: This allows the kernel to reuse sockets in the TIME_WAIT state for new connections, which is critical when rapidly opening and closing connections.

       Bash

       # Enable TCP TIME_WAIT reuse
       echo "net.ipv4.tcp_tw_reuse = 1" >> /etc/sysctl.conf
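
    Note that the limits.conf change only applies to new login sessions, so it is worth confirming the limits from inside Python before launching the workers. A small stdlib-only sanity check (my addition, not part of tune_server.sh):

    Python

    # Verify the tuned limits are visible to this process (Linux paths).
    import resource

    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"nofile: soft={soft} hard={hard}")  # expect 65536 after tuning

    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = f.read().split()
        print(f"ephemeral ports: {low}-{high}")  # expect 1024-65535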

    Server-Side Tuning

    The server needs to be ready to accept and process this flood of traffic.

    remote/startup_script.sh

    1. Increase Connection Backlog Queue: This setting (somaxconn) defines the maximum number of connections that can wait to be accepted by the server. The default is tiny, so we raise it to the kernel maximum (see the check after this list).

       Bash

       # Set somaxconn for a large backlog queue
       echo "net.core.somaxconn = 65535" >> /etc/sysctl.conf

    2. Increase Nginx Worker Connections: The server uses Nginx to handle requests, so we configure it to use one worker process per core and allow far more connections per worker.

       Nginx

       # From Nginx config setup
       worker_processes auto;

       events {
           worker_connections 65535;
           multi_accept on;
       }
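
    Remember that sysctl edits only take effect after running sysctl -p. A quick way to confirm the backlog setting from Python on the server (my sketch, not from startup_script.sh):

    Python

    # Confirm the accept-queue limit took effect (Linux only).
    with open("/proc/sys/net/core/somaxconn") as f:
        print("somaxconn =", f.read().strip())  # expect 65535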

    The Results

    I ran these tests using Vultr cloud servers.

    • An 8-core machine consistently hit ~15,000 requests/second.
    • A 32-core machine peaked at ~20,000 requests/second.

    Interestingly, even during the 10 million request test, the CPU usage was not maxed out. This suggests the bottleneck wasn’t the CPU, and there’s still more performance to gain by investigating other factors like the network fabric or kernel scheduler.

    Conclusion: Python Isn’t Slow

    This experiment shows that when it comes to I/O-bound tasks, Python’s perceived “slowness” is often a myth. Performance is a full-stack problem. By choosing the right library and tuning the underlying operating system, Python can handle enormous network loads, putting it in the same league as traditionally “faster” languages for this kind of work.

    So next time you need to build a high-performance scraper, a load testing tool, or a real-time data ingestion service, don’t count Python out.

    Take a look at my Projects or Contact me if you want us to work on something cool! The consultation is free!