Tag: high-performance

  • How 20,000 Requests Sent/Sec in Python Helps Scale Your Business

    For any scaling company, there’s a familiar and costly problem. The technology that helped you launch and find your first customers eventually starts to slow down.

    Many businesses start with a flexible language like Python because it lets them build and adapt quickly. But when it’s time for serious growth, the conventional wisdom says you have to do a “rewrite”—a slow and expensive migration to a supposedly faster, more complex system.

    This “rewrite tax” is a huge drain. It halts product development and burns money right when you need to be moving fastest. But what if you could skip it entirely?

    I ran an experiment and successfully got a single Python application to send 20,000 requests per second. This isn’t just a technical benchmark; it’s a new blueprint for scaling a business efficiently.

    The New Approach: A Smarter Engine, Not a Different Car

    Instead of swapping out the entire car, I focused on upgrading the engine. The solution was built on two key ideas:

    1. A Hybrid Engine: The application’s main logic stayed in Python, which is easy to work with, while the heavy lifting (the core networking tasks) was handled by a super-fast Rust component. Think of it as keeping the user-friendly dashboard of your car but installing a Formula 1 engine. This gives you both ease of use and raw power (a minimal code sketch follows this list).
    2. Tuning the Racetrack: A powerful engine is useless on a bumpy road. I tuned the server’s operating system (kernel-level tuning) to handle a massive volume of traffic. This is like paving the racetrack so your car can actually reach its top speed without crashing.
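
    To make the hybrid idea concrete, here is a minimal sketch of what that pairing looks like, assuming the async Client API of the rnet library used in the experiment (the constructor defaults shown are assumptions):

    Python

    import asyncio
    from rnet import Client  # Python-friendly API, Rust engine underneath

    async def fetch_all(urls):
        client = Client()
        # Python coordinates thousands of concurrent requests...
        # ...while Rust does the fast networking and TLS work for each one.
        return await asyncio.gather(*(client.get(u) for u in urls))

    asyncio.run(fetch_all(["https://example.com"] * 100))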

    This approach proves that with a smart architecture, you don’t have to choose between speed of development and speed of execution.

    What This Means for Your Business

    This technical breakthrough translates directly into three critical advantages for any growing company:

    1. Drastically Lower Costs

    The financial benefits are clear. First, you need fewer servers to handle your workload, which directly lowers your monthly cloud bill. Second, you can hire from the massive global pool of Python developers instead of searching for expensive, specialized engineers.

    2. Get to Market Faster

    Skipping the “rewrite” phase can save you six months or more of engineering time. Your team can stay focused on building new features and serving your customers, allowing you to innovate faster than competitors who are stuck rebuilding their foundation.

    3. A Powerful Advantage in Data

    In a world driven by AI, the company with the best data wins. The ability to gather information at this speed—whether for market research, price tracking, or training AI models—is a huge competitive edge. It means you get better insights, faster.

    Conclusion: Performance Is a Business Decision

    Viewing performance as just a technical detail is a mistake. It’s a core business strategy.

    How efficiently your technology scales impacts everything from your budget to your ability to innovate. This experiment shows that the supposed limits of common technologies are often assumptions, not hard constraints.

    The real bottleneck is rarely the tool itself, but how you use it. By making smarter architectural choices early on, you can build a business that’s not only quick to launch but also ready for massive growth.

  • Pushing Python to 20,000 Requests/Second

    Can you send 20,000 requests per second from a single Python application? That’s over 1.2 million a minute and nearly 2 billion a day. Most developers would say no. Python is great, but it isn’t known for that kind of raw network performance.

    I wanted to test that assumption. This article shows how I combined an async Python script with a Rust-based library and deep OS-level tuning to achieve that number. The full code and test setup are available on GitHub.

    If you prefer to watch a video, I made a 3-minute video going over this experiment.

    The Right Tool for the Job: rnet

    Standard Python libraries are great, but for extreme throughput, you need something designed for it. For this test, I used rnet, a Python networking library built on top of the Rust library wreq.

    This hybrid approach gives you the best of both worlds:

    1. Python’s asyncio: Easy to write and manage high-level concurrent logic.
    2. Rust’s Performance: The underlying network and TLS operations are handled by blazing-fast, memory-safe Rust code.

    A key advantage of rnet is its browser-like TLS configuration, which helps it get past Web Application Firewalls (WAFs) such as Cloudflare’s that often challenge standard Python clients.
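
    As a quick illustration, creating such a client takes only a few lines. This is a minimal sketch based on rnet’s documented API; the exact Impersonate variant and response attribute names are assumptions that can differ between versions:

    Python

    import asyncio
    from rnet import Client, Impersonate

    async def main():
        # The TLS handshake imitates a real Chrome browser, courtesy of wreq
        client = Client(impersonate=Impersonate.Chrome131)
        resp = await client.get("https://example.com")
        print(resp.status_code)  # assumed attribute name

    asyncio.run(main())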

    The Code: A Simple Async Worker

    The client script itself is straightforward. It uses asyncio to create a pool of concurrent workers that send requests as fast as possible. The main logic involves creating rnet clients and gathering the tasks.

    Python

    # From send_request/rnet_test.py
    # (`url` is defined elsewhere in the full script)

    async def run_load_test(wid, clients, counter, total_requests):
        local_success = 0
        local_fail = 0
        local_statuses = {}  # HTTP status code -> count
        local_errors = {}    # exception type -> count
        i = 0

        while i < total_requests:
            try:
                # Main request loop: rotate through the client pool
                resp = await clients[i % len(clients)].get(url)
                # ... process response status into local_statuses ...
                local_success += 1
            except Exception as e:
                # ... record e in local_errors ...
                local_fail += 1
            finally:
                i += 1
        return [local_success, local_fail, local_statuses, local_errors]
    
    # ... main function sets up asyncio loop and runs the test ...
    

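    The main function was elided above, so here is a hedged sketch of how such a driver could split the work across workers and gather the results. The worker counts and the make_client() helper are illustrative stand-ins, not the exact code from the repo:

    Python

    import asyncio

    url = "http://target-server:8080/"  # module-level, as run_load_test expects

    async def main(total_requests=10_000_000, num_workers=512, pool_size=8):
        # A small pool of rnet clients shared round-robin by every worker;
        # make_client() is a hypothetical stand-in for the real setup code.
        clients = [make_client() for _ in range(pool_size)]
        per_worker = total_requests // num_workers
        results = await asyncio.gather(
            *(run_load_test(w, clients, None, per_worker) for w in range(num_workers))
        )
        print("success:", sum(r[0] for r in results))
        print("failed: ", sum(r[1] for r in results))

    asyncio.run(main())
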
    But the code is only a small part of the story. The real performance gains come from tuning the machines themselves.

    The Secret Sauce: OS and Server Tuning

    You cannot achieve this level of concurrency with default system settings. Your OS will start dropping connections long before your code breaks a sweat. Both the client machine sending the requests and the server receiving them needed significant tuning.

    Client-Side Tuning

    This script configures the client machine to handle a massive number of outgoing connections.

    client/tune_server.sh

    1. Increase Max Open Files: Every network connection is a file descriptor, and the default limit (usually 1024) is far too low. We raise it to 65,536 (a per-process check in Python is sketched after this list).

       Bash

       # Set hard and soft nofile limits
       echo "* soft nofile 65536" >> /etc/security/limits.conf
       echo "* hard nofile 65536" >> /etc/security/limits.conf

    2. Expand Ephemeral Port Range: This increases the number of ports available for outgoing connections, preventing port exhaustion.

       Bash

       # Set ephemeral port range for high outbound connections
       echo "net.ipv4.ip_local_port_range = 1024 65535" >> /etc/sysctl.conf

    3. Enable TCP TIME_WAIT Reuse: This allows the kernel to reuse sockets in the TIME_WAIT state for new connections, which is critical when rapidly opening and closing connections.

       Bash

       # Enable TCP TIME_WAIT reuse
       echo "net.ipv4.tcp_tw_reuse = 1" >> /etc/sysctl.conf
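
    The limits above apply system-wide. As a per-process sanity check, Python’s standard resource module (Unix-only) can read and raise the file-descriptor limit for the current process, a minimal sketch:

    Python

    import resource

    # Read the current soft/hard file-descriptor limits for this process
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"nofile: soft={soft} hard={hard}")

    # Raise the soft limit up to the hard cap (65536 after the tuning above)
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))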

    Server-Side Tuning

    The server needs to be ready to accept and process this flood of traffic.

    remote/startup_script.sh

    1. Increase Connection Backlog Queue: This setting (somaxconn) defines the maximum number of connections allowed to wait in the kernel’s accept queue. The default is tiny, so we raise it to the kernel maximum (see the sketch after this list for the application side).

       Bash

       # Set somaxconn for a large backlog queue
       echo "net.core.somaxconn = 65535" >> /etc/sysctl.conf

    2. Increase Nginx Worker Connections: The server uses Nginx to handle requests, so we configure it to use more worker processes and allow a higher number of connections per worker.

       Nginx

       # From Nginx config setup
       worker_processes auto;

       events {
           worker_connections 65535;
           multi_accept on;
       }
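
    Note that somaxconn is only a ceiling: the listening application must also request a deep accept queue (in Nginx, via the backlog parameter of the listen directive). A minimal Python sketch of the same idea for a plain socket server:

    Python

    import socket

    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("0.0.0.0", 8080))
    # Ask for a deep accept queue; the kernel caps it at net.core.somaxconn
    srv.listen(65535)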

    The Results

    I ran these tests using Vultr cloud servers.

    • An 8-core machine consistently hit ~15,000 requests/second.
    • A 32-core machine peaked at ~20,000 requests/second.

    Interestingly, even during the 10 million request test, the CPU usage was not maxed out. This suggests the bottleneck wasn’t the CPU, and there’s still more performance to gain by investigating other factors like the network fabric or kernel scheduler.

    Conclusion: Python Isn’t Slow

    This experiment shows that when it comes to I/O-bound tasks, Python’s perceived “slowness” is often a myth. Performance is a full-stack problem. By choosing the right library and tuning the underlying operating system, Python can handle enormous network loads, putting it in the same league as traditionally “faster” languages for this kind of work.

    So next time you need to build a high-performance scraper, a load testing tool, or a real-time data ingestion service, don’t count Python out.

    Take a look at my Projects or Contact me if you want us to work on something cool! The consultation is free!