Script runtime optimization with pexpect

Automation itself is a milestone when it comes to efficiency: tasks can be executed simultaneously on many devices, even during nights and holidays. Sometimes, however, it has its limits. As task execution time keeps growing, it’s a good idea to take a look at the code and think about optimizing how it uses pexpect. You may even be forced to do it if the runtime becomes unacceptable.

In today’s article, we will review the process of reloading a Cisco virtual C8k router, and think about how we can optimize the runtime of our script.

Workflow overview

The router will have an initial configuration. Since it runs on a KVM hypervisor, we will access it with the virsh console command. There is no username configured, so we don’t need to log in. After connecting to the VM console, we will look for the prompt, which can be in either user exec or privileged exec mode. If it’s in user exec mode, we need to execute the enable command first, because the reload command is only available in privileged mode. After the reboot, we will grab the prompt once again.
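Both scripts below tell the two modes apart by passing a list of patterns to expect, which returns the index of the pattern that matched. As a standalone sketch of the same decision, assuming IOS-style prompts where the hostname is followed by > in user exec mode and # in privileged mode, a regular expression can classify whatever the console printed. PROMPT_RE and prompt_mode are hypothetical names, and configuration-mode prompts such as Router(config)# are deliberately not covered.

```python
import re

# Hypothetical IOS-style prompt: a hostname followed by ">" (user exec)
# or "#" (privileged exec), alone on its line
PROMPT_RE = re.compile(r"^(?P<host>[\w.-]+)(?P<mode>[>#])\s*$", re.MULTILINE)


def prompt_mode(console_output):
    """Return (mode, hostname) for the last prompt in console_output,
    where mode is "user" or "privileged", or None if no prompt was found."""
    matches = list(PROMPT_RE.finditer(console_output))
    if not matches:
        return None
    last = matches[-1]
    mode = "privileged" if last.group("mode") == "#" else "user"
    return mode, last.group("host")


# A router in user exec mode still needs "enable" before "reload"
mode, host = prompt_mode("\r\nRouter>")
if mode == "user":
    print("{} is in user exec mode, send 'enable' first".format(host))
```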

In both scenarios, we will measure the time between connecting to the virsh console and finding the prompt after the reboot.
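Both scripts take this measurement with time.time(). If you end up timing many runs, the measurement can be wrapped in a small context manager. The sketch below uses time.monotonic() instead, since, unlike time.time(), it is not affected by system clock adjustments and is meant for measuring intervals. The helper name timed is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label="Execution time"):
    """Hypothetical helper: measure the wrapped block with a monotonic clock.
    The yielded dict receives the elapsed seconds under the "elapsed" key."""
    result = {}
    start = time.monotonic()
    try:
        yield result
    finally:
        result["elapsed"] = time.monotonic() - start
        print("--- {}: {:.2f} seconds ---".format(label, result["elapsed"]))


# Usage: wrap whatever should be measured, e.g. a call to reboot_with_sleep()
with timed() as timing:
    time.sleep(0.1)  # stand-in for the real work
```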

time.sleep

Looking at various scripts that use pexpect, I came to the conclusion that the sleep function from the time module is a favorite among pexpect users. In this example, we will combine the two.

Let’s start!

import pexpect
import time

user_exec_prompt = "Router>"
privileged_exec_prompt = "Router#"

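def reboot_with_sleep():
    start_time = time.time()
    # Attach to the VM console on the KVM host
    child = pexpect.spawn("virsh console vm1")
    # Press Enter so the router prints its prompt
    child.sendline("\r")
    
    # expect() returns the index of the pattern that matched
    index = child.expect([user_exec_prompt, privileged_exec_prompt])
    if index == 0:
        # reload is only available in privileged mode
        child.sendline("en")
        child.expect(privileged_exec_prompt)
    
    child.sendline("reload")
    # Raw string: the backslashes are meant for the regex engine, not for Python
    child.expect(r"Proceed with reload\? \[confirm\]")
    # Confirm the reload
    child.sendline("\r")
    
    # Wait a fixed amount of time for the router to boot
    time.sleep(200)

    child.sendline("\r")
    child.expect(user_exec_prompt)
    print("--- Execution time: {} seconds ---".format(time.time() - start_time))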

reboot_with_sleep()

The script is pretty straightforward. After executing the reload command, we have to confirm it, which we do by sending a new line, and the router reloads. What’s worth highlighting is the expect call right before the reload confirmation.

child.expect(r"Proceed with reload\? \[confirm\]")

Notice the backslashes before “?”, “[” and “]”. The expect method treats a string argument as a regular expression pattern, and those characters have a special meaning there. The backslash makes expect treat them literally. The r prefix (a raw string) makes sure Python passes the backslashes to the regex engine unchanged; without it, recent Python versions warn about invalid escape sequences.
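To see the effect outside pexpect, the snippet below runs the confirmation line through Python’s re module, which pexpect uses to compile string patterns. Only the escaped versions match the literal text, and re.escape() builds such a pattern automatically.

```python
import re

CONFIRM_LINE = "Proceed with reload? [confirm]"

PATTERNS = {
    # Unescaped: "?" makes the "d" optional and "[confirm]" becomes a character class
    "unescaped": "Proceed with reload? [confirm]",
    # Escaped by hand; the r prefix keeps Python from processing the backslashes
    "escaped": r"Proceed with reload\? \[confirm\]",
    # Escaped by the standard library
    "re.escape": re.escape(CONFIRM_LINE),
}

results = {name: bool(re.search(pattern, CONFIRM_LINE)) for name, pattern in PATTERNS.items()}
for name, matched in results.items():
    print("{:<10} matches: {}".format(name, matched))
```

In a script, pexpect’s expect_exact() method sidesteps the question entirely, because it matches plain strings instead of regular expressions.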

After confirming the reload command, we call time.sleep. It takes the number of seconds to wait, as an int or a float. In this case, we wait 200 seconds. Right after that, we send a new line and expect the router prompt.

When the script finishes, it prints the total execution time to the terminal.

--- Execution time: 202.91023445129395 seconds ---

Expect

The other way to handle the router reboot is to rely on pexpect’s expect method alone. We can use it to detect the point during boot after which the router is ready for us to log in.

But how can we deduce what to pass as the expect argument?

The solution is straightforward. We need to reboot the router manually, gather the logs, and choose the line after which the router is ready for the login process. I’ve chosen the following phrase.

Press RETURN to get started!

Once this message appears, I know that my router is ready: after sending a new line, I get the router prompt (if authentication were configured, I would be prompted for the username and then for the password instead).

Keep in mind that this message may vary between vendors, devices, and even software versions. You can choose another log message that suits you better, for example one reporting an interface state change or a service starting.

I’ve prepared a couple of lines from the router’s boot log that could potentially be used.

*Mar 14 11:46:19.912: %LINK-3-UPDOWN: Interface GigabitEthernet1, changed state to down
[...]
*Mar 14 11:46:21.301: %SSH-5-ENABLED: SSH 2.0 has been enabled
[...]
*Mar 14 11:46:23.752: %SYS-6-BOOTTIME: Time taken to reboot after reload =  105 seconds
[...]
*Mar 14 11:46:39.054: %DHCP-6-ADDRESS_ASSIGN: Interface GigabitEthernet1 assigned DHCP address 10.0.0.2, mask 255.255.255.0, hostname Router
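pexpect’s expect also accepts a list of patterns, so you don’t have to bet on a single message. When several patterns match, pexpect picks the one that appears earliest in the stream and breaks ties by list order. The sketch below mimics that rule with the standard re module against the log lines above (without the elided parts), so candidate patterns can be checked offline before they go into a script. READY_PATTERNS and first_ready_marker are hypothetical names.

```python
import re

# Boot log lines copied from the excerpt above, "[...]" gaps removed
BOOT_LOG = """\
*Mar 14 11:46:19.912: %LINK-3-UPDOWN: Interface GigabitEthernet1, changed state to down
*Mar 14 11:46:21.301: %SSH-5-ENABLED: SSH 2.0 has been enabled
*Mar 14 11:46:23.752: %SYS-6-BOOTTIME: Time taken to reboot after reload =  105 seconds
*Mar 14 11:46:39.054: %DHCP-6-ADDRESS_ASSIGN: Interface GigabitEthernet1 assigned DHCP address 10.0.0.2, mask 255.255.255.0, hostname Router
"""

# Hypothetical candidate "router is ready" markers; list order only breaks ties
READY_PATTERNS = [
    r"%SYS-6-BOOTTIME",
    r"Press RETURN to get started!",
    r"%SSH-5-ENABLED",
]


def first_ready_marker(patterns, text):
    """Return (index, match) for the pattern matching earliest in text,
    or (None, None) if nothing matches. Ties go to the lower list index."""
    best = None
    for index, pattern in enumerate(patterns):
        match = re.search(pattern, text)
        if match and (best is None or match.start() < best[1].start()):
            best = (index, match)
    return best if best else (None, None)


index, match = first_ready_marker(READY_PATTERNS, BOOT_LOG)
# SSH is logged before BOOTTIME, so the last pattern in the list wins
print(index, match.group(0))
```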

You know your infrastructure best, so it’s up to you to decide at which point your devices are ready for interaction.

There’s one more important thing. By default, each expect call waits up to 30 seconds for a match. In our case, that’s not enough: this timeout might be sufficient for some lightweight routers, but unfortunately not for most.

That’s why we have to pass the timeout argument to expect explicitly to extend it. In our case, we assume that waiting up to 300 seconds for Press RETURN to get started! will be enough.
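Unlike the sleep duration, this timeout is only an upper bound: expect returns as soon as the pattern shows up, so a generous value costs nothing when the router boots on time (if it does expire, expect raises pexpect.TIMEOUT). One way to pick a value, sketched below under the assumption that you keep the console output of a previous reload, is to read the %SYS-6-BOOTTIME message from the log and scale it by a safety margin. The helper name and the default margin of 2 are arbitrary choices for illustration, not recommendations from Cisco or pexpect.

```python
import re

# Matches the boot-time message shown in the log above and captures the seconds
BOOTTIME_RE = re.compile(
    r"%SYS-6-BOOTTIME: Time taken to reboot after reload =\s+(\d+) seconds"
)


def suggested_timeout(boot_log, margin=2.0, floor=30):
    """Hypothetical helper: return the last reported boot time multiplied by
    `margin` (never below `floor` seconds), or None if no BOOTTIME message exists."""
    reported = BOOTTIME_RE.findall(boot_log)
    if not reported:
        return None
    return max(floor, int(reported[-1]) * margin)


line = "*Mar 14 11:46:23.752: %SYS-6-BOOTTIME: Time taken to reboot after reload =  105 seconds"
print(suggested_timeout(line))  # 105 s * 2.0 -> 210.0
```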

Here’s the complete code.

import pexpect
import time

user_exec_prompt = "Router>"
privileged_exec_prompt = "Router#"


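def reboot_with_expect():
    start_time = time.time()
    # Attach to the VM console on the KVM host
    child = pexpect.spawn("virsh console vm1")
    # Press Enter so the router prints its prompt
    child.sendline("\r")
    
    # expect() returns the index of the pattern that matched
    index = child.expect([user_exec_prompt, privileged_exec_prompt])
    if index == 0:
        # reload is only available in privileged mode
        child.sendline("en")
        child.expect(privileged_exec_prompt)
    
    child.sendline("reload")
    # Raw string: the backslashes are meant for the regex engine, not for Python
    child.expect(r"Proceed with reload\? \[confirm\]")
    # Confirm the reload
    child.sendline("\r")
    
    # Wait for the boot message instead of a fixed delay; "!" needs no escaping.
    # The timeout is an upper bound, and pexpect.TIMEOUT is raised if it expires
    child.expect(r"Press RETURN to get started!", timeout=300)

    child.sendline("\r")
    child.expect(user_exec_prompt)
    print("--- Execution time: {} seconds ---".format(time.time() - start_time))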


reboot_with_expect()

Let’s check the execution time.

--- Execution time: 113.39121198654175 seconds ---

As you can see, it’s faster than the previous version: we save about 89 seconds here.

Other approaches

We went through two scripts, but there are more solutions to this problem. For example, you can implement a dedicated function that repeatedly tries to find the router prompt, starting right after the reboot. If you adjust the wait timers between attempts properly, it will be efficient.

As a very basic template, you can take a code snippet from this article. Its while loop can be enhanced to find the prompt in such a scenario.
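A minimal sketch of such a retry loop, assuming a hypothetical probe() callable that returns True once the prompt is found. With pexpect, probe could send a new line and call expect with a short timeout, returning False when pexpect.TIMEOUT is raised. The clock and sleep functions are injectable only so the loop can be tested without real waiting; the interval and deadline defaults are placeholders to tune for your devices.

```python
import time


def wait_for_prompt(probe, interval=10.0, deadline=300.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Call probe() every `interval` seconds until it returns True.

    Returns the elapsed seconds on success; raises TimeoutError when the
    next attempt would start after `deadline` seconds."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        # Give up instead of sleeping past the deadline
        if clock() - start + interval > deadline:
            raise TimeoutError("prompt not found within {} seconds".format(deadline))
        sleep(interval)
```

Keep the interval generous: as discussed in the comparison below, every probe that sends a new line to a booting machine can interfere with its boot process.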

Method comparison

Every method has its advantages and disadvantages. Let’s take a glance at them.

Expect combined with time.sleep

A huge advantage of this method is its simplicity. We just wait a fixed amount of time after executing the reload command. On the other hand, “just waiting” is also its disadvantage.

What value should we pass to sleep?

We can measure how long it takes for the router to reboot and use that value.

But what if the router takes longer to reload for some unexpected reason?

Our script may simply fail.

The easiest way to overcome this is to add a time buffer, for example another 10 seconds. With that approach, however, every router that boots in the usual time wastes those extra seconds. It’s not much of a problem in a small infrastructure, but it becomes a huge one at scale. The execution times of the two scripts differed by 89 seconds. If the routers are reloaded one after another, with 5 routers the difference grows to over 7 minutes, and with 50 to over an hour.
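The scaling argument is simple arithmetic, assuming the routers are reloaded sequentially so that the per-router difference adds up. A quick check using the two execution times measured in this article:

```python
# Execution times measured above (seconds)
SLEEP_RUN = 202.91023445129395
EXPECT_RUN = 113.39121198654175
PER_ROUTER_SAVING = SLEEP_RUN - EXPECT_RUN  # about 89.5 s


def sequential_saving_minutes(routers):
    """Minutes saved by the expect approach when `routers` devices are
    reloaded one after another."""
    return routers * PER_ROUTER_SAVING / 60


for count in (1, 5, 50):
    print("{:>3} routers: {:.1f} minutes saved".format(count, sequential_saving_minutes(count)))
```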

Pure expect

This implementation is more complicated than combining expect with time.sleep, but its execution time is optimal. However, I see one main drawback in this method: the pattern passed to expect, which tells us that the router is available after the reload. We can use a specific log message there, but such messages tend to change between software releases, so you may be forced to troubleshoot your script from time to time. Another drawback is that if you have multiple device models and vendors, it’s hard to find a universal pattern that indicates the router is ready for interaction after the reboot.

Find prompt method

I’ve mentioned this approach before. It seems to be a good idea, because with properly adjusted timers between retries, the execution time is near optimal. It’s also easier to make this method vendor-agnostic than the previous ones.

The main drawback I see in this approach is that it periodically refreshes the pexpect buffer by sending a new line to the console, which can interfere with the machine’s boot process. Because of that, the find-prompt function needs to handle such situations, which makes it the most complicated of the three.

Summary

It turns out that a seemingly simple process, such as a machine reload, is not that simple if you want time-efficient and resilient code.

We went through three ways of handling it. There are more options available, including combinations of these three.

Finding the best solution really comes down to knowing your environment, limitations, and needs. Based on that, you can work out what suits you best.
