ClusterShell: parallel SSH on many hosts
How do you gather uptime information from a large number of remote hosts? Open a bunch of terminals and paste the command into each of them? Loop over the hosts with a shell script? Thankfully, there is a better way.
Introduction
I have learned a lot during the Locked Shields exercise. One of the key takeaways for me was the importance of quickly running ad-hoc commands on the many machines I administered. Ansible wasn’t quite suitable for this task, and I refused to write another bad shell script. The solution: ClusterShell.
ClusterShell (or `clush`) is a CLI tool that allows us to run shell commands on many hosts and copy files to and from them. Outputs can be saved to a directory for later analysis. In this article, I go through the configuration, common use cases, and a few gotchas to be aware of. But first, here’s a demo:
Configuring the hosts
Host file
The simplest way to specify the hosts to run on is a plain text file:
# example.txt
host01.example.com
host02.example.com
host03.example.com
We use the `--hostfile` flag to target all hosts in the file. In the following example, we run the command `id` on the hosts specified in the file:
clush --hostfile example.txt id
# host01.example.com: uid=1000(user) gid=1000(user) groups=1000(user),27(sudo)
# host02.example.com: uid=1000(user) gid=1000(user) groups=1000(user),27(sudo)
# host03.example.com: uid=1000(user) gid=1000(user) groups=1000(user),27(sudo)
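If many hosts return the same output, the `-b` flag gathers identical lines into a dshbak-style summary instead of printing one line per host, which keeps large runs readable:
# identical output is grouped and reported once per set of hosts
clush --hostfile example.txt -b id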
It works, but specifying the host file over and over gets annoying. We also miss out on advanced features, such as groups and patterns. More about that later.
YAML files
Let’s create a ClusterShell groups configuration file in our local config directory:
# ~/.config/clustershell/groups.conf
[Main]
autodir: $CFGDIR/groups
default: staticyaml
The `Main` section contains two parameters:
- `autodir` - directory with YAML files containing sources, groups, and hosts
- `default` - the default source
As you can see above, the default source is `staticyaml`, and the groups directory is `~/.config/clustershell/groups/`. Let’s now create the YAML file inside of it and define a few hosts:
# ~/.config/clustershell/groups/hosts.yml
staticyaml:
  example:
    - host01.example.com
    - host02.example.com
    - host03.example.com
  homelab:
    - homelab01.localdomain
    - homelab02.localdomain
The file contains one source called `staticyaml` and two groups, `example` and `homelab`, each with several hosts. We can now run `clush` like this:
# run `uptime` on all hosts from source staticyaml
clush -a -s staticyaml uptime
# staticyaml is the default source, so we don't have to specify it
# explicitly
clush -a uptime
# run `uptime` on homelab group
clush -g homelab uptime
# run `uptime` on hosts matching the pattern
clush -w 'host[01-02]*' uptime
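The bracket expressions above are ClusterShell node sets. If you are unsure what a pattern covers, the `nodeset` utility that ships with ClusterShell can expand and fold sets:
# expand a node set into individual host names
nodeset -e 'host[01-03].example.com'
# host01.example.com host02.example.com host03.example.com

# fold individual host names back into a compact node set
nodeset -f host01.example.com host02.example.com host03.example.com
# host[01-03].example.com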
External commands
The third and most powerful way is to specify external commands that return the hosts. We can, for example, use this to derive hosts from an Ansible inventory or from the SSH configuration.
Let’s extend the `groups.conf` configuration:
# ~/.config/clustershell/groups.conf
[Main]
autodir: $CFGDIR/groups
default: staticyaml
[ssh]
map: grep -Po "(?<=^Host).*$" ~/.ssh/config | tr -d ' '
all: grep -Po "(?<=^Host).*$" ~/.ssh/config | tr -d ' '
[ls14]
map: $CFGDIR/ansible_hosts.py --cwd /home/me/code/ls14/ansible --group $GROUP
all: $CFGDIR/ansible_hosts.py --cwd /home/me/code/ls14/ansible
list: $CFGDIR/ansible_hosts.py --cwd /home/me/code/ls14/ansible --list
We have now configured two more sources called `ssh` and `ls14`. Each source has to specify a few parameters:
- `map` - returns hosts that belong to a particular group (required)
- `all` - returns all hosts (optional)
- `list` - returns all groups (optional)
Note
The current working directory of the `map`, `all`, and `list` commands is the configuration directory, not the directory from which you start `clush`. Unfortunately, that’s the way `clush` works.
The `ssh` source includes all hosts defined in your `~/.ssh/config` file. This only works if your SSH config doesn’t include any wildcards or patterns. The `ls14` source specifies a Python script that returns hosts and groups based on an Ansible inventory.
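If you would rather not maintain a separate script, a rough equivalent can be assembled from `ansible-inventory` and `jq`. The source below is only a sketch under those assumptions; it reads the inventory on every call and does not expand groups that only contain child groups:
# ~/.config/clustershell/groups.conf (illustrative extra source)
[ansible_jq]
map: ansible-inventory -i /home/me/code/ls14/ansible --list | jq -r --arg g "$GROUP" '.[$g].hosts[]?'
all: ansible-inventory -i /home/me/code/ls14/ansible --list | jq -r '._meta.hostvars | keys[]'
list: ansible-inventory -i /home/me/code/ls14/ansible --list | jq -r 'keys[] | select(. != "_meta" and . != "all")'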
Now we have three sources in total: `staticyaml`, which we created in the beginning, and `ssh` and `ls14`, which we have just added. Remember, we have configured `staticyaml` to be the default, so we now have to specify the other sources explicitly:
# run `uptime` on all hosts in ~/.ssh/config
clush -s ssh -a uptime
# run `uname -r` on all hosts defined in the ansible inventory
clush -s ls14 -a uname -r
# run `uname -r` on the group private defined in the ansible inventory
clush -s ls14 -g private uname -r
That’s about it for configuring ClusterShell. Let’s clarify a few remaining questions.
Q&A
Are the SSH sessions persistent?
No, a new session is started for each command, even in interactive mode. Notice that, for example, changing directories does not persist:
clush -w homelab02.localdomain
Enter 'quit' to leave this interactive mode
Working with nodes: homelab02.localdomain
clush> pwd
homelab02.localdomain: /home/user
clush> cd /
clush> pwd
homelab02.localdomain: /home/user
Instead, we have to do this:
clush -w homelab02.localdomain
Enter 'quit' to leave this interactive mode
Working with nodes: homelab02.localdomain
clush> cd / && pwd
homelab02.localdomain: /
Another option is passing bash scripts.
How to run bash scripts?
cat script.sh | clush -w homelab02.localdomain bash
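The script.sh used here is not shown; judging from the output in the next section, it could be as simple as this (a reconstruction, not the original file):
#!/bin/bash
# script.sh - prints the working directory and kernel name, then writes
# a line to stderr, matching the example output below
pwd
uname
echo "This goes to stderr" >&2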
How to capture output?
Watch out
The files are overwritten each time the command is run. Output is not appended.
cat script.sh | clush --outdir outdir --errdir errdir -w 'homelab[01-02].localdomain' bash
# homelab01.localdomain: /home/user
# homelab01.localdomain: Linux
# homelab01.localdomain: This goes to stderr
# homelab02.localdomain: /home/user
# homelab02.localdomain: Linux
# homelab02.localdomain: This goes to stderr
tree outdir errdir
# errdir
# ├── homelab01.localdomain
# └── homelab02.localdomain
# outdir
# ├── homelab01.localdomain
# └── homelab02.localdomain
cat errdir/homelab02.localdomain
# This goes to stderr
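Because every host gets its own file, the captured output can be analysed later with ordinary tools:
# list the hosts whose stdout contains "Linux"
grep -l Linux outdir/*
# outdir/homelab01.localdomain
# outdir/homelab02.localdomain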
How to copy files?
# upload files
clush -w 'homelab[01-02].localdomain' --copy .zshrc .dotfiles/
# upload files to specified directory
clush -w 'homelab[01-02].localdomain' --copy .zshrc .dotfiles/ --dest /tmp/upload
# download files
mkdir download
clush -w 'homelab[01-02].localdomain' --rcopy .bash_history --dest download
tree download
# download
# ├── .bash_history.homelab01.localdomain
# └── .bash_history.homelab02.localdomain
How about `sudo`?
If you cannot SSH as root and you need to use sudo to run privileged tasks, create a new file `clush.conf` in your ClusterShell configuration directory and add the following lines. Here, we define a mode and set a command prefix that is prepended to all commands run via clush:
# ~/.config/clustershell/clush.conf
[mode:sudo]
password_prompt: yes
command_prefix: /usr/bin/sudo -S -p "''"
You can now run clush with the `-m sudo` flag. All hosts must share the same sudo password, which is far from ideal. It is more secure to allow root to SSH in with an SSH key backed by FIDO2 or TPM.
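With the mode configured, a privileged run could look like this (assuming every target host accepts the same sudo password):
# clush asks for the sudo password once and feeds it to `sudo -S` on each host
clush -m sudo -g homelab whoami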
Various SSH keys, ports, hostnames, options, etc.
Use your SSH config to set these. There is a tool, ansible-inventory-to-ssh-config, that can help with that.
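Since clush connects over plain SSH, anything you put in ~/.ssh/config is picked up automatically. An entry could look like this (the address, port, user, and key path are illustrative):
# ~/.ssh/config
Host homelab02.localdomain
    HostName 192.0.2.12
    Port 2222
    User admin
    IdentityFile ~/.ssh/id_ed25519_homelab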
How about SSH host keys?
SSH host keys are used to authenticate the server to the client. Before the first connection, the user normally has to verify the host key interactively, which does not scale to many hosts. Here’s a simple bash script that fetches the host keys for a list of domains:
# read host names from standard input, one per line
while read -r domain; do
    # skip hosts that already have an entry in known_hosts
    if ! ssh-keygen -F "$domain" -f ~/.ssh/known_hosts | grep -q found; then
        ssh-keyscan -H "$domain" >> ~/.ssh/known_hosts
    fi
done
The script does not work for IP addresses and non-default ports. For those cases, the command would look something like this:
if ! ssh-keygen -F '[127.0.0.1]:2222' -f ~/.ssh/known_hosts | grep -q found; then
    ssh-keyscan -p 2222 -H '127.0.0.1' >> ~/.ssh/known_hosts
fi
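To prime known_hosts for a whole ClusterShell group, the loop above can be fed straight from the nodeset utility (here the loop is assumed to be saved as add-host-keys.sh):
# expand the homelab group into one host name per line and scan each of them
nodeset -e @homelab | tr ' ' '\n' | bash add-host-keys.sh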