At this point, it is expected that you will have a simple lab setup with some virtual devices and some simple test boards, each regularly passing health checks. It is generally a good idea to allow time for your lab to settle. Run a number of test jobs and understand the administrative burden before trying to expand your setup.
Once you are happy that things are working and you know how to run that simple lab, here are suggested steps to follow to grow it.
When planning your lab, keep in mind the basic requirements of automation and LAVA.
The simplest LAVA instance is a single master with a single worker on the same machine. Adding more devices to such an instance may quickly cause problems with load. Test jobs may time out on downloads or decompression and devices will go offline.
The first step in growing a LAVA lab is to add a remote worker. Remote workers can be added to any V2 master. To do so, use the Django administration interface to add new devices and device types, allocate some or all devices to the newly created remote worker and optionally configure encryption on the master and on the worker for secure communication between them.
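The same additions can also be scripted against the master's XML-RPC API instead of clicking through the Django administration interface. The sketch below is only illustrative: the method names (scheduler.workers.add, scheduler.device_types.add, scheduler.devices.add) and their argument order are assumptions based on the V2 scheduler API and should be checked against the api/help page of your own instance; the hostnames, device names and token are placeholders.

    import xmlrpc.client

    # Authenticate with a username and an API token created on the master;
    # all names below are placeholders.
    server = xmlrpc.client.ServerProxy(
        "https://admin:EXAMPLE-TOKEN@lava.example.com/RPC2")

    # Register the new remote worker with the master (assumed method name).
    server.scheduler.workers.add("worker01.example.com")

    # Create a device type and attach a device of that type to the new
    # worker (assumed method names and argument order).
    server.scheduler.device_types.add("beaglebone-black")
    server.scheduler.devices.add("bbb-01", "beaglebone-black",
                                 "worker01.example.com")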
As load increases, the master will typically benefit from having fewer devices directly attached to the worker running on the master machine. Complex labs will typically have devices attached only to remote workers.
Depending on the workload and admin preferences, there are several lab layouts that can make sense:
This is the starting layout for a fresh installation. Depending on the capability of the master, this layout can support a small variety of devices and a small number of users. This layout does not scale well. Adding too many devices or users to this setup can lead to the highest overall maintenance burden, per test job, of all the layouts here.
In all of these example diagrams, Infrastructure represents the extra equipment that might be used alongside the LAVA master and workers, such as mirrors, caching proxies etc.
A medium to large lab can operate well with a single master controlling multiple workers, especially if the master is a dedicated server running only lava-server.
A custom frontend can use custom result handling to aggregate data from multiple separate masters into a single data set. The different masters can be geographically separated and run by different admins. This is the system used to great effect by KernelCI.org.
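As a rough illustration of this kind of aggregation, the sketch below pulls result sets from two masters over XML-RPC and merges them into a single list that a frontend could then index and display. The hostnames, tokens and job IDs are placeholders, and the results.get_testjob_results_yaml method name is an assumption based on the V2 results API; verify it against api/help on each master.

    import xmlrpc.client
    import yaml  # PyYAML, used to parse the returned result records

    masters = {
        "lab-a": "https://user:TOKEN-A@lava-a.example.com/RPC2",
        "lab-b": "https://user:TOKEN-B@lava-b.example.com/RPC2",
    }
    jobs_to_fetch = {"lab-a": [1201, 1202], "lab-b": [88]}

    combined = []
    for lab, url in masters.items():
        server = xmlrpc.client.ServerProxy(url)
        for job_id in jobs_to_fetch[lab]:
            raw = server.results.get_testjob_results_yaml(job_id)
            for record in yaml.safe_load(raw):
                record["source_lab"] = lab  # remember which master it came from
                combined.append(record)

    # "combined" is now a single data set, independent of the individual
    # masters, which a custom frontend can filter and display.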
When different teams need different sets of device types and configurations and where there is little overlap between the result sets for each team, a micro-instance layout may make sense.
The original single lab is split into separate networks, each with a separate complete instance of a LAVA master and one or more workers. This will give each team their own dedicated micro-instance, but the administrators of the lab can use common infrastructure just like a single lab in a single location. Each micro-instance can be grown in a similar way to any other instance, by adding more devices and more workers.
The optimum configuration will depend massively on the devices and test jobs that you expect to run. Use the multiple masters, multiple workers option where all test jobs feed into a single data set. Use micro-instances where teams have discrete sets of results. Any combination of micro-instances can still be aggregated behind one or more custom frontends to get different overviews of the results.
As an example, the Linaro LAVA lab in Cambridge is a hybrid setup. It operates using a set of micro-instances, some of which provide results to frontends like KernelCI.org.
Some labs have found it beneficial to have identical machines serving as the workers, in identical racks. This makes administration of a large lab much easier. It can also be beneficial to take this one stage further and have a similar, if not identical, set of devices on each worker. If your lab has a wide range of test job submissions which cover most device types, you may find that a similar layout helps balance the load.
Consider local mirroring or caching of resources such as NFS rootfs tarballs, kernel images, compressed images and git repositories. It is valuable to make downloads to the worker as quick as possible - slow downloads will inflate the run time of every test.
One of the administrative problems of CI is that these images change frequently, so a caching proxy may be more effective than a direct mirror of the build system storage.
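LAVA does not provide such a cache itself; the sketch below only illustrates the idea of keying a local cache on the download URL, with a placeholder cache directory. Because CI images change frequently, a cache like this would serve stale files unless it were also keyed on content or honoured HTTP cache headers, which is exactly why a proper caching proxy is usually the better tool.

    import hashlib
    import os
    import urllib.request

    CACHE_DIR = "/var/cache/lab-downloads"  # placeholder location

    def fetch(url):
        """Return a local path for url, downloading only on a cache miss."""
        key = hashlib.sha256(url.encode()).hexdigest()
        path = os.path.join(CACHE_DIR, key)
        if not os.path.exists(path):
            os.makedirs(CACHE_DIR, exist_ok=True)
            urllib.request.urlretrieve(url, path)
        return path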
Conversely, the use of https:// URLs inside test jobs will typically make caches and proxies much less effective. Not supporting https:// access to git repositories or build system storage can have implications for the physical layout of the lab, depending on local policy.
Depending on the lab, local mirroring of one or more distribution package archives can also be useful.
Note
This may rely on the build system for NFS rootfs and other deployments being configured to always use the local mirror in those images. This can then have implications for test writers trying to debug failed test jobs without access to the mirror.
Consider the implications of persistence. LAVA does not (currently) archive old test jobs, log files or results. The longer a single master is collating the results from multiple workers, the larger the dataset on that master becomes. This can have implications for the time required to perform backups, extract results or run database migrations during upgrades.
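One mitigation is to export old results for offline storage before the data set on the master becomes unmanageable. The sketch below is only an outline: the results.get_testjob_results_yaml method name is an assumption based on the V2 API, and the server URL, token, job range and archive path are placeholders.

    import xmlrpc.client
    from pathlib import Path

    server = xmlrpc.client.ServerProxy(
        "https://user:EXAMPLE-TOKEN@lava.example.com/RPC2")
    archive = Path("/srv/lava-archive")  # placeholder archive location
    archive.mkdir(parents=True, exist_ok=True)

    for job_id in range(1, 1000):  # jobs considered old enough to archive
        try:
            results = server.results.get_testjob_results_yaml(job_id)
        except xmlrpc.client.Fault:
            continue  # job missing or not visible to this user
        (archive / f"job-{job_id}-results.yaml").write_text(results)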
Consider reliability concerns - each site should have UPS support, and some sites may need generators as well. This is not just needed for the master and workers: it will also be required for all the devices, the network switches and all your other lab infrastructure.
Devices in LAVA always need to remain in a state which can be automated. This may add lots of extra requirements: custom hardware, extra cabling and other support devices not commonly found in general hosting locations. This also means that LAVA is not suitable for customer-facing testing, debugging or triage.
Important
If the master and one or more of the workers are to be connected across the internet instead of within a locally managed subnet, encryption on the master and on all workers is strongly recommended.
LAVA V2 supports geographically separate masters and workers. Workers can be protected behind a firewall or even a NAT internet connection, without the need for dynamic DNS or similar services. Connections are made from the worker to the master, so the only requirement is that the ZMQ ports configured on the master are reachable from the internet; because those ports are exposed, they should use encryption.
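The sketch below is not LAVA's own code, only a conceptual illustration of that connection pattern using pyzmq: the worker dials out to the master and the traffic is protected with ZMQ CURVE keys, so the worker itself needs no inbound ports. The hostname is a placeholder, and 5556 is assumed to be the master port configured on your instance.

    import zmq

    MASTER = "tcp://lava-master.example.com:5556"  # placeholder master endpoint

    ctx = zmq.Context()
    worker = ctx.socket(zmq.DEALER)

    # CURVE encryption: the worker holds its own keypair plus the master's
    # public key. Both are generated here only to keep the sketch
    # self-contained; in practice the master's public key is copied to
    # each worker.
    worker_public, worker_secret = zmq.curve_keypair()
    master_public, _ = zmq.curve_keypair()

    worker.curve_publickey = worker_public
    worker.curve_secretkey = worker_secret
    worker.curve_serverkey = master_public

    # The worker initiates the connection, so it can sit behind NAT or a
    # firewall; only the master's ZMQ port must be reachable.
    worker.connect(MASTER)
    worker.send_multipart([b"HELLO"])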
Physically separating different workers is also possible, but has implications such as the encryption requirements described above.
The Linaro lab in Cambridge has provided most of the real-world experience used to construct this guide. If you are looking for guidance about how to grow your lab, please talk to us on the lava-devel mailing list.