GitHub Actions is a handy CI/CD system conveniently built into GitHub
and nicely integrated with the rest of the source code management side of the
house. It's pretty flexible: it can be triggered in numerous ways, and
workflows can be connected together to suit a variety of use cases.
Components of varying complexity can be broken out and shared as well.
It's really nice stuff and way better than the bad old days of managing a
sketchy Jenkins node, but it does have a few limitations.
One such limit is the 6-hour maximum run time for a job. After that, the job is unceremoniously killed and the workflow will likely be marked as failed. This applies not only to GitHub-hosted runners, but to self-hosted runners as well.
For quick builds and tests this isn't an issue, but on occasion you may run into use cases where you really need something to be able to work beyond this limitation.
By leveraging some features of GitHub Actions and a few shell tricks, I'll demonstrate one way to work around this.
Today we will cover:
- Job dependencies
- Job outputs
- Controlling job time with timeout
- Saving and restoring artifacts
- Other considerations
Job Dependencies
A GitHub Actions workflow is made up of one or more jobs, each containing one or more steps. The six-hour limit applies to the cumulative time of a single job, not to the workflow as a whole. We can use this to our advantage and string together multiple jobs.
GitHub Actions handles job dependencies with the needs keyword. For example, I may have two jobs, try1 and try2. I can make try2 depend on and wait for try1 by adding the following to the try2 job:
try2:
  needs: try1
For complex workflows, it is possible to have a list of dependencies, or to make execution depend on an 'if' block. Let's leverage that ability in the next step.
Job Outputs
We would now like to control whether our next job executes based on a value from a step in our current job. We can set environment variables and refer to these within a job like so:
- name: Set env var
  run: echo "MYVAR=1234" >> "$GITHUB_ENV"
This won't quite work for us though. Subsequent jobs will not have access to this environment, and there are various other restrictions on the contexts in which you can refer to this variable. Instead, we need to look into setting outputs.
First we need to set the output from a step within a job:
- name: Set step output
  id: mystep
  run: echo "done=true" >> "$GITHUB_OUTPUT"
This syntax mirrors the environment variable setting syntax, but here
we indicate that this is a step output. We will want to refer to this
particular step later, so we specify an id of mystep as well.
At this point we can refer to our output like so: steps.mystep.outputs.done.
We can use this in any way we would normally use a GitHub context,
such as in logic statements or other references.
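To see what that echo line actually does, one option is to try it outside a runner. The sketch below is only a local stand-in: on a real runner GitHub provides the GITHUB_OUTPUT file and reads it back after the step finishes. The temporary file and the second output name, attempt, are assumptions made up for illustration.

```shell
# Local stand-in for the file the runner normally provides (assumption:
# on a real runner GITHUB_OUTPUT is already set and should not be overridden).
GITHUB_OUTPUT="$(mktemp)"

# Same command as the workflow step: append a name=value line.
echo "done=true" >> "$GITHUB_OUTPUT"

# A second, hypothetical output simply becomes another line in the same file.
echo "attempt=1" >> "$GITHUB_OUTPUT"

# Inspect the lines the runner would read once the step has finished.
cat "$GITHUB_OUTPUT"
```

Because outputs are just lines appended to a file, a step can set several of them, and each one would be referenced by its own name under steps.mystep.outputs.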
This output still cannot be referenced outside of this job, however. We need to explicitly declare that our job has outputs as well. For example:
try1:
  outputs:
    done: ${{ steps.mystep.outputs.done }}
At this point we can refer to the outputs from the job try1 from the
subsequent jobs that we set up with our dependencies above.
Let's make our second job execution conditional:
try2:
  if: needs.try1.outputs.done == 'false'
  needs: try1
Now the job try2 will only execute if the output of job try1 is set to the
string 'false'. Note that this output is available from the needs context.
Without specifying the job in needs we will not have access to this job
output.
Okay, we have our basic flow sorted. Now, how will we run or skip subsequent jobs based on a timeout?
Controlling Job Timeout
GitHub Actions does not have a direct way of gracefully exiting a job after a specific amount of time; the built-in timeout-minutes setting simply stops the job or step. Luckily, the GNU coreutils timeout command can help us here. It lets us set a maximum run time for a command and sends a signal once that time is up. It is quite configurable: we can choose whether to preserve the original command's exit status, send a signal other than the default SIGTERM, or follow up with a SIGKILL.
A simple example:
$ timeout 30s sleep 90
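The options mentioned above show up in the exit status that timeout returns. The following is a small sketch using short durations so it finishes in a few seconds; the variable names are only for illustration, and it assumes the GNU coreutils version of timeout.

```shell
# Capture each status with "|| var=$?" so the snippet also survives "set -e".

# When the limit is hit, timeout itself exits with status 124.
timed_out_status=0
timeout 1s sleep 5 || timed_out_status=$?
echo "timed out:  $timed_out_status"

# If the command finishes in time, its own exit status is passed through.
finished_status=0
timeout 5s true || finished_status=$?
echo "finished:   $finished_status"

# --preserve-status reports the command's status instead of 124; a process
# ended by SIGTERM (signal 15) is reported as 128 + 15 = 143.
preserved_status=0
timeout --preserve-status 1s sleep 5 || preserved_status=$?
echo "preserved:  $preserved_status"

# -k 2s sends SIGKILL if the command is still running 2s after SIGTERM.
# sleep exits on SIGTERM, so no SIGKILL is needed and the status is still 124.
kill_after_status=0
timeout -k 2s 1s sleep 5 || kill_after_status=$?
echo "kill-after: $kill_after_status"
```

The 124 status is the useful one for our purposes, since it lets a script tell "ran out of time" apart from "the command failed".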
We will want to get a bit fancier and combine this with our output setting step above:
- name: Set step output
  id: mystep
  run: timeout 300m ./longcommand.sh && echo "done=true" >> "$GITHUB_OUTPUT" || echo "done=false" >> "$GITHUB_OUTPUT"
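In the above example, we set a max timeout of 300 minutes for the script
longcommand.sh. If this times out, done is set to 'false'; if we don't time
out, done is set to 'true'. Note that any other failure of the script also
sets done to 'false', since the one-liner only checks for a non-zero exit
status. This is a simple example and you may wish to have more complex
logic here.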
Note that the timeout is set to 300 minutes, not 360 minutes. Recall that the max job timeout is for the entire job, including any job setup, cleanup, and storage. Here we have added an ample buffer.
Now our step output, and therefore our job output, will be set dynamically based on the result of this command.
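As one possible take on the "more complex logic" mentioned above, the sketch below separates a timeout (status 124 from GNU timeout) from a genuine failure of the command. The helper name run_with_limit is hypothetical and not part of GitHub Actions, and the mktemp fallback exists only so the snippet can be tried outside a runner.

```shell
# Local stand-in so the sketch can be tried outside a runner (assumption:
# on a real runner GITHUB_OUTPUT is already set and this line does nothing).
: "${GITHUB_OUTPUT:=$(mktemp)}"

# Hypothetical helper. Usage: run_with_limit LIMIT COMMAND [ARGS...]
run_with_limit() {
  local limit="$1" status=0
  shift
  # "|| status=$?" keeps the script alive under "bash -e", the runner default.
  timeout "$limit" "$@" || status=$?
  case "$status" in
    0)   echo "done=true"  >> "$GITHUB_OUTPUT" ;;   # finished in time
    124) echo "done=false" >> "$GITHUB_OUTPUT" ;;   # hit the limit: hand off
    *)   echo "command failed with status $status" >&2
         return "$status" ;;                         # real failure: fail the step
  esac
}

# Short limits so the demo finishes quickly; a workflow step would instead
# call something like: run_with_limit 300m ./longcommand.sh
run_with_limit 1s sleep 5
run_with_limit 5s true
cat "$GITHUB_OUTPUT"
```

With this shape, a crash in the long-running command fails the job instead of quietly scheduling another attempt, while a timeout still sets done to 'false' so the next job picks up the work.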
Saving and Restoring Artifacts
In some scenarios, we may wish to save our work in progress if our job was interrupted, but do something different if our work is complete. This can be accomplished by leveraging the workflow artifact storage feature while referencing the outputs we set up above. For example:
- name: Success!
  if: steps.mystep.outputs.done == 'true'
  run: |
    echo "Success we are done!"
- name: Saving work.
  if: steps.mystep.outputs.done == 'false'
  uses: actions/upload-artifact@v3
  with:
    name: inprogress
    path: ./work
    retention-days: 2
Our downstream job can then restore this artifact before we restart our work:
- name: Restore inprogress work
  uses: actions/download-artifact@v3
  with:
    name: inprogress
    path: ./work
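Saving and restoring only helps if the long-running command can pick up where it left off. The following is a minimal sketch of what a resumable longcommand.sh might look like, assuming the work can be split into numbered items and that the restored files end up back in ./work. The names process_items, next_item, TOTAL_ITEMS and ITEM_SECONDS are all hypothetical.

```shell
# Hypothetical stand-in for longcommand.sh: processes numbered work items and
# records progress in ./work so an interrupted run can be resumed later.
WORK_DIR="${WORK_DIR:-./work}"
TOTAL_ITEMS="${TOTAL_ITEMS:-3}"

process_items() {
  mkdir -p "$WORK_DIR"
  # Resume from the last checkpoint if an earlier job restored one.
  next=1
  if [ -f "$WORK_DIR/next_item" ]; then
    next="$(cat "$WORK_DIR/next_item")"
  fi
  while [ "$next" -le "$TOTAL_ITEMS" ]; do
    echo "processing item $next"
    sleep "${ITEM_SECONDS:-1}"              # placeholder for the real work
    next=$((next + 1))
    echo "$next" > "$WORK_DIR/next_item"    # checkpoint after every item
  done
}

process_items
```

Checkpointing after every item means the script does not need to react to the SIGTERM sent by timeout: whatever was completed before the signal is already on disk when the upload step runs.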
Other Considerations
A full example of the concepts above can be found here.
Depending on the logic of your jobs and the number of dependent job runs your use case needs, it may make sense to capture some of this in a reusable workflow or perhaps a composite action. That'll be a discussion for another day.