3. Faster: Automating your scripts
Now that you have a few scripts to play with, you might want to find tools that amplify them, automating your automations and letting machines take the strain for you. I’ll introduce a few tools below and I hope they’re helpful. None of them was too difficult to install or set up, though I would recommend treading carefully with any process that needs sudo. If you have a question that directly relates to my blog I’d be happy to chat too, so drop me a line.
Netdata isn’t really an automation tool, but I’m adding it here first because automation might increase your CPU usage, and multiple scripts can quickly overburden your system, leading to out-of-memory kills or crashes. Stephen McConnachie installed it on the RAWcooked virtual machine a few months ago and it has provided excellent insight into CPU performance, particularly when juggling multiple jobs with just 8 cores and 12GB of RAM. I regularly consult netdata’s CPU display to check usage is not in the high 90s before I decide to run a quick FFmpeg framemd5 command or mediainfo enquiry. It provides live performance monitoring for systems and applications via a web browser (see image above), while storing long-term metrics for several weeks. The best thing about it, though, is that it’s free open-source software! It runs on Linux, macOS and a few others, and the project is hosted on GitHub.com where you can download it. Take a look at their live demos, which show everything this software is capable of, something I need to investigate further!
You’ll have seen GNU Parallel called throughout the BFI’s scripts, where it’s being used to trigger RAWcooked, mv commands and rm commands. Parallel is a shell tool which executes multiple jobs in parallel, hence the name. It has some useful command line features such as --eta, which gives an estimated countdown to job completion, and --joblog, which logs job successes and fails. When --joblog is combined with --resume or --resume-failed, Parallel uses the log to restart jobs that haven’t completed or that failed. It’s an external programme so you will need to install it, and you can find out more at GNU.org. Once installed, check out the man parallel page via Terminal, which is packed full of operational instructions. There are tons of introductions to GNU Parallel on the internet, and blogs with really great information, so I’ll let you have a search. I will direct you, though, to a really great series of intro videos on YouTube by Ole Tange.
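As a rough illustration of the options mentioned above, here is a minimal sketch that runs three toy jobs under GNU Parallel with a job log, then repeats the same command with --resume. The item names (reel_a and so on) and the temporary paths are made up for the example, and it assumes GNU Parallel is the parallel found on your PATH.

```shell
#!/bin/sh
# Sketch only: toy echo jobs standing in for real RAWcooked, mv or rm commands.
# Bail out quietly if GNU Parallel is not the `parallel` on this machine.
parallel --version 2>/dev/null | grep -q 'GNU parallel' || {
  echo "GNU Parallel not found, skipping demo"
  exit 0
}

work_dir=$(mktemp -d)          # throwaway directory for the job log
joblog="$work_dir/jobs.log"

# First pass: run two jobs at a time and record each one in the job log.
# Add --eta here to see an estimated countdown while real jobs run.
first_pass=$(parallel --jobs 2 --joblog "$joblog" 'echo processed {}' ::: reel_a reel_b reel_c)
printf '%s\n' "$first_pass"

# Second pass: the same command plus --resume. Jobs already listed in
# the job log are skipped, so nothing is run again.
second_pass=$(parallel --jobs 2 --joblog "$joblog" --resume 'echo processed {}' ::: reel_a reel_b reel_c)
printf 'second pass ran %s job(s)\n' "$(printf '%s' "$second_pass" | grep -c processed)"
```

Opening the job log afterwards (cat "$joblog") shows one row per job, including its exit value, which is the information --resume-failed relies on.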
Cron is like a mystical automation tool I’ve heard mentioned a few times in various IT offices by Linux ‘yodas’ I’ve encountered. I didn’t dream I would ever be able to access and utilise it so easily and frequently in my day-to-day workflows. Cron (from chronos, the Greek word for time) is a job scheduler available on Unix-style systems including Linux and macOS.
A crontab (cron table) contains a series of lines, each representing an individual cron job. As in scripts, you can comment a line out by prefixing it with # if you want to pause a job for a while. Each line has a fixed format that lists, in order: minute, hour, day of the month, month, day of the week, user and the command to execute. (The user field appears only in the system crontab; per-user crontabs edited with ‘crontab -e’ leave it out.)
*/15 * * * * root /path_to_script/bash.sh
In the picture above you can see this pattern being followed. The white lines are active cron jobs. The top two activate rawcook.sh and are set to run every fifteen minutes, all day and night. The bottom two run post_rawcook.sh every eight hours, the first on the hour and the second at five minutes past, to stop them activating simultaneously. So the rawcook.sh cron jobs activate on the hour and every fifteen minutes following, and post_rawcook.sh activates at midnight, 8am and 4pm.
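To make the field order concrete, here is a small sketch that takes entries written to match the schedule described above and splits each one into its seven fields. The entries are a reconstruction from the description, not a copy of the real crontab: the paths are placeholders, only one rawcook.sh line is shown, and the flock details are left out.

```shell
#!/bin/sh
# Hypothetical /etc/crontab entries matching the schedule described above.
# Paths are placeholders, not the real BFI paths.
sample_crontab() {
  cat <<'EOF'
*/15 * * * * root /path_to_script/rawcook.sh
0 */8 * * * root /path_to_script/post_rawcook.sh
5 */8 * * * root /path_to_script/post_rawcook.sh
EOF
}

# Split each entry into the seven fields of a system crontab line.
# The last variable (command) soaks up everything left on the line.
describe_crontab() {
  while read -r minute hour dom month dow user command; do
    printf 'minute=%s hour=%s day=%s month=%s weekday=%s user=%s command=%s\n' \
      "$minute" "$hour" "$dom" "$month" "$dow" "$user" "$command"
  done
}

sample_crontab | describe_crontab
```

Reading the fields back this way is a quick check that */15 sits in the minute column and */8 in the hour column, which is what gives the quarter-hourly and eight-hourly schedules.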
With our Ubuntu virtual machines we run the cron jobs from the system crontab, /etc/crontab, as the root user. Using the system crontab isn’t always recommended, and you should be careful not to change any of the system jobs already listed in it. Most guides will tell you to edit cron jobs with ‘crontab -e’ in Terminal, but as I’m editing the system crontab I use ‘sudo nano /etc/crontab’ and edit the table directly. Read more about it at Configuring Crontab. And if you need help configuring your job lines, specifically how to schedule the date and time, then check out Crontab Guru, whose examples page I’ve found incredibly helpful. Out of the same stable as Crontab Guru is Cronitor, a cron monitoring programme that helps you when cron goes drastically wrong (and it can if you don’t have any measures to prevent scripts running concurrently). Thankfully, I recently discovered Flock, and if you’re a Linux user I’d recommend you check it out.
Flock was a delightful little find for me a few months ago, when I was struggling to stop too many cron jobs starting at once. I was aware that lock files were being used for other BFI scripts, and a search led me to Linux’s own Flock. You may have seen in the image above a Flock Lock Details column. It includes a path to the flock programme, instructions to the software and a path to a flock lock file, which you will need to create yourself before you can use flock.
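Here is a minimal sketch of the locking behaviour, assuming flock from util-linux is installed. Two jobs ask for the same lock file; the second finds it busy and is skipped. The -n (don’t wait) option and the temporary lock path are my choices for the demonstration, and the options used in the crontab pictured above may differ.

```shell
#!/bin/sh
# Sketch: two jobs contend for one lock file; the second is skipped.
# Assumes flock (util-linux) is installed. The lock is a throwaway temp file.
command -v flock >/dev/null 2>&1 || { echo "flock not found, skipping demo"; exit 0; }

lock_file=$(mktemp)

# First job: take the lock without waiting (-n) and hold it for 3 seconds.
flock -n "$lock_file" sleep 3 &
first_job=$!
sleep 1   # give the first job time to acquire the lock

# Second job: same lock, same -n option. The lock is busy, so flock exits
# with a non-zero status straight away and never runs the command.
if flock -n "$lock_file" true; then
  second_result="ran"
else
  second_result="skipped"
fi
echo "second job: $second_result"

wait "$first_job"

# Once the first job has finished the lock is free, so the command runs.
if flock -n "$lock_file" true; then
  third_result="ran"
else
  third_result="skipped"
fi
echo "third job: $third_result"

rm -f "$lock_file"
```

In a crontab the same idea sits in front of the script path, so a job that is still running when its next slot comes round simply causes that slot to be skipped.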
I put my lock files in /var/run, which is home to many runtime files such as PIDs and locks, and named them rawcook1.lock and post_rawcooked2.lock, etc. I had to set the permissions with chmod 777 to ensure they could be written to. Once you have cron jobs working and the locks are active you can use ‘fuser -v /var/run/rawcook1.lock’ to view the active process IDs (PIDs) associated with the job. As shown in the image here, it usefully lists each PID and the command associated with it. So I can see from these two locks that three instances of RAWcooked are operating, and only one of those is currently encoding using FFmpeg. If, for any reason, I needed to terminate these script processes I could use the kill command against PID 5095 to terminate the bash element. If you have no current jobs locked when you run the ‘fuser’ command, Terminal will just return nothing. On occasion these lock files have been erased from /var/run (on most modern Linux systems it is emptied at every reboot), so you might need to recreate them from time to time.
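Because the lock files can vanish, a small helper along these lines could recreate any that are missing. This is only a sketch: LOCK_DIR is a hypothetical variable that defaults to a throwaway directory so the example can be tried without sudo, whereas on the machine described here it would point at /var/run.

```shell
#!/bin/sh
# Sketch: recreate missing lock files. LOCK_DIR is a hypothetical variable;
# it defaults to a temporary directory so no sudo is needed to try this out.
lock_dir=${LOCK_DIR:-$(mktemp -d)}

ensure_lock() {
  if [ -e "$1" ]; then
    echo "exists:  $1"
  else
    # Create the file and open up its permissions, as described above.
    touch "$1" && chmod 777 "$1" && echo "created: $1"
  fi
}

# Lock file names taken from the text above.
for name in rawcook1.lock post_rawcooked2.lock; do
  ensure_lock "$lock_dir/$name"
done
```

Running it a second time against the same directory reports the files as existing and leaves them untouched.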
Flock combined with bash scripts and cron scheduling has created my dream RAWcooking team and never causes me any problems with overloading the CPU. The one minor drawback I have experienced with Flock is with emailed error messaging, which I tried to implement recently. Crontab allows you to specify an email address that it will send stderr messages out to if a job fails. But because my cron jobs fire every 15 minutes, and Flock reports an error whenever a lock attempt fails, I was receiving up to three error emails every quarter of an hour, which rapidly filled my inbox. I discovered Flock through this awesome blog, which will help you set it up too: Prevent cronjobs from overlapping in Linux by Mattias Geniar.
As part of my experimentation with error notifications I also discovered Cronic. It stops repeated email notifications being sent from cron, limiting the output to only when an error occurs. An error here is anything with an exit value other than 0 (an exit value of 0 means the script executed successfully). The thing I liked about Cronic was that you can combine it with the bash xtrace feature (-x) and the Cronic email will pass on the debugging output. You implement it by simply adding the word cronic to the crontab line, one column before the command.
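The idea can be sketched in a few lines of shell. This is a simplified stand-in written for illustration, not Cronic itself, and the function name quiet_unless_failed is invented here. It only checks the exit value, so the real tool will behave differently in places.

```shell
#!/bin/sh
# Simplified stand-in for the idea behind Cronic (not Cronic itself):
# stay silent on success, print the captured output on failure.
quiet_unless_failed() {
  status=0
  output=$("$@" 2>&1) || status=$?   # capture stdout and stderr together
  if [ "$status" -ne 0 ]; then
    printf 'FAILED (exit %s): %s\n' "$status" "$*"
    printf '%s\n' "$output"
  fi
  return "$status"
}

# A job that succeeds: its output is swallowed, so cron would have
# nothing to email.
quiet_unless_failed sh -c 'echo "encoded reel 1"'

# A job that fails: the exit value and everything it printed are reported.
quiet_unless_failed sh -c 'echo "encoding reel 2"; echo "disk full" >&2; exit 3' ||
  echo "wrapper returned $?"
```

Since cron only emails when a job produces output, a wrapper that prints nothing on success is what keeps the inbox quiet between genuine failures.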
I only briefly experimented with it. I remember having really interesting results but didn’t quite configure it properly, so I want to return to it soon. Again I experienced problems running it with Flock every fifteen minutes, but I think with tweaking I can find a good solution.