DIY Python microservices in audiovisual preservation workflows

This summer I had an amazing opportunity to work with a Computer Science PhD student for a few days, receiving help writing and learning Python, which resulted in a couple of scripts for the Media Archive for Central England (MACE). I’ve been struggling to work with Python for over a year, since discovering the IFI Irish Film Archive’s amazing scripts and how they’re used to streamline and standardise complicated audiovisual (AV) workflows. I’ve been using command line ‘for loops’ with FFmpeg for automation, which has helped me keep pace with intensifying workflows. However, my growing understanding of the need to generate better data about AV files, and the need for fixity checks, requires a new level of automation – script level!

This blog will reflect on a few things I’ve learned so far. I’m still new to Python, so please be aware that I might not provide the best solutions to issues, and will probably edit it as my knowledge increases. All feedback is very welcome.

CHAPTERS

  1. What is Python?

  2. Microservices

  3. Using the IFIscripts

    copyit.py

    seq2ffv1.py

    normalise.py

  4. Learning from my installation errors

  5. Virtual Environments

  6. MACE’s first Microservice

1. What is Python?

For those totally new to programming, Python is one of many programming languages, alongside the likes of C, C++, Ruby, and Java. It’s reputed to be one of the easiest to learn and among the most versatile to use. Considered a ‘glue’ language, it can easily integrate different systems and has become popular in computer science for machine learning and algorithm development. It is cross-platform and should work easily on any operating system. For Mac users Python comes preinstalled, so it’s really easy to set up and use. The only difficulty for institutional use can be admin installation permissions, particularly with regard to upgrading Python versions or installing dependencies such as FFmpeg. Python 3.7 is the latest version, but MacOS currently comes pre-installed only with Python 2.7 – which is being phased out in January 2020.

The video above shows you how simple it is to run a script (the IFIscript copyit.py, more on this below) using Terminal or Command Prompt. Scripts are often activated by typing just two words, such as “python scriptname.py”, and dropping a path to a file onto the command line. All the complexity in the script becomes evident when you open the .py file in a text editor: a few simple functions can take a couple of hundred lines of code, particularly when the script factors in error handling and the like.

To run one yourself, first make sure the script you want to use is compliant with the version of Python installed on your computer. You can check by opening Terminal or Command Prompt and typing:

python --version

You may notice my own command in the video above specified python3 in the call, which is a result of working with MacOS preinstalled with Python version 2, and the way I installed Python3 on my home laptop. I have to call python3 so that the correct version is used to execute the script. If I just used the python call it would default to the preinstalled version 2. I’ve more on this in chapter 4 that will clarify how to correctly install Python3 so you don’t have the same issue.
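If you write your own scripts, a small version guard avoids this confusion entirely. Below is my own minimal sketch (not taken from IFIscripts) of a check a script can run first, so it fails early with a helpful hint instead of crashing later with a cryptic error:

```python
import sys

def require_python3(version_info=None):
    """Return True under Python 3+, otherwise raise SystemExit with a hint.

    version_info is injectable purely so the guard is easy to test.
    """
    info = sys.version_info if version_info is None else version_info
    if info[0] < 3:
        raise SystemExit("This script needs Python 3 – try 'python3 script.py'")
    return True

if __name__ == "__main__":
    require_python3()
    print("Running under Python", sys.version.split()[0])
```

A guard like this turns the confusing symptoms of a version mismatch into a one-line instruction for the user.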

For MACE’s AV preservation the appeal of Python scripting lies in its relative ease of use and gentle learning curve. No one at MACE is a computer scientist, but with the increasing drive to use FFmpeg and other open source software via the command line, it becomes apparent that a language like Python can amplify the usefulness of these tools. Python can ‘glue’ open source archive tools seamlessly together, creating powerful archival workflow solutions. Maintaining archival standards across different users on different Operating Systems is useful even in an archive as small as MACE. The options for script combinations are limitless, and constantly amendable when managed in house. When you start linking scripts together to form a larger interactive workflow you’ve stepped into the world of Microservices.
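As a toy illustration of that idea (entirely my own sketch, not code from any tool mentioned here), ‘loosely coupled’ in miniature looks like this: small independent steps, each replaceable on its own, glued into one workflow:

```python
import hashlib
from pathlib import Path

def checksum_step(path):
    """Independent step 1: fingerprint a file with MD5."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()

def manifest_step(path, md5):
    """Independent step 2: format the result as a manifest line."""
    return f"{md5}  {Path(path).name}"

def workflow(path):
    # The 'glue': each step only sees the previous step's output,
    # so either step could be swapped out (eg for SHA256) without
    # touching the other.
    return manifest_step(path, checksum_step(path))
```

Swapping MD5 for another algorithm, or the manifest format for another style, changes one function and leaves the rest of the workflow alone – that independence is the whole point of the microservice model.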

2. Microservices

“Properly managing audiovisual archival material requires identifying, using, and possibly creating the right tools and workflows to facilitate archival objectives. In creating these workflows, two models are possible. One model is the monolithic architecture, which includes complex all-in-one systems (for instance, a comprehensive digital asset management system). Another model is the microservice architecture, which combines independent tools into a loosely coupled system based upon common underlying standards and understandings. In a microservice architecture, an individual tool may be added, replaced, or upgraded independently of the other tools… The use of microservice architectures within audiovisual archives puts the media and metadata itself rather than the system at the center of archival management.”
Annie Schweikert and Dave Rice

The best resource I found that clearly defines archival usage of microservice architecture was recently published by IASA-WEB: Microservices in Audiovisual Archives. Annie Schweikert and Dave Rice introduce microservice architecture in opposition to monolithic architecture, and clearly explain the Open Archival Information System (OAIS). OAIS originated to help space programmes preserve, and make available, data gathered during missions.

Screen Shot 2019-09-11 at 12.45.12.png
The picture right illustrates a Submission Information Package (SIP) being ingested and converted into an Archival Information Package (AIP) for storage, with a Dissemination Information Package (DIP) as the means of supplying data. This kind of workflow can have many small and overlapping microservices. The document also features interesting script examples taken from City University New York’s microservice workflows – Terminal-based command line Bash scripts that contain instructions for executing tasks on MacOS and Linux platforms. You can see more of their extensive microservice collection here: mediamicroservices/mm.

For a quicker introduction to Microservices take a look at this 30 minute presentation by Dave Rice “Managing Digital Preservation by Managing Despair and Paranoia”:

3. Using the IFIscripts

For an excellent cross-platform alternative that uses Python3 scripts look no further than the DPC award winning IFI Irish Film Archive Scripts, IFIscripts by Kieran O’Leary. The IFIscripts were developed in response to a large intake of material at the Irish Film Institute during their Loopline Conservation Project.  From this repository of over fifty microservices, there are many really useful standalone scripts including three that I’m delighted to have used at MACE. You can read everything you need to know about this remarkable collection of open source scripts here: IFIscripts documentation.

To download them and start testing them on your computer visit the IFIscripts GitHub page – https://github.com/kieranjol/IFIscripts and hit the green ‘Clone or download’ button, which gives you the option to Download ZIP.  I put this file in my /Users/Joanna/folder, unzip it and then direct my Terminal or Command Prompt into the unzipped folder like this:

cd /Users/Joanna/ifiscripts

I must emphasise that you don’t have to know how to write Python to use these scripts. Some knowledge and guidance about Python installation would have sped the process up for me – guidance I’ll aim to provide below if you want to try them. However, excellent guidance from Kieran ultimately got me up and running.

A useful step after you’ve navigated into the script folder is to run a few of the scripts’ help menus. This will confirm everything is compatible with your Python version, and give you some guidance on how to run the scripts. Open a script’s help by typing this:

python script_name.py -h

If your version isn’t correct or there’s a problem somewhere you will receive an Error message in Terminal that can help you track the problem down, either by googling the error or posting it as a GitHub Issue (read more about using GitHub in this post).

Screen Shot 2019-09-16 at 14.33.22
A common error I used to get with scripts following upgrade to Python3

The best clue to the primary cause of a problem comes from the Error message itself (shown above as TypeError: write()…) and the last script-specific entry in a sometimes long list (eg, File “normalise.py”, line 192, in main). This helps you pinpoint the place the script broke, and what type of break it was.
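That TypeError: write() is typical of Python 2 code run under Python 3. Here is a hedged reconstruction of the kind of code that triggers it (the function names are mine, for illustration): a file opened in binary mode being handed a str.

```python
def write_report(path, text):
    # Python 2 code often mixed these freely; in Python 3 a file opened
    # in binary mode ('wb') refuses a str and raises TypeError.
    with open(path, "wb") as f:
        f.write(text)

def write_report_py3(path, text):
    # The Python 3 fix: encode the str to bytes first
    # (or open the file in text mode, 'w', instead).
    with open(path, "wb") as f:
        f.write(text.encode("utf-8"))
```

Knowing this one str/bytes distinction explains a surprising share of the errors you meet when running older scripts under Python 3.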

Just a reminder though, before you try them out for yourself it definitely helps to understand how OAIS microservice architecture fits within AV preservation workflows at other institutions such as the IFI, or CUNY.  Without this knowledge I found the folder structuring and naming conventions of the scripts a little confusing.

copyit.py

My line manager and I had been discussing checksum verifications on and off for a year or so at MACE, but never settled on a way to implement fixity activities into our already jam-packed workflow.  Copyit.py was a revelation! Reading the IFIscript documentation, and seeing how this specific script functioned at the IFI provided a solution I could see working for us straight out of the box.  The script allows the user to select either an individual file or directory to copy from, then creates a checksum manifest and copies it to a location you’ve indicated. It then generates another manifest in the destination directory and validates the first manifest against the second, with the script telling you if the checksums match or not!

Screen Shot 2019-08-07 at 14.26.51
The most wonderful line of code in Copyit.py

In the first video of this post I demonstrate copying a directory full of DPX images (dummy DPX files generated by FFmpeg) into another directory, and show how simple this script is to run in Terminal. The command used in the first video above is:

python3 copyit.py path_to_DPX_directory/ path_to_location/

As you can see in the video, you gain two desktop folders which hold the MD5 manifests of the first set of DPX files and a log of the script’s copying process. The log is really useful as it will flag any errors encountered during execution of the script. The script doesn’t make two new folders for each copy cycle, but keeps adding to the first desktop set, so I keep them on my desktop and clear the logs every now and then. I’ve used copyit.py for transferring files to and from MACE’s Drobo spinning disks, to and from hard drives, and across networks – any scenario where it is critical that archival file integrity is maintained. I also use it regularly for copying to and from LTO tape. A recent script addition allows you to add -lto or -l to the start of the command, which switches copying on MacOS from rsync to gcp and increases copy speeds – a big help when working with LTO. So your command would appear:

python3 copyit.py -lto path_to_file/file.mkv LTO_Volume/

It seems like a really simple bit of code when you use it day to day, but calling it simple does it no justice. The 758 lines of code carry out the following series of instructions:

  • Checks that the drive has write access, and whether the source is a file or directory
  • Deletes stray system files: .DS_Store, Thumbs.db, desktop.ini, Desktop.ini
  • Uses hashlib to produce an MD5 checksum for the file or directory of files to be copied, and stores it in a desktop folder named moveit_manifests
  • Checks the destination has enough available space to start the copy
  • Copies the files, using system tools specific to the operating system running the script, eg robocopy on Windows or cp on Linux
  • Offers an option to use gcp instead of rsync on MacOS when copying to/from LTO (found to be a faster option for LTO copying, see amiaopensource/ltopers for more information)
  • Creates a destination checksum manifest alongside the copied file(s)
  • Counts any extra files in the destination and compares them against the destination manifest
  • Asks if you want to overwrite duplicate files or manifests as they’re encountered
  • Runs a verification check on the two checksum manifests and tells you whether your files match
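The heart of that sequence – checksum, copy, re-checksum, compare – can be sketched in a few lines of Python. This is a toy stand-in to show the principle only; the real copyit.py does far more, as the list above shows:

```python
import hashlib
import shutil
from pathlib import Path

def md5_of(path, blocksize=65536):
    # Read in chunks so large AV files never have to fit in memory
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(blocksize), b""):
            h.update(chunk)
    return h.hexdigest()

def copy_and_verify(source, dest_dir):
    """Checksum the source, copy it, checksum the copy, compare."""
    source = Path(source)
    before = md5_of(source)
    dest = Path(dest_dir) / source.name
    shutil.copy2(source, dest)  # copy2 preserves timestamps too
    return md5_of(dest) == before
```

If copy_and_verify returns False you know the copy is not bit-identical and should be redone – the same guarantee copyit.py gives, minus all its logging, manifests and platform-specific copy tools.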

This list still doesn’t fully cover all the script functions, so I recommend taking a look at the official documents. Copyit.py is just one in the fleet of IFIscripts that interact together so you will see instructions that relate to external scripts, such as sipcreator.py which takes a file and builds a Submission Information Package.  I have experimented with sipcreator.py a few times and it’s helped me understand the OAIS model of archiving, which in turn is helping shape MACE’s born digital developments at present.

This is a great first script to use, as it has no dependencies that need installing external to the IFIscripts. I’m so grateful we have this script and using it brings me joy every day. This script has been incredibly helpful in recent weeks after we suffered the loss of an ageing Drobo storage device. I’ve been able to remote in at all hours and, using copyit.py, batch copy the vulnerable back-up files to a new safe location. I’ve really grown to love this script and I think my colleagues are getting a bit bored of hearing me say it!

Screen Shot 2019-11-01 at 13.05.23.png

UPDATE 1st Nov 2019: Today I discovered a method for looping the copyit.py command in Terminal, allowing all the contents of a directory with a specific extension, such as .mkv, to be copied to an LTO one after another with a single command. Up to now I’ve copied multiple items one at a time, or copied the whole directory the items sit in and then had to move them out of that directory on the LTO. The second way is quicker, but you lose the individual checksum manifest and log for each item, which is a shame.

The example above shows a quick test I ran on my home laptop (with Adobe invoice statements, which I’m pleased to say are a thing of the past thanks to my open source life!) and it shows that for each item copied the copyit.py loop repeats and gives the “Your files have reached their destination and the checksums match” message mid-stream. You will have to run back through the outputs to check all has copied okay, but it’s a small price to pay for batch copying in this way with fixity checks incorporated. I’m so excited to have found this method, and thanks to Kieran O’Leary for answering a Twitter question I asked about bash for-looping scripts with his IFI Batch Workshop notes, which led me to this discovery. The code I’ve been using successfully all day for MKV to LTO copies in my MacOS Terminal is:

find /directory_to_search/ -name "*.mkv" -exec python3 copyit.py -l {} /path_to_LTO_volume/ \;

It uses the find command with the -exec option, which executes the Python script once for each .mkv file found. Make sure you cd into the IFIscripts folder first, and link your paths to this location by dragging and dropping. And as the IFI Batch Workshop notes say at the beginning: “WARNING – with great power comes great responsibility! Always perform tests in an isolated environment, perhaps on a test folder on an external drive. Be incredibly careful when launching any batch process on archival material.” I’ve updated my DPX preservation workflow post with an ‘Automating copyit.py’ section that provides more for-loop and find-loop scripts like this one for MacOS and Windows automation of copyit.py.
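If you would rather stay in Python than shell, the same batch logic can be sketched with pathlib. The paths and the copyit.py invocation below are illustrative only, and as the workshop warning says, test on dummy files first:

```python
import subprocess
from pathlib import Path

def batch_copy_commands(search_dir, lto_volume, pattern="*.mkv"):
    """Build one copyit.py command per matching file – the Python
    equivalent of the find/-exec one-liner above."""
    return [
        ["python3", "copyit.py", "-l", str(f), str(lto_volume)]
        for f in sorted(Path(search_dir).rglob(pattern))
    ]

# Then run them one after another, stopping on the first failure:
# for cmd in batch_copy_commands("/directory_to_search", "/path_to_LTO_volume"):
#     subprocess.run(cmd, check=True)
```

Building the commands first, before running any of them, also gives you a chance to print and eyeball the whole batch – a nice safety net for archival material.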

 

seq2ffv1.py

This is a script that I am testing at the moment as it’s only just become Python3 ready, but the first few files I’ve created with it on MacOS are being written to our LTO long-term storage following successful checks. I regret not having realised its functionality a lot sooner, as it takes a large DPX or TIFF sequence and losslessly converts it to an FFV1/MKV video file (approximately half the size!) using RAWcooked, before verifying losslessness by fully reversing the FFV1/MKV and validating the source checksums. This script has additional dependencies: you will need RAWcooked, FFmpeg and MediaInfo installed – more about installing these in my previous blog post here. If you’re a Windows user you will need to add these three pieces of software to your System Path – instructions here for Windows. MacOS users can use Homebrew, which will do this for you.

There are a few IFI-centric features to this script as it includes OAIS SIP folder structure creation during execution. You are prompted to choose which operator you are (a list of Irish Film Archive employees), and asked to supply a number in the form of ‘oe1234’ which becomes part of the directory structure. These are relatively small issues that you can adapt once you become a little more familiar with looking at code and tweaking things. For example, I’ve added an extra “agentName=Joanna White” for my own version, seen below in the log file:

Screen Shot 2019-09-14 at 12.52.16
seq2ffv1.py log file detailing all the functions of the script and their success or failure

Seq2ffv1.py creates excellent sidecar files within the IFI SIP folder structure:

  • A top directory named the same as your supplied directory, eg Scan01, containing the SIP package: a seq2ffv1.py log file (shown above), another directory given the oe1234 name, and a text file containing an MD5 manifest of all the contents of your original supplied directory Scan01
  • Within oe1234, one directory named with a Universally Unique Identifier (UUID), plus an MD5 manifest for all the contents of oe1234
  • Within the UUID folder, three more directories titled Logs, Metadata and Objects
  • The Logs directory contains two more log files: one for the SIP creation (seq2ffv1.py references the external IFIscript sipcreator.py to make the SIP folder structure) and one for the RAWcooked process
  • Metadata includes another directory called Supplementals, and two XML files generated by MediaInfo – a simple metadata extraction and a more complex mediatrace extraction. Supplementals includes the same XML files but for the source folder Scan01
  • Finally, the Objects folder includes the FFV1/MKV file with the same UUID name issued for the earlier directory

Up to now I’ve been manually RAWcooking, un-RAWcooking, and then generating checksums of both versions myself and visually comparing them – a really time-consuming and inefficient step. I do feel this check is necessary though, as this software had its first version release in October 2018 and is still in an unofficial test phase. You can take a look at Jérôme Martinez’s GitHub development pages to see recent issues and updates to the software. We only started officially converting DPX folders with RAWcooked at MACE in the last few months, but it’s a remarkable piece of software worth your attention. Here’s a description from the website:

RAWcooked easily encodes RAW audio-visual sequences into a lossless video stream, reducing the file size by between one and two thirds. FFmpeg encodes the audio-visual data into a Matroska container (MKV) using the video codec FFV1, and audio codec FLAC. The metadata accompanying the RAW data is fully preserved, along with additional sidecar files such as MD5 checksums, LUT or XML if desired. This allows for the management of these audio-visual file formats in an effective and transparent way. The lossless Matroska video stream can be played in VLC or MPV media players, and writing and retrieving from storage devices such as LTO is significantly quicker.  If you need to use the RAW source in its original form, one line of code will easily restore it bit-by-bit, faster than retrieving the same file from LTO tape storage. https://mediaarea.net/RAWcooked

There are many archives already using and supporting RAWcooked including the IFI’s Irish Film Archive, AV Preservation, CNA Luxembourg, National Library of Norway, Northwestern University Libraries, National Library of Wales, Brown Media Archives, and this month the British Film Institute began testing in preparation to move their 2PB of DPX data to preservation storage.  Seq2ffv1.py is an awesome script with some amazing features that will completely change the way I handle DPX files in the future – and its inbuilt validation check is a deeply comforting feature.  We just need to decide what to do with the excellent side car files this script generates – where we store them and if/how we extract information for database inclusion!
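For reference, the RAWcooked round trip that seq2ffv1.py automates has a very small command line surface. The sketch below reflects my understanding of the basic usage (pointing rawcooked at a directory encodes it to a sibling .mkv, pointing it at the MKV restores the sequence); do check rawcooked --help on your installed version before relying on these exact calls:

```python
def rawcooked_encode_cmd(sequence_dir):
    # Encode: 'rawcooked Scan01' writes Scan01.mkv next to the source
    return ["rawcooked", str(sequence_dir)]

def rawcooked_decode_cmd(mkv_path):
    # Reverse: pointing rawcooked at the MKV restores the DPX sequence,
    # which is what makes checksum comparison of both versions possible
    return ["rawcooked", str(mkv_path)]
```

These would be run with subprocess.run(cmd, check=True); seq2ffv1.py wraps exactly this encode/reverse/verify cycle in its SIP structure and logging.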

normalise.py

This script is a really nice way to begin converting uncompressed digitised video files to FFV1 and Matroska. I’ve only messed with it a little, as I’ve been converting to FFV1/MKV using FFmpeg with for loops to automate the process. I won’t write much about it, but if you want a good starting point for reliable bulk FFV1/MKV conversions in your archive then definitely explore this one! To run the script you set the input and output using -i and -o like so:

python normalise.py -i directory_path/file.mov -o location_path/

It then converts uncompressed files to FFV1/Matroska and runs checksum validations on the file to ensure the conversion was successful. Unlike seq2ffv1.py this script doesn’t enforce a complex SIP folder structure but just pops all the files into your location path, which is really helpful if you’re not ready for full OAIS integration, IFI style. You do have the option to make calls with the script that generate this structure though, shown best in the script’s help settings:

Screen Shot 2019-09-16 at 09.46.55
Help setting options available for SIP implementation:  python normalise.py -h

Alongside your completed FFV1/MKV file you receive a mediainfo XML file, normalise.py script log, FFmpeg conversion log and FrameMD5 manifests.
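The FrameMD5 manifests come from FFmpeg’s framemd5 muxer, which writes one MD5 per decoded frame – this is what makes frame-accurate verification of a lossless conversion possible. A minimal sketch of the call, built as a command list for subprocess (paths illustrative):

```python
import subprocess

def framemd5_cmd(source, manifest_path):
    """Decode every frame of source and write one MD5 per frame.
    Running this on the original and on the FFV1/MKV should yield
    identical manifests if the conversion really was lossless."""
    return ["ffmpeg", "-i", str(source), "-f", "framemd5", str(manifest_path)]

# subprocess.run(framemd5_cmd("file.mov", "file.framemd5"), check=True)
```

Comparing the two framemd5 files (even with a plain diff) is a much stronger check than comparing whole-file checksums, because the container can legitimately differ while every decoded frame stays identical.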

4. Learning from my installation errors

These scripts aren’t the first Python I’ve used on my MacPro 5,1 – though at the time I didn’t really realise I was using Python! I’d downloaded md5tool.py, a script used to generate checksums for the TAR-wrapped DPX folders and sidecar files, and had been successfully generating MD5s with it for years. These have been written to LTO tapes, so when we come to recover TAR-wrapped DPX folders in the future we can at least manually verify that the extraction from tape is correct.

When I first started using the IFIscripts they were written for Python2, so there was a small window of opportunity to download and operate them that way. For some reason, unknown to me now, I decided to take the more awkward path of installing Python3 after reading that Python 2.7 is being phased out. Perhaps I just hadn’t realised at this point that the IFIscripts had been written in 2, and that there was some difference in operational usage. It wasn’t as easy getting Python3 installed on my MacPro 5,1 as it should have been – due to a complete lack of understanding and insufficient time to properly research. My two biggest obstacles were:

  1. Finding a way to install version 3 with pip, a package manager for installing Python dependencies, which needs to be upgraded alongside the version you install.
  2. The scripts needed upgrading to Python3, so I had to find a way to do this with little knowledge of Python.

Dt6kVYnW4AA34uL

I don’t really want to describe this period of painful PIP install experimentation. It’s all a bit of an embarrassing blur of misunderstanding.

The picture right was posted on Twitter around the same time I was stuck in my PIP3 install cycle hell and I found it amusing and reassuring that install issues weren’t just a problem for me.

I think the key error was not sorting out my paths, and making python2 calls by mistake during attempts to get pip3 and Python3 up and running. I knew about editing paths in a Windows environment following installation of FFmpeg on Windows 10, but I hadn’t any experience of PATH editing in MacOS. Following are three links that will help you avoid the errors I made. The first is the official Python guide to installing version 3 correctly on MacOS. The second link will help you understand MacOS paths, and the third offers a couple of alternative ways to install Python3, including an install to your Applications folder, which might be preferable for you.

Using Homebrew to install Python3 correctly:
https://docs.python-guide.org/starting/install3/osx/

A guide to understand editing paths in MacOS, including explaining the step for editing path location in the installation technique above:
https://coolestguidesontheplanet.com/add-shell-path-osx/

A really nice couple of alternative (and slightly simpler) ways to install Python 3, again using Homebrew without path editing, or installing it into your Applications folder:
https://www.saintlad.com/install-python-3-on-mac/

With regard to my second stumbling point, converting Python2 scripts to Python3 to use the IFIscripts, you don’t need to worry any more because the scripts are now kick-ass version 3! At the time I used the Python package called 2to3, which reads the code and applies a series of ‘fixes’ that make the code almost totally usable. I tried this out on copyit.py, and with a few bracket () changes on print statements that the tool missed I managed to make a version of the code that worked successfully across my Windows and Mac computers. You can see my operable version in a pull request on GitHub here – it’s far from perfect but it worked, and this gave me so much confidence to keep trying! It also helped me get started with the scripts ahead of the Python3 conversion – useful particularly on my Windows 10 computer, which only has Python3 installed on it.

5. Virtual Environments (venv)

Since working with PhD student James Wingate I’ve learnt about the benefits of using Virtual Environments (venv), which come as part of the Python Standard Library from Python version 3.3 onward. Once you have Python3 installed you can create a venv and make local, directory-based installations of script dependencies that remain within the script directory. This may seem pointless, but if you install a lot of dependencies for various scripts it helps to avoid potential system conflicts – something that’s easy to do when you’re a rookie! Using a venv really appeals to me as I start to look for ways to make scripts accessible to multiple workstations around our office at MACE. I’m hoping that I will be able to use a venv to install and run FFmpeg/MediaInfo based scripts on admin-restricted workstations which don’t allow system installs. I think I will have PATH problems running subprocess calls with FFmpeg and may need some admin assistance from the University of Lincoln’s IT department… We’ll see!

You can even use 2to3 in a venv and make upgrades to older scripts locally.  The venv can be any folder on your Desktop, or anywhere on your computer.  It’s easy to do by typing a simple line of code. Call on your newly installed Python version 3 to generate a folder on Desktop for you called ENV_DIR:

python -m venv /Users/Joanna/Desktop/ENV_DIR/

Don’t forget to use python3 if you opted for a simple install. The -m venv command makes the virtual environment, and the final entry is the name of the directory the command creates to house the venv files – in this case a desktop folder called ENV_DIR. Next you need to activate the venv using a script inside the venv’s bin folder. First, navigate into your new venv folder ENV_DIR:

cd /Users/Joanna/Desktop/ENV_DIR/

Then activate the environment by typing:

source bin/activate

You will now be launched into a virtual environment, discernible by your Terminal displaying (ENV_DIR) ahead of the User$ sign, like so:

Screen Shot 2019-08-22 at 09.48.17

Once the environment is live, anything you install within this window using pip install will be installed within this ENV_DIR folder, and any scripts written and saved within this environment will work seamlessly. Take a look at the Python guide Installing packages using pip and virtual environments.

To deactivate the venv simply type:

deactivate

NOTE: Operating the IFIscripts within a venv will only work if you have FFmpeg, MediaInfo and RAWcooked installed correctly and system paths defined, as the scripts call them as if they’re installed this way.  You could still create a virtual environment from the Python3 install and in this venv copy and paste the IFIscripts as a safe space to use and experiment with them, but I don’t see any great advantage to this if you just want to get experience using them. You may as well run them from a regular directory as I do!
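One small habit that helps with exactly this dependency problem: have a script check its external tools on PATH up front, so it fails with a clear message rather than a confusing subprocess error part-way through. A minimal sketch (my own, not from IFIscripts):

```python
import shutil

def missing_tools(required=("ffmpeg", "mediainfo", "rawcooked")):
    """Return the names of any required external tools not found on PATH."""
    return [tool for tool in required if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        raise SystemExit("Please install and add to PATH: " + ", ".join(missing))
```

Because shutil.which looks at whatever PATH the script was launched with, this check also tells you immediately whether a venv or restricted workstation can actually see FFmpeg and friends.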

Screen Shot 2019-09-16 at 13.17.19
MACEscripts on GitHub

6. MACE’s first microservice

As I said at the start, in the last few months I’ve been lucky enough to work with a PhD researcher here at MACE who’s helped me learn a little more about Python. Thanks to James’ expertise we managed to develop a few scripts collaboratively in the few days we worked together – the most robust of these is rather unimaginatively named main.py. The concept behind this microservice was to reduce the need to keep two levels of mezzanine files for each digitised asset at MACE. It’s our current practice to hold thousands of ProRes files with corresponding watermarked MPEG files; ideally we’d like to get rid of both, replace the two mezzanine levels with one H264 MOV, then generate watermarked H264 MP4s on demand for clients, thereby saving long-term storage costs. This is still in the development/discussion phase at present but seems a logical move as we increase our HD outputs.

Main.py asks if you want to trim an inputted intraframe file such as a ProRes or uncompressed MOV, or an FFV1/MKV. This is useful for extracting a specific section of a programme, or removing trims and teases where content isn’t appropriate to supply. It also deinterlaces an interlaced file (or will, when I make this small script change in the coming weeks), overlays a watermark and embeds copyright metadata before exporting to H264 MP4 super fast using the wonderful FFmpeg library of codecs. Please feel free to download it and give it a go from the MACEscripts GitHub, but ensure the watermark folder is correctly titled and placed in the correct venv folder alongside the main.py script.
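The FFmpeg command behind that kind of trim-and-watermark step can be sketched like this. It is my reconstruction from the description above, not the actual MACEscripts code; the overlay filter and libx264 encoder are standard FFmpeg, but the option choices are illustrative:

```python
def watermark_cmd(source, watermark_png, out_mp4, start=None, end=None):
    """Build an FFmpeg command: optional trim, watermark overlay, H264 MP4."""
    cmd = ["ffmpeg", "-i", str(source), "-i", str(watermark_png)]
    if start is not None and end is not None:
        cmd += ["-ss", start, "-to", end]  # trim applied to the output
    cmd += [
        "-filter_complex", "overlay=0:0",  # watermark pinned at top-left
        "-c:v", "libx264",
        str(out_mp4),
    ]
    return cmd
```

Building the command as a list and handing it to subprocess.run is the usual pattern in scripts like these – it sidesteps shell quoting problems with spaces in archival file names.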

The script only accepts one of the four archival audiovisual file dimensions kept at MACE:

  • SD 720×576
  • HD 1280×720
  • Full HD 1920×1080
  • Cropped HD 1440×1080
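A check like that can be done with ffprobe before any processing starts. In this sketch (my own, assuming ffprobe is installed alongside FFmpeg) the accepted set mirrors the four dimensions listed above; the probe command prints 'width,height' for the first video stream:

```python
import subprocess

ACCEPTED_DIMENSIONS = {(720, 576), (1280, 720), (1920, 1080), (1440, 1080)}

def probe_dimensions_cmd(path):
    """ffprobe command printing 'width,height' of the first video stream."""
    return ["ffprobe", "-v", "error", "-select_streams", "v:0",
            "-show_entries", "stream=width,height", "-of", "csv=p=0", str(path)]

def is_accepted(width, height):
    return (int(width), int(height)) in ACCEPTED_DIMENSIONS

# out = subprocess.check_output(probe_dimensions_cmd("file.mov"), text=True)
# w, h = out.strip().split(",")
# print(is_accepted(w, h))
```

Rejecting unexpected dimensions up front means the watermark overlay can never be silently mis-sized against the video.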

The .png watermarks generated for overlay are the same dimensions as the video files, with no interlacing. MACE is currently testing a few MACE logos, however the watermark folder on our GitHub page is simply a copyright symbol set at 20% opacity for test purposes.

Main.py was written by James Wingate, but has been generously refactored and standardised to PEP8 by the wonderful Katherine Frances in recent months. Take a look at the closed GitHub pull requests to see what form these adjustments have taken and how ridiculously helpful they have been. This has been a really useful learning process for me, and has taught me so much about writing scripts. James also introduced me to codewars.com, where I now work through small challenges to learn more about the building blocks of Python such as variables, functions, arrays, loops and so much more! Codewars measures success by rewarding the shortest possible answer to a given conundrum, which isn’t the best practice for Archivist Developers learning and sharing scripts for AV preservation. I’ve learnt through Katherine’s input with MACEscripts, and Kieran’s example with IFIscripts, that making scripts easy to read and compliant with PEP8 standards is actually far more valuable to my learning process.

I know it takes more than one script to technically be a microservice, but I’m calling MACEscripts’ first output that anyway! I will admit MACEscripts is a shamefaced attempt to emulate the amazing IFIscripts in name, style and all. Please take it as a compliment Kieran! 😊 I’d love to eventually populate this space with a combination of adapted IFIscripts alongside MACE’s bespoke scripts. I don’t think I’m quite capable of writing a script myself from scratch right now, but I think it will be achievable in the coming months. In conclusion, I feel like the fog of confusion is finally lifting and this last year has built a good foundation for moving forward into microservice generation here at the Media Archive for Central England.

With thanks to MACE for supporting my Python tuition and script development, and to Dave Rice and Annie Schweikert for their great microservice documentation and Dave's excellent video presentation. Thanks to anyone who has helpfully intercepted an undoubtedly vague question via my Twitter account. Thanks to James Wingate (PhD researcher at University of Lincoln) for his tuition and for introducing me to virtual environments and CodeWars.com, and to Katherine Frances for amazing PEP8 conformance checks, generally tidying up the script, and teaching me loads about scripting. Last but not least, many thanks to Kieran O'Leary and the IFI Irish Film Archive for the amazing scripts and generous and valuable support in all things AV preservation and IFIscripts.

A few useful links:

Starting with the strongest of all assets in my working arsenal: https://ffmpeg.org/

The award-winning status of these scripts via the DPC website: IFI Open Source tools: IFIscripts/ Loopline project, Irish Film Institute

Kieran O’Leary’s blog also has a really excellent post about RAWcooked:
Introduction to FFV1 and Matroska for Film Scans

Check out City University New York’s Television Archive’s mediamicroservices: https://github.com/mediamicroservices/mm

Listing this paper by Dave Rice and Annie Schweikert again – it's a must-read for anyone wanting to understand the differences between monolithic and microservice approaches more clearly: http://journal.iasa-web.org/pubs/article/view/70

And here’s another Dave Rice paper I consider a must-read: Reconsidering the Checksum for Audiovisual Preservation: Detecting digital change in audiovisual data with decoders and checksums

The Digital Preservation Coalition have a really nice intro to Fixity and Checksums to help everyone understand them more easily – read more here

If you need human support for your FFmpeg problems then I recommend joining the FFmpeg-User email list. Since joining a couple of months ago my inbox has been filled with some of the most fascinating discussions about this remarkable tool:
https://lists.ffmpeg.org/mailman/listinfo/ffmpeg-user

Official Python tutorials: https://docs.python.org/3/tutorial/index.html

Codewars is a great way to really start to learn how to code from scratch, and the gamification is kind of addictive: www.codewars.com

To find out more about my love of Free and Open Source software and services for the archiving sector, please take a look at my other posts, particularly Open Source FFV1 video capture workflow for MacOS if you want to know what all the fuss is about FFmpeg, FFV1 and Matroska.

Here’s a few other amazing sites you should check out to learn more:
https://github.com/amiaopensource A collaborative space making resources that support the preservation and use of moving image media
https://mediaarea.net/ Home of some amazing archival software developers, all of them deeply inspiring people to know
https://ashleyblewer.com/ Open source archival training guru, and programmer!
https://amiaopensource.github.io/ffmprovisr/ Open source FFmpeg solutions for archives
https://trac.ffmpeg.org/wiki/Encode/FFV1 FFV1 cheat sheet and guide to configuration
https://bavc.github.io/avaa/ nothing to do with this subject really, I just really love it!
https://www.fiafnet.org/pages/E-Resources/FFV1_and_Matroska_Reading_List.html
FFV1 and Matroska reading list from FIAF
http://bit.ly/amiamkvslides Saw this recently in a presentation by Stephen McConnachie at the BFI and feel it's worth sharing here too. Matroska guide by Dave Rice and Morgan O Morel.
