Finding the deepest AV metadata with MediaConch and MediaTrace

Recently I’ve needed to find new ways to expose specific metadata omissions within various AV files and this has led me to investigate the full power of MediaConch from MediaArea.net!

In case you don’t know it, MediaConch allows you to run bespoke AV metadata policies against your media files, either one at a time through a GUI or via the command line, which allows you to automate the process. To validate the supplied file, MediaConch compares the ‘rules’ set within the XML policy against the file’s metadata, compiling a list of passes or fails per rule. The resulting report details where the AV file failed against the supplied policy. There are public policies available to use, or you can write your own.

MediaConch online web user interface, image courtesy MediaArea.net

I didn’t know about Media Area’s MediaConch web user interface MediaConchOnline. I just never spotted the little link on their website (just above the donate bar). If you’ve never used MediaConch then I recommend you try the web UI as a first step in testing your files against public policies, and then making your own simple policies. It takes so much of the complexity out of XML policy creation for the beginner and allows you to download your creation to use in your automated workflow.

As I didn’t use the web UI, much of my time was spent puzzling out how to make the policies work using a text editor and writing them into XML format. Many hours were spent wondering why nothing happened when I ran the policy against a file via command line – usually because I had formatted the XML wrongly! And as a complete newcomer I had a few headaches trying to navigate the ‘and’ ‘or’ policy options.

Thankfully those days are behind me, and this year there have been two MediaConch projects that have taught me a great deal about writing policies. Both have required using Media Area’s MediaTrace, a feature you can use via the web UI or when manually writing your XML policies.

DPX deep metadata discoveries
Video tape policy developments

I provide a quick overview of both below, describing the need that arose and the steps taken to create the metadata policy. Massive thanks to Jérôme Martinez from Media Area for his advice on creating MediaTrace policies in MediaConch. These policies can take longer to process, but the depth of detail you can check is so useful. Thank you again Jérôme and team for such an amazing tool!

DPX deep metadata discoveries

The BFI National Archive is currently redefining its DPX scanning and encoding policy and this has required some detailed analysis of DPX metadata to inform development of new policy requirements.

I must thank my colleagues in the Data and Digital Preservation Department who did most of this metadata work – Andrew Sargeant, Mat Fernandes, Lucy Wales and Stephen McConnachie. Much of this involved studying SMPTE and FADGI documentation, with additional feedback from Merle Friedrich from Technische Informationsbibliothek (TIB).

To support these proposed metadata changes we’ve needed to define a MediaConch policy that can pinpoint each, and fail when something critical isn’t present. This new policy is available to view on Media Area’s Public Policy page.

The complexity started when we were trying to identify how to check if the magic number metadata was little or big endian, and no choices appeared to support this enquiry in the web UI. I’d spent some time looking through the other public policies in the past and had seen examples that seemed to dig deeper into metadata, so decided to email Jérôme and ask him if any/all metadata you could find while using MediaInfo could be tested in a policy.

His answer was to describe the process of exporting a MediaTrace XML and then using the block data within it to write policy rules with the ‘scope="mmt"’ attribute. Exporting to XML provides an easy-to-read structure, which is helpful when building the policy, so I gave it a go using some AV files that I had to hand! You can use MediaInfo or MediaConch to make this XML using these commands via Terminal or Windows Command Prompt:

mediainfo --Details=1 --inform=XML <file>
mediaconch -tt -fx <file>

The resultant XML file doesn’t immediately make a lot of sense if you’re more used to reading MediaInfo’s standard text output. Everything is block based, but these block names form the basis of your MediaTrace rule in your policy. Below is an extract of a result I had for a DPX file:

<?xml version="1.0" encoding="UTF-8"?>
<MediaTrace
    xmlns="https://mediaarea.net/mediatrace"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="https://mediaarea.net/mediatrace https://mediaarea.net/mediatrace/mediatrace_0_1.xsd"
    version="0.1">
<creatingLibrary version="19.09" url="https://mediaarea.net/MediaInfo">MediaInfoLib</creatingLibrary>
<media ref="0112624.dpx" parser="DPX">
<block offset="0" name="Generic section header" size="1664">
    <block offset="0" name="File information" size="768">
        <data offset="0" name="Magic number">SDPX</data>
        <data offset="4" name="Offset to image data">65536</data>
        <data offset="8" name="Version number of header format">V2.0</data>
        <data offset="16" name="Total image file size">17645568</data>
        <data offset="20" name="Ditto Key">0</data>
        <data offset="24" name="Generic section header length">1664</data>
        <data offset="28" name="Industry specific header length">384</data>
        <data offset="32" name="User-defined header length">512</data>
        <data offset="36" name="FileName">c_1416563.0112624.dpx</data>
        <data offset="136" name="Creation Date">2019:02:14:13:24:12:GMT</data>
        <data offset="660" name="Encryption key">4294967295</data>
        <data offset="664" name="Reserved for future use">(104 bytes)</data>
    </block>

The details we need for our rule start after the ‘media ref’ tag, which gives the DPX file name and parser. The next block, ‘Generic section header’, contains another block, ‘File information’, which holds a series of data entries, including a ‘Magic number’ entry of SDPX (indicating big endian).
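Flattening the nested block names by script makes the candidate rule paths easier to see. The following is a minimal sketch, assuming the MediaTrace XML layout shown in the extract above (block and data elements in the https://mediaarea.net/mediatrace namespace). The function name list_trace_paths and the shortened sample are illustrative only and are not part of any MediaArea tool.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the MediaTrace extract above.
NS = "{https://mediaarea.net/mediatrace}"


def list_trace_paths(xml_text):
    """Return (path, value) pairs for every data entry in a MediaTrace XML string."""
    root = ET.fromstring(xml_text)
    results = []

    def walk(element, parents):
        for child in element:
            name = child.get("name", "")
            if child.tag == NS + "block":
                # Blocks only add a level to the slash-separated path.
                walk(child, parents + [name])
            elif child.tag == NS + "data":
                path = "/".join(parents + [name])
                results.append((path, (child.text or "").strip()))

    for media in root.iter(NS + "media"):
        walk(media, [])
    return results


# Shortened sample based on the DPX extract, with closing tags added so it parses.
SAMPLE = """<MediaTrace xmlns="https://mediaarea.net/mediatrace" version="0.1">
<media ref="0112624.dpx" parser="DPX">
<block offset="0" name="Generic section header" size="1664">
    <block offset="0" name="File information" size="768">
        <data offset="0" name="Magic number">SDPX</data>
        <data offset="4" name="Offset to image data">65536</data>
    </block>
</block>
</media>
</MediaTrace>"""

for path, value in list_trace_paths(SAMPLE):
    print(path, "=", value)
```

Each printed path is in the slash-separated form a rule with scope="mmt" expects in its value attribute.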

Now we have all the information we need to build our MediaTrace rule: place these block names into the rule’s value attribute, separating each with a forward slash. You can also enter the same path into the value field of the MediaTrace builder in the web UI, set the operator to ‘equal to’ and enter SDPX as the value.

<rule name="Magic number is SDPX (Big Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">SDPX</rule>

But of course there are little and big endian DPX to check for, so this rule should be used as part of an ‘or’ policy which allows for either of the two entries to be present for the policy to pass:

<policy type="or" name="Magic number is Big or Little Endian">
  <rule name="Magic number is SDPX (Big Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">SDPX</rule>
  <rule name="Magic number is XPDS (Little Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">XPDS</rule>
</policy>

And then of course, to use this within a larger policy you’d need to format it within an ‘and’ policy, given below, to ensure that everything listed in the policy is checked for, including one of your two options:

<?xml version="1.0"?>
<policy type="and" name="BFI DPX basic checks" license="MIT">
  <policy type="and" name="DPX conformance check">
    <rule name="Format is DPX" value="Format" tracktype="General" occurrence="*" operator="=">DPX</rule>
    <rule name="File extension is DPX" value="FileExtension" tracktype="General" occurrence="*" operator="=">dpx</rule>
  </policy>
  <policy type="or" name="Magic number is Big or Little Endian">
    <rule name="Magic number is SDPX (Big Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">SDPX</rule>
    <rule name="Magic number is XPDS (Little Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">XPDS</rule>
  </policy>
</policy>
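Since badly formed XML was my most common stumbling block, a quick check before handing a policy to MediaConch can save time. The following is a minimal sketch using Python's standard library parser. It only confirms that the file is well formed and prints the and/or nesting; it does not validate the policy against MediaConch's own rules. The function name outline_policy is hypothetical.

```python
import xml.etree.ElementTree as ET


def outline_policy(xml_text):
    """Return indented lines showing the and/or nesting of a policy.

    Raises xml.etree.ElementTree.ParseError if the XML is not well formed.
    """
    root = ET.fromstring(xml_text)
    lines = []

    def walk(element, depth):
        indent = "  " * depth
        if element.tag == "policy":
            lines.append(f"{indent}policy [{element.get('type')}] {element.get('name')}")
            for child in element:
                walk(child, depth + 1)
        elif element.tag == "rule":
            lines.append(f"{indent}rule {element.get('name')}")

    walk(root, 0)
    return lines


# A cut-down version of the policy above, used only to exercise the function.
POLICY = """<policy type="and" name="BFI DPX basic checks">
  <policy type="or" name="Magic number is Big or Little Endian">
    <rule name="Magic number is SDPX (Big Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">SDPX</rule>
    <rule name="Magic number is XPDS (Little Endian)" value="Generic section header/File information/Magic number" occurrence="*" operator="=" scope="mmt">XPDS</rule>
  </policy>
</policy>"""

for line in outline_policy(POLICY):
    print(line)
```

A missing closing tag or stray character raises a ParseError that names the line and column, which is quicker to act on than a policy that silently does nothing.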

Working in this way we managed to track down most of the data needed for our public policy for DPX metadata within these metadata blocks. I’ve yet to locate a few additional fields, and haven’t yet uncovered a method to identify all potential field block names when they don’t exist in the file. I’d be grateful for any tips on how to do that!

Take a look at all the metadata entries in this policy at Media Area’s Public Policy page – BFI DPX metadata conformance checker. It’s a very detailed document that is designed more as a guide to inform where metadata omissions may exist within a scanner’s metadata. If you wanted to use a policy like this in automated scripts which relied upon a pass/fail statement then it may be better to use fewer rules allowing for differences in scanner technology, age of said tech and DPX versioning.
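For scripts that rely on a pass/fail statement, the check can be wrapped in a few lines. The following is a minimal sketch, and it rests on two assumptions: that MediaConch's -p option supplies the policy file, and that the default text report begins with ‘pass!’ when every rule is met and ‘fail!’ otherwise. The function names are hypothetical, and mediaconch must be installed and on the PATH for check_file to work.

```python
import subprocess


def policy_passed(report_text):
    """Interpret a MediaConch text report: True for a pass, False otherwise.

    Assumption: the report starts with 'pass!' when the file satisfies
    the policy and with 'fail!' when it does not.
    """
    return report_text.strip().startswith("pass!")


def check_file(policy_path, media_path):
    """Run MediaConch on one file and return (passed, full report text)."""
    completed = subprocess.run(
        ["mediaconch", "-p", policy_path, media_path],
        capture_output=True,
        text=True,
        check=False,  # inspect the report rather than relying on the exit code
    )
    return policy_passed(completed.stdout), completed.stdout
```

A calling script could then run check_file with a policy path and a media path, and move any file whose first returned value is False into a review folder, keeping the report text for the supplier.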

Video tape policy developments

We’ve had failures with V210 mov to FFV1 mkv transcodes, and some excellent sleuthing from my colleague Michael Norman has revealed this is down to the presence of a Source_Delay value in the audio tracks (see image). It’s been quite common to receive files with Source_Delay in the video track alone, and these encode to FFV1 mkv fine. However, a handful of failures all have an equal Source_Delay value in the audio track as well as in the video track which prevents lossless encoding (checked by comparing frameMD5 manifests for source and output). We’re not sure what causes this, but to prevent the files coming through to us we had to uncover a method to identify this for our suppliers.

Looking into the metadata closely, you’ll find that although Source_Delay is visible in MediaInfo’s full display output using mediainfo -f <file> this specific field is not present in the accessible metadata and can’t be directly called in a policy using Source_Delay. Because of this I started to think our best possible means to identify the problem was by writing a script which uses FFprobe or MediaInfo to isolate the audio track metadata and run a string check. But I thought I’d dig a little deeper with MediaConch, and employed the MediaTrace technique again to see what I could find.
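The scripted string check could look something like the following minimal sketch. It assumes the text layout MediaInfo normally prints: a heading line per track (General, Video, Audio #1 and so on), then ‘field : value’ lines, with a blank line between tracks. The sample report is invented to show the layout and is not real output from a failed file. The function name is hypothetical.

```python
def audio_sections_with_field(report_text, field="Source_Delay"):
    """Return the headings of audio sections that contain the given field name."""
    flagged = []
    heading = None
    for line in report_text.splitlines():
        stripped = line.strip()
        if not stripped:
            heading = None        # a blank line ends the current section
        elif heading is None:
            heading = stripped    # the first line of a section is its heading
        elif heading.startswith("Audio"):
            key = stripped.split(":", 1)[0].strip()
            if key == field and heading not in flagged:
                flagged.append(heading)
    return flagged


# Invented sample that mimics the layout only; the values are placeholders.
SAMPLE_REPORT = """General
Format                                   : MPEG-4

Video
Format                                   : YUV
Source_Delay                             : 40

Audio #1
Format                                   : PCM
Source_Delay                             : 40

Audio #2
Format                                   : PCM
"""

print(audio_sections_with_field(SAMPLE_REPORT))
```

Only the audio section carrying the field is reported; the same field under the video heading is ignored, which mirrors the distinction that matters for the failed transcodes.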

I exported MediaTrace data for a failed V210 mov to XML and studied the track block data for some indication of where a Source_Delay value could be pinpointed (see images below). Unfortunately you can’t differentiate between the track types in MediaTrace other than by the order in which they appear. Single video track files seem to have the first track block as video, with audio tracks in subsequent blocks. This means a call to a video track is made in exactly the same way as a call to an audio track, so a search for a delay could match in any of the blocks, which defeats the aim of searching the audio tracks only.

So I tried to think of a method that would allow me to highlight an audio track and bypass the video track with the information I was seeking. This test was based on my assumption that MediaArea’s metadata tools read data line-by-line, top down. I wasn’t certain, but it seemed likely!

An audio track, notable by its volume metadata being greater than zero.

I found that video tracks had an obvious track width and height greater than zero, whereas the audio tracks just displayed zero. Similarly, audio tracks had a volume value greater than zero, and video tracks were zero volume. I concluded that if I wanted to find an audio track first, and then its Source_Delay, I needed to look for a volume level greater than zero, or a track width/height equal to zero.

An audio track with Media time value greater than zero.

Now I had to locate a potential field that represents Source_Delay in the MediaInfo output. This turned out to be in the edit block, where an edit list contains data relating to duration, media time and media rate. In the correctly formatted V210 files this data was ‘0’, but this sample had a positive entry in the Media time metadata of the edit block (see above). Surely this coincidence is indicative of the delay, though I can’t claim to understand how the two correlate. Please do drop me a line if you know what edit block Media time represents!

So I ended up writing a short ‘and’ policy that simply expressed the three criteria that should be met, in the order they appear in an audio track block:

Track width must equal 0
Track volume must be greater than 0
Media time data must equal 0 – to pass the policy!

This way the track being tested has to be audio before all of the criteria can align. A video track would fail the first two, so the check would skip to the next track block, where an audio track would pass all three, hopefully:

<?xml version="1.0"?>
<policy type="and" name="Source delay error identification" license="MIT">
  <description>Test if there are source delay timings in the audio tracks for a given SD PAL capture. Where present the MediaConch policy will fail, notifying that the given MOV will not transcode to FFV1 Matroska and should be recaptured.</description>
  <policy type="and" name="Find source delay in Audio track">
    <rule name="Track Width = 0" value="File header/Track/Track Header/Track width" occurrence="*" operator="=" scope="mmt">0.000</rule>
    <rule name="Track Volume > 0" value="File header/Track/Track Header/Volume" occurrence="*" operator=">" scope="mmt">0</rule>
    <rule name="Track source_delay = 0" value="File header/Track/Edit/Edit List/Entry/Media time" occurrence="*" operator="=" scope="mmt">0</rule>
  </policy>
</policy>
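The same three criteria can also be applied one track block at a time, outside MediaConch, to see what the policy is aiming for. The following is a minimal sketch, not a description of how MediaConch evaluates the policy internally. It assumes the block names used in the rule paths above and that the values parse as plain numbers. The sample XML and function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "{https://mediaarea.net/mediatrace}"


def find_data(block, path):
    """Follow the block names in path, then return the final data entry's text (or None)."""
    current = [block]
    for name in path[:-1]:
        current = [child for parent in current
                   for child in parent.findall(NS + "block")
                   if child.get("name") == name]
    for parent in current:
        for data in parent.findall(NS + "data"):
            if data.get("name") == path[-1]:
                return (data.text or "").strip()
    return None


def audio_tracks_with_delay(xml_text):
    """Return indexes (in file order) of audio track blocks whose Media time is not zero."""
    root = ET.fromstring(xml_text)
    tracks = [b for b in root.iter(NS + "block") if b.get("name") == "Track"]
    flagged = []
    for index, track in enumerate(tracks):
        width = find_data(track, ["Track Header", "Track width"])
        volume = find_data(track, ["Track Header", "Volume"])
        media_time = find_data(track, ["Edit", "Edit List", "Entry", "Media time"])
        if width is None or volume is None or media_time is None:
            continue  # skip tracks that lack any of the three fields
        is_audio = float(width) == 0 and float(volume) > 0
        if is_audio and float(media_time) != 0:
            flagged.append(index)
    return flagged


# Hypothetical two-track sample: video first, then audio with a non-zero Media time.
SAMPLE = """<MediaTrace xmlns="https://mediaarea.net/mediatrace" version="0.1">
<media ref="example.mov" parser="MPEG-4">
<block name="File header">
  <block name="Track">
    <block name="Track Header">
      <data name="Volume">0.000</data>
      <data name="Track width">720.000</data>
    </block>
    <block name="Edit"><block name="Edit List"><block name="Entry">
      <data name="Media time">2</data>
    </block></block></block>
  </block>
  <block name="Track">
    <block name="Track Header">
      <data name="Volume">1.000</data>
      <data name="Track width">0.000</data>
    </block>
    <block name="Edit"><block name="Edit List"><block name="Entry">
      <data name="Media time">2</data>
    </block></block></block>
  </block>
</block>
</media>
</MediaTrace>"""

print(audio_tracks_with_delay(SAMPLE))
```

In the sample only the second track block is flagged: the first has a non-zero width, so it is treated as video and its Media time is ignored.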

This now forms part of our regular automated metadata policy conformance checks, as this code has been spliced into the V210 mov policy. Any files that now fail because of Source_Delay in the audio will be pulled out of the encoding path and rejected for replacement.

Thanks MediaConch for helping find a solid solution!

Media Area’s support network

Another thing I didn’t realise until quite late into using Media Area products is that you can sign up for paid membership to Media Area. This provides much needed financial support for the tools you use regularly, and depending upon the level you commit to you can have a say in the outcome of their product development. I’d definitely recommend at least becoming a member of this worthwhile resource!

Links

Media Area’s guide to using MediaConch:
mediaarea.net/MediaConch/Documentation/HowToUse

Media Area’s guide to using MediaTrace:
mediaarea.net/MediaTrace

Ashley Blewer’s training slides provide a brilliant introduction to MediaConch with images showing what passes and fails look like in the web UI. Highly recommended resource!
training.ashleyblewer.com/presentations/mediaconch.html
