This is an explanation for how to edit together sound bite excerpts from longer MP3 files using something called SMIL -- or "Synchronized Multimedia Integration Language."
I've completed some successful experiments with SMIL and Quicktime that provide a promising solution for collaborative editing. A browser-based editing system could use the playlist mechanism to create sequences of sound bites. I discuss this more in these conversations with Lucas Gonze, Colin Brumelle and Farsheed -- and in this blog post: Playlists are to Music as Edit Decision Lists are to Film.
I'm passing this information along so that some developers can add SMIL export functionality to the Drupal playlist module.
What does all of this mean?
I could upload the audio from the 45+ hours of interviews that I've conducted for this project, and then combine this SMIL mechanism with Drupal so that volunteers could start helping edit the film. This Collaborative Filmmaking schematic has more details.
These volunteer edits would be dynamically generated online with SMIL, and other people could listen to them and rate them. The good edits could be translated into real offline edits via the IN and OUT times being exported through Final Cut Pro XML generated by Drupal.
SMIL is a pretty simple mark-up language similar to HTML that allows the creation of audio and video edit decision lists.
You can create a small text file that points to the IN and OUT times of audio or video source files, and this SMIL file can then be played with Quicktime or Realplayer. It is a simple way to edit audio and video together using a text mark-up language, which could easily be generated automatically from a playlist of sound clips.
Below are more details for using SMIL for dynamic editing of audio and video content...
DEMO #1: Editing Audio Clips Together
This is an edited demo that pulls three sound bite segments from three larger files:
http://www.echochamberproject.com/files/demo.mov
These are the three audio source files, each running around 5 minutes.
http://www.echochamberproject.com/files/media/jarvis02.mp3
http://www.echochamberproject.com/files/media/nolan02.mp3
http://www.echochamberproject.com/files/media/searls04.mp3
These files are stored in a media subdirectory of the directory that contains the demo.mov file.
I created the following demo.mov file in a text editor using the IN and OUT timecode data from Final Cut Pro XML. I just saved the text file as a *.mov file, and Quicktime reads it as a movie as long as you have "SMILtext" before the "<?xml ..." -- I know it's counterintuitive to have anything before "<?xml ...", but Quicktime won't read it as a movie otherwise.
I also took out all of the carriage returns and extra spaces to get it to work -- I'm not sure if this is absolutely necessary, but I'll go ahead and post a raw text dump without the usual XML formatting.
SMILtext<?xml version="1.0" encoding="UTF-8"?><smil xmlns:qt="http://www.apple.com/quicktime/resources/smilextensions" qt:immediate-instantiation="true" qt:autoplay="true" qt:time-slider="true" qt:chapter-mode="clip"><head><meta name="author" content="EchoChamberProject.com"/><meta name="information" content="Written by Kent Bye"/><layout><root-layout height="240" width="320" background-color="#000000"/><region id="main" height="240" width="320" fit="hidden"/></layout></head><body><seq><audio src="media/nolan02.mp3" region="main" qt:chapter="nolan_367" clip-begin="npt=247.881s" clipBegin="npt=247.881s" clip-end="npt=260.427s" clipEnd = "npt=260.427s"/><audio src="media/jarvis02.mp3" region="main" qt:chapter="jarvis_479" clip-begin="npt=147.647s" clipBegin="npt=147.647s" clip-end="npt=165.932s" clipEnd = "npt=165.932s" /><audio src="media/searls04.mp3" region="main" qt:chapter="searls_688" clip-begin="npt=100.433s" clipBegin="npt=100.433s" clip-end="npt=117.817s" clipEnd = "npt=117.817"/></seq></body></smil>
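Since a file like this could be generated automatically from a playlist, here is a minimal Python sketch of how that might look. The names build_smil and audio_tag and the (src, chapter, begin, end) tuple layout are hypothetical, chosen only for illustration; they are not part of Quicktime or the Drupal playlist module. The markup mirrors the demo.mov text above, including the "SMILtext" prefix, the single-line output, and both spellings of the clip attributes.

```python
from xml.sax.saxutils import quoteattr

# Header copied from the demo.mov text above. Quicktime needs the
# "SMILtext" prefix in front of the XML declaration.
HEADER = (
    'SMILtext<?xml version="1.0" encoding="UTF-8"?>'
    '<smil xmlns:qt="http://www.apple.com/quicktime/resources/smilextensions" '
    'qt:immediate-instantiation="true" qt:autoplay="true" '
    'qt:time-slider="true" qt:chapter-mode="clip">'
    '<head><meta name="author" content="EchoChamberProject.com"/>'
    '<meta name="information" content="Written by Kent Bye"/>'
    '<layout>'
    '<root-layout height="240" width="320" background-color="#000000"/>'
    '<region id="main" height="240" width="320" fit="hidden"/>'
    '</layout></head><body><seq>'
)
FOOTER = '</seq></body></smil>'


def audio_tag(src, chapter, begin, end):
    """Return one <audio> element carrying both the SMIL 1.0 and 2.0 clip attributes."""
    b = quoteattr("npt=%.3fs" % begin)
    e = quoteattr("npt=%.3fs" % end)
    return (
        '<audio src=%s region="main" qt:chapter=%s '
        'clip-begin=%s clipBegin=%s clip-end=%s clipEnd=%s/>'
        % (quoteattr(src), quoteattr(chapter), b, b, e, e)
    )


def build_smil(clips):
    """Join the clips into a single line of SMIL text (no carriage returns)."""
    return HEADER + "".join(audio_tag(*clip) for clip in clips) + FOOTER


# Hypothetical playlist: (source file, chapter name, IN seconds, OUT seconds).
playlist = [
    ("media/nolan02.mp3", "nolan_367", 247.881, 260.427),
    ("media/jarvis02.mp3", "jarvis_479", 147.647, 165.932),
    ("media/searls04.mp3", "searls_688", 100.433, 117.817),
]
smil_text = build_smil(playlist)
```

Writing smil_text to a file with a .mov extension should give something equivalent to the demo file, though I have only described the approach here and the output would still need to be checked in Quicktime.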
The units for the Final Cut Pro XML timecode data are in frames -- where there are 29.97 frames per second. And so I divided the frames by 29.97 in order to get the IN and OUT points in seconds. Here are more timecode conversion details.
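The division described above can be sketched in a few lines of Python. The function names are hypothetical, and the frame counts 7429 and 7805 are assumed values, picked because dividing them by 29.97 reproduces the first clip's IN and OUT points in the demo; they are not taken from the actual Final Cut Pro XML.

```python
FPS = 29.97  # frame rate used for the Final Cut Pro timecode data


def frames_to_seconds(frames, fps=FPS):
    """Convert a frame count to seconds, rounded to milliseconds."""
    return round(frames / fps, 3)


def npt(frames, fps=FPS):
    """Format a frame count as the npt value used in the SMIL clip attributes."""
    return "npt=%.3fs" % (frames / fps)


# Assumed frame counts for illustration only.
in_point = npt(7429)   # "npt=247.881s"
out_point = npt(7805)  # "npt=260.427s"
```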
The great news is that the IN/OUT data was correctly predicted from the Final Cut Pro XML data, which ensures the portability back to the offline editing! In other words, much smaller MP3 files can be used as dummy placeholders for timecode continuity instead of having to upload very large audio or video files.
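Going the other direction, the IN and OUT points in a SMIL file can be read back and turned into frame counts for the offline edit. This is a rough sketch under a few assumptions: the function names are hypothetical, the sample string is a cut-down version of the demo file, and rounding to the nearest frame is my guess at a reasonable way to undo the division by 29.97.

```python
import xml.etree.ElementTree as ET

FPS = 29.97  # frame rate used for the Final Cut Pro timecode data


def parse_npt(value):
    """Turn 'npt=247.881s' (or 'npt=247.881') into a float number of seconds."""
    return float(value.split("=", 1)[1].rstrip("s"))


def clips_in_frames(smil_text, fps=FPS):
    """Return (src, in_frame, out_frame) for each <audio> clip in the SMIL text."""
    # Drop the Quicktime-only "SMILtext" prefix so the rest parses as XML.
    xml_text = smil_text[smil_text.index("<?xml"):]
    root = ET.fromstring(xml_text.encode("utf-8"))
    clips = []
    for audio in root.iter("audio"):
        begin = parse_npt(audio.get("clipBegin"))
        end = parse_npt(audio.get("clipEnd"))
        clips.append((audio.get("src"), round(begin * fps), round(end * fps)))
    return clips


# Cut-down sample based on the first clip of the demo file.
sample = (
    'SMILtext<?xml version="1.0" encoding="UTF-8"?>'
    '<smil xmlns:qt="http://www.apple.com/quicktime/resources/smilextensions">'
    '<body><seq>'
    '<audio src="media/nolan02.mp3" region="main" qt:chapter="nolan_367" '
    'clip-begin="npt=247.881s" clipBegin="npt=247.881s" '
    'clip-end="npt=260.427s" clipEnd = "npt=260.427s"/>'
    '</seq></body></smil>'
)
frames = clips_in_frames(sample)
```

The resulting list of source files and frame numbers is the kind of data that would then need to be written into Final Cut Pro XML; that export step is not shown here.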
DEMO #2: Editing Audio Clips Together with Timecode Overlays
Using the same three media files as before, I was able to use the "textstream" functionality of SMIL to overlay timecode data over the edited sound bite files.
I generated this timecode.txt file in an Excel spreadsheet, which has the timecode data in both seconds and in frames.
Being able to do this would make it much easier for people to alter and control the IN and OUT points of sound bites in any type of browser-based editing system.
Here is the timecode demo file:
http://www.echochamberproject.com/files/timecode_demo.mov
Warning: This file may not load on some computers.
Here is the source text of this file:
SMILtext<?xml version="1.0" encoding="UTF-8"?><smil xmlns:qt="http://www.apple.com/quicktime/resources/smilextensions" qt:immediate-instantiation="true" qt:autoplay="true" qt:time-slider="true" qt:chapter-mode="clip"><head><meta name="author" content="EchoChamberProject.com"/><meta name="information" content="Written by Kent Bye"/><layout><root-layout height="240" width="320" background-color="#000000"/><region id="main" height="240" width="320" fit="hidden"/></layout></head><body><seq><par><textstream src="timecode.txt" region="main" system-captions="on" title="captions" clip-begin="npt=247.881s" clipBegin="npt=247.881s" clip-end="npt=260.427s" clipEnd = "npt=260.427s"/>
<audio src="media/nolan02.mp3" region="main" qt:chapter="nolan_367" clip-begin="npt=247.881s" clipBegin="npt=247.881s" clip-end="npt=260.427s" clipEnd = "npt=260.427s"/></par><par><textstream src="timecode.txt" region="main" system-captions="on" title="captions" clip-begin="npt=147.647s" clipBegin="npt=147.647s" clip-end="npt=165.932s" clipEnd = "npt=165.932s"/>
<audio src="media/jarvis02.mp3" region="main" qt:chapter="jarvis_479" clip-begin="npt=147.647s" clipBegin="npt=147.647s" clip-end="npt=165.932s" clipEnd = "npt=165.932s" /></par><par><textstream src="timecode.txt" region="main" system-captions="on" title="captions" clip-begin="npt=100.433s" clipBegin="npt=100.433s" clip-end="npt=117.817s" clipEnd = "npt=117.817"/>
<audio src="media/searls04.mp3" region="main" qt:chapter="searls_688" clip-begin="npt=100.433s" clipBegin="npt=100.433s" clip-end="npt=117.817s" clipEnd = "npt=117.817"/></par></seq></body></smil>
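The structure is hard to see in the raw dump: each sound bite is a <par> group holding a textstream and an audio clip that share the same IN and OUT points, and the groups play one after another inside <seq>. A minimal Python sketch of that pairing follows; par_block and its arguments are hypothetical names for illustration, and the attribute values are copied from the source text above.

```python
from xml.sax.saxutils import quoteattr


def par_block(src, chapter, begin, end, captions="timecode.txt"):
    """Return one <par> group: a textstream and an audio clip with the same IN/OUT points."""
    b = quoteattr("npt=%.3fs" % begin)
    e = quoteattr("npt=%.3fs" % end)
    # Both the SMIL 1.0 and SMIL 2.0 spellings, as in the demo file.
    clip = "clip-begin=%s clipBegin=%s clip-end=%s clipEnd=%s" % (b, b, e, e)
    return (
        "<par>"
        '<textstream src=%s region="main" system-captions="on" '
        'title="captions" %s/>'
        '<audio src=%s region="main" qt:chapter=%s %s/>'
        "</par>"
    ) % (quoteattr(captions), clip, quoteattr(src), quoteattr(chapter), clip)


# Hypothetical playlist: (source file, chapter name, IN seconds, OUT seconds).
playlist = [
    ("media/nolan02.mp3", "nolan_367", 247.881, 260.427),
    ("media/jarvis02.mp3", "jarvis_479", 147.647, 165.932),
    ("media/searls04.mp3", "searls_688", 100.433, 117.817),
]
body = "<seq>" + "".join(par_block(*clip) for clip in playlist) + "</seq>"
```

The body string would still need the same "SMILtext" prefix, <smil> wrapper and <head> section as the source text above before Quicktime could open it.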
Note that I assigned each sound bite to a Chapter. Being able to display the timecode data will greatly enhance the online editing capabilities.
AFTERTHOUGHTS
The biggest downside to SMIL in Quicktime is that it seems to have to load the entire audio source file before playing an excerpt -- so if you are only interested in playing 15 seconds of a 5-minute clip, then it has to load the entire 5 minutes before it will play the 15 seconds. This seems to be a limiting factor, but it could be minimized by making the source files as small as possible.
This Quicktime developer page on SMIL gives the most comprehensive overview of what you can do with the language -- however, it is a little old and not totally up to date.
For example, nowhere in the documentation does it describe how to pull sound bite excerpts from larger audio/video chunks, and I had to do a lot of searching around before I finally found this post that explains that, "You need *both* the clip-begin and clipBegin tags for compatibility with SMIL1.0 and SMIL2.0."
So in other words, in order to edit together sound bite excerpts from larger files, you have to use both "clip-begin" and "clipBegin" (and likewise both "clip-end" and "clipEnd"). It makes no sense to me why you must use both the 1.0 and 2.0 syntax, but it is what I had to do before it would work with Quicktime.