Subject: Re: Derivative/collective works and OSL
From: Chuck Swiger <>
Date: Sun, 06 Feb 2005 17:08:40 -0500

John Cowan wrote:
[ ... ]
> There is a similar dispute, though not so problematic, at the other end:
> does mere compiling of source code create a derivative work, or is the
> object code the original work in a different medium, as a paperback
> book is the same work as a hardback original?  
> Nobody knows the answer to that either.

It may well be true that the courts have not considered a dispute involving 
this specific issue and resolved it in a way that sets a clear standard or 
precedent.

However, there exists a branch of software engineering known as compiler 
design, and the experts in that field who have written working compilers 
share a common understanding of how a compiler toolchain operates: 
compilers perform a mechanical, deterministic, and reversible 
transformation on source code to produce object code.

By definition, this transformation does not change the semantic meaning of the 
program and does not involve human decision-making or any possibility of 
creativity. [1]
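To make the determinism concrete, here is a quick sketch using Python's 
built-in bytecode compiler as a stand-in for any compiler toolchain (the 
function and its name are made up for the example):

```python
# A sketch: the same source text always compiles to the same object code.
# Python's bytecode compiler stands in for any compiler here.
source = "def square(x):\n    return x * x\n"

# Compile the identical source text twice, independently.
code1 = compile(source, "<example>", "exec")
code2 = compile(source, "<example>", "exec")

# The transformation is mechanical and deterministic: the resulting
# bytecode is byte-for-byte identical, and the compiled programs behave
# identically.
print(code1.co_code == code2.co_code)

ns1, ns2 = {}, {}
exec(code1, ns1)
exec(code2, ns2)
print(ns1["square"](7) == ns2["square"](7))
```

No human decision-making is involved anywhere in that pipeline; run it a 
thousand times and you get the same bytes every time.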

To use your analogy as a starting point, consider taking a book and 
translating it into another language.  For human languages this is a 
creative process, since there can be many ways to translate something.  The 
process is not deterministic: two translators often produce noticeably 
different output.  Nor is it reversible: if you translate a sentence from 
English to Russian, and then from Russian back to English, it is very 
likely that what you get back is not the same as the original work.

[ A classic example from NLP was: "The spirit is willing, but the flesh is 
weak." became "The vodka is good but the meat is rotten." ]

Computer languages are unlike human languages: they possess well-defined 
semantics, and their grammars (typically LR(1) or LALR(1)) forbid 
ambiguity, ensuring that well-formed source code has one and only one 
meaning when compiled.  You can compile a source code file with one 
compiler into an object file, decompile the object file via a disassembler 
or a debugger like gdb, and then recompile that result into a new object 
file using a different compiler.  You will end up with a program that has 
exactly the same behavior and meaning as the original program.
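As a small illustration -- using Python's own parser and disassembler 
rather than a C toolchain, and a made-up source fragment -- two 
independent parses of the same source always yield structurally identical 
syntax trees, and the compiled form can be inspected with a disassembler 
much as gdb would show for native object code:

```python
import ast
import dis

# A made-up fragment; the grammar admits exactly one parse for it, so it
# has one and only one meaning to the compiler.
src = "total = price * quantity + tax"

# Two independent parses produce structurally identical syntax trees.
tree_a = ast.dump(ast.parse(src))
tree_b = ast.dump(ast.parse(src))
print(tree_a == tree_b)

# The compiled object code can be mechanically disassembled for
# inspection, analogous to running a native object file through gdb or a
# disassembler.
dis.dis(compile(src, "<example>", "exec"))
```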

The process of compiling software is thus very similar to photocopying an 
original document, and then photocopying the copy.  With analog 
photographic reproduction the process is lossy (the "Xerox" effect, where a 
second-generation copy becomes blurry compared with the original), but a 
digital process does not suffer generation loss.
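A sketch of the "no generation loss" point: copy the copy several 
generations deep and hash both ends (the file names and contents are made 
up for the example):

```python
import hashlib
import os
import shutil
import tempfile

# Made-up files in a scratch directory, standing in for any document.
tmp = tempfile.mkdtemp()

def sha256(path):
    # Hash a file's bytes so copies can be compared exactly.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# The "original document".
original = os.path.join(tmp, "generation-0.txt")
with open(original, "wb") as f:
    f.write(b"The spirit is willing, but the flesh is weak.\n" * 1000)

# Photocopy the photocopy: copy the previous copy, five generations deep.
prev = original
for gen in range(1, 6):
    nxt = os.path.join(tmp, "generation-%d.txt" % gen)
    shutil.copyfile(prev, nxt)
    prev = nxt

# Unlike an analog photocopy chain, the fifth-generation digital copy is
# bit-for-bit identical to the original.
print(sha256(original) == sha256(prev))
```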

> The reason it matters is that pretty much everyone agrees that a tarball
> is a collective work, 

If I put a book-- a single work, written by a single author-- into a box and 
mail that box, the box only contains a single work.  If I put two books into 
the box, then there are two works in the box, but that does not mean the box 
is a collective work: it is a mere aggregation of two components which are 
distinct and can be handled separately without any confusion.

The tape archive format, or tarball, is a method of packaging content for 
shipment over the network or for convenient long-term storage, just as the 
box used in the example above is a convenient method of packaging content 
for shipment via the postal service.

A tarball of a single work is an archive containing a single work, not a 
collective work.  A tarball of two separate works is an archive of two 
separate works, which is a simple aggregation and not a collective work. [2]
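A sketch of that aggregation, using Python's tarfile module; the project 
directory names echo the nmap/libpcap example in the footnote, and the 
file contents are made up:

```python
import os
import tarfile
import tempfile

tmp = tempfile.mkdtemp()

# Two distinct "works", each kept in its own directory tree, as is the
# normal practice for aggregated tarballs.
for project, filename in (("nmap-style", "scanner.c"),
                          ("libpcap-style", "capture.c")):
    os.makedirs(os.path.join(tmp, project))
    with open(os.path.join(tmp, project, filename), "w") as f:
        f.write("/* placeholder source for %s */\n" % project)

# Pack both works into one tarball: mere aggregation, like putting two
# books into one box.
archive = os.path.join(tmp, "aggregate.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for project in ("nmap-style", "libpcap-style"):
        tar.add(os.path.join(tmp, project), arcname=project)

# Each work stays in its own directory tree inside the archive and can
# be handled separately without any confusion.
with tarfile.open(archive, "r:gz") as tar:
    members = sorted(tar.getnames())
print(members)
```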

> ...and if when compiled it is still a collective work,
> then it is not derivative of any of the works contained in the tarball.

You can't compile a tarball without extracting its contents, any more than 
you could read a book mailed to you in a box without first opening the box.

Is a photocopy of a document considered a derivative work, or is it 
considered to be the same thing as the original work for practical and 
legal purposes?


[1]: If the code being compiled has a bug that results in undefined 
behavior, the compiler is allowed to produce different results when invoked 
with different optimization flags, or compared with the output generated by 
another compiler.  While true, this does not refute my argument: for code 
which does not invoke undefined behavior, what the compiler is allowed to 
do when compiling and optimizing is required not to diverge.

Page 586 of _Compilers: Principles, Techniques, and Tools_ by Aho, Sethi, and 
Ullman states:  "First, a transformation must preserve the meaning of 
programs.  That is, an 'optimization' must not change the output produced by a 
program for a given input, or cause an error, such as a division by zero, that 
was not present in the original program.  The influence of this criterion 
pervades this chapter; at all times we take the 'safe' approach of missing an 
opportunity to apply a transformation rather than risk changing what the 
program does."

[2]: The vast majority of archives found on various FTP and web sites 
contain a single work composed of one or more source code files.  There are 
a few cases where a tarball contains several works, such as nmap shipping 
with libpcap, or Python coming with expat, but it is easy to see that these 
are separate works because the archive keeps them in separate directory 
trees.

I suppose it would be possible to rip out all of the pages from two books 
and mix them together on a chapter-by-chapter or page-by-page basis to form 
a new work which actually was a single indivisible compilation, just as it 
would be possible to mix all of the files of two software projects together 
to form a new work, but that is certainly not the normal case.