
Reprints, quoting, transclusion, and Creative Commons


It seems like a bunch of stuff related to reusing material has been coming up lately.

For example, various people have quoted our Stories We've Seen Too Often list in full, without asking us first. Harper's published it a while back; more recently, various bloggers copied and pasted it into their blogs. I initially reacted with annoyance; one of my key tenets is that if you want to copy something someone else has written, you should ask first. (To be fair, I don't always do that in all circumstances.) I even added a note on the page saying something like "If you want to reprint this, please ask us." But after a couple of people wrote to ask for permission, I thought about it more carefully, and realized that I don't actually have any serious objection to people copying that list; I was just taken by surprise.

So I added a section on how to reprint the list, which basically says "You have permission to reprint this if you really want to, but please credit us." And now people don't have to ask for permission, and I don't have to be disgruntled by their not asking.

And then there's the Featured Blog thing at the Asimov's and Analog sites. I confess that it seems a little odd to me; there are lots of ways that people read this journal outside of the context of my page, but those usually involve acquiring a feed and displaying the contents of the feed in some way that matches the new context (usually intermixed with other people's feeds), like a LiveJournal "friends" list. Embedding my whole blog main page in the page seems to me to be a pretty unusual approach. There's nothing wrong with it; it's just not what I'm used to.

And I just received a request from another blogger to be a "guest blogger," by which they meant (in this case) that they wanted permission to reprint the entirety of my "How I explained infodumps and saved humanity" piece (which, I'm delighted to see, is still #5 in the list of Google search results for [infodump]). My immediate gut reaction was a negative one--I'm happy for people to quote a couple of paragraphs of that and link to the original, but I'm hesitant to allow people to copy an entire entry written by me and post it elsewhere, even if they ask nicely first (as this other blogger did).

And then there's Creative Commons. Ben R. asked us recently about adding a Creative Commons license to his latest Strange Horizons story, "The House Beyond Your Sky," in our archives after our exclusivity period is over. Which made me think more about Creative Commons licenses and what does and doesn't appeal to me about them.

I love the idea of creators granting blanket permission for others to use their creations in certain ways, and Creative Commons lets you do that. But I'm a little uncomfortable, for my own work, with letting go of control over the original work.

I don't mind if someone takes a journal entry of mine (or other thing I've written) and publishes a detailed annotated version of it. I don't mind if someone translates it into another language (though the control freak in me wants to somehow be sure they've done a good job with the translation; rather difficult, given that I don't read any languages other than English). I don't mind if someone quotes from it. But I don't like the idea of someone creating a full word-for-word copy, without adding anything new, and publishing it elsewhere.

I think part of why I don't like that idea is that on the web, there should only need to be one copy of a given work. If I've got the definitive version of a work on my site, then I'd much rather have people link to it than copy it. In some cases, making copies of something that appears online is (in some sense) good because the site where the work lives might go away (or slip behind a for-pay password wall) and then the work would be gone from the web; but in my case, my site is pretty likely to last my lifetime (unless the web as we know it goes away before I die), and if the web still exists by then, I hope someone will keep my stuff online even after I die.

And in some cases, making copies of something is good because the site where the work lives has limited bandwidth and the work is large; that's one reason that software distribution often has mirror sites. But the work I create is almost exclusively plain text, and I have a lot of bandwidth, and so far I haven't needed mirrors.

So it occurred to me that what I really want is a Creative Commons license that allows only derivative works--not exact copies of the whole work.

So I was pleased to learn that the new Creative Commons Sampling License allows exactly that. The legal language and the plain-English language aren't as much in sync on certain points as I would like; in particular, the CC General Counsel informed me in email that the intent of the Sampling License is to allow people to use the entire work if they want to, as long as they use it transformatively (like translating it). Which is pretty much what I want, but probably not what most people will think the Sampling License allows.

At any rate, I've been toying with the idea of applying a CC Sampling License to my whole journal. I'm not quite ready to make that jump yet, but I'm intrigued by the idea.

As for allowing someone to copy an entire entry and repost it elsewhere (after explicitly asking permission, not as part of a CC license), I'm still on the fence. I know that a lot of people don't follow links, so more people are likely to read the piece if I allow a full copy; and what harm is there, really, in allowing that? I'm not making any money off the piece either way, nor is anyone else. I lose a certain amount of control over it--but I lost a certain amount of control over it by posting it on the web. I don't lose PageRank or whuffie, because the copy will include a link back to the original.

And yet, my gut feeling is still opposed to the idea. I'm not sure why.

Thoughts welcome.

But I suppose I need a couple of disclaimers here first, to quell outraged comments from authors who think I'm talking about their work instead of mine:

  • I am totally not saying everyone should CC license their work.
  • My situation is very different from that of fiction writers who are trying to make a living from their work. I'm talking entirely about work that will never make me any money (I have no interest in trying to sell this stuff) and that I'm delighted to have people read for free on my own site.


If I’ve got the definitive version of a work on my site, then I’d much rather have people link to it than copy it.

The scholar in me has a lot of trouble with that. If there's only ever one copy, and you control it, you can change what it says at any time, and then deny that it ever said anything different.

I’ve been toying with the idea of applying a CC Sampling License to my whole journal

If you apply a sampling license to your entire journal, then I can copy a few (dozen) entries and put them into a new context such as entries from other people's journals. It's important to be clear about the unit that your license applies to.

Do you see your blog as a large set of individual creations (posts and comments), as a collective work, or as a unitary work constantly evolving into new editions as you add posts? Those are all valid perspectives, each useful in various ways, and each with very different implications under copyright law. As your blog includes comments by other authors, are those comments part of a larger collective or unitary work such as a web page or the blog as a whole? Do the authors retain copyright to their individual comments, and implicitly grant you a license to integrate their comments into your web site as individual pages or parts of pages? Who has the copyright of the collective work that includes comments, then?

David, the scholar in you should be reassured by the fact that you and others can print the work as and when you see it, that archive.org or similar snapshot services may provide long-term backup copies, and that Google may provide short-term backup copies. The scholar in you should also be reassured by the existence of a clearly authoritative original for quotation and citation.

"May" and "may" are not at all reassuring. Archive.org misses a lot of changes and misses a lot of pages entirely. As for print, the point of a citation is to let other people check your references; saying "but I have a printout in my desk drawer" isn't nearly as helpful. And while being able to point to an authoritative latest edition is a great thing, it's an addition to being able to point to an original. It's not at all a substitute.

It occurs to me that this argument is more or less why Socrates was against writing and Buddy Bolden was against audio recordings.

the point of a citation is to let other people check your references

This is an important point that many academics ignore, but I don't see how more copying by non-authoritative third parties helps the citation problem. If the item you cite changes or disappears or is inaccessible or different to the next reader, there's a problem for careful scholarship. But how is that problem solved by a different copy being out there? Why would the reader trust the third party copy more than yours?

On the other hand, scholarship problems are greatly increased by the proliferation of non-careful copying. Errors and changes are introduced by accident or by design. Readers become familiar with different versions and the common literature which is so vitally important to scholarship is fractured further.

This fracturing of the common literature is already happening in many ways. Take a given scholarly article which is published in a reputable journal. Compared to 20 years ago, far more preprint versions exist and each is circulated far more widely. Fewer copies of the journal are printed. The article is reprinted in more books, generally with changes introduced in each new edition. All of this makes it less likely that you and I will read the same version of that article. And it makes it far less likely in practice that a citation or quotation will be checked against the same original. The advantages of increased publication may well far outweigh such costs, but those costs should be recognized.

Along similar lines, books are printed more frequently in shorter runs from digital files which are easily changed and frequently re-RIPped, which can lead to the nominally identical book in your library and in mine having different content. Digital printing, which enables short run printing and on-demand printing, is great in many ways, but it introduces far more errors and degrades much less gracefully than camera-ready.

I come at this from the perspective of publishing academic papers on-line by hundreds of authors, and I consider the integrity of the scientific literature to be extremely important. For that reason, I ask authors not to post additional copies of their papers elsewhere once they are available from our site, but rather to link to our copy. When followed, that saves the author from the temptation to "fix just a few minor things" and thereby create the very citation problem you raise, and saves the reader from wondering which copy out there is authoritative. As with the posts on this blog, the original is freely available (and backed up in various ways), so it is unclear what advantage is gained by scattering further copies for general access. The scholarly costs of that sort of diffusion are much more clear when nominally identical copies are not in fact identical.

As I see it, Jed is in a similar situation as the publisher of his own materials on his web site. The authoritative version of a post is freely available on his site, and any particular complete copy posted elsewhere could be either identical or different. If it is identical, then what is the advantage in creating a new URL? If it is different, then the author's moral rights or reputation may be hurt (as well as creating citation problems).

To me, your conclusions make sense if you assume that authors will often deny things they have previously said, while people who are not authors and copy material from other web sites generally make perfect copies. But neither of those assumptions is borne out by my experience, and the two groups of people have considerable overlap, which complicates making clear distinctions in action or motivation between the two groups.

I believe what you want (as do I in most cases) is a more perfect and more complete version of archive.org. I suspect our disagreement is about whether a consciously made third party copy of a particular item can serve the same purpose, and how significant the drawbacks are of such copies.

David and Michael re copies vs originals: Yeah, it's a difficult issue. (The following doesn't really say anything y'all haven't already said; basically, I agree with both of you. But I'll say it anyway.)

On the one hand, David's right that people can and do change the original; in fact, I've done that myself. (If it's more than just fixing a typo, I usually make a note of the change--but sometimes I'm lazy or careless or just decide not to for whatever reason.) And that can be disconcerting and weirdly revisionist-historical. And if I want to retract something I've said and deny that I ever said it, you're right that if there's only one copy, it may be hard to prove me wrong. There are plenty of contexts where people want to deny they said something; it's nice when there's a trustworthy historical record--like a newspaper article--that can be checked.

On the other hand, Michael's right that people who make and post other copies of documents also can and do change those copies, and that problems can arise when copies that look like they should be identical turn out not to be. (This even happens with print journalism, of course; what if that trustworthy historical record was a misquote in the first place, and spread through hundreds of other newspapers before it could be corrected?)

I personally keep a series of backups of this journal dating back to its inception, roughly one backup a month, so if any disputes were to arise over what was originally posted, and if you trusted me to provide the original, that could help. But that's no comfort in cases where you don't trust the owner of the original.

And although I agree with Michael that archive.org is a useful resource for this sort of thing, I also agree with David that archive.org is incomplete, and less than ideal in a variety of other ways. For example, several early versions of my website no longer exist anywhere--it didn't occur to me to back them up before changing things, and archive.org doesn't have 'em. (Another example: a column that Mary Anne and I guest-wrote for Mouthorgan is no longer available online anywhere; it appeared on that site shortly after they switched to a dynamic system that archive.org couldn't archive.) With print documents, printing a new edition doesn't uncreate the old edition; with electronic media, especially online, it sometimes does. (Of course, historical scholarship is full of examples of the problems that can arise when you have multiple versions of a printed work, or when errors and/or changes are introduced in the copying process.)

It could be argued that having lots of people make copies makes it more likely that you'd be able to reconstruct the original. I suspect that if a hundred people copied a given entry, and you did some sort of textual analysis on the resulting copies, you could probably reconstruct the original entry even if a bunch of errors and/or intentional changes were introduced during or after copying. Online, this could be something like a distributed version of archive.org, sorta kinda.
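For the curious, here's a rough sketch of what that sort of textual analysis might look like in its very simplest form: a word-by-word majority vote across the copies. This is only an illustration under big assumptions, not a real reconstruction tool. The function name `reconstruct` and the sample sentences are made up, and it assumes the surviving copies differ only by swapped or misspelled words, so that words line up position by position. Copies with words added or dropped are simply ignored here; handling them would need proper text alignment.

```python
from collections import Counter


def reconstruct(copies):
    """Guess the original text by majority vote at each word position.

    Hypothetical helper: assumes a non-empty list of copies and that most
    copies have the same number of words as the original.
    """
    tokenized = [copy.split() for copy in copies]
    # Keep only copies with the most common word count so positions line up.
    modal_len = Counter(len(words) for words in tokenized).most_common(1)[0][0]
    aligned = [words for words in tokenized if len(words) == modal_len]
    # At each position, keep the word that the most copies agree on.
    voted = [Counter(column).most_common(1)[0][0] for column in zip(*aligned)]
    return " ".join(voted)


# Made-up copies: one typo, one deliberate change, one with an added word.
copies = [
    "the work lives on my site",
    "the work lives on my site",
    "the work lives on his site",
    "the wrok lives on my site",
    "the work lives on my site today",
]
print(reconstruct(copies))  # prints "the work lives on my site"
```

A vote like this only works while the faithful copies outnumber the altered ones at each position; if an altered version gets recopied more often than the original, the vote picks the altered version.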

Of course, if one of those copiers introduces an error that makes the work more likely to catch the popular imagination (like claiming that Kurt Vonnegut wrote a particular commencement speech), and then redistributes the erroneous copy, then the erroneous copy may end up much more widely available and much more widely believed than the original. Which is why we need snopes.com as well as archive.org.

I think what a lot of this comes down to is David's question: Do we want the authoritative version of a document, or the original version? I think the answer is (as David noted) that sometimes we want one, and sometimes we want the other. If we always wanted the authoritative version, then having a single copy would make sense. Since we sometimes want the original as well, something like archive.org (perhaps more perfect, more complete, and/or distributed) seems to also make sense.

But when what you really want is the actual original version, then I think relying on an organization whose purpose is to create and track versions of documents (like archive.org) may make more sense than relying on crowds whose purpose is to copy works that they find interesting and/or entertaining. (But I'm old-fashioned enough to still not quite trust this wisdom-of-crowds stuff.)

But yeah, none of these options is a perfect solution. It's a difficult, and interesting, question. Thanks to both of you for all the discussion!

Michael, re CC licenses: very good questions and points. I'll have to think about those further before I decide whether to proceed.

My gut feeling is that I would intend each entry to be a separate work for the purposes of the CC license. But I'm not sure whether MT's built-in CC support indicates that, or whether its phrasing suggests that the whole blog is a single work.

My gut feeling is also that commenters own the copyright on their comments, and that any license I apply to my entries doesn't apply to comments. On the other hand, I freely alter comments in various ways if I feel like it (occasionally fixing an obvious typo, more often doing things like cleaning up links that have the wrong syntax), without explicit permission from the commenters. And plenty of people quote other comments, sometimes in full, in their own comments. So this, too, is a tricky area that I obviously haven't thought through the ramifications of yet.

So thanks for the comment! Good food for thought.
