• 0 Posts
  • 12 Comments
Joined 1 year ago
Cake day: June 15th, 2023

  • Then the site is wrong to tell you that you can use the images in any way you want.

    That’s what I’m saying.

    intentionally violate copyright

    Why is it intentional? Some characters come up even in very generic prompts. I’ve been toying around with it, and I’m finding it hard to come up with prompts containing “superhero” that don’t include Superman in the outputs. Even asking explicitly for original characters doesn’t work.

    For the most part it hasn’t happened.

    And how do you measure that? Do you have a way for me to check whether my prompt for “Queer guy standing on top of a mountain gazing solemnly into the distance” produces something strikingly similar to some unknown person’s DeviantArt uploads, just as my prompt containing “original superhero” produced something strikingly similar to Superman?

    The status quo…

    Irrelevant to the discussion. We’re talking about copyright law here, i.e. about what rights a creator has over their original work, not whether they choose to exercise them with regard to fan art.

    until they get big enough

    Right, so now that multi-billion dollar companies are taking in the work of everyone under the sun to build services threatening to replace many jobs, are they “big enough” for you? Am I allowed to discuss it now?

    This is an argument-by-comparison.

    It’s not an argument by comparison (or it is a terrible one), because you compared it to something that differs in, or omits, all the crucial aspects of the issue. The discussion around AI exists specifically because of how the data used to train it is sourced, because of the specific mechanisms it uses to produce its output, and because of demonstrated cases of it producing output that is very clearly a copy of copyrighted work. By leaving the crucial aspects unspecified, you are trying to paint my argument as saying that we should ban every device of any nature that could produce output that might, under any circumstances, happen to infringe on someone’s copyright. That is much easier for you to argue against without having to touch on any of the real talking points, which is why this is a straw man argument.

    You don’t own a copyright on a pattern

    Wrong. In the context of training AI, I’m talking about any observable pattern in the input data, which does include some patterns that are copyrightable, e.g. the general likeness of a character rather than a specific drawing of them.

    your idea of how copyright should work here is regressive, harmful

    My ideas on copyright are very progressive actually. But we’re not discussing my ideas, we’re discussing existing copyright law and whether the “transformation” argument used by AI companies is bullshit. We’re discussing if it’s giving them a huge and unearned break from the copyright system that abuses the rest of us for their benefit.

    a description specific enough to produce Mickey Mouse from a machine that’s never seen it.

    Right, but then you would have to define Mickey Mouse very precisely in your prompt. You would be the one providing this information, instead of it being part of the model. That would clearly not be an infringement on the model’s part!

    But then you would also have to solve the copyright infringement of Superman, Obi-Wan, Pikachu, some random person’s DeviantArt image depicting “Queer guy standing on top of a mountain gazing solemnly into the distance”, and so on. In the end, the only model that can claim, without reasonable objection, to have no tendency to illegally copy other people’s works is a model trained only on data used with explicit permission.


  • If AI companies were predominantly advertising themselves as “we make your pictures of Mickey Mouse”, you’d have a valid point.

    Doesn’t matter what it’s advertised as. That picture is, you agree, unusable. But the site I linked to above is selling this service and it’s telling me I can use the images in any way I want. I’m not stupid enough to use Mickey Mouse commercially, but what happens when the output is extremely similar to a character I’ve never heard of? I’m going to use it assuming it is an AI-generated character, and the creator is very unlikely to find out unless my work ends up being very famous. The end result is that the copyright of everything not widely recognizable is practically meaningless if we accept this practice.

    But at this point you’re basically arguing that it should be impossible to sell a magical machine that can draw anything you ask from it because it could be asked to draw copyright images.

    Straw man. This is not a magical device that can “draw anything”, and it doesn’t just happen to be able to draw copyrighted images as a side effect of being able to create every imaginable thing, as you try to make it sound. This is a mundane device whose sole function is to copy patterns from its input set, which unfortunately is pirated. If you want to prove me wrong, train your own model without a single image of Mickey Mouse or a tag with his name, then try to get it to draw him like I did before. You will fail, because this machine’s ability to draw him depends on being trained on images of him.

    There are many ways this could be done ethically, like:

    • build it on open datasets, or on datasets you own, instead of pirating
    • don’t commercialize it
    • allow non-commercial uses, like research or just messing around (which would be a real transformative use)




  • Let’s remove the context of AI altogether.

    Yeah sure if you do that then you can say anything. But the context is crucial. Imagine that you could prove in court that I went down to the public library with a list that read “Books I want to read for the express purpose of mimicking, and that I get nothing else out of”, and on that list was your book. Imagine you had me on tape saying that for me writing is not a creative expression of myself, but rather I am always trying to find the word that the authors I have studied would use. Now that’s getting closer to the context of AI. I don’t know why you think you would need me to sell verbatim copies of your book to have a good case against me. Just a few passages should suffice given my shady and well-documented intentions.

    Well that’s basically what LLMs look like to me.


  • But what an LLM does meets your listed definition of transformative as well

    No it doesn’t. Sometimes the output is used in completely different ways but sometimes it is a direct substitute. The most obvious example is when it is writing code that the user intends to incorporate into their work. The output is not transformative by this definition as it serves the same purpose as the original works and adds no new value, except stripping away the copyright of course.

    everything it outputs is completely original

    [citation needed]

    that you can’t use to reconstitute the original work

    Who cares? That has never been the basis for copyright infringement. For example, as far as I know, I can’t make and sell a doll that looks like Mickey Mouse from Steamboat Willie. By your logic it should count as transformative work: a doll has nothing to do with the cartoon, it provides a completely different sort of value, and it is not even close to being a direct copy or able to reconstitute the original. And yet, as far as I know, I am not allowed to do it, and even if I am, I won’t risk going to court against Disney to find out. The fear alone has made sure that we mere mortals cannot copy and transform even the smallest parts of copyrighted works owned by big companies.

    I would find it hard to believe that if there is a Supreme Court ruling which finds digitalizing copyrighted material in a database is fair use and not derivative work

    Which case are you citing? Context matters. LLMs aren’t just a database. They are also a frontend to extract the data from these databases, that is being heavily marketed and sold to people who might otherwise have bought the original works instead.

    The lossy compression is also irrelevant; otherwise literally every pirated movie or series release would be legal. How lossy is it, even? How would you measure it? I’ve seen GitHub Copilot spit out verbatim copies of code. I’m pretty sure that if I ask ChatGPT to recite a very well-known poem, it will also produce a verbatim copy. So at least some works are included completely losslessly. Which ones? No one knows, and that’s a big problem.


  • “Transformative” in this context does not mean simply not identical to the source material. It has to serve a different purpose and to provide additional value that cannot be derived from the original.

    The summary that they talk about in the article is a bad example for a lawsuit because it is indeed transformative. A summary provides a different sort of value than the original work. However, if the same LLM writes a book based on the books used as training data, then it is definitely not an open-and-shut case whether this is transformative.


  • Not a lawyer so I can’t be sure. To my understanding a summary of a work is not a violation of copyright because the summary is transformative (serves a completely different purpose to the original work). But you probably can’t copy someone else’s summary, because now you are making a derivative that serves the same purpose as the original.

    So here are the issues with LLMs in this regard:

    • LLMs have been shown to produce verbatim or almost-verbatim copies of their training data
    • LLMs can’t figure out where their output came from so they can’t tell their user whether the output closely matches any existing work, and if it does what license it is distributed under
    • You can argue that by its nature, an LLM is only ever producing derivative works of its training data, even if they are not the verbatim or almost-verbatim copies I already mentioned