• 0 Posts
  • 22 Comments
Joined 10 months ago
Cake day: December 18th, 2023


  • Come to think of it, that DMCA argument would really wreck fair use.

    It’s illegal to remove “copyright management information” (CMI), which in this case means the FOSS license. The argument was that when Copilot spits out verbatim snippets of source code without the license, this constitutes removal of the CMI. The point of the argument was that fair use is not a defense under the DMCA. These verbatim snippets are pretty obvious fair use to me, so countering that defense is important if they hope to get anywhere with their suit.

    By the same argument, any meme image is illegal. They are taken from somewhere without the original license or attribution. Yikes.





  • Wow, long take. I didn’t want “much the same” to bear a lot of meaning. In the German inquisitorial system, in a criminal case, the judge takes over the (police) investigation from the prosecution. When the police become aware of a possible crime, they inform the state attorney’s office. A state attorney is responsible for the investigation and for uncovering the truth. But once the case goes to court, that responsibility passes to the judge.

    In a civil suit, the parties are basically in charge and not the judge. It’s true that the judge has a more active role in German civil procedure. While the court is not supposed to run its own investigation, it can request additional evidence if that is necessary to judge the arguments of either side. I am not clear on the details. Where matters of fact must be determined by an expert, either party can ask the court to provide one. But they can also make their own arrangements. The court can also solicit an expert opinion on its own, if necessary. Typically, the expert’s opinion is given as a written statement. An oral deposition may happen when questions remain. Afaik, it’s unusual to depose an expert without having first requested a written statement. Either party or the court may question the witness.


  • Hmm. In what way is the German system more effective? I know of some hair-raising cases. Me, I blame the law-makers and not the judges, but others see it differently. I can’t think of a single related case where I’d say that the judgement served everyone’s interests.

    ETA: Bad question. You explained how the German system is more effective. I’m wondering about cases where I can see this in action, i.e. “well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.”




  • I’m categorically unable to name a justice or court jurisdiction anywhere in the US that consistently makes well-informed and incisive decisions on anything in the computer hardware / EE or computer science fields.

    Can you name one in Germany? Just asking.


    Anyway, at this stage of the trial only legal experts are involved. The judge examines whether the legal arguments are sound, assuming the allegations are true. Whether the allegations are actually true will only be determined later. That’s also when fair use comes in. At that point, you need outside experts to advise on the non-legal aspects.



  • Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?

    Hmm. I’m not really sure why anyone would write such a text. There is no “weighted proportionality” (or pathways). Is this a common conception?

    You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model which favors what it sees often over what it sees rarely. That’s a form of probability embedded, corresponding to a form of average.

    I guess you picked up on the fact that transformers output a probability distribution. I don’t think anyone calls those an average, though you could have an average distribution. Come to think of it, before you use that to pick the next token, you usually mess with it a little to make it more or less “creative”. That’s certainly no longer an average.

    You can see a neural net as a kind of regression analysis. I don’t think I have ever heard someone call that a kind of average, though. I’m also skeptical whether you can see a transformer as a regression, but I don’t know this stuff well enough. When you train on some data more often than on other data, that is not how you would do a regression. Certainly, once you start RLHF training, you have left regression territory for good.

    The GPTisms might show up because they are overrepresented in the finetuning data. They might also come from the RLHF and/or be brought out by the system prompt.
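
To make the point about "messing with" the output distribution a bit more concrete, here is a minimal sketch of temperature scaling, which is one common way of doing that. The scores in `logits` are made up for illustration and the `softmax` helper is my own, not taken from any particular library; real systems often combine this with other tricks such as top-k or top-p filtering.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution, rescaled by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for three candidate next tokens.
logits = [2.0, 1.0, 0.0]

plain = softmax(logits)                   # distribution as the model produced it
sharp = softmax(logits, temperature=0.5)  # low temperature: less "creative"
flat = softmax(logits, temperature=2.0)   # high temperature: more "creative"
```

A temperature below 1 pushes more probability onto the top token, and a temperature above 1 spreads it out. Either way, the distribution that gets sampled from is no longer the one the model originally produced.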


  • I accidentally clicked reply, sorry.

    B) you do know there’s a lot of different definitions of average, right?

    I don’t think that any definition applies to this. But I’m no expert on averages. In any case, the training data is not representative of the internet or anything. It’s also not training equally on all data and not only on such text. What you get out is not representative of anything.




  • Who exactly creates the image is not the only issue and maybe I gave it too much prominence. Another factor is that the use of copyrighted training data is still being negotiated/litigated in the US. It will help if they tread lightly.

    My opinion is that it has to be legal on First Amendment grounds, or more generally on grounds of freedom of expression. Fair use (a US thing) derives from the First Amendment, though not exclusively. If AI services can’t be used for creating protected speech, like parody, then this severely limits what the average person can express.

    What worries me is that the major lawsuits involve Big Tech companies. They have an interest in far-reaching IP laws; just not quite far-reaching enough to cut off their R&D.



  • You’re allowed to use copyrighted works for lots of reasons, e.g. satire or parody, in which case you can legally publish it and make money.

    The problem is that this precise situation is not legally clear. Are you using the service to make the image or is the service making the image on your request?

    If the service is making the image and then sending it to you, then that may be a copyright violation.

    If the user is making the image while using the service as a tool, it may still be a problem. Whether this turns into a copyright violation depends a lot on what the user/creator does with the image. If they misuse it, the service might be sued for contributory infringement.

    Basically, they are playing it safe.


  • It’s all just weights and matrix multiplication and tokenization

    See, none of these is statistics, as such.

    Weights are maybe the closest, but they are supposed to represent the strength of a neural connection. This is originally inspired by neurobiology.

    Matrix multiplication is linear algebra and encountered in lots of contexts.

    Tokenization is a thing from NLP. It’s not what one would call a statistical method.

    So you can see where my advice comes from.

    Certainly there is nothing here that implies any kind of averaging going on.
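
For anyone who wants to see the three pieces named above side by side, here is a toy sketch. Everything in it is hypothetical: the three-word `vocab`, the `embeddings` table, and the `weights` matrix are invented numbers, and a real model uses a learned subword tokenizer and far larger matrices.

```python
def tokenize(text, vocab):
    """Toy tokenizer: map whitespace-separated words to integer ids."""
    return [vocab[word] for word in text.split()]

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical vocabulary and parameters, chosen only for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per token id
weights = [[0.5, -1.0], [2.0, 0.25]]               # connection strengths

ids = tokenize("the cat sat", vocab)  # tokenization: text to ids
hidden = [matvec(weights, embeddings[i]) for i in ids]  # matrix multiplication
```

Each token is looked up and pushed through the weights on its own. Nothing in these steps computes a mean over the inputs.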