• Jaded@lemmy.dbzer0.com
    1 year ago

    Things might change, but right now you simply don’t need anyone’s authorization.

    Hopefully it doesn’t change. Only a handful of companies have the data or the funds to buy it, so a change would kill any kind of open-source or low-priced endeavour.

    • Flaky@iusearchlinux.fyi
      1 year ago

      FWIW, Common Crawl, a free and open dataset of crawled web pages, was used by OpenAI for GPT-3 as well as for EleutherAI’s GPT-NeoX. Maybe for GPT-3.5/ChatGPT as well, but they’ve been hush-hush about that.