{"id":20056,"date":"2023-07-17T11:32:55","date_gmt":"2023-07-17T18:32:55","guid":{"rendered":"https:\/\/www.kith.org\/jed\/?p=20056"},"modified":"2023-07-17T11:44:28","modified_gmt":"2023-07-17T18:44:28","slug":"some-lawsuits-over-generative-ai","status":"publish","type":"post","link":"https:\/\/www.kith.org\/jed\/2023\/07\/17\/some-lawsuits-over-generative-ai\/","title":{"rendered":"Some lawsuits over generative AI"},"content":{"rendered":"\r\n<p>Here are some links and notes about copyright lawsuits being filed against OpenAI and Meta and Google and Microsoft because of their generative AI systems\u2019 probable use of various kinds of training data.<\/p>\r\n<p>Links first:<\/p>\r\n<ul>\r\n  <li>\u201c<a href=\"https:\/\/www.theverge.com\/2022\/11\/8\/23446821\/microsoft-openai-github-copilot-class-action-lawsuit-ai-copyright-violation-training-data\">Microsoft, its subsidiary GitHub, and its business partner OpenAI have been targeted in a proposed class action lawsuit alleging that the companies\u2019 creation of AI-powered coding assistant GitHub Copilot relies on \u2018software piracy on an unprecedented scale.\u2019<\/a>\u201d (November 8, 2022)<\/li>\r\n  <li>\u201c<a href=\"https:\/\/www.cnn.com\/2023\/06\/28\/tech\/openai-chatgpt-microsoft-data-sued\/index.html\">[A] proposed class action lawsuit [\u2026] claims that OpenAI secretly scraped \u2018massive amounts of personal data from the internet\u2019<\/a>.\u201d (June 28) (This one doesn\u2019t appear to be a copyright lawsuit as such.)<\/li>\r\n  <li>\u201c<a href=\"https:\/\/www.theguardian.com\/books\/2023\/jul\/05\/authors-file-a-lawsuit-against-openai-for-unlawfully-ingesting-their-books\">[Authors] Mona Awad and Paul Tremblay allege [in a lawsuit] that their [copyrighted] books[\u2026] were \u2018used to train\u2019 ChatGPT because the chatbot generated \u2018very accurate summaries\u2019 of the works<\/a>.\u201d (July 5)<\/li>\r\n  <li><a 
href=\"https:\/\/www.theverge.com\/2023\/7\/9\/23788741\/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai\">Authors Sarah Silverman, Christopher Golden, and Richard Kadrey are suing OpenAI and Meta over similar issues<\/a>. (July 9)<\/li>\r\n  <li><a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/07\/book-authors-sue-openai-and-meta-over-text-used-to-train-ai\/\">Some further details<\/a> about the issue of illegal copies of books in \u201cshadow libraries,\u201d and about copies of books made available by Smashwords. (July 10)<\/li>\r\n  <li>\u201c<a href=\"https:\/\/www.cnn.com\/2023\/07\/11\/tech\/google-ai-lawsuit\/index.html\">Google hit with lawsuit alleging it stole data from millions of users to train its AI tools<\/a>.\u201d (July 12)<\/li>\r\n  <li><a href=\"https:\/\/www.yahoo.com\/entertainment\/sarah-silverman-openai-lawsuit-may-200823916.html\">Some analysis of the implications of these suits<\/a> for fair use and derivative works in US copyright law. (July 14)<\/li>\r\n<\/ul>\r\n<p>Now a few thoughts from me:<\/p>\r\n<p>The various things that the companies are being accused of are mostly getting labeled as copyright violations, but I feel like there may be some pretty different issues involved, so I think it\u2019s worth separating out the strands. Some of the things they\u2019re being accused of:<\/p>\r\n<ul>\r\n  <li>Using collections of pirated ebooks as training data. The authors and publishers haven\u2019t made those books available for free for any use.<\/li>\r\n  <li>Using free-to-read ebooks as training data. In particular, Smashwords offers some free or temporarily free ebooks\u2014usually the first book in a series, to entice readers to read the rest of the series. (With the permission of the author.) 
Smashwords says that their interpretation of their terms of service doesn\u2019t allow this use of their free ebooks.<\/li>\r\n  <li>Using publicly available code as training data, \u201c[much] of which [is] published with licenses that require anyone reusing the code to credit its creators.\u201d<\/li>\r\n  <li>Using the open web in general as training data, including publicly available personal information about people (though the personal-data lawsuit doesn\u2019t appear to be talking about copyright as such).<\/li>\r\n  <li>Using publicly available summaries and descriptions and quotes from specific works as training data. For example, Goodreads includes a lot of that kind of material. (But to be clear: the appearance of that material on Goodreads itself isn\u2019t a copyright violation.) Another example: sites like SparkNotes provide \u201cstudy guides\u201d that can include detailed summaries of books.<\/li>\r\n<\/ul>\r\n<p>So it\u2019ll be interesting to see how and whether courts look differently at those different issues.<\/p>\r\n<p>Side note: As far as I know, nobody has yet accused the LLMs of producing text that\u2019s identical to the text used in copyrighted training data (except in the code case); if an LLM did that, that might be a clearer-cut copyright issue than some of the above. (But I don\u2019t mean to say that\u2019s the only possible way a copyright claim could succeed; for example, close paraphrases can also be copyright violations.)<\/p>\r\n<p>Another side note: I find it particularly interesting that all five of the authors involved appear to be basing their claims (at least in part) on the LLMs\u2019 ability to generate detailed summaries of the books in question. 
To me, those summaries look more like they\u2019re derived from human-written summaries and reviews than like they\u2019re derived from the text of the books themselves.<\/p>\r\n\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":5,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[132],"tags":[],"class_list":["post-20056","post","type-post","status-publish","format-standard","hentry","category-ai"],"acf":[],"_links":{"self":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts\/20056","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/users\/5"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/comments?post=20056"}],"version-history":[{"count":4,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts\/20056\/revisions"}],"predecessor-version":[{"id":20060,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/posts\/20056\/revisions\/20060"}],"wp:attachment":[{"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/media?parent=20056"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/categories?post=20056"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.kith.org\/jed\/wp-json\/wp\/v2\/tags?post=20056"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}