Good morning or good afternoon, depending on where you are. My name is Scott Sholder. I'm a partner with Cowan, DeBaets, Abrahams & Sheppard LLP in New York City. I'm the co-chair of the firm's litigation practice, focusing mostly on copyright, trademark, and the intersection of entertainment, media, and technology. Today, we're going to be talking about a topic that is near and dear to me and that I've been talking about, reading about, and thinking about quite a lot: generative AI and copyright, specifically authorship, copyrightability, and infringement. It's one of the biggest issues around today, not only in the copyright world, but on everybody's mind in a lot of different areas of life. For our purposes, we're going to be talking about the copyright angles. That, to me, is the most fun and the most relevant to my practice, and I hope you find it interesting and informative. So let's talk a little bit about what we're going to cover here. There are three main issues. The first has two subparts, authorship and copyrightability of content: Who's considered the owner? Who's considered the author? And how can content be protected? Second, we'll talk a little bit about litigation over AI training that's been working its way through the courts. No real decisions yet, but there are a lot of cases brewing and a lot of potential for case law to be made. Third, we'll go over some takeaways and best practices for the various players in the space: people who are creating content, AI technologists who are creating the algorithms themselves, artists, implications for artistic works being used.
Et cetera. So we'll go over some practical points as well, but we're going to start with a little bit of background on AI. Just a disclaimer: I am a copyright lawyer. I am not a data scientist or a computer scientist, so we're not going to go really deep into the technology behind artificial intelligence, because I myself only understand it at a high level. But that's really all you need to know at this point for purposes of understanding the legal issues, particularly on the copyright side. Obviously, if you're bringing a case or advising clients on more technical issues, you'll do a deeper dive, and I have done that myself, but we won't get into it here. I think for our purposes, the basics are good enough. So artificial intelligence is obviously a term that's been thrown around a lot. You've probably heard about things called machine learning and deep learning. These are all various levels of artificial intelligence. AI in general is defined as a set of techniques that are supposed to approximate human or animal cognition to some degree using a computer. Machine learning is a type of AI enabling computers to learn from data without being explicitly programmed.
So it's kind of a set-it-and-forget-it approach. The machine trains itself, using mathematical models to analyze data and make predictions. So a lot of what you're seeing coming out of these engines, ChatGPT and the art generators like DALL-E 2, Midjourney, and Stable Diffusion, is predictive, based on training data. We'll talk a bit more about training data later, but AI models will generally use machine learning to teach themselves what things should look like in the world, or how human language patterns work, in order to predict what to say or what to produce visually to make it look like what the user wants. Deep learning is another level down from machine learning. It's a subfield that uses neural networks to process and analyze more complicated data. This is where we get into tasks like image and speech recognition and pattern recognition: teaching the computer to come up with features within the data and to organize that information to create an output that is predictive of what the user wants. So machine learning and deep learning work together; deep learning is a more complex version of machine learning. Again, this is very highly technical, and for our purposes, we can boil it down to something even simpler: the basics of generative AI. When we're talking about generative AI, we mean any AI program that is used to generate content.
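To make the idea of a machine learning language patterns from data a bit more concrete, here is a minimal sketch in Python. It is a toy, not how ChatGPT or any real engine works: the function names `train_bigram_model` and `predict_next` and the training sentence are made up for illustration. It only counts which word tends to follow which, then predicts the most common follower.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        following[current_word][next_word] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    followers = model.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical training data, invented for this example.
training_text = (
    "the squirrel rides the bicycle and the squirrel eats the nut "
    "and the squirrel sleeps"
)
model = train_bigram_model(training_text)
print(predict_next(model, "the"))
```

Nobody wrote a rule saying what follows "the"; the prediction comes entirely from counting patterns in the training text, which is the basic sense in which the machine "learns from data."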
So ChatGPT creates text; DALL-E 2, Midjourney, Stable Diffusion, et cetera, create images. We've now got music generators coming on the market, text-to-music, as well as text-to-video, and there are all kinds of iterations that are out there and being developed. At its most basic level, the user comes up with an idea and types a prompt into whichever engine they're using. The prompt is essentially a series of words or descriptions. It could be as simple as a few words or as complicated as a paragraph. There are ways to engineer the prompt, using certain terms, to come out with the kind of content that you want, and it's a skill in and of itself to figure out how to get the machine to do what you want. But you enter the prompt and out comes, presumably, something resembling what you've asked for: an image, a text, a video, a song, et cetera. The caveat is that between the prompt and the image, text, or video output, there's a bit of a black box. Nobody really knows what's happening underneath the hood or why the AI engine is doing what it's doing. And if you enter the same prompt over and over again, you're going to get slightly different results.
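One way to picture why the same prompt gives different results is random sampling. The sketch below is a simplified illustration in Python, not the actual mechanism of any particular engine: the `CANDIDATES` table and its probabilities are invented for the example, and the `generate` and `run` functions are hypothetical names. The point is only that when output is drawn at random from weighted options, repeated runs can differ unless the random seed is fixed.

```python
import random

# Hypothetical probabilities a model might assign to possible outputs for one prompt.
# These names and numbers are assumptions made up for this illustration.
CANDIDATES = {
    "photo-style squirrel": 0.5,
    "cartoon squirrel": 0.3,
    "watercolor squirrel": 0.2,
}


def generate(prompt, rng):
    """Pick one candidate output at random, weighted by its probability."""
    options = list(CANDIDATES)
    weights = [CANDIDATES[option] for option in options]
    choice = rng.choices(options, weights=weights, k=1)[0]
    return f"{prompt} -> {choice}"


def run(prompt, seed, n=5):
    """Generate n outputs for the same prompt using a seeded random generator."""
    rng = random.Random(seed)
    return [generate(prompt, rng) for _ in range(n)]


if __name__ == "__main__":
    for line in run("a squirrel riding a bicycle", seed=1):
        print(line)
```

With a fixed seed the sequence is repeatable; users of consumer tools typically do not control the seed in this way, which is one reason the person typing the prompt cannot fully predict what comes out.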
We'll talk about this concept more later when we talk about authorship, originality, and registration, but the unpredictability of what comes out is a potential issue when it comes to the world of copyright. All right. I mentioned training data before. This is really important, particularly when it comes to the lawsuits we're going to talk about later. AI models are trained by ingesting huge amounts of data. For something like ChatGPT, you're talking about lots and lots of text: books, articles, blog posts, you name it. Anything that can help the machine understand speech patterns and replicate the human voice. For image generators, it's anything out there that's visual: photographs, paintings, drawings, et cetera. This data is ingested along with any associated metadata and text, such as captions. That's important because the machines, depending on what type of model is being used, will use the associated metadata and captions to match words with images in order to better predict what to come up with when a user enters a prompt. We'll see later in one of the litigations that stock photo and stock content companies have very valuable text associated with their images, descriptors of those images, and a lot of that material is also being used to train the algorithms to understand what to create. Much, if not all, of this content is copied from the internet through scraping of publicly available websites and databases.
There are a lot of third-party research companies, some of them abroad, that copy a lot of this information. There are tremendous amounts of data stored in databases that have been scraped from the internet, and those companies work with the AI companies, who then use that data to train their respective AI models. So the models, too, at some point copy the content from these databases, or make temporary copies. We're not talking about massive servers with permanently stored full versions of this content. There is a temporary copy made for learning purposes, to train the model on what content is out there. There are no collages being made; it's not that type of tool. Essentially, what happens is that the content is broken apart into mathematical representations of the content. Think about coordinates on a map, or ones and zeros that a computer can understand. It's a vast matrix of data points. It's like taking the content and translating it into a different language. But there is a copy involved, and that is going to be the basis of a lot of lawsuits. It already is, and it's certainly problematic from a copyright standpoint.
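For a rough sense of what "broken apart into mathematical representations" can mean, here is a deliberately simple Python sketch. It turns short captions into lists of word counts. This is an assumption-laden toy: real models use far more sophisticated learned representations, and the function names `build_vocabulary` and `to_vector` and the two sample captions are invented for illustration.

```python
def build_vocabulary(documents):
    """Assign each distinct word a fixed position (like a coordinate axis)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: index for index, word in enumerate(words)}


def to_vector(document, vocabulary):
    """Represent a document as counts of each vocabulary word."""
    vector = [0] * len(vocabulary)
    for word in document.lower().split():
        if word in vocabulary:
            vector[vocabulary[word]] += 1
    return vector


# Hypothetical captions, invented for this example.
captions = ["two players playing soccer", "a squirrel riding a bicycle"]
vocabulary = build_vocabulary(captions)
vectors = [to_vector(caption, vocabulary) for caption in captions]

for caption, vector in zip(captions, vectors):
    print(caption, "->", vector)
```

Each caption ends up as a row of numbers in a small matrix. The original wording is no longer stored as such, but the numbers were derived by reading (that is, copying) the original, which is the step the lawsuits focus on.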
So I mentioned before there are various types of AI-generated content, including image, speech, video, music, and code. On the speech side, you've got ChatGPT; on the video side, Make-A-Video; on the music side, Soundraw, among others. And the image generators, I think, are probably the most well known aside from ChatGPT, which really has been in the spotlight lately. But everything kind of started with these players here: DALL-E, Midjourney, Stable Diffusion. Less than a year ago, everyone started playing around with these image generators and coming up with interesting and funny and sometimes beautiful and sometimes terrifying images using just prompts. That trend caught fire, it's been extremely popular, and there have been a lot of headlines about art created using AI. The same goes for ChatGPT: lots of controversy over the use of ChatGPT in certain contexts, which we'll talk about later. So, just some examples of the images that can be generated on DALL-E 2, Midjourney, and Stable Diffusion. Don't ask me why, but this is always the example that I use on DALL-E 2. It's a squirrel riding a bicycle. That was my prompt, literally just "a squirrel riding a bicycle." I think I got maybe four results, and this is the one I picked because I thought it was the best.
It's fairly photorealistic. Some things are a little odd: the nose is a little off and the ears are a little off, but this is par for the course with AI-generated images. Sometimes things are not quite right. We've got a little uncanny valley going on. But this is the kind of output that you're going to get from DALL-E 2. It looks pretty nice, and it's a very simple prompt. On the Stable Diffusion side, I did not make these; these came from another source. These are hypothetical Marvel supervillains. Stable Diffusion is quite good with portraits, as is Midjourney. They get very realistic human faces. Again, a little uncanny valley. It also tends to have a problem with creating hands. I'm sure that's being dealt with as new generations of the software come out, but you'll often see six fingers, or fingers that are too long or twisted and distorted. It's a bit odd, but generally it's quite striking content. Midjourney: this is another one that I made. I just told it to make a red fire-breathing dragon in the style of a 1980s Dungeons & Dragons manual. So, a little spoiler about my nerdy background, but I thought this was fun. Again, beautiful images. And there's been some controversy because there are certain artists that are very popular, and these generators are being asked to create art in their style.
One of them is Greg Rutkowski, who happens to be a fantasy artist who does lots of dragon paintings and drawings. I did not tell it to make this in the style of Greg Rutkowski, just as a disclaimer, but he has been in the news quite a bit because folks have been known to try to copy his style using platforms like Midjourney. And ChatGPT, again, is also all over the news for a lot of reasons. It's been used in many different capacities. People use it to write poems and songs and short stories and cover letters. We'll talk a little bit later about some trouble that lawyers got into using it to draft a brief. It is prone to come up with fake or inaccurate statements, also known as hallucinations, so one has to be very careful when using this engine and relying on it. Do a lot of due diligence and checking of your content. Here I asked it to just draft a cordial and lighthearted greeting message to the audience members attending my presentation on AI and copyright law through the continuing legal education service Quimbee, and you can see here what it wrote. Take it for what it is and form your own opinions about the content. Okay, on to the legal issues. The first thing we'll talk about, foundationally, is authorship in AI-generated creative work.
So what does it mean to be an author of this work? Who actually is the author? Who owns it? I'm asking all these questions and don't really have straight answers for you yet. This is, again, a very rapidly developing area of the law. As with pretty much everything when it comes to copyright, at least, the law lags behind the technology. So we have to work with what we have, try to figure out how best to apply it, and let the courts decide what comes next until new legislation is passed. So, starting at the basics: the Copyright Act protects original works of authorship fixed in a tangible medium of expression. The Constitution does not say who an author is. It doesn't define "writing" or "author," and neither does the Copyright Act. The Copyright Clause is quoted up top here: "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries." So again, "author" is not defined anywhere in the Constitution, nor is it defined in the Copyright Act. This is where things get a little tricky. We have to rely on guidance from the courts and from the Copyright Office. Burrow-Giles versus Sarony is one of the seminal cases on authorship.
And this centered on photography. You can see how old this case is: it's from 1884, when photography was really just starting out as an art form. The claim here was that this photograph of Oscar Wilde was not original enough to be copyrightable, and therefore there was no infringement by the defendant in the case. Burrow-Giles had marketed an unauthorized lithograph of Sarony's photograph of Oscar Wilde and claimed that photographing Oscar Wilde was simply a mechanical process. The holding here is that Sarony, as the photographer, was in fact the author because he posed Wilde, selected the costume, the drapes, and the lighting, set the mood, evoked the desired expression, et cetera. All of the creative contributions to creating this work were provided by the photographer. It wasn't just simply clicking a button. So the takeaway here is that the court recognized that human control of a mechanical device can lead to authorship when the human is using the machine to create a work. This raises a lot of questions about the extent to which we can analogize the AI engines that we have been talking about to cameras, and their users to photographers. Is this simply a mechanical process, or is there selection, arrangement, and coordination of creative elements sufficient to lead to authorship by a human, as opposed to a work just generated by a machine? So this case is, again, seminal and has been the foundation of the discussion of authorship surrounding AI-created works.
This is one of my favorite cases of all time: Naruto versus Slater. That's Naruto on the right. He is a crested macaque. This is known mostly everywhere as the monkey selfie case. This gentleman took the camera of a professional nature photographer while the photographer walked away and took this delightful selfie. PETA actually sued on behalf of the monkey, asking the court to give him authorship rights in the photo, claiming copyright on behalf of Naruto here and claiming that Naruto was the author. Ultimately, the Ninth Circuit held that Naruto was not, in fact, the author. There is no mention of animals in the Copyright Act. The Copyright Office had concluded that works created by animals were not entitled to copyright protection, and the Ninth Circuit deferred to the administrative agency. We'll talk more about the Copyright Office's guidance on AI-created works, but the Copyright Office does not have the power to bind anyone, really. Its pronouncements are nonetheless highly persuasive to courts, generally because there is deference to administrative agencies that have expertise in these areas, particularly something highly technical like copyright law. One thing to note is that the decision in the Ninth Circuit actually turned on standing and not on copyright law. PETA didn't have standing to sue on behalf of this monkey.
But there was a pretty clear indication in the decision itself that our friend Naruto here could not be considered the author of a photograph because he's not human. He's close to human, but he's not human. Similarly, the Ninth Circuit, albeit somewhat earlier, found that celestial beings cannot be copyright owners or authors. This case dealt with a book that was supposedly written by celestial beings who channeled the material to the human authors, who were merely the scribes and wrote down the pronouncements of these supernatural beings. The court found that celestial beings cannot be copyright owners or authors because they, too, are not human. Rather, the humans who compiled, selected, coordinated, and arranged those teachings into the book can be deemed the authors. There was more going on in this case, but for our purposes, even if somebody else supplied the teachings, the fact of the matter is that humans organized them, wrote them down, structured them into a book, likely edited it, and arranged everything such that you have a full copyrightable work in the form of this Urantia Book. So the humans were deemed to be the authors, as opposed to the non-human celestial beings. This raises the question: can an artist be the author of a work created using AI? What we have to ask is whether the work is one of human authorship, with the computer merely assisting the author, or whether the traditional elements of authorship were actually conceived and executed by the machine.
So it's a bit of line drawing and case-by-case assessment. But in order to figure out whether somebody who uses an AI engine can be the author, you have to ask whether there is actually sufficient human authorship to create an independently copyrightable interest in the work. If there was no real human intervention and all you did was type in a couple of words, then my squirrel on a bicycle might not actually be copyrightable, because I just said "squirrel on a bicycle." I didn't know what was coming out the other end. I didn't really engage in any traditional authorship of the work in terms of creative contributions, other than typing the words. And this is not coming from me. This is coming from the Copyright Office's Compendium, last revised a couple of years ago. This is a massive and extremely thorough and informative publication put out by the Copyright Office, discussing copyright law, registrability, and protection of works, soup to nuts. So this is a really important piece of administrative guidance that anyone in the copyright space should be looking at. Three sections here in particular are important. Per the Copyright Office, it will refuse to register a work if a human being did not create the work.
The work has to possess at least some minimal degree of creativity, which is a very, very low threshold. This is not like patent law, where in order to receive patent protection you need to have novelty and non-obviousness. The bar for creativity is very low here. That being said, the Copyright Office also will not register works produced by a machine or a mere mechanical process. Now, this is important: a mechanical process that operates randomly or automatically without any creative input or intervention from a human author. This raises the question: is that what's happening when one puts a prompt into an AI program? Is this a mere mechanical process? Is it operating randomly? Again, like I said before, we don't exactly know what's happening underneath the hood. So there is some randomness to it, some lack of a human creative mastermind, because you can describe your prompt all you want, but you still don't really know what's going to come out the other side. On the other side of that is the question of whether the prompt is creative enough input or intervention from a human to warrant some form of protection. So this, again, is from a couple of years ago. The Compendium is from 2021, and much more recently, in March of 2023, we got some guidance on AI authorship.
And this came out of a couple of cases that we'll talk about, one litigation and one dispute before the Copyright Office, concerning human authorship of AI-generated work. The current Copyright Office guidance on AI authorship is, first of all, that you have to disclose any AI-generated content. If you're seeking to register a copyright in a work that was generated, at least in part, using AI, you have to disclose that. You have to disclose it if it's appreciable, as opposed to de minimis. So if the content would itself be independently copyrightable had it been created solely by a human, you have to disclose it. If you used AI for ideation, initial drafting, storyboarding, that kind of thing, and it didn't really appear in more than a de minimis form in the final product, then you don't have to disclose it in your application to register the work. So yes, the traditional elements of authorship must have been conceived and executed by a human and not the machine. In determining registrability, the Copyright Office will consider whether the AI contributions are the result of mechanical reproduction, and how the AI tool operates and was used. This is a case-by-case inquiry. What is interesting, though, about the Copyright Office is that it is not digging into each of these applications on an individual basis and challenging them, as happens in the US Patent and Trademark Office.
With a trademark application, you get an examining attorney. They can issue office actions and say, you know, we have this, that, or the other problem with your application. Generally, unless there's something glaringly wrong with the application, the examiners at the Copyright Office are going to take the applicant at their word. So it's kind of an honor system in terms of disclosing AI-generated content. But these are the guidelines that are provided for purposes of what are deemed appropriate disclosures in the process of registering a work. One thing to note, and we'll talk a little bit more about this in a minute, is that if a human takes AI-generated material and selects, arranges, or coordinates it in a creative way, there could be enough copyrightable elements to support a claim in two pieces. One is the human-authored aspects, if there was some significant editing done on top of the AI-generated content. Number two is the work as a whole. So if you take a number of AI-generated images and arrange them into a collage or, as we'll talk about in a minute, use them to illustrate a comic book that you write, the comic book overall could be copyrightable, but not the images themselves.
So, again, a bit of line drawing and case-by-case inquiry, but this is the guidance that we have right now. This image is called A Recent Entrance to Paradise. It was created by an AI program designed by Dr. Stephen Thaler, who built the AI, entered the prompts, and let the engine create what it was going to create. The Copyright Office rejected his application, and he ended up litigating over it. This is the case, Thaler versus Perlmutter. It's in the District of Columbia. Thaler, a software engineer, is asking the federal court to overturn the Copyright Office's refusal to register this work, A Recent Entrance to Paradise. On the copyright application, the author was listed as Creativity Machine, which was his software, and the Copyright Office, based on the guidance that we went through a couple of slides ago, and presumably the Compendium guidance as well, rejected the application, saying that the machine cannot be an author. Thaler is arguing that he could own the copyright where the computer is the artist, similar to a work-for-hire concept, where pursuant to a contract a hired artist can be listed as an author while the company still owns the copyright. The Copyright Office denied this, again, because the applicant's representations led it to conclude that the work contained no human authorship. As far as I'm aware, the case is in the summary judgment phase.
And once a decision comes out, we'll have some more guidance from the District of D.C. on registrability of this type of work, and we'll go from there. So this is the case that I mentioned a moment ago about the comic book. An artist named Kris Kashtanova created a comic book called Zarya of the Dawn, which is pictured here. The images were generated using, I forget if it was Midjourney. Yeah, Midjourney. And they were edited, selected, and arranged. There was a lot of iteration going on. Now, when the artist submitted the application to the Copyright Office, they did not disclose that there was AI-generated material included. In fairness, this was before the guidance came out from the Copyright Office regarding disclosure. In fact, this is the case that prompted, no pun intended, the Copyright Office to come out with the guidance that I showed you a little while ago. So Kashtanova created this comic book, wrote the text, and curated the images, and initially got a copyright registration granted. I remember reading about this when it happened. It was kind of big news that this AI-generated comic book was granted copyright protection. And the reason it came out in the news is that on Kashtanova's social media, they showed that they got this copyright registration but indicated that they had used Midjourney to create the images.
So this got the attention of the Copyright Office, and the Copyright Office ultimately reconsidered the registration. Now, Kashtanova had indicated that they used hundreds or thousands of text prompts, went through hundreds of iterations of these images, and in some instances edited the images in Midjourney. Now, for those familiar with the basic tenets of copyright law, there is no such thing as the sweat of the brow doctrine. So it doesn't matter how much work you put into something: if it's not inherently copyrightable, it doesn't matter that you spent a lot of time on it. Essentially, where the Copyright Office landed was that Kashtanova authored the work's text, which is true. They didn't use any AI to create the text, so they are the author of the text, and they are also the author of the selection, coordination, and arrangement of both the written and visual elements. So you see here a series of images with captions and dialogue bubbles. They are the author of the overall composition of this comic book and all the elements included, but not of the visual images themselves. The Copyright Office said the work that was done to these images in terms of editing was too minor and imperceptible to supply the necessary creativity for copyright protection.
So again, it doesn't matter how long they worked or how hard they worked: the images themselves were deemed not to have been created with sufficient human authorship to warrant copyright protection in and of themselves. But copyright protection was granted for the comic book overall and for the text. So it's an interesting exercise in line drawing. And again, it's important to disclose the use of AI in creating works like this in order to determine the scope of protection. It's always good to know how protected you are if you ever need to pursue a claim for infringement, or if you're going to license something or assign your work. For purposes of chain of title, people want to know who made the work and whether you have all the rights in it. So this all has important implications downstream. Okay, let's talk about my bread and butter: litigation. There are a number of pending litigations right now that are wending their way through the courts. What's interesting is that there is one that's been around for a while called Thomson Reuters versus Ross Intelligence. It deals with an AI model being used to ingest and learn from Westlaw's cases and case captions. That case is actually in the summary judgment phase, as far as I know, and there may or may not be a ruling somewhat soon that could give some guidance on these future cases.
So a lot of people have been focusing on the Getty Images, Andersen, and Sarah Silverman cases. But there is this one that's been sitting around for a little while that's further along and could potentially give some guidance on fair use, which we'll talk about in a bit. Getty Images versus Stability AI is one of the bigger cases that grabbed some headlines earlier in the year. Getty Images sued Stability AI, which runs the Stable Diffusion engine, claiming that it used more than 12 million of Getty's photos without permission or compensation. Getty Images says that Stability AI used its copyrighted images and associated text and metadata to train its text-to-image tool. I mentioned before that the metadata, the captions, and the associated text are really important, because that's one of the main ways that an AI program will learn how to associate a prompt with a certain image output. So if you've got a photo, as shown in the Getty Images complaint, of soccer players playing soccer, the AI program is not going to know that those are soccer players unless there's some sort of text associated with it that it can learn from and say, "Oh, that's what two people playing soccer looks like." So the text associated with the images is quite valuable from Getty Images' perspective.
One of the ways that Getty Images is able to determine that its images were used for training is that the output from Stable Diffusion sometimes still had the Getty Images watermark on it, albeit modified and distorted, but it was still there. So Getty Images claims that Stability AI benefits from, and this is what I mentioned before, the accuracy and detail of its image and text pairings to train the model. Right now there's a motion to dismiss on jurisdictional grounds, after an amended complaint was filed a couple of months ago. So there's nothing substantive on this case at the time of recording, but it continues to move along. Andersen versus Stability AI, in the Northern District of California, is another big case that grabbed a lot of headlines. This is a putative class action filed by artists alleging that Stability AI, Midjourney, and DeviantArt, which is a platform that uses some of these other technologies in the background for its own AI program, used these artists' copyrighted images to train their models without consent. So it's very similar to the Getty Images case in the sense that these artists are alleging that their images were scraped from the internet and used for training the models without permission. A couple of issues here. In order to prevail on a copyright infringement claim, the plaintiff has to show ownership of a valid copyright and actionable copying of original elements of the work.
First, two of the plaintiffs didn't have copyright registrations, which is a prerequisite to filing a lawsuit. It's a little counterintuitive. You technically have a copyrighted work as soon as it's fixed in a tangible medium of expression, but you can't actually pursue any claims or enforce your rights unless you've registered the works. So two of the three artists had not registered their works, and that was an issue on the motion to dismiss. One of the other main things that came up was that the artists had not identified specific works that were copied or any infringing works that were created. This is challenging because there are massive datasets out there. There are ways to figure out whether your works have been used. If you get access to the datasets, I don't know how searchable they are, but presumably you could look through them. This will come up more later, but there are websites like haveibeentrained.com where you can upload your works into the site's database and it will compare them with what's appearing in certain training datasets. It's not everything, but it checks some of the major ones. So it's hard to determine this unless, like with the Getty Images case, you have output that's clearly showing the watermark. Or, for instance, in the case of an author, if you say, "Write me a sequel to this famous book," and it starts writing it with accurate depictions of the characters and similar language to the prior books, then it's pretty clear that those works were used as training data.
So based on these issues, among other things, the court recently said that it was inclined to dismiss the case with leave to amend. I don't think a decision has come out yet; this was a result of the oral argument. But if that's the case, then presumably we'll see an amended complaint soon.
Okay. Tremblay versus OpenAI, OpenAI being the company that developed ChatGPT. So now we've moved from art to text. Two authors are suing OpenAI for the use of their works as training data, with allegations similar to those in Andersen, brought by the same lawyers. Along with the allegation of copying for training purposes, the plaintiffs are also alleging the creation of derivative works in the form of output. The engine can create summaries of existing works, for instance, and they're claiming that those are infringing derivatives. There's also a DMCA claim for alteration of copyright management information, which typically refers to things like credits, watermarks, metadata, etcetera. And there are various common law claims as well. So this is an interesting one to watch on the authors' side.
Similarly, Sarah Silverman, comedian, actress, and author, also sued OpenAI in the Northern District of California. It's another putative class action, brought on behalf of Sarah Silverman and all book authors whose works were used as training data for OpenAI. The argument here is that the authors did not consent to the use of their copyrighted books as training material. Again, there are a lot of these cases with very similar allegations. The mass copying of works without authorization is a big issue, and that's going to be one of the major battlegrounds in determining what is or isn't permitted in terms of fair use. This case also had, along with copyright infringement, a DMCA claim and various common law claims.
Kadrey versus Meta Platforms was filed by the same parties as the OpenAI case. This is the other Sarah Silverman case. So she brought two cases, one against OpenAI and one against Meta Platforms, Meta being the parent company of Facebook. This was yet another putative class action brought on behalf of book authors whose works were used as training data for Meta's product, which is not getting as much attention as ChatGPT or even Google's Bard, but it's out there. Same arguments: authors didn't consent to the use of their copyrighted books as training material for LLaMA, which is Meta's.
I don't know whether it's Meta's database or dataset or whether it's its actual model. But anyway, similar arguments to the other couple of cases and similar claims: copyright infringement, DMCA, common law torts.
All right. J.L. versus Alphabet Inc. Alphabet Inc. is Google's parent company. This case is a class action brought by eight individuals on behalf of a putative class of millions of Internet users, including copyright holders. Same law firm as in the next case, which we'll talk about in a moment. The argument here is that Google, in training and ramping up its Bard AI program, engaged in scraping of data from websites all over the Internet, including pulling information from its own services like Gmail, to train its AI product. Bard is like ChatGPT; it's a text generator. This is a massive complaint. It is close to 100 pages, if not more. It includes copyright claims, but also allegations concerning violations of privacy and property rights, and other claims related to the collection of personal data and personal identifying information and the like. So this is a pretty huge complaint. There's a significant copyright piece to it, given that some of the putative class members and named class members are either authors or artists. But this goes beyond just copyright infringement and covers a lot of other ground.
All right. The case I had mentioned previously is really just a privacy case. Again, this is against OpenAI. An anonymous plaintiff sued OpenAI and its related corporate entities over, similar to the Google case, collection of personal data, sensitive things like medical records and conversations through social media. There are claims under all kinds of privacy statutes, plus state invasion of privacy and unfair competition allegations, consumer fraud, negligence, privacy torts, conversion. This is another big complaint. It's not a copyright case, but I figured it was worth raising here because it's pretty significant.
Okay. All of this begs the question: is scraping and copying content from the Internet without permission fair use? The answer is unknown. Some of the pending cases might answer this question. I mentioned before the Thomson Reuters versus Ross Intelligence case, which may be the first one to address that issue. What's interesting is that that case was briefed before the Supreme Court's decision in Warhol versus Goldsmith, and yet some of the arguments that came up in the briefing were quite similar to what the Court ended up coming down on in its decision in Warhol, interpreting the fair use statute a little bit differently from how it has been interpreted in the past. So these cases will likely answer the questions. Whether there ends up being some sort of circuit split down the line, raising this up to the Supreme Court again, is another question.
So, just to cover the basics of fair use. I'm sure those who deal with copyright on any regular basis will know this, but for the uninitiated, 17 USC Section 107 is the fair use statute. Fair use is presented as an affirmative defense, but it's actually a non-infringing use. It's not a defense, technically. It just isn't infringing if you're found to have engaged in fair use. The statute talks about things like criticism, comment, news reporting, teaching, scholarship, research, etcetera. Those are not considered infringement. But in order to figure out whether you fall into a permissible category of use, there are four non-exclusive factors. And it's interesting that they're non-exclusive, because I don't remember reading any recent case that talks about any factors other than these, except possibly the public interest behind the use. But in any event, the four factors are listed here. First, the purpose and character of the use, including the commercial nature of the use. Factor number one is the one that's been debated and litigated for many years. Back in the early 90s, in the Campbell versus Acuff-Rose decision, it was determined that factor one included an element of transformative use. So on the purpose and character of the use, courts from then on have had to ask: Is it transformative? Was there new expression or meaning or message added? What was the purpose of the use?
Was there new expression? Was it being used in a different way? It's not in the statute, but it was kind of a gloss over the statute that has become the linchpin of the fair use analysis in recent years and has kind of gotten out of control in terms of interpretation. Factors two and three aren't usually determinative. Number two is the nature of the copyrighted work: the more creative a copyrighted work is, the harder it is to claim fair use; the more factual content the copyrighted work has, the easier it is to claim fair use. Number three is the amount and substantiality of the portion used in relation to the copyrighted work as a whole. Did you use a little or a lot? That's really the question. It's a quantitative and qualitative inquiry. So even if you use a small amount, if it's the heart of the work, the most important part of the work, such that using it kind of destroys the market for the work, then you can still be considered to have infringed. On the other hand, when it comes to things like visual works, you can potentially use the whole work and sometimes it'll be fair use, depending on how you use it. So again, not a determinative factor, but it's still fairly important. The last factor is the effect on the potential market for or value of the copyrighted work.
That is, does the allegedly infringing work act as a market substitute? And how does it impact the licensing value, or the value in the existing market or potential market for the work? Factors one and four have a lot of interplay. And the Supreme Court's decision in Warhol recently kind of conflated them even more, putting more weight on the question of the commercial nature of the use and asking whether the infringing use and the original use were in the same context and whether that context was commercial. The Warhol decision didn't go so much into the creative additions to, or creative transformation of, an allegedly infringing work, but rather into the non-profit versus commercial purpose and whether the existing work was in that same vertical, for lack of a better term. This is all going to come up in the context of training data. Is the scraping, the copying and use for training an AI algorithm, considered fair use? Is it transformative? I feel like a lot of emphasis now is going to be put on the commercial nature of these businesses. It's going to be a complicated question, and I think the Warhol decision will have some impact on the arguments that come out. But we'll have to see where the courts come out in applying the fair use statute and the most recent case law to a new technology that wasn't really on the minds of the drafters of the 1976 Copyright Act.
So, moving on to more practical takeaways and best practices. On the question of copyrightability, again, you have to consider the question of authorship. You have to decide to what extent AI-generated material was used. In order to maximize the potential for a finding of human authorship, there has to be significant human input in the process. It's probably best to document how that human input shapes and controls the final creative output. You don't have to document your creation process in order to successfully register a work. But if you're using AI, it's probably best to keep track of how you're using it and to what extent you actually created something or edited something yourself, if you're anticipating a challenge down the line. On the registration side, as I mentioned before, you need to disclose the inclusion of any AI-generated material if it's more than de minimis. And if you have pending applications or prior registrations where you didn't disclose that material, those should be updated to disclose the inclusion of the AI-generated material. Just to give you an idea of what the disclosure form looks like, this really is not any different from the forms that have been used in the past.
Limitations of claims have been around forever. You have to disclaim things that are factual or public domain or previously registered. So it's no different in disclosing AI-generated content, and it's very simple. You don't have to go into any great detail about what exact elements were created by AI and what weren't. In some guidance that the Copyright Office gave in a webinar, using Zarya of the Dawn as an example, it would be sufficient to say that under "material excluded" you're disclaiming some AI-generated text or some AI-generated images, and that the material you're including is human-generated text, human-generated images. So it's easy. You don't have to go into any great detail. It's a very basic disclosure requirement.
In terms of best practices, we'll go by category here. For content owners, those who create and/or own their own original content: watermarks are useful to protect your content if you're concerned about it being used in AI and about being able to prove that it was used in AI. As shown in the Getty Images case, the watermark can sometimes pop out the other side. If you're worried about your works being used to train AI through various datasets, you can check haveibeentrained.com. Again, this doesn't cover all the training datasets, but it covers a good amount of them.
Some of the AI platforms now have opt-out features, so you can say, "I don't want my stuff in there, I don't want it to be used for training." Obviously, policing your content is important. If you're seeing things on social media or websites or online marketplaces that look like your works, you can potentially take them down, or submit DMCA takedown notices depending on the platform, if they're substantially similar. You can use services like TinEye or Google's reverse image search to see if your work or anything similar to your work is appearing on the Internet. There's a new product coming out of the University of Chicago called Glaze that prevents style mimicry. I don't really know how it does it, but it has some way of confusing the AI programs. If you tell it to make a Greg Rutkowski, and he's used Glaze on his works, however Glaze is used, presumably the engine won't understand and will get confused and won't actually mimic the style. Again, I'm not a computer scientist, so I don't know exactly how it works, but it's an interesting countermeasure. There are a lot of these coming out, different types of programs to prevent mimicry of artists' styles and to prevent works from being ingested in the first place. And then, check your website terms and conditions.
On your website or on business websites, you can put in the terms and conditions that the content shouldn't be used as training data. That doesn't necessarily mean that it's not going to be used, but at least it's another safeguard to be able to say, "Well, they accessed our website and violated the terms of service." And if you're entering into contracts with third parties, include safeguards surrounding your content and the use of your content in AI training models. If you're licensing your content to somebody and you don't want it to be used for training or in the context of an AI engine, then you should put that in the contract to prevent a third party from doing that without your permission.
For users and artists, obviously the best practice is to find licensed or clean datasets from businesses that are not using scraped material without permission. Check for watermarks on anything that you are potentially going to use. Change or edit the content significantly. If you want to maximize your chances of protection and registration, there needs to be significant human authorship. So if you want to be able to register something or protect it later, just putting a prompt in and using the output probably is not going to suffice. Which brings me to not using AI as the final product; use it for ideation, etcetera.
I'm sure there are ways around that, but I think some companies are starting to disallow "in the style of" prompts for living artists.
The last thing I'll cover is just a cautionary tale for lawyers using AI. AI is going to become a significant portion of any lawyer's practice. This is a little bit outside the copyright piece, but I think it's important for lawyers who are attending. AI can be used to draft letters and briefs and do research. But as this Mata versus Avianca case shows, you really have to be careful. This was a couple of lawyers who had used AI to draft a brief, and the AI hallucinated various cases that don't exist. The attorneys got in trouble. They were sanctioned. And this kind of goes back to the very basics of lawyering. You just have to check your work, cite-check your cases, and make sure that what's coming out of the other side of the engine makes sense and isn't completely made up. You also have to be careful about feeding it confidential or privileged information. I hesitate to use these engines for any significant drafting until they become more secure and stop spitting out falsehoods. But be careful. This is just an excerpt from an exchange with the judge: some of these cases just were made up, just were not real.
They were totally made up. So, again, check your work. Cite-check and attribute, check your cases. Don't rely solely on AI to do your research. Check the judge's rules. Some judges are now starting to put out rules where they either don't allow you to use AI-generated content or you have to disclose AI-generated content. The basics of litigation practice is always to check the judge's and the court's rules anyway, but now it's even more important, especially if you intend on using AI in your practice. As I mentioned before, the attorneys were sanctioned. They had to send the sanctions opinion to the judges who were listed as having written the fake opinions, and there was a $5,000 penalty. So with new technology comes a lot of responsibility. It's tempting to try to use these products for efficiency purposes, and they should be able to make the practice of law more efficient. But one has to be very careful and always engage in the best practices that you were taught even before AI existed.
So thanks for attending. I appreciate your time and your listening to me ramble for more than an hour about topics that I find super interesting, and I hope you found them interesting too. Happy to have spent the time with you. Thanks a lot.