Departing from History to Share a Very Personal Story 😧

Published: Fri, 03/28/25

Updated: Fri, 03/28/25

March 28, 2025

Hello,

Sorry, no inspiring story from history today. Unfortunately, I've had to spend the week learning about AI large language models and writing four separate letters to Mark Zuckerberg.

The Facebook guy is now full-speed ahead on his Llama AI project, but all the tech bros are racing to build the best AI, and the competition is fierce.

Zuckerberg's Meta Platforms tried to get ahead by secretly using copyrighted books without permission. Four of mine are among the up to 7.5 million books he's accused of ripping off.

To Meta CEO Mark Zuckerberg,

It has come to my attention that you have used my book, FANNIE NEVER FLINCHED, in the training of your generative AI models without permission from me, and in violation of my rights under copyright.

This letter is to put you on notice that you do not have the right to use my work to train your AI models. You must obtain express permission and provide reasonable licensing terms for authors’ works.

Meta lawyers told a federal judge this week in San Francisco the company made "fair use" of books to develop its large language model Llama. More on the lawsuit below, including internal communications showing the company knew it was going beyond the "fair use" allowed by copyright law.

Authors Accuse Zuckerberg of Copyright Infringement

Why would a trillion-dollar company steal from authors, many of whom do not even make a living?

"Because they needed books for their quality writing, style, expression, and long-form narration," says The Authors Guild, "and [Meta] would rather steal them than ask and pay for them as they do for all of the other necessary components of their AI, such as electricity and programming."

This week, as I've learned about the authors' class action lawsuit against Meta, I've also been busy trying to protect my Google document and shield my website content from the flood of AI scrapers.

Authors Richard Kadrey, Sarah Silverman, and Ta-Nehisi Coates filed the Kadrey v. Meta class action in Northern California accusing the tech giant of infringing on their intellectual property rights. Their court filings suggest that Mark Zuckerberg gave the Llama team permission to train the models using copyrighted works.

Mark Zuckerberg at the F8 2018 Keynote. Source: Wikimedia

Court documents reveal that Meta used a dataset called LibGen for Llama-related training. LibGen was a website that pirated books and made them available to anyone on the internet.

The site was finally shut down after being sued numerous times, ordered to shut down, and fined tens of millions of dollars for copyright infringement. I'd seen my books on this site and been helpless to remove them.

According to filings in Kadrey v. Meta, company employees referred to LibGen as a “data set we know to be pirated,” and flagged that its use “may undermine [Meta’s] negotiating position with regulators.”

Some decision-makers within Meta apparently believed that failing to use LibGen for model training could seriously hurt Meta’s competitiveness in the AI race, calling LibGen “essential to meet SOTA numbers across all categories,” a reference to matching the best, state-of-the-art (SOTA) AI models across benchmark categories.

The filing also cites a memo to Meta AI decision-makers noting that after “escalation to MZ,” Meta’s AI team “[was] approved to use LibGen.” (MZ, here, is rather obvious shorthand for “Mark Zuckerberg.”)

An email outlined “mitigations” to reduce Meta’s legal exposure, including combing through LibGen files for words like “stolen” or “pirated,” and simply not citing the usage publicly: “We would not disclose use of Libgen datasets used to train.”

Earlier court filings indicated Meta considered buying the publisher Simon & Schuster in order to use published books to train their AI models, but the Meta execs determined it would take too long to negotiate licenses and reasoned that fair use was a solid defense.

New filings this week show portions of internal work chats between Meta staffers, and paint the clearest picture yet of how Meta may have come to use copyrighted data to train its AI.

“[M]y opinion would be (in the line of ‘ask forgiveness, not for permission’): we try to acquire the books and escalate it to execs so they make the call,” wrote Xavier Martinet, a Meta research engineer, in a chat dated February 2023, according to the filings. “[T]his is why they set up this gen ai org for [sic]: so we can be less risk averse.”

After another staffer pointed out that using unauthorized, copyrighted materials might be grounds for a legal challenge, Martinet doubled down, arguing that “a gazillion” startups were probably already using pirated books for training.

This summer the court is expected to decide whether Meta broke copyright laws. If so, then authors whose books were used will be officially certified as a class in the suit.

The case is one of many AI copyright disputes slowly winding through the U.S. court system.

For more on the AI race for data and how Google may already have used yours to train AI, see this article in the New York Times.

Justine Bateman, a filmmaker, former actress and author of two books, told the Copyright Office that A.I. models were taking content — including her writing and films — without permission or payment. “This is the largest theft in the United States, period,” she said in an interview.

Sources

https://techcrunch.com/2025/03/08/judge-allows-authors-ai-copyright-lawsuit-against-meta-to-move-forward/

https://techcrunch.com/2025/02/21/court-filings-show-meta-staffers-discussed-using-copyrighted-content-for-ai-training/

https://techcrunch.com/2025/01/09/mark-zuckerberg-gave-metas-llama-team-the-ok-to-train-on-copyrighted-works-filing-claims/

https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html

On a related note, but happier news, this week I went through the process to certify and register my books as Human Authored. Sadly, the book market is becoming increasingly saturated with books created by AI.

So much so that at one point Amazon barred people from uploading more than 100 books per day. Obviously, even the best writer couldn't write even one book a day.

Here is the visible mark that verifies a book was created by a human, not generated by AI. I can't put stickers on all the books already published, but hopefully in future that will be possible.

Like my article today? Forward this email to share with family and friends.

News and Links

I recommend this article from the Mountain Journal about how the federal budget purge is impacting Yellowstone National Park.

My visit to Yellowstone, Fall 2020.

The "Hands Off" National Day of Action is scheduled for April 5. Donald Trump and Elon Musk think this country belongs to them. They're taking everything they can get their hands on, and daring the world to stop them. I say, hands off our National Forests and National Parks.

On Saturday, April 5th, we're taking to the streets nationwide to fight back with a clear message: Hands off! Find your nearest demonstration here.

Until next week...


This newsletter is a reader-supported publication.

To support my work, consider becoming a paid subscriber.


Read a great book? Have a burning question? Let me know. If you know someone who might enjoy my newsletter or books, please forward this e-mail. I will never spam you or sell your email address, and you can unsubscribe anytime at the link below.

To find out more about my books and how I help students, teachers, librarians and writers, visit my website at www.MaryCronkFarrell.com.

Contact me at MaryCronkFarrell@gmail.com. Click here to subscribe to this newsletter.

Author Mary Cronk Farrell
Mary writes compelling history books about courageous unknown women who helped shape American history.
Her newest book, CLOSE UP ON WAR: THE STORY OF PIONEERING PHOTOJOURNALIST CATHERINE LEROY IN VIETNAM, has four starred reviews. Both teens and adults enjoy Mary's books.

WWW.MaryCronkFarrell.com

426 West 24th
Spokane WA 99203
US

Unsubscribe | Change Subscriber Options