When I learned that Meta’s programmers downloaded 183,000 books for a database to teach the company’s generative A.I. machines how to write, I was curious whether any of my own books had been fed into the crusher.
Alex Reisner of the Atlantic has provided a handy search tool: Type in an author’s name, and out come all of his or her books that LLaMA used. I typed “Fred Kaplan” and found that three of my six books (1959, Dark Territory, and The Insurgents) had been assimilated into the digital Borg.
My first reaction, like that of many other authors, was outrage at the violation. However, my second reaction—also, I assume, like that of many other authors—was outrage that the program didn’t include my other three books (The Bomb, Daydream Believers, and The Wizards of Armageddon). Were there really 182,997 books that were better than those three? (Imagine the ambivalence of the author who has no books on the list: the relief of being untrammeled; the pain of being so harshly judged.)
I also noticed that the search listed nine of the 11 books by the other author named Fred Kaplan, as if we were the same person. My doppelgänger (whom I’ve never met, though reviewers and Amazon listings have occasionally conflated the two of us) might also be wondering about his two omissions.
Slate’s business and technology editor, Jonathan Fischer, sent a question to the program (you can do the same: simply go to this website and scroll down to the box labeled “Send me a question”): “Did you know there are two Fred Kaplans?” The machine replied:
Yes, I’m aware of the fact that there are two people named Fred Kaplan who have written books. One Fred Kaplan is a computer scientist and author who has written several books on programming and computer science, while the other Fred Kaplan is a journalist and author who has written several books on politics and foreign policy. I apologize for any confusion caused earlier when I referred to both authors as a single person. Thank you for bringing this to my attention.
This is—to put the matter in terms that even a dumb machine can understand—wrong. Neither of us Fred Kaplans is a computer scientist, nor has either of us written anything on programming. I am a journalist who has written several books on politics and foreign policy. But the other Fred Kaplan is a retired English professor who has written several literary biographies—a credential that the machine’s answer didn’t cite.
How could the machine have made this mistake? I have a theory. As has often been reported, current A.I. software has a tendency to pamper its users, to supply whatever answers they seem to be seeking. Jonathan asked the computer if it was aware there are two Fred Kaplans. So it replied, Yes, I know there are two. One of the top items on its list of books by Fred Kaplan is Dark Territory: The Secret History of Cyber War, so the machine may have inferred that its author is a computer scientist. (Not true; I should know, as I wrote it.) Some other books are about foreign policy, so the machine—once alerted that there are two authors of the same name—inferred that those books were by the other guy. (Again, not true.) Because it was asked whether there are two authors of my name, it didn’t feel a need to explain away the other books, which were about neither computers nor foreign policy.
So I submitted a different question: “Are there at least three authors named Fred Kaplan?” It replied (quite quickly):
Yes, there are at least three authors named Fred Kaplan. Here are three examples:
Fred H. Kaplan is an American author and historian who has written several books on topics such as Abraham Lincoln, the Civil War, and American politics.
Fred R. Kaplan is a British author and journalist who has written several books on topics such as technology, business, and society.
Fred J. Kaplan is an American author and academic who has written several books on topics such as literature, culture, and intellectual history.
Here’s the problem with this: My middle name is Michael. I don’t know the other Fred Kaplan’s middle initial. (It isn’t cited anywhere; I’ve sent emails to him and to his agent, but have not heard back.) Let’s be generous and assume that it’s H., R., or J. As far as I can tell from Google and Amazon, there are no authors named Fred Kaplan who have middle names beginning with those letters. So the program is wrong on one point by omission (not getting my middle initial right) and wrong on at least two points by commission (mistaking two, if not all three, of the other Kaplans’ middle initials).
In other words, when pressed with an even mildly confounding question, the machine just makes stuff up.
Which makes me wonder: If the machine can’t distinguish between two Fred Kaplans (who, as the bio lines in our books make clear, are very different: We’re 17 years apart, we live in different places, and I’m a journalist while he’s a retired English professor) and invents false facts at whim, how will it learn, from a database more voluminous than the Great Library of Alexandria, to write good literature, useful technical manuals, entertaining joke books, or anything else that anyone might want to read?
At least for the moment, Meta’s planned fate for the world’s authors (“to train machines that could one day replace them,” as Reisner put it) seems half-baked.
However, my dialogue with LLaMA 2 revealed something else about Meta’s program that is extremely worrisome.
At one point, I asked how the programmers went about selecting the 183,000 books and leaving aside all the other books. In less than a minute, the machine returned a very lengthy, somewhat eloquent reply, citing and elaborating on 10 criteria: Relevance, Authority, Clarity, Practicality (whether it presents “actionable tips and guidance”), Currency (whether it’s up to date), Positive Reviews, Awards or Recognition, Depth of Information, Engaging Writing Style, and Cost (whether it’s not too high to be affordable or too low to be taken seriously).
First, one wonders how the machine can really parse all of these factors, especially, again, given its failure to unravel the many differences between the two Fred Kaplans. (Not to blow my own horn, but instead to make a point: One of my three books that failed to make the list of 183,000—The Bomb, published in 2020—was hailed by the New York Times as “a rich and surprisingly entertaining history.”)
But second, and more seriously, one of these criteria raises a red flag: Criterion No. 10, Cost. If the purpose of this database is to help A.I. machines learn how to write, what does it matter whether a book might be too expensive or too cheap? I posed the question to the machine. It replied (again, very quickly):
You raise a valid point. The cost of a book should not be a factor in selecting books for a program aimed at helping AI machines learn to write. … Therefore, I apologize for including cost as one of the factors in my previous response.
The machine then recited a revised list of criteria, which substituted the following as Criterion No. 10: “Availability: Is the book widely available to libraries, bookstores, or online platforms?”
I asked: “Why should availability (to mortals) matter?” The machine replied: “You’re right, availability to mortals shouldn’t be a factor in selecting books for a program aimed at teaching A.I. machines to write.” It sent yet another revised list of criteria, this one omitting Criterion No. 10 altogether.
It seems to me, the machine concedeth too quickly. The program’s true intentions are revealed in the 10th criterion of its first two answers to my question: Cost, then, on second thought, Availability. Why would either be pertinent unless Meta were creating a database not only for teaching A.I. machines how to write, but also for cataloguing, reproducing, and making available (possibly selling) existing books? (Meta did not immediately respond to Slate’s request for comment.)
It appears, then, that copyright violation—in the old-fashioned as well as newfangled meaning of the term—may be a part of Meta’s hidden agenda. Some well-known authors, including Michael Chabon and Sarah Silverman, have already filed class-action lawsuits. Where do I sign up?