Memorization of training data by neural language models raises questions about data privacy and fair use

2021-01-26

Neural language models memorize some parts of their training data.

As far as we know, this is relatively rare. (Since the gap between training and held-out performance is small for these models, they can't be memorizing very much of their training data.) But it's not insignificant:

We focus on GPT-2 and find that at least 0.1% of its text generations (a very conservative estimate) contain long verbatim strings that are “copy-pasted” from a document in its training set.

(“Does GPT-2 Know Your Phone Number?”, BAIR team, 20 December 2020, and the related paper by Nicholas Carlini et al.)
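As a rough illustration of what “copy-pasted” means here, a minimal sketch of how one might flag generations that share a long verbatim token window with a training corpus is below. This is not the BAIR/Carlini pipeline; the window length, whitespace tokenization, and the `training_file_paths`/`generations` variables are all illustrative assumptions.

```python
# Sketch: flag generations containing a long verbatim n-gram from the training data.
# Whitespace tokenization and n=50 are simplifying assumptions, not the paper's method.

def ngrams(tokens, n):
    """All contiguous length-n windows of a token list, as a set of tuples."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(training_docs, n=50):
    """Collect every length-n token window seen anywhere in the training documents."""
    index = set()
    for doc in training_docs:
        index |= ngrams(doc.split(), n)
    return index

def looks_memorized(generation, index, n=50):
    """True if the generation reproduces any length-n training window verbatim."""
    return any(g in index for g in ngrams(generation.split(), n))

# Hypothetical usage:
# index = build_index(open(p).read() for p in training_file_paths)
# rate = sum(looks_memorized(g, index) for g in generations) / len(generations)
```

A real measurement would use the model's own tokenizer and a scalable index (suffix arrays or hashed shards) rather than an in-memory set, but the estimate is the same idea: the fraction of generations containing at least one long verbatim match.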

Although it’s infrequent, this behavior poses significant problems: memorized text can leak private information (violating contextual integrity and complicating the right to be forgotten), and it can reproduce copyrighted material verbatim.

BAIR suggests that curated data sets are the most reasonable way forward, since it’s not clear how to apply differential privacy to this problem and sanitizing the web would be too hard.

I don’t understand how that actually solves any of the problems they point out: someone still has to decide the questions about contextual integrity, the right to be forgotten, and copyright.

How does this happen? Why do models memorize certain texts and not others? How does this translate to other domains, for example computer vision models? I thought this problem had been somewhat known in the CV space for a long time, though I might be making that up. Certainly most companies working on CV products like self-driving cars use curated data sets, but perhaps that has more to do with business risk mitigation.

Carlini, Nicholas, Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, et al. “Extracting Training Data from Large Language Models,” December 14, 2020. https://arxiv.org/abs/2012.07805.