Repost: Anna’s Archive blog post: collection provided to 30-plus AI companies for training, most of them Chinese

Link to the original post on Anna’s Archive’s blog


Copyright reform is necessary for national security

annas-archive.li/blog, 2025-01-31 — companion articles by TorrentFreak: first, second

TL;DR: Chinese LLMs (including DeepSeek) are trained on my illegal archive of books and papers — the largest in the world. The West needs to overhaul copyright law as a matter of national security.

Not too long ago, “shadow-libraries” were dying. Sci-Hub, the massive illegal archive of academic papers, had stopped taking in new works due to lawsuits. “Z-Library”, the largest illegal library of books, saw its alleged creators arrested on criminal copyright charges. Incredibly, they managed to escape their arrest, but their library is no less under threat.

When Z-Library faced shutdown, I had already backed up its entire library and was searching for a platform to house it. That was my motivation for starting Anna’s Archive: a continuation of the mission behind those earlier initiatives. We’ve since grown to be the largest shadow library in the world, hosting more than 140 million copyrighted texts across numerous formats — books, academic papers, magazines, newspapers, and beyond.

My team and I are ideologues. We believe that preserving and hosting these files is morally right. Libraries around the world are seeing funding cuts, and we can’t trust humanity’s heritage to corporations either.

Then came AI. Virtually all major companies building LLMs contacted us to train on our data. Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality. This is notable given China’s role as a signatory to nearly all major international copyright treaties.

We have given high-speed access to about 30 companies. Most of them are LLM companies, and some are data brokers, who will resell our collection. Most are Chinese, though we’ve also worked with companies from the US, Europe, Russia, South Korea, and Japan. DeepSeek admitted that an earlier version was trained on part of our collection, though they’re tight-lipped about their latest model (probably also trained on our data though).

If the West wants to stay ahead in the race of LLMs, and ultimately, AGI, it needs to reconsider its position on copyright, and soon. Whether you agree with us or not on our moral case, this is now becoming a case of economics, and even of national security. All power blocs are building artificial super-scientists, super-hackers, and super-militaries. Freedom of information is becoming a matter of survival for these countries — even a matter of national security.

Our team is from all over the world, and we don’t have a particular alignment. But we’d encourage countries with strong copyright laws to use this existential threat to reform them. So what to do?

Our first recommendation is straightforward: shorten the copyright term. In the US, copyright is granted for 70 years after the author’s death. This is absurd. We can bring this in line with patents, which are granted for 20 years after filing. This should be more than enough time for authors of books, papers, music, art, and other creative works, to get fully compensated for their efforts (including longer-term projects such as movie adaptations).

Then, at a minimum, policymakers should include carve-outs for the mass-preservation and dissemination of texts. If lost revenue from individual customers is the main worry, personal-level distribution could remain prohibited. In turn, those capable of managing vast repositories — companies training LLMs, along with libraries and other archives — would be covered by these exceptions.

Some countries are already doing a version of this. TorrentFreak reported that China and Japan have introduced AI exceptions to their copyright laws. It is unclear to us how this interacts with international treaties, but it certainly gives cover to their domestic companies, which explains what we’ve been seeing.

As for Anna’s Archive — we will continue our underground work rooted in moral conviction. Yet our greatest wish is to enter the light, and amplify our impact legally. Please reform copyright.

Read the companion articles by TorrentFreak: first, second


Anna’s Blog

Updates about Anna’s Archive, the largest truly open library in human history.


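Machine translations of this post are still rather poor, and Quark’s translation isn’t good either. Is Anna asking governments to loosen strict copyright restrictions so that access to knowledge within Western countries enjoys more legal freedom?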

Same goes for Meta.


Meta Torrented Over 81 TB of Data Through Anna’s Archive, Despite Few Seeders

By Ernesto Van der Sar

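Newly unsealed court documents show that Meta downloaded large amounts of data from shadow libraries through Anna’s Archive. The company’s use of BitTorrent was already known, but internal email correspondence reveals where the data came from and how much was downloaded (measured in terabytes), as well as the limited resources and slow download speeds caused by a shortage of seeders.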

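Last weekend, the shadow library Anna’s Archive argued that, for AI companies, access to “pirated” books may be a matter of national security. The logic behind this controversial view is that US companies face legal consequences if they train AI models on data obtained from shadow libraries. Other countries have fewer qualms about this, which could give foreign companies a technological edge.

US tech companies are well aware of the potential power of shadow libraries. Meta, the parent company of Facebook, Instagram and WhatsApp, has never denied using these libraries to train earlier versions of its AI models.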

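Meta is not alone. DeepSeek, the disruptor in China’s AI sector, has also openly acknowledged using data from “pirated” sources. So far, however, it is mainly large US tech companies that have ended up in court.

The class action filed by authors including Richard Kadrey, Sarah Silverman and Christopher Golden is one such copyright infringement case. The authors accuse Meta of using their works without permission. Last month, they filed an amended complaint that added BitTorrent-related allegations. The plaintiffs see this as especially problematic because BitTorrent users typically also upload content to third parties.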

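“Meta, through a platform called LibTorrent, used the BitTorrent protocol to download millions of pirated books from LibGen. Internally, Meta acknowledged that using this protocol was legally problematic,” the third amended complaint states. “By downloading through the BitTorrent protocol, Meta knew it was facilitating further copyright infringement by acting as a distribution point for other users of pirated books.”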

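These alleged wrongdoings have to be proven in court, so the copyright holders requested access to Meta’s BitTorrent client logs and seeding data. That request was denied. Nonetheless, the copyright holders did obtain BitTorrent-related evidence during discovery. Many details were previously sealed, but unsealed copies added to the docket yesterday reveal new information. Citing an internal Meta email, the plaintiffs say the company tried to obtain data through Anna’s Archive. Although this was challenging because of the small number of seeders, it still managed to obtain terabytes of data.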

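“The scale of [Meta’s] illegal torrenting is staggering: last spring alone, Meta torrented at least 81.7 TB of data from multiple shadow libraries through Anna’s Archive, including at least 35.7 TB from Z-Library and LibGen. Meta also previously torrented 80.6 TB of data from LibGen,” the plaintiffs write in the unsealed filing, which refers to Anna’s Archive by the abbreviation “AA”.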

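The unsealed email also mentions the Internet Archive as a key source, even though it is not a typical shadow library. The email outlines the progress made and notes that “few seeders” and “slow download speeds” were posing challenges.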

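Copyright Concerns?

Meta’s employees were not unaware of the potential copyright issues. According to the unsealed records, one employee stated: “I think using pirated material should be beyond our ethical threshold.” The company also discussed internally not using Facebook infrastructure for the BitTorrent downloads, to avoid the risk of “tracing back the seeder/downloader to Meta servers”. The plaintiffs already knew about these comments and quotes, but now they are public. They reveal more of the internal discussions, but for Meta these BitTorrent allegations are not a game changer.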

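Meta: Fair Use

Last week, Meta filed a motion to dismiss the authors’ claims of “removal of copyright management information” and of violating California Penal Code Section 502, arguing that neither claim was properly pleaded. Meta did not ask for the copyright infringement claim to be dismissed, but believes it can “dispose of this baseless claim” at summary judgment. “Plaintiffs do not allege a single instance in which any part of any book was actually downloaded by a third party from Meta via torrent, much less that Plaintiffs’ books were somehow distributed by Meta,” the company writes. This does not mean Meta denies using shadow libraries. Its argument is that, under US copyright law, using such data to train its large language models (LLMs) constitutes fair use.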

Copies of all the referenced documents are available via Free.law’s CourtListener.