[ADDED book download] Notes on making an ebook copy




Reason for this post:

This post is part of an account of a major provider of scanned book copies (or source of data leaks) in China, 超星 (Chao Xing) or 读秀 (Du Xiu).

The book described below can be found on libgen by searching for its MD5: A665E0B6F83F9FE49120BE27F3A85902.
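For anyone who wants to confirm that a downloaded file is the same one, the MD5 above can be checked locally. The following is a minimal sketch using Python's standard hashlib module; the function names and the chunk size are illustrative choices, not part of any tool mentioned in this post.

```python
import hashlib

# MD5 quoted in this post for the uploaded file.
EXPECTED_MD5 = "A665E0B6F83F9FE49120BE27F3A85902"


def file_md5(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def matches_expected(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 with the expected value, ignoring letter case."""
    return file_md5(path) == expected.upper()
```

Calling matches_expected() with the path of the downloaded PDF returns True only if the file is byte-for-byte identical to the one the hash was computed from.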

CS:APP 3e with scanned main text from the Chinese adaptation.

Uploader’s note:

There are a few versions of electronic copies of the third edition available so far on the internet:

  • An EPUB file, likely scraped from somewhere similar to Pearson eText;
  • A PDF file converted from the aforementioned EPUB file, whose pages bear no resemblance to the print copy;
  • A truePDF of the Global Edition, whose modified exercises and problems the original authors complain about in the online errata: https://csapp.cs.cmu.edu/3e/errata.html ;
  • A PDF of around 35 megabytes, consisting of scanned, monochrome page images with color front and back covers.

The uploader made and uploaded this version to provide a reading experience similar to the original edition published in North America, using some tricks and tools to put several parts together.

This file combines the color covers, references and index pages from the 35 MB PDF file, the front matter (preface) PDF pages from the CS:APP website, and the main text from the book’s adaptation in China.

About the Chinese adaptation:

Since book publishing in China is (nominally) restricted to the public sector, foreign books such as this one are either sold directly or through importers, or, as in this case, a domestic publisher obtains authorization for an adaptation. The Chinese adaptation of CS:APP 3e, published by China Machine Press 机械工业出版社 in 2017, lacks certain sections of the original book, as Pearson adaptations in recent years tend to: in this case the English preface (replaced by a Chinese translation), the list of references and the index. The main text, however, remains mostly identical to the original North American edition.

Some other adaptations, or reprints of adaptations, of Pearson textbooks in recent years share similar issues, including those by China Machine Press and the Publishing House of Electronics Industry 电子工业出版社, among others. One example is Weiss’ Data Structures and Algorithm Analysis in C, an adaptation published in 2010 and reprinted in 2019. Some even removed the original table of contents and put a Chinese translation in its place.

About the source of the scanned main text:

There is something called Chao Xing 超星 in China, which relies on the libraries of colleges and public institutions for print copies and scans each book in full to make a digitized copy. Many of these copies are for internal, proprietary use among those print copy providers. Some of these books are available online, either within a group of higher education and research institutions or through access provided by public libraries.

But such scanned data has leaked on a massive scale over the past few years and in various forms, which has enabled some third-party vendors to profit from the leaked database, selling either single books (usually under the name 代找, ‘find (the scanned copy) on one’s behalf’) or in bulk. Some even provide access to an even larger collection of past leaks, known as 读秀 (Du Xiu) 2.0, 3.0, 4.0, etc.

In fact, nearly ALL scanned copies I have posted on this forum so far are from the latter.

Such scanned copies have an SSID number (SS presumably stands for ‘super star’, 超星), and the SSID of the adaptation used in this copy is 14679086.

More details on the history, as well as the tools used to make this file, can be found at cnblogs.com/stronghorse (in Chinese); the executables of the English editions can be downloaded at My Files.

A brief description of making this copy:

A few tools were used to make the copy resemble the original book. Here are the steps:

  • Extract, rename and reorganize the raw scanned image files.
  • Find the margins by inspecting and cropping the preface PDF.
  • Cut out the content of each scanned page and place it in a frame with those margins, along with other image processing, using ComicEnhancerPro (available from the aforementioned links). This aligns the pages and smooths the reading experience when switching between pages from different sources.
  • Make bookmarks for the covers and body text in bookcontents.dat using PdgCntEditor.
  • Combine the processed front and back cover, main text, references and index images and run OCR on them using Pdg2Pic (no English version available).
  • Insert the front matter PDF pages between the covers and the body text.
  • Edit the bookmarks of the combined PDF file with PdgCntEditor.
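Two of the steps above come down to simple arithmetic: fitting the cropped content into a frame with fixed margins, and renumbering bookmarks after extra pages are inserted. The sketch below only illustrates that arithmetic. It is not how ComicEnhancerPro or PdgCntEditor work internally, and every name and number in it (Margins, fit_into_frame, shift_bookmarks, the pixel sizes, the page counts) is a hypothetical example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Margins:
    """Page margins in pixels (hypothetical values, measured from a reference page)."""
    left: int
    top: int
    right: int
    bottom: int


def fit_into_frame(page_w, page_h, margins, content_w, content_h):
    """Scale content to fit inside the margins and center it.

    Returns (x, y, width, height) of the content on the output page.
    """
    box_w = page_w - margins.left - margins.right
    box_h = page_h - margins.top - margins.bottom
    if box_w <= 0 or box_h <= 0:
        raise ValueError("margins leave no room for content")
    # Use one scale factor for both axes so the aspect ratio is preserved.
    scale = min(box_w / content_w, box_h / content_h)
    new_w = round(content_w * scale)
    new_h = round(content_h * scale)
    x = margins.left + (box_w - new_w) // 2
    y = margins.top + (box_h - new_h) // 2
    return x, y, new_w, new_h


def shift_bookmarks(bookmarks, insert_after, inserted):
    """Renumber (title, page) bookmarks after `inserted` pages are added
    following page `insert_after` (1-based page numbers)."""
    return [
        (title, page + inserted if page > insert_after else page)
        for title, page in bookmarks
    ]
```

For example, if 36 front matter pages were inserted after a front cover on page 1, shift_bookmarks() would leave the cover bookmark on page 1 and move every later bookmark 36 pages further on.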