

Scanning sessions and data format are very similar to those of the full scrolls.

2019 Scans

Fragment 1 (PHerc. Paris. 2 Fr 47)
Fragment 2 (PHerc. Paris. 2 Fr 143)
Fragment 3 (PHerc. Paris. 1 Fr 34)
Fragment 4 (PHerc. Paris. 1 Fr 39)

3.24µm 3D X-ray scans, infrared images, and hand-labeled ink masks for four detached scroll fragments. Each fragment scan consists of 7,000-14,000 .tif files. Both 54keV and 88keV volumes are released for every fragment, though they are NOT aligned with each other. Fragment 4 was originally held back for automated scoring in the Kaggle competition but has since been released.

Just like the scrolls, the four fragments are from the Institut de France and were scanned at the Diamond Light Source particle accelerator. More technical details: “EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT”.

The fragments are really tiny! Here you can see them in context:

Françoise Bérard (Director of the Library at the Institut de France) holding a tray of fragments; Fragment 1 up close; a fragment vertically mounted for scanning at Diamond Light Source

2023 Scans

Fragment 5 (PHerc. 1667 Cr 1 Fr 3)
Fragment 6 (PHerc. 51 Cr 4 Fr 48)

These scans are very similar to the 2023 scans of the scrolls, with the same voxel sizes and resolutions. Unlike the 2019 scan volumes, the different volumes of the 2023 scans ARE aligned with each other. More technical details: “EduceLab Herculaneum Scroll Data (2023) Info Sheet”.

Fragment 5 (PHerc. 1667 Cr 1 Fr 3) is from the same original scroll as Scroll 4 (PHerc. 1667), which was partially unrolled in 1987 using the Oslo method. Find this fragment on

Data format

The fragment data is published on the data server in the .volpkg format, which is the data format used by Volume Cartographer (learn more in Tutorial 3 and on the Scrolls page). It's the same for all the fragments:

  • /fragments/*.volpkg/
    • /config.json: Metadata.
    • /volumes/: Multiple volumes for the various resolutions and incident energies. For the 2019 scans (fragments 1-4) the different volumes are not aligned; for the 2023 scans (fragments 5-6) they are aligned.
    • /working/reference/: Photos of the fragment (normal and infrared).
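As a quick sanity check after downloading a fragment, the layout above can be walked with a few lines of Python. This is a minimal sketch, not part of Volume Cartographer: the function name `describe_volpkg` is hypothetical, it assumes the `.volpkg` folder has been copied to local disk, and it assumes each subdirectory of `volumes/` holds one volume as a stack of `.tif` files. It makes no assumption about the keys inside `config.json`.

```python
import json
from pathlib import Path


def describe_volpkg(volpkg_dir):
    """Summarize a locally downloaded .volpkg directory (hypothetical helper).

    Assumes the layout listed above: config.json, volumes/, working/reference/.
    Missing parts are reported as empty rather than raising an error.
    """
    root = Path(volpkg_dir)

    # Metadata: returned as-is, since its keys are not described here.
    config_path = root / "config.json"
    config = json.loads(config_path.read_text()) if config_path.is_file() else {}

    # One subdirectory per volume; count the .tif slices without loading them.
    volumes = {}
    volumes_dir = root / "volumes"
    if volumes_dir.is_dir():
        for vol in sorted(p for p in volumes_dir.iterdir() if p.is_dir()):
            volumes[vol.name] = sum(1 for _ in vol.glob("*.tif"))

    # Reference photos (normal and infrared).
    reference_dir = root / "working" / "reference"
    photos = sorted(p.name for p in reference_dir.iterdir()) if reference_dir.is_dir() else []

    return {"config": config, "volumes": volumes, "reference_photos": photos}
```

Calling `describe_volpkg` on a downloaded fragment folder returns the parsed metadata, a slice count per volume, and the names of the reference photos, which makes it easy to spot an incomplete download before loading any image data.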

The 3D volumes of the fragments are sliced from the side:


Infrared light makes the ink more clearly visible:

Infrared photo of Fragment 1

Surface volumes

Just as with the segments in the scrolls, we have made surface volumes for the fragments. This is necessary because, even though they look flat in a top-down photo, the fragments are not completely flat.

We have also aligned (“registered”) the surface volumes with the infrared photos, and made a binary ink mask of where we think there is ink, in consultation with papyrologists.

  • /working/54keV_exposed_surface/: Data about the processed surface volume.
    • /surface_volume/*.tif: The actual surface volume of 65 layers.
    • /ir.png: Infrared photo, aligned with the surface volume.
    • /inklabels.png: Manually created binary labels for the aligned photo (ink vs no-ink).
    • /mask.png: Mask of where there is actually a surface (so you don't train on empty space).
    • /*.ply: Surface mesh at different stages of processing (manually cleaned up in Meshlab).
    • /alignment.psd: Photoshop file for the manual alignment of the infrared photo to the surface volume.

Middle layer (32.tif) of Fragment 1’s surface volume; aligned infrared photo; aligned binary ink labels
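
Before training on a surface volume, it can help to check that the folder is complete and that the layers are read in depth order. The following is a minimal sketch under stated assumptions: the helper name `surface_volume_files` is hypothetical, the folder is assumed to be a local copy of `/working/54keV_exposed_surface/`, and layer files are assumed to have purely numeric names (as `32.tif` suggests).

```python
from pathlib import Path

EXPECTED_LAYERS = 65  # layer count given in the listing above


def surface_volume_files(surface_dir):
    """Collect the files of one processed surface volume (hypothetical helper).

    Returns a dict with the layer paths in depth order plus the aligned
    infrared photo, the ink labels, and the surface mask.
    """
    root = Path(surface_dir)

    # Assumption: layer files have numeric names such as 32.tif, so they are
    # sorted by number rather than alphabetically.
    layers = sorted((root / "surface_volume").glob("*.tif"), key=lambda p: int(p.stem))
    if len(layers) != EXPECTED_LAYERS:
        raise ValueError(f"expected {EXPECTED_LAYERS} layers, found {len(layers)}")

    files = {"layers": layers}
    for name in ("ir.png", "inklabels.png", "mask.png"):
        path = root / name
        if not path.is_file():
            raise FileNotFoundError(path)
        files[name] = path
    return files
```

The returned paths can then be read with the image library of your choice. With 65 layers, `files["layers"][len(files["layers"]) // 2]` is the middle layer.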

ML training

The idea is to train ML models on these fragments, since we have ground-truth data of where the ink is (in addition to the newly discovered “crackle” method). Those ML models can then be applied to the scrolls.

At a high level, training on a fragment works like this:

From a fragment (a) we obtain a 3D volume (b), from which we segment a mesh (c), around which we sample a surface volume (d). We also take an infrared photo (e) of the fragment, which we align (f) with the surface volume, and then manually turn into a binary label image (g). For more details, see Tutorial 5.
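
To make the last steps concrete, here is one way training examples could be drawn once the surface volume (d), the binary labels (g), and the surface mask are loaded as arrays. This is a minimal sketch, not the pipeline from Tutorial 5: the function name `sample_patches`, the patch size, and the choice to label each patch by the ink value at its center pixel are all assumptions made for illustration.

```python
import numpy as np


def sample_patches(surface_volume, ink_labels, mask, patch_size=32, n_samples=4, seed=0):
    """Draw random (subvolume, label) pairs from inside the masked region.

    surface_volume: array of shape (layers, height, width)
    ink_labels, mask: arrays of shape (height, width), nonzero meaning True
    Returns patches of shape (n_samples, layers, patch_size, patch_size)
    and boolean labels of shape (n_samples,).
    """
    if patch_size < 2 or patch_size % 2:
        raise ValueError("patch_size must be an even number >= 2")
    half = patch_size // 2
    rng = np.random.default_rng(seed)

    # Keep only centers that lie on the surface (mask) and far enough from
    # the image border for a full patch to fit.
    valid = np.array(mask, dtype=bool)
    valid[:half, :] = False
    valid[-half:, :] = False
    valid[:, :half] = False
    valid[:, -half:] = False
    ys, xs = np.nonzero(valid)
    if len(ys) == 0:
        raise ValueError("no valid patch centers inside the mask")

    # Sample with replacement only if there are fewer centers than requested.
    picks = rng.choice(len(ys), size=n_samples, replace=len(ys) < n_samples)
    patches = np.stack([
        surface_volume[:, y - half:y + half, x - half:x + half]
        for y, x in zip(ys[picks], xs[picks])
    ])
    # Assumption: each patch is labeled by the ink value at its center pixel.
    labels = np.asarray(ink_labels, dtype=bool)[ys[picks], xs[picks]]
    return patches, labels
```

Restricting the centers to the mask follows the note above about not training on empty space. A real setup would also hold out part of a fragment, or a whole fragment, for validation.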