4µm 3D X-ray scans, infrared images, and hand-labeled ink masks for four broken-off fragments of papyrus. Each fragment scan consists of 7,000-14,000 .tif files. The 3D scans are stored as vertical slices taken from side to side. Both 54keV and 88keV volumes are released for every fragment. These data are intended for training ink detection models.

Just like the scrolls, the four fragments are from the Institut de France and were scanned at the Diamond Light Source particle accelerator.

Fragment 1
Fragment 2
Fragment 3
Fragment 4

The idea is to train ML models on these fragments, since we have the ground truth data of where the ink is. Then, those ML models can be applied to the scrolls.

At a high level, training on a fragment works like this:

From a fragment (a) we obtain a 3D volume (b), from which we segment a mesh (c), around which we sample a surface volume (d). We also take an infrared photo (e) of the fragment, which we align (f) with the surface volume, and then manually turn into a binary label image (g). For more details, see Tutorial 4.

Data format

The fragment data is published on the data server in the .volpkg format, which is the data format used by Volume Cartographer (learn more in Tutorial 3). The layout is the same for all the fragments:

  • /fragments/Frag{1,2,3,4}.volpkg/
    • /config.json: Metadata.
    • /volumes/: Two volumes each: one scan at 54keV, one at 88keV. Both have a 4µm voxel size.
    • /working/<name>.mp4: Video montage of the slices.
    • /working/reference/: Photos of the fragment (normal and infrared).
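As a rough illustration of the layout above, the following sketch builds the documented paths for one fragment and estimates how much physical length a stack of slices covers. The helper names `fragment_paths` and `stack_extent_mm` are hypothetical and not part of any official tooling, and the extent calculation assumes each slice is exactly one 4µm voxel thick.

```python
from pathlib import PurePosixPath

VOXEL_SIZE_UM = 4.0  # from the text: both volumes have a 4µm voxel size


def fragment_paths(fragment_id: int) -> dict[str, PurePosixPath]:
    """Return the documented sub-paths of one fragment's .volpkg (hypothetical helper)."""
    if fragment_id not in (1, 2, 3, 4):
        raise ValueError("fragment_id must be 1, 2, 3 or 4")
    root = PurePosixPath("/fragments") / f"Frag{fragment_id}.volpkg"
    return {
        "config": root / "config.json",             # metadata
        "volumes": root / "volumes",                # 54keV and 88keV scans
        "reference": root / "working" / "reference",  # normal and infrared photos
    }


def stack_extent_mm(num_slices: int, voxel_size_um: float = VOXEL_SIZE_UM) -> float:
    """Physical length of a slice stack, assuming one voxel of thickness per slice."""
    return num_slices * voxel_size_um / 1000.0


paths = fragment_paths(1)
shortest_mm = stack_extent_mm(7_000)   # smallest slice count quoted in the text
longest_mm = stack_extent_mm(14_000)   # largest slice count quoted in the text
```

Under that assumption, the 7,000-14,000 slices mentioned earlier would span roughly 28 to 56 mm along the slicing direction.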

The 3D volumes of the fragments are sliced from the side:


Infrared light makes the ink more clearly visible:

Infrared photo of Fragment 1

Surface volumes

Just like with the segments in the scroll, we have made surface volumes for the fragments. This is necessary because even though they look flat from a top-down photo, the fragments are not completely flat.

We have also aligned (“registered”) the surface volumes with the infrared photos, and made a binary ink mask of where we think there is ink, in consultation with papyrologists.

  • /working/54keV_exposed_surface/: Data about the processed surface volume.
    • /surface_volume/*.tif: The actual surface volume of 65 layers.
    • /ir.png: Infrared photo, aligned with the surface volume.
    • /inklabels.png: Manually created binary labels for the aligned photo (ink vs no-ink).
    • /mask.png: Mask of where there is actually a surface (so you don't train on empty space).
    • /*.ply: Surface mesh at different stages of processing (manually cleaned up in Meshlab).
    • /alignment.psd: Photoshop file for the manual alignment of the infrared photo to the surface volume.
Middle layer (32.tif) of Fragment 1’s surface volume
Aligned infrared
Aligned binary ink labels
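To show how the files listed above might fit together when preparing training data, here is a minimal, dependency-free sketch. It assumes that inklabels.png and mask.png have already been decoded into same-sized 2D grids of 0/1 values (for example with an imaging library), and that the layers are numbered from zero, which is consistent with 32.tif being the middle of 65 layers. The function names are hypothetical, and this is not the organizers' training code.

```python
from typing import Iterator, Sequence

NUM_LAYERS = 65  # from the text: the surface volume has 65 layers


def middle_layer_index(num_layers: int = NUM_LAYERS) -> int:
    """Zero-based index of the central layer (assumes layers are numbered from 0)."""
    return num_layers // 2


def training_pixels(
    ink_labels: Sequence[Sequence[int]],
    surface_mask: Sequence[Sequence[int]],
) -> Iterator[tuple[int, int, int]]:
    """Yield (row, col, label) for every pixel that lies on the surface."""
    if len(ink_labels) != len(surface_mask):
        raise ValueError("label and mask images must have the same height")
    for row, (label_row, mask_row) in enumerate(zip(ink_labels, surface_mask)):
        if len(label_row) != len(mask_row):
            raise ValueError("label and mask images must have the same width")
        for col, (label, on_surface) in enumerate(zip(label_row, mask_row)):
            if on_surface:  # skip empty space outside the papyrus
                yield row, col, 1 if label else 0


# Tiny stand-ins for inklabels.png and mask.png (1 = ink / on the surface).
example_labels = [[0, 1], [1, 0]]
example_mask = [[1, 1], [0, 1]]
samples = list(training_pixels(example_labels, example_mask))
# samples == [(0, 0, 0), (0, 1, 1), (1, 1, 0)]
```

The pixel at row 1, column 0 is labeled as ink but falls outside the mask, so it is dropped. This is the purpose of mask.png: to keep empty space out of the training set.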