Fragments

Scanning sessions and data format are very similar to those of the full scrolls. For each of the six detached scroll fragments, we provide 3D x-ray scans at 3.24µm and/or 7.91µm, infrared images, and hand-labeled ink masks. The 3D x-ray scan volumes of Fragments 5-6 are aligned, but those of Fragments 1-4 are NOT aligned.

Fragment 1 (PHerc. Paris 2 Fr 47)

Volume 20230205142449: 3.24µm, 54keV, 7219 x 20MB .tif files. Total size: 145 GB

Volume 20230213100222: 3.24µm, 88keV, 7229 x 24MB .tif files. Total size: 171 GB

Fragment 2 (PHerc. Paris 2 Fr 143)

Volume 20230216174557: 3.24µm, 54keV, 14111 x 46MB .tif files. Total size: 645 GB

Volume 20230226143835: 3.24µm, 88keV, 14144 x 43MB .tif files. Total size: 599 GB

Fragment 3 (PHerc. Paris 1 Fr 34)

Volume 20230212182547: 3.24µm, 88keV, 6650 x 20MB .tif files. Total size: 134 GB

Volume 20230215142309: 3.24µm, 54keV, 6656 x 18MB .tif files. Total size: 121 GB

Fragment 4 (PHerc. Paris 1 Fr 39)

Originally held back for automated scoring in the Kaggle competition, this fragment has since been released.

Volume 20230215185642: 3.24µm, 54keV, 9231 x 23MB .tif files. Total size: 211 GB

Volume 20230222173037: 3.24µm, 88keV, 9209 x 24MB .tif files. Total size: 216 GB

Fragment 5 (PHerc. 1667 Cr 1 Fr 3)

From the same original scroll as Scroll 4 (PHerc. 1667), which was partially unrolled in 1987 using the Oslo method. Find this fragment on Chartes.it.

Volume 20231121133215: 3.24µm, 70keV, 7010 x 13MB .tif files. Total size: 87 GB

Volume 20231130111236: 7.91µm, 70keV, 3131 x 3MB .tif files. Total size: 8.5 GB

Fragment 6 (PHerc. 51 Cr 4 Fr 48)

Volume 20231121152933: 3.24µm, 53keV, 8855 x 29MB .tif files. Total size: 253 GB

Volume 20231130112027: 7.91µm, 53keV, 3683 x 6MB .tif files. Total size: 21 GB

Volume 20231201112849: 3.24µm, 88keV, 8855 x 29MB .tif files. Total size: 253 GB

Volume 20231201120546: 3.24µm, 70keV, 8855 x 29MB .tif files. Total size: 253 GB
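
The listed figures can be sanity-checked with simple arithmetic. The sketch below is only an illustration: the helper names are hypothetical, the per-file sizes above are rounded (so the product only approximates the listed total), decimal units (1 GB = 1000 MB) are assumed, and the physical extent assumes each slice is one voxel thick.

```python
def approx_total_gb(num_files: int, file_mb: float) -> float:
    """Approximate volume size in GB from slice count and per-slice size."""
    return num_files * file_mb / 1000  # assumes decimal units: 1 GB = 1000 MB


def extent_mm(num_slices: int, voxel_um: float) -> float:
    """Extent along the slicing axis in mm, assuming one slice per voxel."""
    return num_slices * voxel_um / 1000  # 1 mm = 1000 µm


# Fragment 1, volume 20230205142449: 7219 slices of about 20 MB at 3.24 µm.
print(round(approx_total_gb(7219, 20)))  # 144 (listed total: 145 GB)
print(round(extent_mm(7219, 3.24), 1))   # 23.4
```

Under these assumptions the slice stack of this volume spans roughly 23 mm.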

For more technical details, see “EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT” and the “EduceLab Herculaneum Scroll Data (2023) Info Sheet”.

The fragments are really tiny! Here you can see them in context:

Françoise Bérard (Director of the Library at the Institut de France) holding a tray of fragments; Fragment 1 up close; a fragment vertically mounted for scanning at Diamond Light Source

Data format

The fragment data is published on the data server. The directory layout is the same for all the fragments:

  • /fragments/*.volpkg/
    • /config.json: Metadata.
    • /volumes/: Multiple volumes for the various resolutions and incident energies.
    • /working/reference/: Photos of the fragment (normal and infrared).
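
As a rough illustration of this layout, the sketch below summarises a local copy of one fragment's .volpkg directory. The function name `describe_volpkg` and the returned dictionary keys are hypothetical, and the code assumes the directory has already been downloaded; it does not talk to the data server and makes no assumption about the fields inside config.json.

```python
import json
from pathlib import Path


def describe_volpkg(volpkg: Path) -> dict:
    """Summarise a local copy of a fragment .volpkg directory."""
    # config.json holds the metadata; its fields are passed through untouched.
    config_path = volpkg / "config.json"
    config = json.loads(config_path.read_text()) if config_path.exists() else {}

    # Each subdirectory of volumes/ is one scan; count its .tif slices.
    volumes = {}
    volumes_dir = volpkg / "volumes"
    if volumes_dir.is_dir():
        for vol in sorted(p for p in volumes_dir.iterdir() if p.is_dir()):
            volumes[vol.name] = sum(1 for _ in vol.glob("*.tif"))

    # working/reference/ holds the normal and infrared photos.
    ref_dir = volpkg / "working" / "reference"
    reference = sorted(p.name for p in ref_dir.iterdir()) if ref_dir.is_dir() else []

    return {"config": config, "volumes": volumes, "reference": reference}
```

Calling `describe_volpkg(Path("Frag1.volpkg"))` on a complete local copy would be expected to list one entry per volume ID with its slice count.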

The 3D volumes of the fragments are sliced from the side:

/fragments/Frag1.volpkg/volumes/20230205142449/4000.tif

Infrared light makes the ink more clearly visible:

Infrared photo of Fragment 1

Surface volumes

Just like with the segments in the scroll, we have made surface volumes for the fragments. This is necessary because the fragments are not completely flat, even though they look flat in a top-down photo.

We have also aligned (“registered”) the surface volumes with the infrared photos, and made a binary ink mask of where we think there is ink, in consultation with papyrologists.

  • /working/54keV_exposed_surface/: Data about the processed surface volume.
    • /surface_volume/*.tif: The actual surface volume of 65 layers.
    • /ir.png: Infrared photo, aligned with the surface volume.
    • /inklabels.png: Manually created binary labels for the aligned photo (ink vs no-ink).
    • /mask.png: Mask of where there is actually a surface (so you don't train on empty space).
    • /*.ply: Surface mesh at different stages of processing (manually cleaned up in Meshlab).
    • /alignment.psd: Photoshop file for the manual alignment of the infrared photo to the surface volume.

Middle layer (32.tif) of Fragment 1’s surface volume; aligned infrared photo; aligned binary ink labels
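
A minimal sketch of how the layer files might be enumerated in depth order, assuming a local copy of the surface directory and that the layers are named by number (consistent with 32.tif being the middle of 65 layers counted from zero). The function names are hypothetical; reading the pixel data would need an image library and is left out.

```python
from pathlib import Path

EXPECTED_LAYERS = 65  # layer count stated in the listing above


def surface_layers(surface_dir: Path) -> list[Path]:
    """Return the surface-volume layer files sorted by layer number."""
    tifs = (surface_dir / "surface_volume").glob("*.tif")
    # Sort numerically so that unpadded names ("9.tif", "10.tif") also order correctly.
    return sorted((p for p in tifs if p.stem.isdigit()), key=lambda p: int(p.stem))


def middle_layer(layers: list[Path]) -> Path:
    """Pick the central layer; index 32 when there are 65 layers."""
    return layers[len(layers) // 2]
```

Comparing `len(surface_layers(...))` against `EXPECTED_LAYERS` is a cheap way to catch an incomplete download before training.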

ML training

The idea is to train ML models on these fragments, since for them we have ground truth data showing where the ink is (in addition to the newly discovered “crackle” method). Those ML models can then be applied to the scrolls.

At a high level, training on a fragment works like this:

From a fragment (a) we obtain a 3D volume (b), from which we segment a mesh (c), around which we sample a surface volume (d). We also take an infrared photo (e) of the fragment, which we align (f) with the surface volume, and then manually turn into a binary label image (g). For more details, see Tutorial 5.
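
To make the role of the mask and the label image concrete, here is a toy sketch of how training samples could be selected. It is not the method from Tutorial 5: the function name `labeled_centers` and the `radius` and `stride` parameters are hypothetical, the images are assumed to be already decoded into 2D grids of 0/1 values, and a real pipeline would typically use an array library rather than nested lists.

```python
from typing import Iterator, Sequence

Grid = Sequence[Sequence[int]]


def labeled_centers(
    mask: Grid, inklabels: Grid, radius: int, stride: int = 1
) -> Iterator[tuple[int, int, int]]:
    """Yield (row, col, label) for patch centres that lie on the surface mask."""
    height, width = len(mask), len(mask[0])
    # Stay `radius` pixels away from the border so the full patch fits.
    for row in range(radius, height - radius, stride):
        for col in range(radius, width - radius, stride):
            if mask[row][col]:  # skip empty space outside the fragment
                yield row, col, int(bool(inklabels[row][col]))


# Toy 5x5 example standing in for mask.png and inklabels.png.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
ink = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
samples = list(labeled_centers(mask, ink, radius=1))
print(len(samples), sum(label for _, _, label in samples))  # 8 3
```

Each yielded centre would index a small subvolume of the surface volume (the patch footprint across all layers), paired with its ink or no-ink label.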