Fragments
The scanning sessions and data format are very similar to those of the full scrolls. For each of the six detached scroll fragments we provide 3D x-ray scans at 3.24µm and/or 7.91µm, infrared images, and hand-labeled ink masks. The 3D x-ray scan volumes of Fragments 5-6 are aligned, but those of Fragments 1-4 are NOT aligned.
Fragment 1 (PHerc. Paris 2 Fr 47)
Volume 20230205142449: 3.24µm, 54keV, 7219 x 20MB .tif files. Total size: 145 GB
Volume 20230213100222: 3.24µm, 88keV, 7229 x 24MB .tif files. Total size: 171 GB
Fragment 2 (PHerc. Paris 2 Fr 143)
Volume 20230216174557: 3.24µm, 54keV, 14111 x 46MB .tif files. Total size: 645 GB
Volume 20230226143835: 3.24µm, 88keV, 14144 x 43MB .tif files. Total size: 599 GB
Fragment 3 (PHerc. Paris 1 Fr 34)
Volume 20230212182547: 3.24µm, 88keV, 6650 x 20MB .tif files. Total size: 134 GB
Volume 20230215142309: 3.24µm, 54keV, 6656 x 18MB .tif files. Total size: 121 GB
Fragment 4 (PHerc. Paris 1 Fr 39)
Originally held back for automated scoring in the Kaggle competition, this fragment has since been released.
Volume 20230215185642: 3.24µm, 54keV, 9231 x 23MB .tif files. Total size: 211 GB
Volume 20230222173037: 3.24µm, 88keV, 9209 x 24MB .tif files. Total size: 216 GB
Fragment 5 (PHerc. 1667 Cr 1 Fr 3)
From the same original scroll as Scroll 4 (PHerc. 1667), which was partially unrolled in 1987 using the Oslo method. Find this fragment on Chartes.it.
Volume 20231121133215: 3.24µm, 70keV, 7010 x 13MB .tif files. Total size: 87 GB
Volume 20231130111236: 7.91µm, 70keV, 3131 x 3MB .tif files. Total size: 8.5 GB
Fragment 6 (PHerc. 51 Cr 4 Fr 8)
Volume 20231121152933: 3.24µm, 53keV, 8855 x 29MB .tif files. Total size: 253 GB
Volume 20231130112027: 7.91µm, 53keV, 3683 x 6MB .tif files. Total size: 21 GB
Volume 20231201112849: 3.24µm, 88keV, 8855 x 29MB .tif files. Total size: 253 GB
Volume 20231201120546: 3.24µm, 70keV, 8855 x 29MB .tif files. Total size: 253 GB
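The slice counts and voxel sizes listed above give a rough idea of how much physical space each scan covers. The following is a minimal sketch, assuming that each .tif file is one slice and that voxels are isotropic; both are assumptions, not statements from the scan metadata. The result is the scanned extent along the slicing axis, which may include empty space around the fragment, so it is an upper bound on the fragment's size rather than a measurement of it.

```python
# Rough scanned extent along the slicing axis: number of slices x voxel size.
# Assumption: one .tif file per slice, isotropic voxels.

VOLUMES = {
    # volume id: (voxel size in micrometres, number of .tif slices)
    "20230205142449": (3.24, 7219),  # Fragment 1, 54keV
    "20231121133215": (3.24, 7010),  # Fragment 5, 70keV
    "20231130111236": (7.91, 3131),  # Fragment 5, 70keV
}


def extent_mm(voxel_um: float, n_slices: int) -> float:
    """Length covered by n_slices slices of thickness voxel_um, in millimetres."""
    return voxel_um * n_slices / 1000.0


for vol_id, (voxel_um, n_slices) in VOLUMES.items():
    print(f"{vol_id}: about {extent_mm(voxel_um, n_slices):.1f} mm")
```

Every scan in this sketch comes out at a few centimetres along the slicing axis.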
For more technical details, see "EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT" and the "EduceLab Herculaneum Scroll Data (2023) Info Sheet".
The fragments are really tiny! Here you can see them in context:
Data format
The fragment data is published on the data server. The directory layout is the same for all fragments:
/fragments/*.volpkg/
  /config.json: Metadata.
  /volumes/: Multiple volumes for the various resolutions and incident energies.
  /working/reference/: Photos of the fragment (normal and infrared).
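As an illustration of this layout, the sketch below summarises a local copy of one fragment's .volpkg directory. It assumes that each volume is a subdirectory of volumes/ named after its volume ID and holding the .tif slices; that naming is an assumption, and `describe_volpkg` is a hypothetical helper, not part of any official tooling.

```python
import json
from pathlib import Path


def describe_volpkg(volpkg: Path) -> dict:
    """Summarise a local .volpkg directory: its config and its volumes.

    Assumption: volumes/<volume id>/ contains the .tif slices of one volume.
    Missing files or directories are reported as empty rather than raising.
    """
    config_path = volpkg / "config.json"
    config = json.loads(config_path.read_text()) if config_path.is_file() else {}

    volumes = {}
    volumes_dir = volpkg / "volumes"
    if volumes_dir.is_dir():
        for vol in sorted(p for p in volumes_dir.iterdir() if p.is_dir()):
            # Count the slices instead of loading them; single files are tens of MB.
            volumes[vol.name] = len(list(vol.glob("*.tif")))

    return {"config": config, "volumes": volumes}
```

Calling `describe_volpkg(Path("path/to/some.volpkg"))` on a downloaded copy returns the parsed metadata plus a slice count per volume, which can be compared against the file counts listed for each fragment.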
The 3D volumes of the fragments are sliced from the side:
Infrared light makes the ink more clearly visible:
Surface volumes
Just like with the segments in the scroll, we have made surface volumes for the fragments. This is necessary because even though they look flat from a top-down photo, the fragments are not completely flat.
We have also aligned (“registered”) the surface volumes with the infrared photos, and made a binary ink mask of where we think there is ink, in consultation with papyrologists.
/working/54keV_exposed_surface/: Data about the processed surface volume.
  /surface_volume/*.tif: The actual surface volume of 65 layers.
  /ir.png: Infrared photo, aligned with the surface volume.
  /inklabels.png: Manually created binary labels for the aligned photo (ink vs no-ink).
  /mask.png: Mask of where there is actually a surface (so you don't train on empty space).
  /*.ply: Surface mesh at different stages of processing (manually cleaned up in Meshlab).
  /alignment.psd: Photoshop file for the manual alignment of the infrared photo to the surface volume.
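Before training on a downloaded surface volume it can help to check that the files described above are all present. The sketch below does this with the standard library only. The helper names are hypothetical, and the numeric ordering of the layer file names is an assumption about how the .tif files are named.

```python
from pathlib import Path

# Files that sit next to the surface_volume/ directory, per the layout above.
EXPECTED_FILES = ("ir.png", "inklabels.png", "mask.png")
EXPECTED_LAYERS = 65


def surface_layers(surface_dir: Path) -> list:
    """Return the surface-volume .tif layers in numeric file-name order.

    Sorting by (length, name) puts "2.tif" before "10.tif" and also works
    for zero-padded names. Assumption: file names are layer numbers.
    """
    layers = list((surface_dir / "surface_volume").glob("*.tif"))
    return sorted(layers, key=lambda p: (len(p.stem), p.stem))


def missing_files(surface_dir: Path) -> list:
    """Names from EXPECTED_FILES that are absent from surface_dir."""
    return [name for name in EXPECTED_FILES if not (surface_dir / name).is_file()]


def is_complete(surface_dir: Path) -> bool:
    """True if all 65 layers and the three PNGs are present."""
    return (
        len(surface_layers(surface_dir)) == EXPECTED_LAYERS
        and not missing_files(surface_dir)
    )
```

The ordered list from `surface_layers` can then be handed to whichever image library is used to read the layers into a single 65-layer stack.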
ML training
The idea is to train ML models on these fragments, since we have the ground truth data of where the ink is (in addition to the newly discovered “crackle” method). Then, those ML models can be applied to the scrolls.
At a high level, training on a fragment works like this:
From a fragment (a) we obtain a 3D volume (b), from which we segment a mesh (c), around which we sample a surface volume (d). We also take an infrared photo (e) of the fragment, which we align (f) with the surface volume, and then manually turn into a binary label image (g). For more details, see Tutorial 5.
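To make the last steps concrete, the sketch below shows one simple way to turn the surface mask and the binary ink labels into training samples. It picks patch centres that lie on the papyrus surface and pairs each with the ink label at that pixel. Plain nested lists stand in for mask.png and inklabels.png so that the example runs without dependencies. The function name, the patch size, and the choice of labelling a patch by its centre pixel are all assumptions made for illustration, not necessarily the approach taken in Tutorial 5.

```python
import random


def sample_patch_centers(mask, inklabels, patch=4, n=8, seed=0):
    """Pick up to n (row, col, label) patch centres inside the surface mask.

    mask and inklabels are 2D nested lists of 0/1 with the same shape.
    Centres closer than patch // 2 to the border are skipped so that a
    full patch always fits around each centre.
    """
    half = patch // 2
    rows, cols = len(mask), len(mask[0])
    candidates = [
        (r, c)
        for r in range(half, rows - half)
        for c in range(half, cols - half)
        if mask[r][c]  # only train where there is actually a surface
    ]
    rng = random.Random(seed)  # fixed seed keeps the sampling reproducible
    chosen = rng.sample(candidates, min(n, len(candidates)))
    return [(r, c, inklabels[r][c]) for r, c in chosen]


# Toy 8x8 example: the left half is papyrus, column 2 carries ink.
toy_mask = [[1 if c < 4 else 0 for c in range(8)] for _ in range(8)]
toy_ink = [[1 if c == 2 else 0 for c in range(8)] for _ in range(8)]
samples = sample_patch_centers(toy_mask, toy_ink)
print(len(samples), "samples,", sum(label for _, _, label in samples), "with ink")
```

In a real pipeline each centre would be used to cut a subvolume through all 65 layers of the surface volume, and the model would be trained to predict the label from that subvolume.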