
Scrolls

The scroll data is from two scanning sessions:

  • 2019 scans, released at the start of the competition in March 2023.
  • 2023 scans, released on an ongoing basis since the fall of 2023 (announcement), as the data is processed.

2019 Scans

Scroll 1 (PHerc Paris 4)
Scroll 2 (PHerc Paris 3)

7.91µm 3D X-ray scans of two intact scrolls (top halves only), scanned in horizontal slices from bottom to top. Each half-scroll scan consists of 14,000 .tif files of 120MB each. Each slice is 7.91µm tall, so each scroll half is about 11cm tall. Both were scanned at 54keV, though we also released a smaller partial volume of Scroll 2 at 88keV. These are the scans you need to read to win the Grand Prize. More technical details: “EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT”.
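
The figures above can be sanity-checked with a quick back-of-the-envelope calculation. The sketch below treats the rounded numbers from the text (14,000 slices, 120MB per file) as exact and uses decimal units, so the results are estimates only; the variable names are illustrative.

```python
# Rough size estimate for one half-scroll scan, using the rounded
# figures quoted in the text (assumed exact here for illustration).
NUM_SLICES = 14_000        # .tif files per half scroll
SLICE_SIZE_MB = 120        # size of one .tif file
SLICE_HEIGHT_UM = 7.91     # height of one slice in micrometres

height_cm = NUM_SLICES * SLICE_HEIGHT_UM / 10_000   # 1 cm = 10,000 µm
total_tb = NUM_SLICES * SLICE_SIZE_MB / 1_000_000   # decimal terabytes

print(f"height: {height_cm:.1f} cm, total size: {total_tb:.2f} TB")
```

The exact values depend on the true slice count of each volume, which the text only gives approximately.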

The two scrolls are from the Institut de France and were scanned at about 7.91µm resolution at the Diamond Light Source particle accelerator. Currently the top half of each scroll is released (they were standing on end when they were scanned).

2023 Scans

Scroll 3 (PHerc 0332)
Scroll 4 (PHerc 1667)

New scans from 2023 (announcement). 3.24µm and 7.91µm 3D X-ray scans of two partially unrolled scrolls, scanned in horizontal slices from bottom to top. PHerc 1667 has an accompanying fragment. More technical details: “EduceLab Herculaneum Scroll Data (2023) Info Sheet”.

The scrolls are from the Officina dei Papiri Ercolanesi, Biblioteca Nazionale di Napoli Vittorio Emanuele III in Naples, Italy. Scanning was done at the Diamond Light Source particle accelerator, on the I12 beamline, using optical modules 2 and 3, which have pixel sizes of 7.91 µm and 3.24 µm, respectively. We scanned with monochromatic incident energies of 53 keV, 70 keV, 88 keV, and 105 keV. With two pixel sizes and four energies, this gives a total of 2*4=8 volumes for each scroll.
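
As an illustration of where the eight volumes come from, the following sketch enumerates every combination of pixel size and incident energy listed above. The list names are illustrative and do not correspond to identifiers on the data server.

```python
from itertools import product

PIXEL_SIZES_UM = [7.91, 3.24]       # optical modules 2 and 3
ENERGIES_KEV = [53, 70, 88, 105]    # monochromatic incident energies

# One volume per (pixel size, energy) pair.
volumes = list(product(PIXEL_SIZES_UM, ENERGIES_KEV))
for pixel_size, energy in volumes:
    print(f"{pixel_size}µm, {energy}keV")
print(len(volumes), "volumes per scroll")
```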

.volpkg format (used for both scrolls and fragments)

Both the scroll and fragment data are published on the data server in the .volpkg format, which is the data format used by Volume Cartographer (learn more in Tutorial 3).

A .volpkg needs to have the following files and directories, otherwise Volume Cartographer will crash. You can, however, add more files or directories.

  • *.volpkg/
    • /config.json: Contains some metadata, such as the name and the thickness of the papyrus.
    • /volumes/: The actual CT scans. Each volume is an image stack of .tif files. The different volumes in a .volpkg may be CT scans of the same area, or of completely different areas. For different volumes of the same area, they may be aligned with each other (“registered”), or not.
      • /<id>/meta.json: Contains metadata about this particular volume.
      • /<id>/<number>.tif: The actual image stack itself.
    • /renders/: Where rendering data will be stored. May be empty.
    • /paths/: This is where segments are stored (see next chapter: Segments). May be empty.
    • /working/ (optional): We typically put miscellaneous related data in this directory.
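
Because a missing entry can crash Volume Cartographer, it may be worth checking a .volpkg before opening it. The sketch below is not part of Volume Cartographer; it is a minimal, hypothetical helper (`missing_volpkg_entries`) that assumes the layout listed above and only tests for the presence of files and directories, not their contents.

```python
from pathlib import Path

# Entries the list above marks as required; /working/ is optional and not checked.
REQUIRED_FILES = ["config.json"]
REQUIRED_DIRS = ["volumes", "renders", "paths"]

def missing_volpkg_entries(volpkg: Path) -> list[str]:
    """Return the required entries that are absent from a .volpkg directory."""
    missing = [name for name in REQUIRED_FILES if not (volpkg / name).is_file()]
    missing += [name + "/" for name in REQUIRED_DIRS if not (volpkg / name).is_dir()]
    # Each volume directory is expected to carry its own meta.json.
    volumes_dir = volpkg / "volumes"
    if volumes_dir.is_dir():
        for volume in sorted(p for p in volumes_dir.iterdir() if p.is_dir()):
            if not (volume / "meta.json").is_file():
                missing.append(f"volumes/{volume.name}/meta.json")
    return missing
```

For example, `missing_volpkg_entries(Path("Scroll1.volpkg"))` returns an empty list when all required entries are present.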

A typical .tif file from the scrolls looks like this, giving a cross-section view:

full-scrolls/Scroll1.volpkg/volumes/20230205180739/07000.tif (cropped)

Data server

You can find the full scroll data on the data server in the /full-scrolls/ folder.

  • Scroll 1 (PHerc Paris 4): The scroll for which we have by far the most segments, in which the first letters were discovered, and for which the 2023 Grand Prize was claimed. Half of the scroll has been released in a single 54keV volume.
  • Scroll 2 (PHerc Paris 3): Has proven harder to segment. We do have an 88keV “slab” (partial volume) in addition to the 54keV main volume. Also, the main volume has a scanning artifact in the middle of the volume.
  • Scroll 3 (PHerc 332): Canonical volume is 3.24µm, 53keV. Other volumes will become available as processing finishes. All volumes will be aligned to the canonical volume. Also has raw HDF files available, from before “windowing” the raw values to .tif integer values (see this FAQ item).
  • Scroll 4 (PHerc 1667): Canonical volume is 3.24µm, 88keV. Otherwise similar to Scroll 3. The 3.24µm, 53keV volume had slight data loss during scanning, so it is not as good.

At 3.24µm and 7.91µm resolution, the data files are big. We believe this resolution is necessary to detect ink, as suggested by the excellent paper From invisibility to readability: Recovering the ink of Herculaneum.

This is a video montage of Scroll 1:

Community-contributed data

Each of the scroll directories also contains community-contributed data. These are used by various tools, for example to load data more efficiently based on which area of the scroll you’re looking at.
  • /volumes_small/: 10x smaller “thumbnails” of the slices.
    • /<volume_id>/{*.tif,meta.json}: Image stack of the 10x smaller volume, e.g. for use in Volume Cartographer. Generated using build_small.
    • /<volume_id>_small.tif: The same data, but in a single tif file with multiple layers. Generated using build_small_volume.
    • /<volume_id>_volumes_small_axis{1,2}/*.png: The same data but across the two orthogonal axes, and as pngs instead of tifs. Generated using perpSlices.py.
  • /volumes_masked/: Same as /volumes/, but with non-scroll data blacked out, and the files compressed. About 2x smaller than the original volumes, though may take longer to load in software due to the compression. See this repo for more details (and this thread for details about the compression vs performance tradeoff).
  • /volume_grids/<volume_id>/cell_yxz_*.tif: The volume split into “cells” of 500x500x500 voxels. Generated using build_grid_layer.
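
To show how the 500x500x500 grid can be used to load only the region of interest, the sketch below maps a voxel coordinate in the full volume to a cell index and an offset inside that cell. The function name `locate_voxel` is hypothetical, and the 0-based cell numbering is an assumption; check the actual file names on the data server before building paths from these indices.

```python
CELL_SIZE = 500  # voxels per cell edge, as stated above

def locate_voxel(y: int, x: int, z: int) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Map a full-volume voxel coordinate to (cell index, offset inside the cell).

    Cell indices here are 0-based; this is an assumption, not something
    the text specifies.
    """
    cell, offset = zip(*(divmod(c, CELL_SIZE) for c in (y, x, z)))
    return cell, offset

cell, offset = locate_voxel(y=2750, x=3999, z=7000)
print(cell, offset)
```

Both results are in (y, x, z) order, matching the order suggested by the `cell_yxz_` prefix.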

One community member, Ahron Wayne (@WayneWayneHello on Discord), has bought a benchtop CT scanner and has been making various scans with it, with the goal of making a simulated scroll (a “campfire scroll”) that can be used as ground truth data. His scans can be found in /waynewaynehello-uploads/.

One of @WayneWayneHello's scans