The Data

full-scrolls/
5.5TB
8µm 3D X-ray scans of two intact scrolls (top halves only), scanned in horizontal slices from bottom to top. Each half-scroll scan consists of 14,000 .tif files of 120MB each. Each slice is 8µm tall, so each scroll half is 11.2cm tall (14,000 × 8µm). Both were scanned at 54keV, though we also released a smaller slice of Scroll 2 at 88keV. These are the scans you need to read to win the Grand Prize. See the meta.json files for details.
Scroll 1
Scroll 2
We are working on creating 3D segmentations of the scrolls (see Tutorial 3: Segmentation and Flattening), which are available alongside the raw scroll data on the data server (see Data Organization for details).
fragments/
1.8TB
4µm 3D X-ray scans, infrared images, and hand-labeled ink masks for three broken-off fragments of papyrus. Each fragment scan is 7,000 to 14,000 .tif files. The 3D scans are in vertical slices from side to side. Both 54keV and 88keV volumes are released for every fragment. These data are for use in training ink detection models and entering the Kaggle competition.
Fragment 1
Fragment 2
Fragment 3
Kaggle data
37GB
Post-processed version of the “fragments/” data above. Two operations were done on this data:
  • 3D X-ray scans (54keV) were transformed into “surface volumes”, as described in the tutorials.
  • Infrared photos were aligned with these surface volumes, and binary ink masks were created to denote the presence of ink.
Files used for post-processing the fragments (such as the segment files) can be found alongside the fragments on the data server.
campfire.zip
338MB
X-ray scans and images of the Campfire Scroll (used in tutorials).

Campfire scroll

To learn more about the data, read the data paper “EduceLab-Scrolls: Verifiable Recovery of Text from Herculaneum Papyri using X-ray CT”.

How to Download the Data

Fill out the registration form and you will be provided with a download link automatically. The license terms of the data are specified in the form.

The tutorial data (campfire.zip) is available to download without registering.

If you are only entering the Kaggle competition and not working on the Grand Prize, you do not need to download any data from this page; all the relevant fragment data is available via Kaggle.

Where did the data come from?

The two full scrolls are from the Institut de France and were scanned at about 8µm resolution at the Diamond Light Source particle accelerator. We are only releasing data from the top halves of these scrolls (they were standing on end when they were scanned). We will work with you to apply your techniques to the bottom halves in order to validate your submission.

Scroll 1
Scroll 2

At 8µm resolution, the data files are big. We believe this resolution is necessary to detect ink, as suggested by the excellent paper From invisibility to readability: Recovering the ink of Herculaneum. Each .tif file in the full scroll scans is 8 micrometers tall, so if you want to grab a centimeter from the middle of the scroll, you can just download 1,250 files from the middle of the scan. We provide scripts for doing this once you've registered.
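As a rough illustration of the slice arithmetic above, here is a minimal Python sketch that works out which .tif files cover a given height in the middle of a half-scroll scan. It is not the download script provided after registration. The function name middle_slice_names is hypothetical, and the zero-based numbering and five-digit zero padding are assumptions inferred from the example filename 07000.tif; check the actual file listing before relying on them.

```python
SLICE_HEIGHT_UM = 8    # from the text: each scroll slice is 8 µm tall
TOTAL_SLICES = 14_000  # from the text: .tif files per half-scroll scan


def middle_slice_names(height_cm, total_slices=TOTAL_SLICES,
                       slice_height_um=SLICE_HEIGHT_UM, pad=5):
    """Return the .tif names covering `height_cm`, centred in the scan.

    Assumptions (not confirmed by the source): slices are numbered
    from 0 and file names are zero-padded to `pad` digits.
    """
    # 1 cm = 10,000 µm, so 1 cm at 8 µm per slice is 1,250 slices.
    count = round(height_cm * 10_000 / slice_height_um)
    if not 0 < count <= total_slices:
        raise ValueError("requested height does not fit in the scan")
    start = (total_slices - count) // 2
    return [f"{i:0{pad}d}.tif" for i in range(start, start + count)]


names = middle_slice_names(1.0)
print(len(names), names[0], names[-1])  # 1250 06375.tif 07624.tif
```

Under these assumptions, one centimeter from the middle of a scan corresponds to the 1,250 files from 06375.tif through 07624.tif.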

The fragments of detached papyrus were scanned at 4µm using the same particle accelerator. They are very tiny!

Francoise Berard, director of the library at the Institut de France, with the fragments just before they were scanned; and the research group in the scanning room of the Diamond Light Source particle accelerator.

What does the data look like?

A typical .tif file from the scrolls looks like this, giving a top-down view:

full-scrolls/Scroll1.volpkg/volumes/20230205180739/07000.tif (cropped)

The fragments are sliced from the side:

fragments/Frag1.volpkg/volumes/20230205142449/4000.tif

Watch a video of a scroll:

The Monster Segment (in Scroll 1) (see the Data Organization page for more details):

Monster Segment texture
Location of the Monster Segment in the scroll