Tutorial 1: Technical Overview

In these tutorials we share the current best tools and techniques for virtually unwrapping and reading carbonized papyrus scrolls without physically opening them.

Our expectation is that you will build on these techniques, improving the tools and models. But of course you may have better ideas, and are free to approach the Vesuvius Challenge any way you think will work!

The three steps in our process for reading a carbonized scroll are:

  1. Scanning: creating a 3D scan of a scroll or fragment using X-ray tomography
  2. Segmentation and Flattening: finding the layers of the rolled papyrus in the 3D scan and then unrolling them into a flattened "surface volume"
  3. Ink Detection: identifying the inked regions in the flattened surface volume using a machine learning model

Each of these steps has been shown to work, but to date no one has successfully put them all together and applied them to an unopened Herculaneum scroll to reveal whole passages of hidden text. That is your goal!

Virtually Unwrapping the En-Gedi Scroll

Before we dive into the Herculaneum papyri with their radiolucent ink, it's helpful to understand how the En-Gedi scroll was virtually unwrapped in 2015.

Here is an excellent 2 minute overview of how this was achieved:

For the Herculaneum papyri, many of the same steps apply, with one key change: the ink is not readily visible to the naked eye in the X-ray scans, so we need a machine learning model to detect it.

Let's go through each of the key steps one by one.

1. Scanning

Input: physical scroll or fragment.
Output: 3D volume (.tif “image stack”).

If you've ever had a CT scan at a hospital, this is essentially the same process, except that our scans were made in a particle accelerator and have a much higher resolution.

Scanning involves taking hundreds to thousands of X-ray photographs of the object from different rotational angles. Typically this is done by having an X-ray source on one side of the object, and an X-ray camera on the other side, and rotating the object 360° on a platform.

A fragment rotating, with an X-ray source (from a particle accelerator) on one side, and an X-ray camera on the other side (source)

The X-ray photos are then combined into a 3D volume using one of a number of tomographic reconstruction algorithms. This is typically done by software that comes with the scanner. A volume is a 3D picture made up of 3D pixel cubes called voxels. The voxel size tells us the physical size of the cube, and the value stored in the voxel is an estimate of that location's radiodensity.

Artistic visualization of constructing a 3D volume; in reality the object rotates as it is scanned.

We store the 3D volume as a directory full of .tif files, where each file represents one horizontal cross-section or "slice" of the object, typically starting at the bottom of the object and moving upwards. We call this a .tif image stack. In the case of our full scroll scans, each .tif file is 100 MB. For the fragment scans, file sizes range from 14 MB to 40 MB. For the flattened surface volumes, each .tif file can be up to 280 MB.
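As a rough illustration of this layout, the sketch below stacks 2D slices into a single 3D array. It assumes NumPy and uses small synthetic 16-bit slices in place of real .tif files; the function name `stack_slices`, the bit depth, and the (z, y, x) axis order are illustrative assumptions, not part of our tooling.

```python
import numpy as np

def stack_slices(slices):
    """Stack 2D slices (ordered bottom to top) into one (z, y, x) volume."""
    return np.stack(slices, axis=0)

# Synthetic stand-in for slices read from .tif files:
# 5 slices of 4 x 6 pixels, each filled with a different value.
slices = [np.full((4, 6), fill_value=z * 100, dtype=np.uint16) for z in range(5)]
volume = stack_slices(slices)

print(volume.shape)     # (5, 4, 6): 5 slices, 4 rows, 6 columns
print(volume[2, 0, 0])  # 200: a voxel in the third slice from the bottom
```

With real data you would read each file with an image library of your choice and, given the file sizes above, probably load only a range of slices at a time.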

Remember that each pixel in the image stack actually represents a cube of physical space. If your volume has a 10 µm voxel size, then 100 slices will give you 1 mm (1000 µm) of the object.
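The same arithmetic can be written as a small helper. This is only a sketch; the function names are hypothetical, and the voxel size must come from the metadata of the scan you are working with.

```python
def slices_to_mm(num_slices: int, voxel_size_um: float) -> float:
    """Physical extent covered by `num_slices` voxels, in millimetres."""
    return num_slices * voxel_size_um / 1000.0

def volume_size_mm(shape, voxel_size_um):
    """Convert a (slices, rows, cols) shape into physical size in mm per axis."""
    return tuple(slices_to_mm(n, voxel_size_um) for n in shape)

print(slices_to_mm(100, 10.0))                   # 1.0
print(volume_size_mm((100, 2000, 3000), 10.0))   # (1.0, 20.0, 30.0)
```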

.tif image stack
Scrubbing through the .tif images

Image stacks can be visualized using 3D volume rendering software. We will learn how to do this in “Tutorial 2: Scanning”.

3D volume of a proxy scroll
3D volume of a Herculaneum fragment, showing multiple layers of papyrus

2. Segmentation

Input: 3D volume (.tif “image stack”).
Output: 3D mesh (.obj).

The goal of segmentation is to identify and capture the 3D shape of each of the layers of the rolled papyrus scroll. Each individual surface in our 3D volume that we are able to identify is called a "segment."

Segmentation: finding a surface of papyrus.
We have to do this step for both scrolls and fragments:
  • Scrolls. We repeat this process many times for different internal surfaces.
    • Technically we could make one huge "segment" for the entire scroll, but the scroll wraps can be difficult to distinguish in practice, so we split it up into more manageable pieces.
    • Segmentation can be challenging, as different layers of papyrus can be damaged, distorted, or frayed. The carbonized papyrus blisters, and different layers can even fuse with each other.
  • Fragments. On fragments this process is a little easier, since they are already fairly flat and have an exposed surface on which we can actually see the ink. Still, the fragments are usually not completely flat, and can have "hidden layers" of papyrus attached underneath the visible layer.

We use an in-house tool called Volume Cartographer to manually annotate a surface on one of the slices from the image stack, and then the software extrapolates it along the z-axis to other slices.

The result is a 3D mesh (.obj file) called a “segment” which intersects the volume (i.e. the mesh coordinates are also volume coordinates).
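Because the mesh coordinates are also volume coordinates, a vertex can be mapped directly to a voxel. The sketch below parses the vertex (`v`) and face (`f`) records of a tiny inline .obj snippet and converts a vertex to a voxel index. The helper names are hypothetical, and the mapping from (x, y, z) to a (slice, row, column) index is an assumption for illustration; check the axis convention of your own data.

```python
def parse_obj(text):
    """Read vertex positions and faces from Wavefront .obj text."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(p) for p in parts[1:4]))
        elif parts[0] == "f":
            # Face entries may look like "3", "3/1" or "3/1/2";
            # the first number is a 1-based vertex index.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:]))
    return vertices, faces

def to_voxel_index(vertex):
    """Nearest voxel to a vertex, as (slice, row, column). Assumed axis order."""
    x, y, z = vertex
    return (round(z), round(y), round(x))

OBJ_TEXT = """\
v 10.0 20.0 5.0
v 11.0 20.0 5.0
v 10.0 21.0 5.2
f 1 2 3
"""

vertices, faces = parse_obj(OBJ_TEXT)
print(len(vertices), faces)         # 3 [(0, 1, 2)]
print(to_voxel_index(vertices[2]))  # (5, 21, 10)
```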

3. Surface volumes

Input: 3D volume (.tif “image stack”) and 3D mesh (.obj).
Output: 3D “surface volume” around the mesh (.tif “image stack”).

To detect ink from the 3D X-ray scan, it is not sufficient to only examine the voxels which intersect our segment mesh; we also want to sample the voxels around the mesh:

  • downwards, “into the papyrus.” Ink might have seeped into the papyrus, so the voxels inside the papyrus might contain information about the presence of ink.
  • upwards, “above the papyrus.” Ink might be sitting on top of the surface, creating a small “hump” that might be detectable.

We might also not have traced the surface of the papyrus completely accurately during segmentation, so sampling voxels around the mesh also gives us some leeway.

Fortunately, this sampling approach can be a helpful optimization: the full 3D volume can be a huge amount of data (up to a terabyte), which is often not very practical to work with, but we only need the voxels which are close to our segment. We therefore do one additional step of data processing to create a new "subvolume" containing only the voxels in which we're interested.

“Extruding” the segment mesh to capture a subvolume. Every voxel inside this new mesh will be saved as a new volume.

We then "flatten" this subvolume into a new image stack, where each layer is a 2D image again. This process is similar to creating a map of the earth on flat paper: there are many different types of projections you can use, all of which have their own pros and cons.

Flattening of the subvolume.

The output of this process is a flattened 3D volume that has been sampled around the mesh, which we call a “surface volume”. This is again a .tif image stack, just like our original volume. However, it is much smaller than the original volume and more consistent since the papyrus always sits roughly in the middle of the volume.
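To make the idea concrete, here is a heavily simplified sketch of sampling voxels around a surface. It assumes NumPy, represents the segment as a height map (one slice index per row and column) instead of a mesh, and samples straight up and down the slice axis rather than along the mesh normals. The function name and parameters are hypothetical, and this is not how our tools implement flattening.

```python
import numpy as np

def sample_surface_volume(volume, surface_z, half_depth):
    """Collect voxels within `half_depth` slices of a height-map surface.

    volume:    (z, y, x) array.
    surface_z: (y, x) array giving the slice index of the surface.
    Returns a (2 * half_depth + 1, y, x) stack with the surface in the middle layer.
    """
    depth, height, width = volume.shape
    offsets = np.arange(-half_depth, half_depth + 1)
    base = np.rint(np.asarray(surface_z, dtype=float)).astype(int)
    # Slice index for every (layer, row, col); clipped so we stay inside the scan.
    z = np.clip(base[None, :, :] + offsets[:, None, None], 0, depth - 1)
    rows = np.arange(height)[None, :, None]
    cols = np.arange(width)[None, None, :]
    return volume[z, rows, cols]

# Synthetic volume containing one tilted sheet of "papyrus".
volume = np.zeros((20, 8, 8), dtype=np.float32)
rows, cols = np.mgrid[0:8, 0:8]
surface_z = 5 + cols                 # the sheet rises by one slice per column
volume[surface_z, rows, cols] = 1.0

surface_volume = sample_surface_volume(volume, surface_z, half_depth=3)
print(surface_volume.shape)     # (7, 8, 8)
print(surface_volume[3].min())  # 1.0: the tilted sheet now fills the middle layer
```

The tilted sheet ends up flat in the middle layer of the output, which is the property described above: the papyrus always sits roughly in the middle of the surface volume.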

The resulting “surface volume” is another .tif image stack.

In “Tutorial 3: Segmentation and Flattening” we’ll dive deeper into segmentation and virtual unwrapping.

4. Ground truth data alignment

Input: Raw infrared photo and 3D “surface volume” (.tif “image stack”).
Output: Aligned infrared photo and hand-labeled binary mask.

This step is only applicable to fragments, since we don’t have ground truth data for scrolls.

Once we have a surface volume containing a sheet of papyrus, we align the infrared photo to it, so it matches the surface as closely as possible. We have mostly done this manually. We use infrared photos because the ink has better contrast against the papyrus in the infrared spectrum.
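Parts of this alignment could in principle be automated. The sketch below is an assumption-heavy illustration, not our method: it estimates a pure translation between two same-sized images using phase correlation with NumPy's FFT. A real infrared photo would also differ from the surface volume in rotation, scale, and local warping, which this does not handle; the function name is hypothetical.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Return (dy, dx) so that np.roll(moving, (dy, dx), axis=(0, 1)) matches reference."""
    cross = np.fft.fft2(reference) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12          # keep phase only
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks past the midpoint correspond to negative shifts (FFT wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, corr.shape))

# Synthetic test: shift a random image by a known amount and recover it.
rng = np.random.default_rng(0)
reference = rng.random((32, 32))
moving = np.roll(reference, (3, -5), axis=(0, 1))

shift = estimate_shift(reference, moving)
print(shift)  # (-3, 5): the shift that undoes the displacement
aligned = np.roll(moving, shift, axis=(0, 1))
```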

The next manual step is to label where we believe there is ink, using the aligned infrared photo. Not all dark areas are ink: some are shadows, burn marks, or other types of damage. In cases where we aren't sure, we consult with papyrologists. The result of this process is a binary mask indicating where there is ink.

Unaligned and aligned infrared photos of a fragment, and the binary mask

It may not be strictly necessary to label the ink; you could instead learn to infer the infrared images from the X-ray data with no manual labeling. We have chosen to use binary labels to make it easier to quantify ink detection performance.
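With binary labels, a predicted mask can be scored pixel by pixel. The sketch below computes precision, recall, and F1 with NumPy. These are common choices for comparing binary masks and are shown only as an example; they are not necessarily the metric used to judge the challenge, and the function name is hypothetical.

```python
import numpy as np

def mask_scores(predicted, truth):
    """Pixel-wise precision, recall and F1 between two binary masks."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(predicted & truth)    # ink predicted, ink labeled
    fp = np.count_nonzero(predicted & ~truth)   # ink predicted, none labeled
    fn = np.count_nonzero(~predicted & truth)   # ink labeled, none predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

truth = [[1, 1, 0],
         [0, 0, 0]]
predicted = [[1, 0, 1],
             [0, 0, 0]]
print(mask_scores(predicted, truth))  # (0.5, 0.5, 0.5)
```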

5. Ink detection

Input: 3D “surface volume” around the mesh (.tif “image stack”) and hand-labeled binary mask.
Output: Predicted ink mask.

We use machine learning models to detect ink, training them on ground truth data of fragments where we know the location of ink from the infrared photos.

Since the input is a “surface volume” consisting of several “slices” of information, the model can learn the different features of ink: its density; its thickness; whether it is sitting on top of the surface, has seeped into the papyrus, or both.
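As a toy illustration of learning from the slices of a surface volume, the sketch below treats the column of voxels through each pixel as a feature vector and fits a logistic regression with plain gradient descent in NumPy. Everything here is an assumption made for the example: the data is synthetic (ink appears as a brighter layer just above the middle), the helper names are hypothetical, and the model is far simpler than anything you would use on real scans.

```python
import numpy as np

def voxel_columns(surface_volume):
    """Reshape (depth, height, width) into one feature row per pixel."""
    depth = surface_volume.shape[0]
    return surface_volume.reshape(depth, -1).T   # (height * width, depth)

def train_logistic(features, labels, steps=500, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    weights = np.zeros(features.shape[1])
    bias = 0.0
    for _ in range(steps):
        probs = 1.0 / (1.0 + np.exp(-(features @ weights + bias)))
        error = probs - labels
        weights -= lr * features.T @ error / len(labels)
        bias -= lr * error.mean()
    return weights, bias

def predict_mask(surface_volume, weights, bias):
    """Predict a binary ink mask with the same height and width as the input."""
    logits = voxel_columns(surface_volume) @ weights + bias
    return (logits > 0).reshape(surface_volume.shape[1:])

# Synthetic fragment: the left half is inked, visible as a brighter layer 3.
rng = np.random.default_rng(0)
ink_mask = np.zeros((16, 16), dtype=bool)
ink_mask[:, :8] = True
surface_volume = rng.normal(0.0, 0.1, size=(5, 16, 16))
surface_volume[3][ink_mask] += 1.0

weights, bias = train_logistic(voxel_columns(surface_volume),
                               ink_mask.ravel().astype(float))
predicted = predict_mask(surface_volume, weights, bias)
accuracy = (predicted == ink_mask).mean()
print(predicted.shape, accuracy)
```

This example trains and evaluates on the same pixels, which is fine for a demonstration but would overstate performance on real data; there you would hold out separate regions or fragments for evaluation.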

This is what our first progress prize is all about, and we go into great detail in “Tutorial 4: Ink Detection”.

6. Interpretation

Input: One or more predicted ink masks.
Output: Words, sentences, whole books, translations, journal papers, worldwide news coverage, eternal fame.

En-Gedi reconstruction of multiple fragments. Can you read it? (source)

Your work ends at ink detection. But for the world's papyrologists and classicists, this is where the excitement begins! Papyrologists can often extract more information than you might think. They are used to working with damaged, incomplete information, interpreting it, putting it into a historical context, and making history.