In 2023 Vesuvius Challenge made a breakthrough, extracting more than four passages of never-before-seen text from inside an unopened (and unopenable) carbonized scroll. We have proven techniques for virtually unrolling the papyrus scroll and recognizing the ink using machine learning. It wasn’t clear it was possible until we did it. This was stage one.
Our next step is to scale these techniques up so that we can read entire scrolls, and to figure out an efficient scanning protocol to allow us to scan and read the 300 extant scrolls, mostly in Naples. Two key technical problems need to be solved: segmentation at scale, and scanning at scale.
Segmentation at scale
The current bottleneck is tracing the papyrus surface inside the scan of the scroll (we call this “segmentation”). Currently we use manual tracing aided by various algorithms. This is quite expensive – about $100 per square centimeter. We have spent about $200,000 so far to trace enough material to read the fifteen partial columns of text that were revealed in 2023.
Full scrolls are 10cm-20cm wide and up to 15 meters long. With current techniques it could cost $1-5 million to unroll an entire scroll. Given that there are 300 scrolls that need to be read, it could cost hundreds of millions or even more to unroll all of them. That is clearly impractical. There are also parts of the scrolls that are so compressed that current techniques cannot unroll them at all.
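As a rough check on these figures, here is a minimal sketch that multiplies the quoted tracing rate by the surface area of a full-length scroll. It assumes the surface to trace is a simple rectangle (width times length) and that every scroll is the full 15 meters long; both are simplifications, and the function and variable names are only illustrative.

```python
# Figures quoted in the text; treating a scroll as a flat rectangle is an assumption.
COST_PER_CM2_USD = 100      # manual tracing cost, about $100 per square centimeter
MAX_LENGTH_CM = 15 * 100    # "up to 15 meters long"
NUM_SCROLLS = 300


def segmentation_cost_usd(width_cm: float, length_cm: float = MAX_LENGTH_CM) -> float:
    """Cost to trace one scroll surface of the given size at the manual rate."""
    return width_cm * length_cm * COST_PER_CM2_USD


narrow = segmentation_cost_usd(10)   # 10 cm wide, full length
wide = segmentation_cost_usd(20)     # 20 cm wide, full length

print(f"one scroll:  ${narrow:,.0f} to ${wide:,.0f}")
print(f"300 scrolls: ${narrow * NUM_SCROLLS:,.0f} to ${wide * NUM_SCROLLS:,.0f}")
```

Under these assumptions a single full-length scroll comes to $1.5 million to $3 million, inside the $1-5 million range above, and the whole collection to $450 million to $900 million. Shorter scrolls would cost proportionally less.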
A breakthrough is needed in segmentation. So in stage two, we are going to focus on solving auto-segmentation. We believe it’s possible to bring the cost of segmenting an entire scroll to $5000 or below. It might even be possible to fully automate it.
Our approach to solving segmentation at scale will be to continue to leverage the community through a series of open source “progress prizes” that we will award throughout the year, while hiring the most productive contributors into full-time or part-time roles to do the deep work that is needed. Our target for 2024 is to read 90% of Scrolls 1-4, and we will offer a $100,000 grand prize to the first team to achieve this milestone.
Scanning at scale
Each scan currently requires the use of a particle accelerator in England and conservator-supervised transportation of the scrolls, two at a time, from Naples in custom-made 3D-printed cases. This costs about $40k/scroll with current techniques, and is also subject to availability of beam time. Total cost to scan all 300 scrolls could be $30M with current techniques (at current prices).
A breakthrough is therefore also needed in scanning. We believe we can install one or more benchtop scanners in situ and scan the scrolls without removing them from the building. The benchtop sources will be slower but can run every day in parallel, probably enabling us to scan all the scrolls within a few years. We don’t know that this will work, but we suspect it will be possible to get the resolution that we need from a benchtop source. The only way to find out is to try.
We may also be able to devise a lower-cost scanning protocol using the particle accelerator, in the best case bringing the total cost to scan all the scrolls to $5M or below.
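To put the best-case target in per-scroll terms, the following sketch divides the $5M total by the 300 scrolls and compares the result with the roughly $40k per scroll quoted for current scans. It assumes the cost is spread evenly across scrolls, which the text does not state; the names are illustrative only.

```python
# Figures quoted in the text; an even split across scrolls is an assumption.
NUM_SCROLLS = 300
PER_SCROLL_TODAY_USD = 40_000      # "about $40k/scroll with current techniques"
TOTAL_TARGET_USD = 5_000_000       # best case: "$5M or below" for all scrolls


def per_scroll_usd(total_usd: float, scrolls: int = NUM_SCROLLS) -> float:
    """Average scanning budget per scroll for a given total."""
    return total_usd / scrolls


target_per_scroll = per_scroll_usd(TOTAL_TARGET_USD)
fraction_of_today = target_per_scroll / PER_SCROLL_TODAY_USD

print(f"target per scroll: ${target_per_scroll:,.0f}")
print(f"as a share of today's per-scroll cost: {fraction_of_today:.0%}")
```

On these numbers the best case works out to roughly $16,700 per scroll, a little over 40% of the current per-scroll figure.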
In 2024 we will explore both approaches to scanning at scale, including identifying the lowest-cost scanning protocol at the particle accelerator, negotiating for bulk rates at the particle accelerator, and testing an in situ scanner where the scrolls are housed. We will use full-time team members for this work.
In stage two, we plan to pursue both the segmentation and scanning breakthroughs in parallel. Along the way, we’ll work with our community to make incremental improvements to ink detection and ensure that our process works on multiple scrolls.
By the end of stage two, our hope is to have read at least one entire scroll, and to be ready to start stage three. We think we can do all of this in 2024.
Once the segmentation breakthrough and the scanning breakthrough are in place, we’ll need to systematize and staff the scanning, segmenting, and reading pipeline. The best way to do this is with a full-time team of engineers and technicians, very likely the same people we will be hiring for stage two, totaling about five employees. We think this stage will take about 3 months.
Once the pipeline is in place, our task is to read every scroll in the collection. We expect that scanning and reading all the remaining 300 scrolls can be done in 2-3 years. This depends on what we learn about the maximum speed of scanning, and on our ability to secure either access to run the benchtop scanners or, alternatively, 50-200 days of beam time at one or more appropriate particle accelerators.
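The beam-time range implies a scanning throughput, which can be sketched as below. This assumes all 300 scrolls are scanned within the quoted number of beam days and ignores setup and transport time; the function name is illustrative.

```python
# Figures quoted in the text; ignoring setup and transport time is an assumption.
NUM_SCROLLS = 300


def scrolls_per_beam_day(beam_days: float, scrolls: int = NUM_SCROLLS) -> float:
    """Average number of scrolls that must be scanned per day of beam time."""
    return scrolls / beam_days


fast = scrolls_per_beam_day(50)    # shortest quoted allocation
slow = scrolls_per_beam_day(200)   # longest quoted allocation

print(f"required throughput: {slow:.1f} to {fast:.1f} scrolls per beam day")
```

That is, between 1.5 and 6 scrolls per day of beam time, depending on how much beam time can be secured.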
The final stage of the Vesuvius Challenge is inspiring the continued excavation of the Villa dei Papiri, and recovering in full the only surviving library from the ancient world. It is a near-certainty that there are more scrolls waiting for us in the dirt. Perhaps just a few, but there could be thousands of them.
Excavation is very expensive, but we expect this to be largely a political effort. Our hope is that the output of stages two and three above – previously unseen books from antiquity – will catalyze the will necessary to begin digging. If it does not, however, we will do whatever we can to make it happen.
We believe stage two will cost $1-2M. Thanks to a generous donation of $2,084,000 from the Musk Foundation, this stage is now fully funded, and we have a little extra money from other donations to more aggressively explore scanning at scale, and even scan some more scrolls. Our eternal gratitude to all donors!
If the benchtop source works, or an efficient particle accelerator protocol can be devised, we believe stage three will cost $4-8M. If neither works out, stage three will cost $15M+, depending on the cost of beam time we are able to negotiate.