The Data
To download: Fill out the registration form and then visit the data server (LICENSE).
To learn more about the data, see the linked pages below. Also be sure to check out:
- EduceLab-Scrolls (2023): technical paper describing the original data.
- EduceLab Data Sheet (2023): technical paper describing more recent scans added to the dataset.
- Tutorials: what to do with the data.
- Our libraries to access data in 1-2 lines of code: in Python (with intro notebook) and in C!
Scrolls
Micro-CT scans of intact Herculaneum scrolls. The mission is to virtually unwrap the contents of the scrolls from the CT scans, revealing the text hidden within. Scroll 1 was used to win the 2023 Grand Prize, but 95% of the scroll remains unread!
Fragments
Micro-CT scans of detached scroll fragments. Since the fragments have exposed text on their surfaces, they can be used as ground truth for machine learning-based ink detection approaches (see Tutorial 5: Ink Detection).
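To make the ground-truth idea concrete, here is a minimal sketch of how labeled training patches might be cut from a fragment. It is not the method from the tutorial: the arrays are synthetic stand-ins, and the function name `extract_labeled_patches`, the patch size, and the "more than half ink" labeling rule are all assumptions chosen for illustration.

```python
import numpy as np

def extract_labeled_patches(surface_volume, ink_mask, patch_size=16, stride=16):
    """Cut (depth, patch, patch) sub-volumes and give each a binary ink label.

    surface_volume: array of shape (depth, height, width) with CT intensities.
    ink_mask: boolean array of shape (height, width), True where ink is visible.
    A patch is labeled 1 when more than half of its pixels are ink
    (an illustrative rule, not a prescribed one).
    """
    depth, height, width = surface_volume.shape
    assert ink_mask.shape == (height, width)
    patches, labels = [], []
    for y in range(0, height - patch_size + 1, stride):
        for x in range(0, width - patch_size + 1, stride):
            patches.append(surface_volume[:, y:y + patch_size, x:x + patch_size])
            ink_fraction = ink_mask[y:y + patch_size, x:x + patch_size].mean()
            labels.append(int(ink_fraction > 0.5))
    return np.stack(patches), np.array(labels)

# Synthetic stand-in data: random intensities, ink on the left half only.
rng = np.random.default_rng(0)
surface_volume = rng.random((8, 64, 64), dtype=np.float32)
ink_mask = np.zeros((64, 64), dtype=bool)
ink_mask[:, :32] = True

patches, labels = extract_labeled_patches(surface_volume, ink_mask)
print(patches.shape, labels.sum())
```

The resulting `patches` and `labels` arrays could then be fed to any classifier; with real data, the mask would come from a photograph of the fragment surface aligned to the scan.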
Segments
Segmentation is the mapping of sheets of papyrus in a 3D X-ray volume. The resulting surface volumes can be used directly to look for ink.
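As a rough illustration of what a surface volume is, the sketch below samples a thin stack of layers around a mapped surface in a synthetic volume. It makes simplifying assumptions: the surface is given as an integer height map `surface_z`, and layers are offset straight along the z axis rather than along the surface normal. The function name `sample_surface_volume` and the parameter `half_thickness` are hypothetical.

```python
import numpy as np

def sample_surface_volume(volume, surface_z, half_thickness=2):
    """Sample layers above and below a surface inside a 3D volume.

    volume: array of shape (depth, height, width).
    surface_z: integer array of shape (height, width) giving the z index
        of the papyrus surface at each (y, x) position.
    Returns an array of shape (2 * half_thickness + 1, height, width),
    whose middle layer lies exactly on the surface.
    """
    depth, height, width = volume.shape
    offsets = np.arange(-half_thickness, half_thickness + 1)
    # z index for every (layer, y, x), clamped to stay inside the volume.
    z = np.clip(surface_z[None, :, :] + offsets[:, None, None], 0, depth - 1)
    yy, xx = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    return volume[z, yy[None], xx[None]]

# Synthetic volume in which each voxel stores its own z index,
# so the sampled layers are easy to check by eye.
depth, height, width = 32, 32, 32
volume = np.broadcast_to(
    np.arange(depth)[:, None, None], (depth, height, width)
).copy()

# A gently tilted surface standing in for a segmented sheet.
_, xx = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
surface_z = 10 + xx // 4

surface_volume = sample_surface_volume(volume, surface_z)
print(surface_volume.shape)
```

Flattening the sheet this way turns a curved surface in 3D into a small image stack, which is the form in which one can look for ink layer by layer.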