Princeton University Library Systems

Departmental Blog

Valkyrie

We have been working on Plum for the last two years, and in that time we have constantly struggled with performance for bulk loading, editing, and data migration. We have fixed some problems and found ways to work around others by refactoring or disabling parts of the Hyrax stack. However, our performance is still not acceptable for production. For example, when we edit large books (500+ pages), the time to save grows linearly with the number of pages, climbing to a minute or more. Bulk operations on a few hundred objects can take several hours.

We want a platform with robust, scalable support for complex objects, especially large or complicated ones that cannot be served by existing platforms. We have a number of books with more than 1000 pages, and book sets with dozens of volumes, and we need to support their ingest, maintenance, access, and preservation.

We are hopeful that the Fedora API specification and alternative implementations, such as Cavendish, Derby, and Trellis, will address these problems. However, these will take time to mature, and none are ready to use today. Additionally, it is not clear whether any of these implementations has the long-term or institutional support required for such critical software. Finally, even if any of these options were available now, the Hyrax stack would still be complicated by its reliance on two or three persistence backends and by the need to keep them synchronized.

As software developers, we should always strive to improve our software and address use cases in a sustainable way. With this in mind, we have been working on something new, a framework that can meet our performance needs and address the complexities of our requirements at scale. As the Samvera community continues to grow, flexibility and sustainability are critical to our success. One size will never fit all. In the spirit of community, the Hyrax stack must prioritize flexibility, scalability, and sustainability.

This is why we have started work on Valkyrie, a framework for introducing alternative persistence strategies into Hyrax-based repositories. Valkyrie began as an exercise to separate, abstract, and define the persistence requirements of Hyrax. Along the way, we have found that performant backends open up the possibility of creating a simpler repository application, one without the workarounds needed to avoid problems with the existing Fedora implementation.

Please join us!

We would love your help in developing, testing, and documenting Valkyrie and in experimenting with different approaches, with the goal of making Hyrax more sustainable going forward. A Data Mapper Working Group sprint is coming up next week, and you can join it to work on Valkyrie. We will be participating in the sprint and working to port Plum to Valkyrie.

If Fedora is working for you, you can use Valkyrie with Fedora to gain more control over your persistence strategy, with less application logic baked in. If you want to try a different backend, Valkyrie will let you configure alternative persistence options to store metadata in Solr, Fedora, PostgreSQL, or in memory. We expect this list to grow over time, and Valkyrie includes infrastructure such as shared specs to make it easier to add support for other persistence options. Valkyrie also lets you combine multiple backends, so you could store your data in both Fedora and PostgreSQL if that makes sense for your application and infrastructure.
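
To make the idea of swappable and combined backends more concrete, here is a minimal, self-contained Ruby sketch of the general pattern. It is not Valkyrie's actual API: the names `Book`, `MemoryBackend`, and `CompositeBackend` are hypothetical and exist only for illustration. The sketch assumes that every backend exposes the same `save` and `find` methods, so the application never needs to know which store it is talking to.

```ruby
# Illustrative sketch only: these class names are hypothetical and are
# NOT part of Valkyrie's API.
require "securerandom"

# A plain value object that carries metadata but no persistence logic.
Book = Struct.new(:id, :title, :page_count, keyword_init: true)

# One interchangeable backend: keeps resources in a Hash. A PostgreSQL-
# or Fedora-backed class would expose the same save/find interface.
class MemoryBackend
  def initialize
    @store = {}
  end

  # Stores a copy of the resource, assigning an ID if it has none,
  # and returns the stored copy.
  def save(resource)
    resource = resource.dup
    resource.id ||= SecureRandom.uuid
    @store[resource.id] = resource
    resource
  end

  # Raises KeyError if no resource with this ID was saved.
  def find(id)
    @store.fetch(id)
  end

  def count
    @store.size
  end
end

# Combines several backends: each save is passed through every backend
# in order, so the ID assigned by the first is reused by the rest.
class CompositeBackend
  def initialize(*backends)
    @backends = backends
  end

  def save(resource)
    @backends.inject(resource) { |current, backend| backend.save(current) }
  end

  # Reads are served from the first backend.
  def find(id)
    @backends.first.find(id)
  end
end

primary = MemoryBackend.new
secondary = MemoryBackend.new
repository = CompositeBackend.new(primary, secondary)

saved = repository.save(Book.new(title: "Large atlas", page_count: 1200))
puts primary.find(saved.id).title
puts secondary.find(saved.id).page_count
```

Because the application code only calls `save` and `find`, replacing `MemoryBackend` with another class that honors the same interface would not require changes elsewhere. A shared set of specs run against every backend is one way to keep such implementations interchangeable.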