Zoroastrian Middle Persian: Digital Corpus and Dictionary

Middle Persian played a prominent historical and cultural role in the first millennium CE as the official language of the Sasanian Empire, and it was used across several religious traditions. Its texts link East and West in both linguistic and cultural terms, and they span the period from late antiquity to early Islamic times. Despite this, there is no comprehensive digital database for the language, nor is there a comprehensive lexicographical tool covering the full variety of its vocabulary over the long period of its existence.

As a first step towards this goal, the present project aims to create an online open-access corpus of all Zoroastrian Middle Persian (henceforth: ZMP) texts in the Pahlavi script. The corpus will comprise around 54 texts, containing some 687,000 words in transliteration and transcription, together with digital photographic documentation of the 15 oldest codices. The texts will be supplied with morphological and partial syntactic annotation and encoded according to the guidelines of the Text Encoding Initiative (TEI). This digital corpus of Pahlavi texts will in turn serve as the basis for a digital Middle Persian-English dictionary of ZMP, comprising an estimated 7,000 lemmata. We hope subsequently to extend this work in related projects that create corpora and dictionaries of other types of Middle Persian texts.

The digital corpus and the ensuing digital dictionary are two closely interlinked analytical instruments. They address two related but distinct aspects of the texts, syntax and semantics, which the project treats together. Both will be built in a web-based working environment that supports collaborative processing of corpus and dictionary and serves as the user interface for research on and analysis of the prepared resources. Moreover, the project aims to open the corpus of Pahlavi literature to the methods of corpus linguistics developed in the Digital Humanities.

The project will adopt a new, comprehensive methodology for texts written in ZMP, creating a common basis for both linguistic analysis and the study of conceptual history. This approach highlights ‘horizontal’ (i.e. genre) as well as ‘vertical’ (i.e. historical) differences between texts, in both the corpus and the dictionary. The project is thus conceived as a basis for identifying internal and external factors in the complex fabric of ZMP literary texts, and as a means for differentiated analysis of cultural, religious and social history.

A final aim of the project is to establish links and exchange between the present project and other projects, whether completed or ongoing, in the field of Old and Middle Iranian Studies.


04/2021 – 03/2024 (03/2030)



Affiliated Persons


Prof. Dr. Kianoosh Rezania

Principal Investigator

Universitätsstr. 90a
44789 Bochum
Office 1.10
+49 234 32-21979

Dr. Iris Colditz

Research Associate

Universitätsstr. 90a
44789 Bochum
Office 1.09
+49 234 32-28272

Dr. Slavomír Čéplö

Research Associate

Universitätsstr. 90a
44789 Bochum
Office 1.09
+49 234 32-21287

Dr. Thomas Jügel

Research Associate

Universitätsstr. 90a
44789 Bochum
Office 1.09
+49 234 32-22382

Seyyedehfatemeh (Raha) Musavi PhD

Research Associate


Narjes Eskandarnia

Research Assistant

Universitätsstr. 90a
44789 Bochum
Office 1.12