How PAHA was made
Homepage: https://paha.site/
Repository: The PAHA digital repository is built and managed using Omeka S, an open-source digital collections management and web publishing platform. The Omeka S installation and the repository data are hosted on Linode servers for a period of 24 months.
Point of Contact: contact.paha@gmail.com; Dr. Sarah Melsens (sarah.melsens@flame.edu.in); Dr. Maya Dodd (maya@flame.edu.in); Dr. Pushkar Sohoni (pushkar.sohoni@iiserpune.ac.in)
Dataset Summary: This dataset contains digital reproductions of roughly 7500 items from the private archives of building patrons, architects, engineers, and general contractors in the city of Pune, as well as records and transcripts of interviews with concerned persons. These previously undisclosed collections contain a wealth of valuable material on Pune’s built environment, including records pertaining to the city’s movie theaters, educational institutes, pioneering housing projects and much more. Digitization safeguards architectural drawings, photographs of buildings (during and after construction), promotional material and correspondence, which are rapidly disappearing and at risk of being damaged. Classification and metadata description make the material available for use. The material and the interviews not only document the modern heritage and the historical evolution of construction processes but also reveal the contributions and agency of local actors in the creation and appropriation of the city. The resulting online repository will be of interest to a diverse audience of architecture students, historians, scholars, and citizens of Pune and its diaspora.
After an initial unsuccessful grant application to the Modern Endangered Archives Program, submitted by Sarah Melsens (FLAME University/CNRS), Pushkar Sohoni (IISER Pune), and Chetan Sahasrabudhe (BNCA, Pune) in 20xx, the initiators reached out to Prof. Maya Dodd, Professor in Digital Humanities at FLAME University. Prof. Dodd, together with Prof. Mayurakshi Chaudhuri and Prof. Chiranjoy Chattopadhyay, secured an Extramural Research Project grant (2022/EXT/002) from FLAME University for the research project ‘Imaging and imagining Pune’s Architectural History’, which ran from 1 April 2022 to 31 March 2025. This grant funded the digitization and description of the archive.
Languages: English, Marathi.
Composition: The physical collection consists of architectural drawings, books, valuation report notebooks, photographs and photo albums, portfolios, brochures, oral history interviews and other materials. The digitized collection comprises three types of media: images, documents and audio-visual multimedia files. The stored file formats are TIFF, PNG, JPG, PDF, DOCX, MP4, WAV, MOV and MP3. Architectural drawings, which make up the majority of the collection, are digitized and stored as images in TIFF format.
Descriptive Statistics: A total of 7499 objects were digitized (excluding oral history interviews), of which 7090 are architectural drawings, 72 are books and reports, 21 are photo albums and 280 are photographs. The total size of the digitized collection on disk is [1.6 TB].
COLLECTION, CURATION, AND DIGITIZATION PROCESS
1. Curation Rationale – This project was born out of Dr. Sarah Melsens’s PhD work on the history of 20th-century architectural development in the city of Pune. The physical materials in private collections that Dr. Melsens identified and analyzed for this work between 2014 and 2020 not only contained valuable and previously undisclosed records on Pune’s built environment but were also at risk of being damaged or permanently lost. This project was therefore conceptualized, drawing on the expertise of Dr. Maya Dodd from FLAME University, Pune and Dr. Pushkar Sohoni from IISER, Pune, to digitize and safeguard these records in a digital archive accessible to researchers and the general public alike. As a part of FLAME University’s initiative for building a public history of Pune city, the project received funding through the Extramural Research Project grant (2022/EXT/002), which ran from 1 April 2022 to 31 March 2025. The primary aim of this project is to make these materials, which document the modern built heritage, the evolution of construction processes and the agency of local actors, available to diverse audiences, and to facilitate academic and public engagement that develops a new understanding of the collective public history of Pune city through the lens of modern architectural practices.
2. Source Data – The original physical collections were sourced from their current owners, who are either individual descendants of the collection owners or the architectural firms responsible for their creation. The original documents were created or produced by a diverse set of actors. Architectural drawings, which comprise the majority of the collection, were created by individual architects, draftsmen or engineers. In most cases, their names are available on the individual drawings. That is not the case with photographs, for which the name of the photographer is available for only a select few sub-collections. The books and publications were produced by their respective authors, editors or publishers, and the valuation reports were produced by valuers. All the physical materials were sourced only for the purpose of digitization and were returned to their current owners afterwards. PAHA is thus a post-custodial archive. The interviews were conducted by team members of PAHA and MA Architectural History students of CEPT University, Ahmedabad. All interviews are co-owned by the interviewers and interviewees.
3. Digitization Pipeline – Preservation through digitization was necessitated by the deteriorating condition of the materials. The digitized materials would be made available publicly through an online repository built using Omeka S. Hence, the digitization serves the dual purpose of conservation and research by making the collection freely available to scholars and students of architectural history.
Sorting and curating the materials to be digitized began in early 2023, when materials were carefully selected and counted at the physical location where they were stored. The selection of items was constrained by time and budget. Only items pertaining to building projects in Pune city dating from before 1990 were selected, along with items shedding light on the general operations of the firm, such as photographs and library books. Correspondence and job files were not digitized due to time and budgetary constraints, though some were present in the office archives.
A third-party digitization service provider was contracted to carry out the digitization, which took place over five months, from August 2023 to December 2023. Graphtec CSX350 and HP DesignJet T830 flatbed scanners were used to scan the majority of the documents. Each document was measured before scanning, and its dimensions (in mm) were recorded in its filename. Care was taken to preserve the original sequence in which the documents were kept in their physical folders or containers. This sequence is captured in the file name of each digitized item (see ‘Naming of digital files’ below). All the architectural drawings were scanned at a resolution of 600 dpi and stored in TIFF format. A few drawings that were larger than the scanner bed, or that were heavily damaged or torn, were instead digitized by overhead photography using a Sony Alpha 7R III. All the photographs and photo albums were digitized using the same overhead photography process, at 350 dpi, and stored in RAW format. Tiffen colour separation guides were placed beside the photographs during digitization for colour reference. The RAW files were later cropped, adjusted and converted to TIFF files.
The books and bound reports were scanned using a tabletop book scanner at a resolution of 300 dpi.
Naming of digital files
PAHA has designed a unique file naming system to label the digitized materials. The aim of the system is to record the original location and job number of items in the office archives from which they were obtained. Each file name is divided into 5 fields, which are defined by unique codes and separated by an underscore (‘_’). The detailed guide can be accessed here.
Explanation of the 5 fields:
Field 1 – name of the collection. There are a total of 10 main collections. Each collection is denoted by three letters.
Field 2 – This field denotes whether the media belongs to a particular building project, identified by its job no. (j), or whether it was part of a library (l), is a recently recorded oral history (o), or is miscellaneous (m). Only for job nos. and oral histories is the letter followed by a code in brackets. For instance, for architectural drawings of a specific building site, the code is the original job no. present on the drawings. Job nos. were identified from annotations on the archival objects or by referring to the project indexes/job lists of the collection. Where no job no. is present and the drawing cannot be assigned to any known job no., the letter ‘x’ is used in brackets instead.
Field 3 – This field indicates the type of media (files, drawings, photos, etc.) by a letter, followed in brackets, if available, by a PAHA code referring to the original location or label / cat. no. of the item. Refer to the file naming convention for the list of types and their codes. The folder and album labels in the code refer to the physical folder or album from which the drawing or photo was retrieved. PAHA has assigned unique numbers to identify containers in the original collection; this list is provided in the ‘container label legend’ tab of the file naming spreadsheet.
Field 4 – This field denotes the position of the document in the sequence in which it was scanned/digitized. It records not only the original sequence of physical documents within a container but also the total number of documents in each container.
Field 5 – This optional field is used when a single document was scanned in multiple parts, or when a drawing or photo has content on both front and back. It is denoted by the letters a, b, c… as required.
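The five fields can be illustrated with a small parser. This is only a sketch under stated assumptions: the exact syntax is defined in PAHA's file naming guide, which is not reproduced here, so the round brackets, the collection code ‘AUC’, the media letter ‘d’, the container label ‘F04’ and the example names are all hypothetical. The dimensions recorded in real filenames are not modelled.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Field 2 letters as described in the text above.
GROUPS = {"j": "job number", "l": "library", "o": "oral history", "m": "miscellaneous"}

# ASSUMPTION: hypothetical syntax; the authoritative rules are in PAHA's naming guide.
NAME_RE = re.compile(
    r"^(?P<collection>[A-Za-z]{3})"                        # Field 1: three-letter collection code
    r"_(?P<group>[jlom])(?:\((?P<group_code>[^()]+)\))?"   # Field 2: letter + optional code
    r"_(?P<media>[A-Za-z])(?:\((?P<container>[^()]+)\))?"  # Field 3: media letter + optional label
    r"_(?P<sequence>\d+)"                                  # Field 4: scan sequence number
    r"(?:_(?P<part>[a-z]))?$"                              # Field 5: optional part letter
)


@dataclass
class ParsedName:
    collection: str
    group: str
    group_code: Optional[str]
    media_type: str
    container: Optional[str]
    sequence: int
    part: Optional[str]


def parse_name(stem: str) -> ParsedName:
    """Split a file name (without extension) into the five fields."""
    match = NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"not a recognised file name: {stem!r}")
    return ParsedName(
        collection=match["collection"],
        group=GROUPS[match["group"]],
        group_code=match["group_code"],
        media_type=match["media"],
        container=match["container"],
        sequence=int(match["sequence"]),
        part=match["part"],
    )


# Hypothetical names, for illustration only.
full = parse_name("AUC_j(123)_d(F04)_0012_a")
unknown_job = parse_name("AUC_j(x)_d_0003")
```

Keeping the bracketed codes optional in the pattern mirrors the rule that only job nos. and oral histories carry a code in field 2, and that the field 3 label is given only if available.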
Data Provenance: The provenance of the data in the PAHA collection is as follows:
Collection name | Name(s) of owner originals
Pundlik & Pundlik Collection | Avinash Pundlik
Space Designers’ Syndicate Collection | Zuber Shaikh
Architects United Collection | Uma Ghotge, Deepa Ghotge, Sachin Mujumdar, Namita Mujumdar
V.V. Architects Collection | Vishwakumar Vishwanath Badawe, Satyajit Badawe (son)
U. M. Apte & Sons Collection | Ashwini Abhyankar
Prakash Apte Collection | Ashwini Abhyankar
Sant Ghoting Dongre Collection | Harsha Dongre
Sharad Shah Collection | BNCA college of architecture
V. V. Tamhane Collection | Ashish Tamhane
Kishor Merchant Collection | PAHA (donated by Kishor Merchant)
For additional contact information on owners, email us at contact.paha@gmail.com
CLASSIFICATION AND DESCRIPTION OF THE ARCHIVE
Archival Collection Structure:
PAHA follows a 4-level hierarchy for the arrangement of its data or media. Each media item is part of the following nested groups:
1A. Archival collection: Grouping based on the source (the private collection from which the media-data originates). There are currently a total of 10 Archival Collections in the PAHA dataset. Each archival collection refers to an individual or firm who holds custody of the physical items that have been digitized and/or with whom interviews have been conducted.
    2A. Building project collection: Group of all building projects that have been identified within a particular collection.
        3A. Building project group: Group of multiple building projects that are part of a single scheme or institution.
        3B. Building project: Group of all media-data that deals with a single building project.
            4. Media item: photographs, drawings, etc.
    2B. Miscellaneous media: Group of all media-data from the collection that cannot be associated with any specific building project, such as job lists, library books, etc.
        3. Media item: correspondence, publication, etc.
    2C. Interviews: Group of all interviews that can be related to a particular collection, including related data such as recordings and transcripts.
        3. Interview: Group of all media related to one particular interview, such as transcripts, notes and recordings.
1B. Oral history collection: Group of all oral history items (audio or video), including those that are not associated with an archival collection.
Metadata Structure: Pune Architectural History Archive (PAHA) uses a custom metadata template based on the Dublin Core metadata scheme to describe the materials in its collections. Where required, fields defined in the Dublin Core scheme have been renamed to suit the particularities of our data. Detailed information on how metadata fields are mapped, their descriptions, and instructions on how to fill out each field in Omeka S is collected and maintained in this spreadsheet. Omeka S is designed to present data in a three-level hierarchy of ‘Item sets-Items-Media’. Although PAHA follows this structure, a third-party module, ‘Item Sets Tree’, makes it possible to create sub-hierarchies by assigning Item sets to a ‘parent’ Item set. Using this module, PAHA arranges its dataset in a 4-level hierarchy: Archival collection (Parent Item set) ---> Building project group/Miscellaneous Media/Interviews (Item sets) ---> Building project/Media not associated with any building project/Interviews (Item) ---> Drawings/photographs/albums/documents/multimedia/transcripts (Media).

There is no metadata template assigned to the main parent item set. The subsequent item sets of Building project group and Miscellaneous media share a common metadata template, while the Interviews collection item set has no metadata template associated with it. Each type of Item (Building project group, Building project, Miscellaneous media and Interview) and each type of Media (Drawings, Photographs, Documents, Multimedia) has its own metadata template. The metadata values for each item and media are derived from information available within those items or media, as well as from their physical condition and properties such as dimensions, material, etc.
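The nesting of parent Item set, Item set, Item and Media can be pictured as a small tree. The sketch below is only a plain-Python model of that arrangement and does not use the Omeka S API or the Item Sets Tree module; the class names, the ‘Building projects’ label and the media title ‘Ground floor plan’ are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Media:
    title: str


@dataclass
class Item:
    title: str
    media: List[Media] = field(default_factory=list)


@dataclass
class ItemSet:
    title: str
    children: List["ItemSet"] = field(default_factory=list)  # nested item sets
    items: List[Item] = field(default_factory=list)


def walk(item_set: ItemSet, trail: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield the full path of titles leading to every media file."""
    here = (*trail, item_set.title)
    for item in item_set.items:
        for media in item.media:
            yield (*here, item.title, media.title)
    for child in item_set.children:
        yield from walk(child, here)


# Hypothetical content, for illustration only.
collection = ItemSet(
    "Architects United Collection",                   # level 1: parent item set
    children=[
        ItemSet(
            "Building projects",                      # level 2: item set
            items=[
                Item(
                    "Shubhashree Apartments",         # level 3: item
                    media=[Media("Ground floor plan")],  # level 4: media
                )
            ],
        )
    ],
)
paths = list(walk(collection))
```

Each path returned by `walk` has four elements here, one for each level of the hierarchy.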
Linked Open Data, Controlled Vocabularies: The Getty Art & Architecture Thesaurus (AAT) was used to describe certain metadata properties such as Subject, Type, Medium, Architectural Styles and Use of Space. Through regular discussions and consultations, a list of controlled vocabulary terms for each of these properties was derived from the Getty AAT. This list is kept dynamic: new terms are added as required, including customized terms that are better suited to the PAHA dataset but not available in the Getty AAT. The list of these controlled vocabulary terms for the respective properties can be accessed here. A third-party Omeka S module called ‘Value Suggest’ automatically fetches the required terms from the Getty AAT’s database server to fill out the relevant metadata fields.
Personal & Sensitive Data: Architectural drawings and other documents contain personal information such as names, contact details and addresses of owners, clients, architects and other stakeholders. This data is processed only for archival and educational purposes. Refer to the permissions and privacy policy document for further information.
Data Curators: List of people involved in collection and curation of the dataset –
Name | Role
Dr. Sarah Melsens | Project Incharge
Dr. Maya Dodd | Project Incharge
Dr. Pushkar Sohoni | Project Incharge
Gaurav Kalyani | Digital Repository Manager
Uzma Sayyad | Research Assistant
Khyati | Research Intern, Social Media Manager
Namita Kalve | Metadata creator
Aarya Ghotikar | Metadata creator
Janhavi Sharma | Metadata creator
Ramaseshan | Technical Consultant
Sujaan Mukherjee | Archival Consultant
KGS Microsystems | Digitization
Santosh Kule | Digitization
Vaishnavi | Digitization
Suraj | Digitization
Prasad Angre | Digitization
Abhiraj Salve | Overhead Photography
Hrishikesh Borkar | Overhead Photography
Nithya Subramaniam | Website design
Nikhil Seth | Website programming & Tech Support
Rohit Petkar | Documentary video production
Maintenance: This repository will be regularly updated and actively maintained. New additions and updates to the already available dataset will continue to be made available.
REUSE AND LONG-TERM PRESERVATION
Distribution/dissemination strategy: The repository of digitized collections will be hosted on a private domain, paha.site, and created and managed through Omeka S. It will allow users to search and browse the collections. A second, front-end website/landing page will be connected to the digital repository on Omeka S. It will host customized content such as an interactive map, a timeline, narrative stories and posts, articles, blog posts and podcasts, all aimed at wider public engagement.
Back-up:
All the digitized materials of the PAHA collection are stored on a 5 TB external hard disk drive, with backups on two more external drives of 5 TB and 2 TB, along with Google Drive cloud storage provided by IISER, Pune. The data uploaded to the Omeka S server, along with the metadata created in Omeka S, is regularly backed up through CSV export. These CSV files and the other data on the Omeka S server are backed up regularly to cloud storage provided by the Nextcloud file hosting service. This automatic backup system is set up and maintained by an IT consultant hired by PAHA. In case of an emergency, it will allow the latest stable backup of the Omeka S server to be restored.
ARK PID: Each Item and Media has been assigned a persistent identifier (PID) under the Archival Resource Key (ARK) scheme, facilitated through the ARK Alliance. This allows a consistent, unique URL to be created and assigned for each item and media in the PAHA collection, and the URL persists as long as the PAHA server is actively maintained. The Name Assigning Authority Number (NAAN) assigned to PAHA by the ARK Alliance is 71826.
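As a rough illustration of how such identifiers are put together, the sketch below builds and splits ARK URLs. The `item_<number>` / `media_<number>` naming and the `paha.site` prefix are inferred from the example citations in this document rather than from a published specification, and the function names are hypothetical.

```python
import re
from typing import Tuple

NAAN = "71826"          # Name Assigning Authority Number assigned to PAHA
RESOLVER = "paha.site"  # ASSUMPTION: prefix as it appears in the example citations


def ark_url(kind: str, record_id: int) -> str:
    """Build an ARK URL for an item or media record."""
    if kind not in ("item", "media"):
        raise ValueError("kind must be 'item' or 'media'")
    return f"{RESOLVER}/ark:/{NAAN}/{kind}_{record_id}"


def split_ark(url: str) -> Tuple[str, str]:
    """Return (NAAN, assigned name) from a URL that contains an ARK."""
    match = re.search(r"ark:/?(\d+)/(.+)$", url)
    if match is None:
        raise ValueError(f"no ARK found in {url!r}")
    return match.group(1), match.group(2)


example = ark_url("media", 1905)
naan, name = split_ark("paha.site/ark:/71826/item_1449")
```

The NAAN identifies PAHA as the assigning organisation, while the part after it is the name PAHA gives to the individual record.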
Licensing Information: All the materials in the dataset, except for published books, are licensed under Creative Commons CC BY-NC-SA 4.0. Published books that are out of copyright are made available under a Non-Commercial Use Only license, while those still under copyright are made available under an In Copyright license.
Citation: Four citation formats have been designed for the different types of archival item. They are as follows:
- Architectural drawing (media): {Building project title}, {subject-architectural drawing}, {creator}, {date (created)}, {Collection name}, Pune Architectural History Archive, {ARK id}
E.g. Shubhashree Apartments, architectural drawings (visual works), Laxman, 1976-09-14, Architects United Collection, Pune Architectural History Archive, paha.site/ark:/71826/media_1905
- Photograph (media) & Misc media (item): {Title}, {subject-first value}, {creator}, {date (created)}, {Collection name}, Pune Architectural History Archive, {ARK id}
E.g. Exterior photograph of Banali Apartment, positives (photographs), Architects United Collection, Pune Architectural History Archive, paha.site/ark:/71826/media_1294
E.g. Certificate – Institution of Valuers, documents (object genre), 1980-09-29, V. V. Architects Collection, Pune Architectural History Archive, paha.site/ark:/71826/item_1449
- Books (item): {Title}, {subject}, {author (creator)}, {publication date}, {Collection name}, Pune Architectural History Archive, {ARK id}
E.g. Development Control Rules, legal documents, Pimpri-Chinchwad Municipal Corporation, [1990-12-17], Architects United Collection, Pune Architectural History Archive, paha.site/ark:/71826/item_1102
- Interview (item): {Interviewee}. Interview By {Interviewers}, {Date of recording}, {Collection name (parent 3)}, Pune Architectural History Archive, {ARK id}
E.g. Badawe Vishwakumar Vishwanath. Interview By Sarah Melsens, 2018-09-18, Oral history collection, Pune Architectural History Archive, paha.site/ark:/71826/item_1490
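The citation formats above share a common shape: a run of descriptive fields, then the collection name, the archive name and the ARK id, joined by commas. A minimal sketch of a formatter follows. The function names are hypothetical, and the rule that missing fields (for example an unknown creator or date) are simply skipped is inferred from the photograph example, not stated by PAHA.

```python
from typing import Optional, Sequence

ARCHIVE = "Pune Architectural History Archive"


def cite(fields: Sequence[Optional[str]], collection: str, ark: str) -> str:
    """Join the descriptive fields, collection, archive name and ARK id with commas.

    ASSUMPTION: empty or missing fields are dropped rather than left blank.
    """
    parts = [*fields, collection, ARCHIVE, ark]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def cite_interview(interviewee: str, interviewers: str, date: str,
                   collection: str, ark: str) -> str:
    """Interview format: the first field combines interviewee and interviewers."""
    return cite([f"{interviewee}. Interview By {interviewers}", date], collection, ark)


drawing = cite(
    ["Shubhashree Apartments", "architectural drawings (visual works)",
     "Laxman", "1976-09-14"],
    "Architects United Collection",
    "paha.site/ark:/71826/media_1905",
)
photo = cite(
    ["Exterior photograph of Banali Apartment", "positives (photographs)", None, None],
    "Architects United Collection",
    "paha.site/ark:/71826/media_1294",
)
interview = cite_interview(
    "Badawe Vishwakumar Vishwanath", "Sarah Melsens", "2018-09-18",
    "Oral history collection", "paha.site/ark:/71826/item_1490",
)
```

With the values taken from the examples above, `drawing`, `photo` and `interview` reproduce the corresponding example citations.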