Data and Image Standards
of the Open Archive:
A How and Why for Small Collections
Director, Drexel Digital
Department of Design
College of Media Arts & Design
Philadelphia, PA, 19104
Standards for metadata harvesting and image capture are being created to extend the use of the internet beyond resource discovery, making it a tool for "distributed custodianship" of resources (1). Small museums and collections may have trouble justifying the resources needed to implement these standards. This paper describes an evolutionary prototype for an archiving project that is developing a process to incorporate the technical protocols and standards being developed and promoted by the Open Archives Initiative (OAI), Dublin Core, and XML developments, as well as recommendations from other collections involved in information sharing, to exchange information between the database of a small historic costume collection and the database of an OAI repository. As the evolution of the prototype includes the retrospective conversion of collection data from 3" x 5" paper file cards to a relational database that includes images, all aspects of standardization, from naming conventions to data structure and image capture, are considered. The database is designed particularly for Historic Costume incorporating Fashion Design, within the framework of the greater museum community's accepted data structure, and is populated via an online data entry form.
Keywords: Open Archives Initiative, metadata harvesting, Dublin Core, Museums Online, Historic Costume, Fashion Design, information architecture, evolutionary prototype, XML, digital imaging.
Word Count: 4,410
The integration of the craft and history of design with innovations in technology has been part of the foundation of Drexel University since its founding in 1891 as an Institute of Technology by A. J. Drexel. The Drexel Historic Costume Collection had its beginning in the 1890s, when members of the Drexel family began assembling a collection of notable garments, accessories, and textiles. The collection represents two hundred years of historic costume and textile design. Among the items are eight gowns by Charles Worth, father of couture. The extensive lace collection has been featured in an outstanding resource book on this textile. (2) Shoes, millinery, parasols, gloves, and other accessories in the collection present an opportunity to study an entire period ensemble. The collection is estimated to contain 7000 items.
The existing documentation for the bulk of the collection consists of 3" x 5" file cards containing limited archival data. Many other costume collections, such as those of The Kent State University Museum, The Fashion Institute of New York, The Texas Fashion Collection at the University of North Texas, and The Costume Institute of the Metropolitan Museum of Art, are updating and standardizing their archives as they migrate archival data from similar modes of storage to computerized databases in a "retrospective conversion". (3) The Drexel Historic Costume Collection needs the same conversion so that its full potential as a teaching and research collection can be realized. The projects generated by this proposal promote research, education, and entrepreneurship within the interdisciplinary framework of creating a digital museum. The Drexel Digital Museum Project: Historic Costume Collection (DHCC) is creating a model for the education and training of digital image and virtual museum managers, as well as a model of collaboration across colleges and between museums and universities.
Figure one: The Drexel Historic Costume Collection, courtesy of the curator, Bella Veksler, from Lace, the Poetry of Fashion, photography by David Gehosky.
The Drexel Digital Museum Project: Historic Costume Collection is a collaboration between the College of Media Arts and Design and the College of Information Science and Technology. It uses current technology, traditional design skills and historical perspective to create access to, preserve and manage the objects that comprise the collections of the Drexel Museum. (4) It represents the first of several planned projects that will be combined to form the Drexel Digital Museum. The goals of the project are to:
- Allow broader public access to Drexel's unique collections
- Provide tools that enable more effective scholarship
- Offer research opportunities within the collection on a global scale
- Train students in digital image management and museum informatics
- Create successful e-commerce initiatives to generate revenue for future projects
- Protect the University's assets
- Acquire funding to achieve sustainability
The project will provide access to the rich collections of the Drexel Museum via an online searchable database, with high quality digital representations from multiple views. An evolutionary prototype has been created for this museum online. (5) The quality of the graphic images, rich detail, and multiple views, via the 3D panoramas on the prototype website, are unique among historic costume collection websites.
As the collection had gone some time without a full-time curator, had moved between two locations, and had been accessioned by a variety of staff, three different numbering systems were used as object identifiers in record keeping. This posed a real dilemma in creating a unique identifier for the objects in the collection, since some records did not include the universally accepted biblio-numeric (accession date.number of objects in accession.sequence of object in accession).
Caroline R. Arms, in her report on Lessons and Challenges at the Library of Congress (6), recommends establishing naming conventions early in a digitizing project. The naming convention not only establishes a unique, persistent identifier for each object in the collection, but can also provide structure for project control.
Naming conventions for files follow CIMI (Consortium for the Computer Interchange of Museum Information) recommendations and were developed by the Japanese American National Museum (JANM): institutional acronym_object ID_part designator.file extension. (7) An example provided by Snowden Becker, JANM, is janm_97.77.31A_m.tif for the master file (high-resolution TIFF image), with janm_97.77.31A_a.jpg and janm_97.77.31A_t.jpg for the derivative and thumbnail files. JANM uses the museum standard biblio-numeric as the object's unique identifier. Since not all of our records will have the biblio-numeric (97.77.31A) as a unique identifier, the Drexel Digital Museum (ddm) uses a system-generated number for the unique identifier: ddm_SYSGEN#_m.tif; ddm_SYSGEN#_t.jpg; ddm_SYSGEN#_a.jpg. Another ID field is included for those objects that do have the biblio-numeric. We will allow a null value in this field in the database.
We have seven different file types stored in the image database. Those that are freely accessible to the public, and reproducible, via the world wide web:
- 3-d panoramas (object movies), stored as .mov
- Thumbnails (small files, limiting resolution) stored as .JPEG
- Full graphic of objects (medium files, limiting resolution) stored as .JPEG
- Full graphic of details (medium files, limiting resolution) stored as .JPEG
And those not freely accessible to the public and reproducible only by permission:
- Full graphic of objects (large files, multipurpose resolution, 300 dpi), stored as .TIFF
- Full graphic of details (large files, multipurpose resolution, 300 dpi), stored as .TIFF
- Vector files of patterns of selected garments (small files, vector files), stored as .TIFF
The file extension itself designates which files are freely accessible to the public (JPEG, MOV) and which we use for archiving, because of their lossless quality, or to generate high quality digital output as a potential revenue stream for the project (TIFF). We have added fg.JPEG (full graphic) and fg.TIFF (full graphic) formats to the image type options in the file extension. Object movies are stored as, for example, ddm_SYSGEN#_a.mov. If the images show the front and back of a garment, we use _r and _v (for recto and verso): ddm_SYSGEN#_r_m.tif. If there are multiple detail views, we use a number, with recto and verso designators as necessary: ddm_SYSGEN#_r_d2_m.tif.
This system creates unique identifiers for the objects in the collection, indicates the use of the file, provides some description of the image for internal use, and allows for the identification of the source of data supplied to data repositories.
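The naming convention can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's own code: the function names, the optional arguments, and the choice to join every designator with an underscore (including the detail number) are hypothetical, and the sample object number 1042 is made up.

```python
INSTITUTION = "ddm"  # institutional acronym used in the text

# Part designators from the text: m = master, a = derivative, t = thumbnail.
# The default extension for each part is an assumption drawn from the examples.
EXTENSIONS = {"m": "tif", "a": "jpg", "t": "jpg"}


def build_filename(object_id, part, side=None, detail=None,
                   institution=INSTITUTION, extension=None):
    """Assemble institution_objectID[_side][_dN]_part.extension."""
    if part not in EXTENSIONS:
        raise ValueError(f"unknown part designator: {part!r}")
    if side not in (None, "r", "v"):
        raise ValueError("side must be 'r' (recto), 'v' (verso) or None")
    pieces = [institution, str(object_id)]
    if side:
        pieces.append(side)
    if detail is not None:
        pieces.append(f"d{detail}")
    pieces.append(part)
    return "_".join(pieces) + "." + (extension or EXTENSIONS[part])


def is_public(filename):
    """Per the text, JPEG and MOV files are public; TIFF files are not."""
    return filename.rsplit(".", 1)[-1].lower() in {"jpg", "jpeg", "mov"}


# Hypothetical system-generated object number 1042.
master_detail = build_filename(1042, "m", side="r", detail=2)
object_movie = build_filename(1042, "a", extension="mov")
```

Generating names from one function, rather than typing them by hand, is one way to keep a group of student data entry workers consistent with the convention.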
Since we were "lucky" enough not to have inherited a legacy system, we decided to create a hybridized cataloging form by adapting existing classification structures for art images, fashion and textiles to the needs of our users. As our user groups included historic costume collection scholars, fashion design students and faculty, and fashion designers, we wanted to marry historic costume collection terminology with contemporary fashion design terminology. We borrowed extensively from the Core Categories for Visual Resources (VRA Core), the fields used in the Museum Educational Site License Project and the Objects Classified by Medium Initiative of Longhouse Reserve. Because of limited resources, this project relies heavily on independent study and graduate students to aid in the data entry. The hybridization was intended to ease data entry by people with varied domain expertise and data entry skills. (8)
Research on classification structures revealed the development of new standards and strategies for sharing information between interactive, database-driven websites. In its Digital Strategy, the Library of Congress states: "The Library should selectively adopt the portal model for targeted program areas. By creating links from the Library's Web site, this approach should make available the ever-increasing body of research materials distributed across the internet. The library would be responsible for carefully selecting and arranging for access to licensed commercial resources for its users, but it would not house local copies of materials or assume responsibility for long-term preservation." (9)
There is a real need for faculty and scholars from universities with limited resources to be able to access the images and data from other collections housed in repositories. In their paper on evaluating features of web environments that deliver satisfactory information seeking experiences, Zhang, Small, von Dran and Barcellos (10) assert that user tasks such as accessing and retrieving information rely on Herzberg's hygiene features, such as the quality of active links and load time. (11) How better to enhance a world wide web user's information retrieval experience than to increase access to information beyond the local database, and to provide this access through a local query that retrieves relevant records from multiple collections and a variety of domains?
At a recent conference of the Museum Computer Network (12), Carl Lagoze suggested creating an "infrastructure for cross-repository reference linking as a means to reformulate the scholarly publishing framework". (13) Representatives of various ePrint, library and publishing communities are allying to create the Open Archives Initiative and an infrastructure to facilitate interoperability across multiple domains. Museums on the Web, the Consortium for the Computer Interchange of Museum Information and the Museum Computer Network are all interested in the "technical umbrella for practical interoperability" (14) that developing technical specifications for metadata harvesting could provide.
Metadata is data about the data contained in a record. It is stored in plain text files that are easily read by a variety of software for a variety of collections. Data structures are defined in "markup" languages. HyperText Markup Language (HTML), used on the world wide web, is limited to the tags specified in the HTML standards and is used mainly for defining how data should look on a web screen. Extensible Markup Language (XML) provides rules by which communities from various domains may transport data over the world wide web. This extensibility allows communities to create their own sets of tags. The logical grammar of a Document Type Definition (DTD) defines these tag standards to the greater community. Using agreed upon protocols for the tags allows for the harvesting of metadata by a variety of browsers. Resource Description Framework (RDF) can be used to enhance Dublin Core defined markup by supplying a framework for expressing relationships among items, sets of items or entire collections.
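As a rough illustration of metadata carried as plain-text XML with community-defined tags, the Python sketch below serializes a few descriptive fields using Dublin Core element names. The element names and the namespace URI come from the Dublin Core element set; the wrapper element `record`, the function name, and the sample values (including the identifier) are assumptions made for the example and do not describe the project's actual records.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core_xml(fields):
    """Serialize (element name, value) pairs as namespaced XML text."""
    root = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Sample values are invented for illustration.
sample = [
    ("title", "Evening gown"),
    ("creator", "Charles Worth"),
    ("type", "Image"),
    ("identifier", "ddm_1042"),
]
xml_text = to_dublin_core_xml(sample)
print(xml_text)
```

Because the result is plain text with agreed tag names, any harvester that understands the same element set can read the record without knowing anything about the database that produced it.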
In the two-party model described by Lagoze, data providers and service providers use HTTP encoding and XML schema for protocol conformance. Extensibility is achieved by providing metadata at both the item level and the collection level. This allows for searching for an item by descriptive fields like object title, object creator, etc., or by the collection to which the object belongs. The tags for the OAI Metadata Harvesting Protocol are divided into three sections: protocol support; format-specific metadata; and community-specific record data. The sets, or collection definitions, are defined by the communities of the data providers and not by the Open Archives Initiative (OAI) protocol. The namespace in the format-specific metadata allows different domains to have different meanings for the XML components.
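To make the two-party exchange more concrete, the sketch below shows how a service provider might compose an HTTP request to a data provider. The verb and argument names (`ListRecords`, `metadataPrefix`, `set`) follow the published OAI metadata harvesting protocol as we understand it; the repository address and the set name `costume` are hypothetical, and the helper function is ours, not part of any OAI toolkit.

```python
from urllib.parse import urlencode


def harvest_url(base_url, verb, **arguments):
    """Build a harvesting request: base URL plus verb and its arguments."""
    params = {"verb": verb}
    # Arguments passed as None are treated as "not supplied".
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)


# Hypothetical repository address and set name.
url = harvest_url(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
    set="costume",
)
print(url)
```

The `set` argument is where the community-defined collection groupings described above would come into play: the protocol carries the set name but leaves its meaning to the data providers.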
Suppose a Fashion Design student has decided to create a senior collection based on shells. His/her search of the Drexel Historic Costume Collection would yield the Givenchy that Grace Kelly wore for the big birthday bash the city of Philadelphia held for her, which is embellished with beads of real coral and lace with a shell motif. (She later donated the gown to the collection.) By accessing a repository of records from the broader community of design, he/she could perhaps retrieve not only a "pearly queen's" suit embellished with mother of pearl buttons from the collection of the Victoria and Albert Museum, but also the South Sea Islands armor constructed from abalone shell from the collections at Longhouse Reserve and the beautiful glass creations, inspired by sea life, created by Dale Chihuly from the same collection. Because of the structured access provided by standardized metadata tags attached to those records, he/she would not retrieve the irrelevant web search returns of "Royal Dutch/Shell", "shell game", "Shell Extensions Software" or "Sea Shell Fudge Shop". A most enjoyable aspect of a web search is what is retrieved in an opportunistic manner.
Structured access can refine that opportunism by allowing harvesting of metadata across what may, on initial examination, appear to be unrelated domains. The student may even find inspiration in retrieved images of cross sections of certain shells being examined by engineers for their structural properties.
Lei Zeng of Kent State University has undertaken a project similar to part of ours. (15) Her online data entry form maps descriptive entries so that a record can be obtained in several views (a VRA-Core-based record, a draft USMARC-based record, or a draft Dublin-Core-based record) and stored in a fashion database. Suggested-terms links on the form open pick lists of fashion terms. When the Suggested Terms button is clicked, our online form provides an online thesaurus, in a cascading drop list, that reflects the International Council of Museums' (ICOM) classification hierarchy of body coverings (16), to which we have added contemporary fashion terms.
Figure two: Online thesaurus, data entry form, DHCC
Additional Fashion Design terms, documented in at least three Fashion Design periodicals, can be added by editing the Category field in the Advanced Functions. Additional hierarchical thesauri for terms will be added as the prototype evolves to include other types of collections in the Drexel Museum. An example would be the use of the Longhouse Classification Proposal's terms for Fiber (17) when we document the rather extensive collection of Textiles in the Museum.
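A cascading drop list over a hierarchical thesaurus can be pictured as a nested structure in which each selection narrows the next list. The Python sketch below is a minimal illustration only: the terms shown are placeholders and not the actual ICOM hierarchy, and the function names and the way the three-periodical rule is checked are assumptions.

```python
# Placeholder hierarchy; the real form follows the ICOM classification.
THESAURUS = {
    "Body coverings": {
        "Main garments": {"Dress": {}, "Coat": {}},
        "Accessories": {"Gloves": {}, "Parasol": {}},
    }
}


def options(thesaurus, path=()):
    """Return the terms to show in the next drop list for a chosen path."""
    node = thesaurus
    for term in path:
        node = node[term]
    return sorted(node)


def add_term(thesaurus, path, term, periodicals):
    """Add a term only if it is documented in at least three periodicals."""
    if len(set(periodicals)) < 3:
        raise ValueError("a new term needs at least three periodical sources")
    node = thesaurus
    for step in path:
        node = node[step]
    node.setdefault(term, {})


# Each selection determines what the following drop list offers.
second_list = options(THESAURUS, ("Body coverings",))
third_list = options(THESAURUS, ("Body coverings", "Accessories"))
```

Keeping the documentation rule inside `add_term` means a contemporary fashion term cannot enter the hierarchy without its sources, whoever is doing the data entry.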
However, as we did not conform to Dublin Core guidelines, essential to participating in open archiving, our hybridized form did not create records that could be shared with other databases. The Dublin Core guidelines create metadata, data about the data in a record, in simple text which will allow for this sharing of information.
Figure three: Advanced functions, online data entry form, DHCC
Extensibility of our data structure can be achieved by providing metadata at both the item level and the collection level, meeting MOAC (Museums and the Online Archive of California) standards. This will allow for searching for an item by descriptive fields like object title, object creator, etc., or by the collection to which the object belongs. We are currently redesigning the data entry form using JSP, mapping our Historic Costume/Fashion Design data structure to the MOAC framework of definitions and protocols (18), which will, at a minimum, return records with metadata expressed in Dublin Core format without any qualifications. (19) As not all objects have the biblio-numeric accession number, not all will be searchable by the collections to which they belong within the Drexel Historic Costume Collection. Collaboration with other historic costume collections could produce the set definitions that could define the community-specific record data for historic costume and fashion design in the MOAC protocol.
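Mapping a local data structure to unqualified Dublin Core amounts to a crosswalk from local field names to the shared element names. The sketch below illustrates the idea in Python; the local field names and the particular element chosen for each field are assumptions made for the example and are not the mapping adopted by the project or by MOAC.

```python
# Hypothetical crosswalk: local field name -> unqualified Dublin Core element.
CROSSWALK = {
    "object_title": "title",
    "designer": "creator",
    "period": "date",
    "category": "type",
    "fabrication": "format",
    "donor": "contributor",
    "sysgen_id": "identifier",
    "accession_number": "identifier",
}


def to_dublin_core(local_record):
    """Return {element: [values]}; empty or null local fields are skipped."""
    dc = {}
    for field, element in CROSSWALK.items():
        value = local_record.get(field)
        if value in (None, ""):
            continue  # e.g. objects with no biblio-numeric accession number
        dc.setdefault(element, []).append(str(value))
    return dc


# Invented sample record with a null accession number.
record = {
    "object_title": "Evening gown",
    "designer": "Givenchy",
    "sysgen_id": 1042,
    "accession_number": None,
}
dc_record = to_dublin_core(record)
```

Because two local fields map to the same element, an object with both a system-generated number and a biblio-numeric simply carries two identifier values, while an object without the biblio-numeric still yields a valid record.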
The portable technologies used in the reference implementation developed by CIMI for MOAC are tools already used in creating our online searchable database and interface, or familiar to our developers (20):
- Java 2 Standard Edition 1.3
- Java 2 Enterprise Edition 1.2(JNDI and JDBC)
- MySQL using the mm.mysql JDBC driver
- Java Servlets API 2.0
- Java API for XML Processing (JAXP) 1.
A driving force in the design of the Drexel Historic Costume Collection website is the quality of the images presented on the site. Early survey and observation of all external user groups, Designers (Fashion) and Designers (Textile/Fabric), and Students and Scholars (Historians, Archivists and Design Faculty), revealed that all required high quality graphic representations of the objects, from multiple views, with rich details. (21) The prototype delivers images of a quality to rival those on other historic costume collection websites such as The Museum of the City of New York Costumes and Textile Collection (22), Museum of Costume, Bath, England (23), The Costume Institute (Metropolitan Museum of Art, NYC) (24), The Texas Fashion Collection at the University of North Texas (25), and The Fashion Collection at Kent State University. (26) From a "runway" of thumbnail images of the objects in the database, the user may choose to view an object more closely, either by clicking on the thumbnail, or by first selecting which objects will occupy the runway by creating a query to the database from the constraint options of period, designer, category, fabrication and donor.
Figure four: Search screen, Drexel Digital Museum Project
Figure five: Search results screen, Drexel Digital Museum Project
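The way a set of constraint options becomes a database query can be sketched as follows. This is an illustration under several assumptions: Python's built-in sqlite3 module stands in for the MySQL database so that the example is self-contained, the table and column names are hypothetical, and the sample rows are invented.

```python
import sqlite3

# Constraint options named in the text; column names are assumed to match.
ALLOWED = ("period", "designer", "category", "fabrication", "donor")


def build_query(**constraints):
    """Return (sql, params) selecting the objects that fill the runway."""
    unknown = set(constraints) - set(ALLOWED)
    if unknown:
        raise ValueError(f"unsupported constraints: {sorted(unknown)}")
    clauses, params = [], []
    for column in ALLOWED:
        value = constraints.get(column)
        if value:
            clauses.append(f"{column} = ?")  # value bound as a parameter
            params.append(value)
    sql = "SELECT sysgen_id FROM objects"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY sysgen_id", params


# In-memory demonstration with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects (sysgen_id INTEGER PRIMARY KEY, period TEXT, "
    "designer TEXT, category TEXT, fabrication TEXT, donor TEXT)"
)
conn.executemany(
    "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "1880s", "Worth", "gown", "silk", "Donor A"),
        (2, "1950s", "Givenchy", "gown", "lace", "Donor B"),
        (3, "1890s", "Worth", "gown", "velvet", "Donor A"),
        (4, "1890s", "Worth", "cape", "wool", "Donor C"),
    ],
)
sql, params = build_query(designer="Worth", category="gown")
runway = [row[0] for row in conn.execute(sql, params)]
```

Only the fixed list of column names is ever placed in the SQL text; the user's selections travel as bound parameters, which keeps the form safe to expose on a public website.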
From the search results screen, the user may rotate the garment in a 3-D panorama for multiple views, and access archival data about the object.
Figure six: Detail, search results screen, Drexel Digital Museum Project
The MOAC approved Technical Specifications for submissions of finding aids, imaging metadata and images to the Online Archive of California (27) provide the guidelines for refining our image data by creating digital images that can be repurposed across print, fixed and network media. The specifications dictate that all thumbnails should be 150 pixels along the longest edge; all derivative files should be in JPEG (Joint Photographic Experts Group) or GIF (Graphics Interchange Format); and master files should be TIFF (Tagged Image File Format) and 3000 pixels along the shortest edge. (28) TIFF is a lossless file format that carries metadata about the image in its "tags". We follow all of these standards except the 3000 pixel requirement for master files. Our current camera, an Olympus E10, limits us to a smaller master file size; when funding is secured, we will purchase a Nikon D1x digital camera and adhere to this standard as well.
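The two pixel rules in the specifications reduce to simple arithmetic, sketched below in Python. The function names are ours and the image dimensions used in the example are hypothetical; only the 150 pixel and 3000 pixel figures come from the specifications as described above.

```python
def thumbnail_size(width, height, longest_edge=150):
    """Scale (width, height) so the longest edge equals longest_edge."""
    longest = max(width, height)
    new_width = max(1, round(width * longest_edge / longest))
    new_height = max(1, round(height * longest_edge / longest))
    return new_width, new_height


def master_meets_spec(width, height, shortest_edge=3000):
    """A master file qualifies when its shortest edge is long enough."""
    return min(width, height) >= shortest_edge


# Hypothetical capture sizes, in pixels.
landscape_thumb = thumbnail_size(3000, 2000)
portrait_thumb = thumbnail_size(2000, 3000)
small_master_ok = master_meets_spec(3000, 2000)
large_master_ok = master_meets_spec(4500, 3000)
```

Note that the thumbnail rule constrains the longest edge while the master rule constrains the shortest edge, so a capture that is 3000 pixels wide but only 2000 pixels high does not satisfy the master file requirement.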
Each object in the collection will have an image file that includes a color calibration scale and a measurement scale, in inches and metric units, placed next to the object in the image. The files will be 600 dpi, 24-bit RGB, color corrected to MOAC image standards, and saved as uncompressed TIFF files. These calibrations are necessary to guarantee quality of information.
The guidelines being developed by MOAC and CIMI present a framework for access for institutions of any size. Providing direct access for users to resource rich collections, and preparing resources for direct access, is a challenge for large and small institutions alike. The DHCC information system design has evolved considerably as the project developers have continued their research on museum community standards for accessibility and image capture, and this has added considerable design and implementation work to the project. The MOAC approved Technical Specifications and the OAI protocols for metadata harvesting require a much larger commitment of programming hours than originally allotted in the system implementation.
The ability to integrate the research, writing and building of the website into the project developers' need for research, writing and creative product as part of their quest for academic tenure has been critical to sustaining the momentum of the project. The ongoing dialogue with communities like the World Wide Web Consortium and the Museum Computer Network helps us make design decisions that are significant and timely.
The model of evolutionary prototyping used by the Drexel Digital Museum Project: Historic Costume Collection is an iterative design process in which the design/implementation team works collaboratively to create a modifiable, portable, underlying structure for the system that supports the very basic functionality required by the project. Then, in repeating cycles of design, implementation and testing, features are added to deliver increased functionality, and the interface is refined to enhance that delivery. With this type of design process there is a tangible prototype, with some level of functionality, at each phase of the process. Response to the prototype has been positive. The project director has been asked to be the keynote speaker for an upcoming conference, "Technology and the Management of Costume Collections," of the Costume Society of America, one of the prime targeted user groups.
The community-specific record data included in the OAI Metadata Harvesting Protocol tags permit customizing metadata for the specific needs of various communities. Within the skill set of our development team are 25 years of experience in the fashion industry, 15 years of curatorial experience in historic costume, strong working knowledge of MySQL, wizard-level Java programming skills and 10 years of experience in information access in the digital environment. We should be able to implement the OAI standards and contribute to the dialogue to formulate metadata for the particular needs of the historic costume/fashion design domain.
As a university based collection, our main goal is to "sustain and preserve a universal collection of knowledge and creativity for future generations" (27). Graduate assistants will be trained to aid in image preparation, garment conservation, programming and data entry. The project provides the opportunity for students to learn how to design, implement and test the system and interface design. The prototype can be used to illustrate the process's evolution. Independent Study Research utilizing the new research tools will be encouraged and sponsored by the curator and project director to observe the efficacy of the system in providing access to the collection. Further curriculum will be developed from the process.
An area that has always been of some concern to us is what information is shared for the transcendent good of the whole, and what is used to create a revenue stream to help sustain the project. Early in the development process, we decided to put the best quality images possible, within a reasonable load time, on the website with no slicing or digital watermarking. There are plans for creating master patterns from significant garments in the collection and providing a service for customized patterns from these masters. We are also considering how textile designs from the collection might be marketed. These functions will provide revenue, without denying access to or use of the images on the website.
The Library of Congress's desire to be "responsible for carefully selecting and arranging for access to licensed commercial resources for its users" (29) of its repositories, and the early exploration by CIMI of commercial ventures with metadata harvesters (30), raise some interesting issues. Structuring public access through a "pay for view" service denies public access to those who cannot or do not wish to pay. Shouldn't the "licensed commercial resource" that provides a process for efficient information retrieval be paid for this service? Shouldn't the creator/owner of the information resource have some control over and remuneration for that information? Who is the owner? We follow, with great interest, the current dialogue and judicial decisions regarding these intellectual property issues.
Figure seven: The Drexel Historic Costume Collection, courtesy of the curator, Bella Veksler, from Lace, the Poetry of Fashion, photography by Dave Gehosky.
(Figure one) Veksler, Bella. Lace, The Poetry of Fashion. Schiffer Publishing Company. (1998). Photograph by David Gehosky.
(Figure seven) Veksler, Bella. Lace, The Poetry of Fashion. Schiffer Publishing Company. (1998). Photograph by David Gehosky.
(1) Lagoze, Carl. CIMI/MCN 2001 OAI Workshop. Museum Computer Network Conference. Cincinnati, OH. (October 22 2001).
(2) Veksler, Bella. Lace, The Poetry of Fashion. Schiffer Publishing Company. (1998).
(3) Zeng, L., "Metadata Elements for Object Description and Representation". Journal of the American Society for Information Science. in press.
(4) Martin, Kathi. A University Perspective. Interactions Annual Interface Design Issue. IEEE/ACM Press, (Feb. 2 2000). pp. 85-92. http://www.acm.org/pubs/articles/journals/interactions/2001-8-2/p85-martin/p85-martin.pd.htm
(6) Arms, Caroline R. Historical Collections for the National Digital Library. Challenges at the Library of Congress. D-Lib Magazine. (April 1996).
(7) retrieved from an MCN list serve dialogue firstname.lastname@example.org (June 19 2001) http://www.mcn.edu/espectra/digmg1.html
(8) Goodrum, Abby, and Martin, Kathi. Bringing Fashion Out of the Closet: Classification Structure for the Drexel Historic Costume Collection. Bulletin of the American Society for Information Science. Volume 26, No 6. (August/September, 1999). http://www.asis.org/Bulletin/Aug-99/goodrum_martin.html
(9) LC21: Digital Strategy for the Library of Congress. page 5.
(10) Barcellos, Silvia, von Dran, Gisela M., Small, Ruth and Zhang, Ping. Websites That Satisfy Users: A Theoretical Framework for Web User Interface Design and Evaluation. Proceedings of the 32nd Hawaii International Conference on System Science. (1999).
(11) Herzberg, F. Work and the Nature of Man. Chapter 6. World Publishing, NY. (1966). 71-91.
(12) Lagoze, Carl. CIMI/MCN 2001 OAI Workshop. Museum Computer Network Conference. Cincinnati, OH. (October 22 2001).
(13) Lagoze, Carl. CIMI/MCN 2001 OAI Workshop. Museum Computer Network Conference. Cincinnati, OH. (October 22 2001).
(14) Lagoze, Carl. CIMI/MCN 2001 OAI Workshop. Museum Computer Network Conference. Cincinnati, OH. (October 22 2001).
(16) http://www.asis.org/Bulletin/Aug-99/goodrum_martin.html p 4
(17) Larsen, Jack. Objects Classified by Medium. in press
(20) Stern, Henry. The CIMI OAI v1.0 Repository, A Reference Implementation of the OAI Metadata Harvesting Protocol. CIMI/MCN 2001 OAI Workshop. Museum Computer Network Conference. Cincinnati. (October 2001). p. 3
(21) Goodrum, Abby, and Martin, Kathi. Bringing Fashion Out of the Closet: Classification Structure for the Drexel Historic Costume Collection. Bulletin of the American Society for Information Science. Volume 26, No 6. (August/September, 1999). http://www.asis.org/Bulletin/Aug-99/goodrum_martin.html
(29) Billington, James. Mission and Strategic Priorities of the Library of Congress. Librarian of Congress. (Fall 1995).
(30) Perkins, John. CIMI/MCN 2001 OAI Workshop. Museum Computer Network Conference. Cincinnati, OH. (October 22 2001).