The Open Archival Information System (OAIS) reference model, formally ISO 14721, is the framework that underlies serious digital preservation. Many national archives, university libraries, government records offices, and large enterprise archives build their systems around its concepts. Understanding OAIS turns "I should back up these PDFs" into a structured discipline that can survive for decades. This guide is an introduction.
What OAIS is
OAIS was developed in the late 1990s by the Consultative Committee for Space Data Systems (CCSDS) to address NASA's need to preserve space-mission data over decades. It was published as ISO 14721 in 2003 and revised in 2012.
OAIS provides:
- A reference model, vocabulary and concepts for digital preservation
- A functional model, the activities an archive performs
- An information model, the structure of preserved content
- A common reference point, for aligning tools, standards, and practices
It is not a software product. It is a way of thinking about preservation that any specific implementation can be evaluated against.
Why it matters for PDFs
If you preserve PDFs at scale or for the long term, the OAIS concepts:
- Help you design a system that does not silently lose data
- Provide common vocabulary when working with archive professionals
- Inform compliance with formal preservation standards
- Align with tools built by the preservation community
For one-off personal archives, OAIS may be overkill. For institutional archives, it is foundational.
The OAIS functional entities
OAIS describes six functional entities of an archive:
- Ingest, accept materials from producers
- Archival Storage, manage the long-term storage
- Data Management, manage metadata and indexes
- Administration, overall management
- Preservation Planning, monitor and respond to threats
- Access, serve materials to consumers
Each entity does specific work. Together they form a complete archive.
Producers, consumers, management
OAIS describes the actors:
- Producer, submits materials to the archive
- Consumer, uses materials from the archive
- Management, sets overall policy for the archive
The archive sits between producers and consumers, mediating long-term preservation.
Information packages
OAIS specifies three types of "information package":
- Submission Information Package (SIP), what a producer submits
- Archival Information Package (AIP), what is preserved inside the archive
- Dissemination Information Package (DIP), what is delivered to consumers
The SIP→AIP transformation is the ingest process. The AIP→DIP transformation is the access process.
For PDFs:
- SIP: a PDF plus its associated metadata, submitted by an agency or office
- AIP: the PDF (possibly transformed to PDF/A) plus enriched preservation metadata, stored in the archive
- DIP: the version delivered to a researcher, possibly a viewing copy or a derivative
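The three package types can be pictured as plain data structures. The following is a minimal sketch, not an OAIS-mandated layout: the class fields, the `ingest` and `disseminate` function names, and the `size_bytes` field are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SIP:
    """What a producer submits: the PDF bytes plus producer-supplied metadata."""
    pdf_bytes: bytes
    metadata: dict


@dataclass
class AIP:
    """What the archive keeps: content plus enriched preservation metadata."""
    identifier: str
    pdf_bytes: bytes
    metadata: dict
    preservation_metadata: dict = field(default_factory=dict)


@dataclass
class DIP:
    """What a consumer receives: a delivery copy with descriptive metadata."""
    identifier: str
    pdf_bytes: bytes
    metadata: dict


def ingest(sip: SIP, identifier: str) -> AIP:
    # SIP -> AIP: keep the producer's metadata and add archive-side fields.
    return AIP(
        identifier=identifier,
        pdf_bytes=sip.pdf_bytes,
        metadata=dict(sip.metadata),
        preservation_metadata={"size_bytes": len(sip.pdf_bytes)},
    )


def disseminate(aip: AIP) -> DIP:
    # AIP -> DIP: here the delivery copy is simply the stored bytes;
    # a real archive might generate a viewing derivative instead.
    return DIP(aip.identifier, aip.pdf_bytes, dict(aip.metadata))
```

Keeping the three types separate makes it harder to hand a consumer internal preservation fields by accident, or to store a submission without enriching it first.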
Representation information
OAIS introduces representation information, the information needed to render and understand the preserved bits:
- Structure, how the bits are laid out (file format spec)
- Semantics, what the content means (domain context)
- Other, the software environment and dependencies needed for rendering
For PDFs:
- Structure: the PDF specification (ISO 32000)
- Semantics: the meaning of the document in its domain
- Other: fonts, ICC profiles, embedded resources
An archive must preserve enough representation information that the bits remain meaningful far in the future.
Preservation Description Information (PDI)
OAIS specifies five types of PDI:
- Reference Information, identifiers; how this item is uniquely referred to
- Context Information, relationships to other content
- Provenance Information, history of the content (origin, changes)
- Fixity Information, integrity (hashes, checksums)
- Access Rights Information, who can access how
For PDFs in an archive, each PDF should be accompanied by all five types of PDI.
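One way to make the five PDI types concrete is a record with one field per type and a check that flags empty ones. This is only a sketch: the field types and the `missing_pdi` helper are assumptions, and a real archive would more likely use an established metadata schema.

```python
from dataclasses import dataclass, fields


@dataclass
class PDI:
    reference: str       # Reference Information: unique identifier
    context: list        # Context Information: identifiers of related items
    provenance: list     # Provenance Information: history events
    fixity: dict         # Fixity Information, e.g. {"sha256": "..."}
    access_rights: str   # Access Rights Information: who may access, and how


def missing_pdi(pdi: PDI) -> list:
    """Return the names of PDI fields that are still empty."""
    return [f.name for f in fields(pdi) if not getattr(pdi, f.name)]
```

Running such a check at ingest surfaces gaps while the producer can still be asked to fill them. An empty context list may be legitimate for a standalone document, so a flagged field is a prompt for review rather than an error.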
Why this structure helps
Without OAIS-style thinking, archives drift:
- Provenance is lost over time
- Fixity is not maintained (corruption goes undetected)
- Context is broken (the PDF still opens, but its meaning is lost)
- Access controls are forgotten
OAIS imposes discipline that prevents these failure modes.
Implementing OAIS for PDFs
A practical OAIS-aligned PDF archive:
Ingest:
- Validate the submitted PDF (PDF/A conformance check)
- Extract or enrich metadata
- Generate a unique identifier
- Compute fixity (SHA-256 hash)
- Wrap into an AIP
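The ingest steps can be sketched in a few lines of Python. The directory layout, the `aip.json` file name, and the `ingest_pdf` function are assumptions made for illustration, and PDF/A validation is assumed to have been done beforehand by an external validator.

```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path


def ingest_pdf(pdf_path: Path, archive_root: Path, metadata: dict) -> Path:
    """Store one PDF as a directory-based AIP and return that directory."""
    # Assumption: PDF/A conformance was already checked by a separate tool.
    data = pdf_path.read_bytes()

    # Generate a unique identifier and use it as the AIP directory name.
    identifier = str(uuid.uuid4())
    aip_dir = archive_root / identifier
    aip_dir.mkdir(parents=True)

    # Store the content, then compute fixity from what was actually written.
    stored = aip_dir / "content.pdf"
    stored.write_bytes(data)
    sha256 = hashlib.sha256(stored.read_bytes()).hexdigest()

    # Wrap content and metadata together (hypothetical aip.json layout).
    record = {
        "identifier": identifier,
        "original_name": pdf_path.name,
        "sha256": sha256,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "metadata": metadata,
    }
    (aip_dir / "aip.json").write_text(json.dumps(record, indent=2))
    return aip_dir
```

Hashing the stored copy rather than the submitted file means the recorded fixity value describes the bytes the archive actually holds.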
Archival Storage:
- Redundant storage (multiple copies, possibly geographically distributed)
- Encryption if confidentiality is required
- Periodic integrity verification via hash
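Periodic verification across redundant copies might look like the sketch below. It assumes the expected SHA-256 value was recorded at ingest and that each copy is reachable as a file path; `check_copies` and the status labels are illustrative names.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large PDFs do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(expected_sha256: str, copies: list) -> dict:
    """Map each copy's path to 'ok', 'missing', or 'corrupt'."""
    report = {}
    for copy in copies:
        path = Path(copy)
        if not path.is_file():
            report[str(path)] = "missing"
        elif sha256_file(path) == expected_sha256:
            report[str(path)] = "ok"
        else:
            report[str(path)] = "corrupt"
    return report
```

A copy reported as corrupt or missing can be repaired from one that still matches, which is the practical reason for keeping more than one copy.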
Data Management:
- Catalog of all AIPs
- Searchable metadata
- Relationships between AIPs (e.g., versions, parent/child)
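A small catalog with relationships can be sketched with SQLite from the Python standard library. The table names, column names, and the `version_of` relationship kind are assumptions chosen for this example.

```python
import sqlite3


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) and open a minimal AIP catalog."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS aip (
            identifier TEXT PRIMARY KEY,
            title      TEXT NOT NULL,
            sha256     TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS relation (
            source TEXT NOT NULL REFERENCES aip(identifier),
            target TEXT NOT NULL REFERENCES aip(identifier),
            kind   TEXT NOT NULL
        );
        """
    )
    return conn


def search_titles(conn: sqlite3.Connection, term: str) -> list:
    """Return identifiers of AIPs whose title contains the search term."""
    rows = conn.execute(
        "SELECT identifier FROM aip WHERE title LIKE ? ORDER BY identifier",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


def versions_of(conn: sqlite3.Connection, identifier: str) -> list:
    """Return identifiers of AIPs recorded as versions of the given AIP."""
    rows = conn.execute(
        "SELECT source FROM relation "
        "WHERE target = ? AND kind = 'version_of' ORDER BY source",
        (identifier,),
    )
    return [row[0] for row in rows]
```

Storing relationships in their own table keeps the AIP records stable while allowing new relationship kinds, such as parent/child, to be added later.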
Administration:
- Policies for retention, access, disposal
- Audit logs
- Roles and permissions
Preservation Planning:
- Monitor file formats for obsolescence
- Plan migrations
- Watch for environment changes (operating systems, readers)
Access:
- Search interface
- Authentication and authorization
- Generate DIPs for delivery
- Track usage
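The access steps can be sketched as a single function that checks authorization, logs the request, and builds the DIP. The three access levels, the dictionary keys, and the `make_dip` name are hypothetical; a real system would draw them from its own access policy.

```python
ACCESS_LEVELS = {"public": 0, "staff": 1, "restricted": 2}  # hypothetical levels


def make_dip(aip: dict, user_level: str, usage_log: list) -> dict:
    """Return a delivery package if the user's level permits access."""
    if ACCESS_LEVELS[user_level] < ACCESS_LEVELS[aip["access_level"]]:
        raise PermissionError(f"{user_level} may not access {aip['identifier']}")
    usage_log.append((aip["identifier"], user_level))  # track usage
    # Deliver only what the consumer needs, not internal preservation fields.
    return {
        "identifier": aip["identifier"],
        "title": aip["title"],
        "content": aip["content"],
    }
```

Refused requests are not logged in this sketch; an archive with audit requirements would probably want to record them as well.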
For a PDF-heavy archive, each of these has specific PDF considerations.
Standards and tools
Several tools implement OAIS-style preservation:
- Archivematica, open-source preservation system
- Preservica, commercial
- Rosetta by Ex Libris
- DSpace for institutional repositories
- Fedora (Flexible Extensible Digital Object Repository Architecture)
These provide, to varying degrees, ingest pipelines, storage, metadata management, and access, implementing OAIS concepts.
Repository certification
Standards exist to certify trusted digital repositories:
- TRAC (Trustworthy Repositories Audit & Certification)
- ISO 16363, Audit and Certification of Trustworthy Digital Repositories
- DSA (Data Seal of Approval), entry-level
- CoreTrustSeal, successor to DSA
A certified trusted digital repository (TDR) provides reasonable assurance that submitted content will be preserved.
Limits of OAIS
OAIS is a reference model, not a turnkey solution. Its limits:
- Abstract, it says what to think about, not how to implement it
- Scale-agnostic, it applies to small and large archives alike, but you must tune it to your own size
- Silent on organizational issues such as staffing, funding, and governance
- Technology-neutral, many different implementations are valid, so the choice is left to you
For someone wanting to "just archive my PDFs", OAIS is conceptual scaffolding rather than a recipe.
OAIS for smaller organizations
You do not need to be a national archive to apply OAIS:
Minimal OAIS-aligned PDF archive:
- PDF/A conversion at ingest
- Hash recorded for fixity
- Metadata: title, author, date, source, hash
- Multiple copies in different locations
- Periodic verification
- Documented access policy
- Retention rules
This is achievable with cloud storage plus a simple metadata database. Even an individual archivist can apply the model.
For PDF-specific archives
PDFs lend themselves to OAIS preservation:
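- PDF/A (ISO 19005) is designed for long-term archiving
- Validators such as veraPDF can check conformance
- PDF/A files are self-contained by design (fonts and other resources are embedded)
- Reader support is wide and likely to remain so for the foreseeable future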
For more PDF-specific guidance, see "How to archive PDFs long-term" and "PDF/A archival format explained".
Migration as a core concept
OAIS explicitly addresses how content is kept usable as media and formats change. The main strategies are:
- Refresh, copy bits to new media without changing them
- Replication, additional copies, same format
- Migration, transform to a new format
- Emulation, preserve the original environment to render the original format
For PDFs, refresh and replication are routine; migration may eventually be required if PDF is superseded. Currently, that day seems distant.
Provenance and authenticity
A long-term archive must be able to answer "where did this come from?" decades later. OAIS provenance information covers:
- Origin, who created the file
- Changes, what transformations occurred (e.g., PDF→PDF/A conversion)
- Custody, who has held the file
Maintain this chain: a PDF whose provenance is lost can no longer be shown to be authentic.
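The origin, changes, and custody chain can be kept as an append-only event log. The sketch below is an assumption about how such a log might be shaped; the `ProvenanceEvent` fields and the three event kinds simply mirror the list above and are not taken from any specific metadata standard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

EVENT_KINDS = {"origin", "change", "custody"}


@dataclass(frozen=True)
class ProvenanceEvent:
    kind: str        # one of EVENT_KINDS
    agent: str       # who created, transformed, or held the file
    detail: str      # what happened, e.g. "converted PDF to PDF/A"
    timestamp: str   # UTC time the event was recorded, ISO 8601


class ProvenanceLog:
    """Append-only history for one archived PDF."""

    def __init__(self) -> None:
        self._events = []

    def record(self, kind: str, agent: str, detail: str) -> ProvenanceEvent:
        if kind not in EVENT_KINDS:
            raise ValueError(f"unknown event kind: {kind}")
        stamp = datetime.now(timezone.utc).isoformat()
        event = ProvenanceEvent(kind, agent, detail, stamp)
        self._events.append(event)
        return event

    def history(self) -> tuple:
        # Return an immutable copy so callers cannot rewrite the past.
        return tuple(self._events)
```

Events are frozen and the log exposes no way to edit or delete them, which reflects the idea that provenance is only useful if earlier entries cannot be quietly changed.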
Connections to other topics
OAIS thinking informs:
- How to archive PDFs long-term
- Document retention policies
- PDF/A archival format explained
- Document versioning best practices
- Document management systems explained
The model is the conceptual glue across these practical topics.
Common gotchas
Treating "save to cloud" as preservation. Cloud storage protects against device loss but not silent corruption, account loss, or vendor disappearance.
No fixity verification. A file thought to be preserved may be corrupted; without hashes, you don't know.
Single-format dependence. Putting all eggs in PDF/A is reasonable but not eternal. Plan for format migration eventually.
Metadata as an afterthought. Capture metadata at ingest; reconstructing it retrospectively is harder and less reliable.
Access policies undefined. Years later, who can access archived content? Document up front.
Disposal not planned. Some content should age out per retention policy. Build in scheduled review.
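A scheduled retention review can be as simple as comparing ingest dates against a policy table. The categories and retention periods below are hypothetical placeholders, not recommendations, and `due_for_review` is an illustrative name.

```python
from datetime import date

# Hypothetical policy: years to retain each category before review.
RETENTION_YEARS = {"invoice": 7, "contract": 10}


def due_for_review(records: list, today: date) -> list:
    """Return identifiers whose retention period has elapsed."""
    due = []
    for rec in records:
        years = RETENTION_YEARS.get(rec["category"])
        if years is None:
            continue  # no rule for this category: keep indefinitely
        ingested = rec["ingested"]
        try:
            cutoff = ingested.replace(year=ingested.year + years)
        except ValueError:
            # 29 February with a non-leap target year: fall back to the 28th.
            cutoff = ingested.replace(year=ingested.year + years, day=28)
        if today >= cutoff:
            due.append(rec["identifier"])
    return due
```

The function only flags items for review. Whether to dispose of them remains a policy decision for the archive's management.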
Practical recipe: applying OAIS thinking to a PDF archive
For an organization with no current preservation discipline:
- Define scope. What categories of PDFs are preserved?
- Choose tools. Archivematica or a commercial system; cloud storage, on-premises storage, or both.
- Establish workflows for ingest, storage, access.
- Document policies for retention, access, disposal.
- Train staff.
- Audit periodically.
- Plan for migration.
Initial setup takes months; ongoing maintenance is lighter. The result is a preservation discipline that survives decades.
Takeaway
OAIS is the conceptual framework behind serious digital preservation. For PDFs specifically, it informs how to ingest, store, manage, and provide access to archives that need to last. The model is abstract; specific implementations like Archivematica or Preservica make it concrete. For smaller archives, the concepts still apply at smaller scale. For browser-based PDF operations alongside preservation workflows, Docento.app handles common tasks. For related topics, see "How to archive PDFs long-term" and "PDF/A archival format explained".