Bruno Lowagie is the original developer of iText, an innovative PDF library that has grown into a global software company. As an active member of the ISO and PDF communities, he has authored several books about iText and worked with the community to continually improve PDF functionality. When he is not working in the PDF world, Bruno spends much of his time with his wife and two sons.
In 2013, Eddy Haerens from Leefdaal paid the eighth invoice sent to him by his contractor. Or so he thought. In reality, the invoice was intercepted by an impostor who changed only one thing on the invoice, the account number of the contractor. Eddy paid 30,000 Euro, but the money never reached its intended destination. When the deception was revealed, it was too late. The bank had followed Eddy's wiring instructions and could not take back the transfer. On top of that, Eddy still had to pay 30,000 Euro to his contractor.
What Problems Does Document Signing Solve?
This example demonstrates the vulnerabilities that are inherent in our traditional approach to documents. When we receive a document, whether on paper or digitally, how do we know that the content wasn't tampered with? How do we know that the sender of the document is who he or she claims to be? And, if we receive a signed agreement, how can we make sure, without having to involve a notary, that the person who signed the document can't claim "I never signed that document"?
These three questions correspond to three assurances we want to build into our documents.
- Integrity: We want assurance that the document hasn't been changed.
- Authenticity: We want assurance that the author of the document is who we think they are.
- Non-repudiation: We want assurance that the author can't deny his or her authorship.
All three of these assurances are built into PDF documents using digital signatures. The "how" is fairly technical, but here we can explain the core concepts.
Concept 1: cryptographic hash functions
A digital document consists of a sequence of computer bytes that are organized in such a way that computer programs can render them to the screen, or to a printer. A cryptographic hash function can take those bytes and reduce them to a digest of a predefined length. The original bytes can never be reconstructed based on the digest. But if you apply the hash function to the same sequence of bytes, it will always result in the same digest.
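The properties described above can be tried out with a few lines of Python. This is a minimal sketch that assumes SHA-256 as the hash function (the article does not name one); the sample bytes and the helper name `digest` are made up for illustration.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Illustrative stand-ins for document bytes.
short_doc = b"Invoice 8: pay 30,000 Euro"
long_doc = short_doc * 1000

# The digest has a predefined length, whatever the size of the input
# (SHA-256 always yields 32 bytes, shown here as 64 hex characters).
assert len(digest(short_doc)) == len(digest(long_doc)) == 64

# The same sequence of bytes always results in the same digest.
assert digest(short_doc) == digest(b"Invoice 8: pay 30,000 Euro")

# Changing a single character results in a different digest.
assert digest(short_doc) != digest(b"Invoice 8: pay 30,000 Eurp")
```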
Imagine a situation where documents are sent out to different parties. For instance: a university sends grade reports to all of its students. For privacy reasons, the university can't put all of these reports online on a public server, yet other universities that receive such a report from a student who enrolls for additional studies should be able to check that the grades weren't forged.
To solve this problem, the issuing university could publish the digest of every grade report that is distributed. This collection of digests can't be used to retrieve the grades of the students, but whoever receives a digital report card can make a digest of the bytes of the document and compare it with the digest that was published online. If both digests match, the grade report wasn't tampered with.
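The university scenario could look roughly like this in code. It is only a sketch: the report contents, the choice of SHA-256, and the names `published_digests` and `is_authentic` are assumptions made for the example, and real reports would be the bytes of PDF files.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Hypothetical grade reports issued by the university.
issued_reports = [
    b"Student A: Mathematics 16/20, Physics 14/20",
    b"Student B: Mathematics 12/20, Physics 17/20",
]

# The university publishes only the digests, never the reports themselves.
published_digests = {digest(report) for report in issued_reports}

def is_authentic(report: bytes, published: set) -> bool:
    """True if the digest of the received report was published by the issuer."""
    return digest(report) in published

# The receiving university checks the report a student handed in.
received = b"Student A: Mathematics 16/20, Physics 14/20"
forged = b"Student A: Mathematics 19/20, Physics 14/20"
print(is_authentic(received, published_digests))
print(is_authentic(forged, published_digests))
```

The first check succeeds because the bytes are identical to an issued report; the second fails because a single changed grade changes the digest.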
Concept 2: public key infrastructure (PKI)
Without going into too many details, Public Key Infrastructure (PKI) builds on pairs of asymmetric keys. One key is called the public key and the other is called the private key. The private key can't feasibly be derived from the public key, and if you encrypt data with one key, you can only decrypt that data with the other key. It is important that the private key remains private (usually it is stored on a physical device from which it can't be copied); the public key can be shared with the world.
These keys can be used for two different purposes.
- Encryption: If I want to share information with one specific person, I can ask for that person's public key and encrypt my data with that key. Nobody can decrypt that data, except the person who owns the corresponding private key.
- Digital signing: If I want to share information with the world, I could use my private key to encrypt it. The whole world can use my public key to decrypt that information. If this operation succeeds, the information must have been encrypted with my private key.
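Both uses of a key pair can be illustrated with textbook RSA on tiny numbers. This is a toy sketch under loud assumptions: the article does not name an algorithm, the primes are far too small to be secure, and real systems add padding and use vetted cryptographic libraries. The helper `apply_key` is a name invented for this example.

```python
# Toy RSA key pair built from two small primes (insecure, illustration only).
p, q = 61, 53
n = p * q                    # modulus shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)
private_key = (d, n)

def apply_key(value: int, key: tuple) -> int:
    """Textbook RSA: raise value to the key's exponent modulo n."""
    exponent, modulus = key
    return pow(value, exponent, modulus)

message = 65  # must be smaller than n in this toy example

# Encryption: anyone encrypts with the public key,
# only the owner of the private key can decrypt.
ciphertext = apply_key(message, public_key)
assert apply_key(ciphertext, private_key) == message

# Digital signing: the owner applies the private key,
# anyone can reverse the operation with the public key.
signature = apply_key(message, private_key)
assert apply_key(signature, public_key) == message
```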
Asymmetric encryption has a huge disadvantage: it is slow, it can only process a small amount of data at a time, and the keys and the encrypted output grow as you make the encryption more secure. However, we can fix this if we involve hashing.
Concept 3: digital signing
When you want to share a document that is digitally signed, you have to provide three things.
- The document bytes as-is.
- A public certificate that is issued by a trusted party. This certificate contains a public key.
- A digest of the document, encrypted with the signer's private key.
Whoever receives these three elements can validate the digitally signed document in three steps.
- Create a digest from the document bytes: hash1.
- Decrypt the encrypted digest using the public key found in the certificate: hash2.
- If hash1 equals hash2, the document is OK.
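The three validation steps can be sketched end to end as follows. Everything specific here is an assumption for illustration: SHA-256 as the hash, textbook RSA without padding as the asymmetric algorithm, and a key pair built from two Mersenne primes that are easy to write down but not secure. Real signatures use padded schemes and keys issued through a CA.

```python
import hashlib

# Toy key pair (insecure): the primes were picked only because they are
# easy to write down and give a modulus larger than a SHA-256 digest.
p, q = 2**521 - 1, 2**607 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (Python 3.8+)

def digest_as_int(document: bytes) -> int:
    """SHA-256 digest of the document, interpreted as an integer."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big")

def sign(document: bytes) -> int:
    """Signer: hash the document, then encrypt the digest with the private key."""
    return pow(digest_as_int(document), d, n)

def verify(document: bytes, signature: int, public_key: tuple) -> bool:
    """Receiver: the three validation steps."""
    pub_e, pub_n = public_key
    hash1 = digest_as_int(document)        # step 1: digest of the received bytes
    hash2 = pow(signature, pub_e, pub_n)   # step 2: decrypt the signed digest
    return hash1 == hash2                  # step 3: compare

invoice = b"Pay 30,000 Euro to account 111-1111111-11"
tampered = b"Pay 30,000 Euro to account 999-9999999-99"
signature = sign(invoice)

print(verify(invoice, signature, (e, n)))
print(verify(tampered, signature, (e, n)))
```

Only the short digest goes through the asymmetric operation, so the cost of signing does not depend on the size of the document.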
Do these procedures meet our requirements of integrity, authenticity, and non-repudiation?
How Can We Trust the Information in the Public Certificate?
If the digest stored in the document matches with the digest computed on the fly based on the document bytes, the integrity is assured on condition that the digest was successfully decrypted using the public key. We know that the digest was successfully decrypted if both digests match, which in turn authenticates the owner of the corresponding private key as the author of the signature. This author can't deny that he signed the document unless he can prove that his private key was stolen.
There's only one uncertainty left: how can we trust the information in the public certificate? How can we verify the identity of the owner of the private key? How can we check that the certificate wasn't revoked (for instance: because the owner reported the private key as stolen)?
All of these questions are answered by a Certificate Authority (CA). A CA only issues certificates to parties whose identity has been thoroughly checked. The CA will also maintain a database of all the public certificates that were issued, including information about certificates that were revoked.
How Do I Validate a Digital Signature?
It wouldn't be very convenient to ship a public certificate and an encrypted digest along with a document as separate files. Neither would it be convenient for people to validate a digital signature step by step. PDF solves this problem by storing all the validation-related information (VRI) inside the document itself. Validation of the digital signature happens automatically in the PDF viewer.
Figure 1 shows how this is done.
The part marked in blue gives you an idea of what the syntax of a PDF document looks like. The part in pink is reserved for the digital signature. This signature signs all the bytes of the PDF document (the byte range) except the signature itself (which isn't part of the byte range).
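In a signed PDF the byte range is recorded as pairs of offsets and lengths, and the digest is computed over exactly those segments. The sketch below mimics that with a made-up byte string instead of a real PDF file; the function name `digest_over_byte_range`, the sample content, and the use of SHA-256 are assumptions for illustration.

```python
import hashlib

def digest_over_byte_range(pdf_bytes: bytes, byte_range: list) -> str:
    """Hash only the segments listed in byte_range as offset/length pairs."""
    hasher = hashlib.sha256()
    for offset, length in zip(byte_range[0::2], byte_range[1::2]):
        hasher.update(pdf_bytes[offset:offset + length])
    return hasher.hexdigest()

# Stand-in for a signed file: content, a reserved gap that will hold
# the signature, and more content.
before = b"%PDF-1.7 ...document content before the signature..."
gap = b"<" + b"0" * 32 + b">"
after = b"...document content after the signature... %%EOF"
pdf_bytes = before + gap + after

# Two segments: everything before the gap and everything after it.
byte_range = [0, len(before), len(before) + len(gap), len(after)]
signed_digest = digest_over_byte_range(pdf_bytes, byte_range)

# Writing the signature into the gap does not change the digest...
with_signature = before + b"<" + b"A" * 32 + b">" + after
assert digest_over_byte_range(with_signature, byte_range) == signed_digest

# ...but changing any byte inside the byte range does.
altered = before.replace(b"content", b"CONTENT") + gap + after
assert digest_over_byte_range(altered, byte_range) != signed_digest
```

Excluding the gap is what makes it possible to store the signature inside the same file it signs.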
Figure 2 shows which information can be stored inside the digital signature.
The minimum requirements consist of a signed message digest and a public certificate that contains the identity of the signer. Best practices also involve the following.
- The complete certificate chain that links the root certificate of the CA to the public certificate.
- Revocation information that states whether the certificate was valid at a certain point in time.
- A timestamp that irrefutably defines the time the signature was created.
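How these three pieces of information work together can be modelled with a small data structure. This is a simplified sketch, not how a PDF viewer implements it: the `Certificate` class, its fields, and the example names are hypothetical, and the cryptographic check of each certificate's own signature is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Certificate:
    subject: str
    issuer: str                             # subject of the issuing certificate
    valid_from: datetime
    valid_to: datetime
    revoked_at: Optional[datetime] = None   # None means never revoked

def chain_is_trustworthy(chain: list, trusted_roots: set, signed_at: datetime) -> bool:
    """chain[0] is the signer's certificate; each next entry issued the previous one."""
    for cert, issuer in zip(chain, chain[1:]):
        if cert.issuer != issuer.subject:
            return False     # broken link in the chain
    if not chain or chain[-1].subject not in trusted_roots:
        return False         # chain does not end at a trusted root
    for cert in chain:
        if not (cert.valid_from <= signed_at <= cert.valid_to):
            return False     # expired or not yet valid at signing time
        if cert.revoked_at is not None and cert.revoked_at <= signed_at:
            return False     # already revoked when the signature was made
    return True

# Hypothetical certificates: a root CA and a signer whose key was reported stolen.
root = Certificate("Example Root CA", "Example Root CA",
                   datetime(2010, 1, 1), datetime(2030, 1, 1))
signer = Certificate("Example Signer", "Example Root CA",
                     datetime(2013, 1, 1), datetime(2016, 1, 1),
                     revoked_at=datetime(2014, 6, 1))
trusted = {"Example Root CA"}

# The timestamp decides the outcome: a signature made before the revocation
# remains trustworthy, one made afterwards does not.
print(chain_is_trustworthy([signer, root], trusted, datetime(2014, 1, 1)))
print(chain_is_trustworthy([signer, root], trusted, datetime(2015, 1, 1)))
```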
Figure 3 shows what a correctly signed PDF document looks like in Adobe Reader.
If you want to create documents that are digitally signed as shown in figure 3, you need software that performs the steps described in this article, as well as certificates issued by a CA that is trusted by Adobe.
iText Software Group is a technology partner of GlobalSign. iText delivers PDF tools that enable people to work smarter with digital documents. With iText you can make documents usable and intelligent, control access and changes made to documents, digitally sign documents, ensure long-term digital archiving, and much more, pushing the limits of digital document interactivity. iText has been free/open source software since 2000, offering the best-documented, best-performing and most feature-rich open source PDF engine available in Java and C#.
Interested in digital signatures for the cloud? Watch our joint webinar with iText on YouTube: 'Digital Signatures for the Cloud: A B2C Case Study'