Documentation

Getting started

gopdfrab has a small, predictable API: open a document, run a verification profile, read the issues.

Install

Add gopdfrab to your module.

terminal
go get github.com/voidrab/gopdfrab

Open a document

Open parses the file and builds the structures needed for verification. Always close the document when you're done with it.

main.go
doc, err := pdfrab.Open(path)
if err != nil {
	log.Fatal(err)
}
defer doc.Close()

Verify PDF/A

doc.Verify runs the profile you pass it, here the full PDF/A-1b profile PDFA_1B, and returns a Result with a Valid flag and a list of issues.

verify.go
v, err := doc.Verify(pdfrab.PDFA_1B)
if err != nil {
	log.Fatal(err)
}

if v.Valid {
	fmt.Println("Document is PDF/A-1b compliant")
} else {
	fmt.Println("Issues:")
	for i, issue := range v.Issues {
		fmt.Printf("#%v: %v\n", i+1, issue)
	}
}

The package-level Verify opens, verifies, and closes a file in one call.

verify.go
// Verify opens, verifies, and closes a file in one call
result, err := pdfrab.Verify(path, pdfrab.PDFA_1B)
if err != nil {
	log.Fatal(err)
}
fmt.Println(result.Valid)

VerifyBytes is Verify for an in-memory PDF.

verify.go
// VerifyBytes is Verify for an in-memory PDF
result, err := pdfrab.VerifyBytes(data, pdfrab.PDFA_1B)

VerifyAll opens, verifies, and closes a batch of files concurrently.

verify_all.go
results, err := pdfrab.VerifyAll(paths, pdfrab.PDFA_1B)
if err != nil {
	log.Fatal(err)
}
for _, r := range results {
	if r.Err != nil {
		log.Println(r.Path, r.Err)
		continue
	}
	fmt.Println(r.Path, r.Result.Valid)
}
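
How VerifyAll schedules its work is not described here. As a rough illustration of the per-file result pattern that the loop above consumes, the following sketch runs a stand-in verify function over a batch with a bounded number of goroutines and keeps results in input order. The names fileResult, verifyFunc, runBatch, and fakeVerify are hypothetical and are not part of gopdfrab.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fileResult is a hypothetical stand-in for one entry of a batch result.
type fileResult struct {
	Path  string
	Valid bool
	Err   error
}

// verifyFunc is a placeholder for whatever verifies a single file.
type verifyFunc func(path string) (bool, error)

// runBatch calls verify for each path using at most workers goroutines.
// Results come back in the same order as paths. A failure on one file is
// recorded in that file's entry and does not stop the others.
func runBatch(paths []string, workers int, verify verifyFunc) []fileResult {
	if workers < 1 {
		workers = 1
	}
	results := make([]fileResult, len(paths))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // wait for a free slot
		go func(i int, p string) {
			defer wg.Done()
			defer func() { <-sem }()
			ok, err := verify(p)
			// Each goroutine writes only its own index, so no lock is needed.
			results[i] = fileResult{Path: p, Valid: ok, Err: err}
		}(i, p)
	}
	wg.Wait()
	return results
}

// fakeVerify is sample behaviour for illustration only.
func fakeVerify(path string) (bool, error) {
	if path == "missing.pdf" {
		return false, errors.New("open failed")
	}
	return path == "good.pdf", nil
}

func main() {
	paths := []string{"good.pdf", "missing.pdf", "bad.pdf"}
	for _, r := range runBatch(paths, 2, fakeVerify) {
		if r.Err != nil {
			fmt.Println(r.Path, "error:", r.Err)
			continue
		}
		fmt.Println(r.Path, r.Valid)
	}
}
```

Recording the error per entry, rather than aborting the batch, is what lets a caller skip one unreadable file and still report on the rest.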

Convert to PDF/A

Convert produces a PDF/A-conformant rewrite of the input. It runs pre-emptive fixups, then a verify/fix loop, and rasterizes pages as a last resort when no in-place fixer can repair them.

convert.go
cr, err := pdfrab.Convert(path, pdfrab.PDFA_1B)
if err != nil {
	log.Fatal(err)
}

if err := os.WriteFile("out.pdf", cr.Output, 0o644); err != nil {
	log.Fatal(err)
}

fmt.Println(cr.Iterations)      // how many verify/fixup passes it took
fmt.Println(cr.Result.Valid)    // true if the output is fully PDF/A conformant
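
The control flow described for Convert (pre-emptive fixups, a verify/fix loop, then a last resort) can be pictured with a toy model. This is only a sketch of that flow under simplifying assumptions, not gopdfrab's implementation: toyDoc, convertLoop, remove, and the problem names are all hypothetical, and a "document" here is just a set of outstanding problems.

```go
package main

import (
	"fmt"
	"sort"
)

// toyDoc is a stand-in document: just a set of outstanding problems.
type toyDoc struct{ problems map[string]bool }

// verify returns the remaining problems in a stable order.
func (d *toyDoc) verify() []string {
	out := make([]string, 0, len(d.problems))
	for p := range d.problems {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// convertLoop runs the pre-emptive fixups once, then verifies and fixes
// until the document is clean, no fixer applies, or maxIter passes have run.
// If problems remain, lastResort (standing in for rasterization) gets one
// attempt. It returns the number of verify/fix passes and what is still broken.
func convertLoop(d *toyDoc, pre []func(*toyDoc), fixers map[string]func(*toyDoc),
	lastResort func(*toyDoc), maxIter int) (int, []string) {
	for _, fix := range pre {
		fix(d)
	}
	passes := 0
	for passes < maxIter {
		issues := d.verify()
		passes++
		if len(issues) == 0 {
			return passes, nil
		}
		progressed := false
		for _, name := range issues {
			if fix, ok := fixers[name]; ok {
				fix(d)
				progressed = true
			}
		}
		if !progressed {
			break // nothing left that an in-place fixer can repair
		}
	}
	if len(d.verify()) > 0 && lastResort != nil {
		lastResort(d)
	}
	return passes, d.verify()
}

// remove builds a fixer that clears one named problem.
func remove(name string) func(*toyDoc) {
	return func(d *toyDoc) { delete(d.problems, name) }
}

// newToyDoc returns sample data for illustration.
func newToyDoc() *toyDoc {
	return &toyDoc{problems: map[string]bool{
		"font-not-embedded": true, "missing-pdfa-id": true, "soft-mask": true,
	}}
}

func main() {
	fixers := map[string]func(*toyDoc){
		"font-not-embedded": remove("font-not-embedded"),
		"missing-pdfa-id":   remove("missing-pdfa-id"),
	}
	passes, residual := convertLoop(newToyDoc(), nil, fixers, remove("soft-mask"), 5)
	fmt.Println(passes, len(residual)) // prints: 2 0
}
```

In this model the pass count plays the role of Iterations, and the returned slice corresponds to issues that survived every remediation step.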

Conversion is also available on an already-open document, on in-memory data, and across a batch of files. In every case, the result exposes any residual issues that survived all remediation passes.

convert_variants.go
// Converting an open document
cr, err := doc.Convert(pdfrab.PDFA_1B)

// Converting in-memory data
cr, err = pdfrab.ConvertBytes(data, pdfrab.PDFA_1B)

// Converting multiple files concurrently
results, err := pdfrab.ConvertAll(paths, pdfrab.PDFA_1B)
if err != nil {
	log.Fatal(err)
}
for _, r := range results {
	if r.Err != nil {
		log.Println(r.Path, r.Err)
		continue
	}
	fmt.Println(r.Path, r.Result.Result.Valid) // r.Result is a ConvertResult
}

// Inspecting residual issues after conversion
residual := cr.Residual()
for _, iss := range residual {
	c := iss.Check()
	fmt.Println(c.Clause(), c.Name())
	fmt.Println(iss.Page(), iss.Messages())
}

Selective check profiles

Narrow verification to the rules you care about. Start from the full PDFA_1B profile and remove checks, or start from an empty profile and add only what you need.

profiles.go
// Start from the full profile and remove checks
p := pdfrab.PDFA_1B.
	RemoveCheck(pdfrab.Checks.Structure.FileHeaderSignature).
	RemoveCheck(pdfrab.Checks.Font.SimpleNotEmbedded)

res, err := doc.Verify(p)

// Or start from an empty profile and add only what you need
p2 := pdfrab.PDFA_1B.Clear().
	AddCheck(
		pdfrab.Checks.Transparency.ImageWithSoftMask,
		pdfrab.Checks.Metadata.PDFAIdentifierMissing,
	)

res2, err := doc.Verify(p2)
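
The chained calls above read as though each call returns a new profile and leaves PDFA_1B itself untouched. That behaviour is not stated here, so treat it as an assumption. A minimal sketch of such a copy-on-change design, using hypothetical profile and check types rather than gopdfrab's own:

```go
package main

import (
	"fmt"
	"sort"
)

// check and profile are hypothetical stand-ins, not gopdfrab types.
type check string

type profile struct{ checks map[check]bool }

func newProfile(cs ...check) profile {
	p := profile{checks: make(map[check]bool, len(cs))}
	for _, c := range cs {
		p.checks[c] = true
	}
	return p
}

// clone copies the set so that derived profiles never share state.
func (p profile) clone() profile {
	q := profile{checks: make(map[check]bool, len(p.checks))}
	for c := range p.checks {
		q.checks[c] = true
	}
	return q
}

// RemoveCheck returns a copy of p without the given checks.
func (p profile) RemoveCheck(cs ...check) profile {
	q := p.clone()
	for _, c := range cs {
		delete(q.checks, c)
	}
	return q
}

// AddCheck returns a copy of p with the given checks added.
func (p profile) AddCheck(cs ...check) profile {
	q := p.clone()
	for _, c := range cs {
		q.checks[c] = true
	}
	return q
}

// Clear returns an empty profile; p is left unchanged.
func (p profile) Clear() profile { return newProfile() }

// Names lists the enabled checks in sorted order.
func (p profile) Names() []string {
	out := make([]string, 0, len(p.checks))
	for c := range p.checks {
		out = append(out, string(c))
	}
	sort.Strings(out)
	return out
}

func main() {
	full := newProfile("file-header", "font-embedded", "soft-mask")
	narrowed := full.RemoveCheck("file-header").RemoveCheck("font-embedded")
	only := full.Clear().AddCheck("soft-mask", "pdfa-id")

	fmt.Println(full.Names())     // the base profile still has all three checks
	fmt.Println(narrowed.Names()) // only soft-mask remains
	fmt.Println(only.Names())     // exactly the two checks that were added
}
```

Returning copies means a shared base profile can be narrowed in several places without one caller's changes leaking into another's.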

Checks are grouped by spec area in the Checks registry:

Registry field        Spec area
Checks.Structure      6.1.x file header, trailer, xref, object framing, limits
Checks.Colour         6.2.2 OutputIntent, 6.2.3.x device colours, 6.2.9–6.2.10
Checks.Image          6.2.4–6.2.7 image/form/PostScript XObjects
Checks.Transparency   6.2.8 transfer functions, 6.4 soft masks/blend modes/alpha
Checks.Font           6.3.x embedding, subsets, metrics, encoding
Checks.Annotation     6.5.x annotation types and dictionaries
Checks.Action         6.6.x action types and additional actions
Checks.Metadata       6.7.x XMP metadata, extension schemas, PDF/A identifier
Checks.Form           6.9 interactive forms

AllChecks() enumerates every registered check, and CheckByClause / ChecksForClause look checks up directly by clause.

lookup.go
pdfrab.AllChecks()                       // every registered check, with names, descriptions, clauses
pdfrab.CheckByClause("6.3.4", 1)          // a single check by clause + index
pdfrab.ChecksForClause("6.3.4")           // all checks registered under a clause

Inspecting issues

Each PDFError exposes the Check that flagged it, along with its page and underlying messages. Result has helpers for grouping and summarizing issues.

diagnostics.go
for _, issue := range v.Issues {
	c := issue.Check()
	fmt.Println(c.Clause(), c.Subclause(), c.Name(), c.Description())
	fmt.Println(issue.Page(), issue.Messages())
}

fmt.Println(v.Summary())   // human-readable report, one line per Check
v.Checks()                 // distinct Checks violated, sorted by clause
v.IssuesByCheck()          // map[Check][]PDFError
v.IssuesOnPage(1)          // issues found on page 1 (0 = document-level)
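
How the library orders clauses is not specified here. One detail worth noting when building your own reports is that a plain string sort places "6.2.10" before "6.2.9". The sketch below groups issues by clause, orders the clauses numerically part by part, and filters by page. The issue type, the helper names, and the sample data are hypothetical and only mirror the shape of the helpers above.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// issue is a hypothetical flattened view of a reported problem.
type issue struct {
	Clause string
	Page   int // 0 means document-level, mirroring the convention above
	Msg    string
}

// clauseLess reports whether clause a sorts before clause b, comparing each
// dotted part as a number so that "6.2.9" comes before "6.2.10".
func clauseLess(a, b string) bool {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		x, errX := strconv.Atoi(as[i])
		y, errY := strconv.Atoi(bs[i])
		if errX != nil || errY != nil {
			// Non-numeric part: fall back to string comparison.
			if as[i] != bs[i] {
				return as[i] < bs[i]
			}
			continue
		}
		if x != y {
			return x < y
		}
	}
	return len(as) < len(bs) // "6.2" sorts before "6.2.1"
}

// groupByClause buckets issues by clause and returns the clauses in order.
func groupByClause(issues []issue) (map[string][]issue, []string) {
	groups := make(map[string][]issue)
	for _, is := range issues {
		groups[is.Clause] = append(groups[is.Clause], is)
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return clauseLess(keys[i], keys[j]) })
	return groups, keys
}

// onPage keeps only the issues reported for the given page.
func onPage(issues []issue, page int) []issue {
	var out []issue
	for _, is := range issues {
		if is.Page == page {
			out = append(out, is)
		}
	}
	return out
}

// sampleIssues is made-up data for illustration.
var sampleIssues = []issue{
	{"6.3.4", 2, "sample message"},
	{"6.2.10", 1, "sample message"},
	{"6.2.9", 0, "sample message"},
	{"6.2.10", 2, "sample message"},
}

func main() {
	groups, keys := groupByClause(sampleIssues)
	for _, k := range keys {
		fmt.Println(k, len(groups[k]))
	}
	fmt.Println("page 2:", len(onPage(sampleIssues, 2)))
}
```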

Document helpers

A few shortcuts on an open document for the most common questions: whether it's PDF/A compliant, what conformance it claims, and its raw XMP metadata.

helpers.go
ok, err := doc.IsPDFA()           // shorthand for Verify(PDFA_1B).Valid

part, level, err := doc.ClaimedConformance() // e.g. "1", "B" -- what the file claims, not whether it's valid

xmp, err := doc.XMPMetadata()     // raw XMP packet bytes, decoded to UTF-8
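
For context on what a claimed conformance is: PDF/A files declare it in their XMP metadata through the PDF/A identification schema, as pdfaid:part and pdfaid:conformance in the namespace http://www.aiim.org/pdfa/ns/id/. Whether ClaimedConformance reads exactly these properties is an assumption. The sketch below extracts them from a raw packet using only the standard library; the function name claimedConformance and the sample packet are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// pdfaidNS is the namespace of the PDF/A identification schema.
const pdfaidNS = "http://www.aiim.org/pdfa/ns/id/"

// claimedConformance scans an XMP packet for pdfaid:part and
// pdfaid:conformance. Both the element form and the attribute form of the
// properties are accepted. It reports what the file claims, nothing more.
func claimedConformance(xmp []byte) (string, string, error) {
	var part, level string
	set := func(name, value string) {
		switch name {
		case "part":
			part = value
		case "conformance":
			level = value
		}
	}
	dec := xml.NewDecoder(bytes.NewReader(xmp))
	current := "" // local name of the pdfaid element being read, if any
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			for _, a := range t.Attr { // attribute form
				if a.Name.Space == pdfaidNS {
					set(a.Name.Local, a.Value)
				}
			}
			current = ""
			if t.Name.Space == pdfaidNS { // element form
				current = t.Name.Local
			}
		case xml.CharData:
			if s := strings.TrimSpace(string(t)); current != "" && s != "" {
				set(current, s)
			}
		case xml.EndElement:
			current = ""
		}
	}
	if part == "" {
		return "", "", errors.New("no pdfaid:part found")
	}
	return part, level, nil
}

// sample is a made-up, minimal XMP packet for illustration.
const sample = `<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
 <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="" xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
   <pdfaid:part>1</pdfaid:part>
   <pdfaid:conformance>B</pdfaid:conformance>
  </rdf:Description>
 </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>`

func main() {
	part, level, err := claimedConformance([]byte(sample))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(part, level)
}
```

A claim found this way says nothing about validity; only a verification run can establish that.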

Legacy Isartor profile

The Isartor test suite is an older reference test suite for PDF/A-1b conformance that predates the veraPDF project. If your application needs conformance judged against Isartor specifically, use the Legacy_1B profile instead of the default PDFA_1B.

legacy.go
// Verify against the legacy Isartor-derived profile instead of the
// veraPDF-aligned default
v, err := doc.Verify(pdfrab.Legacy_1B)