Purpose
This documentation explains the behavior and impact of dynamic fields, specifically the Table of Contents (TOC), cross-references, bookmarks, hyperlinks, and page numbers, during Word-to-PDF conversion using Aspose.Words in our backend services (convert to pdf lambda).
It includes an in-depth analysis of field update logic, performance implications, and actual results from test conversions.
Background
When Word documents are converted to PDF, they may contain dynamic fields such as:
- Table of Contents (or another type of dynamic content table, such as a Table of Figures)
- Cross-references
- Internal bookmarks
- Page numbers
- External hyperlinks
These fields are often generated or controlled via Word's dynamic field system (e.g., TOC, REF, HYPERLINK, PAGE). By default, their contents are not updated automatically unless the user explicitly triggers an update (e.g., right-click > "Update Field" in Word). This leads to two possible approaches during PDF conversion: refresh the fields as part of the conversion, or preserve them as stored in the uploaded file.
Current Behavior in Production (29.3.0)
In our current implementation, we explicitly call:
doc.UpdateFields();
This refreshes all dynamic fields before saving the document to PDF. We also use:
new PdfSaveOptions { UpdateFields = false };
This ensures:
- TOC texts (headings) and page numbers are regenerated
- Cross-references are fully re-evaluated
- Any broken or outdated references get surfaced as ## Error if the source no longer exists
- Page number fields update based on final pagination
This results in the PDF reflecting the most up-to-date field state, regardless of whether the original .doc/.docx had those fields manually updated.
This behavior now applies not only to the Table of Contents but also to other types of dynamic reference tables: Table of Figures, Table of Authorities, and Index.
Bibliographies are not automatically updated during conversion.
Problem Statement
While technically correct, this behavior has led to client confusion. Users often upload Word documents without manually updating fields like TOC or cross-references. After conversion:
- The PDF shows updated TOC entries and page numbers
- But the original Word document (as downloaded later) shows outdated or mismatched content
This discrepancy has triggered multiple Zendesk tickets from clients expecting WYSIWYG fidelity (PDF should match what they saw in Word).
Change in Next Release (29.4.0)
To align with user expectations and reduce confusion, we remove the explicit call to:
doc.UpdateFields();
And change the save options flag to true:
new PdfSaveOptions { UpdateFields = true };
This approach allows:
- Page numbers in the Table of Contents and Table of Figures to be updated automatically during export. Page numbers in the Table of Authorities and Index are not updated automatically.
- Only TOC page numbers to be surfaced as "Error! Bookmark not defined." if the source no longer exists.
- All other fields (TOC text, captions inside the Table of Figures, cross-reference labels, etc.) to remain as-is, matching the state in the uploaded .doc/.docx file.
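The two approaches can be put side by side in a minimal sketch. It is not the actual lambda code: the class name, method name, and the `legacyFullUpdate` switch are hypothetical and exist only for illustration. The only Aspose.Words members it relies on are `Document`, `Document.UpdateFields()`, `PdfSaveOptions.UpdateFields`, and `Document.Save(string, SaveOptions)`.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Hypothetical helper for illustration only; names are assumptions,
// not the real convert-to-PDF lambda implementation.
public static class PdfConversionSketch
{
    // legacyFullUpdate = true  -> behavior described for 29.3.0
    // legacyFullUpdate = false -> behavior described for 29.4.0
    public static void ConvertToPdf(string inputPath, string outputPath, bool legacyFullUpdate)
    {
        var doc = new Document(inputPath);

        if (legacyFullUpdate)
        {
            // 29.3.0: explicitly refresh the fields in the document before export.
            doc.UpdateFields();
        }

        var saveOptions = new PdfSaveOptions
        {
            // 29.3.0: false, because the fields were already refreshed above.
            // 29.4.0: true, so the save step itself handles the field update.
            UpdateFields = !legacyFullUpdate
        };

        doc.Save(outputPath, saveOptions);
    }
}
```

Calling `PdfConversionSketch.ConvertToPdf("input.docx", "output.pdf", legacyFullUpdate: false)` would correspond to the 29.4.0 configuration; the file names here are placeholders.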
Test Results
Several files were used for validation:
Document 1: TOC and other cross-references
Original Word Document: Dynamic_TOC_Test_Document.docx
Converted PDF: MG-1-1 Mihai Test doc 1 (v1.0).pdf
Contents of the Test Document
- A manually-inserted TOC
- Section headings across 5 pages
- One cross-reference
- One bookmark
- One external hyperlink
- Header and Footer with dynamic page numbers
Document 2: Table of Figures
Original Word Document: Document Figures.docx
Converted PDF: MIG2-127 Document Figures (v1.0).pdf
Contents of the Test Document
- Captions for the images in the document (References > Insert Caption)
- A manually-inserted Table of Figures (not updated)
Document 3: Table of Authorities
Original Word Document: Document Table of Authorities.docx
Converted PDF: MIG2-128 Document Table Authorities (v1.0).pdf
Contents of the Test Document
- Sequences of text marked as citations (References > Insert citation in Word)
- To easily identify them in the doc: go to Home > click the paragraph icon, then search for "\ c".
- The text marked as a citation is the text between { }
- A manually inserted Table of Authorities (not updated)
Document 4: Index
Original Word Document: Document Index.docx
Converted PDF: MIG2-130 Document Index (v1.0).pdf
Contents of the Test Document
- Sequences of text marked as index entries (References > Mark Entry in Word)
- To easily identify them in the doc: go to Home > click the paragraph icon, then search for "\ c".
- The text marked as an index entry is the text between { }
- A manually inserted Index (not updated)
Observed Behavior in PDF
Feature | Result | Explanation |
---|---|---|
TOC Text | Reflects original doc/docx TOC | TOC is preserved 1:1 with Word in the PDF |
TOC Page Numbers | Correct | Updated using PdfSaveOptions.UpdateFields = true |
Captions inside Table of Figures | Reflects original doc/docx Table of Figures | Table of Figures is preserved 1:1 with Word in the PDF |
Table of Figures Page Numbers | Correct | Updated using PdfSaveOptions.UpdateFields = true |
Table of Authorities text (non-clickable) | Reflects values in the original doc/docx | Non-clickable references |
Table of Authorities Page Numbers | Reflects values in the original doc/docx | Non-clickable references |
Index text | Reflects values in the original doc/docx | Non-clickable references |
Index Page Numbers | Reflects values in the original doc/docx | Non-clickable references |
Cross-reference Text | Updated | Reference to section is resolved correctly |
Cross-reference Link | Clickable | Internal navigation works |
Bookmark Targeting | Correct | Clicking anchor navigates as expected |
External Hyperlink | Clickable in browser | URL retained properly |
Footer Page Numbers | Accurate | PAGE field rendered per PDF layout |
Header Page Numbers | Accurate | PAGE field rendered per PDF layout |
Benefits of Change
Area | Before (With Update Fields) | After (Without Update Fields) |
---|---|---|
Visual Match with Word | PDF may differ | 1:1 match |
Client Understanding | Confusing | Aligned |
Field Consistency | Always fresh | |
Broken Field Warnings | Possible ## Error | Preserved as-is |
Performance | Slower on large docs | Faster |
Known Limitations & Considerations
If the user forgets to update TOC or references manually in Word, the PDF will preserve those stale values.
If a section (e.g., Section Five) is removed, the page numbers are still updated by this option:
new PdfSaveOptions { UpdateFields = true };
The removed section then appears in the Table of Contents as "Error! Bookmark not defined." This is intentional.
If complete 1:1 parity must be preserved, the flag must be set to false; however, the full impact of page numbers not updating is not yet known. Note that Microsoft Word itself does not let you update only the page numbers in such cases, only the full TOC.
- Bookmarks and hyperlinks do not require field updates and continue to function correctly.
- Cross-references with broken anchors may remain silently incorrect, the same as in prod (29.3.0).
- Headers/footers that use PAGE or NUMPAGES fields are updated correctly.
- Behavior in Collaborative Editing appears to match that of the Microsoft Word (Office 365) application. If any discrepancies appear, it is up to Microsoft to keep the applications in sync with each other.
- Page numbers inside the Table of Authorities and Index do not update automatically. However, unlike the Table of Contents and Table of Figures, they are not clickable, so they are less likely to cause confusion or errors when users navigate the document.