We're using Apache POI to manipulate the content of some Word documents. There are other ways to do it, but, on the whole, Apache POI works reasonably well for a nominally free solution. We've hit a use case that can be summarised by a simple question: does this Word document contain a (Word-generated) table of contents (TOC)? You would think that that is a reasonably uncontroversial question, perhaps even one commonly asked. Apparently it is not.
Background
The background here is that I know nothing about TOC generation in Word beyond what I've been able to deduce from examining Word's behaviour and trawling the content of word/document.xml. I gather that Word inserts a field instruction of some kind, but also renders static content into the file. That is, there's a marker saying 'there is a TOC in this document', but the TOC content itself is also rendered. It seems that instead of dynamically generating the TOC content (say, every time the document is changed), Word generates it once and then updates it only on a manual re-generation. So the problem we're facing is:
- A document has a TOC.
- We make changes to the body content: say, removing an entire section.
- The TOC is now stale, and instead of automatically refreshing it, Word inserts error messages at print time.
A basically satisfactory workaround in our case is to call enforceUpdateFields() on the document prior to save, which signals to Word to show a dialog on next load asking whether to update the fields in the document.
Again, this isn't ideal, but it is satisfactory.
Solution
Apache POI doesn't expose anything useful in its high-level API for detecting an existing TOC. After an exhaustive Google search, and quite a bit of digging around in the lower-level class hierarchies, it wasn't obvious that we could solve this at any level using Java alone.
Inspecting word/document.xml suggested that a field instruction that looked something like this was present in all documents containing TOCs:
TOC \o "1-3" \h \z \u
How about if we get the XML for the document and search for such an element? If we call getDocument() on the XWPFDocument, we get a CTDocument1, which implements XmlObject and provides a selectPath() method to select nodes via an XPath expression. (If you're curious, it took a couple of hours of trial and error to come up with the facts in the preceding sentence!) Firstly, add XMLBeans and Saxon to your POM alongside POI.
(Again, that dependency list represents an hour of fun trying to assemble mutually compatible versions of POI, XMLBeans and Saxon, as well as answering the question 'Do we also need xmlbeans-xpath?' Spoiler: we don't.) Then, with an XWPFDocument called document, find any w:instrText elements, where w is a namespace which we'll also define, and see if any of them contain a magic string.
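As a rough illustration of that check, here is a minimal sketch that does not go through POI at all: it reads word/document.xml straight out of the .docx zip and runs the XPath with the JDK's own XML classes, so it can be tried without settling the dependency question first. That is a swapped-in technique rather than the selectPath() route described above. The class name TocDetector, its method names, and the choice of "TOC" at the start of the instruction as the magic string are all assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper: looks for a TOC field instruction in a .docx file. */
final class TocDetector {
    // The WordprocessingML namespace that the "w" prefix is bound to.
    private static final String W_NS =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Selects every w:instrText element without needing a NamespaceContext.
    private static final String INSTR_TEXT_XPATH =
        "//*[local-name()='instrText' and namespace-uri()='" + W_NS + "']";

    private TocDetector() {}

    /** Returns true if word/document.xml holds an instruction starting with TOC. */
    static boolean hasTocField(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals("word/document.xml")) {
                    // readAllBytes() stops at the end of the current zip entry.
                    return documentXmlHasToc(zip.readAllBytes());
                }
            }
        }
        return false; // no main document part found
    }

    static boolean documentXmlHasToc(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Refuse DOCTYPE declarations, since the input may be untrusted.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document dom = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(INSTR_TEXT_XPATH, dom, XPathConstants.NODESET);
        for (int i = 0; i < nodes.getLength(); i++) {
            if (isTocInstruction(nodes.item(i).getTextContent())) {
                return true;
            }
        }
        return false;
    }

    /** The magic-string test: the instruction text begins with the TOC keyword. */
    static boolean isTocInstruction(String instrText) {
        String t = instrText.trim();
        return t.equals("TOC") || t.startsWith("TOC ") || t.startsWith("TOC\\");
    }
}
```

With POI on the classpath, the same lookup would presumably go through document.getDocument().selectPath() as described above. Either way, the test looks at each w:instrText fragment on its own; Word can split one field instruction across several runs, so a stricter check might need to join fragments first.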
So, it's brute force and depends on a magic string, but it seems to work. Better solutions gladly accepted!