Selecting or referring to a page in the Word object model
Q. I'd like to cycle through all the pages in my Word document and <perform this bit of processing> on each page. How do I do this?
A. You can't.
Word is not page layout software. It's a word processor. It sees text as a scroll. Each document is one long scroll of text.
Word barely knows what a page is.
Word paginates a document by constantly talking to the current printer driver. It uses information from the printer driver to know where to chop up its precious scroll if it were required to force it on to individual bits of paper.
If you change the printer driver, so that the new one can fit just a tiny bit more or less text on the page than the previous driver, then all the pagination will change.
Where a page starts and ends is constantly changing as the user adds or deletes content and as the user changes how the document is viewed.
As one demonstration of how fluid is the concept of a 'page', try doing Alt-F9. It toggles between displaying fields and displaying field results. Try it in a document with a substantial table of contents, or several linked spreadsheet tables from Excel, or a couple of large linked images, or some other fields that generate content that takes up a lot of space. The number of pages in the document, and where each starts and stops, can change dramatically.
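The same experiment can be scripted. What follows is a minimal sketch, assuming the active document contains fields with bulky results; the procedure name is hypothetical, and whether the two counts differ depends on the document, the current view and the printer driver.

```vba
Sub ComparePageCountsWithFieldCodes()
    ' Hypothetical helper: reports the page count with field results
    ' showing, and again with field codes showing (the Alt-F9 toggle).
    Dim bOriginal As Boolean
    Dim nWithResults As Long
    Dim nWithCodes As Long

    With ActiveWindow.View
        bOriginal = .ShowFieldCodes

        .ShowFieldCodes = False
        ActiveDocument.Repaginate
        nWithResults = ActiveDocument.ComputeStatistics(wdStatisticPages)

        .ShowFieldCodes = True
        ActiveDocument.Repaginate
        nWithCodes = ActiveDocument.ComputeStatistics(wdStatisticPages)

        ' Put the view back the way we found it.
        .ShowFieldCodes = bOriginal
    End With

    MsgBox "Pages with field results showing: " & nWithResults & vbCr & _
           "Pages with field codes showing: " & nWithCodes
End Sub
```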
(Don't grumble about this being old technology. I note that even the latest-generation Amazon Kindle has a hazy concept of a 'page' too. It can display a book page-by-page, but there is no way for the Kindle to "go to page 16" because a "page" depends on the user's view settings.)
'But Word can count the number of pages. It must be able to identify an individual page!'
Yes, Word can count the number of pages in a document. Use something like:
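A minimal sketch, assuming the document of interest is the active one (the procedure name is hypothetical):

```vba
Sub ShowPageCount()
    ' wdStatisticPages asks Word for the page count under the
    ' current pagination. The answer is a number, nothing more.
    Dim nPages As Long
    nPages = ActiveDocument.ComputeStatistics(wdStatisticPages)
    MsgBox "This document currently has " & nPages & " page(s)."
End Sub
```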
There's no way to get from the ComputeStatistics method to an individual page.
'But Word has a Pages collection. It must be able to identify an individual page!'
You can do something like the following:
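A minimal sketch of that kind of loop, assuming the document is open in the active window and that the first rectangle on each page is its main text area (the procedure name is hypothetical):

```vba
Sub SelectEachPageViaPagesCollection()
    Dim oPage As Word.Page
    Dim nPage As Long

    ' The Pages collection hangs off a window pane, not the document.
    For Each oPage In ActiveDocument.ActiveWindow.Panes(1).Pages
        nPage = nPage + 1
        ' A Page has no Range of its own, so go through its first
        ' rectangle (assumed here to be the main text area).
        oPage.Rectangles(1).Range.Select
        Debug.Print "Page " & nPage & " selects characters " & _
                    Selection.Start & " to " & Selection.End
    Next oPage
End Sub
```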
"That works!", you say.
Yes, it appears to work the first time you try it. But it works in trivial circumstances only. It gets flummoxed by a table or a field that crosses a page boundary.
Here are some examples of problems.
If you have a table that starts on page 16 but a row in the table flows over onto page 17, then asking the Pages collection for page 16 and selecting its range will select page 16 and that part of the row that appears on page 17.
If a table is very big, starts on page 20 and ends on page 44, and there are rows that break across pages, then doing the same for page 20 will select all the way from page 20 to page 44!
If you have, say, a three-page table of contents starting on page 1, then doing the same for page 1 will select pages 1, 2 and 3.
If your aim was to cycle through each page and perform some kind of processing on each page, then, in this example, your code would have processed each page many times: up to 24 times.
Cycling through the Pages collection to process each page is not a usable pattern in professional work.
'But Word has a "\page" built-in bookmark. It must be able to identify an individual page!'
Word has a built-in bookmark named "\page" that's been in the Word object model a lot longer than the Pages collection. You can use it with code something like this:
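A minimal sketch, assuming the document has at least 16 pages (the page number and the procedure name are hypothetical):

```vba
Sub SelectPageViaBookmark()
    Const TARGET_PAGE As Long = 16   ' hypothetical page number

    ' "\page" refers to the page that contains the selection,
    ' so move the selection to the target page first.
    Selection.GoTo What:=wdGoToPage, Which:=wdGoToAbsolute, _
                   Count:=TARGET_PAGE

    ' Now select whatever Word considers that page to be.
    ActiveDocument.Bookmarks("\page").Range.Select
End Sub
```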
Sadly, this code will produce the same results as the Pages collection when applied to a document that has tables with rows that span two or more pages, or has fields whose results span two or more pages.
The Pages collection just inherited all the problems of the old "\page" bookmark.
So how do I get my task done?
Option 1: Avoid even thinking about pages
Most people who are thinking "I want to cycle through all the pages in the document and …" are thinking about the printed document that the client gave them, or the document that they see, paginated, on the screen. Or they're thinking like InDesign. They're not thinking in Word's terms. Most of the time, it's more useful to say "I want to cycle through the document and…".
To cycle through the document, you may need to use Sections, Stories, Ranges and/or Paragraphs.
Paragraph
The fundamental unit of construction of a Word document is a paragraph. The Paragraphs collection has lots of useful properties and methods and they generally work well.
Range
The Range object is the fundamental way to access any contiguous content in a Word document. A Range object can refer to a single character, or the whole of the main text of the document, or the text of one footnote, or the third word in the first text box in the even pages footer of section 4. The Range object has lots of useful methods and properties and they generally work well.
Paragraphs and Ranges work together nicely. A Paragraph object has a Range property. And a Range object has a Paragraphs property.
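As an illustration of moving between the two, here is a minimal sketch, assuming you only want to report on the main text of the active document (the procedure name is hypothetical):

```vba
Sub ListParagraphLengths()
    Dim oPara As Word.Paragraph
    Dim rng As Word.Range
    Dim nPara As Long

    For Each oPara In ActiveDocument.Paragraphs
        nPara = nPara + 1
        ' Paragraph -> Range: the content of this paragraph,
        ' including its paragraph mark.
        Set rng = oPara.Range
        ' Range -> Paragraphs: any range can tell you which
        ' paragraphs it touches (here, exactly one).
        Debug.Print "Paragraph " & nPara & ": " & Len(rng.Text) & _
                    " character(s), spans " & rng.Paragraphs.Count & _
                    " paragraph(s)"
    Next oPara
End Sub
```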
StoryRanges
There's no Story object and no StoryRange object, but there is a StoryRanges collection.
A Word document is made up of many stories: the main text, headers, footers, footnotes, endnotes, comments, text boxes and so on.
Many properties and methods in Word appear to deal with the whole document, but they don't. They only deal with the main text (for example, ActiveDocument.Fields.Count only counts fields in the main text). If you need to cycle through all the document, you need all the other StoryRanges: headers, footers etc.
To process all your document, you need to cycle through the story ranges. Look up Story in Word's object model help for more. And look up NextStoryRange to see how to cycle through them.
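A minimal sketch of that loop, counting fields in every story rather than just the main text. The procedure name is hypothetical, and the loop may still miss some content (for example, text boxes anchored inside headers or footers), so treat it as a starting point:

```vba
Sub CountFieldsInAllStories()
    Dim rngStory As Word.Range
    Dim nFields As Long

    ' One pass per story type: main text, footnotes, headers...
    For Each rngStory In ActiveDocument.StoryRanges
        ' Some story types come as a chain of linked stories;
        ' NextStoryRange walks along that chain.
        Do
            nFields = nFields + rngStory.Fields.Count
            Set rngStory = rngStory.NextStoryRange
        Loop Until rngStory Is Nothing
    Next rngStory

    MsgBox "Fields in the main text only: " & _
           ActiveDocument.Fields.Count & vbCr & _
           "Fields found across all stories: " & nFields
End Sub
```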
Section
In Word-speak "Section" is a technical term. Every document has at least one Section. You need a new section in a document if you want to change any of the things that are properties of a section: margins, orientation, number of columns and so on.
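Cycling through the sections looks much the same as cycling through paragraphs. A minimal sketch, assuming you just want to see each section's orientation and size (the procedure name is hypothetical):

```vba
Sub ListSectionOrientation()
    Dim oSec As Word.Section
    Dim sOrientation As String

    For Each oSec In ActiveDocument.Sections
        ' Orientation is one of the settings that belongs to a section.
        If oSec.PageSetup.Orientation = wdOrientLandscape Then
            sOrientation = "landscape"
        Else
            sOrientation = "portrait"
        End If
        Debug.Print "Section " & oSec.Index & ": " & sOrientation & _
                    ", " & oSec.Range.Paragraphs.Count & " paragraph(s)"
    Next oSec
End Sub
```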
So, to cycle through all of a document, cycle through the story ranges, the sections, and/or the paragraphs. Set a Range to the bit of the document you want to work with, and perform the processing.
This works 99.9% of the time, unless Geelong beat Collingwood last week, in which case all bets are off and you should be down at the pub.
Option 2: If you really do need to process what Word currently thinks is a printed page
In very rare circumstances, you really do need to deal with printed pages. In my experience it is very very very rare. Think once each year or so of full-time Word development.
If this is the case, use either the built-in bookmark "\page" or the Pages collection. (I find the bookmark easier to deal with.) You can then use .Range.Information(wdActiveEndPageNumber) to keep track of which bits of the document you've processed and avoid processing each page more than once. Be careful about whether fields are displaying field codes or field results. Be careful about whether the document is displaying, or hiding, hidden text. And expect the processing to take a long time because of the extra work your code has to do.
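One way this might be put together is sketched below. It assumes the processing does not itself change the pagination. It also tracks character positions as well as page numbers, which goes beyond what is described above. The procedure name is hypothetical, and the Debug.Print line stands in for your real processing.

```vba
Sub ProcessEachPageOnce()
    Dim rngPage As Word.Range
    Dim nPages As Long
    Dim nPage As Long
    Dim nLastEnd As Long   ' end of the content already processed

    nPages = ActiveDocument.ComputeStatistics(wdStatisticPages)

    For nPage = 1 To nPages
        Selection.GoTo What:=wdGoToPage, Which:=wdGoToAbsolute, _
                       Count:=nPage
        Set rngPage = ActiveDocument.Bookmarks("\page").Range

        ' Skip anything an earlier, over-long "page" already covered.
        If rngPage.End > nLastEnd Then
            If rngPage.Start < nLastEnd Then rngPage.Start = nLastEnd

            ' Placeholder for the real processing of rngPage.
            Debug.Print "Asked for page " & nPage & "; range ends on page " & _
                        rngPage.Information(wdActiveEndPageNumber) & _
                        " (characters " & rngPage.Start & " to " & _
                        rngPage.End & ")"

            nLastEnd = rngPage.End
        End If
    Next nPage
End Sub
```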
Word doesn't know what a page is by Daiya Mitchell
Other articles about page breaks and pagination on this site.