How to Find & Highlight a Specific Word or Phrase inside a Document using Java

This tutorial describes how Java developers can programmatically find and highlight a particular word or phrase inside an MS Word document using Aspose.Words. It might seem easy to find the string of text in a document and change its formatting, but the main difficulty is that, because of formatting, the matched string can be spread over several runs of text. Consider the following example: the phrase “Hello World!” consists of three different runs, where the beginning is italic, the middle is bold, and the last part is regular text. In addition to formatting, any bookmarks in the middle of the text will split it into more runs.

This article provides a solution designed to handle that case: if necessary, it collects the word (or phrase) from several runs while skipping non-run nodes. The sample code opens a document and finds every instance of the text “your document”. A replace handler contains the logic applied to each match. In this case, the runs are split around the matched text and the resulting runs are highlighted.
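Before the full sample, the multi-run problem can be illustrated in plain Java without any Aspose.Words types. The sketch below is only a model: runs are represented as strings, the class name RunSpanSketch and the method locate are hypothetical, and the way “Hello World!” is divided into three pieces is an assumption made for the demonstration. It shows that no single run contains the phrase, while the concatenated text does, and it reports which runs the match starts and ends in.

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java model of text runs; an illustration only, not the Aspose.Words API. */
final class RunSpanSketch {

    /**
     * Finds the first occurrence of phrase in the concatenated run text.
     * Returns {firstRun, offsetInFirstRun, lastRun, endOffsetInLastRun},
     * where the end offset is exclusive, or null if the phrase is absent.
     */
    static int[] locate(List<String> runs, String phrase) {
        if (phrase.isEmpty()) {
            return null;
        }
        String fullText = String.join("", runs);
        int start = fullText.indexOf(phrase);
        if (start < 0) {
            return null;
        }
        int end = start + phrase.length(); // exclusive end in the full text

        int[] result = new int[4];
        int runStart = 0;
        for (int i = 0; i < runs.size(); i++) {
            int runEnd = runStart + runs.get(i).length();
            if (start >= runStart && start < runEnd) {
                result[0] = i;                // run where the match begins
                result[1] = start - runStart; // offset inside that run
            }
            if (end > runStart && end <= runEnd) {
                result[2] = i;                // run where the match ends
                result[3] = end - runStart;   // exclusive end offset inside it
                return result;
            }
            runStart = runEnd;
        }
        return null; // not reached for a non-empty phrase that was found
    }

    public static void main(String[] args) {
        // Assumed split of the phrase into italic, bold and regular pieces.
        List<String> runs = Arrays.asList("Hel", "lo Wor", "ld!");

        // Searching each run on its own misses the phrase.
        boolean inSingleRun = runs.stream().anyMatch(r -> r.contains("Hello World!"));
        System.out.println("Found in a single run: " + inSingleRun);

        // Searching the concatenated text finds it and maps it back to runs.
        System.out.println(Arrays.toString(locate(runs, "Hello World!")));
    }
}
```

With the assumed split, the match starts at offset 0 of the first run and ends at the end of the third, which is why a per-run search is not enough.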
package FindAndHighlight;

import java.awt.Color;
import java.io.File;
import java.net.URI;
import java.util.ArrayList;
import java.util.regex.Pattern;

import com.aspose.words.Document;
import com.aspose.words.IReplacingCallback;
import com.aspose.words.Node;
import com.aspose.words.NodeType;
import com.aspose.words.ReplaceAction;
import com.aspose.words.ReplacingArgs;
import com.aspose.words.Run;

class Program {
    public static void main(String[] args) throws Exception {
        // Sample infrastructure.
        URI exeDir = Program.class.getResource("").toURI();
        String dataDir = new File(exeDir.resolve("../../Data")) + File.separator;

        Document doc = new Document(dataDir + "TestFile.doc");

        // We want the "your document" phrase to be highlighted.
        Pattern regex = Pattern.compile("your document", Pattern.CASE_INSENSITIVE);

        // When a custom replacement callback modifies the document, it is generally recommended
        // to replace backward by passing false as the third parameter of the replace method.
        doc.getRange().replace(regex, new ReplaceEvaluatorFindAndHighlight(), false);

        // Save the output document.
        doc.save(dataDir + "TestFile Out.doc");
    }
}

class ReplaceEvaluatorFindAndHighlight implements IReplacingCallback {
    /**
     * This method is called by the Aspose.Words find and replace engine for each match.
     * This method highlights the match string, even if it spans multiple runs.
     */
    public int replacing(ReplacingArgs e) throws Exception {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and maybe the only) run can contain text before the match;
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        // This list stores all runs of the match for highlighting later.
        ArrayList<Run> runs = new ArrayList<Run>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0)
                && (currentNode != null)
                && (currentNode.getText().length() <= remainingLength)) {
            runs.add((Run) currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // We have to loop because there could be other nodes such as BookmarkStart.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add((Run) currentNode);
        }

        // Now highlight all runs in the sequence.
        for (Run run : runs)
            run.getFont().setHighlightColor(Color.YELLOW);

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits the text of the specified run into two runs.
     * Inserts the new run just after the specified run.
     */
    private static Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}
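The split-and-collect logic of the replace handler can also be traced in isolation. The following is a minimal sketch, assuming runs are modeled as a mutable list of strings rather than Aspose.Words Run nodes; SplitAndCollectSketch, splitRun and collect are hypothetical names used only for this illustration, and the sample run texts are invented. The three arguments of collect stand in for the match node, the match offset and the match length that the handler reads from ReplacingArgs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** String-based model of the handler's algorithm; not the Aspose.Words API. */
final class SplitAndCollectSketch {

    /**
     * Splits runs.get(index) at position. The head stays at index and the tail
     * is inserted right after it. Returns the index of the tail.
     */
    static int splitRun(List<String> runs, int index, int position) {
        String text = runs.get(index);
        runs.set(index, text.substring(0, position));
        runs.add(index + 1, text.substring(position));
        return index + 1;
    }

    /**
     * Splits runs at the match boundaries and returns the indices of the runs
     * that together hold exactly the matched text.
     */
    static List<Integer> collect(List<String> runs, int matchRun, int matchOffset, int matchLength) {
        int current = matchRun;

        // Text before the match in the first run is split off.
        if (matchOffset > 0) {
            current = splitRun(runs, current, matchOffset);
        }

        // Take whole runs while they fit inside the remaining match length.
        List<Integer> matched = new ArrayList<>();
        int remaining = matchLength;
        while (remaining > 0 && current < runs.size()
                && runs.get(current).length() <= remaining) {
            matched.add(current);
            remaining -= runs.get(current).length();
            current++;
        }

        // The last run may extend past the match, so split it as well.
        if (current < runs.size() && remaining > 0) {
            splitRun(runs, current, remaining);
            matched.add(current);
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("Open your", " docu", "ment now"));
        // "your document" starts at offset 5 of run 0 and is 13 characters long.
        List<Integer> matched = collect(runs, 0, 5, 13);
        System.out.println(runs);    // [Open , your,  docu, ment,  now]
        System.out.println(matched); // [1, 2, 3]
    }
}
```

Unlike the real handler, this model has no non-run nodes to skip, so moving to the next run is a plain index increment instead of a loop over siblings.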


Language: Java | User: Sheraz Khan | Created: Jan 7, 2015 | Tags: Search particular word in document open word document java highlight particular word in a document find instances of a particular word change formatting of text Java Word API Java Word processing