Introduction

HTML to PDF Converter is a component that converts well-formed HTML, with or without inline cascading style sheets (CSS), into a PDF file.

Features

- Converts well-formed HTML into PDF reports
- Creates headers and footers automatically for each page of the PDF document
- Creates column headers automatically on the second and each succeeding page when an HTML table carries over to more than one page of the PDF report

Limitations

The HTML document must not have any loose (unclosed) tags, and it must not reference an external CSS file. The component does not support font-size formatting, and it does not support formatting of HTML structures such as tables and paragraphs. It is therefore meant for simple text and images only.

Background

Common PDF component

The Common PDF Component consists of the following Java files:

com.bofa.crme.cpa.common.controller.PDFRenderSevlet
com.bofa.crme.cpa.common.pdfUtil.HTMLNormalizer
com.bofa.crme.cpa.common.pdfUtil.HTMLParser
com.bofa.crme.cpa.common.pdfUtil.XSLTGenerator
com.bofa.crme.cpa.common.pdfUtil.PDFConstants

Dependent Jar files :

The following jar files are needed for development:

jaxp.jar
avalon.jar
fop.jar
batik.jar
xerces.jar

Code changes to implement PDF component :

Servlet entry in Web.xml :

The entries for the controller servlet PDFRenderSevlet are added as follows.

Code:
<servlet>
    <servlet-name>PDFRenderSevlet</servlet-name>
    <servlet-class>com.bofa.crme.cpa.common.controller.PDFRenderSevlet</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>PDFRenderSevlet</servlet-name>
    <url-pattern>/PDFRenderSevlet</url-pattern>
</servlet-mapping>

Changes to be made in the JSP files :

The JavaScript file pdf.js is included in the page head.
Code:
<head>
    <script src="js/pdf.js"></script>
    …
    …
</head>

In all the JSP files, the JavaScript function generatePDF() from pdf.js is called when the Generate PDF button is clicked.

Code:
<td width="10%" align="center">
    <a href="javascript:generatePDF()">
        <IMG SRC="images/PDF.gif" border="0">
    </a>
</td>

A hidden field strHTML is added in the form block as shown below:

Code:
<form name="formName" action="">
    …
    <input type="hidden" name="strHTML">
    …
</form>

Note: This change does not affect the functionality or the look and feel of the page. The JavaScript file pdf.js is used to call the common PDF component from the application.

Assumptions

- The HTML document must be well formed and must not have any loose tags.
- All acceptable HTML tags must be parsed; see the W3C website for the valid tags. They are also listed in the appendix below.
- The CSS files will be included as JSP page includes, because the PDF component supports inline CSS only.

The code (J2EE)

PDFRenderSevlet.java

Code:
package com.bofa.crme.cpa.common.controller;

import javax.servlet.http.HttpServlet;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.*;
import com.bofa.crme.cpa.common.pdfUtil.*;
import javax.servlet.ServletOutputStream;
//JAXP
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.Source;
import javax.xml.transform.Result;
import javax.xml.transform.stream.StreamSource;
import javax.xml.transform.sax.SAXResult;
//Avalon
import org.apache.avalon.framework.logger.ConsoleLogger;
import org.apache.avalon.framework.logger.Logger;
//FOP
import org.apache.fop.apps.Driver;
import org.apache.fop.messaging.MessageHandler;

public class PDFRenderSevlet extends HttpServlet {

    /**
     * Description : doGet method
     * @param request as HttpServletRequest
     * @param response as HttpServletResponse
     * @throws ServletException
     * @throws IOException
     */
    /*public void doGet(HttpServletRequest
            req, HttpServletResponse res) throws ServletException, IOException {
        System.out.println("Inside doGet");
        doPost(req, res);
    }*/

    /**
     * Description : doPost method
     * @param request as HttpServletRequest
     * @param response as HttpServletResponse
     * @throws ServletException
     * @throws IOException
     */
    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        StringBuffer pathInfo = request.getRequestURL();
        String absPath = pathInfo.toString();
        absPath = absPath.substring(0, absPath.lastIndexOf("/"));
        String strHTML = "";
        String pageSetting = "P";
        //To set the Landscape (or) Portrait mode
        if (request.getParameter("pageSetting") != null)
            pageSetting = request.getParameter("pageSetting");
        //The absolute path variable is retrieved from the request.
        //This part is added to resolve dynamic NetCharts image generation
        //while generating PDF reports.
        if ((request.getParameter("absPathFromApplication") != null)
                && (((request.getParameter("absPathFromApplication")).indexOf("//") > -1))) {
            absPath = request.getParameter("absPathFromApplication");
        }
        strHTML = request.getParameter("strHTML");
        generatePDF(strHTML, response, absPath, pageSetting);
    }

    /**
     * Description : This method generates the PDF File
     * @param xmlSrc as File
     * @param xslt as File
     * @param response as HttpServletResponse
     * @deprecated
     */
    public void writePDF(File xmlSrc, File xslt, HttpServletResponse response) {
        File pdfFile = new File("reportPDF.pdf");
        //String pdfFile = "reportPDF.pdf";
        ServletOutputStream sos = null;
        Driver driver = new Driver();
        //Setup logger
        Logger logger = new ConsoleLogger(ConsoleLogger.LEVEL_ERROR);
        driver.setLogger(logger);
        MessageHandler.setScreenLogger(logger);
        //Setup Renderer (output format)
        driver.setRenderer(Driver.RENDER_PDF);
        //Setup the response content type and the header
        response.setContentType("application/pdf");
        response.setHeader("Content-Disposition", "filename=reportPDF.pdf");
        try {
            sos = response.getOutputStream();
        } catch
                (Exception e) {
            System.out.println("Error while getting the ServletOuputStream :" + e.toString());
        }
        try {
            driver.setOutputStream(sos);
            //Setup XSLT
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer = factory.newTransformer(new StreamSource(xslt));
            //Setup input for XSLT transformation
            Source src = new StreamSource(xmlSrc);
            //Resulting SAX events (the generated FO) must be piped through to FOP
            Result res = new SAXResult(driver.getContentHandler());
            //Start XSLT transformation and FOP processing
            transformer.transform(src, res);
        } catch (Exception e) {
            System.out.println("Error while generating the PDF File ----------->" + e.toString());
        } finally {
            if (null != sos) {
                try {
                    sos.close();
                } catch (Exception e) {
                    System.out.println("Error while closing the servlet output stream :" + e.toString());
                }
            }
        }
    }

    /**
     * Description : This method generates the PDF File
     * @param xmlSrc as String
     * @param xslt as String
     * @param response as HttpServletResponse
     */
    public void writePDF(String xmlSrc, String xslt, HttpServletResponse response) {
        ServletOutputStream sos = null;
        ByteArrayOutputStream out = null;
        try {
            //Setup FOP
            Driver driver = new Driver();
            Logger log = new ConsoleLogger(ConsoleLogger.LEVEL_WARN);
            driver.setLogger(log);
            driver.setRenderer(Driver.RENDER_PDF);
            //Setup a buffer to obtain the content length
            out = new ByteArrayOutputStream();
            driver.setOutputStream(out);
            //Setup the InputStreams
            ByteArrayInputStream xmlStream = new ByteArrayInputStream(xmlSrc.getBytes());
            ByteArrayInputStream xslStream = new ByteArrayInputStream(xslt.getBytes());
            //Setup Transformer
            Source xsltSrc = new StreamSource(xslStream);
            Transformer transformer = TransformerFactory.newInstance().newTransformer(xsltSrc);
            //Make sure the XSL transformation's result is piped through to FOP
            Result res = new SAXResult(driver.getContentHandler());
            //Setup input
            Source src = new StreamSource(xmlStream);
            //Start the transformation and rendering process
            try {
                transformer.transform(src, res);
            } catch (Exception e) {
                System.out.println("Exception in transform() method in PDFRenderSevlet is ********" + e);
                e.printStackTrace();
            }
            System.out.println("[INFO] END TIME::" + new java.util.Date());
            //close the XML/XSL streams
            try {
                if (xmlStream != null)
                    xmlStream.close();
                if (xslStream != null)
                    xslStream.close();
            } catch (IOException e) {
                System.out.println("Exception while closing the XML/XSL Streams");
                e.printStackTrace();
            }
            //Prepare response
            try {
                sos = response.getOutputStream();
            } catch (IOException ioEx) {
                ioEx.printStackTrace();
            }
            response.setContentType("application/pdf");
            response.setHeader("Content-disposition", "attachment; filename=\"ReportPDF.pdf\"");
            response.setContentLength(out.size());
            //Send content to Browser
            sos.write(out.toByteArray());
            sos.flush();
        } catch (Exception e) {
            System.out.println("Exception in writePDF()\n" + e.getMessage());
        } finally {
            try {
                if (sos != null)
                    sos.close();
                if (out != null)
                    out.close();
            } catch (Exception e) {
                System.out.println("Exception while closing the Streams");
                e.printStackTrace();
            }
        }
    }

    /**
     * Description : This method parses the HTML and generates an XSL file
     * @param strHTML as String
     * @param response as HttpServletResponse
     * @param absPath as String
     * @param pageSetting as String
     */
    public void generatePDF(String strHTML, HttpServletResponse response,
            String absPath, String pageSetting) {
        String xmlStr = null;
        String xsltStr = null;
        try {
            System.out.println("[PDFGen-INFO] START TIME::" + new java.util.Date());
            HTMLNormalizer htmlParser = new HTMLNormalizer();
            XSLTGenerator xsltGen = new XSLTGenerator();
            System.out.println("[PDFGen-INFO] The Page Setting would be :" + pageSetting);
            xsltGen.setPageSetting(pageSetting);
            xmlStr = htmlParser.doParseHTML(strHTML);
            System.out.println("[PDFGen-INFO] The XML file is generated");
            xsltStr = xsltGen.doGenerateXSLT(xmlStr, absPath);
            System.out.println("[PDFGen-INFO] The XSLT file is generated");
        } catch (Exception e) {
            e.printStackTrace();
        }
        // createSrcFiles(xmlStr, xsltStr, response);
        writePDF(xmlStr, xsltStr, response);
    }

    /**
     * Description : This method creates the XML and XSLT files and writes the PDF
     * to the servlet output stream.
     * @param xmlSrc as String
     * @param xsltSrc as String
     * @param response as HttpServletResponse
     * @deprecated
     */
    public synchronized void createSrcFiles(String xmlSrc, String xsltSrc,
            HttpServletResponse response) {
        File xmlFile = null;
        File xsltFile = null;
        FileOutputStream xmlFos = null;
        FileOutputStream xsltFos = null;
        byte[] xmlBytes = null;
        byte[] xsltBytes = null;
        try {
            xmlBytes = xmlSrc.getBytes();
            xsltBytes = xsltSrc.getBytes();
            xmlFile = new File("xmlFile.xml");
            xsltFile = new File("xsltFile.xsl");
            xmlFos = new FileOutputStream(xmlFile);
            xsltFos = new FileOutputStream(xsltFile);
            xmlFos.write(xmlBytes);
            xsltFos.write(xsltBytes);
        } catch (Exception e) {
            System.out.println("Error writing in fileoutputstream -:");
            e.printStackTrace();
        } finally {
            try {
                if (null != xmlFos) {
                    xmlFos.close();
                }
                if (null != xsltFos) {
                    xsltFos.close();
                }
            } catch (Exception e) {
                System.out.println("Exception :" + e.toString());
            }
        }
        //writePDF(xmlFile, xsltFile, response);
    }
}

HTMLNormalizer.java

Code:
package com.bofa.crme.cpa.common.pdfUtil;

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.Set;
import java.util.Stack;
import java.util.HashMap;
import java.util.StringTokenizer;
import com.bofa.crme.cpa.common.pdfUtil.PDFConstants;

public class HTMLNormalizer {

    /**
     * Description : This method returns the normalized tag for the given token.
     * It adds the closing tag for single tags like <img/>
     * @param token as String
     * @return String
     */
    public static String getNormalizedTag(String token) {
        // Fall back to the token itself so that a token that does not end
        // with "/>" cannot cause a NullPointerException below.
        String tag = token;
        if (token.endsWith("/>")) {
            tag = "<" + token.substring(1, token.length() - 2) + "></"
                    + HTMLParser.getTagName(token) + ">";
        }
        return tag.toLowerCase();
    }

    /**
     * Description : This method parses the HTML file
     * @param data as String
     * @return String
     */
    public String doParseHTML(String data) throws Exception {
        this.setInputString(data);
        return normalize();
    }

    /**
     * Description : This method parses the HTML file
     * @param inStream as InputStream
     * @return String
     */
    public String doParseHTML(InputStream inStream) throws Exception {
        this.setInputStream(inStream);
        return normalize();
    }

    // contains the NON-PDF tags
    private ArrayList nonPdfTags = new ArrayList();
    private HTMLParser parser = null;
    // contains the self closing tags
    private ArrayList selfClosingList = new ArrayList();
    private StringBuffer styleData = new StringBuffer(); //to store the style data
    private boolean styleFlag = false; //to check the style tag

    /**
     * General Constructor
     */
    public HTMLNormalizer() {
        parser = new HTMLParser();
        //add the self closing tags
        selfClosingList.add("br");
        selfClosingList.add("img");
        selfClosingList.add("a");
        // selfClosingList.add("area");
        //add the non PDF tags
        nonPdfTags.add("form");
        nonPdfTags.add("input");
        nonPdfTags.add("hr");
        nonPdfTags.add("script");
        nonPdfTags.add("style");
    }

    /**
     * Description : This method extracts the body content for the HTML page
     * @param strHTML as String
     * @return String
     */
    private String extractBody(String strHTML) {
        String content = null;
        int bodyIndex = strHTML.indexOf(PDFConstants.HEAD_END_TAG);
        if (bodyIndex > -1) {
            int start = bodyIndex + PDFConstants.HEAD_END_TAG_LEN;
            content = strHTML.substring(start, strHTML.length());
        }
        return content;
    }

    /**
     * Description : This method handles the attributes present in a tag and encloses
     * them inside double quotes
     * @param tag as
     * String
     * @return String
     */
    private String handleAttributes(String tag) {
        String parsedTag = "";
        HashMap attrMap = new HashMap();
        // This flag records whether an HTML <TD> tag contains the NOWRAP attribute.
        // If it does, nowrap="nowrap" is appended to the tag instead of NOWRAP.
        boolean booNowrapExists = false;
        tag = tag.trim();
        if (tag.indexOf("nowrap") > 0) {
            String tempNowrap = "";
            booNowrapExists = true;
            // here NOWRAP is removed from tag...
            tempNowrap = tag.substring(0, tag.indexOf("nowrap"));
            tempNowrap += tag.substring((tag.indexOf("nowrap") + 6), tag.length());
            tag = tempNowrap;
        }
        if (tag.indexOf(">") > -1)
            tag = tag.substring(0, tag.indexOf(">"));
        // the logic below is modified to tokenize the style attribute properly.
        int styleIndex = tag.indexOf("style=\"");
        if (styleIndex > 0) {
            //7 is added for 'style="'
            StringBuffer tagBuffer = new StringBuffer(tag.substring(0, styleIndex + 7));
            String styleStr = tag.substring(styleIndex + 7);
            StringTokenizer styleToken = new StringTokenizer(styleStr, " ");
            while (styleToken.hasMoreTokens()) {
                tagBuffer.append(styleToken.nextToken());
            } // end of while loop
            tag = tagBuffer.toString();
        } // if block checking style attribute
        try {
            // handle the <AREA> tag..........
            if (tag.indexOf("<area") > -1) {
                if (tag.indexOf(">") > -1)
                    tag = tag.substring(0, tag.indexOf(">"));
                // this is to handle "title" attribute in <AREA tag.........
                if (tag.indexOf("title=") > 0) {
                    //6 is added for 'title='
                    char tokenizer = tag.charAt(tag.indexOf("title=") + 6);
                    if (tokenizer == '"') {
                        int one = tag.indexOf("title=");
                        String first = tag.substring(0, one);
                        String second = tag.substring(one);
                        int two = tag.indexOf("\"", one);
                        int three = tag.indexOf("\"", (two + 1));
                        String altVariable = tag.substring(one, three + 1);
                        String attrName = "title";
                        String attrValue = altVariable.substring(
                                (altVariable.indexOf("\"") + 1),
                                (altVariable.length() - 1));
                        tag = tag.substring(0, one) + tag.substring(three + 1);
                        attrMap.put(attrName, attrValue);
                    }
                }
            }
            // handle "alt" attribute (which may have spaces in their values) in <IMG> tag .....
            if (tag.indexOf("alt=") > 0) {
                //4 is added for 'alt='
                char tokenizer = tag.charAt(tag.indexOf("alt=") + 4);
                if (tokenizer == '"') {
                    int one = tag.indexOf("alt=");
                    String first = tag.substring(0, one);
                    String second = tag.substring(one);
                    int two = tag.indexOf("\"", one);
                    int three = tag.indexOf("\"", (two + 1));
                    String altVariable = tag.substring(one, three + 1);
                    String attrName = "alt";
                    String attrValue = altVariable.substring(
                            (altVariable.indexOf("\"") + 1),
                            (altVariable.length() - 1));
                    tag = tag.substring(0, one) + tag.substring(three + 2);
                    attrMap.put(attrName, attrValue);
                }
            }
            // handle "alt" attribute (which may have spaces in their values) in <IMG> tag ends here...
            // this is to handle "href" attribute in <A > tag.........
            else if (tag.indexOf("href=") > 0) {
                //5 is added for 'href='
                char tokenizer = tag.charAt(tag.indexOf("href=") + 5);
                if (tokenizer == '"') {
                    int one = tag.indexOf("href=");
                    String first = tag.substring(0, one);
                    String second = tag.substring(one);
                    int two = tag.indexOf("\"", one);
                    int three = tag.indexOf("\"", (two + 1));
                    String altVariable = tag.substring(one, three + 1);
                    String attrName = "href";
                    String attrValue = altVariable.substring(
                            (altVariable.indexOf("\"") + 1),
                            (altVariable.length() - 1));
                    tag = tag.substring(0, one) + tag.substring(three + 1);
                    attrMap.put(attrName, attrValue);
                }
            }
        } catch (Exception e) {
            System.out.println(
                    "An exception occurred while handling AREA tag or ALT attribute or href attribute in handleAttributes method is :: "
                            + e);
        }
        StringTokenizer stk = new StringTokenizer(tag, " ");
        parsedTag = stk.nextToken();
        while (stk.hasMoreTokens()) {
            String token = stk.nextToken();
            int index = token.indexOf("=");
            String attrName = null;
            String attrValue = null;
            if (index > -1) {
                attrName = token.substring(0, index).trim();
                attrValue = token.substring(index + 1).trim();
                if (attrValue.startsWith("\"")) {
                    attrValue = attrValue.substring(1);
                } //attrValue startsWith "
                if (attrValue.endsWith("\"") || attrValue.endsWith(">")) {
                    attrValue = attrValue.substring(0, attrValue.length() - 1);
                } //attrValue endsWith " or >
            } // token contains =
            else {
                attrName = token;
            }
            attrMap.put(attrName, attrValue);
        } // while loop
        if (attrMap.containsKey("class")) {
            handleClass((String) attrMap.get("class"), attrMap);
        } else if (attrMap.containsKey("style")) {
            handleStyle((String) attrMap.get("style"), attrMap);
        }
        //remove the class and style keys
        attrMap.remove("class");
        attrMap.remove("style");
        //set the attribute values
        Set keys = (Set) attrMap.keySet();
        Iterator keyIterator = keys.iterator();
        while (keyIterator.hasNext()) {
            String element = (String) keyIterator.next();
            parsedTag += " " + element + "=\"" + attrMap.get(element) + "\" ";
        } // iterating over attrMap ends
        // here
        // If 'NOWRAP' is found in a <TD> tag of the HTML content, it is replaced
        // by the nowrap="nowrap" attribute.
        if (booNowrapExists) {
            parsedTag += " nowrap = \"nowrap\" >";
        } else {
            parsedTag += ">";
        }
        return parsedTag;
    }

    /**
     * Description : This method extracts the style information for a class attribute,
     * which references the inline stylesheets. The outline style information
     * still needs to be handled
     * @param attrValue as String
     * @param attrMap as HashMap
     */
    private void handleClass(String attrValue, HashMap attrMap) {
        String parsedData = "";
        String styleContent = styleData.toString().trim().toLowerCase();
        String styleClass = "." + attrValue;
        //Start :: check for Inline Style Sheet
        if (styleContent.indexOf(styleClass) > -1) {
            int firstIndex = styleContent.indexOf(styleClass);
            int secondIndex = styleContent.indexOf("}", firstIndex);
            int startIndex = styleContent.indexOf("{", firstIndex);
            parsedData = styleContent.substring(startIndex + 1, secondIndex).trim();
            //remove carriage return, tab and newline characters
            parsedData = parsedData.replace('\r', ' ');
            parsedData = parsedData.replace('\t', ' ');
            parsedData = parsedData.replace('\n', ' ');
        }
        //End :: check for Inline Style Sheet
        //Start :: check for Outline Style Sheet
        //End :: check for Outline Style Sheet
        handleStyle(parsedData, attrMap);
    }

    /**
     * Description : This method extracts the style attribute
     * The style attributes will always have a space before the value.
     * So we need to remove the extra spaces
     * @param attrValue as String
     * @param attrMap as HashMap
     */
    private void handleStyle(String attrValue, HashMap attrMap) {
        String parsedData = null;
        int cnt = 0;
        char c0;
        StringBuffer attrBuffer = new StringBuffer(attrValue);
        while (cnt < attrBuffer.length()) {
            c0 = attrBuffer.charAt(cnt);
            if (c0 == ' ') {
                // Stay on the same index after a delete so that
                // consecutive spaces are all removed.
                attrBuffer.deleteCharAt(cnt);
            } else {
                cnt++;
            }
        }
        parsedData = attrBuffer.toString();
        StringTokenizer stk = new StringTokenizer(parsedData, ";");
        while (stk.hasMoreTokens()) {
            String token = stk.nextToken().trim();
            int index = token.indexOf(":");
            String name = null;
            String value = null;
            if (index > -1) {
                name = token.substring(0, index).trim().toLowerCase();
                value = token.substring(index + 1).trim();
                if (value.startsWith("\"")) {
                    value = value.substring(1);
                } //value startsWith "
                if (value.endsWith("\"") || value.endsWith(">")) {
                    value = value.substring(0, value.length() - 1);
                } //value endsWith " or >
                // Only store well-formed "name:value" pairs; this avoids a null key.
                attrMap.put(name, value);
            } // token contains :
        } // end of stk.hasMoreTokens() while loop
    }

    /**
     * Description : This method parses the given HTML using HTMLParser class and
     * also normalizes the HTML content
     * @return String
     * @throws Exception
     */
    private String normalize() throws Exception {
        try {
            Stack stack = new Stack();
            StringBuffer buffer = new StringBuffer();
            int tokenType = -1;
            String token = null;
            //For Dealing Title
            String strTitle = PDFConstants.XSL_HEADER_VALUE; //Default Header value
            boolean titleFlag = false;
            //Used to define the Column numbers when declaring the table
            int noOfTds = 0;
            int colsPos = 0;
            Stack colStack = new Stack(); //For handling inner tables
            while (parser.next()) {
                tokenType = parser.getTokenType();
                token = parser.getCurrentToken();
                if (tokenType == HTMLParser.START_TAG) {
                    String currentTag = HTMLParser.getTagName(token);
                    if (currentTag.equalsIgnoreCase("style")
                            || currentTag.equalsIgnoreCase("script")) {
                        styleFlag = true;
                    }
                    //Extract the title from the Page
                    if
                            (currentTag.equalsIgnoreCase("title")) {
                        titleFlag = true;
                    } // end of Title extraction
                    if (!nonPdfTags.contains(currentTag.toLowerCase())) {
                        String parsedTag = handleAttributes(token.toLowerCase());
                        //This piece of code adds the 'cols' attribute to the table attributes
                        if (currentTag.equalsIgnoreCase("table")
                                && parsedTag.indexOf("cols") == -1) {
                            StringBuffer textBuffer = new StringBuffer(parsedTag);
                            // START : For handling inner tables
                            if (noOfTds > 0) {
                                colStack.push(new Integer(noOfTds));
                                colStack.push(new Integer(colsPos));
                            }
                            // END : For handling inner tables
                            noOfTds = 0;
                            int len = textBuffer.length() - 1;
                            textBuffer.insert(len, " cols=\"0\" ");
                            colsPos = buffer.length() + len + 7;
                            parsedTag = textBuffer.toString();
                        } // end of table checking
                        if (currentTag.equalsIgnoreCase("td")) {
                            String tempToken = parsedTag.toLowerCase();
                            if (tempToken.indexOf("colspan=\"") > -1) {
                                int index1 = tempToken.indexOf("colspan=\"");
                                int index2 = tempToken.indexOf("\"", index1 + 10);
                                String col = tempToken.substring(index1 + 9, index2);
                                try {
                                    noOfTds += Integer.parseInt(col);
                                } catch (NumberFormatException nfEx) {
                                    noOfTds++;
                                }
                            } else {
                                noOfTds++;
                            } // end of if block checking the colspan
                            tempToken = null;
                        } // for td if condition
                        //reset noOfTds to zero since a new row has begun
                        if (currentTag.equalsIgnoreCase("tr")
                                || currentTag.equalsIgnoreCase("thead")) {
                            noOfTds = 0;
                        }
                        buffer.append(parsedTag);
                        stack.push(currentTag);
                    } // end of non pdf tags if check
                } else if (tokenType == HTMLParser.CONTENT) {
                    if (styleFlag) {
                        styleData.append(token);
                    } else
                        buffer.append(token);
                    //Handle Title
                    if (titleFlag) {
                        strTitle = token;
                        titleFlag = false;
                    }
                } else if (tokenType == HTMLParser.END_TAG) {
                    String tagName = HTMLParser.getTagName(token);
                    if (tagName.equalsIgnoreCase("script")
                            || tagName.equalsIgnoreCase("style")) {
                        styleFlag = false;
                    } else if (tagName.equalsIgnoreCase("body")) {
                        buffer.append("<title>" + strTitle + "</title>");
                    } // end of Body If check
                    if
                            (!stack.isEmpty()
                            && tagName.equalsIgnoreCase((String) stack.peek())) {
                        //For calculating the no of columns in a table
                        if (tagName.equalsIgnoreCase("table")) {
                            buffer.deleteCharAt(colsPos);
                            buffer.insert(colsPos, String.valueOf(noOfTds));
                            // START : For handling inner tables
                            if (colStack.size() > 0) {
                                colsPos = ((Integer) colStack.pop()).intValue();
                                noOfTds = ((Integer) colStack.pop()).intValue();
                            } else
                                noOfTds = 0;
                            // END : For handling inner tables
                        }
                        buffer.append(token.toLowerCase());
                        stack.pop();
                    } else {
                        if (stack.contains(tagName)) {
                            String tag = HTMLParser.getEndTag((String) stack.pop());
                            buffer.append(tag.toLowerCase());
                            while (stack.size() > 0
                                    && !tagName.equalsIgnoreCase((String) stack.peek())) {
                                String currentTag = HTMLParser.getEndTag((String) stack.pop());
                                buffer.append(currentTag.toLowerCase());
                            }
                            buffer.append(token.toLowerCase());
                            stack.pop();
                        }
                    }
                } else if (tokenType == HTMLParser.SINGLE_TAG) {
                    buffer.append(HTMLNormalizer.getNormalizedTag(token));
                }
            }
            while (stack.size() > 0) {
                buffer.append(HTMLParser.getEndTag((String) stack.pop()));
            }
            String strHTML = extractBody(buffer.toString());
            StringBuffer xmlData = new StringBuffer();
            xmlData.append(PDFConstants.XML_HEADER + "\n");
            xmlData.append(PDFConstants.ENTITY_REF + "\n");
            xmlData.append(strHTML);
            return xmlData.toString();
        } catch (Exception e) {
            System.out.println("Exception in normalize is ************ " + e);
            return "";
        }
    }

    /**
     * Description : This method sets the InputStream to parse
     * @param stream as InputStream
     */
    private void setInputStream(InputStream stream) {
        parser.parse(stream);
    }

    /**
     * Description : This method sets the input String to parse
     * @param strHTML as String
     */
    private void setInputString(String strHTML) {
        try {
            parser.parse(strHTML);
        } catch (Exception e) {
            System.out.println(
                    "Exception is occurred while parsing the string value in HTMLNormalizer ************ "
                            + e);
        }
    }
}

HTMLParser.java

Code:
package com.bofa.crme.cpa.common.pdfUtil;

import java.io.InputStream;
import
java.io.ByteArrayInputStream;
import java.io.IOException;

public class HTMLParser {
    public static int CONTENT = 3;
    public static int END_TAG = 2;
    public static int SINGLE_TAG = 4;
    public static int START_TAG = 1;
    public static int COMMENT = 5;
    private String currentToken = null;
    private int currentTokenType;
    private String html;
    private InputStream input;
    private boolean isContentEnd;

    /**
     * Description : This method returns the END tag enclosed as </xxx>
     * @param name as String
     * @return String
     */
    public static String getEndTag(String name) {
        // Parenthesized so that the whole tag is lower-cased, not just ">".
        return ("</" + name + ">").toLowerCase();
    }

    /**
     * Description : This method returns the tag name of a start tag <xxx ...>
     * or an end tag </xxx>
     * @param tag as String
     * @return String
     */
    public static String getTagName(String tag) {
        String name = null;
        if (tag.startsWith("</"))
            name = tag.substring(2, tag.length() - 1);
        else if (tag.startsWith("<"))
            name = tag.substring(
                    1,
                    tag.indexOf(" ") > 1 ? tag.indexOf(" ") : tag.length() - 1);
        // A self-closing tag without attributes, such as <br/>, would otherwise
        // yield "br/"; drop the trailing slash.
        if (name != null && name.endsWith("/"))
            name = name.substring(0, name.length() - 1);
        return name;
    }

    /**
     * Description : This method returns the current Token
     * @return String
     */
    public String getCurrentToken() {
        return currentToken;
    }

    /**
     * Description : This method returns the token type to check for
     * START/CONTENT/END tag
     * @return int
     */
    public int getTokenType() {
        return currentTokenType;
    }

    /**
     * Description : This method parses and gives the next token.
     * This also sets the current token type
     * @return boolean
     */
    public boolean next() throws IllegalStateException {
        if (input == null)
            throw new IllegalStateException("Stream not set");
        StringBuffer token = new StringBuffer();
        char c;
        try {
            int i = -1;
            if ((i = input.read()) != -1) {
                c = (char) i;
                if (c == '<' || isContentEnd) {
                    token.setLength(0);
                    if (isContentEnd) {
                        token.append('<');
                        if (c == '/')
                            currentTokenType = END_TAG;
                        else
                            currentTokenType = START_TAG;
                        token.append(c);
                    } else {
                        token.append(c);
                        if (((i = input.read()) != -1) & ((c = (char) i) == '/')) {
                            currentTokenType = END_TAG;
                        } else {
                            currentTokenType = START_TAG;
                        }
                        token.append(c);
                    }
                    while (c != '>' && i != -1) {
                        i = input.read();
                        if (i == -1)
                            break; // end of stream: do not append an EOF marker character
                        c = (char) i;
                        token.append(c);
                    }
                    if (token.charAt(token.length() - 2) == '/')
                        currentTokenType = SINGLE_TAG;
                    currentToken = token.toString();
                    isContentEnd = false;
                } else {
                    token.append(c);
                    while (c != '<' && i != -1) {
                        i = input.read();
                        if (i == -1)
                            break; // end of stream: do not append an EOF marker character
                        c = (char) i;
                        if (c == '<')
                            isContentEnd = true;
                        else
                            token.append(c);
                    }
                    currentTokenType = CONTENT;
                    currentToken = token.toString();
                }
                return true;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return false;
    }

    /**
     * Description : This method sets the InputStream which is used to parse
     * @param inputStream as InputStream
     * @see parse(String)
     */
    public void parse(InputStream inputStream) {
        input = inputStream;
    }

    /**
     * Description : This method sets the String to parse by calling
     * parse(InputStream) method
     * @param html as String
     * @see parse(InputStream)
     */
    public void parse(String html) {
        this.html = html;
        parse(new ByteArrayInputStream(html.getBytes()));
    }
}

PDFConstants.java

Code:
package com.bofa.crme.cpa.common.pdfUtil;

public final class PDFConstants {
    public final static String HEAD_END_TAG = "</head>";
    public final static int HEAD_END_TAG_LEN = 7;
    public final static String XML_HEADER = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>";
    public final static String PDF_PG_HT = "11.5in";
    public final static String PDF_PG_WD = "8.5in";
    public
        final static String PDF_LEFT_MARGIN = "0.75in";
    public final static String PDF_RIGHT_MARGIN = "0.75in";
    public final static String PDF_TOP_MARGIN = "0.50in";
    public final static String PDF_BOTTOM_MARGIN = ".50in";
    public final static String XSL_HEADER1 =
        "<xsl:stylesheet version=\"1.1\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" xmlns:fo=\"http://www.w3.org/1999/XSL/Format\" exclude-result-prefixes=\"fo\">\n";
    public final static String XSL_HEADER2 =
        "<xsl:output method=\"xml\" version=\"1.0\" omit-xml-declaration=\"no\" indent=\"yes\"/>\n";
    public final static String XSL_HEADER3 = "<xsl:template match=\"body\">\n";
    public final static String XSL_HEADER4 =
        "<fo:root xmlns:fo=\"http://www.w3.org/1999/XSL/Format\">\n";
    public final static String XSL_HEADER5 = "<fo:layout-master-set>\n";
    public final static String XSL_HEADER6 =
        "<fo:simple-page-master master-name=\"all\" margin-top=\""
            + PDF_TOP_MARGIN
            + "\" margin-bottom=\""
            + PDF_BOTTOM_MARGIN
            + "\" margin-left=\""
            + PDF_LEFT_MARGIN
            + "\" margin-right=\""
            + PDF_RIGHT_MARGIN
            + "\"";
    public final static String XSL_HEADER7 =
        "<fo:region-body margin-top=\"0.60in\" margin-bottom=\".50in\" />\n";
    public final static String XSL_HEADER7_1 = "<fo:region-before extent=\"0.750in\" />\n";
    public final static String XSL_HEADER7_2 = "<fo:region-after extent=\"0.25in\" /> \n";
    public final static String XSL_HEADER8 = "</fo:simple-page-master>\n";
    public final static String XSL_HEADER9 = "</fo:layout-master-set>\n";
    public final static String XSL_HEADER10 = "<fo:page-sequence master-reference=\"all\">\n";
    public final static String XSL_HEADER11 = "<fo:flow flow-name=\"xsl-region-body\">\n";
    public final static String XSL_FOOTER1 = "</fo:flow>\n";
    public final static String XSL_FOOTER2 = "</fo:page-sequence>\n";
    public final static String XSL_FOOTER3 = "</fo:root>\n";
    public final static String XSL_FOOTER4 = "</xsl:template>\n";
    public final static String XSL_FOOTER5 = "</xsl:stylesheet>\n";
    //Added for Header and Footer information
    public final static String XSL_MARKER1 = "<fo:static-content flow-name=\"xsl-region-before\">\n";
    public final static String XSL_MARKER2 = "<fo:static-content flow-name=\"xsl-region-after\">\n";
    public final static String XSL_MARKER3 = "<fo:block text-align=\"center\" font-size=\"6pt\">\n";
    public final static String XSL_MARKER4 = "</fo:block>\n";
    public final static String XSL_MARKER5 = "</fo:static-content>\n";
    //Default Header and Footer values
    public final static String XSL_HEADER_VALUE =
        "******* TCSL Confidential Information - Internal Use Only ******";
    //Entity References
    public final static String ENTITY_REF =
        "<!DOCTYPE body[<!ENTITY tilde \"~\"> <!ENTITY florin \"ƒ\"> <!ENTITY elip \"…\"> <!ENTITY dag \"†\"> "
            + "<!ENTITY ddag \"‡\"> <!ENTITY cflex \"ˆ\"> <!ENTITY permil \"‰\"> <!ENTITY uscore \"Š\"> "
            + "<!ENTITY OElig \"Œ\"> <!ENTITY lsquo \"‘\"> <!ENTITY rsquo \"’\"> <!ENTITY ldquo \"“\"> "
            + "<!ENTITY rdquo \"”\"> <!ENTITY bullet \"•\"> <!ENTITY endash \"–\"> <!ENTITY emdash \"—\"> "
            + "<!ENTITY trade \"™\"> <!ENTITY oelig \"œ\"> <!ENTITY Yuml \"Ÿ\"> <!ENTITY nbsp \" \"> "
            + "<!ENTITY iexcl \"¡\"> <!ENTITY cent \"¢\"> <!ENTITY pound \"£\"> <!ENTITY curren \"¤\"> "
            + "<!ENTITY yen \"¥\"> <!ENTITY brvbar \"¦\"> <!ENTITY sect \"§\"> <!ENTITY uml \"¨\"> "
            + "<!ENTITY copy \"©\"> <!ENTITY ordf \"ª\"> <!ENTITY laquo \"«\"> <!ENTITY not \"¬\"> "
            + "<!ENTITY shy \"*\"> <!ENTITY reg \"®\"> <!ENTITY macr \"¯\"> <!ENTITY deg \"°\"> "
            + "<!ENTITY plusmn \"±\"> <!ENTITY sup2 \"²\"> <!ENTITY sup3 \"³\"> <!ENTITY acute \"´\"> "
            + "<!ENTITY micro \"µ\"> <!ENTITY para \"¶\"> <!ENTITY middot \"·\"> <!ENTITY cedil \"¸\"> "
            + "<!ENTITY sup1 \"¹\"> <!ENTITY ordm \"º\"> <!ENTITY raquo \"»\"> <!ENTITY frac14 \"¼\"> "
            + "<!ENTITY frac12 \"½\"> <!ENTITY frac34 \"¾\"> <!ENTITY iquest \"¿\"> <!ENTITY Agrave \"À\"> "
            + "<!ENTITY Aacute \"Á\"> <!ENTITY Acirc \"Â\"> <!ENTITY Atilde \"Ã\"> <!ENTITY Auml \"Ä\"> "
            + "<!ENTITY Aring \"Å\"> <!ENTITY AElig \"Æ\"> <!ENTITY Ccedil \"Ç\"> <!ENTITY Egrave \"È\"> "
            + "<!ENTITY Eacute \"É\"> <!ENTITY Ecirc \"Ê\"> <!ENTITY Euml \"Ë\"> <!ENTITY Igrave \"Ì\"> "
            + "<!ENTITY Iacute \"Í\"> <!ENTITY Icirc \"Î\"> <!ENTITY Iuml \"Ï\"> <!ENTITY ETH \"Ð\"> "
            + "<!ENTITY Ntilde \"Ñ\"> <!ENTITY Ograve \"Ò\"> <!ENTITY Oacute \"Ó\"> <!ENTITY Ocirc \"Ô\"> "
            + "<!ENTITY Otilde \"Õ\"> <!ENTITY Ouml \"Ö\"> <!ENTITY times \"×\"> <!ENTITY Oslash \"Ø\"> "
            + "<!ENTITY Ugrave \"Ù\"> <!ENTITY Uacute \"Ú\"> <!ENTITY Ucirc \"Û\"> <!ENTITY Uuml \"Ü\"> "
            + "<!ENTITY Yacute \"Ý\"> <!ENTITY THORN \"Þ\"> <!ENTITY szlig \"ß\"> <!ENTITY agrave \"à\"> "
            + "<!ENTITY aacute \"á\"> <!ENTITY acirc \"â\"> <!ENTITY atilde \"ã\"> <!ENTITY auml \"ä\"> "
            + "<!ENTITY aring \"å\"> <!ENTITY aelig \"æ\"> <!ENTITY ccedil \"ç\"> <!ENTITY egrave \"è\"> "
            + "<!ENTITY eacute \"é\"> <!ENTITY ecirc \"ê\"> <!ENTITY euml \"ë\"> <!ENTITY igrave \"ì\"> "
            + "<!ENTITY iacute \"í\"> <!ENTITY icirc \"î\"> <!ENTITY iuml \"ï\"> <!ENTITY eth \"ð\"> "
            + "<!ENTITY ntilde \"ñ\"> <!ENTITY ograve \"ò\"> <!ENTITY oacute \"ó\"> <!ENTITY ocirc \"ô\"> "
            + "<!ENTITY otilde \"õ\"> <!ENTITY ouml \"ö\"> <!ENTITY oslash \"ø\"> <!ENTITY ugrave \"ù\"> "
            + "<!ENTITY uacute \"ú\"> <!ENTITY ucirc \"û\"> <!ENTITY uuml \"ü\"> <!ENTITY yacute \"ý\"> "
            + "<!ENTITY thorn \"þ\"> <!ENTITY yuml \"ÿ\">]>";
    //The two variables are added to handle Portrait
    //and Landscape mode page widths dynamically.
    public final static int PAGE_WIDTH_PORTRAIT = 500;
    public final static int PAGE_WIDTH_LANSCAPE = 720;
    //The following variables are added to handle header and footer.
public final static String HEADER_OPEN_TAG = "<header>"; public final static String HEADER_CLOSE_TAG = "</header>"; public final static String FOOTER_OPEN_TAG = "<footer>"; public final static String FOOTER_CLOSE_TAG = "</footer>"; public final static String XSL_MARKER6 = "<fo:flow flow-name=\"xsl-region-body\">"; public final static String XSL_MARKER7 = "</fo:flow>"; } XSLTGenerator.java Code: package com.bofa.crme.cpa.common.pdfUtil; import java.io.ByteArrayInputStream; import java.io.IOException; import java.util.HashMap; import java.util.Iterator; import org.apache.xerces.parsers.DOMParser; import org.w3c.dom.Attr; import org.w3c.dom.Document; import org.w3c.dom.NamedNodeMap; import org.w3c.dom.Node; import org.w3c.dom.NodeList; public class XSLTGenerator { // Attribute Mappings public static HashMap foAttrMap = null; String absPath = null; private String pageSetting = "P"; // Used for Portrait (or) Landscape setting private String tblBdrSize = "0pt"; // cellpadding holder. Since we need to set the padding size for each cell, // we need to store it when we encounter it in a table tag private String tblPadSize = null; //Buffer used to hold the XSLT content StringBuffer xsltBuffer = new StringBuffer(); //Used to set the table cell border property /** * Public Constructor * Creation date: (9/15/03 6:56:00 PM) */ public XSLTGenerator() { foAttrMap = new HashMap(); //TABLE attributes foAttrMap.put("bgcolor", "background-color"); foAttrMap.put("align", "text-align"); foAttrMap.put("background", "background"); foAttrMap.put("border", "border"); foAttrMap.put("bodercolordark", "bodercolordark"); foAttrMap.put("width", "width"); foAttrMap.put("height", "height"); foAttrMap.put("cellspacing", "border-spacing"); foAttrMap.put("cellpadding", "padding"); foAttrMap.put("valign", "vertical-align"); //TD attribute foAttrMap.put("colspan", "number-columns-spanned"); foAttrMap.put("rowspan", "number-rows-spanned"); //Font attributes foAttrMap.put("face", "font-family"); 
foAttrMap.put("size", "font-size"); } /** * This method checks for % value and converts it into points * The 480 pixels are equivalent to 100% since the page margin would be 4 cms * Creation date: (9/17/03 10:32:42 AM) * @return java.lang.String * @param strValue java.lang.String */ private String abs(String strValue) { String absValue = null; //To handle Portrait and Lanscape Mode dynamically //page width value is assigned based on PageSetting. int pageWidth = PDFConstants.PAGE_WIDTH_PORTRAIT; if (getPageSetting().equalsIgnoreCase("L")) { pageWidth = PDFConstants.PAGE_WIDTH_LANSCAPE; } try { if (strValue.indexOf("%") >= 0) { strValue = strValue.substring(0, strValue.indexOf("%")); double value = Double.parseDouble(strValue); absValue = Math.round(pageWidth * (value / 100)) + "px"; } else absValue = strValue; } catch (Exception e) { e.printStackTrace(); } return absValue; } /** * This method checks for & value and replace it to & which can be * recognised by XML/XSL parsers * Creation date: (9/17/03 10:32:42 AM) * @return java.lang.String * @param strValue java.lang.String */ private static String handleAmp(String strValue) { StringBuffer absValue = new StringBuffer(); int index = strValue.indexOf("&"); if (index > -1) { int len = strValue.length(); for (int i = 0; i < len; i++) { char c1 = strValue.charAt(i); absValue.append(c1); if (c1 == '&' && strValue.indexOf("amp", i) != (i + 1)) { absValue.append("amp;"); } } // end of for loop return absValue.toString(); } else return strValue; } /** * Description : This method generates the Header and Footer information for the * page. Footer, as of now, will always be the page number * @param xmlData as String * @return String */ private String addHeaderFooter(String xmlData) { StringBuffer markerBuffer = new StringBuffer(); String headerInfo = ""; //Add the Header Information markerBuffer.append(PDFConstants.XSL_MARKER1); //markerBuffer.append(PDFConstants.XSL_MARKER3); //Add the Header Information. 
Repeating content mentioned in //the header tag in all the PDF reports pages int startIndex = xmlData.indexOf(PDFConstants.HEADER_OPEN_TAG); int endIndex = xmlData.indexOf(PDFConstants.HEADER_CLOSE_TAG); if (startIndex > -1 && endIndex > -1) { headerInfo = xmlData.substring(startIndex + 9, endIndex); XSLTGenerator objXSLT = new XSLTGenerator(); objXSLT.setPageSetting(getPageSetting()); headerInfo = objXSLT.doGenerateXSLT(headerInfo, absPath); int headerStartIndex = headerInfo.indexOf(PDFConstants.XSL_MARKER6); headerInfo = headerInfo.substring( headerStartIndex + PDFConstants.XSL_MARKER6.length()); int headerEndIndex = headerInfo.indexOf(PDFConstants.XSL_MARKER7); headerInfo = headerInfo.substring(0, headerEndIndex); } // end of if loop markerBuffer.append(headerInfo); //markerBuffer.append(PDFConstants.XSL_MARKER4); markerBuffer.append(PDFConstants.XSL_MARKER5); //Add the Footer Information int startFooterIndex = xmlData.indexOf(PDFConstants.FOOTER_OPEN_TAG); int endFooterIndex = xmlData.indexOf(PDFConstants.FOOTER_CLOSE_TAG); String footerMsg = ""; if (startFooterIndex > -1 && endFooterIndex > -1) { footerMsg = xmlData.substring(startFooterIndex + 8, endFooterIndex); } // end of if loop markerBuffer.append(PDFConstants.XSL_MARKER2); markerBuffer.append(PDFConstants.XSL_MARKER3); //adding footer message with page number in PDF reports. 
if ((footerMsg != null) && (!footerMsg.trim().equals(""))) { markerBuffer.append(footerMsg + " "); } markerBuffer.append("Page <fo:page-number />"); markerBuffer.append(PDFConstants.XSL_MARKER4); markerBuffer.append(PDFConstants.XSL_MARKER5); return markerBuffer.toString(); } /** * This method handles the close tag event * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node */ private void closeXSLTag(Node node) { String nodeName = node.getNodeName(); if (nodeName.equalsIgnoreCase("body") || nodeName.equalsIgnoreCase("div") || nodeName.equalsIgnoreCase("center")) { xsltBuffer.append("</fo:block>\n"); } if (nodeName.equalsIgnoreCase("table")) { xsltBuffer.append("</fo:table>"); } if (nodeName.equalsIgnoreCase("thead")) { xsltBuffer.append("</fo:table-header>"); } if (nodeName.equalsIgnoreCase("tr")) { xsltBuffer.append("</fo:table-row>"); } if (nodeName.equalsIgnoreCase("td") || nodeName.equalsIgnoreCase("th")) { xsltBuffer.append("</fo:table-cell>"); } if (nodeName.equalsIgnoreCase("tbody")) { xsltBuffer.append("</fo:table-body>\n"); } if (nodeName.equalsIgnoreCase("h1") || nodeName.equalsIgnoreCase("h2") || nodeName.equalsIgnoreCase("h3") || nodeName.equalsIgnoreCase("h4") || nodeName.equalsIgnoreCase("h5") || nodeName.equalsIgnoreCase("h6")) { xsltBuffer.append("</fo:block>\n"); } if (nodeName.equalsIgnoreCase("em") || nodeName.equalsIgnoreCase("cite") || nodeName.equalsIgnoreCase("var") || nodeName.equalsIgnoreCase("dfn") || nodeName.equalsIgnoreCase("i") || nodeName.equalsIgnoreCase("dfn")) { xsltBuffer.append("</fo:inline>"); } if (nodeName.equalsIgnoreCase("samp") || nodeName.equalsIgnoreCase("kbd") || nodeName.equalsIgnoreCase("code") || nodeName.equalsIgnoreCase("u") || nodeName.equalsIgnoreCase("span") || nodeName.equalsIgnoreCase("font")) { xsltBuffer.append("</fo:inline>"); } // this condition is added to handle BR Tag properly. 
if (nodeName.equalsIgnoreCase("br")) { xsltBuffer.append("</fo:block>"); } } /** * This method handles the start tag event. * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node */ private void createXSLTag(Node node) { String nodeName = node.getNodeName(); if (nodeName.equalsIgnoreCase("body")) { xsltBuffer.append("<fo:block font-size=\"8pt\">\n"); } if (nodeName.equalsIgnoreCase("table")) { xsltBuffer.append(handleTable(node)); } if (nodeName.equalsIgnoreCase("thead")) { xsltBuffer.append("<fo:table-header>\n"); } if (nodeName.equalsIgnoreCase("tr")) { xsltBuffer.append(handleTagsWithAttr(node)); } if (nodeName.equalsIgnoreCase("td") || nodeName.equalsIgnoreCase("th")) { xsltBuffer.append(handleTableCells(node)); } if (nodeName.equalsIgnoreCase("tbody")) { xsltBuffer.append("<fo:table-body>\n"); } if (nodeName.equalsIgnoreCase("img")) { xsltBuffer.append(handleImg(node, absPath)); } if (nodeName.equalsIgnoreCase("h1") || nodeName.equalsIgnoreCase("h2") || nodeName.equalsIgnoreCase("h3") || nodeName.equalsIgnoreCase("h4") || nodeName.equalsIgnoreCase("h5") || nodeName.equalsIgnoreCase("h6")) { xsltBuffer.append(handleHeader(node)); } if (nodeName.equalsIgnoreCase("u")) { //The <u> tag will be handled to underline the text which should be inline xsltBuffer.append("<fo:inline text-decoration=\"underline\">"); } if (nodeName.equalsIgnoreCase("em") || nodeName.equalsIgnoreCase("cite") || nodeName.equalsIgnoreCase("var") || nodeName.equalsIgnoreCase("dfn") || nodeName.equalsIgnoreCase("i") || nodeName.equalsIgnoreCase("dfn")) { //The <cite>, <em> , <var> , <dfn> , <i> elements are rendered in italics inline. //The <DFN> tag is used for definition xsltBuffer.append("<fo:inline font-style=\"italic\">"); } if (nodeName.equalsIgnoreCase("samp") || nodeName.equalsIgnoreCase("kbd") || nodeName.equalsIgnoreCase("code")) { //The <samp>,<kbd> and <code> tags are rendered in a slightly larger monospaced font. 
xsltBuffer.append( "<fo:inline font-family=\"monospace\" font-size=\"110%\">"); } if (nodeName.equalsIgnoreCase("br")) { // the below line is modifed to handle BR Tag properly. xsltBuffer.append("<fo:block>"); } if (nodeName.equalsIgnoreCase("div")) { xsltBuffer.append(handleContainerTag(node)); } if (nodeName.equalsIgnoreCase("span")) { xsltBuffer.append(handleContainerTag(node)); } if (nodeName.equalsIgnoreCase("center")) { xsltBuffer.append(handleContainerTag(node)); } if (nodeName.equalsIgnoreCase("font")) { xsltBuffer.append(handleFont(node)); } } /** * Description : This method generates the XSL document * @param xmlStr as String * @return String */ public String doGenerateXSLT(String xmlStr, String absPath) { String xsltStr = ""; this.absPath = absPath; xsltBuffer.append(PDFConstants.XML_HEADER); //append the entity reference xsltBuffer.append("\n" + PDFConstants.ENTITY_REF + "\n"); xsltBuffer.append(PDFConstants.XSL_HEADER1); xsltBuffer.append(PDFConstants.XSL_HEADER2); xsltBuffer.append(PDFConstants.XSL_HEADER3); xsltBuffer.append(PDFConstants.XSL_HEADER4); xsltBuffer.append(PDFConstants.XSL_HEADER5); xsltBuffer.append(PDFConstants.XSL_HEADER6); //Set the pageSettings if (getPageSetting().equalsIgnoreCase("P")) xsltBuffer.append( " page-height=\"" + PDFConstants.PDF_PG_HT + "\" page-width=\"" + PDFConstants.PDF_PG_WD + "\" "); else xsltBuffer.append( " page-width=\"" + PDFConstants.PDF_PG_HT + "\" page-height=\"" + PDFConstants.PDF_PG_WD + "\" "); xsltBuffer.append(">\n"); xsltBuffer.append(PDFConstants.XSL_HEADER7); xsltBuffer.append(PDFConstants.XSL_HEADER7_1); xsltBuffer.append(PDFConstants.XSL_HEADER7_2); xsltBuffer.append(PDFConstants.XSL_HEADER8); xsltBuffer.append(PDFConstants.XSL_HEADER9); xsltBuffer.append(PDFConstants.XSL_HEADER10); //Add Header and Footer xsltBuffer.append(addHeaderFooter(xmlStr)); xsltBuffer.append(PDFConstants.XSL_HEADER11); //create InputSource to parse the XML file byte[] xmlData = xmlStr.getBytes(); ByteArrayInputStream bis 
= new ByteArrayInputStream(xmlData);
        org.xml.sax.InputSource source = new org.xml.sax.InputSource(bis);
        //create the parser and parse XML tree
        DOMParser parser = new DOMParser();
        try {
            parser.parse(source);
        } catch (org.xml.sax.SAXException saxEx) {
            //report the parse failure instead of silently swallowing it
            saxEx.printStackTrace();
        } catch (IOException ioEx) {
            ioEx.printStackTrace();
        }
        //Create the XSL document by using the root tag
        Document doc = parser.getDocument();
        Node node = doc.getDocumentElement();
        parseNode(node);
        xsltBuffer.append(PDFConstants.XSL_FOOTER1);
        xsltBuffer.append(PDFConstants.XSL_FOOTER2);
        xsltBuffer.append(PDFConstants.XSL_FOOTER3);
        xsltBuffer.append(PDFConstants.XSL_FOOTER4);
        xsltBuffer.append(PDFConstants.XSL_FOOTER5);
        xsltStr = xsltBuffer.toString();
        return xsltStr;
    }

    /**
     * This method extracts the attributes in the given node and stores them in a HashMap.
     * The key value will be taken from the foAttrMap.
     * Creation date: (9/18/03 1:08:46 PM)
     * @return java.util.HashMap
     * @param node org.w3c.dom.Node
     */
    private HashMap extractAttributes(Node node) {
        HashMap attrMap = null;
        NamedNodeMap attributes = node.getAttributes();
        if (null != attributes) {
            attrMap = new HashMap();
            int attrLen = attributes.getLength();
            for (int i = 0; i < attrLen; i++) {
                Attr attr = (Attr) attributes.item(i);
                String attrName = attr.getNodeName();
                String attrValue = attr.getNodeValue();
                //if attrValue is a number and does not contain any of the below measurements,
                //then add 'px'. The three checks must be combined with &&; with || the
                //condition is always true.
                if (!attrValue.endsWith("px")
                    && !attrValue.endsWith("pt")
                    && !attrValue.endsWith("%")) {
                    try {
                        Integer.parseInt(attrValue);
                        attrValue += "px";
                    } catch (NumberFormatException nfEx) {
                        //not a plain number; keep the value unchanged
                    }
                }
                //get the fo style value for the specified HTML style attribute
                String key = (String) foAttrMap.get(attrName.toLowerCase());
                if (key == null)
                    key = attrName;
                attrMap.put(key, attrValue);
            } // end for attributes iteration
        } // end of null checking
        return attrMap;
    }

    /**
     * This method is used to get the column width under the table
     * Creation date: (9/18/03 10:21:52 PM)
     * @return String[]
     * @param parentNode
org.w3c.dom.Node */ private String[] getColumnWidth(Node parentNode) { String[] colArr = null; String tag = "tr"; String attrName = "cols"; int colCnt = 0; try { String attrValue = parentNode .getAttributes() .getNamedItem(attrName) .getNodeValue(); int noOfCols = Integer.parseInt(attrValue); Node tbodyNode = searchNode(parentNode, "tbody"); NodeList trList = tbodyNode.getChildNodes(); int trLength = trList.getLength(); colArr = new String[noOfCols]; outer : for (int i = 0; i < trLength; i++) { Node trNode = trList.item(i); if (trNode.getNodeType() == Node.ELEMENT_NODE) { NodeList tdList = trNode.getChildNodes(); int tdCnt = tdList.getLength(); if (tdCnt > 0 && (tdCnt / 2) == noOfCols) { inner : for (int j = 0; j < tdCnt; j++) { Node tdNode = tdList.item(j); if (tdNode.getNodeType() == Node.ELEMENT_NODE) { String widthValue = tdNode .getAttributes() .getNamedItem("width") .getNodeValue(); if (widthValue != null) colArr[colCnt++] = abs(widthValue); } } // end of inner for loop break outer; } else { continue; } // end of tdCnt check } // end of <TR> ELEMENT NODE check } // end of outer for loop } catch (NumberFormatException nfEx) { } catch (Exception e) { } return colArr; } /** * @return */ public String getPageSetting() { return pageSetting; } /** * This method is used to handle the Container tags (<div>,<span> etc) * with their text-align attributes * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @return java.lang.String */ private String handleContainerTag(Node node) { StringBuffer dataBuffer = new StringBuffer(); String alignValue = ""; String nodeName = node.getNodeName(); try { HashMap attrMap = extractAttributes(node); if (nodeName.equalsIgnoreCase("div")) { dataBuffer.append("<fo:block"); } else if (nodeName.equalsIgnoreCase("center")) { dataBuffer.append("<fo:block text-align=\"center\""); } else dataBuffer.append("<fo:inline"); dataBuffer.append(" line-height=\"1.5em + 2pt\" "); if (null != attrMap) { Iterator attrKey = 
attrMap.keySet().iterator(); while (attrKey.hasNext()) { String key = (String) attrKey.next(); if (null != key) { String value = (String) attrMap.get(key); dataBuffer.append(" " + key + "=\"" + value + "\""); } // key null check if loop } // end of Hashmap iteration } // end of attrMap null checking dataBuffer.append(">"); } catch (Exception e) { e.printStackTrace(); } return dataBuffer.toString(); } /** * This method is used to handle the Font tags with their alignment attributes * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @return java.lang.String */ private String handleFont(Node node) { StringBuffer dataBuffer = new StringBuffer(); String nodeName = node.getNodeName(); int size = 0; try { HashMap attrMap = extractAttributes(node); dataBuffer.append("<fo:inline"); if (null != attrMap) { Iterator attrKey = attrMap.keySet().iterator(); while (attrKey.hasNext()) { String key = (String) attrKey.next(); if (null != key) { String value = (String) attrMap.get(key); if (key.equalsIgnoreCase("font-size")) { try { size = Integer.parseInt(value); } catch (NumberFormatException nfe) { } value = String.valueOf((3 + (size * 3))); dataBuffer.append(" " + key + "=\"" + value + "\""); } else { dataBuffer.append(" " + key + "=\"" + value + "\""); } } // key null check if loop } // end of Hashmap iteration } // end of attrMap null checking dataBuffer.append(">\n"); } catch (Exception e) { e.printStackTrace(); } return dataBuffer.toString(); } /** * This method is used to handle the Header tags with their alignment attributes * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @return java.lang.String */ private String handleHeader(Node node) { StringBuffer tableBuffer = new StringBuffer(); String alignValue = ""; String nodeName = node.getNodeName(); try { HashMap attrMap = extractAttributes(node); tableBuffer.append( "<fo:block line-height=\"1.5em + 2pt\" font-weight=\"bold\""); if (null != attrMap && null != attrMap.get("text-align")) 
{ alignValue = (String) attrMap.get("text-align"); tableBuffer.append(" text-align=\"" + alignValue + "\""); } if (nodeName.equalsIgnoreCase("h1")) { tableBuffer.append(" font-size=\"18pt\">"); } else if (nodeName.equalsIgnoreCase("h2")) { tableBuffer.append(" font-size=\"14pt\">"); } else if (nodeName.equalsIgnoreCase("h3")) { tableBuffer.append(" font-size=\"10pt\" space-before=\"6pt\">"); } else if (nodeName.equalsIgnoreCase("h4")) { tableBuffer.append(" font-size=\"9pt\" space-before=\"1mm\">"); } else if (nodeName.equalsIgnoreCase("h5")) { tableBuffer.append(" font-size=\"8pt\">"); } else if (nodeName.equalsIgnoreCase("h6")) { tableBuffer.append(" font-size=\"6pt\">"); } // for tag name checking } catch (Exception e) { e.printStackTrace(); } return tableBuffer.toString(); } /** * This method is used to handle the <img> tag with its attributes. * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @param absPath java.lang.String * @return java.lang.String */ private String handleImg(Node node, String absPath) { HashMap attrMap = extractAttributes(node); StringBuffer imgBuffer = new StringBuffer(); imgBuffer.append("<fo:inline>"); imgBuffer.append( "<fo:external-graphic display-align=\"center\" keep-with-next.within-page=\"always\">\n"); if (null != attrMap) { Iterator attrKey = attrMap.keySet().iterator(); while (attrKey.hasNext()) { String key = (String) attrKey.next(); if (null != key) { String value = (String) attrMap.get(key); if (key.equalsIgnoreCase("src")) { // value = absPath + "/" + value; // to handle Alphablox generated images if (value.indexOf("/alphabloxserver") > -1) { String tempPath = absPath.substring(0, absPath.lastIndexOf("/")); value = tempPath + value; // this is to convert the string from "/alphabloxserver" to "/AlphabloxServer" int stPoint = value.indexOf("/alphabloxserver"); String tempValue = value.substring(0, stPoint) + "/AlphabloxServer" + value.substring(stPoint + 16); value = tempValue; } else { value = absPath + 
"/" + value; } } imgBuffer.append("<xsl:attribute name=\"" + key + "\">"); imgBuffer.append(value); imgBuffer.append("</xsl:attribute>\n"); } // key null check if loop } // end of Hashmap iteration } // end of attrMap null checking imgBuffer.append("</fo:external-graphic>"); imgBuffer.append("</fo:inline>"); return imgBuffer.toString(); } /** * This method is used to handle the Table tags with their attributes * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @return java.lang.String */ private String handleTable(Node node) { StringBuffer tableBuffer = new StringBuffer(); int noOfCols = 0; //check for the table caption. If present, then add it to the top of the table Node captionNode = searchNode(node, "caption"); if (null != captionNode) { String strCaption = captionNode.getFirstChild().getNodeValue(); tableBuffer.append( "<fo:block font-size=\"8pt\" font-weight=\"bold\" line-height=\"1.5em + 2pt\" text-align=\"center\">\n"); tableBuffer.append(strCaption); tableBuffer.append("</fo:block>\n"); } // captionNode null checking tableBuffer.append("<fo:table table-layout=\"fixed\""); HashMap attrMap = extractAttributes(node); //noOfCols = getNoOfCells(node); if (null != attrMap) { Iterator attrKey = attrMap.keySet().iterator(); while (attrKey.hasNext()) { String key = (String) attrKey.next(); if (null != key) { String value = (String) attrMap.get(key); value = abs(value); if (key.equalsIgnoreCase("border")) { tableBuffer.append( " border-style=\"solid\" border-width=\"" + value + "\""); tblBdrSize = value; } else if (key.equalsIgnoreCase("background")) { tableBuffer.append( " background-image=\"" + absPath + "/" + value + "\""); } else if (key.equalsIgnoreCase("padding")) { tblPadSize = value; } else if (key.equalsIgnoreCase("cols")) { try { noOfCols = Integer.parseInt( value.substring(0, value.indexOf("px"))); } catch (NumberFormatException nfEx) { } } else { tableBuffer.append(" " + key + "=\"" + value + "\""); } } // key null check if loop } // 
end of Hashmap iteration } // end of attrMap null checking tableBuffer.append(">\n"); //get the column width String[] colArr = getColumnWidth(node); // add the column number tags at the end of table definition for (int i = 0; i < noOfCols; i++) { tableBuffer.append( "<fo:table-column column-number=\"" + (i + 1) + "\""); if ((colArr != null) && (colArr[i] != null)) { tableBuffer.append(" column-width=\"" + colArr[i] + "\""); } tableBuffer.append("/>"); } return tableBuffer.toString(); } /** * This method is used to handle the Table columns with their attributes * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @return java.lang.String */ private String handleTableCells(Node node) { StringBuffer tableBuffer = new StringBuffer(); String bdrColor = "black"; //border-width is handled dynamically. if the cell doesn't contains this property //adding the default value, or else adding the exact value. boolean borderCheck = true; String bdrProps = " border-style=\"solid\" "; tableBuffer.append("<fo:table-cell "); tableBuffer.append(bdrProps); tableBuffer.append(" margin-right=\"0.5mm\" margin-left=\"0.5mm\" "); //append the padding, if cellpadding is given for the table if (null != tblPadSize) { tableBuffer.append(" padding=\"" + tblPadSize + "\""); } HashMap attrMap = extractAttributes(node); if (null != attrMap) { Iterator attrKey = attrMap.keySet().iterator(); while (attrKey.hasNext()) { String key = (String) attrKey.next(); if (null != key) { String value = (String) attrMap.get(key); //Handle the % values value = abs(value); if ((key.equalsIgnoreCase("number-columns-spanned") || key.equalsIgnoreCase("number-rows-spanned"))) { value = value.substring(0, value.indexOf("px")); } if (key.equalsIgnoreCase("text-align") && value.equalsIgnoreCase("middle")) { //since the center alignment becomes middle, we need to change them. 
value = "center"; } if (key.equalsIgnoreCase("background")) { tableBuffer.append( " background-image=\"" + absPath + "/" + value + "\""); } else if (key.equalsIgnoreCase("bordercolor")) { bdrColor = value; tableBuffer.append( " border-color=\"" + bdrColor + "\""); // the given width and nowrap attribute is removed from the html content here........ } else if ( (!key.equalsIgnoreCase("padding")) && (!key.equalsIgnoreCase("nowrap")) && (!key.equalsIgnoreCase("width"))) tableBuffer.append(" " + key + "=\"" + value + "\""); if (key.equalsIgnoreCase("border-width")) { borderCheck = false; } //bdrBtmClr = bdrRgtClr = bdrColor; } // key null check if loop } // end of Hashmap iteration } // end of attrMap null checking if (borderCheck) { tableBuffer.append(" border-width=\"" + tblBdrSize + "\""); } tableBuffer.append(">\n"); return tableBuffer.toString(); } /** * This method is used to handle the tags with their attributes * Creation date: (9/18/03 1:09:01 PM) * @param node org.w3c.dom.Node * @return java.lang.String */ private String handleTagsWithAttr(Node node) { StringBuffer tableBuffer = new StringBuffer(); String nodeName = node.getNodeName(); if (nodeName.equalsIgnoreCase("tr")) { tableBuffer.append("<fo:table-row"); } else if (nodeName.equalsIgnoreCase("td")) { tableBuffer.append( "<fo:table-cell border-left-color=\"black\" border-left-width=\"0.5pt\" border-left-style=\"solid\" border-top-color=\"black\" border-top-width=\"0.5pt\" border-top-style=\"solid\" border-right-color=\"black\" border-right-width=\"0.5pt\" border-right-style=\"solid\" border-bottom-color=\"black\" border-bottom-width=\"0.5pt\" border-bottom-style=\"solid\""); //append the padding, if cellpadding is given for the table if (null != tblPadSize) { tableBuffer.append(" padding=\"" + tblPadSize + "\""); } } HashMap attrMap = extractAttributes(node); if (null != attrMap) { Iterator attrKey = attrMap.keySet().iterator(); while (attrKey.hasNext()) { String key = (String) attrKey.next(); if (null != 
key) { String value = (String) attrMap.get(key); if (nodeName.equalsIgnoreCase("td") && (key.equalsIgnoreCase("number-columns-spanned") || key.equalsIgnoreCase("number-rows-spanned"))) { value = value.substring(0, value.indexOf("px")); } if (key.equalsIgnoreCase("text-align") && value.equalsIgnoreCase("middle")) { //since the center alignment becomes middle, we need to change them. value = "center"; } tableBuffer.append(" " + key + "=\"" + value + "\""); } // key null check if loop } // end of Hashmap iteration } // end of attrMap null checking tableBuffer.append(">\n"); return tableBuffer.toString(); } /** * This method is used to parse the given Node element and process the node. * Creation date: (9/18/03 1:05:43 PM) * @param node org.w3c.dom.Node */ private void parseNode(Node node) { try { int nodeType = node.getNodeType(); switch (nodeType) { case Node.ELEMENT_NODE : //the following condition is added to display header and footer block //in all the PDF reports page. if (node.getNodeName().equals("header") || node.getNodeName().equals("footer")) { //System.out.println("The element Node is:::"+node.getNodeName()); return; } createXSLTag(node); NodeList childNodes = node.getChildNodes(); int len = childNodes.getLength(); for (int i = 0; i < len; i++) { parseNode(childNodes.item(i)); } closeXSLTag(node); break; case Node.TEXT_NODE : String nodeValue = handleAmp(node.getNodeValue()); String parentNode = node.getParentNode().getNodeName(); if (parentNode.equalsIgnoreCase("td")) { xsltBuffer.append( // "<fo:block line-height=\"1.5em + 2pt\">\n"); // For all <TD> tags the nowrap attribute is added here, this is to handle the overlapping of content in PDF reports........ 
"<fo:block line-height=\"1.5em + 2pt\" wrap-option=\"wrap\">\n"); xsltBuffer.append(nodeValue); xsltBuffer.append("</fo:block>"); } else if (parentNode.equalsIgnoreCase("th")) { xsltBuffer.append( "<fo:block font-weight=\"bold\" line-height=\"1.5em + 2pt\" text-align=\"center\">\n"); xsltBuffer.append(nodeValue); xsltBuffer.append("</fo:block>"); } else if ( !parentNode.equalsIgnoreCase("title") && !parentNode.equalsIgnoreCase("script")) { xsltBuffer.append(nodeValue); } break; case Node.CDATA_SECTION_NODE : break; case Node.ENTITY_REFERENCE_NODE : xsltBuffer.append("&" + node.getNodeName() + ";"); break; case Node.ENTITY_NODE : break; case Node.PROCESSING_INSTRUCTION_NODE : break; } // end of switch case } catch (Exception e) { System.out.println("Exception occurred in method"); e.printStackTrace(); } } /** * This method is used to get the tag under the parentNode * Creation date: (9/18/03 10:21:52 PM) * @return org.w3c.dom.Node * @param parentNode org.w3c.dom.Node * @param tag java.lang.String */ public Node searchNode(Node parentNode, String tag) { return searchNode(parentNode, tag, 0); } /** * This method is used to get the tag under the parentNode * Creation date: (9/18/03 10:21:52 PM) * @return org.w3c.dom.Node * @param parentNode org.w3c.dom.Node * @param tag java.lang.String * @param nodePosition int */ public Node searchNode(Node parentNode, String tag, int nodePosition) { Node regNode = null; try { NodeList nodeList = parentNode.getChildNodes(); int length = nodeList.getLength(); for (int i = 0; i < length; i++) { Node childNode = nodeList.item(i); if (childNode != null && childNode.getNodeType() == Node.ELEMENT_NODE && childNode.getNodeName().equals(tag)) { regNode = childNode; break; } // end of if condition searchNode(childNode, tag, nodePosition); } // end of for loop } catch (Exception e) { e.printStackTrace(); } return regNode; } /** * @param string */ public void setPageSetting(String string) { pageSetting = string; } } pdf.js Code: function 
generatePDF() { //as defalut set page setting as Portrait generatePDFReport("P"); } /** * This function is called when PDF reports is to be generated in Lanscape Mode. */ function generatePDFLanscape() { //set pageSetting as "L" for Lanscape Mode var pageSetting = "L"; generatePDFReport(pageSetting); } /** * This function is called when PDF reports is to be generated in Portrait Mode. */ function generatePDFPortrait() { //set pageSetting as "P" for Portrait Mode var pageSetting = "P"; generatePDFReport(pageSetting); } /** * This function is used to get all the report content and remove Unnecessary content, * and passing the details to callPDF() function. */ function generatePDFReport(pageSetting) { //get the content var strHtml = " "; //this is to filter out the top level comments var contentLength = document.all.length; for(i=0;i<contentLength;i++) { if( document.all(i).canHaveChildren) { strHtml = document.all(i).innerHTML; break; } } //remove the inner table width from PDF strHtml = removeInnerTableWidth(strHtml); // replace with ' ' // g for global occurence and i for case insensitive var rExp = / /gi; var newString = " "; strHtml = strHtml.replace(rExp,newString); //adding Footer content which is dispalyed in each page of the PDF reports. var bodyClose = "/BODY"; var bodyIndex = strHtml.indexOf(bodyClose); if(strHtml.indexOf(bodyClose) != -1) { var first = strHtml.substring(0,bodyIndex-1); //this is to add the end of the body var second = strHtml.substring(bodyIndex-1); var strFooter = "<footer>"+new Date().toString()+" ******* TCSL Confidential Information - Internal Use Only ******</footer>"; strHtml = first+strFooter+second; } //menu item is adding the content outside body. 
    //remove them after //<!--END OF FILE --> comment
    var eof = "!--END OF FILE --";
    if (strHtml.indexOf(eof) != -1) {
        strHtml = strHtml.substring(0, strHtml.indexOf(eof) - 1);
        //this is to add the end of the body
        strHtml += "</BODY>";
    }
    callPDF(strHtml, pageSetting);
}

/**
 * This function is used to send all the inputs to PDFRenderSevlet class,
 * where PDF report is generated.
 */
function callPDF(strHtml, pageSetting) {
    var pageWidth = document.body.scrollWidth;
    var pdfWidth = "21cm";
    //building absolute path which is sent to PDFRenderSevlet as a parameter
    //this is used when NetCharts Image writing in PDF reports
    var normalWidth = screen.availWidth;
    if (pageWidth > normalWidth - 20) {
        //calculate the pageWidth
        var width = parseInt(normalWidth) - 20;
        //for normalWidth, its 21 cm, then calculate for pageWidth,
        //consider 4 CM for margin
        pdfWidth = parseInt((33 / width * pageWidth)) + "cm";
    }
    //remove the unwanted elements from PDF
    strHtml = removeUnWanted(strHtml);
    // to remove the comments we are calling removeComments() method
    strHtml = removeComments(strHtml);
    var htmlVar = document.getElementById("strHTML");
    htmlVar.value = strHtml;
    document.forms[0].action = "PDFRenderSevlet?pageSetting=" + pageSetting + "&pdfWidth=" + pdfWidth;
    document.forms[0].submit();
}

/**
 * This function eliminates comments in the page content
 */
function removeComments(content) {
    var position = content.indexOf("!--");
    var temp = "";
    if (position == -1) {
        return content;
    } else {
        while (position > 0) {
            var first = content.substring(0, position - 1);
            var second = content.substring(position, content.indexOf("--\>") + 3);
            var third = content.substring(content.indexOf("--\>") + 3);
            temp = first + third;
            content = temp;
            position = temp.indexOf("!--");
        }
        return temp;
    }
}

/**
 * This function eliminates the unwanted contents which are enclosed in
 * <!--START: DONT INCLUDE PDF --> and <!--END: DONT INCLUDE PDF -->
 * comments
 */
function removeUnWanted(content) {
    var start = "!--START: DONT INCLUDE PDF --";
    var end = "!--END: DONT INCLUDE PDF --";
    var buttonIndex = content.indexOf(start);
    var temp = "";
    if (buttonIndex == -1) {
        return content;
    } else {
        while (buttonIndex > 0) {
            var first = content.substring(0, buttonIndex - 1);
            var second = content.substring(buttonIndex, content.indexOf(end) + 28);
            var third = content.substring(content.indexOf(end) + 28);
            temp = first + third;
            content = temp;
            buttonIndex = temp.indexOf(start);
        }
        return temp;
    }
}

/**
 * This function removes Width property from given content.
 */
function removeWidth(content) {
    var position = content.indexOf("width=");
    var temp = "";
    if (position == -1) {
        return content;
    } else {
        while (position > 0) {
            var first = content.substring(0, position - 1);
            var second = content.substring(position, content.indexOf("%\"") + 2);
            var third = content.substring(content.indexOf("%\"") + 2);
            temp = first + third;
            content = temp;
            position = temp.indexOf("width=");
        }
        return temp;
    }
}

/**
 * This function is used to remove Width of Inner Tables.
 */
function removeInnerTableWidth(content) {
    var start = "!--START: REMOVE INNER TABLE WIDTH --";
    var end = "!--END: REMOVE INNER TABLE WIDTH --";
    var buttonIndex = content.indexOf(start);
    if (buttonIndex != -1) {
        while (buttonIndex > 0) {
            var first = content.substring(0, buttonIndex - 1);
            var second = content.substring(buttonIndex + 38, (content.indexOf(end) - 1));
            //calling removeWidth function to remove inner table width.
            second = removeWidth(second);
            var third = content.substring(content.indexOf(end) + 36);
            content = first + second + third;
            buttonIndex = content.indexOf(start);
        }
    }
    return content;
}
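The marker-stripping functions above rely on hand-counted offsets such as +28 and +38, which are easy to get wrong if a marker string ever changes. As a rough sketch only, and not part of the original pdf.js, the same two clean-up steps could be written with regular expressions. The function names stripExcludedRegions and stripComments are hypothetical; the marker strings are the ones used in the listing.

```javascript
// Hypothetical helper names; only the marker strings come from pdf.js.

// Removes every region enclosed in the DONT INCLUDE PDF marker comments.
function stripExcludedRegions(html) {
    // [\s\S]*? matches lazily across line breaks, so each START marker
    // is paired with the nearest following END marker.
    return html.replace(/<!--START: DONT INCLUDE PDF -->[\s\S]*?<!--END: DONT INCLUDE PDF -->/g, "");
}

// Removes all remaining HTML comments.
function stripComments(html) {
    return html.replace(/<!--[\s\S]*?-->/g, "");
}

var sample = "<p>Report</p>" +
    "<!--START: DONT INCLUDE PDF --><input type=\"button\" value=\"Print\"><!--END: DONT INCLUDE PDF -->" +
    "<!-- layout note --><p>Total: 42</p>";

var cleaned = stripComments(stripExcludedRegions(sample));
console.log(cleaned); // <p>Report</p><p>Total: 42</p>
```

The excluded regions have to be removed first, as in callPDF(), because stripping comments first would delete the markers and leave the excluded content in place. Unlike the index-based loops, this version leaves the input unchanged when a START marker has no matching END marker.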
Will it work for HTML pages which have images or other media in them?
I haven't checked it with image files. Try doing your own R&D, and please let me know if it works. Thanks.
Looks like we need to make a lot of code changes to get this working. I have used another component to achieve the same result without most of this, but I am not sure whether it will work with Java code.
Is there any such component that can be used from .NET? I need a similar solution that can be used from .NET, but the products available are far too costly.