phpTidyHt Manual

David Druffner

ddruff@gemini1consulting.com

Revision History
Revision v. 1.0.1.0 10-26-2001 Revised by: dcd
Manual for v.1.0 of phpTidyHt

Manual for phpTidyHt, a PHP function that filters HTML output through HTML Tidy. Download updates to this manual and to phpTidyHt at Gemini 1 Consulting, LLC. An online manual has also been set up that displays users' comments, modifications, and tips, as well as any author comments made since this manual was published. Please visit the Online Manual page to see the latest comments or to contribute your own.


Table of Contents
1. Introduction
1.1. Manual Version
1.2. What is phpTidyHt?
1.3. What is HTML Tidy?
1.4. Copyright Information
1.5. Disclaimer
1.6. phpTidyHt License
1.7. New Versions
1.8. Credits
1.9. Feedback
1.10. Translations
2. Download phpTidyHt
3. System Requirements
4. Function Description
5. Using phpTidyHt
5.1. Using Output Buffering - Method 1
5.2. HTML String Build - Method 2
5.3. Debugging Scripts with phpTidyHt
5.4. Integrating phpTidyHt Into Your Scripts
5.5. phpTidyHt Options
6. Demo Page
7. Code History
8. Code Contributions
9. Further Information

1. Introduction


1.1. Manual Version

This document is version 1.0.1.0. The document version number is in S.S.M.M format: S.S always matches the version of the software, and M.M indicates the release of the manual for that version.


1.2. What is phpTidyHt?

phpTidyHt is a PHP script which allows you to filter all your PHP generated HTML through HTML Tidy before it is sent to the browser. Thus you have the advantage of automatically fixing most HTML errors on the fly, presenting a nicely formatted source to the browser, optionally converting the output to XHTML automatically, and obtaining useful information for debugging HTML source.


1.3. What is HTML Tidy?

If you are not familiar with HTML Tidy, it is a very useful command-line utility that turns badly formed HTML into correctly formed HTML, optionally indenting it nicely. It reads the input, fixes it automatically, and sends the result to the screen or to a file that you designate. HTML coding problems that it was able to fix are reported as "warnings", and those that it was not able to fix are reported as "errors". The errors and warnings can be sent to the screen (useful for debugging), to a file, or both. HTML Tidy has been ported to a variety of operating systems, including Linux and most versions of Windows.
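As a rough sketch of how a PHP script can hand HTML to a command-line filter such as HTML Tidy and collect both the cleaned output and the messages, consider the following. The helper name run_filter and its array keys are hypothetical and are not part of phpTidyHt; the sketch assumes a Unix-like shell and that exec() is permitted on the server.

```php
<?php
// Hypothetical helper (not part of phpTidyHt): run a command-line filter
// over $input via a temporary file and return what it wrote to standard
// output, what it wrote to standard error, and its exit status.
function run_filter(string $command, string $input): array
{
    $in  = tempnam(sys_get_temp_dir(), 'in_');
    $err = tempnam(sys_get_temp_dir(), 'err_');
    file_put_contents($in, $input);

    // Feed the temporary file on stdin and send stderr to its own file.
    $full   = $command . ' < ' . escapeshellarg($in) . ' 2> ' . escapeshellarg($err);
    $lines  = [];
    $status = 0;
    exec($full, $lines, $status);

    $messages = (string) file_get_contents($err);
    unlink($in);
    unlink($err);

    return [
        'output'   => implode("\n", $lines),
        'messages' => $messages,
        'status'   => $status,
    ];
}

// Example call, assuming a tidy binary is on the command path:
// $result = run_filter('tidy -i --tidy-mark false', '<title>Test</title><p>Hello');
// echo $result['output'];    // the repaired HTML
// echo $result['messages'];  // the warnings and errors reported by Tidy
```

Keeping the messages separate from the output is what makes it possible to send the page to the browser while sending the warnings and errors to a log.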


1.4. Copyright Information

This document is copyrighted © 2001 David C. Druffner and is distributed under the terms of the license, stated below.

This manual may be reproduced and distributed in whole or in part, in any medium physical or electronic, as long as this copyright notice is retained on all copies. Commercial redistribution is allowed and encouraged; however, this copyright notice must appear prominently in the work, and the author would like to be notified at ddruff@gemini1consulting.com prior to any such commercial distribution.

All translations, derivative works, or aggregate works incorporating this manual must be covered under this copyright notice. That is, you may not produce a derivative work from this manual and impose additional restrictions on its distribution. Exceptions to these rules may be granted under certain conditions; please contact the author.

Modifications: Any significant modifications (anything other than the correction of typos) to this document must be identified on the title page in the revision section of this document with the name and contact information of each author appearing next to each revision.

If you have any questions, please contact David Druffner.


1.5. Disclaimer

No liability for the contents of this document can be accepted. Use the concepts, examples, and other content at your own risk. There may be errors and inaccuracies that could damage your system. Although this is highly unlikely, proceed with caution; the author(s) do not take any responsibility for any such damage.

All copyrights are held by their respective owners, unless specifically noted otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark.

Naming of particular products or brands should not be seen as endorsements.


1.6. phpTidyHt License

The phpTidyHt script carries the following license, which is GPL compatible and modeled on the modified BSD license (for GNU descriptions of various license types see the GNU License Page):

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. The name of the author may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR, ANY DISTRIBUTOR, OR ANY DOWNLOAD HOSTING COMPANY BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


1.7. New Versions

The latest version of this document, which will be made available in a variety of formats, including plain text, HTML (tarred and zipped), single HTML, Adobe Acrobat PDF, and the SGML source, can be found at the phpTidyHt Document Page.


1.8. Credits

In this version I have the pleasure of acknowledging:

  • Gemini 1 Consulting, LLC for support during the development of this script and for providing the resources to host phpTidyHt's documentation page. Gemini 1 Consulting, LLC is the developer of GemTrend™, a Windows-based Building Controls Software application, and other innovative software products.

  • Dave Raggett, the developer of HTML Tidy, a great program.

  • SourceForge, the host of phpTidyHt's download and development page.


1.9. Feedback

Feedback is most certainly welcome for this document. Without your submissions and input, this document wouldn't exist. An online manual has been set up to allow you to submit comments for all to read and for consideration for incorporation into the next version of this manual.


1.10. Translations

Translations of this document are welcome. Please use the SGML Source, or failing that, the HTML or text version, all of which are found at the document page. Translations can be emailed to David Druffner.


2. Download phpTidyHt

The latest version of phpTidyHt can be found on the SourceForge phpTidyHt Project Page and at Gemini 1 Consulting, LLC. It is currently available only in tar.gz format. When untarred, the archive should contain the following files:

  1. phptidyht.php (the phpTidyHt script)

  2. phptidyht_demo.php (the phpTidyHt demo script)

  3. images/ (images for html)

  4. html/ (contains manual html file)

  5. html/phptidyht_manual_big.html (the phpTidyHt manual in html)

  6. readme.txt (license, info)


3. System Requirements

  1. PHP 4 (it should work with some modification on PHP 3, but I haven't tested it).

  2. A Unix/Linux operating system. With some modification, it will probably work on Windows 95/98/NT systems. I will probably make these modifications in a later version. This version was tested on a Mandrake 7.2 OS with Apache.

  3. The web server must allow system calls. If PHP is running in safe mode, system calls are not allowed unless the safe mode directory option has been enabled, and then only for commands that are in the safe mode directory. If your administrator is using safe mode, ask for HTML Tidy to be placed in the safe mode directory. See the PHP Manual for details.

  4. The ability to read and write the temporary file defined by $tidy_temp_file. This means you must set the file permissions (using chmod) on the directory that $tidy_temp_file uses so that the owner of the server process can read and write there (the owner is usually "nobody", "www", "apache", or "httpd"; run ps -ef | grep httpd to find out). If you are working on a remote server you may need telnet access to do this, although some FTP clients allow you to set permissions.

  5. HTML Tidy must be installed on the server, in the server's command path, and its permissions must allow the owner of the httpd process (usually nobody, www, apache, or httpd) to execute it.
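The requirements above can be checked from PHP before phpTidyHt is first used. The following is only a sketch: the function names find_on_path and check_requirements, and the wording of the messages, are hypothetical and do not appear in the phpTidyHt script. It assumes a Unix-like system where directories in the command path are separated as PHP's PATH_SEPARATOR constant describes.

```php
<?php
// Hypothetical helper: locate an executable either by explicit path or by
// searching the directories listed in the PATH environment variable.
function find_on_path(string $program): ?string
{
    if (strpos($program, '/') !== false) {
        return is_executable($program) ? $program : null;
    }
    foreach (explode(PATH_SEPARATOR, (string) getenv('PATH')) as $dir) {
        $candidate = $dir . '/' . $program;
        if ($dir !== '' && is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// Hypothetical pre-flight check: returns a list of problems, empty if none.
function check_requirements(string $tidy_temp_file, string $html_tidy_path): array
{
    $problems = [];
    if (!function_exists('exec')) {
        $problems[] = 'System calls (exec) are not available to PHP.';
    }
    if (!is_writable(dirname($tidy_temp_file))) {
        $problems[] = 'The directory for the temporary file is not writable.';
    }
    if (find_on_path($html_tidy_path) === null) {
        $problems[] = 'The HTML Tidy executable was not found or is not executable.';
    }
    return $problems;
}

// Example use; the temporary file location here is an assumption:
// foreach (check_requirements('/tmp/tidy_temp.html', 'tidy') as $problem) {
//     echo $problem, "\n";
// }
```

Because the checks run as the web server's user when called from a page, they report what that user can actually read, write, and execute.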




4. Function Description

The phpTidyHt script consists of several functions, the primary one being phpTidyHt. The proper format of a call to the phpTidyHt function is:

boolean phpTidyHt( string $web_page, constant __FILE__, constant __LINE__);



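Where $web_page is the string containing the HTML for the page, and __FILE__ and __LINE__ are PHP's predefined constants for the current file name and line number.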

Note

Please note that there are two underscores on each side of each constant; for further details on these constants, see the Constants section of the PHP manual. phpTidyHt uses these constants to identify the section of PHP code that is generating the HTML page and any HTML Tidy errors associated with it. The file and line number of the code are added to the identification banner in the error log file.
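To illustrate why the caller passes __FILE__ and __LINE__, here is a minimal sketch of how a file name and line number could be turned into an identification banner. The function name tidy_log_banner and the banner layout are assumptions made for this illustration; the actual banner written by phpTidyHt may look different.

```php
<?php
// Hypothetical banner builder. The layout is illustrative only and is not
// taken from the phpTidyHt script.
function tidy_log_banner(string $file, int $line): string
{
    return sprintf(
        '==== HTML generated by %s near line %d ====',
        basename($file),
        $line
    );
}

// __FILE__ and __LINE__ are resolved where they are written, so passing
// them from the calling script records the caller's position rather than
// a position inside the library.
echo tidy_log_banner(__FILE__, __LINE__), "\n";
```

If the constants were written inside the library function instead, every banner would name the library file, which would not help locate the code that produced the HTML.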


5. Using phpTidyHt

A typical active page is built from a hodgepodge of functions, conditional statements, database calls, variables, and hard coded HTML. As a result, PHP scripts often produce HTML source that is hard to read when you select "view source" on your browser, hard to validate, and hard to debug.

The solution to this mess is to fix your HTML on the fly with HTML Tidy and phpTidyHt. After integrating the function into your script (see the Integrating section), you can filter your output through phpTidyHt using one of two methods:

  1. With output buffering (the preferred method), OR

  2. By collecting all of the output for your web page into a string variable first, then sending it to the browser with phpTidyHt.




5.1. Using Output Buffering - Method 1

PHP 4 introduced output buffering, which allows you to build and process the output for an entire page before sending it to the browser. It is relatively easy to implement and has many advantages (including the management of header output, which eases the use of cookies). Beyond managing header output, however, output buffering lets you filter your entire web page through phpTidyHt before it is sent to the browser without having to change the way you write scripts (as method 2 might require).

The basic concept of output buffering is that once it is started with ob_start, all output is sent to an internal cache until the cache is flushed or, in our case, copied to a variable with ob_get_contents. That variable is then passed as an argument to phpTidyHt, which filters it.

Use output buffering unless you are using PHP 3, in which case use method 2, below.

The format of your code when using output buffering should be as follows:

 
 <?php
 
 ob_start(); //starts output buffering
 
 [Code generating HTML output]
 
 $web_page=ob_get_contents();  //copy output buffer to $web_page variable
 
 ob_end_clean();  //end output buffering
 
 /* Filter contents of $web_page through phpTidyHt and echo to browser */
 
 phpTidyHt($web_page, __FILE__, __LINE__);  
 
  
 


Here is an example:

  
  <?php
  
  /* Turns  Output Buffering On */
  ob_start();
  
  ?>
  <html>
  <head>
  <title>
  Test page
  </title>
  
  <?php
  
   echo "
   </head>
   <body>
   This is a test page
   </body>
   </html>";
   
   /* Copies output echoed in above lines from internal cache to variable */
   
   $web_page=ob_get_contents();  
   ob_end_clean();  //end output buffering
   
   //Filter contents of $web_page through phpTidyHt and echo to browser
   
   phpTidyHt($web_page, __FILE__, __LINE__);  
   
   
  

As you can see, with output buffering you don't need to change the way you write scripts; you only need to add a few extra lines of code.
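If several pages need the same few lines, the buffering steps can be gathered into one small helper. This is a sketch only: capture_output is a hypothetical name that is not part of phpTidyHt, and the closure syntax requires a much newer PHP than the PHP 4 this manual was written for.

```php
<?php
// Hypothetical helper: run $render with output buffering switched on and
// return everything it echoed, always closing the buffer afterwards.
function capture_output(callable $render): string
{
    ob_start();
    try {
        $render();
        return (string) ob_get_contents(); // copy the buffer before it is discarded
    } finally {
        ob_end_clean();                    // end buffering even if $render throws
    }
}

$web_page = capture_output(function () {
    echo "<html><head><title>Test page</title></head>";
    echo "<body>This is a test page</body></html>";
});

// With phpTidyHt integrated, the captured page would then be filtered:
// phpTidyHt($web_page, __FILE__, __LINE__);
```

The call to phpTidyHt is left in the calling script so that __FILE__ and __LINE__ still identify the page being built rather than the helper.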


5.2. HTML String Build - Method 2

The second method is implemented by building a string containing all the HTML for the web page and then sending that string to the browser. Here is an example script excerpt of a traditional PHP page build that is echoed to the browser:

Example 1. Traditional Method of Building a Web Page and Echoing it to the Browser


$web_page="<html><head><title>Guest Book</title>";
$web_page.=getCss(); //adds CSS information generated by the function getCss()
$web_page.=getJavascript(); //adds JavaScript information generated by the function getJavascript()
$web_page.="</head><body>";
$web_page.="Info for User $user_name<br>";
$web_page.="<table>";
$web_page.=getDatabaseInfo($user_name, $db_table); //gets the user's info from the function getDatabaseInfo()
$web_page.="</table></body></html>";
echo $web_page;

To use phpTidyHt, replace the echo in Example 1 with a call to the phpTidyHt function, like this:

Example 2. Same as Example 1 but $web_page Filtered Through phpTidyHt


$web_page="<html><head><title>Guest Book</title>";
$web_page.=getCss(); //adds CSS information generated by the function getCss()
$web_page.=getJavascript(); //adds JavaScript information generated by the function getJavascript()
$web_page.="</head><body>";
$web_page.="Info for User $user_name<br>";
$web_page.="<table>";
$web_page.=getDatabaseInfo($user_name, $db_table); //gets the user's info from the function getDatabaseInfo()
$web_page.="</table></body></html>";
phpTidyHt($web_page, __FILE__, __LINE__);



5.3. Debugging Scripts with phpTidyHt

Tip

Besides tidying up your HTML output, phpTidyHt is very useful for debugging your HTML source. You can log all HTML errors to the error log and use the log for reference when fixing your HTML code so that it conforms to HTML or XML standards. The log will contain information on all HTML errors and how to fix them, along with a banner identifying the PHP code section in your scripts that is producing the offending HTML. In addition to (or instead of) logging the errors to the log file, you can choose to have phpTidyHt echo the errors directly to your browser when the page is accessed. Thus, when you are testing your site, the errors will be obvious as soon as they are encountered.
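When reading a long log, it can help to separate warnings from errors. The sketch below assumes that HTML Tidy reports each problem in its usual "line N column M - Warning: ..." layout; summarize_tidy_messages is a hypothetical helper that is not part of phpTidyHt, and the sample text is made up for the illustration rather than copied from a real log.

```php
<?php
// Hypothetical helper: split HTML Tidy's message text into warnings and
// errors, keeping the line, column, and description of each one.
function summarize_tidy_messages(string $messages): array
{
    $summary = ['Warning' => [], 'Error' => []];
    foreach (preg_split('/\R/', $messages) as $line) {
        if (preg_match('/^line (\d+) column (\d+) - (Warning|Error): (.*)$/', $line, $m)) {
            $summary[$m[3]][] = [
                'line'   => (int) $m[1],
                'column' => (int) $m[2],
                'text'   => $m[4],
            ];
        }
    }
    return $summary;
}

// Made-up sample text in the assumed layout, for illustration only.
$sample = "line 3 column 1 - Warning: first sample message\n"
        . "line 9 column 5 - Error: second sample message\n";

$summary = summarize_tidy_messages($sample);
echo count($summary['Warning']), " warning(s), ",
     count($summary['Error']), " error(s)\n";
```

Lines that do not match the assumed layout, such as summary lines, are simply ignored.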


5.4. Integrating phpTidyHt Into Your Scripts

You can integrate phpTidyHt into your scripts in one of two ways:

  1. Insert phpTidyHt into your main PHP script, OR

  2. Insert phpTidyHt into an external common library file which is included into your main script with an include statement.

The second way is preferable especially if you have multiple scripts from which you will need to call phpTidyHt. In either case you should paste into the top of the script the defined constants and any global variables in the script. As of this writing, they are the following:

define ("HTML_TIDY_ON",true); //html tidy on/off, if off, will echo to screen

define ("SAVE_TIDY_ERRORS",true); //saves all errors to file specified in TIDY_LOG

define ("TIDY_LOG", $DOCUMENT_ROOT."/"."tidy_log.txt"); //defines log file

define ("XML_ON",false); //have output conform to xml standards

define ("ALL_TIDY_ERRORS_TO_BROWSER", false); //sends all errors to browser

define ("FATAL_TIDY_ERRORS_TO_BROWSER",true); //sends fatal errors only to browser

define ("SHOW_POST_VARS", true); //show post variables as part of log banner

$tidy_options.="-i --tidy-mark false"; //adds indent and removes tidy meta tag
$html_tidy_path="tidy"; //sets path (directory name and file name) to the tidy executable

Important

phpTidyHt may not work properly unless the $html_tidy_path is set correctly. Some servers don't allow execution of an executable outside of the bin directories without an explicit designation of this path, so tidy won't execute unless this variable is set correctly.



The functions in the script should be placed in either your main script OR your include file, depending on whether or not you are using a common library include file.


5.5. phpTidyHt Options

The available options are turned on or off by setting the constants and/or by changing the $tidy_options variable (this variable is the option string that is passed to HTML Tidy on the command line). Thus every option that is available on the tidy command line can be set through phpTidyHt. For the options that $tidy_options can contain, type tidy -h on the command line and also see the HTML Tidy Home Page (neither source alone gives the full list of options). If an option can be set both by the $tidy_options variable and by one of the constants, then $tidy_options controls. For example, if the XML_ON constant is set to false but $tidy_options contains --asxml, the output will be rendered as XML.
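The precedence rule can be pictured with a small sketch. This is not the code used inside phpTidyHt: effective_tidy_options is a hypothetical function, and the --asxml spelling is simply taken from the paragraph above. The idea is that the option string is always passed through untouched, and the constant only adds the flag when the string does not already contain it.

```php
<?php
// Hypothetical illustration of "$tidy_options controls": the constant can
// add the XML flag, but it can never remove a flag that is already present
// in the option string.
function effective_tidy_options(bool $xml_on, string $tidy_options): string
{
    $xml_flag = '--asxml'; // spelling taken from the manual text
    $parts    = preg_split('/\s+/', trim($tidy_options));
    $has_flag = in_array($xml_flag, $parts, true);

    if ($xml_on && !$has_flag) {
        return trim($tidy_options . ' ' . $xml_flag);
    }
    return $tidy_options;
}

// XML_ON is false, but the option string asks for XML, so XML output wins.
echo effective_tidy_options(false, '-i --asxml'), "\n";
```

Under this reading, turning XML_ON off is not enough to disable XML output; the flag must also be removed from $tidy_options.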


6. Demo Page

To see the usefulness of this script demonstrated, please visit the live demo. A copy of the phptidyht_demo.php script is also included in the latest version of phpTidyHt.


7. Code History

0.5 was the first and only beta. 1.0 is the first stable version of phpTidyHt (same as 0.5, with updated manual covering output buffering).


8. Code Contributions

Code modifications and contributions are welcome. Suggestions include:

Please contribute at the SourceForge phpTidyHt Project Page.


9. Further Information

For further information on HTML and XML standards and validation schemes, visit the World Wide Web Consortium. The W3C site contains finalized and draft versions of the various W3C standards, references for HTML, XML, and CSS, an HTML/XML validator, and the home page for HTML Tidy.