TclXML

Steve Ball

NAME
::xml::parser - XML parser support for Tcl

SYNOPSIS
package require xml
package require parserclass

xml 2.6

::xml ::sgml ::xml::tclparser

::xml::parserclass option ?arg arg ...?
::xml::parser ?name? ?-option value ...?
parser option arg

DESCRIPTION
TclXML provides event-based parsing of XML documents. The application may register callback scripts for certain document features, and when the parser encounters those features while parsing the document the callback is evaluated.

The parser may also perform other functions, such as normalisation, validation and/or entity expansion. Generally, these functions are under the control of configuration options. Whether these functions can be performed at all depends on the parser implementation.

The TclXML package provides a generic interface for use by a Tcl application, along with a low-level interface for use by a parser implementation. Each implementation provides a class of XML parser, and these register themselves using the ::xml::parserclass create command. One of the registered parser classes will be the default parser class.

Loading the package with the generic package require xml command allows the package to automatically determine the default parser class. In order to select a particular parser class as the default, that class's package may be loaded directly, e.g. package require expat. In all cases, all available parser classes are registered with the TclXML package; the difference is simply in which one becomes the default.

COMMANDS
::xml::parserclass

The ::xml::parserclass command is used to manage XML parser classes.

Command Options

The following command options may be used:

create name ?-createcommand script? ?-createentityparsercommand script? ?-parsecommand script? ?-configurecommand script? ?-getcommand script? ?-deletecommand script?
    Creates an XML parser class with the given name.

destroy name
    Destroys an XML parser class.

info names
    Returns information about registered XML parser classes.

::xml::parser

The ::xml::parser command creates an XML parser object. The return value of the command is the name of the newly created parser.

The parser scans an XML document's syntactical structure, evaluating callback scripts for each feature found. At the very least the parser will normalise the document and check the document for well-formedness. If the document is not well-formed then the script given by the -errorcommand option is evaluated. Some parser classes may perform additional functions, such as validation. Additional features provided by the various parser classes are described in the section Parser Classes.

Parsing is performed synchronously. The command blocks until the entire document has been parsed. Parsing may be terminated by an application callback; see the section Callback Return Codes. Incremental parsing is also supported by using the -final configuration option.

Configuration Options

The ::xml::parser command accepts the following configuration options:

-attlistdeclcommand script
    Specifies the prefix of a Tcl command to be evaluated whenever an attribute list declaration is encountered in the DTD subset of an XML document. The command evaluated is:

        script name attrname type default value

    where:

        name      Element type name
        attrname  Attribute name being declared
        type      Attribute type
        default   Attribute default, such as #IMPLIED
        value     Default attribute value. Empty string if none given.
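As a rough illustration of the -attlistdeclcommand calling convention, the sketch below records each declared attribute in a dictionary. The names AttlistDecl and attlists are hypothetical. The callback is invoked by hand with the five documented arguments so that the sketch runs without a DTD, and the registration at the end is only attempted when the xml package can be loaded.

```tcl
# Hypothetical callback: remember every attribute declaration,
# keyed by element type name and then by attribute name.
proc AttlistDecl {name attrname type default value} {
    global attlists
    dict set attlists $name $attrname [list $type $default $value]
}

set attlists [dict create]

# Hand-written call using the documented argument order:
#   script name attrname type default value
AttlistDecl img alt CDATA #IMPLIED {}

puts [dict get $attlists img alt]

# Registration with a real parser (assumes TclXML is installed).
if {![catch {package require xml}]} {
    set parser [::xml::parser -attlistdeclcommand AttlistDecl]
}
```

Because the option takes a command prefix, extra leading arguments could be bound with [list], in the same way as the element counting script under APPLICATION EXAMPLES.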
-baseurl URI
    Specifies the base URI for resolving relative URIs that may be used in the XML document to refer to external entities.

-characterdatacommand script
    Specifies the prefix of a Tcl command to be evaluated whenever character data is encountered in the XML document being parsed. The command evaluated is:

        script data

    where:

        data  Character data in the document

-commentcommand script
    Specifies the prefix of a Tcl command to be evaluated whenever a comment is encountered in the XML document being parsed. The command evaluated is:

        script data

    where:

        data  Comment data

-defaultcommand script
    Specifies the prefix of a Tcl command to be evaluated when no other callback has been defined for a document feature which has been encountered. The command evaluated is:

        script data

    where:

        data  Document data

-defaultexpandinternalentities boolean
    Specifies whether entities declared in the internal DTD subset are expanded with their replacement text. If entities are not expanded then the entity references will be reported with no expansion.
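The single-argument callbacks above can be sketched as follows. This is only an illustration: Text and Comment are hypothetical procedure names, and the calls at the bottom are written by hand to stand in for the parser.

```tcl
# Hypothetical callbacks: accumulate character data and comments separately.
proc Text {data} {
    append ::text $data
}

proc Comment {data} {
    lappend ::comments [string trim $data]
}

set text ""
set comments {}

# Hand-written calls following the documented form "script data".
Text "Hello, "
Comment " reviewed "
Text "world"

puts "text: $text"
puts "comments: $comments"
```

With TclXML loaded, these procedures would be registered through the -characterdatacommand and -commentcommand options when the parser is created.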
-doctypecommand script
    Specifies the prefix of a Tcl command to be evaluated when the document type declaration is encountered. The command evaluated is:

        script name public system dtd

    where:

        name    The name of the document element
        public  Public identifier for the external DTD subset
        system  System identifier for the external DTD subset. Usually a URI.
        dtd     The internal DTD subset

    See also -startdoctypedeclcommand and -enddoctypedeclcommand.

-elementdeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when an element markup declaration is encountered. The command evaluated is:

        script name model

    where:

        name   The element type name
        model  Content model specification

-elementendcommand script
    Specifies the prefix of a Tcl command to be evaluated when an element end tag is encountered. The command evaluated is:

        script name args

    where:

        name  The element type name that has ended
        args  Additional information about this element

    Additional information about the element takes the form of configuration options. Possible options are:

        -empty boolean
            The empty element syntax was used for this element
        -namespace uri
            The element is in the XML namespace associated with the given URI

-elementstartcommand script
    Specifies the prefix of a Tcl command to be evaluated when an element start tag is encountered. The command evaluated is:

        script name attlist args

    where:

        name     The element type name that has started
        attlist  A Tcl list containing the attributes for this element. The list of attributes is formatted as pairs of attribute names and their values.
        args     Additional information about this element

    Additional information about the element takes the form of configuration options. Possible options are:

        -empty boolean
            The empty element syntax was used for this element
        -namespace uri
            The element is in the XML namespace associated with the given URI
        -namespacedecls list
            The start tag included one or more XML Namespace declarations. list is a Tcl list giving the namespaces declared. The list is formatted as pairs of values: the first value is the namespace URI and the second value is the prefix used for the namespace in this document. A default XML namespace declaration will have an empty string for the prefix.
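Since the additional information arrives as option/value pairs in args, a callback can load it into an array with defaults. The following is a minimal sketch under that assumption; ElemStart, ElemEnd, depth and log are hypothetical names, and the four calls are written by hand rather than produced by a parser.

```tcl
set depth 0
set log {}

# Hypothetical start-tag callback. args holds option/value pairs
# such as "-empty 1" or "-namespace uri".
proc ElemStart {name attlist args} {
    array set opts {-empty 0 -namespace {}}
    array set opts $args
    lappend ::log [list start $::depth $name $opts(-namespace) $opts(-empty)]
    incr ::depth
}

# Hypothetical end-tag callback: only tracks the nesting depth.
proc ElemEnd {name args} {
    incr ::depth -1
    lappend ::log [list end $::depth $name]
}

# Hand-written calls following the documented argument lists.
ElemStart para {class intro} -namespace http://example.org/ns
ElemStart br {} -empty 1 -namespace http://example.org/ns
ElemEnd br -empty 1 -namespace http://example.org/ns
ElemEnd para -namespace http://example.org/ns

foreach entry $log {
    puts $entry
}

# Registration with a real parser (assumes TclXML is installed).
if {![catch {package require xml}]} {
    set parser [::xml::parser -elementstartcommand ElemStart \
        -elementendcommand ElemEnd -reportempty 1]
}
```

Using array set twice means options the parser does not supply keep their default values, so the callback does not need to test for each option.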
-endcdatasectioncommand script
    Specifies the prefix of a Tcl command to be evaluated when the end of a CDATA section is encountered. The command is evaluated with no further arguments.

-enddoctypedeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when the end of the document type declaration is encountered. The command is evaluated with no further arguments.

-entitydeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when an entity declaration is encountered. The command evaluated is:

        script name args

    where:

        name  The name of the entity being declared
        args  Additional information about the entity declaration. An internal entity shall have a single argument, the replacement text. An external parsed entity shall have two additional arguments, the public and system identifiers of the external resource. An external unparsed entity shall have three additional arguments, the public and system identifiers followed by the notation name.
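Because the kind of entity is signalled only by the number of additional arguments, an -entitydeclcommand callback might dispatch on the length of args. The sketch below assumes exactly the three cases described above; EntityDecl and entities are hypothetical names, and the sample calls are written by hand.

```tcl
set entities [dict create]

# Hypothetical callback: classify a declaration by its argument count.
proc EntityDecl {name args} {
    switch -- [llength $args] {
        1 {
            # Internal entity: replacement text only.
            lassign $args text
            set info [dict create kind internal text $text]
        }
        2 {
            # External parsed entity: public and system identifiers.
            lassign $args public system
            set info [dict create kind external public $public system $system]
        }
        3 {
            # External unparsed entity: identifiers plus notation name.
            lassign $args public system notation
            set info [dict create kind unparsed public $public \
                system $system notation $notation]
        }
        default {
            error "unexpected entity declaration arguments: $args"
        }
    }
    dict set ::entities $name $info
}

# Hand-written calls, one for each documented case.
EntityDecl copy "(c) Example Corp"
EntityDecl chapter1 {} chapter1.xml
EntityDecl logo {} logo.png png

dict for {name info} $entities {
    puts "$name: [dict get $info kind]"
}
```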
-entityreferencecommand script
    Specifies the prefix of a Tcl command to be evaluated when an entity reference is encountered. The command evaluated is:

        script name

    where:

        name  The name of the entity being referenced

-errorcommand script
    Specifies the prefix of a Tcl command to be evaluated when a fatal error is detected. The error may be due to the XML document not being well-formed. In the case of a validating parser class, the error may also be due to the XML document not obeying validity constraints. By default, a callback script is provided which causes an error return code, but an application may supply a script which attempts to continue parsing. The command evaluated is:

        script errorcode errormsg

    where:

        errorcode  A single word description of the error, intended for use by an application
        errormsg   A human-readable description of the error

-externalentitycommand script
    Specifies the prefix of a Tcl command to be evaluated to resolve an external entity reference. If the parser has been configured to validate the XML document, a default script is supplied that resolves the URI given as the system identifier of the external entity and recursively parses the entity's data. If the parser has been configured as a non-validating parser, then by default external entities are not resolved. This option can be used to override the default behaviour. The command evaluated is:

        script name baseuri uri id

    where:

        name     The Tcl command name of the current parser
        baseuri  An absolute URI for the current entity which is to be used to resolve relative URIs
        uri      The system identifier of the external entity, usually a URI
        id       The public identifier of the external entity. If no public identifier was given in the entity declaration then id will be an empty string.

-final boolean
    Specifies whether the XML document being parsed is complete. If the document is to be incrementally parsed then this option will be set to false, and when the last fragment of the document is parsed it is set to true. For example,

        set parser [::xml::parser -final 0]
        $parser parse $data1
        $parser parse $data2
        $parser configure -final 1
        $parser parse $finaldata

-ignorewhitespace boolean
    If this option is set to true then spans of character data in the XML document which are composed only of whitespace (CR, LF, space, tab) will not be reported to the application. In other words, the data passed to every invocation of the -characterdatacommand script will contain at least one non-whitespace character.
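The incremental pattern shown for -final can be wrapped in a small helper. This is a sketch only: feedChunks is a hypothetical procedure, and fakeParser is a stand-in that merely logs the calls it receives so the sequence can be inspected without TclXML installed. A real parser command created by ::xml::parser would be passed in its place.

```tcl
# Hypothetical helper: feed a list of fragments to a parser,
# marking only the last fragment as final.
proc feedChunks {parser chunks} {
    $parser configure -final 0
    foreach chunk [lrange $chunks 0 end-1] {
        $parser parse $chunk
    }
    $parser configure -final 1
    $parser parse [lindex $chunks end]
}

# Stand-in for a parser command: records each method call.
set calls {}
proc fakeParser {method args} {
    lappend ::calls [list $method {*}$args]
}

feedChunks fakeParser [list {<doc><a/>} {<b/>} {</doc>}]

foreach call $calls {
    puts $call
}
```

The helper takes the parser command name as an argument, so the same procedure works for any object that offers configure and parse.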
-notationdeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when a notation declaration is encountered. The command evaluated is:

        script name uri

    where:

        name  The name of the notation
        uri   An external identifier for the notation, usually a URI.

-notstandalonecommand script
    Specifies the prefix of a Tcl command to be evaluated when the parser determines that the XML document being parsed is not a standalone document.

-paramentityparsing boolean
    Controls whether external parameter entities are parsed.

-parameterentitydeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when a parameter entity declaration is encountered. The command evaluated is:

        script name args

    where:

        name  The name of the parameter entity
        args  For an internal parameter entity there is only one additional argument, the replacement text. For external parameter entities there are two additional arguments, the system and public identifiers respectively.
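A possible way to use the notation and standalone callbacks together is sketched below. NotationDecl and NotStandalone are hypothetical names, the notation identifier is a made-up placeholder, and NotStandalone accepts any arguments because the text above does not say which ones are passed.

```tcl
set notations [dict create]
set standalone 1

# Hypothetical callback: remember the external identifier of each notation.
proc NotationDecl {name uri} {
    dict set ::notations $name $uri
}

# Hypothetical callback: the argument list is not documented above,
# so accept whatever is passed and just clear the flag.
proc NotStandalone {args} {
    set ::standalone 0
}

# Hand-written calls standing in for the parser.
NotationDecl png urn:example:notation:png
NotStandalone

puts "notations: $notations"
puts "standalone: $standalone"
```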
-parser name
    The name of the parser class to instantiate for this parser object. This option may only be specified when the parser instance is created.

-processinginstructioncommand script
    Specifies the prefix of a Tcl command to be evaluated when a processing instruction is encountered. The command evaluated is:

        script target data

    where:

        target  The name of the processing instruction target
        data    Remaining data from the processing instruction

-reportempty boolean
    If this option is enabled then when an element is encountered that uses the special empty element syntax, additional arguments are appended to the -elementstartcommand and -elementendcommand callbacks. The arguments -empty 1 are appended. For example:

        script -empty 1

-startcdatasectioncommand script
    Specifies the prefix of a Tcl command to be evaluated when the start of a CDATA section is encountered. No arguments are appended to the script.

-startdoctypedeclcommand script
    Specifies the prefix of a Tcl command to be evaluated at the start of a document type declaration. No arguments are appended to the script.
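As an illustration of the -processinginstructioncommand signature, the sketch below groups processing instruction data by target. PI and pis are hypothetical names, the targets and data are invented, and the calls are written by hand in place of a parser.

```tcl
set pis [dict create]

# Hypothetical callback: collect the data of each processing
# instruction under its target name.
proc PI {target data} {
    dict lappend ::pis $target $data
}

# Hand-written calls following the documented form "script target data".
PI xml-stylesheet {href="style.css" type="text/css"}
PI app-hint cache=off
PI app-hint retry=3

dict for {target items} $pis {
    puts "$target: [llength $items] instruction(s)"
}
```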
-unknownencodingcommand script
    Specifies the prefix of a Tcl command to be evaluated when a character is encountered with an unknown encoding. This option has not been implemented.

-unparsedentitydeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when a declaration is encountered for an unparsed entity. The command evaluated is:

        script system public notation

    where:

        system    The system identifier of the external entity, usually a URI
        public    The public identifier of the external entity
        notation  The name of the notation for the external entity

-validate boolean
    Enables validation of the XML document to be parsed. Any changes to this option are ignored after an XML document has started to be parsed, but the option may be changed after a reset.

-warningcommand script
    Specifies the prefix of a Tcl command to be evaluated when a warning condition is detected. A warning condition is where the XML document has not been authored correctly, but is still well-formed and may be valid. For example, the special empty element syntax may be used for an element which has not been declared to have empty content. By default, a callback script is provided which silently ignores the warning. The command evaluated is:

        script warningcode warningmsg

    where:

        warningcode  A single word description of the warning, intended for use by an application
        warningmsg   A human-readable description of the warning

-xmldeclcommand script
    Specifies the prefix of a Tcl command to be evaluated when the XML declaration is encountered. The command evaluated is:

        script version encoding standalone

    where:

        version     The version number of the XML specification to which this document purports to conform
        encoding    The character encoding of the document
        standalone  A boolean declaring whether the document is standalone

Parser Command

The ::xml::parser command creates a new Tcl command with the same name as the parser. This command may be used to invoke various operations on the parser object. It has the following general form:

    name option arg

option and arg determine the exact behaviour of the command. The following commands are possible for parser objects:

cget -option
    Returns the current value of the configuration option given by option. Option may have any of the values accepted by the parser object.

configure ?-option value ...?
    Modify the configuration options of the parser object. Option may have any of the values accepted by the parser object.
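The sketch below combines an -xmldeclcommand callback with a -warningcommand callback and shows a callback being attached after creation through configure. XmlDecl and Warn are hypothetical names, the warning code and message are invented, and the parser part only runs when the xml package can be loaded.

```tcl
set decl [dict create]
set warnings {}

# Hypothetical callback: keep the three values of the XML declaration.
proc XmlDecl {version encoding standalone} {
    set ::decl [dict create version $version \
        encoding $encoding standalone $standalone]
}

# Hypothetical callback: log warnings instead of silently ignoring them.
proc Warn {code msg} {
    lappend ::warnings "$code: $msg"
}

# Hand-written calls following the documented argument lists.
XmlDecl 1.0 UTF-8 1
Warn samplecode "sample warning message"

puts "declared version: [dict get $decl version]"
puts "warnings: $warnings"

# With a real parser (assumes TclXML is installed), options may be
# given at creation time or changed later with configure.
if {![catch {package require xml}]} {
    set parser [::xml::parser -xmldeclcommand XmlDecl]
    $parser configure -warningcommand Warn
}
```

The current setting of an option can then be read back with the cget command described above.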
entityparser ?option value ...?
    Creates a new parser object. The new object inherits the same configuration options as the parent parser object, but is able to parse XML data in a parsed entity. The option -dtdsubset allows markup declarations to be treated as being in the internal or external DTD subset.

free name
    Frees all resources associated with the parser object. The object is not usable after this command has been invoked.

get name args
    Returns information about the XML document being parsed. Each parser class provides different information; see the documentation for the parser class.

parse xml args
    Parses the XML document. The usual desired effect is for various application callbacks to be evaluated. Other functions will also be performed by the parser class; at the very least this includes checking the XML document for well-formedness.

reset
    Initialises the parser object in preparation for parsing a new XML document.

CALLBACK RETURN CODES
Every callback script evaluated by a parser may return a return code other than TCL_OK. Return codes are interpreted as follows:

break
    Suppresses invocation of all further callback scripts. The parse method returns the TCL_OK return code.

continue
    Suppresses invocation of further callback scripts until the current element has finished.

error
    Suppresses invocation of all further callback scripts. The parse method also returns the TCL_ERROR return code.

default
    Any other return code suppresses invocation of all further callback scripts. The parse method returns the same return code.

APPLICATION EXAMPLES

This script outputs the character data of an XML document read from stdin.

    package require xml

    # Print each span of character data as it is reported.
    proc cdata {data args} {
        puts -nonewline $data
    }

    set parser [::xml::parser -characterdatacommand cdata]
    $parser parse [read stdin]

This script counts the number of elements in an XML document read from stdin.

    package require xml

    # varName is bound in the callback prefix; the parser appends
    # the element name, attribute list and any options.
    proc EStart {varName name attlist args} {
        upvar #0 $varName var
        incr var
    }

    set count 0
    set parser [::xml::parser -elementstartcommand [list EStart count]]
    $parser parse [read stdin]
    puts "The XML document contains $count elements"

PARSER CLASSES
This section will discuss how a parser class is implemented.

Tcl Parser Class

The pure-Tcl parser class requires no compilation; it is a collection of Tcl scripts. This parser implementation is non-validating, i.e. it can only check well-formedness in a document. However, by enabling the -validate option it will read the document's DTD and resolve external entities.

This parser implementation aims to implement XML v1.0 and supports XML Namespaces.

Generally the parser produces XML Infoset information items. That is, it gives the application a slightly higher-level view than the raw XML syntax. For example, it does not report CDATA Sections.

Expat Parser Class

This section will discuss the Expat parser class.

SEE ALSO
TclDOM, a Tcl interface for the W3C Document Object Model.

KEYWORDS

Tcl Built-In Commands Tcl TclXML(n)
