inherit Parser._parser.XML : XML
string
autoconvert(string
xml
)
mapping
(string
:string
|array
|mapping
) node_to_struct(.NSTree.NSNode
|.Tree.Node
rootnode
)
XML parsing made easy.
A hierarchical structure of nested mappings and arrays representing the
XML structure starting at rootnode
using a minimal depth.
| The text content of the node. |
| The arguments on this node. |
| The text content of a simple subnode. |
| A list of subnodes. |
| A complex subnode (recurse). |
Parser.XML.node_to_struct(Parser.XML.NSTree.parse_input("<foo>bar</foo>"));
string
text_quote(string
data
)
Quotes the string given in data
by escaping &, < and >.
void
allow_rxml_entities(bool
yes_no
)
void
compat_allow_errors(string
version
)
Set whether the parser should allow certain errors for
compatibility with earlier versions. version
can be:
| "7.2" | Allow more data after the root element. |
| "7.6" | Allow multiple and invalidly placed "<?xml ... ?>" and "<!DOCTYPE ... >" declarations (invalid "<?xml ... ?>" declarations are otherwise treated as normal PI:s). Allow "<![CDATA[ ... ]]>" outside the root element. Allow the root element to be absent. |
version
can also be zero to enable all error checks.
void
define_entity(string
entity
, string
s
, function
(:void
) cb
, mixed
... extras
)
Define an entity or an SMEG.
entity
Entity name, or SMEG name (if preceded by a "%"
).
s
Expansion of the entity. Entity evaluation will be performed.
define_entity_raw()
void
define_entity_raw(string
entity
, string
raw
)
Define an entity or an SMEG.
entity
Entity name, or SMEG name (if preceded by a "%"
).
raw
Verbatim expansion of the entity.
define_entity()
string
lookup_entity(string
entity
)
Returns the verbatim expansion of the entity.
array
parse(string
xml
, string
context
, function
(:void
) cb
, mixed
... extra_args
)
array
parse(string
xml
, function
(:void
) cb
, mixed
... extra_args
)
mixed
parse_dtd(string
dtd
, string
context
, function
(:void
) cb
, mixed
... extras
)
mixed
parse_dtd(string
dtd
, function
(:void
) cb
, mixed
... extras
)
Parser.XML.Simple.Context Parser.XML.Simple.Context(
string
s
, string
context
, int
flags
, function
(:void
) cb
, mixed
... extra_args
)
Parser.XML.Simple.Context Parser.XML.Simple.Context(
string
s
, int
flags
, function
(:void
) cb
, mixed
... extra_args
)
s
context
These two arguments are passed along to push_string()
.
flags
Parser flags.
cb
Callback function. This function gets called at various stages during the parsing.
mixed
parse_dtd()
string
parse_entity()
mixed
parse_xml()
void
push_string(string
s
)
void
push_string(string
s
, string
context
)
Add a string to parse at the current position.
s
String to insert at the current parsing position.
context
Optional context used to refer to the inserted string.
This is typically a URL, but may also be an entity
(preceded by an "&"
) or a SMEG reference
(preceded by a "%"
).
Not used by the XML parser as such, but is simply
passed into the callbackinfo mapping as
the field "context"
where it can be useful
for e.g. resolving relative URLs when parsing DTDs,
or for determining where errors occur.
Validating XML parser.
Validates an XML file according to a DTD.
inherit .Simple : Simple
Extends the Simple XML parser.
string
|zero
get_external_entity(string
sysid
, string
|void
pubid
, mapping
|void
info
, mixed
... extra
)
Get an external entity.
Called when a <!DOCTYPE> with a SYSTEM identifier is encountered, or when an entity reference needs expanding.
sysid
The SYSTEM identifier.
pubid
The PUBLIC identifier (if any).
info
The callbackinfo mapping containing the current parser state.
extra
The extra arguments as passed to parse()
or parse_dtd()
.
Returns a string with a DTD fragment on success.
Returns 0
(zero) on failure.
Returning zero will cause the validator to report an error.
In Pike 7.7 and earlier info
had the value 0
(zero).
The default implementation always returns 0
(zero).
Override this function to provide other behaviour.
parse()
, parse_dtd()
int
isname(string
s
)
Check if s
is a valid Name.
int
isnames(string
s
)
Check if s
is a valid list of Names.
int
isnmtoken(string
s
)
Check if s
is a valid Nmtoken.
int
isnmtokens(string
s
)
Check if s
is a valid list of Nmtokens.
array
parse(string
data
, string
|function
(string
, string
, mapping
, array
|string
, mapping
(string
:mixed
), __unknown__
... :mixed
) callback
, mixed
... extra
)
Document this function
array
parse_dtd(string
data
, string
|function
(string
, string
, mapping
, array
|string
, mapping
(string
:mixed
), __unknown__
... :mixed
) callback
, mixed
... extra
)
Document this function
private
mixed
validate(string
kind
, string
name
, mapping
attributes
, array
|string
contents
, mapping
(string
:mixed
) info
, function
(string
, string
|zero
, mapping
|zero
, array
|string
, mapping
(string
:mixed
), __unknown__
... :mixed
) callback
, array
(mixed
) extra
)
The validation callback function.
::parse()
XML Element node.
inherit Node : Node
inherit Text : Text
inherit Node : Node
inherit CharacterData : CharacterData
int
Parser.XML.DOM.DOMException.code
protected
local
void
__create__(int
code
)
Parser.XML.DOM.DOMException Parser.XML.DOM.DOMException(
int
code
)
inherit AbstractDOMParser : AbstractDOMParser
protected
inherit Parser.XML.Validating : xml
inherit Node : Node
inherit Node : Node
inherit Node : Node
inherit Node : node
inherit Node : Node
inherit Node : Node
inherit AbstractDOMParser : AbstractDOMParser
protected
inherit .Simple : xml
inherit Node : Node
inherit Node : Node
inherit CharacterData : CharacterData
A namespace aware version of Parser.XML.Tree. This implementation does as little validation as possible, so e.g. you can call your namespace xmlfoo without complaints.
inherit Parser.XML.Tree : Tree
NSNode
parse_input(string
data
, void
|string
default_ns
)
Takes an XML string data
and produces a namespace node tree.
If default_ns
is given, it will be used as the default namespace.
Throws an error
when an error is encountered during XML
parsing.
string
visualize(Node
n
, void
|string
indent
)
Makes a visualization of a node graph suitable for printing out on a terminal.
> object x = parse_input("<a><b><c/>d</b><b><e/><f>g</f></b></a>");
> write(visualize(x));
Node(ROOT)
  NSNode(ELEMENT,"a")
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"c")
      NSNode(TEXT)
    NSNode(ELEMENT,"b")
      NSNode(ELEMENT,"e")
      NSNode(ELEMENT,"f")
        NSNode(TEXT)
Result 1: 201
Namespace aware node.
inherit Node : Node
void
add_namespace(string
ns
, void
|string
symbol
, void
|bool
chain
)
Adds a new namespace to this node. The preferred symbol to
use to identify the namespace can be provided in the symbol
argument. If chain
is set, no attempts to overwrite an
already defined namespace with the same identifier will be made.
void
change_namespace(string
from
, string
to
)
Change all elements and attributes in the subtree in namespace
from
to namespace to
. In case an attribute is defined in
both namespaces it will be overwritten.
mapping
child_namespaces(mapping
(Node
:mapping
(string
:string
)) intermediate
)
Return the defined namespaces from the tree.
intermediate
If namespaces are clobbered, the nodes that need additional xmlns attributes are added to this mapping.
mapping
(string
:string
) diff_namespaces()
Returns the difference between this node and its parent namespaces.
string
get_default_ns()
Returns the default namespace in the current scope.
mapping
(string
:string
) get_defined_nss()
Returns a mapping with all the namespaces defined in the current scope, except the default namespace.
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
string
get_ns()
Returns the namespace in which the current element is defined.
mapping
(string
:mapping
(string
:string
)) get_ns_attributes()
Returns all the attributes in all namespaces that are associated with this node.
The returned mapping is the same as the one in the node, so destructive changes will affect the node.
mapping
(string
:string
) get_ns_attributes(string
namespace
)
Returns the attributes in this node that are declared in the provided namespace.
string
get_ns_short(string
ns
)
Returns the short name for the given namespace in this context. Returns the empty string if the namespace is the default namespace. Returns 0 if the namespace is unknown.
mapping
(string
:string
) get_short_attributes()
Returns the attributes of the element, with the names given using their short name prefixes.
string
get_xml_name()
Returns the element name as it occurs in xml files. E.g. "zonk:name" for the element "name" defined in a namespace denoted with "zonk". It will look up a symbol for the namespace in the symbol tables for the node and its parents. If none is found a new label will be generated by hashing the namespace.
void
remove_child(NSNode
child
)
remove_child is not updated to take care of namespace
issues. To properly remove all of the parent's namespaces
from the child, call remove_node
in the child.
void
rename_namespace(string
from
, string
to
)
Renames the namespace prefix of a namespace. No checks will be made to see if the namespace represented is the same throughout the subtree.
string
render_xml(void
|string
encoding
)
Renders the object tree to a string.
encoding
The character encoding to be used. Defaults to the character encoding in the XML header, or UTF-8 if none.
A somewhat DOM-like library that implements lazy generation of the node tree, i.e. it's generated from the data upon lookup. There's also a little bit of XPath evaluation to do queries on the node tree.
Implementation note: This is generally more pragmatic than
Parser.XML.DOM
, meaning it's not so pretty and compliant, but
more efficient.
Implementation status: There's only enough implemented to parse a
node tree from source and access it, i.e. modification functions
aren't implemented. Data hiding stuff like NodeList and
NamedNodeMap is not implemented, partly since it's cumbersome to
meet the "live" requirement. Also, Parser.HTML
is used in XML
mode to parse the input. Thus it's too error tolerant to be XML
compliant, and it currently doesn't handle DTD elements, like
"<!DOCTYPE", or the XML declaration (i.e. "<?xml
version='1.0'?>").
Document
parse(string
source
, void
|int
raw_values
)
Normally entities are decoded, and Node.xml_format
will encode
them again. If raw_values
is nonzero then all text and attribute
values are instead kept in their original form.
The node tree is very likely a cyclic structure, so it might be a
good idea to destruct it when you're finished with it, to avoid
garbage. Destructing the Document
object always destroys all
nodes in it.
inherit NodeWithChildElements : NodeWithChildElements
array
(Element
) get_elements(string
name
)
Note that this one looks among the top level elements, as
opposed to get_elements_by_tag_name
. This means that if the
document is correct, you can only look up the single top level
element here.
Not DOM compliant.
int
get_raw_values()
Not DOM compliant.
Basic node.
string
get_text_content()
If the raw_values flag is set in the owning document, the text is returned with entities and CDATA blocks intact.
parse
mapping
(string
:string
)|Node
|array
(mapping
(string
:string
)|Node
)|string
|zero
simple_path(string
path
, void
|int
xml_format
)
Access a node or a set of nodes through an expression that is a subset of an XPath RelativeLocationPath in abbreviated form.
That means one or more Steps separated by "/" or "//". A Step consists of an AxisSpecifier followed by a NodeTest and then optionally by one or more Predicates.
"/" before a Step causes it to be matched only against the immediate children of the node(s) selected by the previous Step. "//" before a Step causes it to be matched against any children in the tree below the node(s) selected by the previous Step. The initial selection before the first Step is this element.
The currently allowed AxisSpecifier NodeTest combinations are:
name to select all elements with the given name. The name can be "*" to select all.
@name to select all attributes with the given name. The name can be "*" to select all.
comment() to select all comments.
text() to select all text and CDATA blocks. Note that all entity references are also selected, under the assumption that they would expand to text only.
processing-instruction("name") to select all processing instructions with the given name. The name can be left out to select all. Either ' or " may be used to delimit the name. For compatibility, it can also occur without surrounding quotes.
node() to select all nodes, i.e. the whole content of an element node.
. to select the currently selected element itself.
A Predicate has the form [PredicateExpr] where PredicateExpr currently can be in any of the following forms:
An integer indexes one item in the selected set, according to the document order. A negative index counts from the end of the set.
A RelativeLocationPath as specified above. It's executed for each element in the selected set and those where it yields an empty result are filtered out while the rest remain in the set.
A RelativeLocationPath as specified above followed by ="value". The path is executed for each element in the selected set and those where the text result of it is equal to the given value remain in the set. Either ' or " may be used to delimit the value.
If xml_format
is nonzero, the return value is an xml
formatted string of all the matched nodes, in document order.
Otherwise the return value is as follows:
Attributes are returned as one or more index/value pairs in a mapping. Other nodes are returned as the node objects. If the expression is on a form that can give at most one answer (i.e. there's a predicate with an integer index) then a single mapping or node is returned, or zero if there was no match. If the expression can give more answers then the return value is an array containing zero or more attribute mappings and/or nodes. The array follows document order.
Not DOM compliant.
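The path syntax described above can be tried out with a small sketch like the following. The sample document and its element and attribute names (shop, item, sku) are made up for illustration, and the results are printed with %O rather than asserted, since the exact return shape depends on the expression as explained above.

```pike
int main()
{
  // Hypothetical sample document; the names are illustrative only.
  object doc = Parser.XML.SloppyDOM.parse(
    "<shop><item sku='1'>tea</item><item sku='2'>mug</item></shop>");

  // Name steps separated by "/" and an attribute step: the result is an
  // array, since the expression may match more than one attribute.
  write("%O\n", doc->simple_path("shop/item/@sku"));

  // A predicate of the form path="value" filters the selected set.
  // A nonzero second argument requests an XML formatted string.
  write("%O\n", doc->simple_path("shop/item[@sku='2']", 1));

  // The tree is likely cyclic, so destruct it when done.
  destruct(doc);
  return 0;
}
```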
string
xml_format()
Returns the formatted XML that corresponds to the node tree.
Not DOM compliant.
Node with child elements.
inherit NodeWithChildren : NodeWithChildren
array
(Element
) get_descendant_elements()
Returns all descendant elements in document order.
Not DOM compliant.
array
(Node
) get_descendant_nodes()
Returns all descendant nodes (except attribute nodes) in document order.
Not DOM compliant.
array
(Element
) get_elements(string
name
)
Lightweight variant of get_elements_by_tag_name
that returns
a simple array instead of a fancy live NodeList.
Not DOM compliant.
XML parser that generates node-trees.
Has some support for XML namespaces (http://www.w3.org/TR/REC-xml-names/, RFC 2518 section 23.4).
This module defines two sets of node trees;
the SimpleNode
-based, and the Node
-based.
The main difference between the two is that
the Node
-based trees have parent pointers,
which tend to generate circular data references
and thus garbage.
There are some more subtle differences between the two. Please read the documentation carefully.
constant
int
Parser.XML.Tree.DTD_ATTLIST
constant
int
Parser.XML.Tree.DTD_ELEMENT
constant
int
Parser.XML.Tree.DTD_ENTITY
constant
int
Parser.XML.Tree.DTD_NOTATION
constant
int
Parser.XML.Tree.STOP_WALK
constant
int
Parser.XML.Tree.XML_ATTR
Attribute nodes are created on demand
constant
int
Parser.XML.Tree.XML_COMMENT
constant
int
Parser.XML.Tree.XML_DOCTYPE
constant
int
Parser.XML.Tree.XML_ELEMENT
constant
int
Parser.XML.Tree.XML_HEADER
constant
Parser.XML.Tree.XML_NODE
constant
int
Parser.XML.Tree.XML_PI
constant
int
Parser.XML.Tree.XML_ROOT
constant
int
Parser.XML.Tree.XML_TEXT
string
attribute_quote(string
data
, void
|string
ignore
)
Quotes the string given in data
by escaping &, <, >, ' and ".
Node
parse_file(string
path
, bool
|void
parse_namespaces
)
Loads the XML file path
, creates a node tree representation and
returns the root node.
RootNode
parse_input(string
data
, void
|bool
no_fallback
, void
|bool
force_lowercase
, void
|mapping
(string
:string
) predefined_entities
, void
|bool
parse_namespaces
, ParseFlags
|void
flags
)
Takes an XML string and produces a node tree.
flags
is not used for PARSE_WANT_ERROR_CONTEXT
,
PARSE_FORCE_LOWERCASE
or PARSE_ENABLE_NAMESPACES
since they
are covered by the separate flag arguments.
string
roxen_attribute_quote(string
data
, void
|string
ignore
)
Quotes strings just like attribute_quote
, but entities in the
form &foo.bar; will not be quoted.
string
roxen_text_quote(string
data
)
Quotes strings just like text_quote
, but entities in the form
&foo.bar; will not be quoted.
SimpleRootNode
simple_parse_file(string
path
, void
|mapping
predefined_entities
, ParseFlags
|void
flags
, string
|void
default_namespace
)
Loads the XML file path
, creates a SimpleNode
tree representation and
returns the root node.
SimpleRootNode
simple_parse_input(string
data
, void
|mapping
predefined_entities
, ParseFlags
|void
flags
, string
|void
default_namespace
)
Takes an XML string and produces a SimpleNode
tree.
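As a minimal sketch of how a SimpleNode tree might be used once parsed: the example below parses a made-up document, fetches the document element from the root node and lists its item children. The sample XML is hypothetical, variables are typed as plain object to keep the sketch short, and get_text() is assumed to be available on text nodes although it is not described in this section.

```pike
int main()
{
  // Hypothetical sample document.
  string xml = "<list><item id='a'>one</item><item id='b'>two</item></list>";
  Parser.XML.Tree.SimpleRootNode root =
    Parser.XML.Tree.simple_parse_input(xml);

  // The root node is a container; the document element is one of its children.
  object list = root->get_first_element("list");

  foreach (list->get_elements("item"), object item) {
    // item[0] is the first child node, here the text node.
    // get_text() is an assumption: it is not documented in this section.
    write("%s = %s\n", item->get_attributes()->id, item[0]->get_text());
  }
  return 0;
}
```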
string
text_quote(string
data
)
Quotes the string given in data
by escaping &, < and >.
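A short sketch of typical use, assuming the quoted string is meant to be embedded as element content; the expr element name is made up for the example.

```pike
int main()
{
  // Escape character data before embedding it in element content.
  string quoted = Parser.XML.Tree.text_quote("a < b && c > d");

  // Prints: <expr>a &lt; b &amp;&amp; c &gt; d</expr>
  write("<expr>%s</expr>\n", quoted);
  return 0;
}
```

For attribute values, attribute_quote is the better fit since it also escapes quote characters.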
Flags used together with simple_parse_input()
and
simple_parse_file()
.
constant
Parser.XML.Tree.PARSE_CHECK_ALL_ERRORS
constant
Parser.XML.Tree.PARSE_COMPAT_ALLOW_ERRORS_7_2
constant
Parser.XML.Tree.PARSE_COMPAT_ALLOW_ERRORS_7_6
constant
Parser.XML.Tree.PARSE_DISALLOW_RXML_ENTITIES
constant
Parser.XML.Tree.PARSE_ENABLE_NAMESPACES
constant
Parser.XML.Tree.PARSE_FORCE_LOWERCASE
constant
Parser.XML.Tree.PARSE_WANT_ERROR_CONTEXT
@Pike.Annotations.Implements
(AbstractSimpleNode
)
Base class for nodes with parent pointers.
inherit AbstractSimpleNode : AbstractSimpleNode
AbstractNode
add_child(AbstractNode
c
)
Adds the node c
to the list of children of this node. The new node is added last in the list.
Returns the new child node, NOT the current node.
The new child node is returned.
AbstractNode
add_child_after(AbstractNode
c
, AbstractNode
old
)
Adds the node c
to the list of children of this node. The
node is added after the node old
, which is assumed to be an
existing child of this node. The node is added first if old
is zero.
The current node.
AbstractNode
add_child_before(AbstractNode
c
, AbstractNode
old
)
Adds the node c
to the list of children of this node. The
node is added before the node old
, which is assumed to be an
existing child of this node. The node is added last if old
is
zero.
The current node.
AbstractNode
clone(void
|int(-1..1)
direction
)
Clones the node, optionally connected to parts of the tree. If direction is -1 the cloned node's parent will be set; if direction is 1 the cloned node's children will be set.
void
fix_tree()
Fix all parent pointers recursively in a tree that has been
built with tmp_add_child
.
array
(AbstractNode
) get_ancestors(bool
include_self
)
Returns a list of all ancestors, with the top node last.
The list will start with this node if include_self
is set.
array
(AbstractNode
) get_following()
Returns all the nodes that follow the current one.
array
(AbstractNode
) get_following_siblings()
Returns all following siblings, i.e. all siblings present after this node in the parent's children list.
AbstractNode
get_parent()
Returns the parent node.
array
(AbstractNode
) get_preceding()
Returns all preceding nodes, excluding this node's ancestors.
array
(AbstractNode
) get_preceding_siblings()
Returns all preceding siblings, i.e. all siblings present before this node in the parent's children list.
AbstractNode
get_root()
Follows all parent pointers and returns the root node.
array
(AbstractNode
) get_siblings()
Returns all siblings, including this node.
optional
AbstractNode
low_clone()
Returns an initialized copy of the node.
The returned node has no children, and no parent.
void
remove_child(AbstractNode
c
)
Removes all occurrences of the provided node from the called node's list of children. The removed node's parent reference is set to null.
void
remove_node()
Removes this node from its parent. The parent reference is set to null.
AbstractNode
|zero
replace_child(AbstractNode
old
, AbstractNode
|array
(AbstractNode
) new
)
Replaces the first occurrence of the old node child with the new node child or children. All parent references are updated.
The returned value is NOT the current node.
Returns the new child node.
void
replace_children(array
(AbstractNode
) children
)
Replaces the node's children with the provided ones. All parent references are updated.
AbstractNode
|array
(AbstractNode
) replace_node(AbstractNode
|array
(AbstractNode
) new
)
Replaces this node with the provided one.
Returns the new node.
void
set_parent(AbstractNode
parent
)
Sets the parent node to parent
.
AbstractNode
tmp_add_child(AbstractNode
c
)
AbstractNode
tmp_add_child_before(AbstractNode
c
, AbstractNode
old
)
AbstractNode
tmp_add_child_after(AbstractNode
c
, AbstractNode
old
)
Variants of add_child
, add_child_before
and
add_child_after
that don't set the parent pointer in the
newly added children.
This is useful while building a node tree, to get efficient
refcount garbage collection if the build stops abruptly.
fix_tree
has to be called on the root node when the building
is done.
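The build-then-fix pattern described above might look like the following sketch. The element names are made up, and variables are typed as plain object to keep the example short; it relies only on the constructors and functions listed in this module.

```pike
int main()
{
  object root = Parser.XML.Tree.RootNode();
  object list = Parser.XML.Tree.ElementNode("list", ([]));
  object item = Parser.XML.Tree.ElementNode("item", ([]));

  // Build the tree without setting any parent pointers.
  item->tmp_add_child(Parser.XML.Tree.TextNode("one"));
  list->tmp_add_child(item);
  root->tmp_add_child(list);

  write("parent set before fix_tree: %d\n", !!item->get_parent());

  // Set all parent pointers in one pass, starting from the root.
  root->fix_tree();

  write("parent set after fix_tree: %d\n", item->get_parent() == list);
  return 0;
}
```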
Base class for nodes.
AbstractSimpleNode
|zero
res = Parser.XML.Tree.AbstractSimpleNode()
[ pos
]
The [] operator indexes among the node children, so
node[0]
returns the first node and node[-1]
the last.
The [] operator will select a node from all the node's children, not just its element children.
AbstractSimpleNode
add_child(AbstractSimpleNode
c
)
Adds the given node to the list of children of this node. The new node is added last in the list.
The return value differs from the one returned
by Node()->add_child()
.
The current node.
AbstractSimpleNode
add_child_after(AbstractSimpleNode
c
, AbstractSimpleNode
old
)
Adds the node c
to the list of children of this node. The
node is added after the node old
, which is assumed to be an
existing child of this node. The node is added first if old
is zero.
The current node.
AbstractSimpleNode
add_child_before(AbstractSimpleNode
c
, AbstractSimpleNode
old
)
Adds the node c
to the list of children of this node. The
node is added before the node old
, which is assumed to be an
existing child of this node. The node is added last if old
is
zero.
The current node.
optional
AbstractSimpleNode
clone()
Returns a clone of the sub-tree rooted in the node.
int
count_children()
Returns the number of children of the node.
array
(AbstractSimpleNode
) get_children()
Returns all of the node's children.
array
(AbstractSimpleNode
) get_descendants(bool
include_self
)
Returns a list of all descendants in document order. Includes
this node if include_self
is set.
AbstractSimpleNode
|zero
get_last_child()
Returns the last child node or zero.
int
iterate_children(function
(AbstractSimpleNode
, mixed
... :int
|void
) callback
, mixed
... args
)
Iterates over the node's children from left to right, calling the
function callback
for every node. If the callback function
returns STOP_WALK
the iteration is promptly aborted and
STOP_WALK
is returned.
optional
AbstractSimpleNode
low_clone()
Returns an initialized copy of the node.
The returned node has no children.
optional
this_program
node_factory(int
type
, string
name
, mapping
attr
, string
text
)
Optional factory for creating contained nodes.
type
Type of node to create. One of:
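| XML_TEXT | XML text. |
| XML_COMMENT | XML comment. |
| XML_HEADER | <?xml?>-header |
| XML_PI | XML processing instruction. |
| XML_ELEMENT | XML element tag. |
| XML_DOCTYPE | DTD information. |
| DTD_ENTITY | |
| DTD_ELEMENT | |
| DTD_ATTLIST | |
| DTD_NOTATION | |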
name
Name of the tag if applicable.
attr
Attributes for the tag if applicable.
text
Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Define this function to provide application-specific XML nodes.
Returns one of
| A node object representing the XML tag. |
|
|
|
|
This function is only relevant for XML_ELEMENT
nodes.
This function is not available in Pike 7.6 and earlier.
In Pike 8.0 and earlier this function was only called in root nodes.
void
remove_child(AbstractSimpleNode
c
)
Removes all occurrences of the provided node from the list of children of this node.
AbstractSimpleNode
|zero
replace_child(AbstractSimpleNode
old
, AbstractSimpleNode
|array
(AbstractSimpleNode
) new
)
Replaces the first occurrence of the old node child with the new node child or children.
The return value differs from the one returned
by Node()->replace_child()
.
Returns the current node on success, and 0
(zero)
if the node old
wasn't found.
void
replace_children(array
(AbstractSimpleNode
) children
)
Replaces the node's children with the provided ones.
int
walk_inorder(function
(AbstractSimpleNode
, mixed
... :int
|void
) callback
, mixed
... args
)
Traverse the node subtree in inorder, left subtree first, then
root node, and finally the remaining subtrees, calling the function
callback
for every node. If the function callback
returns
STOP_WALK
the traversal is promptly aborted and STOP_WALK
is returned.
int
walk_postorder(function
(AbstractSimpleNode
, mixed
... :int
|void
) callback
, mixed
... args
)
Traverse the node subtree in postorder, first subtrees from left
to right, then the root node, calling the function callback
for every node. If the function callback
returns STOP_WALK
the traversal is promptly aborted and STOP_WALK
is returned.
int
walk_preorder(function
(AbstractSimpleNode
, mixed
... :int
|void
) callback
, mixed
... args
)
Traverse the node subtree in preorder, root node first, then
subtrees from left to right, calling the callback function
for every node. If the callback function returns STOP_WALK
the traversal is promptly aborted and STOP_WALK
is returned.
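A minimal sketch of a preorder walk that collects element names and stops early. The sample document is made up, and get_node_type() is assumed to be available on the nodes although it is not described in this section.

```pike
int main()
{
  object root =
    Parser.XML.Tree.simple_parse_input("<a><b><c/></b><d/></a>");
  array(string) seen = ({});

  // Collect element names in preorder; abort once "c" has been visited.
  int res = root->walk_preorder(lambda(object n) {
      // get_node_type() is an assumption: not documented in this section.
      if (n->get_node_type() != Parser.XML.Tree.XML_ELEMENT) return 0;
      seen += ({ n->get_any_name() });
      if (n->get_any_name() == "c") return Parser.XML.Tree.STOP_WALK;
      return 0;
    });

  write("visited: %s\n", seen * " ");
  write("aborted: %d\n", res == Parser.XML.Tree.STOP_WALK);
  return 0;
}
```

Returning zero from the callback lets the walk continue; only STOP_WALK aborts it.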
int
walk_preorder_2(function
(AbstractSimpleNode
, mixed
... :int
|void
) cb_1
, function
(AbstractSimpleNode
, mixed
... :int
|void
) cb_2
, mixed
... args
)
Traverse the node subtree in preorder, root node first, then
subtrees from left to right. For each node we call cb_1
before iterating through children, and then cb_2
(which always gets called even if the walk is aborted earlier).
If the callback function returns STOP_WALK
the
descent is aborted and STOP_WALK
is returned once all waiting
cb_2
functions have been called.
void
zap_tree()
Destruct the tree recursively. When the inheriting
AbstractNode
or Node
is used, which have parent pointers,
this function should be called for every tree that no longer is
in use to avoid frequent garbage collector runs.
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.AttributeNode Parser.XML.Tree.AttributeNode(
string
name
, string
value
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.CommentNode Parser.XML.Tree.CommentNode(
string
text
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.DTDAttlistNode Parser.XML.Tree.DTDAttlistNode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(Node
)
@Pike.Annotations.Implements
(DTDElementHelper
)
inherit DTDElementHelper : DTDElementHelper
inherit Node : Node
Parser.XML.Tree.DTDElementNode Parser.XML.Tree.DTDElementNode(
string
name
, array
expression
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.DTDEntityNode Parser.XML.Tree.DTDEntityNode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.DTDNotationNode Parser.XML.Tree.DTDNotationNode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.DoctypeNode Parser.XML.Tree.DoctypeNode(
string
name
, mapping
(string
:string
) attrs
, array
|zero
contents
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.ElementNode Parser.XML.Tree.ElementNode(
string
name
, mapping
(string
:string
) attrs
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.HeaderNode Parser.XML.Tree.HeaderNode(
mapping
(string
:string
) attrs
)
@Pike.Annotations.Implements
(AbstractNode
)
@Pike.Annotations.Implements
(VirtualNode
)
XML node with parent pointers.
inherit AbstractNode : AbstractNode
inherit VirtualNode : VirtualNode
string
get_attr_name()
Returns the name of the attribute node.
array
(Node
) get_attribute_nodes()
Creates and returns an array of new nodes; they will not be added as proper children to the parent node, but the parent link in the nodes is set so that upwards traversal is made possible.
string
get_tag_name()
Returns the name of the element node, or the nearest element above if an attribute node.
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.PINode Parser.XML.Tree.PINode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(Node
)
The root node of an XML-tree consisting of Node
s.
inherit Node : Node
inherit XMLParser : XMLParser
Parser.XML.Tree.RootNode Parser.XML.Tree.RootNode(
string
|void
data
, mapping
|void
predefined_entities
, ParseFlags
|void
flags
)
void
flush_node_id_cache()
Clears the node id cache built and used by get_element_by_id
.
ElementNode
get_element_by_id(string
id
, int
|void
force
)
Find the element with the specified id.
id
The XML id of the node to search for.
force
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
Returns the element node with the specified id
if any. Returns UNDEFINED
otherwise.
flush_node_id_cache
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleCommentNode Parser.XML.Tree.SimpleCommentNode(
string
comment
)
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleDTDAttlistNode Parser.XML.Tree.SimpleDTDAttlistNode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(SimpleNode
)
@Pike.Annotations.Implements
(DTDElementHelper
)
inherit DTDElementHelper : DTDElementHelper
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleDTDElementNode Parser.XML.Tree.SimpleDTDElementNode(
string
name
, array
expression
)
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleDTDEntityNode Parser.XML.Tree.SimpleDTDEntityNode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleDTDNotationNode Parser.XML.Tree.SimpleDTDNotationNode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleDoctypeNode Parser.XML.Tree.SimpleDoctypeNode(
string
name
, mapping
(string
:string
) attrs
, array
|zero
contents
)
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleElementNode Parser.XML.Tree.SimpleElementNode(
string
name
, mapping
(string
:string
) attrs
)
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleHeaderNode Parser.XML.Tree.SimpleHeaderNode(
mapping
(string
:string
) attrs
)
@Pike.Annotations.Implements
(AbstractSimpleNode
)
@Pike.Annotations.Implements
(VirtualNode
)
XML node without parent pointers and attribute nodes.
inherit AbstractSimpleNode : AbstractSimpleNode
inherit VirtualNode : VirtualNode
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimplePINode Parser.XML.Tree.SimplePINode(
string
name
, mapping
(string
:string
) attrs
, string
contents
)
@Pike.Annotations.Implements
(SimpleNode
)
The root node of an XML-tree consisting of SimpleNode
s.
inherit SimpleNode : SimpleNode
inherit XMLParser : XMLParser
Parser.XML.Tree.SimpleRootNode Parser.XML.Tree.SimpleRootNode(
string
|void
data
, mapping
|void
predefined_entities
, ParseFlags
|void
flags
, string
|void
default_namespace
)
void
flush_node_id_cache()
Clears the node id cache built and used by get_element_by_id
.
SimpleElementNode
get_element_by_id(string
id
, int
|void
force
)
Find the element with the specified id.
id
The XML id of the node to search for.
force
Force a regeneration of the id lookup cache. Needed the first time after the node tree has been modified by adding or removing element nodes, or by changing the id attribute of an element node.
Returns the element node with the specified id
if any. Returns UNDEFINED
otherwise.
flush_node_id_cache
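As an illustration of the lookup described above, the following sketch parses a small document with the SimpleRootNode constructor and resolves elements by their id attribute. The helper name text_for_id and the sample XML are hypothetical; the sketch assumes add_child() is available on simple nodes for modifying the tree.

```pike
// Hypothetical helper: text content of the element with the given id,
// or "" if no such element exists.
string text_for_id(Parser.XML.Tree.SimpleRootNode root, string id,
                   int|void force)
{
  object n = root->get_element_by_id(id, force);
  return n ? n->value_of_node() : "";
}

int main()
{
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(
    "<doc><item id='a'>first</item><item id='b'>second</item></doc>");

  // The first lookup builds the id cache.
  write("%O\n", text_for_id(root, "b"));

  // Modify the tree: add a new element carrying an id attribute.
  object item = Parser.XML.Tree.SimpleElementNode("item", ([ "id": "c" ]));
  item->add_child(Parser.XML.Tree.SimpleTextNode("third"));
  root->get_first_element()->add_child(item);

  // Force a cache rebuild on the first lookup after the modification.
  write("%O\n", text_for_id(root, "c", 1));
  return 0;
}
```

Calling flush_node_id_cache() after the modification would presumably serve the same purpose as passing force on the next lookup.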
@Pike.Annotations.Implements
(SimpleNode
)
inherit SimpleNode : SimpleNode
Parser.XML.Tree.SimpleTextNode Parser.XML.Tree.SimpleTextNode(
string
text
)
@Pike.Annotations.Implements
(Node
)
inherit Node : Node
Parser.XML.Tree.TextNode Parser.XML.Tree.TextNode(
string
text
)
Node in XML tree
(int)Parser.XML.Tree.VirtualNode()
(float)Parser.XML.Tree.VirtualNode()
(string)Parser.XML.Tree.VirtualNode()
(array)Parser.XML.Tree.VirtualNode()
(mapping)Parser.XML.Tree.VirtualNode()
(multiset)Parser.XML.Tree.VirtualNode()
It is possible to cast a node to a string, which will return
render_xml()
for that node.
Parser.XML.Tree.VirtualNode Parser.XML.Tree.VirtualNode(
int
type
, string
|zero
name
, mapping
|zero
attr
, string
|zero
text
)
string
get_any_name()
Return name of tag or name of attribute node.
mapping
(string
:string
) get_attributes()
Returns this node's attributes, which can be altered destructively to alter the node's attributes.
replace_attributes()
int
get_doc_order()
array
(AbstractNode
) get_elements(string
|void
name
, bool
|void
full
)
Returns all element children to this node.
name
If provided, only elements with that name are returned.
full
If specified, name matching will be done against the full name.
Returns an array with matching nodes.
AbstractNode
|zero
get_first_element(string
|void
name
, bool
|void
full
)
Returns the first element child to this node.
name
If provided, the first element child with that name is returned.
full
If specified, name matching will be done against the full name.
Returns the first matching node, and 0 if no such node was found.
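A minimal sketch of how get_first_element() and get_elements() might be combined to pick out named children. The helper name child_texts and the sample document are hypothetical.

```pike
// Hypothetical helper: text of every child element called `name`
// directly below the document element.
array(string) child_texts(string xml, string name)
{
  Parser.XML.Tree.SimpleRootNode root = Parser.XML.Tree.SimpleRootNode(xml);

  // The document element is the first element child of the root node.
  object doc = root->get_first_element();
  if (!doc) return ({});

  // get_elements() returns an array of nodes; calling a method on the
  // array applies it to every node in it.
  return doc->get_elements(name)->value_of_node();
}

int main()
{
  string xml =
    "<list><item>a</item><note>x</note><item>b</item></list>";
  write("%O\n", child_texts(xml, "item"));
  return 0;
}
```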
string
get_full_name()
Return fully qualified name of the element node.
string
get_namespace()
Return the (resolved) namespace for this node.
int
get_node_type()
Returns the node type. See defined node type constants.
mapping
get_short_attributes()
Returns this node's name-space adjusted attributes.
set_short_namespaces()
or set_short_attributes()
must
have been called before calling this function.
string
get_tag_name()
Returns the name of the element node, or the nearest element above if an attribute node.
string
|zero
get_text()
Returns text content in node.
void
render_to_file(Stdio.File
f
, void
|bool
preserve_roxen_entities
)
Creates an XML representation for the node sub tree and streams
the output to the file f
. If the flag preserve_roxen_entities
is set, entities of the form &foo.bar; will not be escaped.
string
render_xml(void
|bool
preserve_roxen_entities
, void
|mapping
(string
:string
) namespace_lookup
, void
|string
encoding
, void
|int(2bit)
quote_mode
)
Creates an XML representation of the node sub tree. If the
flag preserve_roxen_entities
is set, entities of the form
&foo.bar; will not be escaped.
namespace_lookup
Mapping from namespace prefix to namespace symbol prefix.
encoding
Force a specific output character encoding. By default the encoding set in the document XML processing instruction will be used, with UTF-8 as a fallback. Setting this value will change the XML processing instruction, if present.
quote_mode
| Defaults to single quote, but use double quote if it avoids escaping. |
| Defaults to double quote, but use single quote if it avoids escaping. |
| Use only single quote. |
| Use only double quote. |
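The following sketch builds a small subtree by hand and renders it with render_xml() using default arguments. The helper name build_greeting is hypothetical, and add_child() is assumed to be available on simple nodes. The exact quoting of attributes in the output depends on quote_mode, so no literal output is shown.

```pike
// Hypothetical helper: build <greeting lang=...>text</greeting> and
// return its XML representation.
string build_greeting(string text)
{
  object elem =
    Parser.XML.Tree.SimpleElementNode("greeting", ([ "lang": "en" ]));
  elem->add_child(Parser.XML.Tree.SimpleTextNode(text));

  // Default arguments: roxen entities are escaped, no namespace
  // lookup, default encoding and default quote mode.
  return elem->render_xml();
}

int main()
{
  write("%s\n", build_greeting("hello"));
  return 0;
}
```

Casting the node to a string, (string)elem, is documented above as returning the same result as render_xml().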
void
replace_attributes(mapping
(string
:string
) attrs
)
Replace the entire set of attributes.
get_attributes()
void
set_doc_order(int
o
)
void
set_short_attributes(mapping
short_attrs
)
Sets this node's name-space adjusted attributes.
void
set_tag_name(string
name
)
Change the tag name destructively. Can only be used on element and processing-instruction nodes.
string
set_text(string
txt
)
Change the text content destructively.
string
value_of_node()
If the node is an attribute node or a text node, its value is returned. Otherwise the child text nodes are concatenated and returned.
Namespace aware parser.
mapping
(string
:string
) Enter(mapping
(string
:string
) attrs
)
Check attrs
for namespaces.
Returns the namespace expanded version of attrs
.
Mixin for parsing XML.
Uses Parser.XML.Simple
to perform
the actual parsing.
protected
AbstractSimpleNode
node_factory(int
type
, string
name
, mapping
attr
, string
text
)
Factory for creating nodes.
type
Type of node to create. One of:
| XML text. |
| XML comment. |
| <?xml?>-header |
| XML processing instruction. |
| XML element tag. |
| DTD information. |
| |
| |
| |
|
name
Name of the tag if applicable.
attr
Attributes for the tag if applicable.
text
Contained text of the tag if any.
This function is called during parsing to create the various XML nodes.
Overload this function to provide application-specific XML nodes.
Returns a node object representing the XML tag,
or 0
(zero) if the subtree rooted in the
tag should be cut.
This function is not available in Pike 7.6 and earlier.
node_factory_dispatch()
, AbstractSimpleNode()->node_factory()
protected
AbstractSimpleNode
node_factory_dispatch(int
type
, string
name
, mapping
|zero
attr
, string
text
)
Dispatcher of node_factory()
.
This function finds a suitable node_factory()
given the
current parser context to call with the same arguments.
This is a simple parser for SGML structured markups. It's not really HTML, but it's useful for that purpose.
The simple way to use it is to give it some information about available tags and containers, and what callbacks those are to call.
The object is easily reused, by calling the clone()
function.
add_tag
, add_container
, finish
mapping
_inspect()
This is a low-level way of debugging a parser. This gives a mapping of the internal state of the Parser.HTML object.
The format and contents of this mapping may change without further notice.
Parser.HTML
_set_tag_callback(function
(:void
)|string
|array
to_call
)
Parser.HTML
_set_entity_callback(function
(:void
)|string
|array
to_call
)
Parser.HTML
_set_data_callback(function
(:void
)|string
|array
to_call
)
These functions set up the parser object to call the given callbacks upon tags, entities and/or data. The callbacks will only be called if there isn't another tag/container/entity handler for these.
The callback function will be called with the parser object as first argument, and the active string as second. Note that no parsing of the contents has been done. Both endtags and normal tags are called; there is no container parsing.
The return values from the callbacks are handled in the same way
as the return values from callbacks registered with add_tag
and
similar functions.
The data callback will be called as seldom as possible with the longest possible string, as long as it doesn't get called out of order with any other callback. It will never be called with a zero length string.
If a string or array is given instead of a function, it will act as the return value from the function. Arrays or empty strings are probably preferable to avoid recursion.
Returns the object being called.
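A common use of the catch-all tag callback is to strip all markup from a string. The sketch below does so by returning an empty array for every tag; the helper name strip_tags is hypothetical.

```pike
// Hypothetical helper: remove every tag, keeping only the text data.
string strip_tags(string html)
{
  Parser.HTML p = Parser.HTML();

  // Called for start and end tags alike. The empty array replaces the
  // tag with nothing and is not reparsed.
  p->_set_tag_callback(lambda(Parser.HTML parser, string tag) {
                         return ({});
                       });

  return p->finish(html)->read();
}

int main()
{
  write("%s\n", strip_tags("<p>Hello <i>world</i></p>"));
  return 0;
}
```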
Parser.HTML
add_tag(string
name
, mixed
to_do
)
Parser.HTML
add_container(string
name
, mixed
to_do
)
Parser.HTML
add_entity(string
entity
, mixed
to_do
)
Parser.HTML
add_quote_tag(string
name
, mixed
to_do
, string
end
)
Parser.HTML
add_tags(mapping
(string
:mixed
) tags
)
Parser.HTML
add_containers(mapping
(string
:mixed
) containers
)
Parser.HTML
add_entities(mapping
(string
:mixed
) entities
)
Registers the actions to take when parsing various things. Tags,
containers, entities are as usual. add_quote_tag()
adds a special
kind of tag that reads any data until the next occurrence of the
end string immediately before a tag end.
to_do
This argument can be any of the following.
| The function will be called as a callback function. It will get the following arguments, depending on the type of callback.
mixed tag_callback(Parser.HTML parser, mapping args, mixed ... extra)
mixed container_callback(Parser.HTML parser, mapping args, string content, mixed ... extra)
mixed entity_callback(Parser.HTML parser, mixed ... extra)
mixed quote_tag_callback(Parser.HTML parser, string content, mixed ... extra) |
| This tag/container/entity is then replaced by the string.
The string is normally not reparsed, i.e. it's equivalent to
writing a function that returns the string in an array (but
a lot faster). If |
| The first element is a function as above. It will receive
the rest of the array as extra arguments. If extra arguments
are given by |
| If there is a tag/container/entity with the given name in the parser, it's removed. |
The callback function can return:
| This string will be pushed on the parser stack and be parsed. Be careful not to return anything in this way that could lead to an infinite recursion. |
| The element(s) of the array is the result of the function.
This will not be parsed. This is useful for avoiding
infinite recursion. The array can be of any size, this means
the empty array is the most effective to return if you don't
care about the result. If the parser is operating in
|
| This means "don't do anything", ie the item that generated the callback is left as it is, and the parser continues. |
| Reparse the last item again. This is useful to parse a tag as a container, or vice versa: just add or remove callbacks for the tag and return this to jump to the right callback. |
Returns the object being called.
tags
, containers
, entities
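The registration functions above can be sketched as follows: one container callback, one tag callback and one entity given as a plain replacement string. The helper name convert_markup and the chosen tag names are hypothetical; callbacks return arrays so that their results are not reparsed.

```pike
// Hypothetical helper: turn a tiny HTML subset into plain text.
string convert_markup(string html)
{
  Parser.HTML p = Parser.HTML();

  // Container callback: receives the parser, the arguments and the
  // content between <b> and </b>.
  p->add_container("b", lambda(Parser.HTML parser, mapping args,
                               string content) {
                          return ({ "*" + content + "*" });
                        });

  // Tag callback: receives the parser and the arguments.
  p->add_tag("br", lambda(Parser.HTML parser, mapping args) {
                     return ({ "\n" });
                   });

  // A string as to_do replaces the entity directly.
  p->add_entity("nbsp", " ");

  return p->finish(html)->read();
}

int main()
{
  write("%s\n", convert_markup("a <b>bold</b><br>b&nbsp;c"));
  return 0;
}
```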
array
(int
) at()
int
at_line()
int
at_char()
int
at_column()
Returns the current position. Characters and columns count from
0
, lines count from 1
.
at()
gives an array with the following layout.
Array | |
| Line. |
| Character. |
| Column. |
int
case_insensitive_tag(void
|int
value
)
All tags and containers are matched case insensitively, and
argument names are converted to lowercase. Tags added with
add_quote_tag()
are not affected, though. Switching to case
insensitive mode and back won't preserve the case of registered
tags and containers.
Parser.HTML
clear_tags()
Parser.HTML
clear_containers()
Parser.HTML
clear_entities()
Parser.HTML
clear_quote_tags()
Removes all registered definitions in the different categories.
Returns the object being called.
add_tag
, add_tags
, add_container
, add_containers
,
add_entity
, add_entities
Parser.HTML
clone(mixed
... args
)
Clones the Parser.HTML
object. A new object of the same class
is created, filled with the parse setup from the old object.
This is the simplest way of flushing a parse feed/output.
The arguments to clone are sent to the new object, simplifying work
for custom classes that inherit Parser.HTML
.
Returns the new object.
create is called _before_ the setup is copied.
mapping
(string
:mixed
) tags()
mapping
(string
:mixed
) containers()
mapping
(string
:mixed
) entities()
Returns the current callback settings. When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: These run in constant time since they return copy-on-write mappings.
add_tag
, add_tags
, add_container
, add_containers
,
add_entity
, add_entities
string
context()
Returns the current output context as a string.
| In top level data. This is always returned when called from tag or container callbacks. |
| In an unquoted argument. |
| In a splice argument. |
The return value can also be a single character string, in which case the context is a quoted argument. The string contains the starting quote character.
This function is typically only useful in entity callbacks, which can be called both from text and argument values of different sorts.
splice_arg
string
current()
Gives the current range of data, ie the whole tag/entity/etc being parsed in the current callback. Returns zero if there's no current range, i.e. when the function is not called in a callback.
Parser.HTML
feed()
Parser.HTML
feed(string
s
, void
|int
do_parse
)
Feed new data to the Parser.HTML
object. This will start a scan
and may result in callbacks. Note that it's possible that all data
fed isn't processed - to do that, call finish()
.
If the function is called without arguments, no data is fed, but
the parser is run. If the string argument is followed by a
0
, ->feed(s,0);
, the string is fed, but the parser
isn't run.
Returns the object being called.
finish
, read
, feed_insert
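A sketch of incremental parsing, where input arrives in pieces and a tag may be split across two of them. The helper name parse_in_chunks is hypothetical. Output is collected with read() after each feed(), and finish() flushes whatever remains.

```pike
// Hypothetical helper: feed the chunks one at a time and collect the
// parsed output.
string parse_in_chunks(array(string) chunks)
{
  Parser.HTML p = Parser.HTML();
  p->add_container("b", lambda(Parser.HTML parser, mapping args,
                               string content) {
                          return ({ upper_case(content) });
                        });

  String.Buffer out = String.Buffer();
  foreach (chunks, string chunk) {
    p->feed(chunk);
    out->add(p->read());   // Whatever has been parsed so far.
  }
  out->add(p->finish()->read());   // Process and fetch the remainder.
  return out->get();
}

int main()
{
  // The <b> container is split across the two chunks.
  write("%s\n", parse_in_chunks(({ "x <b>bo", "ld</b> y" })));
  return 0;
}
```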
Parser.HTML
feed_insert(string
s
)
This pushes a string on the parser stack.
Returns the object being called.
Don't use!
Parser.HTML
finish()
Parser.HTML
finish(string
s
)
Finish a parser pass. A string may be sent here, similar to feed().
Returns the object being called.
array
get_extra()
Gets the extra arguments set by set_extra()
.
Returns the extra arguments as an array.
int
ignore_comments(void
|int
value
)
int
ignore_tags(void
|int
value
)
Do not look for tags at all. Normally tags are matched even when
there are no callbacks for them at all. When this is set, the tag
delimiters '<'
and '>'
will be treated as any
normal character.
int
ignore_unknown(void
|int
value
)
Treat unknown tags and entities as text data, continuing parsing for tags and entities inside them.
When functions are specified with _set_tag_callback()
or
_set_entity_callback()
, all tags or entities, respectively,
are considered known. However, if one of those functions returns
1 and ignore_unknown is set, they are treated as text data
instead of making another call to the same function again.
int
lazy_argument_end(void
|int
value
)
A '>'
in a tag argument closes both the argument and the
tag, even if the argument is quoted.
int
lazy_entity_end(void
|int
value
)
Normally, the parser searches indefinitely for the entity end
character (i.e. ';'
). When this flag is set, the
characters '&'
, '<'
, '>'
, '"'
,
'''
, and any whitespace breaks the search for the entity
end, and the entity text is then ignored, i.e. treated as
data.
int
match_tag(void
|int
value
)
Unquoted nested tag starters and enders will be balanced when parsing tags. This is the default.
int
max_parse_depth(void
|int
value
)
Maximum recursion depth during parsing. Recursion occurs when a
tag/container/entity/quote tag callback function returns a string
to be reparsed. The default value is 10
.
int
mixed_mode(void
|int
value
)
Allow callbacks to return arbitrary data in the arrays, which will be concatenated in the output.
int
nestling_entity_end(void
|int
value
)
mapping
parse_tag_args(string
tag
)
Parses the tag arguments from a tag string without the name and
surrounding brackets, i.e. a string of the form "some='tag'
args"
.
Returns a mapping containing the tag arguments.
tag_args
string
parse_tag_name(string
tag
)
Parses the tag name from a tag string without the surrounding
brackets, i.e. a string of the form "tagname some='tag'
args"
.
Returns the tag name or an empty string if none.
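Both helpers operate on a string rather than on the parser's current state, so they can be called on a fresh Parser.HTML object. A minimal sketch, with the hypothetical helper name split_tag:

```pike
// Hypothetical helper: split the inside of a tag (without the
// surrounding brackets) into its name and its argument mapping.
array split_tag(string tag)
{
  Parser.HTML p = Parser.HTML();
  string name = p->parse_tag_name(tag);

  // parse_tag_args() expects the tag string without the name, so the
  // name is cut off before the arguments are parsed.
  mapping args = p->parse_tag_args(tag[sizeof(name)..]);
  return ({ name, args });
}

int main()
{
  [string name, mapping args] = split_tag("tagname some='tag' args");
  write("name: %O\nargs: %O\n", name, args);
  return 0;
}
```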
int
quote_stapling(int
|void
enable
)
Enable old-style attribute quoting by stapling.
enable
Enable/disable the mode. Defaults to keeping the old setting.
Returns the prior setting.
Any use of this mode is discouraged, and is only provided for compatibility with versions of Pike prior to 8.0.
Note also that this mode will output runtime warnings whenever the mode has had an effect on the parsing.
mapping
(string
:array
(mixed
|string
)) quote_tags()
Returns the current callback settings. The values are arrays ({callback, end_quote}). When matching is done case insensitively, all names will be returned in lowercase.
Implementation note: quote_tags()
allocates a new mapping for
every call and thus, unlike e.g. tags()
runs in linear time.
add_quote_tag
string
|array
(mixed
) read()
string
|array
(mixed
) read(int
max_elems
)
Read parsed data from the parser object.
Returns a string of parsed data if the parser isn't in
mixed_mode
, an array of arbitrary data otherwise.
int
reparse_strings(void
|int
value
)
When a plain string is used as a tag/container/entity/quote tag callback, it's not reparsed if this flag is unset. Setting it causes all such strings to be reparsed.
Parser.HTML
set_extra(mixed
... args
)
Sets the extra arguments passed to all tag, container and entity callbacks.
Returns the object being called.
string
splice_arg(void
|string
name
)
If given a string, it sets the splice argument name to it. It returns the old splice argument name.
If a splice argument name is set, it's parsed in all tags, both those with callbacks and those without. Wherever it occurs, its value (after being parsed for entities in the normal way) is inserted directly into the tag. E.g:
<foo arg1="val 1" splice="arg2='val 2' arg3" arg4>
becomes
<foo arg1="val 1" arg2='val 2' arg3 arg4>
if "splice"
is set as the splice argument name.
array
tag(void
|mixed
default_value
)
Returns the equivalent of the following calls.
Array | |
|
|
|
|
|
|
mapping
(string
:mixed
) tag_args(void
|mixed
default_value
)
Gives the arguments of the current tag, parsed to a convenient
mapping consisting of key:value pairs. If the current thing isn't
a tag, it gives zero. default_value
is used for arguments which
have no value in the tag. If default_value
isn't given, the
value is set to the same string as the key.
string
tag_content()
Gives the content of the current tag, if it's a container or quote tag. Otherwise returns zero.
string
|zero
tag_name()
Gives the name of the current tag, or zero. If used from an entity callback, it gives the string inside the entity.
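Inside a callback, tag_name() and tag_args() describe the tag currently being parsed. The sketch below uses them from a catch-all tag callback to collect link targets; the helper name collect_hrefs is hypothetical.

```pike
// Hypothetical helper: collect the href argument of every <a> tag.
array(string) collect_hrefs(string html)
{
  array(string) hrefs = ({});
  Parser.HTML p = Parser.HTML();

  p->_set_tag_callback(lambda(Parser.HTML parser, string tag) {
      // For an end tag such as </a> the name is "/a", so only start
      // tags match here.
      if (parser->tag_name() == "a") {
        mapping args = parser->tag_args();
        if (args->href) hrefs += ({ args->href });
      }
      return 0;   // Leave the tag as it is.
    });

  p->finish(html);
  return hrefs;
}

int main()
{
  write("%O\n",
        collect_hrefs("<a href='a.html'>A</a> <a href=\"b.html\">B</a>"));
  return 0;
}
```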
Parser.HTML
write_out(mixed
... args
)
Send data to the output stream, i.e. it won't be parsed and it won't be sent to the data callback, if any.
Any data is allowed when the parser is running in mixed_mode
.
Only strings are allowed otherwise.
Returns the object being called.
int
ws_before_tag_name(void
|int
value
)
Allow whitespace between the tag start character and the tag name.
int
xml_tag_syntax(void
|int
value
)
Whether or not to use XML syntax to tell empty tags and container tags apart.
| Use HTML syntax only. If there's a |
| Use HTML syntax, but ignore a |
| Use XML syntax, but when a tag that does not end with
|
| Use XML syntax only. If a tag got both container and
non-container callbacks, the non-container callback is called
when the empty element form (i.e. the one ending with
|
string
|zero
decode_numeric_xml_entity(string
chref
)
Decodes the numeric XML entity chref
, e.g. "&#x34;" and
returns the character as a string. chref
is the name part of
the entity, i.e. without the leading '&' and trailing ';'. Returns
zero if chref
isn't in a recognized form or if the character
number is too large to be represented in a string.
string
encode_html_entities(string
raw
)
Encode characters to HTML entities, e.g. turning "<"
into
"&lt;"
.
The characters that will be encoded are characters <= 32,
"\"&'<>"
and characters >= 127 and <= 160 and characters
>= 255.
HTML
get_xml_parser()
Returns a Parser.HTML
initialized for parsing XML. It has all
the flags set properly for XML syntax and callbacks to ignore
comments, CDATA blocks and unknown PI tags, but it has no
registered tags and doesn't decode any entities.
HTML
html_entity_parser()
string
parse_html_entities(string
in
)
HTML
html_entity_parser(int
noerror
)
string
parse_html_entities(string
in
, int
noerror
)
Parse any HTML entities in the string to unicode characters. Either return a complete parser (to build on or use) or parse a string. Throw an error if there is an unrecognized entity in the string if noerror is not set.
Currently using XHTML 1.0 tables.
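A short sketch of the entity helpers described above. It assumes they are reached through the Parser module (Parser.encode_html_entities and so on); the sample strings are hypothetical.

```pike
int main()
{
  // Encoding: "<" is one of the characters that is always encoded.
  write("%s\n", Parser.encode_html_entities("<"));

  // Decoding named entities back to characters.
  write("%s\n", Parser.parse_html_entities("&lt;b&gt; &amp; more"));

  // Numeric entity: the argument is the part between '&' and ';'.
  write("%O\n", Parser.decode_numeric_xml_entity("#x34"));

  // Not a numeric character reference: zero is returned.
  write("%O\n", Parser.decode_numeric_xml_entity("bogus"));
  return 0;
}
```

Passing a nonzero noerror as the second argument to parse_html_entities() is the documented way to avoid an error on unrecognized entities.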
This is a parser for line oriented data that is either comma,
semi-colon or tab separated. It extends the functionality
of the Parser.Tabular
with some specific functionality related
to a header and record oriented parsing of huge datasets.
We document only the differences with the basic Parser.Tabular
.
Parser.Tabular
inherit Parser.Tabular : Tabular
mapping
fetchrecord(void
|array
|mapping
format
)
This function consumes a single record from the input.
To be used in conjunction with parsehead()
.
It returns the mapping describing the record.
parsehead()
, fetch()
int
parsehead(void
|string
delimiters
, void
|string
|object
matchfieldname
)
This function consumes the header-line preceding a typical comma,
semicolon or tab separated value list and autocompiles a format
description from that. After this function has
successfully parsed a header-line, you can proceed with
either fetchrecord()
or fetch()
to get the remaining records.
delimiters
Explicitly specify a string containing all the characters that should be considered field delimiters. If not specified or empty, the function will try to autodetect the single delimiter in use.
matchfieldname
A string containing a regular expression, using Regexp.SimpleRegexp
syntax, or an object providing a Regexp.SimpleRegexp.match()
single string argument compatible method, that must match all the
individual fieldnames before the header will be considered valid.
It returns true if a CSV head has successfully been parsed.
fetchrecord()
, fetch()
, compile()
An RCS file parser that eats an RCS *,v file and presents nice Pike data structures of its contents.
inherit Parser._RCS : _RCS
constant
int
Parser.RCS.max_revisions_supported
Feature detection constant for the max_revisions argument
to create()
, parse()
and parse_delta_sections()
.
array
(string
) Parser.RCS.access
The usernames listed in the ACCESS section of the RCS file.
string
|int(0)
Parser.RCS.branch
The default branch (or revision), if present, 0
otherwise.
mapping
(string
:string
) Parser.RCS.branches
Maps branch numbers (indices) to branch names (values).
The indices are short branch revision numbers (ie "1.1.2"
and not "1.1.0.2"
).
string
|int(0)
Parser.RCS.comment
The RCS file comment if present, 0
otherwise.
string
Parser.RCS.description
The RCS file description.
string
Parser.RCS.expand
The keyword expansion options (as named by RCS) if present,
0
otherwise.
string
Parser.RCS.head
Version number of the head version of the file.
mapping
(string
:string
) Parser.RCS.locks
Maps from username to revision for users that have acquired locks on this file.
string
Parser.RCS.rcs_file_name
The filename of the RCS file as sent to create()
.
mapping
(string
:Revision
) Parser.RCS.revisions
Data for all revisions of the file. The indices of the mapping are the revision numbers, whereas the values are the data from the corresponding revision.
bool
Parser.RCS.strict_locks
1
if strict locking is set, 0
otherwise.
mapping
(string
:string
) Parser.RCS.tags
Maps tag names (indices) to tagged revision numbers (values).
This mapping typically contains raw revision numbers for branches
(ie "1.1.0.2"
and not "1.1.2"
).
array
(Revision
) Parser.RCS.trunk
Data for all revisions on the trunk, sorted in the same order as the RCS file stored them - ie descending, most recent first, I'd assume (rcsfile(5), of course, fails to state such irrelevant information).
Parser.RCS Parser.RCS(
string
|void
file_name
, string
|int(0)
|void
file_contents
, void
|int
max_revisions
)
Initializes the RCS object.
file_name
The path to the raw RCS file (includes trailing ",v"). Used
mainly for error reporting (truncated RCS file or similar).
Stored in rcs_file_name
.
file_contents
If a string is provided, that string will be parsed to
initialize the RCS object. If a zero (0
) is sent, no
initialization will be performed at all. If no value is given at
all, but file_name
was provided, that file will be loaded and
parsed for object initialization.
max_revisions
Maximum number of revisions to process. If unset, all revisions will be processed.
string
|zero
expand_keywords_for_revision(string
|Revision
rev
, string
|void
text
, int
|void
expansion_mode
)
Expand keywords and return the resulting text according to the expansion rules set for the file.
rev
The revision to apply the expansion for.
text
If supplied, substitute keywords for that text instead using values that
would apply for the given revision. Otherwise, revision rev
is used.
expansion_mode
Expansion mode
| Perform expansion even if the file was checked in as binary. |
| Perform expansion only if the file was checked in as non-binary with expansion enabled. |
| Perform contraction if the file was checked in as non-binary. |
The Log keyword (which lacks sane quoting rules) is not expanded. Keyword expansion rules set in CVSROOT/cvswrappers are ignored. Only implements the -kkv, -ko and -kb expansion modes.
Does not perform any line-ending conversion.
get_contents_for_revision
string
|zero
get_contents_for_revision(string
|Revision
rev
, void
|bool
dont_cache_data
)
Returns the file contents from the revision rev
, without performing
any keyword expansion. If dont_cache_data
is set we will not keep
intermediate revisions in memory unless they already existed. This will
cut down memory use at the expense of slow access to older revisions.
expand_keywords_for_revision()
this_program
parse(array
raw
, void
|function
(string
:void
) progress_callback
, void
|int
max_revisions
)
Parse the RCS file raw
and fully initialize all members of this object.
raw
The unprocessed RCS file.
progress_callback
Passed on to parse_deltatext_sections
.
max_revisions
Maximum number of revisions to process. If unset, all revisions will be processed.
The fully initialized object (only returned for API convenience;
the object itself is destructively modified to match the data
extracted from raw
)
parse_admin_section
, parse_delta_sections
,
parse_deltatext_sections
, create
array
parse_admin_section(string
|array
raw
)
Lower-level API function for parsing only the admin section (the
initial chunk of an RCS file, see manpage rcsfile(5)) of an RCS
file. After running parse_admin_section
, the RCS object will be
initialized with the values for head
, branch
, access
,
branches
, tokenize
, tags
, locks
, strict_locks
,
comment
and expand
.
raw
The tokenized RCS file, or the raw RCS-file data.
The rest of the RCS file, admin section removed.
parse_delta_sections
, parse_deltatext_sections
, parse
, create
Does not handle rcsfile(5) newphrase skipping.
array
parse_delta_sections(array
raw
, void
|int
max_revisions
)
Lower-level API function for parsing only the delta sections (the
second chunk of an RCS file, see manpage rcsfile(5)) of an RCS
file. After running parse_delta_sections
, the RCS object will
be initialized with the value of description
and populated
revisions
mapping and trunk
array. Their Revision
members
are however only populated with the members Revision->revision
,
Revision->branch
, Revision->time
, Revision->author
,
Revision->state
, Revision->branches
, Revision->rcs_next
,
Revision->ancestor
and Revision->next
.
raw
The tokenized RCS file, with admin section removed. (See
parse_admin_section
.)
max_revisions
Maximum number of revisions to process. If unset, all revisions will be processed.
The rest of the RCS file, delta sections removed.
parse_admin_section
, tokenize
, parse_deltatext_sections
,
parse
, create
Does not handle rcsfile(5) newphrase skipping.
void
parse_deltatext_sections(array
raw
, void
|function
(string
:void
) progress_callback
, array
|void
callback_args
)
Lower-level API function for parsing only the deltatext sections
(the final and typically largest chunk of an RCS file, see manpage
rcsfile(5)) of an RCS file. After a parse_deltatext_sections
run, the RCS object will be fully populated.
raw
The tokenized RCS file, with admin and delta sections removed.
(See parse_admin_section
, tokenize
and parse_delta_sections
.)
progress_callback
This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
callback_args
Optional extra trailing arguments to be sent to progress_callback.
parse_admin_section
, parse_delta_sections
, parse
, create
Does not handle rcsfile(5) newphrase skipping.
array
(array
(string
)) tokenize(string
data
)
Tokenize an RCS file into tokens suitable as argument to the various parse functions.
data
The RCS file data.
An array with arrays of tokens.
Iterator for the deltatext sections of the RCS file. Typical usage:
string|array raw = Stdio.read_file(my_rcs_filename);
Parser.RCS rcs = Parser.RCS(my_rcs_filename, 0);
raw = rcs->parse_delta_sections(rcs->parse_admin_section(raw));
foreach(rcs->DeltatextIterator(raw); int n; Parser.RCS.Revision rev)
  do_something(rev);
protected
int
_iterator_index()
The number of deltatext entries processed so far (0..N-1, N being the total number of revisions in the RCS file).
protected
int
_iterator_next()
Advance the iterator one step.
Returns UNDEFINED
when the iterator is finished, and
otherwise the same as _iterator_index()
.
protected
Revision
_iterator_value()
The Revision
whose deltatext data the iterator is currently positioned at, updated with its info.
Parser.RCS.DeltatextIterator Parser.RCS.DeltatextIterator(
array
deltatext_section
, void
|function
(string
, mixed
... :void
) progress_callback
, void
|array
(mixed
) progress_callback_args
)
deltatext_section
The deltatext section of the RCS file in its entirety.
progress_callback
This optional callback is invoked with the revision of the deltatext about to be parsed (useful for progress indicators).
progress_callback_args
Optional extra trailing arguments to be sent to progress_callback.
The rcsfile(5) manpage outlines the sections of an RCS file.
int
Parser.RCS.DeltatextIterator.n
protected
bool
read_next()
Drops the leading whitespace before next revision's deltatext entry and sets this_rev to the revision number we're about to read.
protected
int
parse_deltatext_section(array
raw
, int
o
)
Chops off the first deltatext section from the token array raw
and
returns the rest of the string, or the value 0
(zero) if
we had already visited the final deltatext entry. The deltatext's
data is stored destructively in the appropriate entry of the
revisions
array.
raw
+o
must start with a deltatext entry for this method to work.
Does not handle rcsfile(5) newphrase skipping.
If the RCS file is truncated, this method writes a descriptive error to stderr and then returns 0. Some nicer error handling wouldn't hurt.
All data tied to a particular revision of the file.
int
Parser.RCS.Revision.added
The number of lines that were added from the previous revision to make this revision (for the initial revision too).
lines
, removed
string
|zero
Parser.RCS.Revision.ancestor
The revision of the ancestor of this revision, or 0
if this was
the initial revision.
next
string
Parser.RCS.Revision.author
The userid of the user that committed the revision.
string
Parser.RCS.Revision.branch
The branch name on which this revision was committed (calculated according to how cvs manages branches).
array
(string
) Parser.RCS.Revision.branches
When there are branches from this revision, an array with the
first revision number for each of the branches, otherwise 0
.
Follow the next
fields to get to the branch head.
int
Parser.RCS.Revision.lines
The number of lines this revision contained, altogether (not of particular interest for binary files).
added
, removed
string
Parser.RCS.Revision.log
The log message associated with the revision.
string
|zero
Parser.RCS.Revision.next
The revision that succeeds this revision, or 0
if none exists
(ie if this is the HEAD of the trunk or of a branch).
ancestor
string
|zero
Parser.RCS.Revision.rcs_next
The revision stored next in the RCS file, or 0
if none exists.
This field is straight from the RCS file, and has somewhat weird
semantics. Usually you will want to use one of the derived fields
next
or prev
or possibly rcs_prev
.
next
, prev
, rcs_prev
string
|zero
Parser.RCS.Revision.rcs_prev
The revision that this revision is based on,
or 0
if it is the HEAD.
This is the reverse pointer of rcs_next
and branches
, and
is used by get_contents_for_revision()
when applying the deltas
to set text
.
rcs_next
string
Parser.RCS.Revision.rcs_text
The raw delta as stored in the RCS file.
text
, get_contents_for_revision()
int
Parser.RCS.Revision.removed
The number of lines that were removed from the previous revision to make this revision.
lines
, added
string
Parser.RCS.Revision.revision
The revision number (i.e.
rcs_file->revisions["1.1"]->revision == "1.1"
).
string
Parser.RCS.Revision.state
The state of the revision - typically "Exp"
or "dead"
.
string
|zero
Parser.RCS.Revision.text
The text as committed or 0
if
get_contents_for_revision()
hasn't been called for this revision
yet.
Typically you don't access this field directly, but use
get_contents_for_revision()
to retrieve it.
get_contents_for_revision()
, rcs_text
Calendar.TimeRange
Parser.RCS.Revision.time
The (UTC) date and time when the revision was committed (second precision).
This is a handy, simple parser for SGML-like syntax such as HTML. It doesn't do anything advanced beyond finding the corresponding end tags.
It's used like this:
array res=Parser.SGML()->feed(string)->finish()->result();
The resulting structure is an array of atoms, where the atom can be a string or a tag. A tag contains a similar array, as data.
A string
"<gat> <gurka> </gurka> <banan> <kiwi> </gat>"
results in
({
  tag "gat" object with data:
  ({
    tag "gurka" object with data:
    ({
      " "
    })
    tag "banan" object with data:
    ({
      " "
      tag "kiwi" object with data:
      ({
        " "
      })
    })
  })
})
i.e. simple "tags" (not containers) are not detected, but containers are ended implicitly by a surrounding container _with_ an end tag.
The 'tag' is an object with the following variables:
string name; - name of tag
mapping args; - arguments to tag
int line,char,column; - position of tag
int eline,echar,ecolumn; - end position of tag; src[char..echar-1] covers the block (added by Xuesong Guo)
string file; - filename (see create())
array(SGMLatom) data; - contained data
int open; - is not an empty element and has no end tag (added by Xuesong Guo)
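A minimal sketch of walking the resulting structure, using only the feed(), finish() and result() calls and the tag variables described here. The helper name dump is made up for this example, and the calls are issued one at a time rather than chained so that the sketch does not depend on what finish() returns.

```pike
// Print the parsed structure, indenting one level per container.
void dump(array atoms, int depth)
{
  foreach (atoms, mixed atom) {
    if (stringp(atom)) {
      write("%s%O\n", "  " * depth, atom);
    } else {
      write("%stag %O at line %d\n", "  " * depth, atom->name, atom->line);
      dump(atom->data || ({}), depth + 1);
    }
  }
}

int main()
{
  object p = Parser.SGML();
  p->feed("<gat> <gurka> </gurka> <banan> <kiwi> </gat>");
  p->finish();            // no result may be used before this call
  dump(p->result(), 0);
  return 0;
}
```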
string
Parser.SGML.file
Parser.SGML Parser.SGML()
Parser.SGML Parser.SGML(
string
filename
, function
(:void
)|void
name_formater
, function
(:void
)|void
argname_formater
)
This object is created with this filename. It is passed to all created tags, for debug and trace purposes. All tag names are replaced by name_formater(name), and all argument names by argname_formater(arg_name).
No, it doesn't read the file itself. See feed()
.
object
feed(string
s
)
array
(SGMLatom
|string
) finish()
array
(SGMLatom
|string
) result(string
s
)
Feed new data to the object, or finish the stream.
No result can be used until finish()
is called.
Both finish()
and result()
return the computed data.
feed()
returns the called object.
string
Parser.SGML.SGMLatom.name
mapping
Parser.SGML.SGMLatom.args
int
Parser.SGML.SGMLatom.line
int
Parser.SGML.SGMLatom.char
int
Parser.SGML.SGMLatom.column
int
Parser.SGML.SGMLatom.eline
int
Parser.SGML.SGMLatom.echar
int
Parser.SGML.SGMLatom.ecolumn
string
Parser.SGML.SGMLatom.file
array
(SGMLatom
) Parser.SGML.SGMLatom.data
int
Parser.SGML.SGMLatom.open
This is a parser for line and block oriented data. It provides a flexible yet concise record-description language to parse character/column/delimiter-organised records.
Parser.LR
, http://www.wikipedia.org/wiki/Comma-separated_values,
http://www.wikipedia.org/wiki/EDIFACT
array
|mapping
compile(string
|Stdio.File
|Stdio.FILE
input
)
Compiles the format description language into a compiled structure
that can be fed to setformat
, fetch
, or create
.
The format description is case sensitive.
The format description starts with a single line containing:
[Tabular description begin]
The format description ends with a single line containing:
[Tabular description end]
Any lines before the startline are skipped.
Any lines after the endline are not consumed.
Empty lines are skipped.
Comments start after a #
or ;
.
The depth level of a field is indicated by the number of leading spaces or colons at the beginning of the line.
The fieldname must not contain any whitespace.
An arbitrary number of single character field delimiters can be
specified between brackets, e.g. [,;]
or [,]
would be
for CSV.
When field delimiters are used, two cases apply. For CSV-type delimiters
[\t,; ]
the standard CSV quoting rules apply. For any other
delimiters,
no quoting is supported, and the last field on a line should
not specify a delimiter but a field width of 0 instead.
A fixed field width can be specified by a plain decimal integer, a value of 0 indicates a field with arbitrary length that extends till the end of the line.
A matching regular expression can be enclosed in ""
, it has
to match
the complete field content and uses Regexp.SimpleRegexp
syntax.
On records the following options are supported:
| mandatory | This record is required. |
| fold | Fold this record's contents in the enclosing record. |
| single | This record is present at most once. |
On fields the following options are supported:
| drop | After reading and matching this field, drop the field content from the resulting mapping structure. |
setformat()
, create()
, fetch()
Example of the description language:
[Tabular description begin]
csv
:gtz
::mybankno [,]
::transferdate [,]
::mutatiesoort [,]
::volgnummer [,]
::bankno [,]
::name [,]
::kostenplaats [,] drop
::amount [,]
::afbij [,]
::mutatie [,]
::reference [,]
::valutacode [,]
mt940
:messageheader1 mandatory
::exporttime "0000" drop
::CS1 " " drop
::exportday "01" drop
::exportaddress 12
::exportnumber 5 "[0-9]+"
:messageheader3 mandatory fold single
::messagetype "940" drop
::CS1 " " drop
::messagepriority "00" drop
:TRN fold
::tag ":20:" drop
::reference "GTZPB|MPBZ|INGEB"
:accountid fold
::tag ":25:" drop
::accountno 10
:statementno fold
::tag ":28C:" drop
::settlementno 0 drop
:openingbalance mandatory single
::tag ":60F:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:statements
::statementline mandatory fold single
:::tag ":61:" drop
:::valuedate 6
:::creditdebit 1
:::amount "[0-9]+,[0-9][0-9]"
:::CS1 "N" drop
:::transactiontype 3 # 3 for Postbank, 4 for ING
:::paymentreference 0
::informationtoaccountowner fold single
:::tag ":86:" drop
:::accountno "[0-9]*( |)"
:::accountname 0
::description fold
:::description 0 "|[^:].*"
:closingbalance mandatory single
::tag ":62[FM]:" drop
::creditdebit 1
::date 6
::currency "EUR"
::amount 0 "[0-9]+,[0-9][0-9]"
:informationtoaccountowner fold single
::tag ":86:" drop
::debit "D" drop
::debitentries 6
::credit "C" drop
::creditentries 6
::debit "D" drop
::debitamount "[0-9]+,[0-9][0-9]"
::credit "C" drop
::creditamount "[0-9]+,[0-9][0-9]" drop
::accountname "(\n[^-:][^\n]*)*" drop
:messagetrailer mandatory single
::start "-"
::end "XXX"
[Tabular description end]
Parser.Tabular Parser.Tabular(
void
|string
|Stdio.File
|Stdio.FILE
input
, void
|array
|mapping
|string
|Stdio.File
|Stdio.FILE
format
, void
|int
verbose
)
This function initialises the parser.
input
The input stream or string.
format
The format to be used (either precompiled or not).
The format description language is documented under compile()
.
verbose
If >1
, it specifies the number of characters to display
of the beginning of each record as a progress indicator. Special
values are:
| Turns on format debugging with visible mismatches. |
| Turns on format debugging with named field contents. |
| Turns on format debugging with field contents. |
| Turns on basic format debugging. |
| Turns off verbosity. Default. |
| Is the same as setting it to |
compile()
, setformat()
, fetch()
object
feed(string
content
)
content
Is injected into the input stream.
This object.
fetch()
mapping
|zero
fetch(void
|array
|mapping
format
)
This function consumes as much input as needed to parse the full tabular structures at once.
format
Describes (precompiled only) formats to be parsed.
If no format is specified,
the format specified on create()
is used, and empty lines are
automatically skipped.
A nested mapping that contains the complete structure as described in the specified format.
If nothing matches the specified format, no input is consumed (except empty lines, if the default format is used), and zero is returned.
compile()
, create()
, setformat()
, skipemptylines()
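The following is a small, untested sketch of how the constructor and fetch() might be combined, based only on the signatures documented here. The record and field names ("csv", "row", "name", "amount") and the sample input are invented for illustration, and the exact shape of the returned mapping is not asserted.

```pike
int main()
{
  // A deliberately tiny description: one record type with two
  // comma-delimited fields. All names here are made up.
  string format = #"[Tabular description begin]
csv
:row
::name [,]
::amount [,]
[Tabular description end]
";

  // The format is passed uncompiled; per the constructor
  // documentation it may be either precompiled or not.
  object t = Parser.Tabular("alice,10\nbob,20\n", format);

  // With no argument, fetch() uses the format given on creation and
  // returns zero if nothing matches.
  mixed res = t->fetch();
  write("%O\n", res);
  return 0;
}
```

When the same description is reused for many inputs, it may be worth calling compile() once and passing the compiled structure instead of the string.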
array
|mapping
setformat(array
|mapping
format
)
format
Replaces the default (precompiled only) format.
The previous default format.
compile()
, fetch()
int
skipemptylines()
This function can be used to manually skip empty lines in
the input. This is unnecessary if no argument is
specified for fetch()
.
It returns true if EOF has been reached.
fetch()
array
(Token
|array
) group(array
(string
|Token
) tokens
, void
|mapping
(string
:string
) groupings
)
Fold sub blocks of an array of tokens into sub arrays, for grouping purposes.
tokens
The token array to fold.
groupings
Supplies the tokens marking the boundaries of blocks to fold. The indices of the mapping mark the start of a block, the corresponding values mark where the block ends. The sub arrays will start and end in these tokens. If no groupings mapping is provided, {}, () and [] are used as block boundaries.
array
hide_whitespaces(array
tokens
)
Folds all whitespace tokens into the previous token's trailing_whitespaces.
string
reconstitute_with_line_numbers(array
(string
|Token
|array
) tokens
)
Like simple_reconstitute
, but adding additional #line n "file"
preprocessor statements in the output wherever a new line or
file starts.
string
simple_reconstitute(array
(string
|Token
|array
) tokens
)
Reconstitutes the token array into a plain string again; essentially
reversing split()
and whichever of the tokenize
, group
and
hide_whitespaces
methods may have been invoked.
array
(string
) split(string
data
, void
|mapping
(string
:string
) state
)
Splits the data
string into an array of tokens. An additional
element with a newline will be added to the resulting array of
tokens. If the optional argument state
is provided the split
function is able to pause and resume splitting inside #"" and
/**/ tokens. The state
argument should be an initially empty
mapping, in which split will store its state between successive
calls.
array
(Token
|array
) strip_line_statements(array
(Token
|array
) tokens
)
Strips off all (preprocessor) line statements from a token array.
array
(Token
) tokenize(array
(string
) s
, void
|string
file
)
Returns an array of Token
objects given an array of string tokens.
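The functions above are typically chained. The sketch below shows one plausible pipeline, assuming these functions live in Parser.C as the Token class name suggests; the sample source string and the file name "demo.c" are made up.

```pike
int main()
{
  string code = "int add(int a, int b) { return a + b; }\n";

  // Plain string tokens.
  array(string) raw = Parser.C.split(code);
  // Token objects carrying text, line and file.
  array tokens = Parser.C.tokenize(raw, "demo.c");
  // Whitespace tokens are folded into trailing_whitespaces.
  tokens = Parser.C.hide_whitespaces(tokens);
  // With no groupings mapping, {}, () and [] blocks become sub-arrays.
  array grouped = Parser.C.group(tokens);

  foreach (grouped, mixed t) {
    if (arrayp(t))
      write("group of %d elements, opening with %O\n", sizeof(t), t[0]->text);
    else
      write("token %O (line %d)\n", t->text, t->line);
  }

  // Turn the grouped tokens back into a plain string.
  write("%s", Parser.C.simple_reconstitute(grouped));
  return 0;
}
```

Hiding whitespace before grouping keeps the sub-arrays free of whitespace tokens, which tends to make pattern matching on the token stream simpler.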
Represents a C token, along with a selection of associated data and operations.
string
Parser.C.Token.file
The file in which the token was found.
int
Parser.C.Token.line
The line where the token was found.
string
Parser.C.Token.text
The actual token.
string
Parser.C.Token.trailing_whitespaces
Trailing whitespaces.
string sprintf(string format, ... Parser.C.Token arg ... )
If the object is printed as %s it will only output its text contents.
string
res = Parser.C.Token()
+ s
A string can be added to the Token, which will be added to the text contents.
int
res = Parser.C.Token()
== foo
Tokens are considered equal if the text contents are equal. It is also possible to compare the Token object with a text string directly.
int
|string
res = Parser.C.Token()
[ a
]
Characters and ranges may be indexed from the text contents of the token.
string
res = s
+ Parser.C.Token()
A string can be added to the Token, which will be added to the text contents.
(int)Parser.C.Token()
(float)Parser.C.Token()
(string)Parser.C.Token()
(array)Parser.C.Token()
(mapping)Parser.C.Token()
(multiset)Parser.C.Token()
It is possible to cast a Token object to a string. The text content will be returned.
Parser.C.Token Parser.C.Token(
string
text
, void
|int
line
, void
|string
file
, void
|string
trailing_whitespace
)
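A short sketch of the Token behaviour described above: comparison against a string or another token, casting to string, and access to the position fields. The token text, line number and file name are arbitrary example values.

```pike
int main()
{
  object tok = Parser.C.Token("foo", 12, "demo.c", " ");

  // Comparison looks at the text contents only.
  write("%d\n", tok == "foo");
  write("%d\n", tok == Parser.C.Token("foo"));

  // Casting to string yields the text without trailing whitespace.
  write("%s|\n", (string)tok);

  // Position information as passed to the constructor.
  write("%s:%d\n", tok->file, tok->line);
  return 0;
}
```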
Error thrown when an unterminated character token is encountered.
inherit Error.Generic : Generic
string
Parser.C.UnterminatedCharacterError.err_char
The character that failed to be tokenized
Error thrown when an unterminated comment token is encountered.
inherit Error.Generic : Generic
string
Parser.C.UnterminatedCommentError.err_comment
The comment that failed to be tokenized
Error thrown when an unterminated string token is encountered.
inherit Error.Generic : Generic
string
Parser.C.UnterminatedStringError.err_str
The string that failed to be tokenized
ECMAScript/JavaScript token parser based on ECMAScript 2017 (ECMA-262), chapter 11: Lexical Grammar.
array
(string
) split(string
data
)
Splits the ECMAScript source data
into tokens.
LALR(1) parser generator.
Severity level
constant
Parser.LR.NOTICE
constant
Parser.LR.WARNING
constant
Parser.LR.ERROR
Class handling reporting of errors and warnings.
optional
int(-1..1)
Parser.LR.ErrorHandler.verbose
Verbosity level
| Just errors. |
| Errors and warnings. |
| Also notices. |
Parser.LR.ErrorHandler Parser.LR.ErrorHandler(
int(-1..1)
|void
verbosity
)
Create a new error handler.
verbosity
Level of verbosity.
verbose
This object implements an LALR(1) parser and compiler.
Normal use of this object would be:
set_error_handler
{add_rule, set_priority, set_associativity}*
set_symbol_to_string
compile
{parse}*
function
(SeverityLevel
, string
, string
, mixed
... :void
) Parser.LR.Parser.error_handler
Compile error and warning handler.
mapping
(int
:array
(Rule
)) Parser.LR.Parser.grammar
The grammar itself.
mapping
(string
:Kernel
) Parser.LR.Parser.known_states
LR0 states that are already known to the compiler.
int
Parser.LR.Parser.lr_error
Error code
StateQueue
|zero
Parser.LR.Parser.s_q
Contains all states used. In the queue section are the states that remain to be compiled.
Kernel
|zero
Parser.LR.Parser.start_state
The initial LR0 state.
string sprintf(string format, ... Parser.LR.Parser arg ... )
Pretty-prints the current grammar to a string.
void
add_rule(Rule
r
)
Add a rule to the grammar.
r
Rule to add.
(int)Parser.LR.Parser()
(float)Parser.LR.Parser()
(string)Parser.LR.Parser()
(array)Parser.LR.Parser()
(mapping)Parser.LR.Parser()
(multiset)Parser.LR.Parser()
Implements casting.
type
Type to cast to.
int
compile()
Compiles the grammar into a parser, so that parse() can be called.
string
item_to_string(Item
i
)
Pretty-prints an item to a string.
i
Item to pretty-print.
mixed
parse(object
|function
(void
:string
|array
(string
|mixed
)) scanner
, void
|object
action_object
)
Parse the input according to the compiled grammar. The last value reduced is returned.
The parser must have been compiled (with compile()) prior to calling this function.
Errors should be throw()n.
scanner
The scanner function. It returns the next symbol from the input. It should either return a string (terminal) or an array with a string (terminal) and a mixed (value). EOF is indicated with the empty string.
action_object
Object used to resolve those actions that have been specified as strings.
string
rule_to_string(Rule
r
)
Pretty-prints a rule to a string.
r
Rule to print.
void
set_associativity(string
terminal
, int
assoc
)
Sets the associativity of a terminal.
terminal
Terminal to set the associativity for.
assoc
Associativity; negative - left, positive - right, zero - no associativity.
void
set_error_handler(void
|function
(SeverityLevel
, string
, string
, mixed
... :void
) handler
)
Sets the error report function.
handler
Function to call to report errors and warnings. If zero or not specified, use the built-in function.
void
set_priority(string
terminal
, int
pri_val
)
Sets the priority of a terminal.
terminal
Terminal to set the priority for.
pri_val
Priority; higher = prefer this terminal.
void
set_symbol_to_string(void
|function
(int
|string
:string
) s_to_s
)
Sets the symbol to string conversion function. The conversion function is used by the various *_to_string functions to make comprehensible output.
s_to_s
Symbol to string conversion function. If zero or not specified, use the built-in function.
string
state_to_string(Kernel
state
)
Pretty-prints a state to a string.
state
State to pretty-print.
An LR(0) item, a partially parsed rule.
int
Parser.LR.Parser.Item.counter
Depth counter (used when compiling).
multiset
(string
) Parser.LR.Parser.Item.direct_lookahead
Look-ahead set for this item.
multiset
(string
) Parser.LR.Parser.Item.error_lookahead
Look-ahead set used for detecting conflicts
int
Parser.LR.Parser.Item.item_id
Used to identify the item. Equal to r->number + offset.
Item
|zero
Parser.LR.Parser.Item.master_item
Item representing this one (used for shifts).
Kernel
|zero
Parser.LR.Parser.Item.next_state
The state we will get if we shift according to this rule
int
Parser.LR.Parser.Item.number
Item identification number (used when compiling).
int
Parser.LR.Parser.Item.offset
How long into the rule the parsing has come.
Rule
|zero
Parser.LR.Parser.Item.r
The rule
multiset
(Item
) Parser.LR.Parser.Item.relation
Relation to other items (used when compiling).
Implements an LR(1) state
mapping
(int
|string
:Kernel
|Rule
) Parser.LR.Parser.Kernel.action
The action table for this state
| object(kernel) | SHIFT to this state on this symbol. |
| object(rule) | REDUCE according to this rule on this symbol. |
multiset
Parser.LR.Parser.Kernel.closure_set
The symbols that closure has been called on.
mapping
(int
:Item
) Parser.LR.Parser.Kernel.item_id_to_item
Used to lookup items given rule and offset
array
(Item
) Parser.LR.Parser.Kernel.items
Contains the items in this state.
multiset
(Rule
) Parser.LR.Parser.Kernel.rules
Used to check if a rule already has been added when doing closures.
mapping
(int
:multiset
(Item
)) Parser.LR.Parser.Kernel.symbol_items
Contains the items whose next symbol is this non-terminal.
void
add_item(Item
i
)
Add an item to the state.
void
closure(int
nonterminal
)
Make the closure of this state.
nonterminal
Nonterminal to make the closure on.
Kernel
do_goto(int
|string
symbol
)
Generates the state reached when doing goto on the specified symbol, i.e. it compiles the LR(0) state.
symbol
Symbol to make goto on.
multiset
(int
|string
) goto_set()
Make the goto-set of this state.
This is a queue, which keeps the elements even after they are retrieved.
array
(Kernel
) Parser.LR.Parser.StateQueue.arr
The queue itself.
int(0..)
Parser.LR.Parser.StateQueue.head
Index of the head of the queue.
int(0..)
Parser.LR.Parser.StateQueue.tail
Index of the tail of the queue.
Kernel
|zero
next()
Return the next state from the queue.
Kernel
push(Kernel
state
)
Pushes the state on the queue.
state
State to push.
Specifies the priority and associativity of a rule.
int
Parser.LR.Priority.assoc
Associativity
| Left |
| None |
| Right |
int
Parser.LR.Priority.value
Priority value
Parser.LR.Priority Parser.LR.Priority(
int
p
, int
a
)
Create a new priority object.
p
Priority.
a
Associativity.
This object is used to represent a BNF-rule in the LR parser.
function
(:void
)|string
|zero
Parser.LR.Rule.action
Action to do when reducing this rule.
| function | Call this function. |
| string | Call this function by name in the object given to the parser. |
The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will be the value of this non-terminal. The default rule is to return the first argument.
int
Parser.LR.Rule.has_tokens
This rule contains tokens
int
Parser.LR.Rule.nonterminal
Non-terminal this rule reduces to.
int
Parser.LR.Rule.num_nonnullables
This rule has this many non-nullable symbols at the moment.
int
Parser.LR.Rule.number
Sequence number of this rule (used for conflict resolving). Also used to identify the rule.
Priority
|zero
Parser.LR.Rule.pri
Priority and associativity of this rule.
array
(string
|int
) Parser.LR.Rule.symbols
The actual rule
Parser.LR.Rule Parser.LR.Rule(
int
nt
, array
(string
|int
) r
, function
(:void
)|string
|void
a
)
Create a BNF rule.
The rule
rule : nonterminal ":" symbols ";" { add_rule };
might be created as
Rule(4, ({ 9, ":", 5, ";" }), "add_rule");
where 4 corresponds to the nonterminal "rule", 9 to "nonterminal" and 5 to "symbols", and the function "add_rule" is to be called when this rule is reduced.
nt
Non-terminal to reduce to.
r
Symbol sequence that reduces to nt.
a
Action to do when reducing according to this rule.
| function | Call this function. |
| string | Call this function by name in the object given to the parser. |
The function is called with arguments corresponding to the values of the elements of the rule. The return value of the function will become the value of this non-terminal. The default rule is to return the first argument.
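As a small illustration of the constructor, the sketch below builds the rule from the description above and prints the fields documented for Parser.LR.Rule. The symbol numbers (4, 9, 5) are the ones used in that description; nothing is compiled or parsed here.

```pike
int main()
{
  // 4 = "rule", 9 = "nonterminal", 5 = "symbols"; strings are terminals.
  object r = Parser.LR.Rule(4, ({ 9, ":", 5, ";" }), "add_rule");

  write("reduces to nonterminal %d\n", r->nonterminal);
  write("symbols: %O\n", r->symbols);
  write("action: %O\n", r->action);
  return 0;
}
```

A rule built this way would then be handed to add_rule() on a Parser.LR.Parser object before calling compile().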
This module generates an LR parser from a grammar specified according to the following grammar:
directives : directive ;
directives : directives directive ;
directive : declaration ;
directive : rule ;
declaration : "%token" terminals ";" ;
rule : nonterminal ":" symbols ";" ;
rule : nonterminal ":" symbols action ";" ;
symbols : symbol ;
symbols : symbols symbol ;
terminals : terminal ;
terminals : terminals terminal ;
symbol : nonterminal ;
symbol : "string" ;
action : "{" "identifier" "}" ;
nonterminal : "identifier" ;
terminal : "string";
int
Parser.LR.GrammarParser.lr_error
Error code from the parsing.
Parser
make_parser(string
str
, object
|void
m
)
Compiles the parser-specification given in the first argument. Named actions are taken from the object if available, otherwise left as is.
Returns the error code in both GrammarParser.lr_error and return_value->lr_error.
int
|Parser
make_parser_from_file(string
fname
, object
|void
m
)
Compiles the file specified in the first argument into an LR parser.
make_parser
This is a port of the JavaScript Markdown parser 'Marked'
https://github.com/chjj/marked. The only method you normally need
is parse()
which transforms Markdown text to HTML.
For a description of Markdown, see the web page of its inventor: https://daringfireball.net/projects/markdown/.
protected
string
encode_html(string
html
, void
|bool
enc
)
HTML-encodes the characters <>"'. If enc
is true, & will also be encoded.
string
parse(string
md
, void
|mapping
options
)
Convert markdown md
to html
options
| Enable Github Flavoured Markdown. (true) |
| Enable GFM tables. Requires "gfm" (true) |
| Enable GFM "breaks". Requires "gfm" (false) |
| Conform to obscure parts of markdown.pl as much as possible. Don't fix any of the original markdown bugs or poor behavior. (false) |
| Sanitize the output. Ignore any HTML that has been input. (false) |
| Mangle (obfuscate) autolinked email addresses (true) |
| Use smarter list behavior than the original markdown. (true) |
| Use "smart" typographic punctuation for things like quotes and dashes. (false) |
| Add prefix to ID attributes of header tags (empty) |
| Generate self closing XHTML tags (false) |
| Add a newline after tags. If false the output will be on one line (well, newlines in text will be kept). (false) |
| Use this renderer to render output. (Renderer) |
| Use this lexer to parse blocks of text. (Lexer) |
| Use this lexer to parse inline text. (InlineLexer) |
| Use this parser instead of the default. (Parser) |
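A minimal usage sketch, relying on the default options only. The Markdown input is an arbitrary example, and the exact HTML produced is not shown since it depends on the option defaults.

```pike
int main()
{
  string md = "# Title\n\nSome *emphasised* text.\n";

  // With no options mapping, the defaults listed above apply.
  string html = Parser.Markdown.parse(md);
  write("%s\n", html);
  return 0;
}
```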
protected
string
replace1(string
subject
, string
from
, string
to
)
Replaces the first occurrence of from
in subject
with to.
Lexer used for inline text (eg bold text inside a paragraph).
string
output(string
src
)
Parse some inline Markdown and return the corresponding HTML.
Block-level lexer (parses paragraphs, lists, tables, etc).
mapping
Parser.Markdown.Lexer.links
Read only
array
(mapping
) Parser.Markdown.Lexer.tokens
Read only
this_program
lex(string
src
)
Main lexing entry point. Subclass Lexer and override this to add post-processing or other changes.
Top-level parsing handler. It's usually easier to replace the Renderer instead.
string
parse(Lexer
src
)
protected
string
parse_text()
protected
string
tok()
Render a token (or group of tokens) to a string.
string
attrs(mapping
token
, mapping
|void
dflt
)
Collect additional attributes from the token and render them as HTML attributes. Default attributes can be provided.
string
blockquote(string
text
, mapping
token
)
string
html(string
text
, mapping
token
)
string
text(string
t
, mapping
token
)
string
strong(string
t
, mapping
token
)
string
em(string
t
, mapping
token
)
string
del(string
t
, mapping
token
)
string
codespan(string
t
, mapping
token
)
string
br(mapping
token
)
string
code(string
code
, string
lang
, bool
escaped
, mapping
token
)
string
heading(string
text
, int
level
, string
raw
, mapping
token
)
string
hr()
string
image(string
url
, string
title
, string
text
, mapping
token
)
string
link(string
href
, string
|zero
title
, string
text
, mapping
token
)
string
list(string
body
, void
|bool
ordered
, mapping
token
)
string
listitem(string
text
, mapping
token
)
string
paragraph(string
text
, mapping
token
)
string
table(string
header
, string
body
, mapping
token
)
string
tablecell(string
cell
, mapping
flags
, mapping
token
)
string
tablerow(string
row
, mapping
token
)
This module parses and tokenizes Pike source code.
inherit "C.pmod" : "C.pmod"
array
(string
) split(string
data
)
Returns the provided string with Python code as an array with tokens.
Low-level helpers for parsers.
You probably don't want to use the modules contained in
this module directly, but instead use the other Parser
modules. See instead the modules below.
Parser
, Parser.C
, Parser.Pike
, Parser.RCS
,
Parser.HTML
, Parser.XML
Low-level helpers for Parser.C
.
You probably want to use Parser.C
instead of this module.
Parser.C
, _Pike
.
array
(array
(string
)|string
) tokenize(string
code
)
Tokenize a string of C tokens.
Don't use this function directly.
Use Parser.C.tokenize()
instead.
Returns an array with an array with C-level tokens, and the remainder (a partial token), if any.
Low-level helpers for Parser.Pike
.
You probably want to use Parser.Pike
instead of this module.
Parser.Pike
, _C
.
array
(array
(string
)|string
) tokenize(string
code
)
Tokenize a string of Pike tokens.
Returns an array with Pike-level tokens and the remainder (a partial token), if any.
Low-level helpers for Parser.RCS
.
You probably want to use Parser.RCS
instead of this module.
Parser.RCS
array
(array
(string
)) tokenize(string
code
)
Tokenize a string of RCS tokens.
Don't use this function directly.
Use Parser.RCS.tokenize()
instead.
Parser.RCS.tokenize()