Php/regex Help
by Chris "DiGi" Timberlake · in General Discussion · 06/19/2008 (4:20 pm) · 2 replies
OK, so can anyone point me to some good regex tutorials, or to ways of building a parsing system in PHP? I'm trying to parse the following in HTML and have PHP replace it.
Want to parse:
TemplateName(Variable); ENDPARSING> <-- In HTML
and have PHP read the TemplateAPI part, recognize that it's using a TemplateAPI, and then read the next part for info. In the future I might also want to add variables like the one above.
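A minimal sketch of the regex-and-replace approach being asked about, using preg_replace_callback(). It assumes the call appears in the HTML as TemplateAPI->Name(argument); and matches only that call, not any markers around it. The $handlers map, the render_template_calls() name and the <span> output are hypothetical placeholders for whatever each template should produce.

```php
<?php
// Sketch only: replaces calls written as "TemplateAPI->Name(argument);" in an HTML string.
// Assumption: only the call itself is matched, not any markers around it.

// Hypothetical handlers: template name => function that builds the replacement HTML.
$handlers = array(
    'TemplateName' => function ($arg) {
        return '<span>' . htmlspecialchars($arg) . '</span>';
    },
);

function render_template_calls($html, array $handlers)
{
    // Group 1 = template name, group 2 = whatever sits between the parentheses.
    $pattern = '/TemplateAPI->(\w+)\(([^)]*)\);/';
    return preg_replace_callback($pattern, function ($m) use ($handlers) {
        if (!isset($handlers[$m[1]])) {
            return $m[0]; // unknown template name: leave the text untouched
        }
        $fn = $handlers[$m[1]];
        return $fn(trim($m[2]));
    }, $html);
}

echo render_template_calls('<p>TemplateAPI->TemplateName(Variable);</p>', $handlers), "\n";
// prints <p><span>Variable</span></p>
```

Adding a new template then means adding one entry to $handlers rather than touching the regex.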
#2
06/24/2008 (4:58 am)
You could also do it nicely and make it XHTML-like. Here's a simple but easily extensible solution I came up with:
<html>
<body>
<cdt:tagname#tagid key="value" key2="value2" />
</body>
</html>
The cdt: part at the beginning of the tag acts as a namespace for your tags, marking them as your own.
The tagname part is the name of the tag; depending on it, your PHP script can do different things. I'll show where and how.
The #tagid part is optional. It makes it easier to tell what a tag is doing, and if you're using embedded content (see below), it helps the parser find the correct closing tag when a tag of the same type is embedded inside it.
The key/value pairs store the parameters for your tags; you can have any number of them.
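The script lets you put a value in either " or ' quotes. You can mix them within one tag, but each value has to close with the same quote type it opened with. Since < and > aren't allowed inside a value, write them as &lt; and &gt;; the parser converts them back. For example:
<cdt:var key='value' par1="TemplateAPI-&gt;TemplateName(Variable)" />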
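To illustrate the quoting rule on its own, here's a small sketch (not the parser from this post) that pulls key/value pairs out of a tag. The opening quote is captured as group 2 and the backreference \2 forces the value to end with that same quote, so " and ' can be mixed within one tag. The function name parse_tag_params() is made up for the example.

```php
<?php
// Sketch: extract key="value" / key='value' pairs from a single tag.
// (["']) captures the opening quote; \2 requires the same quote to close the value.
function parse_tag_params($tag)
{
    $params = array();
    preg_match_all('/([a-z0-9]+)=(["\'])(.*?)\2/si', $tag, $m, PREG_SET_ORDER);
    foreach ($m as $pair) {
        // $pair[1] = key, $pair[2] = quote character, $pair[3] = value without quotes
        $params[$pair[1]] = $pair[3];
    }
    return $params;
}

print_r(parse_tag_params('<cdt:var key=\'value\' par2="it\'s" />'));
```

Because "it's" is opened with a double quote, the apostrophe inside it doesn't end the value.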
You could also use this tag like this:
<cdt:tagname#tagid key="value" key2="value2">
long textual, html content or embedded tags
</cdt:tagname#tagid>
The code below gives you that text content as if it were the value of a key called "contents". The same goes for the other non-key parts: the name of the tag goes into "type", and the optional tagid into "id".
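For comparison, the block form can also be matched with a single pattern: capture the tag name and the optional #id, then require the closing tag to repeat both via the backreferences \1 and \2. This is an alternative sketch, not how the code in this post does it; find_block() is a hypothetical helper. The id group is written as ((?:#\w+)?) so it always takes part in the match (possibly empty), because in PCRE a backreference to a group that didn't participate fails. Nested tags with the same name and no id will still confuse it, which is exactly where the #tagid helps.

```php
<?php
// Sketch: match <cdt:name#id ...>contents</cdt:name#id> in one regex.
// \1 and \2 force the closing tag to repeat the opening tag's name and #id.
function find_block($html)
{
    $pattern = '/<cdt:(\w+)((?:#\w+)?)((?:\s+[a-z0-9]+=(?:"[^"]*"|\'[^\']*\'))*)\s*>(.*?)<\/cdt:\1\2>/si';
    if (!preg_match($pattern, $html, $m)) {
        return null; // no complete block found
    }
    return array(
        'type'     => $m[1],
        'id'       => ltrim($m[2], '#'), // "" when the tag has no #id
        'contents' => $m[4],             // lazy (.*?) stops at the first matching close tag
    );
}

print_r(find_block('<cdt:b#outer k="v">1<cdt:b#inner>2</cdt:b#inner>3</cdt:b#outer>'));
```

Here the inner block has a different id, so the lazy match skips its closing tag and returns everything up to </cdt:b#outer>.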
So use the following code for parsing. Make sure you change tagname1|tagname2|tagname3 to the list of tag names you want to use, and adapt the processing part at the end of the code as well. Before I forget: put your template into $template first; don't leave it an empty string.
$template = ""; // pass html code into this variable
$blktrans = Array( // escapes regex special characters (preg_quote($str, "/") does the same job)
"\\" => "\\\\", "[" => "\\[", "]" => "\\]", "{" => "\\{", "}" => "\\}", "(" => "\\(", ")" => "\\)", "*" => "\\*",
"+" => "\\+", "?" => "\\?", "." => "\\.", "^" => "\\^", "\$" => "\\\$", "|" => "\\|", "<" => "\\<", ">" => "\\>",
"\"" => "\\\"", "'" => "\\'", ":" => "\\:", "#" => "\\#", ";" => "\\;", "=" => "\\=", "/" => "\\/",
);
$repattern = "/(\<cdt\:(tagname1|tagname2|tagname3))( |#){1}([^<>\"']*)(((\=\"[^\"<>]*\")|(\=\'[^'<>]*\'))([^>\"'=]*))*(\>)/Usi";
while (preg_match($repattern, $template, $matches)) {
$tagparams = Array();
$pars = trim(mb_substr($matches[0], 5, mb_strlen($matches[0])-6), " /");
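$spacepos = mb_strpos($pars, " ");
$tagsfirst = ($spacepos === false) ? $pars : mb_substr($pars, 0, $spacepos); // no space means the tag has no parameters
$tagid = explode("#", $tagsfirst);
$tagparams["type"] = $tagid[0];
$tagparams["id"] = isset($tagid[1]) ? $tagid[1] : "";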
if (mb_substr($matches[0], mb_strlen($matches[0])-2, 1)=="/") {
// has no closing tag
$tagparams["isblock"] = false;
$tagparams["contents"] = "";
$fullblock = $matches[0]; // exact tag text, so str_replace() at the end can find it
} else {
// there is a possible closing tag
$blkheader = $matches[0];
$blkheader = strtr($blkheader, $blktrans); // escapes chars for regex
$blkpattern = "/".$blkheader."(.*)\<\/cdt\:".$tagparams["type"].(($tagparams["id"]!="") ? "\#".$tagparams["id"] : "").">/Usi";
if (preg_match($blkpattern, $template, $finds)) {
// tag closure found
$tagparams["isblock"] = true;
$tagparams["contents"] = $finds[1];
$fullblock = $finds[0];
} else {
// unclosed tag, treated as a tag without contents
$tagparams["isblock"] = false;
$tagparams["contents"] = "";
$fullblock = $matches[0]; // exact tag text, so str_replace() at the end can find it
}
}
$parampattern = "/[a-z0-9]+=((\"([^\"]*)\")|('([^']*)'))/Usi";
preg_match_all($parampattern, $matches[0], $parammatches, PREG_PATTERN_ORDER);
foreach ($parammatches[0] as $pkey => $pval) {
$pv = explode("=", $pval);
$pvkey = $pv[0];
$pvrest = implode("=",array_slice($pv,1));
$pvrest = str_replace("&lt;", "<", $pvrest); // < and > are written as entities inside values
$pvrest = str_replace("&gt;", ">", $pvrest);
$pvval = mb_substr($pvrest, 1, mb_strlen($pvrest)-2);
$tagparams[$pvkey] = $pvval;
}
// process tags
$funcret = ""; // default replacement, so a value from a previous tag is never reused
switch ($tagparams["type"]) {
case "tagname1":
// your code
$funcret = ""; // this is a value that should replace the tag
break;
case "tagname2":
// your code
$funcret = ""; // this is a value that should replace the tag
break;
case "tagname3":
// your code
$funcret = ""; // this is a value that should replace the tag
break;
}
$template = str_replace($fullblock, $funcret, $template);
}
For each tag it finds, the loop fills an array called $tagparams with the following:
$tagparams["type"] - name of the tag found
$tagparams["id"] - optional tagid
$tagparams["isblock"] - boolean (true or false) - shows whether the tag has embedded contents
$tagparams["contents"] - embedded content / html code
$tagparams[parametername] - contains any value assigned to parametername
Use these in the processing section (the switch) for each tag you want the parser to look for.
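If you only need the self-closing form, a sketch like the following could stand in for the while loop: preg_replace_callback() hands each match to a function and splices in whatever that function returns, so there's no separate str_replace() step. It builds a $tagparams array with the same "type", "id", "isblock" and "contents" keys described above. The render_cdt_tags() name and the $handlers array (a stand-in for the switch) are hypothetical, and block tags with contents aren't handled.

```php
<?php
// Sketch: replace self-closing <cdt:name#id key="value" /> tags via callbacks.
// $handlers (hypothetical) maps a tag name to a function that receives $tagparams.
function render_cdt_tags($template, array $handlers)
{
    $pattern = '/<cdt:(\w+)(?:#(\w+))?((?:\s+[a-z0-9]+=(?:"[^"]*"|\'[^\']*\'))*)\s*\/>/i';
    return preg_replace_callback($pattern, function ($m) use ($handlers) {
        // $m[2] is "" when there is no #id, because group 3 always participates.
        $tagparams = array('type' => $m[1], 'id' => $m[2], 'isblock' => false, 'contents' => '');
        // Collect key="value" / key='value' pairs from the attribute part.
        preg_match_all('/([a-z0-9]+)=(?:"([^"]*)"|\'([^\']*)\')/i', $m[3], $pairs, PREG_SET_ORDER);
        foreach ($pairs as $p) {
            // Group 3 only exists for single-quoted values; otherwise the value is in group 2.
            $tagparams[$p[1]] = isset($p[3]) ? $p[3] : $p[2];
        }
        if (!isset($handlers[$m[1]])) {
            return $m[0]; // unknown tag: leave it in place
        }
        $handler = $handlers[$m[1]];
        return $handler($tagparams);
    }, $template);
}

$handlers = array(
    // Hypothetical handler: prints the "name" parameter in bold.
    'bold' => function ($tagparams) {
        return '<b>' . htmlspecialchars($tagparams['name']) . '</b>';
    },
);
echo render_cdt_tags('<p>Hi <cdt:bold#greet name=\'Chris\' /></p>', $handlers), "\n";
// prints <p>Hi <b>Chris</b></p>
```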
This code is UTF-8 friendly: the mb_ functions count characters rather than bytes, so you can parse Unicode HTML without worrying about multibyte characters (as long as mbstring's internal encoding is set to UTF-8). You do need the mbstring extension for that, though. If your PHP build doesn't have it, just remove the "mb_" prefix from every function name in the source.
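A quick way to see why the mb_ functions matter: on a multibyte string, strlen()/substr() count bytes, while mb_strlen()/mb_substr() count characters. This sketch assumes the mbstring extension is loaded and passes 'UTF-8' explicitly so the result doesn't depend on mb_internal_encoding().

```php
<?php
// "héllo": 5 characters, but 6 bytes in UTF-8 because "é" is encoded as 0xC3 0xA9.
$s = "h\xC3\xA9llo";

$bytes = strlen($s);              // 6
$chars = mb_strlen($s, 'UTF-8');  // 5

// Taking the first 2 units: substr() cuts "é" in half, mb_substr() keeps it whole.
$byteCut = substr($s, 0, 2);              // "h\xC3" (not valid UTF-8)
$charCut = mb_substr($s, 0, 2, 'UTF-8');  // "hé"

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
```

This is also why simply dropping the "mb_" prefix works on ASCII-only templates but can split characters in Unicode text.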
After the while loop, you should have your $template with all your tags replaced.
I haven't run the code, though most of it should be functional. :) Let me know if this helps, or if you need further help.
Edit: added a part for replacing content in the template
Torque 3D Owner Jacob Fike
Avalon Labs LLC