PHP: Regular Expressions

Gogeta70
^_^
Posts: 3275
Joined: 25 Jun 2005, 16:00


Post by Gogeta70 »

Well, I know you may be a bit scared of regex, with all those random-looking characters. Honestly, though, regex only rarely gets complicated.

I was a little scared of it myself, because I didn't understand it. Most regex tutorials out there aren't that great: they either go into too much detail and get confusing, or don't give enough info. I'm here to help you out and set you on the right path, though. You might doubt it, but regex is a lot different from using the 'substr' function or any other string function, because it lets YOU describe the pattern you're looking for. For instance, regex makes it easy to find every letter 'a' that comes after a ':'.

Anyway, I've made a small script that searches the names of the 50 US states (plus Washington DC) for your text at the beginning, middle, or end of a name. The code explains itself. To understand the regex, read this small tutorial on what the symbols do. Don't worry, it'll take a few days to get it all, unless you're a natural. I wasn't.
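To make the "every 'a' after a ':'" example concrete, here is a minimal sketch. It assumes "after" means "immediately after" and uses a lookbehind, (?<=:), which PHP's preg_* functions support; the sample string is made up for illustration.

```php
<?php
// Sample text (made up for illustration).
$text = "a:apple b:avocado c:banana :a";

// (?<=:) is a lookbehind: it requires a ':' right before the 'a'
// without including the ':' in the match itself.
// PREG_OFFSET_CAPTURE also records where each match starts.
$count = preg_match_all('/(?<=:)a/', $text, $matches, PREG_OFFSET_CAPTURE);

echo "Found $count 'a' characters right after a ':'\n";
foreach ($matches[0] as $m) {
    echo "'{$m[0]}' at offset {$m[1]}\n";
}
```

The 'a' in "banana" is skipped because it follows 'b', not ':'.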

Tutorial - Click Here

The form:


<form method="post" action="eval.php">
Find: <input type="text" name="find"> In:
<select name="what">
<option>Beginning</option>
<option>Middle</option>
<option>End</option>
</select>
<input type="submit" value="Sort">
</form>
The PHP:


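<pre>
<?php
// Read the form fields, falling back to '' when the form hasn't been submitted.
$find = isset($_POST['find']) ? $_POST['find'] : '';
$func = isset($_POST['what']) ? $_POST['what'] : '';
// Leading/trailing spaces let the patterns use ' ' to mark where a name starts or ends.
$list = " Alabama Alaska Arizona Arkansas California Colorado Connecticut Delaware Florida Georgia Hawaii Idaho Illinois Indiana Iowa Kansas Kentucky Louisiana Maine Maryland Massachusetts Michigan Minnesota Mississippi Missouri Montana Nebraska Nevada NewHampshire NewJersey NewMexico NewYork NorthCarolina NorthDakota Ohio Oklahoma Oregon Pennsylvania RhodeIsland SouthCarolina SouthDakota Tennessee Texas Utah Vermont Virginia Washington WashingtonDC WestVirginia Wisconsin Wyoming ";

// elseif makes sure exactly one search runs.
// Note: $find is pasted into the patterns as-is, so regex syntax typed
// into the form (like [abcd]) is interpreted as regex.
if ($func == 'Beginning')
	beginning();
elseif ($func == 'Middle')
	middle();
else
	endd();

// Names that START with $find: a space, $find, then one or more letters.
function beginning()
{
	global $find, $list;

	$pat = "# " . $find . "[a-zA-Z]+#";
	preg_match_all($pat, $list, $match);
	print_r($match);
}

// Names with $find in the MIDDLE: at least one letter on each side.
function middle()
{
	global $find, $list;

	$pat = "#[a-zA-Z]+" . $find . "[a-zA-Z]+#";
	preg_match_all($pat, $list, $match);
	print_r($match);
}

// Names that END with $find: letters, $find, then the trailing space.
function endd()
{
	global $find, $list;

	$pat = "#[a-zA-Z]+" . $find . " #";
	preg_match_all($pat, $list, $match);
	print_r($match);
}
?>
</pre>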

Another useful link is the ASCII table, which shows the order of the characters. That order matters when you use a range inside brackets ('[' and ']'): for instance, [!-%] matches any one of the characters '!"#$%', because they sit next to each other in that order. Now, here's a demonstration of the code I showed above: http://fatalh.sytes.net/regex/
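A quick way to check what a bracket range covers is to walk the character codes between its two ends. This sketch builds the [!-%] range from ord()/chr() and then runs the same class through preg_match_all() on a made-up string.

```php
<?php
// A bracket range covers every character whose ASCII code lies between the ends.
// '!' is 33 and '%' is 37, so [!-%] covers ! " # $ %.
$range = '';
for ($code = ord('!'); $code <= ord('%'); $code++) {
    $range .= chr($code);
}
echo $range, "\n";

// Pick out the characters of a sample string that fall inside the range.
// '&' (38) is just past the end of the range, so it is not matched.
preg_match_all('/[!-%]/', 'a!b"c#d$e%f&g', $matches);
echo implode(' ', $matches[0]), "\n";
```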

On a side note, for those with experience, tell me what you think of this 'presentation', please.
¯\_(ツ)_/¯ It works on my machine...

FrankB
Ph. D. in Sucko'logics
Posts: 315
Joined: 06 Mar 2006, 17:00
Location: Belgistahn

Re: PHP: Regular Expressions

Post by FrankB »

gogeta70 wrote: On a side note, for those with experience, tell me what you think of this 'presentation', please.
Well, that was a nice introduction to PHP's built-in preg_match_all() function, but not really an introduction to REGEX: try "#" or "*" in your form and see how preg_match_all() gets confused. The regular expression itself only checks for letters from a to z, upper or lower case, which can be regexed as follows (in Perl):
$match =~ m/[a-z]/gi; (letters only; i = case-insensitive, g = global, i.e. every match from the beginning through the middle to the end of the string; \w would also let digits and the underscore through)
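In PHP's preg_* functions, case-insensitivity is the i modifier placed after the closing delimiter. A minimal sketch with a made-up sample string, comparing [a-zA-Z]+ with [a-z]+ plus /i, and showing that \w is broader than a letters-only class:

```php
<?php
$text = "Ohio texas R2_D2";

// Explicit upper- and lower-case ranges.
preg_match_all('/[a-zA-Z]+/', $text, $both);
// Lower-case range plus the i (case-insensitive) modifier: same matches.
preg_match_all('/[a-z]+/i', $text, $insensitive);
// \w means "word characters": letters, digits and underscore.
preg_match_all('/\w+/', $text, $word);

print_r($both[0]);        // contains Ohio, texas, R, D
print_r($insensitive[0]); // contains Ohio, texas, R, D
print_r($word[0]);        // contains Ohio, texas, R2_D2
```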

And even literally typing [a-zA-Z] or ["abcd"] gives results. (I don't know of any state that has "abcd" in it, but your script returns a list of 22. :-)
-> *Always* validate your input. Always, always, always.
And how do you do that? With those very regular expressions.
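Why do "#" and "*" cause trouble? The script builds its pattern by pasting the input between '#' delimiters, so a typed '#' ends the pattern early, and '*' is a quantifier that, depending on where it lands, either changes the pattern's meaning or fails to compile. One common fix, sketched below, is to escape the input with preg_quote() and pass the delimiter as its second argument. The helper name starts_with_pattern and the shortened state list are hypothetical, for illustration only.

```php
<?php
// Hypothetical helper: find names in $list that start with the user's text,
// treating that text literally rather than as regex syntax.
function starts_with_pattern($find, $list)
{
    // preg_quote() backslash-escapes metacharacters such as * [ ] ( ) . + ?
    // The second argument also escapes the chosen delimiter ('#').
    $safe = preg_quote($find, '#');
    $pat = "# " . $safe . "[a-zA-Z]+#";
    preg_match_all($pat, $list, $match);
    return $match[0];
}

$list = " Alabama Alaska Arizona NewHampshire NewJersey ";

print_r(starts_with_pattern('New', $list));    // " NewHampshire", " NewJersey"
print_r(starts_with_pattern('#', $list));      // empty array, no warning
print_r(starts_with_pattern('[abcd]', $list)); // empty array: brackets are literal now
```

Escaping keeps the user's text literal; if the form should accept only certain characters, a whitelist check with preg_match() is another option.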

So, not that much to do with REGEX, but still a nice, cute script for intermediates and beginners.
Good job, Gogeta, keep it up.
I'd give you an extra point for putting the start, middle and end functions in your script, because that distinction is essential in REGEX.
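The beginning/middle/end distinction maps onto the anchors ^ (start of the subject) and $ (end of the subject). Below is a sketch of an alternative design, assuming the names are kept in an array instead of one space-separated string, so each name can be tested on its own with preg_grep(). Unlike the original script, it searches case-insensitively, and the short state list is just a sample.

```php
<?php
// Assumption: the names live in an array, so ^ and $ anchor to each name.
$states = array('Alabama', 'Alaska', 'Kansas', 'Arkansas', 'Texas', 'Nebraska');
// Escape the search text so it is matched literally ('/' is the delimiter here).
$find = preg_quote('a', '/');

// ^ = start of the name; the i modifier makes the search case-insensitive.
$beginning = preg_grep('/^' . $find . '/i', $states);
// $ = end of the name; D stops $ from also matching before a trailing newline.
$end = preg_grep('/' . $find . '$/iD', $states);
// "Middle" as in the original script: at least one character on each side.
$middle = preg_grep('/.' . $find . './i', $states);

// preg_grep() keeps the original array keys, so array_values() renumbers them.
print_r(array_values($beginning)); // Alabama, Alaska, Arkansas
print_r(array_values($end));       // Alabama, Alaska, Nebraska
print_r(array_values($middle));    // all six names
```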

--
FrankB
Last edited by FrankB on 06 Jul 2006, 05:27, edited 1 time in total.

bad_brain
Site Owner
Posts: 11638
Joined: 06 Apr 2005, 16:00
Location: In your eye floaters.

Post by bad_brain »

:|
*is still a bloody php newb and understands just about 30% of the post*

but anyway, well done gogeta... :)


Post by FrankB »

bad_brain wrote: :|
*is still a bloody php newb and understands just about 30% of the post*
but anyway, well done gogeta... :)
REGEX is indeed a whole other realm.
Entire books are dedicated to it, and it has its own dialects... :-(
The ones used in SQL are not the same as those in Perl or JavaScript.
There are always those tiny little differences.


--
FrankB


Post by Gogeta70 »

Thank you, guys. And FrankB, I consider this an introduction to regex: it shows some of its simple features and how they can be used. However, I suppose you're right that it's also kind of a tutorial on using preg_match_all().

Edit: Wait, FrankB, you're saying you tried '[abcd]' in the form? Because '[abcd]' means to look for a, b, c, OR d, not all of them.
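This matches how PCRE (the engine behind PHP's preg_* functions) treats a bracket expression: one character per match. A short sketch with a made-up word, contrasting [abcd], [abcd]+ and the literal sequence abcd:

```php
<?php
$word = "cab";

// [abcd] is a character class: each match is exactly ONE character
// that is a, b, c or d.
preg_match_all('/[abcd]/', $word, $single);
print_r($single[0]); // c, a, b (three one-character matches)

// The + quantifier lets the class repeat, so a whole run of
// those letters is matched at once.
preg_match_all('/[abcd]+/', $word, $run);
print_r($run[0]); // cab

// To require the literal sequence "abcd", drop the brackets.
var_dump(preg_match('/abcd/', $word)); // int(0)
```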


Post by FrankB »

gogeta70 wrote:
Edit: Wait, FrankB, you're saying you tried '[abcd]' in the form? Because '[abcd]' means to look for a, b, c, OR d, not all of them.
You are right... and then again not totally correct, but the semantics get really complicated.
So, maybe not in PHP (I *never* use PHP for REGEX; my preferred dialect is Perl's).

In our example, finding a match of [abcd] would search for a,ab,ac,ad,b,ba,bc,bd,c,ca,cb,cd,d,da,db,dc,abcd,abc,bcd,cba,dbc (a whole matrix), unlike a grouped expression such as (Gogeta70|gogeta70), which returns either none or one exact match.
This cutie, among thousands of others, points to the difficulty of character classes ([whatever]).
http://www.troubleshooters.com/codecorn ... erlreg.htm
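For comparison, a sketch of the grouped alternative (Gogeta70|gogeta70) in PHP, on a made-up sample line. preg_match() stops at the first match, so it reports either no match or one; preg_match_all() collects every occurrence. In PCRE, a character class such as [abcd], by contrast, still consumes a single character per match, and only a quantifier like [abcd]+ yields longer runs.

```php
<?php
$line = "Post by gogeta70, replying to Gogeta70";

// (A|B) is alternation: match the left branch OR the right branch.
// preg_match() returns 1 on the first (leftmost) match, 0 if there is none.
$found = preg_match('/(Gogeta70|gogeta70)/', $line, $m);
echo $found, ' ', $m[1], "\n"; // 1 gogeta70

// preg_match_all() keeps going and collects every occurrence.
$all = preg_match_all('/(Gogeta70|gogeta70)/', $line, $every);
print_r($every[1]); // gogeta70, Gogeta70

// The same two spellings, written with a class for the first letter only.
var_dump(preg_match('/[Gg]ogeta70/', $line)); // int(1)
```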

Glad to see you in the realm of REGEX, Gogeta. It is exactly where linguistics and programming meet, and it is a wonderful but bizarre world.

Many people have no idea how dangerous and how helpful regular expressions are, but they are certainly a *very*, *very* powerful tool for handling forms, databases and megabytes of ASCII text files, if not entire filesystems. It is common for UNIX administrators to have typed the wrong expression after "rm", "mv", "link", etc., causing nightmares and career disasters.

On the other hand, using PHP to `parse` and process regular expressions seems both very dangerous and very easy-peasy; it all depends on how much power and privilege PHP has, as set in its *.ini file and by the process (PID) it runs as.

Keep on REGEXIN' mate :-)

--
FrankB.no0Bb


Post by Gogeta70 »

Well, PHP is based on Perl, so I'm surprised the regex is different at all.

I enjoy regexing mainly because it makes me think about what I'm doing and how I'm doing it, or how I'm going to do it. It's one of the most complicated parts of PHP. Regex is dangerous, but very helpful if used properly, almost like fire: it can cook your food, or it can give you third-degree burns over large areas of your body.

Regex is also a big deal in forum exploits: a lot of websites get hacked because of an insecure regular expression match, which is what makes regex so dangerous. I guess that's why I love it so much.
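As one illustration of an insecure match (an assumption about the kind of bug meant here, not a specific exploit), the sketch below validates a numeric ID. Without anchors, /[0-9]+/ accepts any string that merely contains a digit; with ^ and $ it checks the whole string, and \z closes the remaining gap, because in PCRE $ also matches just before a trailing newline. The function names are hypothetical.

```php
<?php
// Hypothetical validators for a numeric ID taken from user input.
function is_id_loose($s)
{
    // Unanchored: true if a digit appears ANYWHERE in the string.
    return preg_match('/[0-9]+/', $s) === 1;
}

function is_id_anchored($s)
{
    // ^ and $ cover the whole string, but $ also matches right
    // before a final "\n", so "42\n" still passes.
    return preg_match('/^[0-9]+$/', $s) === 1;
}

function is_id_strict($s)
{
    // \z means "absolute end of string", so no trailing newline slips through.
    return preg_match('/^[0-9]+\z/', $s) === 1;
}

$inputs = array("42", "42 OR 1=1", "42\n");
foreach ($inputs as $in) {
    printf("%-12s loose=%d anchored=%d strict=%d\n",
        addcslashes($in, "\n"), is_id_loose($in), is_id_anchored($in), is_id_strict($in));
}
```

The D modifier (/^[0-9]+$/D) is an alternative to \z with the same effect on $.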

Anyway, if I had to suggest one thing to someone starting out with regex, I'd suggest that you be VERY specific about what your regex matches, then have a few '1337' programmers/hackers like Frank or me look over your code. That's the best way to secure your code.
