Regex
So here's the MEGA regex I came up with:
\s* # white spaces
########################## KEYS START ##########################
(?: # We'll use this to make keys optional
    (?P<keys> # named group: keys
        \d+ # match digits
        | # or
        "(?(?=\\")..|[^"])*" # match string between "", works even for escaped ones "hello \" world"
        | # or
        '(?(?=\\')..|[^'])*' # match string between '', same as above :p
        | # or
        \$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
    ) # close group: keys
########################## KEYS END ##########################
    \s* # white spaces
    => # match =>
)? # make keys optional
\s* # white spaces
########################## VALUES START ##########################
(?P<values> # named group: values
    \d+ # match digits
    | # or
    "(?(?=\\")..|[^"])*" # match string between "", works even for escaped ones "hello \" world"
    | # or
    '(?(?=\\')..|[^'])*' # match string between '', same as above :p
    | # or
    \$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
    | # or
    array\s*\((?:[^()]|(?R))*\) # match an array()
    | # or
    \[(?:[^[\]]|(?R))*\] # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
    | # or
    (?:function\s+)?\w+\s* # match functions: helloWorld, function name
    (?:\((?:[^()]|(?R))*\)) # match function parameters (wut), (), (array(1,2,4))
    (?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
    \{(?:[^{}]|(?R))*\} # match { whatever}
    )?;? # match ; (optionally)
) # close group: values
########################## VALUES END ##########################
\s* # white spaces
I've added some comments. Note that you need to use 3 modifiers:
x: lets me write comments inside the pattern
s: makes dots match newlines too
i: matches case-insensitively
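To make the three modifiers concrete, here is a minimal sketch that tries each one on its own. The tiny patterns and variable names below are made up for illustration and are not part of the big regex.

```php
<?php
// x: whitespace and "# ..." comments inside the pattern are ignored.
$commented = preg_match('~
    \d+   # one or more digits
~x', '42');

// s: without it the dot stops at a newline, with it the dot matches the newline too.
$dotWithoutS = preg_match('~a.b~', "a\nb");
$dotWithS    = preg_match('~a.b~s', "a\nb");

// i: letters match regardless of case.
$caseInsensitive = preg_match('~array~i', 'ARRAY');

var_dump($commented, $dotWithoutS, $dotWithS, $caseInsensitive);
// int(1), int(0), int(1), int(1)
```

The modifiers can be combined freely, which is why the big pattern ends in ~xsi.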
PHP
$code='array(0 => "a", 123 => 123, $_POST["hello"][\'world\'] => array("is", "actually", "An array !"), 1234, \'got problem ?\',
"a" => $GlobalScopeVar, $test_further => function test($noway){echo "this works too !!!";}, "yellow" => "blue",
"b" => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3)), "c" => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
"bug", "fixed", "mwahahahaa" => "Yeaaaah"
);'; // Sample data
$code = preg_replace('#(^\s*array\s*\(\s*)|(\s*\)\s*;?\s*$)#s', '', $code); // Just to get rid of "array(" at the beginning and ");" at the end
preg_match_all('~
\s* # white spaces
########################## KEYS START ##########################
(?: # We\'ll use this to make keys optional
    (?P<keys> # named group: keys
        \d+ # match digits
        | # or
        "(?(?=\\\\")..|[^"])*" # match string between "", works even for escaped ones "hello \" world"
        | # or
        \'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
        | # or
        \$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
    ) # close group: keys
########################## KEYS END ##########################
    \s* # white spaces
    => # match =>
)? # make keys optional
\s* # white spaces
########################## VALUES START ##########################
(?P<values> # named group: values
    \d+ # match digits
    | # or
    "(?(?=\\\\")..|[^"])*" # match string between "", works even for escaped ones "hello \" world"
    | # or
    \'(?(?=\\\\\')..|[^\'])*\' # match string between \'\', same as above :p
    | # or
    \$\w+(?:\[(?:[^[\]]|(?R))*\])* # match variables $_POST, $var, $var["foo"], $var["foo"]["bar"], $foo[$bar["fail"]]
    | # or
    array\s*\((?:[^()]|(?R))*\) # match an array()
    | # or
    \[(?:[^[\]]|(?R))*\] # match an array, new PHP array syntax: [1, 3, 5] is the same as array(1,3,5)
    | # or
    (?:function\s+)?\w+\s* # match functions: helloWorld, function name
    (?:\((?:[^()]|(?R))*\)) # match function parameters (wut), (), (array(1,2,4))
    (?:(?:\s*use\s*\((?:[^()]|(?R))*\)\s*)? # match use(&$var), use($foo, $bar) (optionally)
    \{(?:[^{}]|(?R))*\} # match { whatever}
    )?;? # match ; (optionally)
) # close group: values
########################## VALUES END ##########################
\s* # white spaces
~xsi', $code, $m); // Matching :p
print_r($m['keys']); // Print keys
print_r($m['values']); // Print values
// Some keys may be empty because they were not specified in the array, so let's fill them in!
foreach($m['keys'] as $index => &$key){
    if($key === ''){
        $key = 'made_up_index_'.$index;
    }
}
unset($key); // break the reference left behind by the foreach
$results = array_combine($m['keys'], $m['values']);
print_r($results); // printing results
Output
Array
(
[0] => 0
[1] => 123
[2] => $_POST["hello"]['world']
[3] =>
[4] =>
[5] => "a"
[6] => $test_further
[7] => "yellow"
[8] => "b"
[9] => "c"
[10] =>
[11] =>
[12] => "mwahahahaa"
[13] => "this is"
)
Array
(
[0] => "a"
[1] => 123
[2] => array("is", "actually", "An array !")
[3] => 1234
[4] => 'got problem ?'
[5] => $GlobalScopeVar
[6] => function test($noway){echo "this works too !!!";}
[7] => "blue"
[8] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
[9] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[10] => "bug"
[11] => "fixed"
[12] => "Yeaaaah"
[13] => "a test"
)
Array
(
[0] => "a"
[123] => 123
[$_POST["hello"]['world']] => array("is", "actually", "An array !")
[made_up_index_3] => 1234
[made_up_index_4] => 'got problem ?'
["a"] => $GlobalScopeVar
[$test_further] => function test($noway){echo "this works too !!!";}
["yellow"] => "blue"
["b"] => array("nested"=>array(1,2,3), "nested"=>array(1,2,3),"nested"=>array(1,2,3))
["c"] => function() use (&$VAR) { return isset($VAR) ? "defined" : "undefined" ; }
[made_up_index_10] => "bug"
[made_up_index_11] => "fixed"
["mwahahahaa"] => "Yeaaaah"
["this is"] => "a test"
)
Online regex demo
Online php demo
Known bug (fixed)
$code='array("aaa", "sdsd" => "dsdsd");'; // fail
$code='array(\'aaa\', \'sdsd\' => "dsdsd");'; // fail
$code='array("aaa", \'sdsd\' => "dsdsd");'; // succeed
// In other words: if a value without a key is followed
// by a key => value pair and both use the same kind of quotes,
// it fails (the first value gets merged into the key)
Online bug demo
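To see why a quoted string now ends at its own closing quote instead of swallowing the next key, here is a minimal sketch that isolates the double-quoted-string subpattern from the regex above. The helper name doubleQuotedStrings is hypothetical and only exists for this illustration.

```php
<?php
// Hypothetical helper (not part of the original code): returns every
// double-quoted string found in $code, using the same conditional
// subpattern as the big regex. If the next two characters are \" they are
// consumed together, otherwise a single non-quote character is consumed.
function doubleQuotedStrings(string $code): array
{
    preg_match_all('~"(?(?=\\\\")..|[^"])*"~s', $code, $m);
    return $m[0];
}

// The case from the bug report: each string stays separate.
print_r(doubleQuotedStrings('array("aaa", "sdsd" => "dsdsd")'));

// An escaped quote does not end the string early.
print_r(doubleQuotedStrings('array("hello \" world", "x")'));
```

The first call returns three separate strings and the second returns two, with the escaped quote kept inside the first one.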
Credits
Credit goes to Bart Kiers for his recursive pattern to match nested brackets.
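For readers who have not met recursive patterns before, here is a minimal sketch of the idea on its own, using parentheses only. The sample input is made up; (?R) re-enters the whole pattern, which is what lets it follow arbitrarily deep nesting.

```php
<?php
// "(" followed by any mix of non-parenthesis characters and nested,
// balanced groups (matched by recursing into the whole pattern), then ")".
$pattern = '~\((?:[^()]|(?R))*\)~';

preg_match_all($pattern, 'f(a, g(b, (c)), d) + h(e)', $m);
print_r($m[0]);
// Array
// (
//     [0] => (a, g(b, (c)), d)
//     [1] => (e)
// )
```

The big regex uses the same trick for (), [] and {}, except that there (?R) recurses into the full key/value pattern.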
Advice
You should probably go with a parser, since regexes are fragile for this kind of input. @bwoebi has done a great job in his answer.
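As a rough idea of what a token-based approach can look like, here is a minimal sketch built on PHP's built-in token_get_all(). It is not taken from @bwoebi's answer. The helper name splitTopLevelElements is hypothetical, and it only splits an array literal into its top-level elements by tracking bracket depth, without separating keys from values.

```php
<?php
// Hypothetical helper: splits the source of an array literal into its
// top-level elements by walking PHP tokens and tracking bracket depth.
function splitTopLevelElements(string $arraySource): array
{
    $elements = [];
    $current  = '';
    $depth    = 0;

    foreach (token_get_all('<?php ' . $arraySource . ';') as $token) {
        if (is_array($token) && $token[0] === T_OPEN_TAG) {
            continue; // skip the "<?php " prefix we added ourselves
        }
        $text = is_array($token) ? $token[1] : $token;

        if (in_array($text, ['(', '[', '{', '${'], true)) {
            $depth++;
            if ($depth === 1) {
                continue; // opening bracket of the outer array
            }
        } elseif (in_array($text, [')', ']', '}'], true)) {
            $depth--;
            if ($depth === 0) {
                break; // closing bracket of the outer array
            }
        } elseif ($depth === 0) {
            continue; // the "array" keyword before the outer bracket
        } elseif ($depth === 1 && $text === ',') {
            $elements[] = trim($current); // a top-level comma ends an element
            $current = '';
            continue;
        }
        $current .= $text;
    }

    if (trim($current) !== '') {
        $elements[] = trim($current);
    }
    return $elements;
}

print_r(splitTopLevelElements('array(1, "a" => array(2, 3), function () { return 1; })'));
```

Because the tokenizer already knows where strings, comments and brackets begin and end, quotes and commas inside strings cannot confuse the split the way they can with a regex.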