Reverse a Multibyte String in PHP
- Authors

- Name
- Kevin van Zonneveld
- @kvz
PHP's strrev is not safe to use on UTF-8 strings because it reverses a string one byte at a time. If a character consists of multiple bytes, those bytes end up in the wrong order, so the character does not survive as a single unit in the reversed result.
There is no Multibyte String alternative to strrev either.
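To make the byte-level problem concrete, here is a minimal sketch that reverses a single two-byte character and checks whether the result is still valid UTF-8. The variable names are ours, purely for illustration.

```php
<?php
// 'ç' is U+00E7, which UTF-8 encodes as the two bytes C3 A7.
$char = "\u{00E7}";
echo bin2hex($char), "\n";          // c3a7

// strrev() swaps the two bytes, so a continuation byte now comes first.
$reversedBytes = strrev($char);
echo bin2hex($reversedBytes), "\n"; // a7c3

// The swapped byte sequence is no longer valid UTF-8.
var_dump(mb_check_encoding($reversedBytes, 'UTF-8')); // bool(false)
```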
We did some googling, but strangely enough every solution we encountered was either incorrect or incredibly heavy in terms of memory or code.
For example:
- using utf8_decode only works if every character in the string exists in the ISO-8859-1 character set
- using preg_match_all seems weirdly over-engineered
- a simpler preg_match_all works, but on a 2MB string PHP was already using 150MB of memory. This is actually what sparked our search, when @renan_saddam noticed that his PHP port of GitHub's email_reply_parser choked on a 2MB multibyte email.
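For reference, a regex-based reversal of the kind mentioned in the last bullet might look roughly like the sketch below. The function name preg_strrev and the exact pattern are our assumptions, not the code we actually found. It has to collect every character into an array before joining them again, which is presumably where the memory goes.

```php
<?php
// Hypothetical helper for illustration only; the name and pattern are assumptions.
function preg_strrev(string $string): string {
    // /u treats the subject as UTF-8, /s lets . match newlines as well.
    // preg_match_all() returns false on failure, e.g. for invalid UTF-8 input.
    if (preg_match_all('/./us', $string, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    // $matches[0] holds one array element per character.
    return implode('', array_reverse($matches[0]));
}

echo preg_strrev('Gonçalves'), "\n";
```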
What We Came Up With
Our solution is dead simple, but I'm putting it online anyway since it's apparently not common knowledge.
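<?php
/**
 * Reverse a multibyte string one character (not one byte) at a time.
 */
function mb_strrev ($string, $encoding = null) {
    if ($encoding === null) {
        // mb_detect_encoding() returns false when detection fails,
        // so fall back to the internal encoding in that case.
        $encoding = mb_detect_encoding($string) ?: mb_internal_encoding();
    }
    $length   = mb_strlen($string, $encoding);
    $reversed = '';
    // Walk from the last character (index $length - 1) down to index 0.
    while ($length-- > 0) {
        $reversed .= mb_substr($string, $length, 1, $encoding);
    }
    return $reversed;
}
?>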
Example:
<?php
echo strrev('Gonçalves') . "\n"; // prints sevla??noG
echo mb_strrev('Gonçalves') . "\n"; // prints sevlaçnoG
?>
In our tests, the above function used about five times less memory than the preg_match_all solution.
Hope this helps
Legacy Comments (5)
These comments were imported from the previous blog system (Disqus).
how does
while ($length-- > 0) {
...
}
compare to
while ($length > 0) {
...
$length--;
}
while ($length-- > 0) // for a 10-character string, $length ranges from 9 -> 0 inside the loop, which is what mb_substr needs as positions
while ($length > 0) // $length ranges from 10 -> 1, so it would need extra code to subtract 1, on top of the extra line for $length--
fair enough, thanks
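The index ranges from the exchange above are easy to check with a small sketch. The length of 10 is an assumption that matches the numbers used in the comment, and the variable names are ours.

```php
<?php
$length = 10; // assumed string length, matching the numbers in the comment

// Variant 1: decrement inside the loop condition.
$n = $length;
$first = [];
while ($n-- > 0) {
    $first[] = $n; // $n has already been decremented when the body runs
}

// Variant 2: decrement at the end of the loop body.
$n = $length;
$second = [];
while ($n > 0) {
    $second[] = $n; // still one too high to use as a zero-based position
    $n--;
}

echo implode(' ', $first), "\n";  // 9 8 7 6 5 4 3 2 1 0
echo implode(' ', $second), "\n"; // 10 9 8 7 6 5 4 3 2 1
```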
This one works for me.
function string_reverse ($string) {
    // Multibyte string reverse.
    // Use '' and -1 rather than null: passing null for these arguments is deprecated as of PHP 8.1.
    return \implode ('', \array_reverse (\preg_split ('//u', $string, -1, PREG_SPLIT_NO_EMPTY)));
}
This will break for strings containing accents or other combining marks.
Eg: y̅a becomes a̅y with this function.
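One way to keep combining marks attached to their base character is to split on grapheme clusters instead of code points. The sketch below uses PCRE's \X escape for that. The function name grapheme_reverse is hypothetical, and note that this is a different technique from mb_strrev above: it builds an array of all clusters, so it likely shares the memory cost of the other regex-based approaches.

```php
<?php
// Hypothetical helper for illustration only; not part of the original post or comments.
function grapheme_reverse(string $string): string {
    // \X matches one extended grapheme cluster: a base character
    // together with any combining marks attached to it.
    if (preg_match_all('/\X/u', $string, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return implode('', array_reverse($matches[0]));
}

$input = "y\u{0305}a"; // y + COMBINING OVERLINE (U+0305) + a
echo grapheme_reverse($input), "\n"; // the overline stays on the y
```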