PHP Arrays and Strings: Naughty Merge
Leo Sjöberg • November 19, 2015
Note: I'm working in Laravel, so I will be referring to collections. However, the benchmark itself was done in pure PHP.
My need
Just a couple of hours ago, I found myself in the following scenario: I want to allow my users to enter their skills as a comma-separated list, and then save it to the database. However, for the sake of sorting, it would be nice if people entered the same skills consistently (imagine some entering 'database' and others 'databases', which then end up as two different skills). So I thought, hey, I'll query my database for what other users have already entered, and offer those as suggestions as the user types! There's of course one problem with this: my values are comma-separated in a plain TEXT field in a MySQL database, while the JavaScript I'm passing them to would rather have a nice JSON array. So I started thinking...
First, I just thought about what needs to be done. Obviously, regardless of how I solve the issue, I must retrieve the data from the database. This will be an array of comma-separated lists, something along the lines of
[
    'programming,php,mysql,laravel',
    'mongo,database design,nosql,mysql,ruby,java',
    ...
]
In the end, I want to pass it to my JavaScript in the following form:
[
    'programming',
    'php',
    'mysql',
    'laravel',
    'mongo',
    'database design',
    'nosql',
    'mysql',
    'ruby',
    'java',
    ...
]
Solutions...
So instantly I thought to myself:

Hey, how about I just run explode on every string, and then merge the arrays together? That'd be awesome!

So I did. The implementation looked as follows (using Laravel's Collection class):
$data = $collection->map(function ($list) {
    return explode(',', $list);
})->collapse()->unique();
This would run an array_map over the items, then array_merge all these arrays together, and last but not least filter out the duplicates. However, in the middle of doing this, I thought to myself:
But you could also concatenate all strings by simply reducing the array, and then exploding one huge array...
I didn't write the implementation then, but it would look like this:
$data = $collection->reduce(function ($carry, $list) {
    return $carry.$list.',';
}, '');

$data = array_unique(explode(',', $data));
That got me thinking: which solution performs better? So I asked in Larachat. Well, no one knew, but I was advised to just write a quick performance test, so I did. I skipped the array_unique in my performance test, since it would be identical in both solutions.
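Outside of Laravel, the two candidate pipelines can be sketched in plain PHP. This is my own restatement of the above, not the benchmark code; variable names are mine, and I trim the trailing comma in the second version so it doesn't produce an empty element:

```php
<?php

$lists = [
    'programming,php,mysql,laravel',
    'mongo,database design,nosql,mysql,ruby,java',
];

// Solution 1: explode every list, then merge the resulting arrays.
$merged = [];
foreach ($lists as $list) {
    $merged = array_merge($merged, explode(',', $list));
}
$solution1 = array_values(array_unique($merged));

// Solution 2: concatenate all lists into one string, then explode once.
$concatenated = '';
foreach ($lists as $list) {
    $concatenated .= $list.',';
}
// rtrim drops the trailing comma so we don't get an empty last element.
$solution2 = array_values(array_unique(explode(',', rtrim($concatenated, ','))));

print_r($solution1);
```

Both pipelines yield the same nine unique skills for this input.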
Benchmarking
My goal was to, given a large array of relatively short comma-separated lists, see what would perform better: creating lots of small arrays and merging them, or concatenating the strings and then creating one large array straight away.
After having done this, I also tested with fewer but larger lists, making sure to maintain the same total number of list elements.
Configurations
- 5000 lists of 15 items
- 50 lists of 1500 items
- 5 lists of 15000 items
The code
To benchmark this, I used the following code:
<?php

// Make sure we don't run out of memory
ini_set('memory_limit', '512M');

$array = [];
$numberOfLists = 5000;
$itemsPerList = 15;

echo "Seeding array\n";

// Create an array of 5000 comma-separated
// lists with 15 varying items per list.
for ($i = 0; $i < $numberOfLists; $i++) {
    $temp = '';
    for ($j = 0; $j < $itemsPerList; $j++) {
        if (! empty($temp)) {
            $temp .= ',';
        }
        $temp .= 'item'.$j;
    }

    $array[] = $temp;
}

echo "Array seeded\n";

/*
 * Start the benchmark
 */

$start1 = microtime(true);

// Create an array of arrays
$exploded = array_map(function ($string) {
    return explode(',', $string);
}, $array);

// Merge the arrays
$merged = [];

// Use a separate loop variable here: reusing $array as the loop
// variable would overwrite the input that the second benchmark needs.
foreach ($exploded as $list) {
    // It appears this operation is what really takes time.
    $merged = array_merge($merged, $list);
}

$explodeAndMerge = microtime(true) - $start1;

$start2 = microtime(true);

$reduced = array_reduce($array, function ($carry, $list) {
    return $carry.$list.',';
}, '');

$exploded = explode(',', $reduced);

$reduceAndExplode = microtime(true) - $start2;

echo "Benchmark complete\n".
     "==================\n".
     "Explode and merge: {$explodeAndMerge}\n".
     "Reduce and explode: {$reduceAndExplode}\n";
Results
Using the abbreviations EM for explode-and-merge and RE for reduce-and-explode, the averages for each benchmark configuration are:
- EM: 22.9s, RE: 0.0026s
- EM: 0.21s, RE: 0.0027s
- EM: 0.04s, RE: 0.12s
Conclusions
From the above results, I have concluded a few things. First off: if you're going to do what I do, definitely go for reduce and explode. What appears to take most of the time is the array_merge: each call copies the entire accumulated array, so merging in a loop re-copies earlier elements over and over. Judging by the other results, this becomes a diminishing problem as the number of arrays to merge decreases.

Between experiments 1 and 2, we had a hundredfold decrease in the number of array merges, which led to approximately a hundredfold speedup. This indicates that the cost is driven by the number of merge calls rather than by the number of elements in the array being appended (which also makes sense from a low-level perspective).
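As an aside I did not benchmark: PHP 5.6's argument unpacking lets you merge everything in a single variadic call, which copies each element only once instead of re-copying the accumulated array on every loop iteration. A minimal sketch:

```php
<?php

$exploded = [
    ['programming', 'php', 'mysql', 'laravel'],
    ['mongo', 'database design', 'nosql', 'mysql', 'ruby', 'java'],
];

// One variadic call instead of one array_merge per sub-array.
// The leading [] keeps the call valid even if $exploded is empty.
$merged = array_merge([], ...$exploded);
```

This sidesteps the repeated-copy cost entirely while keeping the explode-per-list structure.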
Another interesting conclusion can be drawn from the final round of benchmarking: it's more efficient to run explode and array_merge five times than to concatenate a string five times and explode once.
Update: A Third Solution
After having posted this, and linked to it in Larachat, Joseph Silber pointed out something very relevant:
Why not just implode and explode?
After trying this, I can happily add that calling

explode(',', implode(',', $array));

will give the exact same output, but is much faster. (Note that implode needs the ',' glue here; without it, the lists would be joined back to back and the last item of one list would fuse with the first item of the next.) For the different benchmarks, we see the times:
- 2.27E-5 (0.0000227s)
- 0.00030s
- 0.0026s
As can be seen, this performs exceptionally well, especially when you have many small lists.
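A quick sanity check, using my own sample data, that the shortcut really matches the original explode-and-merge output (with the explicit ',' glue passed to implode):

```php
<?php

$lists = [
    'programming,php,mysql',
    'mongo,mysql,ruby',
];

// Joseph Silber's shortcut, with an explicit ',' glue.
$viaImplode = explode(',', implode(',', $lists));

// The original explode-and-merge approach.
$viaMerge = [];
foreach ($lists as $list) {
    $viaMerge = array_merge($viaMerge, explode(',', $list));
}

var_dump($viaImplode === $viaMerge); // bool(true)
```

Both produce the same six elements in the same order, so array_unique applied afterwards would also agree.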