PHP Arrays and Strings: Naughty Merge

Leo Sjöberg • November 19, 2015

Note: I'm working in Laravel, so I will be referring to collections. However, the benchmark itself was done in pure PHP.

My need

Just a couple of hours ago, I found myself in the following scenario: I want to let my users enter their skills as a comma-separated list, and then save it to the database. However, for the sake of sorting, it would be nice if people entered the same skills (imagine some entering database and others databases, which would then end up as two different skills). So I thought, hey, I'll query my database for what other users have already entered, and offer those as suggestions as the user types! There's of course one problem with this: my values are stored comma-separated in a normal TEXT field in a MySQL database, while the JavaScript I'm passing them to would rather have a nice JSON array. So I started thinking...

First, I just thought about what needs to be done. Obviously, regardless of how I solve the issue, I must retrieve the data from the database. This will be an array of comma-separated lists, something along the lines of

[
    'programming,php,mysql,laravel',
    'mongo,database design,nosql,mysql,ruby,java',
    ...
]

In the end, I want to pass it to my JavaScript in the following form:

[
    'programming',
    'php',
    'mysql',
    'laravel',
    'mongo',
    'database design',
    'nosql',
    'mysql',
    'ruby',
    'java',
    ...
]

Solutions...

So instantly I thought to myself:

Hey, how about I just run explode on every string, and then merge the arrays together? That'd be awesome!

So I did. The implementation looked as follows (using Laravel's Collection class):

$data = $collection->map(function ($list) {
    return explode(',', $list);
})->collapse()->unique();
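Under the hood, that is roughly equivalent to the following plain-PHP sketch of mine ($lists stands in for the raw rows fetched from the database):

$lists = [
    'programming,php,mysql,laravel',
    'mongo,database design,nosql,mysql,ruby,java',
];

// map(): explode each comma-separated string into an array
$exploded = array_map(function ($list) {
    return explode(',', $list);
}, $lists);

// collapse(): flatten the sub-arrays into one array
$merged = [];
foreach ($exploded as $list) {
    $merged = array_merge($merged, $list);
}

// unique(): drop the duplicate skills
$data = array_unique($merged);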

This would run an array_map over the items, then array_merge all the resulting arrays together, and, last but not least, filter out the duplicates. However, in the middle of doing this, I thought to myself:

But you could also concatenate all the strings by simply reducing the array, and then explode one huge string...

I didn't write the implementation then, but it would look like this:

$data = $collection->reduce(function ($carry, $list) {
    return $carry.$list.',';
}, '');

// Trim the trailing comma first, so explode() doesn't
// produce an empty element at the end.
$data = array_unique(explode(',', rtrim($data, ',')));
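Either way, once $data has been deduplicated, handing it to the JavaScript is just a matter of JSON-encoding it. A sketch, assuming a Laravel controller method (note that array_unique preserves keys, so array_values is needed for the payload to encode as a JSON array rather than an object):

// Reindex so the payload encodes as a JSON array.
return response()->json(array_values($data));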

That got me thinking: which solution performs better? So I asked in Larachat. Well, no one knew, but someone suggested I just write a quick performance test, so I did. I skipped the array_unique in my performance test, since it would be identical in both solutions.

Benchmarking

My goal was, given a large array of relatively short comma-separated lists, to see which would perform better: creating lots of small arrays and merging them, or concatenating the strings and then creating one large array straight away.

After having done this, I also tested with fewer but larger lists, making sure to maintain the same total number of list elements. A sketch of how the three runs can be driven follows the list below.

Configurations

  1. 5000 lists of 15 items
  2. 50 lists of 1500 items
  3. 5 lists of 15000 items
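As promised, here is a minimal harness for driving the three configurations (my own sketch; the original benchmark below simply hardcodes $numberOfLists and $itemsPerList, and runBenchmark is a hypothetical wrapper around the seeding and timing code that follows):

$configurations = [
    [5000, 15],
    [50, 1500],
    [5, 15000],
];

foreach ($configurations as $config) {
    // Each run keeps the total item count at 75,000.
    list($numberOfLists, $itemsPerList) = $config;
    runBenchmark($numberOfLists, $itemsPerList);
}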

The code

To benchmark this, I used the following code:

<?php

// Make sure we don't run out of memory
ini_set('memory_limit', '512M');

$array = [];
$numberOfLists = 5000;
$itemsPerList = 15;

echo "Seeding array\n";

// Create an array of 5000 comma-separated
// lists with 15 varying items per list.
for ($i = 0; $i < $numberOfLists; $i++) {
    $temp = '';
    for ($j = 0; $j < $itemsPerList; $j++) {
        if (! empty($temp)) {
            $temp .= ',';
        }
        $temp .= 'item'.$j;
    }

    $array[] = $temp;
}

echo "Array seeded\n";

/*
 * Start the benchmark
 */

$start1 = microtime(true);

// Create an array of arrays
$exploded = array_map(function ($string) {
    return explode(',', $string);
}, $array);

// Merge the arrays. The loop variable is deliberately not
// named $array, so the source data stays intact for the
// second benchmark below.
$merged = [];

foreach ($exploded as $list) {
    // It appears this operation is what really takes time.
    $merged = array_merge($merged, $list);
}

$explodeAndMerge = microtime(true) - $start1;

$start2 = microtime(true);

$reduced = array_reduce($array, function ($carry, $list) {
    return $carry.$list.',';
}, '');

// The trailing comma leaves one empty element at the end,
// which is harmless for timing purposes.
$exploded = explode(',', $reduced);

$reduceAndExplode = microtime(true) - $start2;

echo "Benchmark complete\n".
    "==================\n".
    "Explode and merge: {$explodeAndMerge}\n".
    "Reduce and explode: {$reduceAndExplode}\n";

Results

Using the abbreviations EM for explode and merge, and RE for reduce and explode, the averages for each of the benchmarks are:

  1. EM: 22.9s, RE: 0.0026s
  2. EM: 0.21s, RE: 0.0027s
  3. EM: 0.04s, RE: 0.12s

Conclusions

From the above results, I have concluded a few things. First off: if you're going to do what I did, definitely go for reduce and explode. What appears to take the most time is the array_merge. However, judging by the other results, this becomes less of a problem as the number of arrays to merge decreases.

Between experiments 1 and 2, we had a hundredfold decrease in the number of array merges, which led to approximately a hundredfold speedup. Since the total number of elements was the same in both runs, this suggests the time is dominated by the number of array_merge calls rather than by the total amount of data: each call copies the entire accumulated array into a new one, so merging in a loop gets expensive quickly as the number of merges grows (which also makes sense from a low-level perspective).
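To see that copying cost in isolation, here's a small standalone test of my own (not part of the original benchmark) that holds the chunk size fixed and varies only the number of merges. Since n merges of k-item chunks perform on the order of k·n²/2 element copies, doubling the merge count should roughly quadruple the time:

$chunk = array_fill(0, 15, 'item');

foreach ([2000, 4000, 8000] as $merges) {
    $merged = [];
    $start = microtime(true);

    for ($i = 0; $i < $merges; $i++) {
        // Each call copies the whole accumulator so far.
        $merged = array_merge($merged, $chunk);
    }

    printf("%5d merges: %.4fs\n", $merges, microtime(true) - $start);
}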

Another interesting conclusion can be drawn from the final round of benchmarking: it's more efficient to run explode and array_merge five times than to concatenate a string five times and explode once.

Update: A Third Solution

After I had posted this and linked to it in Larachat, Joseph Silber pointed out something very relevant:

Why not just implode and explode?

After trying this, I can happily add that calling

explode(',', implode(',', $array));

will give the exact same output, but is much faster (note the ',' glue passed to implode; without it, the last item of one list would fuse with the first item of the next). For the different benchmarks, we see the following times:

  1. 2.27E-5 (0.0000227s)
  2. 0.00030s
  3. 0.0026s

As can be seen, this performs exceptionally well when you have many small lists.
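For completeness, the same trick translates straight back to collections. A sketch, assuming $collection holds the raw comma-separated strings from the database (Collection::implode joins the items with the given glue, and values() reindexes after unique()):

$data = collect(explode(',', $collection->implode(',')))
    ->unique()
    ->values();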