Packages vs. Components: The Dependency Problem.

The PHP landscape has fully transitioned into its Package Age™. Packagist is the main resource for retrieving new PHP packages: most frameworks split out packages from their main distribution to allow re-use outside of the framework. The world is a happy place.

DISCLAIMER: Based on the research for this blog post, I've disclosed details to @seldaek; some of these issues might be resolved already or will be in the near future.

The package PHProblem.

However, due to PHP's nature, there are some problems. While packages are great for re-use outside of frameworks, dependencies are still an issue. Namespaces resolve conflicts between class names, but they do not offer a solution to package versioning. Especially in a framework context, this can become very problematic. A real-world example of this is Guzzle.

Guzzle is a very popular tool for interacting over the HTTP protocol. It provides a solid abstraction for dealing with all things HTTP. One might say, on a grand scale, it's the go-to tool for HTTP. In the last year or so Guzzle's architecture has changed radically. This led to new major versions, as well as a new vendor name altogether. Why did this happen? Was there really a need for this? In short: yes.

Because of Guzzle's widespread adoption, releasing a new major version is not something to be taken lightly. Many packages out there depend on a specific version of Guzzle, which might not be the latest version. Take this into account on a larger scale and Guzzle may in fact be the blocking factor that keeps two totally unrelated packages from co-existing within a project. For example, if package A relies on Guzzle 5 and package B relies on Guzzle 6, packages A and B can't be installed in the same project. Guzzle has fixed this by publishing a new package. This, of course, is a problem. An unsolved problem in PHP.

While this problem is already pretty clear in a package context, its implications get magnified when pulling it into a framework context.

But dependencies are good, m'kay.

If packages are created in order to re-use code, we can obviously conclude: re-using packages is good! Right?

When creating packages, we want to target a specific task, even though many things are, or can be, related. For example, if you want to create a backup manager, your main task is to create a backup. However, the generated backup will have to be stored somewhere too. Are you responsible for implementing all the storage adapters? Not really.

A more flexible approach would be to rely on a package that does this for you. The backup-manager package by @mitchellvw and @ShawnMcCool is a good example of this. Its main focus is dealing with backups; the storage handling is offloaded to Flysystem. This not only saves them from having to deal with all these different implementations, it also improves the package's stability. Flysystem will adapt to external changes, so the backup-manager won't have to. In this case, re-use is wonderful: it solves a problem.

But what's the problem with components?!?!

Before going into the problem I'll define what I refer to as components.

While the terms are semantically the same, I choose, for the purpose of illustration, to use different names for them. When I refer to components, I mean framework-related packages. Components are packages, but packages are not always components. In the Symfony framework you can clearly identify components. They all have the word Component in the namespace. This can create problems in real-world scenarios.

I'm a fond user of frameworks. In my day-to-day life I work in Symfony 2, Laravel 4/5, and Silex applications. Not every project has the latest versions. Some projects are on Symfony 2.3, others on 2.7. So what happens when I create a package that depends on a (framework) component, and you want to use it? Let's investigate.

Framework-bound dependency scoping

The simplest limitation we might encounter is a dependency collision. Since this is a pretty well-defined concept I won't go into it in great detail.

When framework X relies on component A at version 1.0, and package B relies on component A at version 2.0, we have incompatible dependencies. Composer is then unable to find a version that satisfies both requirements and exits with an error. As a result, we're unable to use package B when we're using framework X. This is annoying but not difficult to overcome. Perhaps a different implementation of the same concept fits our dependency scope, or we can help the package upgrade its dependency. Creating your own solution is also still an option.
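When Composer exits with such an error, it can also tell us who is holding a package back. A minimal sketch, assuming Composer is on the PATH, dependencies have been installed, and the commands are run from the project root; the helper name `explain_conflict` and its arguments are made up for illustration:

```shell
# Hypothetical helper: explain why a package cannot be installed at a given version.
# Assumes Composer is installed and the current directory contains composer.json.
explain_conflict() {
    package=$1   # vendor/name of the contested component (component A)
    version=$2   # the version the new package asks for

    # List the installed packages that require $package, with their constraints.
    composer why "$package"

    # List the installed packages that block $package at $version.
    composer why-not "$package" "$version"
}
```

Called with the name of component A and the version package B asks for, this should point at framework X as the package pinning component A to 1.0.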

There is a silver lining to be found here. Composer was able to protect us from having incompatible versions. It's reliable, predictable, and secure.

Bundled component distributions

Another problem we might encounter is far more complex, leading to very unexpected behaviour. This occurs when frameworks re-publish parts of the framework as packages.

This common component publishing strategy is called sub-splitting. In this setup the components are developed within the main distribution of the framework, after which they're re-published as stand-alone components. Maintainers often use this approach because the distribution process can be automated and thus involves less human work. That's perfectly legit, but what happens when we rely on their components?
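As a rough illustration of the sub-split idea, here is what one publishing step could look like with `git subtree`. This is an assumption for illustration, not necessarily the tooling any particular framework uses; the helper name, the branch naming, and the remote are hypothetical:

```shell
# Hypothetical helper: publish one directory of a monolithic repository
# as the root of a stand-alone component repository.
# Assumes git with the subtree command is available and that the current
# directory is the main repository.
publish_subsplit() {
    prefix=$1   # directory of the component inside the main repository
    remote=$2   # URL or remote name of the stand-alone component repository
    branch="split/$(basename "$prefix")"

    # Build a branch with a synthetic history for $prefix only,
    # with the contents of $prefix moved to the repository root.
    git subtree split --prefix="$prefix" --branch="$branch"

    # Push that synthetic branch to the component repository.
    git push "$remote" "$branch:master"
}
```

For the event dispatcher this might be invoked as `publish_subsplit src/Symfony/Component/EventDispatcher <component-remote>`, which is exactly the kind of step that is easy to automate.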

Let's say you want to create a fancy new library which needs event handling. You could roll your own, but why not re-use something provided by the community? The obvious thing to do would be to visit Packagist and search for events, sorted by most downloaded. This gives us an indication of what the PHP community uses. Symfony's EventDispatcher Component is one of the most downloaded packages, and is also heavily relied upon, so we'll use that.

The first thing to figure out is which version to rely on. Instinctively we'll go with the latest stable release. At the time of writing it's 2.7, which also happens to be an LTS release. This seems like a sane choice: it's recent, and will be supported for a long time. However, we've already created a limitation by picking this version.

Imagine working on a legacy project. For those who don't know what that is like: it's a world where boundaries exist, you encounter limitations, and you'll just have to Deal with It™. That project might be based on any given framework. What happens when it's a project based on Symfony? Should be a match made in heaven, right? Wrong. The imaginary project we're working on is based on Symfony 2.3. Our amazing package was built using version 2.7 of the event dispatcher. As a proof of concept, let's install the event-dispatcher package at version 2.7 and see what happens. We can simulate this in a couple of easy steps.

First we'll use the installer to create a Symfony 2.3 project and change into the newly created directory.

symfony new my_application 2.3  
cd my_application  

Now we'll install the latest version of the Event Dispatcher component.

composer require symfony/event-dispatcher  

It installed just fine. But what really happened? Are we able to use the correct version of the event-dispatcher? To find out what will be loaded, we can look at the classmap generated by Composer.

composer dump-autoload --optimize  

This command will dump out all the classes it's able to autoload. It's basically a lookup table mapping class names to their location. Since this file contains a very large number of classes, we'll grep it.

grep EventDispatcher vendor/composer/autoload_classmap.php  

[Screenshot: output of the grep command]

This is where things start to become interesting. When we examine the output, notice how the root path to the package differs for certain classes. This shows us how the autoloader behaves when loading event-related classes. In this particular case the Event class is loaded from the Symfony distribution, while a newly introduced TraceableEventDispatcher class will be loaded from the event-dispatcher package. So what are the actual ramifications?

  1. We can determine that we've successfully installed two versions of the same package.
  2. Classes from the same package will be loaded from different locations.
  3. Code relying on 2.7 capabilities might receive the 2.3 implementation.
  4. Composer wasn't able to save us from this.
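The grep check can be condensed into a small helper that lists which installed packages provide classes for a given namespace fragment. A minimal sketch, assuming the classmap uses Composer's usual `$vendorDir . '/vendor-name/package-name/...'` entries; the function name `classmap_packages` is hypothetical:

```shell
# Hypothetical helper: list the packages that provide classes matching a pattern.
# Assumes classmap entries of the form:
#   'Class\\Name' => $vendorDir . '/vendor-name/package-name/path/to/File.php',
classmap_packages() {
    pattern=$1
    classmap=${2:-vendor/composer/autoload_classmap.php}

    # Keep matching entries, reduce each one to its vendor-name/package-name
    # prefix, and print every distinct prefix once.
    grep -F -e "$pattern" "$classmap" \
        | sed -n 's|.*[$]vendorDir [.] ./\([^/]*/[^/]*\)/.*|\1|p' \
        | sort -u
}
```

Run it as `classmap_packages EventDispatcher` after `composer dump-autoload --optimize`. More than one line of output means classes from that namespace are served by more than one installed package.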

Conclusion

We've seen that there are already two very distinct cases where relying on framework components can limit or potentially corrupt your application. This is a big deal. But how can we solve this? Simply put, there are three options.

  1. Don't rely on framework components.
  2. Don't rely on concrete implementations.
  3. Deal with it™.

The second option can go a long way. A great example of this is PSR-3 (Logging). Depending on this interface protects us from underlying dependency constraints, allowing us to move from one implementation to another. This is why PSRs are important.


photo by Linus Bohman