When designing systems, there is no one-size-fits-all solution.
That's because these systems need to be used in different contexts and therefore need to exhibit different characteristics.
The same goes for the offline characteristic of software capabilities. There is a spectrum of online/offline needs, and each point on that spectrum requires a different design.
Here are my recommendations for architectural styles when designing systems that need to run offline:
- For capabilities that are always online, use the CQRS architectural style.
- For capabilities that always run offline, consider client-side event sourcing.
- When a capability needs to deal with occasional network drops, augment the CQRS architecture with the cache aside and the store and forward patterns.
- When a capability needs to go online occasionally, augment the client-side event sourcing pattern with replication.
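To make the spectrum concrete, the recommendations above can be written down as a simple lookup. This is only an illustrative sketch: the `Connectivity` enum, the `RECOMMENDED_STYLES` table and the `recommend` function are names I made up for this example, and the profile labels are borrowed from the terms used in this post.

```python
from enum import Enum
from typing import Dict, List


class Connectivity(Enum):
    """Hypothetical labels for the points on the online/offline spectrum."""
    ALWAYS_ONLINE = "always online"
    OCCASIONALLY_OFFLINE = "occasionally offline"  # occasional network drops
    OCCASIONALLY_ONLINE = "occasionally online"    # mostly offline, syncs now and then
    ALWAYS_OFFLINE = "always offline"


# The four recommendations from the list above, as data.
RECOMMENDED_STYLES: Dict[Connectivity, List[str]] = {
    Connectivity.ALWAYS_ONLINE: ["CQRS"],
    Connectivity.OCCASIONALLY_OFFLINE: ["CQRS", "cache aside", "store and forward"],
    Connectivity.OCCASIONALLY_ONLINE: ["client-side event sourcing", "replication"],
    Connectivity.ALWAYS_OFFLINE: ["client-side event sourcing"],
}


def recommend(connectivity: Connectivity) -> List[str]:
    """Return the style, plus any augmenting patterns, for a connectivity profile."""
    return RECOMMENDED_STYLES[connectivity]


if __name__ == "__main__":
    for profile in Connectivity:
        print(f"{profile.value}: {' + '.join(recommend(profile))}")
```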
Changing contexts
These style recommendations are very useful when you know the exact context that a capability will be used in.
Certain capabilities however will be used in different contexts and need to adapt their behavior accordingly.
One such example in ClubManagement is booking orders in the fundraising capability.
When people book orders, there are 3 possible contexts in which this capability is used:
- Always online: When people place their orders via the club website, remote validation is required to prevent them from cheating (e.g. by adjusting item prices). Therefore CQRS is the best style to use in this context.
- Occasionally offline/occasionally online: When the fundraising manager books orders on behalf of other people via her administrative app, validation can happen on the client side. I can trust this person, as she is working on orders for her own club and has full control of the data anyway, so the design could gravitate more towards the occasionally online side of the spectrum. Yet near-real-time feedback on incoming orders is desired.
- Occasionally online: When the check-in clerk is processing orders at the fundraising event itself and a last-minute order comes in, the capability needs to work offline for up to 8 hours and will only go online again after the fundraising event has ended. Therefore client-side validation is a must, and so is replication, as conflicts are likely in this 8-hour window.
So now the question becomes: is it possible to mix and match these styles to accommodate changing needs?
Connection points
In short: Yes! It is possible to combine the architectural styles when you have to.
The drawing below shows all of them together, with two sections highlighted using dotted rectangles.
These are the points where you can connect the styles together.
Registry & cache aside
On the client, an occasionally offline read side, implemented with the cache aside pattern, can be combined with an occasionally online write side, implemented with client-side event sourcing, which uses the cache as its registry.
This combination is great for contexts where you can trust the user, e.g. the management app used by the fundraising manager. By combining these patterns, the fundraising manager gets new bookings from the club website as they come in while she's online, and when her connection drops she can still continue entering other bookings without any impact.
Note: For the check-in clerk's app, this design is also an option, but you'd have to add a way to manually refresh the data in the registry instead of trying to refresh it on every cache hit/miss. Otherwise that load operation would be timing out for 8 hours straight, which is quite a waste of time and resources.
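Here is a minimal sketch of what this combination could look like on the client. Everything in it is an assumption made for illustration: `BookingRegistry`, `BookingClient`, the `fetch_remote` callable, the `auto_refresh` flag and the event shape are hypothetical names, not part of ClubManagement. The `auto_refresh` flag is one way to get the manual refresh that the note above suggests for the check-in clerk's app.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class BookingRegistry:
    """Cache aside read side: serve from the local cache, refresh it when possible."""
    fetch_remote: Callable[[], Dict[str, dict]]  # hypothetical remote query
    auto_refresh: bool = True  # set to False to refresh manually (long offline periods)
    cache: Dict[str, dict] = field(default_factory=dict)

    def refresh(self) -> bool:
        """Try to pull the latest bookings; return False when offline."""
        try:
            self.cache.update(self.fetch_remote())
            return True
        except ConnectionError:
            return False  # offline: keep serving what is already cached

    def get(self, order_id: str) -> Optional[dict]:
        if self.auto_refresh:
            self.refresh()
        return self.cache.get(order_id)


@dataclass
class BookingClient:
    """Write side: validate locally, record events, replicate them later."""
    registry: BookingRegistry
    pending_events: List[dict] = field(default_factory=list)

    def book_order(self, order_id: str, item: str, quantity: int) -> dict:
        # Client-side validation, using the cache as the registry.
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if self.registry.get(order_id) is not None:
            raise ValueError("order already booked")
        event = {"type": "OrderBooked", "order_id": order_id,
                 "item": item, "quantity": quantity}
        self.pending_events.append(event)        # waits here until replication
        self.registry.cache[order_id] = event    # keep the local read side current
        return event
```

With `auto_refresh=True` the manager sees website bookings as soon as a refresh succeeds, while `book_order` keeps working when `fetch_remote` raises `ConnectionError`. How and when `pending_events` get replicated to the server is left out of this sketch.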
Aggregate Root
Combining an always online context with an occasionally offline one can be done server side, by adding both command handling and event replication capabilities to the aggregate root.
When you expose the same capability to different contexts, you are likely to run into conflicts. How to resolve these conflicts between the two sides depends on the nature of the capability. But usually the side where event replication can be supported will also be the side with more trustworthiness and authority, which more often than not allows you to resolve the conflict by 'rebasing' the replicated events on top of the events emitted by handling the command.
In the above example, this means that the events emitted by the administrative apps would be appended after the events emitted by the public website, and would therefore overrule the command's result.
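A rough sketch of such an aggregate root follows. It assumes a deliberately naive model in which state is folded from events and later events overwrite earlier ones. The `OrderAggregate` class, its method names and the event shape are hypothetical and only serve to show the rebasing idea.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderAggregate:
    """Aggregate root that supports both command handling and event replication."""
    events: List[dict] = field(default_factory=list)

    @property
    def version(self) -> int:
        return len(self.events)

    def state(self) -> dict:
        # Fold the event stream; later events overrule earlier ones.
        current: dict = {}
        for event in self.events:
            current.update(event["data"])
        return current

    def handle_command(self, data: dict) -> dict:
        """Always online path: validate on the server, then emit an event."""
        if data.get("quantity", 1) <= 0:
            raise ValueError("quantity must be positive")
        event = {"source": "website", "data": data}
        self.events.append(event)
        return event

    def replicate(self, client_events: List[dict], based_on_version: int) -> List[dict]:
        """Occasionally online path: accept events produced by a trusted client.

        Events appended since the client's last known version are concurrent
        with the replicated ones. Instead of rejecting the client events, they
        are rebased: appended after everything already in the stream.
        """
        concurrent = self.events[based_on_version:]
        self.events.extend(client_events)
        return concurrent  # returned so the caller can inspect what was overruled


if __name__ == "__main__":
    order = OrderAggregate()
    order.handle_command({"item": "cake", "quantity": 2})           # via the website
    order.replicate([{"source": "admin app", "data": {"quantity": 3}}],
                    based_on_version=0)                             # booked offline
    print(order.state())
```

In this toy run the replicated event from the administrative app lands after the website's event, so the folded state ends up with the quantity from the administrative app. Real conflict resolution will depend on the nature of the capability.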
Keep it simple
Note: Just because you can combine all these patterns doesn't mean that you should.
Keep it as simple as the context(s) require!