Typesafe complicated combinations

let created = `Date (2025, 4, 25) in

Currently I'm developing the hardware backend for pmmd (an interactive generative LFO system and BPM-modulation based rhythm machine) on the Raspberry Pi Zero - making it into a module for modular synthesizers.

pmmd is written using functional reactive programming (FRP) via OCamls react note and my utility-library fry. These tools are a perfect fit for declaratively programming reactive instruments, but concerning hardware interop I wanted to:

  • have high bandwidth and low-latency communication with the hardware via minimal allocation and no concurrency-scheduling (which can't be avoided when using lwt_react)
  • keep a high resiliency against crashes by avoiding letting C-bindings for handling GPIO crash the main application in unforseen ways. note

So to communicate with the hardware I setup a bunch of ZMQ pub/sub sockets (using ocaml-zmq) between pmmd and a set of OCaml daemons that each handle the different hardware efficiently by coding them in a non-allocating and low-level style note. I list some reasons for why I didn't choose multicore OCaml instead of ZMQ here.

Escalating complexity

This meant that I had a bunch of ZMQ socket-URI's that was intended for different types of ZMQ connections, having several different pub/sub subscriptions for carrying marshalled OCaml values between a bunch of different processes. These processes all need to correctly combine all of this data, types and functions in very specific ways.

To begin with the complexity was small enough that it was OK to just use labeled arguments to carry the socket-URIs, and manually add types to Marshal.{to/from}_string on each end of the sockets.

But this fast became unwieldy when I added more sockets, subscriptions and began supporting proxied pub-sockets.

Making illegal states unrepresentable

To restrict what data and functions can be combined I created a shared library as a single point of reference. The invariants I wanted to express were:

  • a socket-URI is bound to a specific socket
  • ZMQ procedures should be limited to work on compatible socket-variants like proxied-pub, pub and sub
  • socket-I/O is bound to
    • one or two (if socket is proxied) sockets
    • a subscription-channel (there can be several subscriptions on a single socket)
      • some subscriptions are chosen at runtime based on an index
    • a type for the marshalled data

For this I chose to use a mix of phantom types and OCamls module system. The positive experiences with this made me want to write this blogpost - e.g. that:

  • it's an expressive but easy-to-read alternative to GADTs
  • it's a method that allows bottom up programming from less typed to more typed, without needing much change in the use of the data
  • it's a very OCaml'y solution to problems via its module system, abstract types and first-class modules
  • it's a computation-/memory-efficient way to encode invariants in the typesystem, as one doesn't need to carry around a set of functions together with the data

In the following section I'll show examples of how the different typed values are defined and used, to motivate the implementation.

The typed definitions in a shared library

I came to define the sockets like this:

module Sockets = struct
      
  module Events_in = Socket.Make(struct
      
      type id = [ `Events_in ]
      type server_typ = Io_zmq.Server_type.pub
      
      let default_uri = "ipc:///tmp/gpioio-events-in.sock"
                        |> Io_zmq.Uri.make_pub
      
    end)

  module Events_in_proxy = Socket.Make(struct
      
      type id = [ `Events_in_proxy ]
      type server_typ = Io_zmq.Server_type.pub_proxied
      
      let default_uri = "ipc:///tmp/gpioio-events-in-proxy.sock"
                        |> Io_zmq.Uri.make_pub_proxied
      
    end)

  ...

end

Here Events_in and Events_in_proxy represent two different ZMQ socket endpoints that are later bound together in the following IO modules. The Socket.Make module-functor creates a typesafe Socket.Endpoint (via the id phantom-type note) and some corresponding helper-functions for cmdliner support. You'll later see why a Socket.t is a variant of Socket.Endpoint.t.

Io_zmq is a small wrapper library that I extended to only take typed URI's (also via phantom-types) - it's e.g. only possible to use the pub_proxied URI to create a proxied publisher.

In effect there are two layers of phantom-type wrappers of URIs for different purposes - the first binds the string URI to a specific socket endpoint, the second to a specific ZMQ socket type. Note that phantom-type wrappers only exist in the type-system and doesn't lead to extra allocation.

The id type is also used later to limit certain types of IO to a single or several designated socket(s):

module Sockets_IO = struct

  module Action = Socket.IO.Make_proxied(struct

      type marshalled = P.Action.t

      type socket_in = Sockets.Events_in_proxy.id
      type socket_out = Sockets.Events_in.id

      type subscription_index = unit
      let subscription () = "ac"

    end)

  module Primary_encoder = Socket.IO.Make_proxied(struct

      type marshalled = Direction.t

      type socket_in = Sockets.Events_in_proxy.id
      type socket_out = Sockets.Events_in.id

      type subscription_index = unit
      let subscription () = "pe"

    end)

  module Cv_in = Socket.IO.Make(struct

      type marshalled = Cv.t
      type socket = Sockets.Cv_in.id

      type subscription_index = int
      let subscription = Cv.pubsub_prefix_of_index

    end)

  ...

end

Here two different Socket.IO.Make[_proxied] functors are called depending on if several or a single socket is involved. The marshalled type is used for typesafe marshalling of values on the sockets, and the subscription function is used to specify a sub-channel on the pub/sub socket to separate the different data streams. Also note that the Sockets_IO.Cv_in chooses its subscription based on an index.

The typesafe combinations of data and functions

For the cmdliner CLI of one of the processes, one of the typed socket parameters is defined like this:

open Cmdliner

let socket_events_in = 
  let doc =
    "The ZMQ-socket URI to use for events-in communication \
     between pmmd and gpioio. \
     See ZMQ documentation for supported socket-URIs."
  in
  let docv = "URI" in
  let format_conv = Arg.conv' (
    T.Sockets.Events_in.parse,
    T.Sockets.Events_in.pp
  ) in
  Arg.(
    value
    & opt format_conv T.Sockets.Events_in.default_uri
    & info [ "socket-events-in" ] ~docv ~doc)

.. where the type given to the users --socket-events-in argument is:

(
  [ `Events_in ],
  Io_zmq.Server_type.pub Io_zmq.Uri.M.t
) T.Socket.Endpoint.t

Later the Socket.Endpoint.t can be mapped to contain another type of value, but still keep the phantom-type identity (the Socket.id we defined earlier):

...
let socket_rising_edge =
  socket_proxy_events_in
  |> T.Socket.Endpoint.map
    (Io_zmq.make_pub_socket_proxied ~ctx:ctx_rising_edge)
in
...

.. where Io_zmq.make_pub_socket_proxied only accepts URI's that are of type:

Io_zmq.Server_type.pub_proxied Io_zmq.Uri.M.t

On the value-level, Socket.Endpoint.map is just the identity function - and now you saw why a Socket.t is a Socket.Endpoint.t - a socket starts out as a string, and can become other things bound to the same Socket.id - propagating the Socket.id through the code to the point of use. We can e.g. use this resulting wrapped Zmq.Socket.t in a function that abstracts over any kind of Socket.id and Socket.IO.S and guarantees a correct combination:

module Send = struct
      
  let make : type marshalled socket_id subscription_index.
    (module T.Socket.IO.S with
      type marshalled = marshalled and
      type socket_in = socket_id and
      type subscription_index = subscription_index
    )
    -> (socket_id, _) T.Socket.Endpoint.t
    -> marshalled -> subscription_index -> unit =
    fun m_socket_io -> 
      let module IO = (val m_socket_io) in
      fun socket event subscription_index ->
        Zmq.Socket.send_all (IO.get_endpoint_in socket) [
          IO.subscription subscription_index;
          IO.marshal event;
        ]
      
  let action = make (module T.Sockets_IO.Action)
  let primary_encoder = make (module T.Sockets_IO.Primary_encoder)
  let shutdown = make (module T.Sockets_IO.Shutdown)
  let trigger = make (module T.Sockets_IO.Trigger_in)
  ...
      
end

.. where we e.g. use it like this:

...
let callback_rising_edge : unit -> unit = fun () ->
  Send.action socket_rising_edge action ()
in
...

The make function is partially applied with e.g. the T.Sockets_IO.Action module and uses the locally abstract types marshalled, socket_id and subscription_index to express the relation between the T.Sockets_IO module and the given parameters. This means that the make function returns a different type for the resulting partial application, depending on the module it's applied to.

This is all just a subset of the sockets, IO-modules and uses in the pmmd codebase. I think the examples show a good optimum for expressive power of the type and module systems of OCaml, while having a simple interface.

Type errors

As the UI of our fancy new typesafe code is the printed errors from the compiler, I'll show some examples of what happens if we combine the wrong things.

Let's first try to pass the wrong default URI in our CLI definition seen earlier:

File "bin/patch_03_gpioio/cli.ml", line 82, characters 26-53:
82 |         & opt format_conv T.Sockets.Cv_in.default_uri
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: This expression has type
         T.Sockets.Cv_in.t =
           (T.Sockets_IO.Primary_encoder_scaler.socket_out,
            Io_zmq.Server_type.pub Io_zmq.Uri.M.t)
           T.Socket.Endpoint.t
       but an expression was expected of type
         T.Sockets.Events_in.t =
           (T.Sockets_IO.Shutdown.socket_out,
            Io_zmq.Server_type.pub Io_zmq.Uri.M.t)
           T.Socket.Endpoint.t
       Type T.Sockets_IO.Primary_encoder_scaler.socket_out = [ `Cv_in ]
       is not compatible with type
         T.Sockets_IO.Shutdown.socket_out = [ `Events_in ] 
       These two variant types have no intersection
Had 1 error, waiting for filesystem changes...    

.. we see that type-aliasing happens between the different T.Sockets_IO modules which doesn't help the error-messages, but it's still obvious what's wrong. We both have T.Sockets.Cv_in.t vs T.Sockets.Events_in.t and [ `Cv_in ] vs [ `Events_in ] directly communicate the problem.

Now let's try to map the wrong socket-URI in the example seen earlier:

File "bin/patch_03_gpioio/events_in_wiringpi.ml", line 201, characters 8-25:
201 |         socket_trigger_in
              ^^^^^^^^^^^^^^^^^
Error: This expression has type
         (T.Sockets_IO.Shutdown.socket_in,
          Io_zmq.Server_type.pub_proxied Io_zmq.Uri.M.t)
         T.Socket.Endpoint.t
       but an expression was expected of type
         (T.Sockets_IO.Shutdown.socket_in,
          Io_zmq.Server_type.pub Io_zmq.Uri.M.t)
         T.Socket.Endpoint.t
       Type Io_zmq.Server_type.pub_proxied = [ `Pub_proxied ]
       is not compatible with type Io_zmq.Server_type.pub = [ `Pub ] 
       These two variant types have no intersection
Had 1 error, waiting for filesystem changes...

.. here we actually first get a type-error from the typed Zmq wrapper, because the two socket-types differ. So let's also use the wrong Zmq function, i.e. use Io_zmq.make_pub_socket instead:

File "bin/patch_03_gpioio/events_in_wiringpi.ml", line 211, characters 21-39:
211 |         Send.trigger socket_rising_edge ()
                           ^^^^^^^^^^^^^^^^^^
Error: This expression has type
         (T.Sockets_IO.Shutdown.socket_in, [ `Pub ] Zmq.Socket.t)
         T.Socket.Endpoint.t
       but an expression was expected of type
         (T.Sockets_IO.Trigger_in.socket_out, 'a Zmq.Socket.t)
         T.Socket.Endpoint.t
       Type T.Sockets_IO.Shutdown.socket_in = [ `Events_in_proxy ]
       is not compatible with type
         T.Sockets_IO.Trigger_in.socket_out = [ `Trigger_in ] 
       These two variant types have no intersection
Had 1 error, waiting for filesystem changes...

.. now we also get a type-error, but because the type-system is trying to make sense of the types internally to our current function, where we use several sockets; type-inference makes another socket get the wrong type instead. This wouldn't happen if I had hardcoded the socket types in a module interface around this function - but if you are used to type-inference, this is easy to infer. Also we get informed directly of what socket-types we have mixed up via [ `Events_in_proxy ] and [ `Trigger_in ].

If we instead use the wrong Send function in the code we saw earlier, we get:

File "bin/patch_03_gpioio/events_in_wiringpi.ml", line 97, characters 48-52:
97 |         Send.primary_encoder socket_rising_edge action ()
                                                     ^^^^^^
Error: This expression has type [> T.Pinspec.Action.nonmultiplexed ]
       but an expression was expected of type T.Direction.t
       The second variant type does not allow tag(s) `Drop_wave, `Redo_wave
Had 1 error, waiting for filesystem changes...

.. where it now is the type of the marshalled value that helps us do the right thing.

In these examples you can go several steps further and do more things that wrongly fixes the first type-error, where another type-error then pops up because of further inconsistencies being down the line.

The implementation

The first submodule I'll introduce is the Socket.Endpoint which wraps a value with an 'id phantom-type. Note that the phantom type safety comes from the fact the the ENDPOINT module type makes the ('id, 'a) t abstract; the only way to insert or extract an 'a is via the Unsafe module, which is only used internally by the later implementation.

module Socket = struct

  module type ENDPOINT = sig

    type ('id, 'a) t

    module Unsafe : sig
      val v : 'a -> ('id, 'a) t
      val get : ('id, 'a) t -> 'a
    end
    
    val map : ('a -> 'b) -> ('id, 'a) t -> ('id, 'b) t

  end

  module Endpoint : ENDPOINT = struct

    type ('id, 'a) t = 'a

    module Unsafe = struct

      let v = CCFun.id
      let get = CCFun.id

    end

    let map f v = f v 

  end

  ... 

end

The Unsafe.v and Unsafe.get are just the identity-function, and Endpoint.map is function application. As seen in the earlier example, Endpoint.map lets you carry the phantom-type through the codebase, expressing the source of the contained value.

For now the 'id is not bound to any specific type. For even more typesafety against 'malicious' use, one could potentially hide Unsafe behind a module type that covers the whole of Socket.

Let's see how Socket.Endpoint is used inside the Socket.Make module-functor:

module Socket = struct
      
  ...
      
  module type PARAM = sig
    type id
    type server_typ
    val default_uri : server_typ Io_zmq.Uri.t
  end

  module Make(P:PARAM) = struct

    type id = P.id

    type t = (id, P.server_typ Io_zmq.Uri.t) Endpoint.t

    let default_uri : t = Endpoint.Unsafe.v P.default_uri

    let parse : string -> (t, string) result = fun uri ->
      uri
      |> Io_zmq.Uri.Unsafe.v
      |> Endpoint.Unsafe.v 
      |> CCResult.pure

    let to_string : t -> string = fun uri_endpoint ->
      uri_endpoint
      |> Endpoint.Unsafe.get 
      |> Io_zmq.Uri.Unsafe.get

    let pp fmt v = Format.fprintf fmt "%s" (to_string v)

  end

  ...

end

The PARAM module type expresses what we need to supply to create a socket. Its id type is re-exported, and is used as the phantom-type parameter of Endpoint.t when defining type t - i.e. it is here we bind the phantom-type to be a specific type.

default_uri is also re-exported, but is wrapped in the Endpoint.t. After that we define the helpers needed to parse/print this typed value with cmdliner.

Also note that when we Endpoint.map the type t, we no longer have a type t.

The last part of the implementation is the module-functors used to create the Sockets_IO modules:

module Socket = struct

  ...

  module IO = struct

    module Param = struct
      
      module type STANDARD = sig
        type marshalled
        type socket
        type subscription_index
        val subscription : subscription_index -> string
      end

      module type PROXIED = sig
        type marshalled
        type socket_in
        type socket_out
        type subscription_index
        val subscription : subscription_index -> string
      end
      
    end

    module type S = sig

      type marshalled
      type socket_in
      type socket_out
      type subscription_index
        
      val subscription : subscription_index -> string
      val marshal : marshalled -> string
      val unmarshal : string -> marshalled
        
      val get_endpoint_in : (socket_in, 'e) Endpoint.t -> 'e
      val get_endpoint_out : (socket_out, 'e) Endpoint.t -> 'e

    end

    module Make_proxied(P:Param.PROXIED) : S with
      type marshalled = P.marshalled and
      type socket_in = P.socket_in and
      type socket_out = P.socket_out and
      type subscription_index = P.subscription_index
    = struct

      type marshalled = P.marshalled
      type socket_in = P.socket_in
      type socket_out = P.socket_out
      type subscription_index = P.subscription_index

      let subscription = P.subscription

      let marshal v = Marshal.to_string v []
      let unmarshal str = Marshal.from_string str 0

      let get_endpoint_in = Endpoint.Unsafe.get
      let get_endpoint_out = Endpoint.Unsafe.get

    end

    module Make(P:Param.STANDARD) = struct
      
      include Make_proxied(struct
          type marshalled = P.marshalled
          type socket_in = P.socket
          type socket_out = P.socket
          type subscription_index = P.subscription_index
          let subscription = P.subscription
        end)
          
    end

  end

end

.. here we have the possibility of creating an IO module from two different parameter signatures: IO.Param.STANDARD and IO.Param.PROXIED. The resulting module-signatures of the corresponding IO.Make and IO.Make_proxied are both IO.S, but we only supply a single socket-id to IO.Make. As they have the same abstract module-signature, we can abstract over them together, which we did in Send.make.

The typesafe way to extract the wrapped values from the Endpoint.t is via get_endpoint_in and get_endpoint_out - in practice this means that you can't use the wrong socket with an IO module, as it's only the IO modules that know how to extract the sockets that fit them.

Thoughts

As mentioned in the introduction, doing bottom-up programming - adding the typesafety incrementally - felt straight forward when using the chosen method:

  • sockets are initially still just URI-strings behind the scenes - so almost nothing changed at the value-level
  • the Socket.t type is also a Socket.Endpoint.t, which by being mapped can change to become something else - carrying along an identity of where it stems from. E.g. just needed to change |> to become Endpoint.map
  • through development, socket I/O turned out to be different from sockets themselves; but was straightforward to express as an extension to the same shared library and using the same typesystem tools
  • wrapping less typesafe libraries and value-relations is simple to do relative to the gains. I didn't show the Io_zmq.Uri implementation as it's almost equal to Socket.Endpoint.

Examples of other usecases I've had for PTs has e.g. been:

  • typesafe parameterized URI-paths for web-server routing and links
  • read-only refence views

I hope this blogpost gives you the feeling that a lot of typesafety can be gotten without paying too much for it. Simplicity is essential for widespread application of typesystem features, and for ease of maintainability.

Footnotes

  • react: This is not the the web-UI library by "the book of faces selling out to China" - they named their library later. I actually make use of the extension called lwt_react, which extends the FRP semantics with concurrency. This means that you can run Lwt-threads within the react monad, and let the resulting FRP event be delayed by the thread.

  • GPIO binding: I recently made a new WiringPi binding for OCaml available.

  • resiliency: You can argue that my choice of using the ZMQ C-library is a source of the same kinds of failures; but at least it's just a single library, and I already have experience with its failures.

  • ZMQ vs multicore OCaml:

    • historically, performance of ephemerons (which backs react) has been bad in OCaml5 - possibly the status is better today
    • I want to be able to recover from C-library segfaults without crashing the main program - this doesn't seem possible with OCaml5 - otoh. you can simply restart the process if it runs independently.
    • ZMQ allows doing interesting stuff like running your UI on a different machine from where the hardware is, and I already have a lot of experience with ZMQ from my usage of it in niseq
    • the world of OCaml5 havn't stabilized on how to do concurrency and multicore - and I want to continue using lwt, partly because of lwt_react (though some new effects-based concurrency libraries also has lwt-compatibility). I BTW have no problem with monadic concurrency, and havn't yet observed the big gain in direct style concurrency.
  • id type: Note that the id type could have been auto-generated by Socket.Make, but the type-errors gets less readable because of type-aliasing across the different Sockets and Sockets_IO definitions.