September 1, 2024

Using NimbleParsec to solve issues

In this article, I would like to explain how to parse parts of an invalid JSON string using NimbleParsec.

From the documentation:

NimbleParsec is a simple and fast library for text-based parser combinators. Combinators are composed programmatically and compiled into multiple clauses with binary matching.

The issue

In an Elixir-Phoenix environment, the client sent invalid JSON as a push message via a channel. The server could not decode the JSON and therefore aborted the WebSocket connection. The client then started a new connection and sent the same invalid JSON again as a push message via the channel.

It would be easier if the client did not send invalid JSON. The problem here is that the payload and the Phoenix channel data are sent together as one JSON document. If one of the two is invalid, then the entire document is invalid.

The server is therefore no longer able to route the message to the channel process, which could otherwise reply to the client with a suitable error message. The client could then handle the error and stop sending the invalid JSON.

The problem with Phoenix is this line in the code:

  [join_ref, ref, topic, event, payload | _] = Phoenix.json_library().decode!(raw_message)

The payload contains the escape sequence \uded0, a lone UTF-16 low surrogate that does not encode a valid character on its own. The raw text is a valid UTF-8 string, but the JSON decoder tries to interpret the escape sequence, which results in an error:

iex [09:59 :: 2] > s = ~S/{"key": "Value: \uded0"}/
"{\"key\": \"Value: \\uded0\"}"
iex [09:59 :: 3] > Jason.decode!(s)
** (Jason.DecodeError) unexpected sequence at position 16: "\\uded0"
     (jason 1.4.4) lib/jason.ex:92: Jason.decode!/2
iex:3: (file)
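
To see why the decoder rejects this escape, it helps to look at the code point it names. The following is a minimal sketch using only the Elixir standard library; the module name SurrogateCheck and its classify/1 function are made up for illustration. It shows that the raw text is valid UTF-8 (the escape is just six ASCII characters), while the code point falls into the UTF-16 low-surrogate range, which cannot stand alone.

```elixir
defmodule SurrogateCheck do
  # UTF-16 surrogate ranges; code points in these ranges are not valid
  # Unicode scalar values on their own.
  @high_surrogates 0xD800..0xDBFF
  @low_surrogates 0xDC00..0xDFFF

  # Classifies the four hex digits of a JSON \uXXXX escape.
  def classify(hex) when is_binary(hex) do
    code_point = String.to_integer(hex, 16)

    cond do
      code_point in @high_surrogates -> :high_surrogate
      code_point in @low_surrogates -> :low_surrogate
      true -> :scalar_value
    end
  end
end

raw = ~S/{"key": "Value: \uded0"}/

# The raw text contains a backslash followed by plain ASCII, so it is valid UTF-8.
IO.inspect(String.valid?(raw))
# => true

# The escape names a low surrogate without a preceding high surrogate.
IO.inspect(SurrogateCheck.classify("ded0"))
# => :low_surrogate
```

A low surrogate is only meaningful as the second half of a surrogate pair, so a decoder that validates escapes has nothing to turn \uded0 into.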

How to solve this issue?

To return an error to the client instead of disconnecting the WebSocket connection, we need to extract the first four parts of the message.

The client sends four strings before the payload. We can extract them without using the JSON decoder by parsing just those parts. After that, we replace the payload with a specific error map. The server can then propagate the message to the right channel process, where the process can send the error message to the client.

Of course, it would be much easier if the client simply sent valid JSON, but sometimes this is apparently not possible.

The Phoenix Message

The client sends four strings in an array, and the last element of the array is the payload:

["1","2","Topic","Event",{...}]

We can use NimbleParsec to write a parser for the first part of the array:

defmodule PrefixDecoder do
  import NimbleParsec

  string_with_quotes =
    ignore(ascii_char([?"]))
    |> repeat_while(
      choice([
        ~S(\") |> string() |> replace(?"),
        utf8_char([])
      ]),
      {:not_quote, []}
    )
    |> ignore(ascii_char([?"]))
    |> reduce({List, :to_string, []})

  defp not_quote(<<?", _::binary>>, context, _, _), do: {:halt, context}
  defp not_quote(_, context, _, _), do: {:cont, context}

  defparsec(
    :header,
    ignore(string("["))
    |> concat(string_with_quotes)
    |> ignore(string(","))
    |> concat(string_with_quotes)
    |> ignore(string(","))
    |> concat(string_with_quotes)
    |> ignore(string(","))
    |> concat(string_with_quotes)
    |> ignore(string(","))
  )
end
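
The callback passed to repeat_while decides whether to continue by looking at the first byte of the remaining input. That lookahead is ordinary binary pattern matching and can be tried in isolation. The following sketch uses only the standard library; the module QuoteLookahead and its next_step/1 function are made-up names, not part of NimbleParsec.

```elixir
defmodule QuoteLookahead do
  # Halt when the remaining input starts with a double quote ...
  def next_step(<<?", _rest::binary>>), do: :halt
  # ... and continue for anything else, including an escaped quote,
  # whose first byte is the backslash.
  def next_step(_rest), do: :cont
end

IO.inspect(QuoteLookahead.next_step(~S(",rest)))
# => :halt
IO.inspect(QuoteLookahead.next_step(~S(\"quoted)))
# => :cont
IO.inspect(QuoteLookahead.next_step("abc"))
# => :cont
```

Because an escaped quote begins with a backslash, the loop continues and the first branch of the choice can consume the two-character sequence as a single quote character.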

Now we can parse fragments of the invalid JSON string. Let’s do some tests:

iex 1> PrefixDecoder.header(~S/["1","2","Topic","Event",the rest of the string/)
{:ok, ["1", "2", "Topic", "Event"], "the rest of the string", %{}, {1, 0}, 25}
iex 2> PrefixDecoder.header(~S/["1",error,"Topic","Event",the rest of the string/)
{:error, "expected ASCII character equal to \"\\\"\"",
 "error,\"Topic\",\"Event\",the rest of the string", %{}, {1, 0}, 5}

The parser is very strict because we don’t handle whitespace so far. This is okay, since the client doesn’t insert whitespace between the individual strings. Let’s make a final test. We create an invalid message like the one the client sometimes sends, and then we verify that we can extract the first four strings from it:

iex 3> message = ~S/["1","2","Topic","Event",{"title": "This does not work: \uded0"}]/
"[\"1\",\"2\",\"Topic\",\"Event\",{\"title\": \"This does not work: \\uded0\"}]"
iex 4> Jason.decode!(message)
** (Jason.DecodeError) unexpected sequence at position 56: "\\uded0"
    (jason 1.4.4) lib/jason.ex:92: Jason.decode!/2
    iex:4: (file)
iex 4> PrefixDecoder.header(message)
{:ok, ["1", "2", "Topic", "Event"],
 "{\"title\": \"This does not work: \\uded0\"}]", %{}, {1, 0}, 25}

Now we can put everything together and provide our own implementation of the Phoenix.Socket.Serializer behaviour. We only need to replace the decode_text function:

  def decode_text(raw_message) do
    [join_ref, ref, topic, event, payload | _] = decode_raw_message(raw_message)

    %Message{
      topic: topic,
      event: event,
      payload: payload,
      ref: ref,
      join_ref: join_ref
    }
  end

  defp decode_raw_message(raw_message) do
    case Jason.decode(raw_message) do
      {:ok, result} ->
        result

      {:error, error} ->
        Logger.warning("Unable to decode JSON message: #{inspect(raw_message)}, because of #{inspect(error)}")
        PrefixDecoder.decode(raw_message, error)
    end
  end

The decode/2 function lives in the PrefixDecoder module (which also needs require Logger for the log call) and looks like this:

def decode(raw_message, error) do
  case header(raw_message) do
    {:ok, [join_ref, ref, topic, event], _rest, _context, _line, _column} ->
      [join_ref, ref, topic, event, %{decode_error: "#{inspect(error)}"}]

    other ->
      Logger.warning("Unable to decode invalid raw message because of #{inspect(other)}")
      raise error
  end
end

By configuring our own serializer implementation in the Endpoint module, we are able to handle those invalid JSON messages. The message is propagated to the right channel process, and the process can reply with an error message which can be handled by the client.
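
Wiring this up might look roughly like the following sketch. It is a fragment that only compiles inside a Phoenix application. The module names MyApp.SafeJSONSerializer, MyAppWeb.Endpoint, MyAppWeb.UserSocket and MyAppWeb.RoomChannel are placeholders, and the sketch assumes the custom serializer is a copy of Phoenix.Socket.V2.JSONSerializer with the modified decode_text function. The endpoint registers the serializer through the :serializer option of the websocket configuration, and the channel matches on the decode_error key that decode/2 puts into the payload.

```elixir
defmodule MyAppWeb.Endpoint do
  use Phoenix.Endpoint, otp_app: :my_app

  # Register the custom serializer for protocol version 2.
  # Only clients speaking version 2.x of the protocol will match this entry.
  socket "/socket", MyAppWeb.UserSocket,
    websocket: [serializer: [{MyApp.SafeJSONSerializer, "~> 2.0.0"}]]
end

defmodule MyAppWeb.RoomChannel do
  use Phoenix.Channel

  def join(_topic, _params, socket), do: {:ok, socket}

  # The fallback payload built by PrefixDecoder.decode/2 carries :decode_error,
  # so this clause only fires for messages whose JSON could not be decoded.
  def handle_in(_event, %{decode_error: reason}, socket) do
    {:reply, {:error, %{reason: "invalid JSON payload", details: reason}}, socket}
  end
end
```

Placing the decode_error clause before the regular handle_in clauses keeps the error handling in one spot per channel.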

Problem solved, until the client sends a completely invalid message where our parser returns an error :-)