PCL (Parser Configuration Language)

Introduction

PCL is a language designed to extract data from a line of text by describing its structure. The language aims to use a concise and intuitive syntax to help visualize the structure of a line of text. PCL expressions are used to configure the parser action.

Syntax basics

At the moment, a valid PCL expression must be composed of one or more fields such that there are separators between them. That is, it must follow this rule:

delimiter? fixedLength* field(delimiter fixedLength* field)* delimiter?

where a delimiter could be a literal or an operator.

At the moment, the only possible fixed-length field is a string.
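The rule above can be checked mechanically once an expression has been broken into its elements. The following Python sketch is only an illustration, not the parser's implementation. It assumes a hypothetical one-letter encoding (D for a delimiter, X for a fixed-length field, F for any other field) and tests the encoded sequence against the rule.

```python
import re

# Hypothetical one-letter encoding of the elements of a PCL expression:
#   D = delimiter (literal or operator)
#   X = fixed-length field (a string with the length option)
#   F = any other field
# The pattern mirrors:
#   delimiter? fixedLength* field (delimiter fixedLength* field)* delimiter?
PCL_SHAPE = re.compile(r"D?X*F(?:DX*F)*D?")


def has_valid_shape(kinds: str) -> bool:
    """Return True if the sequence of element kinds follows the rule."""
    return PCL_SHAPE.fullmatch(kinds) is not None


print(has_valid_shape("FDF"))  # field, delimiter, field -> True
print(has_valid_shape("FF"))   # two fields with nothing between them -> False
print(has_valid_shape("XXF"))  # fixed-length strings, then a field -> True
```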

Valid example

{myFieldOne:string} {myFieldTwo:int}<while(value=" ")>{myCsv:csv(indices=[0,2],separator=",")}

Invalid example (no delimiters)

{myFieldOne:string}{myFieldTwo:int}

The grammar supports any kind of name that is written with the set of characters A-Z, a-z, 0-9 and the symbol underscore (_). It supports field aliases with any name written with the set of characters A-Z, a-z, 0-9, _, -, # and . (given that the first character is not _).
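The two naming rules can be written as regular expressions. This is a minimal Python sketch based only on the character sets listed above. The function names are hypothetical and the real parser may apply further checks.

```python
import re

# Field names: A-Z, a-z, 0-9 and the underscore.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")

# Aliases: the same characters plus -, # and ., where the first
# character cannot be an underscore.
ALIAS = re.compile(r"[A-Za-z0-9.#-][A-Za-z0-9_.#-]*")


def is_valid_field_name(name: str) -> bool:
    return FIELD_NAME.fullmatch(name) is not None


def is_valid_alias(alias: str) -> bool:
    return ALIAS.fullmatch(alias) is not None


print(is_valid_field_name("my_field_1"))  # True
print(is_valid_alias("my-new-name"))      # True
print(is_valid_alias("_myNewName"))       # False (starts with _)
print(is_valid_alias("my new name"))      # False (contains spaces)
```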

Syntax fields

In PCL, we can write any sequence of fields. The available field types are:

  • CSV

  • Float

  • Group

  • Integer

  • JSON

  • Key-value list

  • String

Learn more about each field option in the Field options section below.

CSV

CSV is a configurable field. The available options are:

  • alias (optional) to rename the field name.

  • indices (optional) to select which columns we want to extract from the CSV.

  • separator (optional) to define the separator of the columns.

  • totalColumns (optional) defines how many columns the CSV has.

Note that the option totalColumns is mandatory when there is a delimiter after the field that is equal to the CSV separator. For example, a CSV with 3 columns and a JSON separated by a comma:

1,2,3, {"hello":"world"}
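To see why the column count matters in this situation, the following Python sketch splits the sample line with and without a known number of columns. It is only an illustration of the ambiguity and the helper name is hypothetical.

```python
def split_csv_prefix(line: str, separator: str, total_columns: int):
    """Take exactly total_columns columns from the start of line.

    Returns the columns and the text that follows them.
    """
    parts = line.split(separator, total_columns)
    columns = parts[:total_columns]
    rest = parts[total_columns] if len(parts) > total_columns else ""
    return columns, rest


line = '1,2,3, {"hello":"world"}'

# Without a column count, the trailing JSON is taken as a fourth column.
print(line.split(","))

# With a column count of 3, the CSV stops after the third column.
print(split_csv_prefix(line, ",", 3))
```

In the second call the remaining text starts with a space because the comma that was consumed by the last split is the first character of the delimiter that follows the CSV.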

Examples

{myFieldName:csv(indices=[0,1,3],totalColumns=4,separator=",")}

{myFieldName:csv(indices=[0,1,3],totalColumns=4,separator=",", alias="newCsvName")}

{myFieldName:csv(indices=[0:string(alias="csvFieldName1"), 1:string(alias="csvFieldName2")], alias="newCsvNames")}

Float

Float is a configurable field. The available options are:

  • alias (optional) to rename the field name.

  • decimalSeparator (optional) to define the decimal separator.

  • thousandSeparator (optional) to define the thousands separator.

Example

{myField:float(decimalSeparator=".")}

Note that the options decimalSeparator and thousandSeparator cannot contain the same value.
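As an illustration of how the two separators interact, here is a minimal Python sketch that normalizes a number before converting it. The function and parameter names are hypothetical. The allowed values and defaults are taken from the Field options section below.

```python
def parse_float(text: str, decimal_separator: str = ".",
                thousand_separator: str = "") -> float:
    """Convert text to a float using the given separators."""
    if decimal_separator not in {",", "."}:
        raise ValueError("decimalSeparator must be ',' or '.'")
    if thousand_separator not in {"", ",", "."}:
        raise ValueError("thousandSeparator must be '', ',' or '.'")
    if decimal_separator == thousand_separator:
        raise ValueError("both separators cannot contain the same value")

    if thousand_separator:
        # Drop the grouping characters first.
        text = text.replace(thousand_separator, "")
    # Python's float() expects '.' as the decimal separator.
    return float(text.replace(decimal_separator, "."))


print(parse_float("3.14"))                # 3.14
print(parse_float("1.234,5", ",", "."))   # 1234.5
```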

Group

A Group is a special type that might contain two or more of the following simple types:

  • Float

  • Integer

  • Separator

  • String

Example

{myGroupName:{{myFieldOne:string} {myFieldTwo:int} {myFieldThree:int}}}

Integer

Integer is a configurable field. The available options are:

  • alias (optional) to rename the field name.

  • thousandSeparator (optional) to define the thousands separator.

Example

{myFieldName:int(thousandSeparator=",")}

JSON

JSON is a configurable field. The available options are:

  • alias (optional) to rename the field name.

  • fields (optional) to select which items we want to extract from the JSON.

Examples

{myFieldName:json(fields=["itemOne","itemTwo"])}

{myfield:json(fields=["hello ":string(alias="hello_"), "bye ":string(alias="bye_")])}

Key-value list

Key-value list is a configurable field. The available options are:

  • alias (optional) to rename the field name.

  • kvSeparator (optional) to define the separator between keys and values.

  • listSeparator (optional) to define the separator between each key-value item.

  • indices (optional) to select which columns we want to extract from the list by their position in the list.

  • fields (optional) to select which items we want to extract from the list by their key names.

Examples

{myFieldName:keyValueList(kvSeparator=":",listSeparator=",")}

{myFieldName:keyValueList(fields=["hello ":string(alias="hello_")])}

Although this field allows the options indices and fields, note that they cannot be used simultaneously.
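The behaviour of these options can be sketched in Python as follows. This is a simplified illustration, not the parser's implementation. The function name is hypothetical, the default for list_separator is an assumption (the text does not state one), and sub-types, aliases and defaults are left out.

```python
def parse_key_value_list(text, kv_separator="=", list_separator=",",
                         indices=None, fields=None):
    """Split text into key-value pairs and optionally select some of them."""
    if indices is not None and fields is not None:
        raise ValueError("indices and fields cannot be used simultaneously")

    pairs = []
    for item in text.split(list_separator):
        # partition() tolerates items that lack the key-value separator.
        key, _, value = item.partition(kv_separator)
        pairs.append((key, value))

    if indices is not None:
        # Select by position in the list.
        pairs = [pairs[i] for i in indices if i < len(pairs)]
    elif fields is not None:
        # Select by key name.
        pairs = [(key, value) for key, value in pairs if key in fields]
    return dict(pairs)


print(parse_key_value_list("a:1,b:2,c:3", kv_separator=":"))
print(parse_key_value_list("a:1,b:2,c:3", kv_separator=":", indices=[0, 2]))
print(parse_key_value_list("a:1,b:2,c:3", kv_separator=":", fields=["b"]))
```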

String

String is a configurable field. The available options are:

  • alias (optional) to rename the field name.

  • length (optional) to define the length of the string.

  • escapableChar (optional) to escape delimiter characters in the string.

Example

{myField:string(length=2)}

{myField:string(length=2, alias="newFieldName")}

There is a special case with String fields. If the field is using the option length, we may add another field next to it without any separator:

{oneField:string(length=2)}{anotherField:string(length=3)}{lastField:string}
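The reason no separator is needed is that a known length tells the parser exactly where each field ends. A minimal Python sketch of the idea, with a hypothetical helper name:

```python
def split_fixed(line: str, lengths: list) -> list:
    """Cut line into fixed-length pieces, then return the remainder last."""
    pieces = []
    position = 0
    for length in lengths:
        pieces.append(line[position:position + length])
        position += length
    pieces.append(line[position:])  # the final field without a length
    return pieces


# Mirrors string(length=2), string(length=3) and a final plain string.
print(split_fixed("abcdefgh", [2, 3]))  # ['ab', 'cde', 'fgh']
```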

Syntax literals

A literal is a special type of element: a string that must exist between two fields. Unlike fields, a literal is just a string of one or more characters. It cannot contain <, >, {, or } unless they are escaped with \.

This is an example of a literal (a single whitespace between the two fields):

{myFieldOne:string} {myFieldTwo:string}

Syntax operators

There are two types of operators: Skip and While.

Skip

The Skip operator acts like a dynamic separator. It can be used when we want to skip any content until a match is found.

This is a configurable operator that is equivalent to the regular expression (?:from)*(?=to) where from and to are the strings to match.

The available options are:

  • from (mandatory) to define the string to find one or more times.

  • to (mandatory) to define the string that marks where the skip stops (in the equivalent regular expression it is a lookahead, so it is not consumed).

Example

<skip(from=" ",to="-")>

A use case could be to skip all characters until a JSON is found. For example, given the text thisisrubbish{"my": "json"}, we could use this PCL:

{f1:string}<skip(from=" ", to="{")> {f2:json}
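The regular expression given for Skip can be built and tried directly in Python. This sketch only reproduces the stated equivalence (?:from)*(?=to). The helper name is hypothetical and the way the parser combines the operator with neighbouring fields is not modelled.

```python
import re


def skip_regex(from_: str, to: str):
    """Build the regular expression stated as equivalent to Skip."""
    return re.compile("(?:" + re.escape(from_) + ")*(?=" + re.escape(to) + ")")


pattern = skip_regex(" ", "-")

# Three spaces are consumed; the "-" is only looked at, not consumed.
print(pattern.match("   -tail").end())  # 3

# No "-" follows the spaces, so the operator does not match.
print(pattern.match("   tail"))         # None
```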

While

The While operator acts like a dynamic separator. It is useful when a separator has an unknown number of repetitions in each log.

This is a configurable operator that is equivalent to the regular expression (?:value)* where value is the string to match. However, if the options min and/or max are defined, then the equivalent regular expression is (?:value){min,max}.

The available options are:

  • value (mandatory) to define the string to find one or more times.

  • max (optional) to set the maximum number of repetitions of value (must be greater than 0).

  • min (optional) to set the minimum number of repetitions of value (must be greater than 0).

It is not necessary to define both max and min. However, if both are defined, min must be strictly lower than max, that is, min < max.

Example

<while(value=" ",max=2)>

In another example, let " -" be a separator that appears at least 3 times in every log:

  • hello - - -world

  • goodbye - - - - -world

  • hello - - - -moon

Then, the PCL could be {f1:string}<while(value=" -", min=3)>{f2:string}
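The example above can be reproduced with an ordinary regular expression. The following Python sketch is an approximation under stated assumptions: the helper name and its parameters are hypothetical, and the two string fields are emulated with plain capture groups rather than with the parser's own rules.

```python
import re


def while_regex(value, min_reps=None, max_reps=None):
    """Build a regular expression that repeats value within the given bounds."""
    for bound in (min_reps, max_reps):
        if bound is not None and bound <= 0:
            raise ValueError("min and max must be greater than 0")
    if min_reps is not None and max_reps is not None and not min_reps < max_reps:
        raise ValueError("min must be strictly lower than max")

    if min_reps is None and max_reps is None:
        quantifier = "*"
    else:
        low = "" if min_reps is None else str(min_reps)
        high = "" if max_reps is None else str(max_reps)
        quantifier = "{" + low + "," + high + "}"
    return "(?:" + re.escape(value) + ")" + quantifier


# Emulates {f1:string}<while(value=" -", min=3)>{f2:string}
pattern = re.compile("(.*?)" + while_regex(" -", min_reps=3) + "(.*)")

for log in ["hello - - -world", "goodbye - - - - -world", "hello - - - -moon"]:
    print(pattern.fullmatch(log).groups())
```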

Field options

alias

The value must follow the naming requirements:

  • The allowed set of characters is: A-Z, a-z, 0-9, ., -, _ or #.

  • An alias cannot start with _.

These are valid examples:

  • alias="myNewName"

  • alias="my-new-name"

These are invalid examples:

  • alias="_myNewName"

  • alias="my new name"

default

The value must be of the same type as the parent. For example, if it is the default value of an integer, then the default value must be an integer too.

These are valid examples:

  • {myfield:json(fields=["hello":string(default="{}")])}

  • {myfield:csv(fields=["world":int(default=-1)])}

These are invalid examples:

  • {myfield:json(fields=["hello":string(default=-1)])}

  • {myfield:float(default=1.5)}

decimalSeparator

The value could be:

  • ,

  • . (default value)

These are valid examples:

  • decimalSeparator=","

  • decimalSeparator="."

These are invalid examples:

  • decimalSeparator=""

  • decimalSeparator="-"

  • decimalSeparator="_"

fields

The value must be a list of strings. Note that the list cannot contain other values (e.g. numbers).

Additionally, we may specify the type of each field by writing a colon (:) followed by the type: bool, float, int or string. For example: fields=["oneField":bool, "middleField", "anotherField":int]. If the type is omitted, the type is assumed to be string. In the previous example, middleField is assumed to be a string.

Each sub-type may have these options:

  • alias (optional) to rename the field name.

  • default (optional) to set a fixed value if the field does not exist in the log.

These are valid examples:

  • fields=["oneField","anotherField.with.subField"]

  • fields=["oneField":string(alias="anotherName")]

  • fields=[]

These are invalid examples:

  • fields=[oneField,anotherField]

  • fields=["oneField,anotherField"]

  • fields=[0,1]
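The effect of alias and default on a selected field can be sketched in Python. This is a simplified illustration with a hypothetical data layout: each entry of fields is a dictionary with a name and, optionally, an alias and a default. Type conversion and dotted names such as anotherField.with.subField are not handled, and the naming of the output keys is an assumption.

```python
import json


def extract_fields(raw_json: str, fields: list) -> dict:
    """Select items from a JSON object, applying alias and default."""
    data = json.loads(raw_json)
    result = {}
    for spec in fields:
        name = spec["name"]
        output_name = spec.get("alias", name)
        if name in data:
            result[output_name] = data[name]
        elif "default" in spec:
            # The field is missing from the log, so use the fixed value.
            result[output_name] = spec["default"]
    return result


print(extract_fields(
    '{"oneField": "a", "other": "b"}',
    [
        {"name": "oneField", "alias": "anotherName"},
        {"name": "missing", "default": "n/a"},
    ],
))
```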

indices

The value must be a list of numbers. Note that the list cannot contain values other than non-negative integers (that is, zero and positive integers).

Additionally, we may specify the type of each index by writing a colon (:) followed by the type: bool, float, int or string. For example: indices=[0:bool, 1, 3:int]. If the type is omitted, the type is assumed to be string. In the previous example, index 1 is assumed to be a string.

Each sub-type may have these options:

  • alias (optional) to rename the field name.

  • default (optional) to set a fixed value if the field does not exist in the log.

These are valid examples:

  • indices=[0,1,3]

  • indices=[1:string(default="not exists")]

  • indices=[]

These are invalid examples:

  • indices=["0","1"]

  • indices=[-3,1]
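The rule for plain index lists can be expressed as a short check. This Python sketch uses a hypothetical function name and covers only untyped indices, not entries such as 0:bool.

```python
def is_valid_indices(value) -> bool:
    """Return True for a list that contains only non-negative integers."""
    if not isinstance(value, list):
        return False
    return all(
        # bool is a subclass of int in Python, so exclude it explicitly.
        isinstance(item, int) and not isinstance(item, bool) and item >= 0
        for item in value
    )


print(is_valid_indices([0, 1, 3]))    # True
print(is_valid_indices([]))           # True
print(is_valid_indices(["0", "1"]))   # False
print(is_valid_indices([-3, 1]))      # False
```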

kvSeparator

The value must be a single character or an escape sequence (for example, \t). By default, it is =.

These are valid examples:

  • kvSeparator=":"

  • kvSeparator="\t"

These are invalid examples:

  • kvSeparator=""

  • kvSeparator=:

  • kvSeparator="hello"

length

The value must be a strictly positive integer.

These are valid examples:

  • length=1

  • length=25

These are invalid examples:

  • length="1"

  • length=0

  • length=-3

listSeparator

The value must be a character from the set: |, ;, ,

These are valid examples:

  • listSeparator=";"

  • listSeparator="|"

These are invalid examples:

  • listSeparator="-"

  • listSeparator=;

separator

The value must be a character from the set: |, ;, ,, \t. By default, it is ,.

These are valid examples:

  • separator=";"

  • separator="\t"

These are invalid examples:

  • separator="-"

  • separator=;
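The allowed values of listSeparator and separator differ only in the tab character, which a small lookup makes explicit. This Python sketch is an illustration with hypothetical names, and it assumes that \t in PCL stands for the tab character.

```python
# Allowed values as listed for each option.
ALLOWED_SEPARATORS = {
    "listSeparator": {"|", ";", ","},
    "separator": {"|", ";", ",", "\t"},
}


def is_valid_separator(option: str, value) -> bool:
    """Return True if value is an allowed string for the given option."""
    return isinstance(value, str) and value in ALLOWED_SEPARATORS[option]


print(is_valid_separator("separator", "\t"))      # True
print(is_valid_separator("listSeparator", "\t"))  # False
print(is_valid_separator("separator", "-"))       # False
```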

totalColumns

The value must be a strictly positive integer.

These are valid examples:

  • totalColumns=1

  • totalColumns=5

A valid value for this option must equal the number of columns in the CSV.

These are invalid examples:

  • totalColumns="1"

  • totalColumns=0

  • totalColumns=-3

thousandSeparator

The value could be:

  • empty string (default value).

  • ,

  • .

These are valid examples:

  • thousandSeparator=""

  • thousandSeparator="."

These are invalid examples:

  • thousandSeparator="-"

  • thousandSeparator="_"

Use case

message: "foo|bar|\"foo|bar\"|another field after the CSV"

A valid expression to parse the message is:

{fieldName1:csv(separator="|",totalColumns=3)}|{fieldName2:json()}

or

{csvField:csv(separator="|",indices=[0,1,2],totalColumns=3)}|{stringField:string}

Examples

Example 1

A valid PCL expression could be:

{lastName:string}, {firstName:string}<while(value=" ")>{age:int}: {info:json(fields=["country","occupation"])}

Here there are 4 fields (lastName, firstName, age and info) and 3 delimiters (the literal ", ", the operator while, and the literal ": ").

This PCL expression can be used to extract fields from different lines of text that have the same structure. For example, given the text:

Doe, John 37: {"country": "UK", "occupation": "father"}

the PCL expression can be used to extract the following fields:

  • lastName as Doe.

  • firstName as John.

  • age as 37.

  • info as {"country": "UK", "occupation": "father"}.

  • info.country as UK.

  • info.occupation as father.

In another example, given the text:

Smith, Jane 19: {"country": "USA", "occupation": "student"}

the same PCL expression would extract:

  • lastName as Smith.

  • firstName as Jane.

  • age as 19.

  • info as {"country": "USA", "occupation": "student"}.

  • info.country as USA.

  • info.occupation as student.
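To make the correspondence between the expression and the extracted fields concrete, the following Python sketch emulates Example 1 with a regular expression. It is an approximation written for this one expression, not a PCL implementation, and the function name is hypothetical.

```python
import json
import re

LINE = re.compile(
    r"(?P<lastName>.*?), "   # {lastName:string} and the literal ", "
    r"(?P<firstName>.*?)"    # {firstName:string}
    r"(?: )*"                # <while(value=" ")>
    r"(?P<age>\d+): "        # {age:int} and the literal ": "
    r"(?P<info>\{.*\})"      # {info:json(fields=["country","occupation"])}
)


def parse_line(line: str):
    """Return the extracted fields, or None if the line does not match."""
    match = LINE.fullmatch(line)
    if match is None:
        return None
    info = json.loads(match.group("info"))
    return {
        "lastName": match.group("lastName"),
        "firstName": match.group("firstName"),
        "age": int(match.group("age")),
        "info.country": info["country"],
        "info.occupation": info["occupation"],
    }


print(parse_line('Doe, John 37: {"country": "UK", "occupation": "father"}'))
print(parse_line('Smith, Jane 19: {"country": "USA", "occupation": "student"}'))
```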

Example 2

Given this message:

message: "foo|bar|\"foo|bar\"|another field after the CSV"

A valid expression to parse it is:

{fieldName1:csv(separator="|",totalColumns=3)}|{fieldName2:json()}

or

{csvField:csv(separator="|",indices=[0,1,2],totalColumns=3)}|{stringField:string}
