PCL is a language designed to extract data from a line of text by describing its structure. The language aims to use a concise and intuitive syntax to help visualize the structure of a line of text. PCL expressions are used to configure the parser action.
Syntax basics
At the moment, a valid PCL expression must be composed of one or more fields such that there are separators between them. That is, it must follow this rule:
The grammar supports any kind of name that is written with the set of characters A-Z, a-z, 0-9 and the symbol underscore (_). It supports field aliases with any name written with the set of characters A-Z, a-z, 0-9, _, -, # and . (given that the first character is not _).
Syntax fields
In PCL, we can write any sequence of fields. The type of fields can be the following:
Learn more about each field option in the Field options section below.
CSV
CSV is a configurable field. The available options are:
alias (optional) to rename the field name.
indices (optional) to select which columns we want to extract from the CSV.
separator (optional) to define the separator of the columns.
totalColumns (optional) defines how many columns the CSV has.
Note that the option totalColumns is mandatory when there is a delimiter after the field that is equal to the CSV separator. For example, a CSV with 3 columns and a JSON separated by a comma:
A literal is a special type of element: a string that must exist between two fields. Unlike other fields, a literal is just a string of one or more characters. It cannot contain <, >, { or } unless they are escaped with \.
This is an example of a literal (a single whitespace between the two fields):
{myFieldOne:string} {myFieldTwo:string}
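As a rough illustration of the escaping rule for literals, the following Python sketch prefixes each reserved character with a backslash. The helper name escape_literal is hypothetical and not part of PCL, and whether the backslash itself also needs escaping is not stated here, so the sketch leaves it untouched.

```python
import re

# Characters that, per the rule above, must be escaped inside a literal.
_RESERVED = re.compile(r"([<>{}])")


def escape_literal(text: str) -> str:
    """Prefix each reserved character (<, >, {, }) with a backslash.

    Hypothetical helper for illustration; it does not touch backslashes
    that are already present in the text.
    """
    return _RESERVED.sub(r"\\\1", text)


print(escape_literal("id={"))
```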
Syntax operators
There are two types of operators:
Skip
The Skip operator acts like a dynamic separator. It can be used when we want to skip any content until we find a match.
This is a configurable operator that is equivalent to the regular expression (?:from)*(?=to) where from and to are the strings to match.
The available options are:
from (mandatory) to define the string to find one or more times.
to (mandatory) to define the string at which skipping stops. As the lookahead (?=to) in the expression above indicates, it is matched but not consumed.
Example
<skip(from=" ",to="-")>
A use case could be to skip all characters until a JSON is found. For example, for the log thisisrubbish{"my": "json"}, we could use this PCL: {f1:string}<skip(from=" ", to="{")> {f2:json}
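The regular expression given for Skip can be tried out directly. The sketch below builds (?:from)*(?=to) with Python's re module, assuming that from and to are taken as literal strings (hence re.escape). The function name skip_pattern is hypothetical, and the snippet only shows how the separator itself behaves, not how the parser splits the surrounding fields.

```python
import re


def skip_pattern(from_: str, to: str) -> re.Pattern:
    """Build (?:from)*(?=to), treating both strings literally."""
    return re.compile(f"(?:{re.escape(from_)})*(?={re.escape(to)})")


pattern = skip_pattern(" ", "-")

# Three spaces are consumed; the "-" is only looked at, not consumed.
match = pattern.match("   -rest")
print(match.end())  # 3

# Without a following "-", the expression does not match at all.
print(pattern.match("   rest"))  # None
```

Because to sits in a lookahead, the character it matches is still available to whatever comes next in the expression.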
While
The While operator acts like a dynamic separator. It is useful when a separator has an unknown number of repetitions in each log.
This is a configurable operator that is equivalent to the regular expression (?:value)* where value is the string to match. However, if the options min and/or max are defined, then the equivalent regular expression is (?:value){min,max}
The available options are:
value (mandatory) to define the string to find one or more times.
max (optional) to set the maximum number of repetitions of value (must be greater than 0).
min (optional) to set the minimum number of repetitions of value (must be greater than 0).
It is not necessary to define both max and min. However, if both are defined, min must be strictly lower than max, that is, min < max.
<while(value=" ",max=2)>
In another example, let " -" (a space followed by a hyphen) be a separator that appears at least 3 times in every log:
hello - - -world
goodbye - - - - -world
hello - - - -moon
Then, the PCL could be {f1:string}<while(value=" -", min=3)>{f2:string}
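The behaviour of While on the three logs above can be approximated with a plain regular expression. This is a minimal sketch, assuming that the bounded form corresponds to the standard quantifier {min,max} and that the text before and after the separator run maps to f1 and f2. The names while_pattern and split_log are hypothetical and are not part of PCL.

```python
import re


def while_pattern(value, min_reps=None, max_reps=None):
    """Build (?:value)* or, when bounds are given, (?:value){min,max}."""
    if min_reps is not None and max_reps is not None and not min_reps < max_reps:
        raise ValueError("min must be strictly lower than max")
    unit = f"(?:{re.escape(value)})"
    if min_reps is None and max_reps is None:
        return re.compile(unit + "*")
    lower = 0 if min_reps is None else min_reps
    upper = "" if max_reps is None else max_reps
    return re.compile(f"{unit}{{{lower},{upper}}}")


def split_log(log, separator):
    """Return the text before and after the first separator run, or None."""
    found = separator.search(log)
    if found is None:
        return None
    return log[:found.start()], log[found.end():]


separator = while_pattern(" -", min_reps=3)
for log in ["hello - - -world", "goodbye - - - - -world", "hello - - - -moon"]:
    print(split_log(log, separator))
```

With min=3, a log such as hello - -world (only two repetitions) yields no match in this sketch.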
Field options
alias
The value must follow the naming requirements:
The allowed set of characters is: A-Z, a-z, 0-9, ., -, _ or #.
An alias cannot start with _.
These are valid examples:
alias="myNewName"
alias="my-new-name"
These are invalid examples:
alias="_myNewName"
alias="my new name"
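The naming requirements for alias can be expressed as a single regular expression. The following Python sketch checks the four examples above. The names ALIAS_RE and is_valid_alias are illustrative only, and the sketch assumes an empty alias is not allowed.

```python
import re

# Allowed characters: A-Z, a-z, 0-9, ".", "-", "_" and "#".
# The first character may be any of these except "_".
ALIAS_RE = re.compile(r"[A-Za-z0-9.#-][A-Za-z0-9._#-]*")


def is_valid_alias(alias: str) -> bool:
    """Return True if the whole string satisfies the alias naming rules."""
    return ALIAS_RE.fullmatch(alias) is not None


for candidate in ["myNewName", "my-new-name", "_myNewName", "my new name"]:
    print(candidate, is_valid_alias(candidate))
```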
default
The value must be of the same type as the parent. For example, if it is the default value of an integer, then the default value must be an integer too.
fields
The value must be a list of strings. Note that the list cannot contain other values (e.g. numbers).
Additionally, we may specify the type of each field by writing a colon (:) followed by the type: bool, float, int or string. For example: fields=["oneField":bool, "middleField", "anotherField":int]. If the type is omitted, the type is assumed to be string. In the previous example, middleField is a string.
Each sub-type may have these options:
alias (optional) to rename the field name.
default (optional) to set a fixed value if the field does not exist in the log.
These are valid examples:
fields=["oneField","anotherField.with.subField"]
fields=["oneField":string(alias="anotherName")]
fields=[]
These are invalid examples:
fields=[oneField,anotherField]
fields=["oneField,anotherField"]
fields=[0,1]
indices
The value must be a list of numbers. Note that the list can only contain non-negative integers (zero is allowed).
Additionally, we may specify the type of each index by writing a colon (:) followed by the type: bool, float, int or string. For example: indices=[0:bool, 1, 3:int]. If the type is omitted, the type is assumed to be string. In the previous example, index 1 is a string.
Each sub-type may have these options:
alias (optional) to rename the field name.
default (optional) to set a fixed value if the field does not exist in the log.
These are valid examples:
indices=[0,1,3]
indices=[1:string(default="not exists")]
indices=[]
These are invalid examples:
indices=["0","1"]
indices=[-3,1]
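To make the effect of indices and default more concrete, here is a small Python sketch of selecting columns by index. It assumes that indices are zero-based and that default applies when the requested column is missing from the row. The function select_columns and its parameters are hypothetical, and type conversion and aliases are left out.

```python
def select_columns(columns, indices, defaults=None):
    """Return {index: value} for the requested indices.

    Falls back to the per-index default (or None) when the row has no
    column at that index. Only non-negative integers are accepted.
    """
    defaults = defaults or {}
    selected = {}
    for index in indices:
        if not isinstance(index, int) or isinstance(index, bool) or index < 0:
            raise ValueError(f"invalid index: {index!r}")
        if index < len(columns):
            selected[index] = columns[index]
        else:
            selected[index] = defaults.get(index)
    return selected


# Comparable to indices=[0,1,3]
print(select_columns(["a", "b", "c", "d"], [0, 1, 3]))

# Comparable to indices=[1:string(default="not exists")] on a one-column row
print(select_columns(["a"], [1], {1: "not exists"}))
```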
kvSeparator
The value must be a single character or an escape sequence (for example, \t). By default, it is =.
These are valid examples:
kvSeparator=":"
kvSeparator="\t"
These are invalid examples:
kvSeparator=""
kvSeparator=:
kvSeparator="hello"
length
The value must be a strictly positive integer.
These are valid examples:
length=1
length=25
These are invalid examples:
length="1"
length=0
length=-3
listSeparator
The value must be a character from the set: |, ;, ,
These are valid examples:
listSeparator=";"
listSeparator="|"
These are invalid examples:
listSeparator="-"
listSeparator=;
separator
The value must be a character from the set: |, ;, ,, \t. By default, it is ,.
These are valid examples:
separator=";"
separator="\t"
These are invalid examples:
separator="-"
separator=;
totalColumns
The value must be a strictly positive integer.
These are valid examples:
totalColumns=1
totalColumns=5
A valid value for this option must equal the number of columns in the CSV.
These are invalid examples:
totalColumns="1"
totalColumns=0
totalColumns=-3
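The reason totalColumns matters when the delimiter after the CSV equals the CSV separator can be seen in a short Python sketch: without a column count, there is no way to tell where the CSV ends and the next field begins. The function split_csv_then_rest is a hypothetical illustration, not the parser's implementation, and it deliberately ignores quoted columns.

```python
def split_csv_then_rest(line, total_columns, separator=","):
    """Split off exactly total_columns columns and return the remainder.

    The remainder is everything after the separator that follows the
    last CSV column, kept intact even if it contains the separator.
    """
    parts = line.split(separator, total_columns)
    if len(parts) <= total_columns:
        raise ValueError("line has no content after the CSV columns")
    return parts[:total_columns], parts[total_columns]


# A CSV with 3 columns followed by a JSON, both separated by commas.
columns, rest = split_csv_then_rest('a,b,c,{"k": 1, "j": 2}', total_columns=3)
print(columns)  # ['a', 'b', 'c']
print(rest)     # {"k": 1, "j": 2}
```

With a different count the boundary moves: total_columns=2 would treat c and the JSON together as the trailing content.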
thousandSeparator
The value can be one of the following:
empty string (default value).
,
.
These are valid examples:
thousandSeparator=""
thousandSeparator="."
These are invalid examples:
thousandSeparator="-"
thousandSeparator="_"
Use case
message: "foo|bar|"foo|bar"|another field after the CSV"