Quantcast
Channel: C/C++ – PRDeving
Viewing all articles
Browse latest Browse all 11

How to parse key-value strings into associative arrays in ANSI C89

$
0
0

We all know that C is one of the best languages for many reasons and especially the ANSI C89 standard (old and solid as fuck) but it also has it’s own drawbacks. There are many situtations in software development where inbuilt data structures can’t suit the programer’s needs and in a language like C, where there are not as many options to choose from (without relying on glibc or similar) as you can have in python or javascript, sometimes the solution can get really hard to figure out.

The data structure covered in this post is called associative array o dictionary (read more), a collection of key – value pairs of data, easy to access and easy to use. Today we will talk about a pretty basic implementation of it. Real-world applications should use more complex and performant approaches such as hash tables.

From loading a configuration file to fetching the enviroment variables; to be able (in pure POSIX compliant ANSI C) to use retrieved key-value pairs accessing it’s value directly with its name might look tricky, but it’s actually easier than you may think.

The Concept

Many programs use config files where data are stored as a list of key-values, HTTP requests are full of key-value information and even C main function is fed (in UNIX systems) with an array of key-value strings as third argument (environment variables). Most of those cases use the same standard, one entry for row (one for each element in the arrays case) with the key and the value separated by a character (usually = or : )

The container

C is a statically typed language, so in order to keep the examples easy to read, we will keep key and value as string (without casting to other types)

typedef struct {
char *key;
char *value;
} keyValue;

typedef *keyValue List;

We will define a structure to hold our pairs, in this case, as we said, key and value are strings.
Since we are parsing an unknown number of elements, we also need a list, a list is an array of key values.

[ // List
{} //Key value
]

Splitting

We have our basic data structure, let’s parse.
In this exapmle we are going to parse the environ extern variable, It contains an array with the system’s enviroment variables, we will use this method to get environment variables instead of the third argument in the main function since this is POSIX compliant, its content looks like this:

SSH_TTY=/dev/pts/1
USER=user
LS_COLORS=rs
MAIL=/var/mail/user

So, in order to split this we will have to run throught the array splitting each entry by the delimitier, in this case ‘=’.

To do this, there’s an ANSI C function, strtok who will do exactly what we want (be careful with this, strtok will modify the original string, in this particular case we don’t care , but in real life, strings passed to strtok should be duplicated beforehand to keep the original source inmutated).
So, we have to initialize a List element and for each entry in the array, resize the list, split the string by the ‘=’ and store both fragments as key and value in the new keyValue structure, easy 🙂

Note: In real life dynamic arrays should be managed in a more efficient way (2*n dynamic arrays will have it’s own post soon 😀 ), the approach here shown is orientative and will work well but its prone to heap segmentation and long term failures.

List split(char *arr[], const char *delim) {
/* initialize the list */
List list = malloc(0);

int c = 0;
/* Loop through elements and resize the list
* environ array is terminated by a null pointer
* so we can loop through like this */
while ((char*)arr[c]) {
list = realloc(list, (c + 1) * sizeof(keyValue));

/* use strtok as shown in the manual (RTFM) and populate the pair*/
list[c].key = strtok(arr[c], delim);
list[c].value = strtok(NULL, delim);
c++;
}

/* return the list once all the elements has been parsed */
return list;
}

Of course, the code above is orientative, it’ll work just fine but errors should be handled and some performance improvements should be made in order to achieve the best result before using it in the wild.

Accessing

After splitting the strings into pairs we will have a list of key-value pairs, but we need to be able to use them, so let’s write a simple find function that returns the value associated with a given identificator:

char *find(const List list, const char *id) {
int c = 0;
while (list[c].key) {
if (strcmp(list[c].key, id) == 0) return list[c].value;
c++;
}
return "\0";
}

With this function we can ask for the key and have its value in return, great, lets code an example:

main(int argc, char *argv[]) {
List list = split(environ, "=");
printf("USER is %s", find(list, "USER"));  /* USER is user */
free(list);
}

With this method you now have easy access to configurations or variables 🙂

Hope it helped.


Viewing all articles
Browse latest Browse all 11

Trending Articles