chevron_left chevron_right
Login Register invert_colors photo_library


Stay updated and chat with others! - Join the Discord!
Thread Rating:
  • 0 Vote(s) - 0 Average


Tutorial [C] Dynamically resizing arrays of strings filter_list
Author
Message
[C] Dynamically resizing arrays of strings #1
Have you ever needed to hold an array of strings, but found it to be a major pain in the ass every time you needed to add a new string, or even worse, thinking that it would be easier to just use a linked list so you can dynamically grow and shrink your list? Have you deliberately chosen to use C++ just so you
can make a std::vector<char *>, or even worse a std::vector<std :: string> instead of bothering with a linked list or an array?

Well have I got news for you. In this tutorial I'm going to teach you how to build a dynamically resizing (both growing and shrinking) array of strings in C. I'm willing to bet that by the end of this tutorial, you'll find that doing it this way is far easier than doing it any other way, and we'll only
need to write 2 functions for it to be 100% usable in every case that you will ever need!

Okay, so many of you are asking "how are you handling something dynamically without using objects or linked lists?" The answer, is to just reallocate every time we need to add or remove an item. It's not the most performance oriented solution, but you will get to keep your ability to treat it like an array
(you can access members by just doing stuff like variable[index]) rather than needing to have a bunch of wonky getter and setter methods that ultimately make your code impossible to maintain or totally buggy.
So let's get started, what's the most basic type of reallocation-based dynamic management can we do? Well, it looks something like this:
Code:
char **realloc_basic(char **array, size_t array_size, size_t new_size)
{
       char **new_array;
       new_array = malloc(sizeof(char *) * new_size);
       memcpy(new_array, array, sizeof(char *) * (array_size <= new_size ? array_size : new_size));
       free(array);
       return new_array;
}
in that function, we followed a pretty basic algorithm:
  1. Allocate a new array with the requested size
  2. Copy the contents of the old array into the new one (as much as will fit)
  3. Free the old array (this doesn't free the data)
  4. Return the new array

To use this, we just do something like this
Code:
char **array = malloc(sizeof(char *) * 3); // array for 3 strings
array[0] = "String 1";
array[1] = "String 2";
array[2] = "String 3";
// note, these can be dynamically allocated strings
array = realloc_basic(array, 3, 4); // resize it to 4 strings
array[3] = malloc("15");
strcpy(array[3], "Hello, World");

See how easily we could resize the strings? This works, but it's a little clunky. First of all, it doesn't handle very many cases, and you still have to allocate space for the string itself (like we did at the last one). What if we wanted our function to allocate the space for these strings for us? Well that's
actually pretty simple to do, we just need to pass in the size of the strings we want to add. The second thing that's a little iffy is that we have to pass in our array, and we have to set our array to the return value. What if we forget to do that? Like for instance this
Code:
char **array = malloc(sizeof(char *) * 3); // array for 3 strings
array[0] = "String 1";
array[1] = "String 2";
array[2] = "String 3";
// note, these can be dynamically allocated strings
realloc_basic(array, 3, 4); // resize it to 4 strings
array[3] = malloc("15");
strcpy(array[3], "Hello, World");
What would happen then? Well, our code would fault, because we don't actually have an array big enough to hold 4 strings. Secondly, we now have a 4-string array that we've lost and will never get back (a memory leak).

How do we fix these problems? Like I said, it's actually not that hard. When it comes to the second one, all we need to do is pass our array in by reference, so our function will actually take a pointer to an array of strings. Writing this function means that we're going to have to start with
this:
Code:
char **realloc(char * **array,
I bet that's the first time you've seen three asterisks in a function call. That extra one we added means we're passing it in by reference. We can still access the array like we were in the function, but every time we do, we have to put a * in front of it (to dereference the pointer). At the end of our
function (but before the return), we just have to add
Code:
*array = new_array;
If we add this, then the array will change even if we don't set it when we return (in fact, we can even change our return type to void if we want to).

Our other problem now, we want to be able to specify the size of the strings when we resize the list. Now, with C++ or another object-oriented language, I'm sure the first words that come out of your mouth would be function overloading. That's fine if you want to do it that way with those languages, but
C has no such thing. I can think of exactly 3 ways of doing this in C:
  1. Make the function take more arguments, and have all of the extra arguments be the sizes of each string to be added.
  2. Make the function take 1 more argument, which is an array of the sizes of each string.
  3. Make the function take a variable number of arguments, each being the size of the string to be added.
Let's discuss those one by one to determine which is best

1. Make the function take more arguments, and have all of the extra arguments be the sizes of each string to be added.
This would end up looking like this
Code:
char **realloc(char * **array, size_t array_size, size_t new_size, size_t str0, size_t str1, size_t str2, size_t str3, size_t str4)
With this, you would be limited to only expanding the list by 5 strings at a time, and if you only wanted to add 1 string, you'd have to do something like this
Code:
realloc(&array, 4, 5, 30, 0, 0, 0, 0);
Now that just looks messy, and it's a pain in the butt for the programmer to do. Remember, we're trying to make things easier on the programmer..
So, it seems like option 1 is a bust, let's have a look at option 2:

2. Make the function take 1 more argument, which is an array of the sizes of each string.
For this, our function signature would look similar to this
Code:
char **realloc(char * **array, size_t array_size, size_t new_size, size_t *string_sizes)
and you would call it like this
Code:
size_t *string_sizes = malloc(sizeof(size_t) * 3);
string_sizes[0] = 16;
string_sizes[1] = 35;
string_sizes[2] = 8;
realloc(&array, 4, 7, string_sizes);
free(string_sizes);
Or, if you wanted to cheat, you could do it like this
Code:
realloc(&array, 4, 6, { 19, 24 });
That first one is an absolute no. That would make the programmer's life hell, and he might even quit programming all together. The second one isn't that bad, so it's a viable option, but having to type those braces is still annoying. If there were a way to get rid of them, would you rather it that way? I
would. Let's take a look at what that would look like.

3. Make the function take a variable number of arguments, each being the size of the string to be added.
Code:
char **realloc(char * **array, size_t array_size, size_t new_size, ...)
WHOA! Hold on! What's those 3 dots mean!? Well, that's an amazing feature that C has that takes advantage of the ABI that's used for x86 architectures. The way it works is because since the designers of x86 were all idiots and only gave you 4 general purpose registers to work with, you pretty much HAVE to pass
your parameters on the stack. This means that you can actually have as many or as few parameters as you want, but you have to do some math to be able to retrieve them. I wrote a tutorial all about Variable Arguments in C, if you'd like
to know more about them go give that a read.
You would ultimately call this function like this
Code:
realloc(&array, 4, 5, 64);
// or
realloc(&array, 4, 9, 23, 15, 91, 8, 19);
can you see what's going on here? In OO terms, this function has been overloaded nearly an infinite number of times without you having to actually write code for all of them.

So, I'll admit, I wrote all of this stuff before I started on the tutorial, so this tutorial won't be about how every little thing works, or how you can write code, but it will be about the concepts. Consequently, I'm not going to hold your hand through everything, I have tutorials written about every concept
I used here. You can look at my Master Tutorial List if you want to read up on these things. Now, let's have a look at my first iteration:
Code:
/* multi_resize - resize a multidimensional array of strings
* src - a pointer to an array of strings
* len - the current size of the array
* new - size to make the array
* ... - arguments specifying the length of strings to be added
*
* Resizes array src and changes it to the new array, the new
* array is also returned.
*/
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
char **multi_resize(char * **src, size_t len, size_t new, ...)
{
       char **dst, **arr; // the new pointer and a copy pointer
       size_t i, insert_len, copy_len;
       va_list args;

       arr = *src; // so we don't need the * all the time
       dst = malloc(sizeof(char *) * new); // allocate mem for the new buffer
       copy_len = new > len ? new : len; // failsafe for decreasing the size

       memcpy(dst, arr, copy_len); // copy the existing ptrs
       for (i = new; i < len; ++i) // loop to free up the soon to be deleted members
               free(arr[i]);

       va_start(args, new); // get a handle for our additions (if any exist)
       for (i = len; i < new; ++i)
       {
               insert_len = va_arg(args, size_t); // get the length of new string
               dst[i] = malloc(insert_len); // allocate it
       }
       va_end(args);

       free(arr); // clean up the old list
       *src = dst; // only time we need the *, set the parameter as an out param
       return dst; // return it
}

So, this one functions identically to the way I described the most basic one to be, except that now you don't need to reset your variable (from the return), and you can specify the size of each string you want to create.

But, even this function isn't perfect. How could we improve it more? Well, I can see 2 cases that clearly aren't covered in this one
  1. Allocating the list still has to be done manually
  2. Freeing the list still has to be done manually
So, how do we implement this into our function? Well, it's pretty simple, we know we need to allocate a new array if EITHER the array the user gave us is NULL, or if the size of the array they gave us is zero. It's important we distinguish between those two things, because that means there are two ways to
create a new list:
Code:
char **list;
list = resizev2(NULL, 0, 2, 0, 0); // create a list for 2 strings, but don't allocate space for them
and
Code:
char **list;
resizev2(&list, 0, 2, 0, 0);
In the first example, our *array = new_array would cause a segment violation, so we need to not do that if the user didn't give us a variable.
For the second issue (freeing up the array), it's really easy to detect that too. If the user tells us they want their array resized to hold 0 strings, then we know they want us to free the array for them. In this scenario, we should set *array to NULL, and also return NULL. The code looks like this:
Code:
/* multi_resize - resize a multidimensional array of strings
* src - a pointer to an array of strings
* len - the current size of the array
* new - size to make the array
* ... - arguments specifying the length of strings to be added
*
* Resizes array src and changes it to the new array, the new
* array is also returned.
*
* Cases:
* 1. If src is NULL, a new list will be created and returned
* 2. If len is 0 and src is NOT NULL, a new list will be
*    created and returned, and src will be set to this list
* 3. If new is 0, the list will be freed and NULL returned.
*    src will also be set to NULL
*/
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
char **multi_resize_v2(char * **src, size_t len, size_t new, ...)
{
       char **dst, **arr; // the new pointer and a copy pointer
       size_t i, insert_len, copy_len, free_start, free_end;
       va_list args;

       free_start = 0; free_end = len; // by default, free the whole list
       dst = NULL; // by default, the new list doesn't exist
       arr = *src; // so we don't need the * all the time

       if (new > 0)
       {
               dst = malloc(sizeof(char *) * new); // allocate mem for the new buffer

               if (len > 0) // only copy if the list exists
               {
                       copy_len = new > len ? new : len; // failsafe for decreasing the size
                       memcpy(dst, arr, sizeof(char *) * copy_len); // copy the existing ptrs
                       free_start = new; free_end = len;
               }
       }

       for (i = free_end; i < free_end; ++i) // loop to free up the soon to be deleted members
               free(arr[i]);

       va_start(args, new); // get a handle for our additions (if any exist)
       for (i = len; i < new; ++i)
       {
               insert_len = va_arg(args, size_t); // get the length of new string
               dst[i] = malloc(insert_len); // allocate it
       }
       va_end(args);

       if (src != NULL)
       {
               if (arr != NULL)
                       free(arr); // clean up the old list, if it exists
               *src = dst;
       }
       return dst; // return it
}
Also notice how I documented these functions. It's important when you're writing a library function that you clearly document all of this. The standard library functions even have their own Linux man pages!

So now we're done, right? There's nothing more we can improve on ... right? Wrong. What if the reason the programmer is using linked lists or (stupidly) objective oriented programming languages is because he doesn't want to keep track of how big his array is, having to pass it in every time he wanted to resize
it? Well have I got a solution to you. We can eliminate the array_size parameter (len in my code) a couple of ways:
  1. Keep a global list of structures that tracks the pointer to the list, and its size (but keep it behind the scenes)
  2. Wrap the array in a struct and have a member variable
  3. Find a way to make the list self aware
Well, these are some ideas, I'm not even going to bother with #1, global variables are almost NEVER required, and they should be avoided at all cost. So what about #2 and #3?
#2 is a very common idea among novice programmers and OO nuts, but it's not really a very good one. The programmer would always be exposed to not only this structure, but also this count, and it would probably drive him mad having to do something like this
Code:
puts(array->array[3]);
just so the function would be able to eliminate the counter. This is a pretty bad idea. So how about #3? How do we make the list self aware?
We take advantage of the fact that the string is a pointer type. That is, there are values that the programmer will never be able to use, like NULL, and so we can use those to NULL terminate our list, and thus eliminate the need for them to tell us the size of their list when they want to resize it. We don't
have to write any other functions to do this, but we're going to be nice to the user and provide them with a function that tells them how big their list is, just in case they need it. The code looks like this:
Code:
/* multi_resize - resize a multidimensional array of strings
* src - a pointer to a NULL-terminated array of strings
* new - size to make the array
* ... - arguments specifying the length of strings to be added
*
* Resizes array src and changes it to the new array, the new
* array is also returned. These arrays MUST be NULL-terminated!
*
* Cases:
* 1. If src is NULL, a new list will be created and returned
* 2. If len is 0 and src is NOT NULL, a new list will be
*    created and returned, and src will be set to this list
* 3. If new is 0, the list will be freed and NULL returned.
*    src will also be set to NULL
*/
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
size_t multi_count(char **arr)
{
       size_t len = 0;
       if (arr == NULL) return 0;
       while (*arr++) ++len;
       return len;
}
char **multi_resize_v3(char * **src, size_t new, ...)
{
       char **dst, **arr; // the new pointer and a copy pointer
       size_t i, len, insert_len, copy_len, free_start, free_end;
       va_list args;

       free_start = 0; free_end = len; // by default, free the whole list
       dst = NULL; // by default, the new list doesn't exist
       arr = src == NULL ? NULL : *src == NULL ? NULL : *src; // so we don't need the * all the time
       len = multi_count(arr);

       if (new > 0)
       {
               dst = malloc(sizeof(char *) * new + 1); // allocate mem for the new buffer, with term
               if (len > 0) // only copy if the list exists
               {
                       copy_len = new > len ? new : len; // failsafe for decreasing the size
                       memcpy(dst, arr, sizeof(char *) * copy_len); // copy the existing ptrs
                       free_start = new; free_end = len;
               }
               dst[new] = NULL; // set the new NULL terminator
       }

       for (i = free_end; i < free_end; ++i) // loop to free up the soon to be deleted members
               free(arr[i]);

       va_start(args, new); // get a handle for our additions (if any exist)
       for (i = len; i < new; ++i)
       {
               insert_len = va_arg(args, size_t); // get the length of new string
               dst[i] = malloc(insert_len); // allocate it
       }
       va_end(args); // close our handle, we're done allocating members

       if (src != NULL)
       {
               if (arr != NULL) free(arr); // clean up the old list, if it exists
               *src = dst;
       }
       return dst; // return it
}

See, it's not actually that different. Instead of taking it as a parameter, we just add it as a variable and call that nice function we added for the programmer to figure out how big the array is. When we make changes to the sizes of the array, we just allocate one extra element and set it to NULL so that we
know how to keep track of the list. So, surely we are done now. Well, in a way we are. This function serves its purpose perfectly, but I can still think of exactly one more thing we can do to make the programmers day better. Here's the case: I want to resize my array so I can add more strings, but I don't want
to have to set the strings myself afterwards. Can you do that for me? Well, yes we can. Our function would look the same, and we can even keep our count function. The only difference the programmer will see is in how it's called. Rather than doing this:
[code]
multi_resize_v3(&array, 4, 17, 21); // assuming the old array size was 2
strcpy(array[2], "Hello, World!");
strcpy(array[3], "A longer string...");
the programmer would do this:
Code:
multi_resize_v4(&array, 5, "Hello, World!", "A longer string...", "See, isn't this better?");
In this case, the programmer just tells the function what string he will want in the array and it does the work for him! The code looks like this:
Code:
/* multi_resize - resize a multidimensional array of strings
* src - a pointer to a NULL-terminated array of strings
* new - size to make the array
* ... - arguments specifying the strings to be inserted
*
* Resizes array src and changes it to the new array, the new
* array is also returned. These arrays MUST be NULL-terminated!
*
* Cases:
* 1. If src is NULL, a new list will be created and returned
* 2. If len is 0 and src is NOT NULL, a new list will be
*    created and returned, and src will be set to this list
* 3. If new is 0, the list will be freed and NULL returned.
*    src will also be set to NULL
*/
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
size_t multi_count(char **arr)
{
       size_t len = 0;
       if (arr == NULL || *arr == NULL) return 0;
       for (len = 0; arr[len] != NULL; ++len)
               ;
       return len;
}
char **multi_resize_v4(char * **src, size_t new, ...)
{
       char **dst, **arr, *insert_str; // the new pointer and a copy pointer
       size_t i, len, insert_len, copy_len, free_start, free_end;
       va_list args;

       free_start = 0; free_end = len; // by default, free the whole list
       dst = NULL; // by default, the new list doesn't exist
       arr = src == NULL ? NULL : *src == NULL ? NULL : *src; // so we don't need the * all the time
       len = multi_count(arr);

       if (new > 0)
       {
               dst = malloc(sizeof(char *) * new + 1); // allocate mem for the new buffer, with term
               if (len > 0) // only copy if the list exists
               {
                       copy_len = new > len ? len : new; // failsafe for decreasing the size
                       memcpy(dst, arr, sizeof(char *) * copy_len); // copy the existing ptrs
                       free_start = new; free_end = len;
               }
               dst[new] = NULL; // set the new NULL terminator
       }

       for (i = free_end; i < free_end; ++i) // loop to free up the soon to be deleted members
               free(arr[i]);

       va_start(args, new); // get a handle for our additions (if any exist)
       for (i = len; i < new; ++i)
       {
               insert_str = va_arg(args, char *);
               insert_len = strlen(insert_str); // get length to copy
               dst[i] = malloc(insert_len + 1); // allocate it
               memcpy(dst[i], insert_str, insert_len + 1); // copy string and terminator
       }
       va_end(args); // close our handle, we're done allocating members

       if (src != NULL)
       {
               if (arr != NULL) free(arr); // clean up the old list, if it exists
               *src = dst;
       }
       return dst; // return it
}

Again, they look pretty much the same, the main difference in this one is an extra variable to hold the string itself. Rather than getting the length of the string from the arg list, we get the string and figure out the length with strlen. Now, certainly we have the best function, no? Well, not exactly. We
actually have the best 2 functions (sorta). We should provide both v3 and v4 to the programmer, just in case they don't actually know the string itself ahead of time. I would rename v3 to multi_resize and v4 to multi_resize_string, since the odds are the programmer will be using this for user input, and it
might make it easier to do it the other way. At this point, I stopped writing code, it had completed the task I needed it for, but I can still think of 1 thing YOU should to do improve it for v3.5 and v4.5:
What if the user only wants to allocate SOME of the strings ahead of time? The answer: v3 should check if the size is 0, and if it is it should skip it (right now it would malloc(0), which would probably be fine, but it's undefined behavior for C), and v4 should check if the string is NULL, and in that case
skip allocating and setting. Aside from that, these are pretty much perfect. Here's a test program I wrote (for v4):
Code:
/* driver - test suite for multi_resize functions */

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

/******************************************************/
/*                       array     len     new   args */
char **multi_resize   (char * **, size_t, size_t, ...);
char **multi_resize_v2(char * **, size_t, size_t, ...); // also handles creating and free-ing
char **multi_resize_v3(char * **,         size_t, ...); // handles NULL-terminated lists
char **multi_resize_v4(char * **,         size_t, ...); // args are strings to be added
size_t multi_count    (char   **                     );
/******************************************************/

int main(void)
{
        int i;
        char **list;
        list = multi_resize_v4(NULL, 3, "Hello, World!", "Well this is neat", "See how I don't need to do any work here?");
        multi_resize_v4(&list, 4, "And it's easy to add new strings!");
        for (i = 0; i < multi_count(list); ++i)
                puts(list[i]);
        multi_resize_v4(&list, 0); // free the list
        return 0;
}
NOTE: you don't actually need the functions before v4, but I left them there while I tested. Here's what the output looks like:
[Image: n6XE2Oz.png]

I hope you all enjoyed reading this (rather long) tutorial, or at the very least I hope you gained a new function to help you when you have lists of strings! By the way, this theory/method doesn't just apply to strings, you can make this apply to any pointer-type, and with some tweaking, can probably make it
work with non pointer type stuff.

REMEMBER:
  1. To comment
  2. To ask questions
  3. To read all of my other tutorials (link in signature)
(This post was last modified: 03-17-2018, 04:29 AM by phyrrus9.)

Reply

RE: [C] Dynamically resizing arrays of strings #2
I decided to click your master list bro and it has grown substancially. I always enjoy reading them and despite not being active e due to lack of time figured I'd express my appreciation for your hard work and time invested into them.
@Skullmeat @phyrrus9 @Bish0pQ @mr.kurd and @ender are my best friends on SL

Reply

RE: [C] Dynamically resizing arrays of strings #3
(03-17-2018, 09:06 PM)Slacker Wrote: I decided to click your master list bro and it has grown substancially. I always enjoy reading them and despite not being active e due to lack of time figured I'd express my appreciation for your hard work and time invested into them.

Holy shit you're back! Always nice to see you around, and great to hear that you're reading them. Yes, it's grown quite a bit recently, and it's planned to have at least a dozen more entries by the end of the year. Also, if you haven't noticed, the threads are getting longer (I got tired of having 10 part series, so I'm just cramming all that info into 4 parts now)

Reply

RE: [C] Dynamically resizing arrays of strings #4
Nice. I am always around but need to limit my visibillity publically. My previous "employment" may have been jeopordized so am working out of town under an alias lol once any pending allegations are handled via my lawyer I'll be back around.

You have my alternate contact methods, we held a discussion on there related to bitcoin feel free to message anytime!
@Skullmeat @phyrrus9 @Bish0pQ @mr.kurd and @ender are my best friends on SL

Reply






Users browsing this thread: 1 Guest(s)