
MEMSET in C


Ah, memory management, simultaneously the C programming language’s most powerful and infamous trait. Fortunately, the C standard library features numerous built-in memory functions to help us out. In this article, I’ll break down the memory-filling function, memset.

PREFACE - WHAT IS C?

More than just the third letter of the alphabet, C is a compiled programming language - in other words, C source code must be compiled before a computer can run it. Compilation is done by programs called compilers, which take C source code files and translate them into machine code (binary executables) that computers can run.

There are many different C compilers, but in this tutorial, I will be using one called GCC (the GNU Compiler Collection), developed by the GNU Project. For instructions on how to install GCC, you can visit GNU's installation guide here (https://gcc.gnu.org/install/).

Throughout this article, I demonstrate how to use GCC on the command line. If you are unfamiliar with the command line, read more here (http://linuxcommand.org/index.php).

To read more on what C is and how to use it, see my earlier post on the topic, Learn C.

MEMSET - WHAT

The function memset (think, "memory setter") is a C standard library function that sets, or, more semantically, fills, a block of memory with a value.

ASIDE - What is computer memory?
Computers store digital information in the form of bits and bytes. A bit holds one of two possible values, true (1) or false (0), and a byte is a group of eight bits. Memory is tracked using what is referred to as addressing: each byte is designated by a numerical address, most commonly written in hexadecimal.

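For example, say that we have an array of 8 characters, the equivalent of 8 bytes of memory (a char is exactly one byte in C). When we declare this array inside a function, the computer sets aside 8 bytes of available memory for it on the stack, the region used for a function's local variables. (Memory requested at run time with malloc comes from a different region, the heap.)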

Which addresses we get is not up to us, and they can look random; part of that apparent randomness comes from an interesting security technique termed Address Space Layout Randomization (I recommend looking it up!). For the purposes of this example, though, we'll say the array occupies the hexadecimal addresses 0x00 to 0x07.

char array[8];
Address    0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07
Variable   array (all 8 bytes)
Value      ?     ?     ?     ?     ?     ?     ?     ?

Initially, without having initialized our array, we have no idea what this block of memory contains. It could hold zeros, or leftover values from earlier in the program's run - regardless, the contents of this block are indeterminate, and a well-behaved program should not rely on them.

Maybe you like to live life on the edge, but in general, and as good practice when working with C, it's best to have full control over any memory your program uses. Instead of leaving this block of 8 bytes indeterminate, we should set, or fill, the memory with a known value, such as the null byte (\0). This way, we know exactly what the memory we are using contains.

Address    0x00  0x01  0x02  0x03  0x04  0x05  0x06  0x07
Variable   array (all 8 bytes)
Value      \0    \0    \0    \0    \0    \0    \0    \0

In other words - we want to be able to memset the blocks of memory our program uses. Good thing we're covering just that function!

PARAMETERS
void *s, int c, size_t n

The function receives three parameters: a pointer to a block of memory, an integer, and a size_t (an unsigned integer type large enough to hold the size, in bytes, of any object; it is the type that sizeof produces).

The first parameter, s, is a pointer to the block of memory to fill. Its type, void * (a generic pointer), signifies that the pointer can reference memory of any object type: a char array, an int array, a struct, and so on.

The second parameter, c, is the value to fill s with, typically a character. Note that while this parameter is received as an int, memset converts it to an unsigned char before filling memory, so only the lowest byte of the received integer (eight bits on virtually every platform) is used, and every byte of the block receives that same value.

Finally, the third parameter, n, is the number of bytes to fill. While the pointer s tells memset where the memory block begins, it says nothing about how much memory to fill; n supplies just this information. Because size_t is unsigned, n can never be negative (a negative value passed in would wrap around to a huge positive one), so it is up to the caller to make sure n does not exceed the size of the block.

RETURN VALUE
void *

You will receive nothing back from memset that you did not give it - after filling the block of memory referenced by s, the function simply returns s, the same (void *) memory address you passed in when calling the function.

DECLARATION

The function memset is declared as follows:

/**
* memset - Fills the first @n bytes of the memory area
*          pointed to by @s with the constant byte @c.
* @s: A pointer to the memory area to be filled.
* @c: The character to fill the memory area with.
* @n: The number of bytes to be filled.
*
* Return: A pointer to the filled memory area @s.
*/
void *memset(void *s, int c, size_t n);

MEMSET - HOW

To use the function memset, include the C standard library header <string.h>, which declares it.

#include <string.h>

Once <string.h> has been included, you can call the function memset directly.

Example (note that the headers <stdio.h> and <stdlib.h> are also included here for printf and EXIT_SUCCESS, respectively):

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* 8 bytes to fill, plus 1 byte that stays '\0' so printf knows where to stop */
    char memory[9];

    /* Fill the whole memory block (all 9 bytes) with null bytes */
    memset(memory, '\0', sizeof(memory));
    printf("%s\n", memory);

    /* Fill the first 8 bytes with hashes; memory[8] remains '\0' */
    memset(memory, '#', 8);
    printf("%s\n", memory);

    /* Fill the first 8 bytes with dollar signs */
    memset(memory, '$', 8);
    printf("%s\n", memory);

    return (EXIT_SUCCESS);
}
$ gcc main.c -o memset
$ ./memset

########
$$$$$$$$

MEMSET - WHEN

As I began to discuss earlier in this article, when programming in C, it's best practice to maintain tight control of memory: when you allocate memory, initialize it. And if you can't initialize it right away, or do not immediately know what to initialize it with - memset it!

Dedicated readers of this function tutorial series may remember a use case of memset from a separate article on strcpy. Using strcpy properly involves setting aside a destination block of memory large enough to hold a copy of the given string, including its terminating null byte. Since this destination block must be declared before strcpy writes to it, memset is a handy way to fill and clear it out first, so that it holds a known, empty string until the copy happens (as the "before copy" line of output below shows).

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char src[7] = "Source";
    char dest[7];

    /* Initialize dest with null bytes for good practice */
    memset(dest, '\0', sizeof(dest));

    /* Before copying */
    printf("String src before copy: %s\n", src);
    printf("String dest before copy: %s\n", dest);

    /* Copy src into dest */
    strcpy(dest, src);

    /* After copying */
    printf("String src after copy: %s\n", src);
    printf("String dest after copy: %s\n", dest);

    return (EXIT_SUCCESS);
}
$ gcc main.c -o strcpy
$ ./strcpy
String src before copy: Source
String dest before copy:
String src after copy: Source
String dest after copy: Source

In general, memset is a great tool to keep in mind when working with C programs involving memory allocation. Remember, if you can’t initialize it - memset it!

ADVANCED 100 - WHY NOT JUST PASS C AS A CHAR?

It's a fair question - if c is always converted to an unsigned char before memset uses it, why doesn't the function just receive the parameter as a char in the first place?

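It turns out that the int parameter type is used for historical reasons. The function memset predates the introduction of function prototypes in C. Without a prototype, a call applies the default argument promotions, which convert any char argument to an int before it is passed; so the value always travels as an int, no matter how the parameter is declared. (Relatedly, character literals such as 'a' already have type int in C, though not in C++.)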

$ cat main.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    printf("Did you know character literals are ints in C? Check it out!\n");
    /* sizeof yields a size_t, whose printf conversion is %zu */
    printf("The size of an int is: %zu bytes\n", sizeof(int));
    printf("The size of 'a' is: %zu bytes\n", sizeof('a'));

    return (EXIT_SUCCESS);
}
$ gcc main.c -o char-literal
$ ./char-literal
Did you know character literals are ints in C? Check it out!
The size of an int is: 4 bytes
The size of 'a' is: 4 bytes

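Beyond this historical reason, there is a practical one: on common calling conventions, each argument is passed in a register or a stack slot at least as wide as an int, so a char argument takes up the same space as an int anyway. Plus, the C standard describes int as having the natural size suggested by the machine's architecture, which is typically the size a processor handles most efficiently.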

For a more in-depth discussion of the above topics, I recommend reading this Stack Overflow thread (https://stackoverflow.com/questions/5919735/why-does-memset-take-an-int-instead-of-a-char), on which I based this section.

ADVANCED 101 - IMPLEMENTATION

Sure, we can use the standard library's version, but why not set some memory manually? I present my implementation of the function memset.

#include <stddef.h>

/**
* _memset - Fills the first @n bytes of the memory area
*           pointed to by @s with the constant byte @c.
* @s: A pointer to the memory area to be filled.
* @c: The character to fill the memory area with.
* @n: The number of bytes to be filled.
*
* Return: A pointer to the filled memory area @s.
*/
void *_memset(void *s, int c, size_t n)
{
    size_t index; /* same type as n, so any valid n can be counted */
    unsigned char *memory = s, value = c;

    for (index = 0; index < n; index++)
        memory[index] = value;

    return (memory);
}

source: https://github.com/bdbaraban/holbertonschool-low_level_programming/blob/master/0x06-pointers_arrays_strings/0-memset.c

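At heart, the function memset is nothing more than a looped assignment operation. To fill the memory block referenced by s, I use an index variable to loop over it byte by byte, copying in the value c at each index. Before the loop, I convert the received s to a pointer to unsigned char and c to an unsigned char, so that each assignment writes exactly one byte holding the value memset is specified to use. (Standard library implementations, such as glibc's, are far more heavily optimized, but the effect is the same.)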

Of course, this is just one personal implementation of the function memset. There are multiple ways to write it; in fact, I encourage, no, challenge, you to find another way to write this function!

NOTE

Examples in this article were compiled and run on a Linux Ubuntu 18.04 LTS machine with GNU GCC version 7.3.0.
Written by:

Brennan Baraban, Cohort 7 (SF Campus)