STRCMP in C


There are so many string functions in the C standard library! Let’s begin breaking them down one by one, starting today with the string comparison function, strcmp.

PREFACE - WHAT IS C?

More than just the third letter of the alphabet, C is a compiled programming language - in other words, a computer can run a C program only after it has been compiled. Compilation is performed by software called a compiler, which takes C source code files and translates them into machine code (binary) that computers can execute.

There are many different C compilers, but in this tutorial, I will be using one called GCC, developed by the GNU Project. For instructions on how to install GCC, you can visit GNU’s installation guide here (https://gcc.gnu.org/install/).

Throughout this article, I demonstrate GCC usage on the command line. If you are unfamiliar with the command line, read more here (http://linuxcommand.org/index.php).

To read more on what C is and how to use it, see my earlier post on the topic (Learn C).

STRCMP - WHAT

The function strcmp (think, "string compare") is a C standard library function that compares two strings.

ASIDE - STRING REFRESHER
When working with strings in C, remember - strings are no more than arrays of ASCII-encoded characters ending with a terminating null byte (\0). A pointer to a string is merely a pointer to the first character in this array.
For a more in-depth examination of pointers, including a look at strings, I encourage you to visit one of my earlier posts (Pointers in C).
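
As a minimal sketch of the refresher above, the function below walks a string through a pointer to its first character until it meets the terminating null byte. The name count_chars is hypothetical, chosen for illustration; the standard library already offers strlen for this job.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters stored before the
 * terminating null byte, starting from a pointer to the first one. */
size_t count_chars(const char *s)
{
    size_t n = 0;

    /* Stop as soon as the null byte is reached */
    while (s[n] != '\0')
        n++;

    return (n);
}
```

Called on "Hello", this sketch returns 5: the null byte ends the walk but is not counted.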

PARAMETERS
const char *str1, const char *str2

The function receives two parameters, pointers to two strings. Note that the strings are received as constants - the function strcmp will never alter received strings.

RETURN VALUE
int

The result of the comparison is reported through an integer return value, as follows:

  • If str1 < str2, the function returns a value less than 0.
  • If str1 == str2, the function returns 0.
  • If str1 > str2, the function returns a value greater than 0.
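
In practice, callers usually branch on the sign of the return value. Here is a small sketch of that pattern; describe_order is a hypothetical helper name, not part of <string.h>.

```c
#include <string.h>

/* Hypothetical helper: describes how str1 relates to str2, looking
 * only at the sign of the value returned by strcmp. */
const char *describe_order(const char *str1, const char *str2)
{
    int result = strcmp(str1, str2);

    if (result < 0)
        return ("less than");
    if (result > 0)
        return ("greater than");

    return ("equal to");
}
```

The three branches correspond one-to-one with the three cases listed above.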

I know. Pretty vague, right? Let’s dive deeper into how this works.

The function strcmp works incrementally; in other words, it compares the characters of the two strings one by one, from left to right. Conceptually, this comparison uses array indexing: first, the function compares the character at index 0 of str1 with the character at index 0 of str2, then the characters at index 1, and so on. The function continues to iterate over the strings until one of the following occurs:

  • The end of one string is reached (indicated by a terminating null byte \0).
  • The current characters being compared in both strings do not match.

Upon encountering one of the above conditions, strcmp halts and returns an integer value based on the current characters in comparison.

If the two characters are identical, the function returns 0, indicating equality. Of course, this will only occur in the case that the function reached the terminating null bytes of both strings, having successfully compared equivalent characters for the entirety of both.

Otherwise, in the case that strcmp halted upon encountering unequal characters, the function returns a nonzero value whose sign indicates the inequality. The C standard guarantees only that sign; a straightforward implementation, like the one assumed throughout this article, returns the ASCII value of the current character in str1 minus the ASCII value of the current character in str2, with each character interpreted as an unsigned char.

ASIDE - ASCII
Not familiar with the ASCII table? No worries. In short, computers have no concept of letters and characters like we humans do, only numbers (or, more specifically, binary numbers).
The American Standard Code for Information Interchange (ASCII) is a standardized way of using numbers to represent characters on computers. The ASCII table encodes 128 characters, including the English letters, digits, punctuation marks and control characters, with each character being assigned a numerical value from 0-127.
You can view the complete ASCII table here (https://www.asciitable.com/).
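
To see these numerical values for yourself, a character can simply be converted to an integer. The sketch below assumes a machine that uses ASCII (nearly all do); ascii_value is a hypothetical helper name.

```c
/* Hypothetical helper: returns the numerical code of a character.
 * The cast to unsigned char keeps the result non-negative. */
int ascii_value(char c)
{
    return ((int)(unsigned char)c);
}
```

On an ASCII system, ascii_value('e') gives 101 and ascii_value('i') gives 105, the same numbers listed in the table.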

Through this subtraction, strcmp calculates an inequality indicator. If the first unmatched character in str1 is greater in ASCII value than that at the corresponding index in str2, the function returns a positive value indicating that str1 is greater than str2. Conversely, if the first unmatched character in str1 is less than that at the corresponding index in str2, the function returns a negative value indicating that str1 is less than str2.

To visualize, take a look at the following memory representations of strings.

Address   0x00  0x01  0x02  0x03  0x04  0x05
Variable  str1
Value     H     e     l     l     o     \0

Address   0x06  0x07  0x08
Variable  str2
Value     H     i     \0

In the above, str1 differs from str2 at index 1. A call to strcmp(str1, str2) returns a value based on the difference of these two characters. The character e is encoded as ASCII value 101, while i is encoded as 105; hence, a subtraction-based implementation returns 101 - 105 = -4, indicating that str1 is less than str2.

Conversely, say we were to swap the two strings.

Address   0x06  0x07  0x08
Variable  str1
Value     H     i     \0

Address   0x00  0x01  0x02  0x03  0x04  0x05
Variable  str2
Value     H     e     l     l     o     \0

Now, the return value of strcmp would be the ASCII value of i minus that of e: 105 - 101 = 4, indicating that str1 is greater than str2.
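
The walk-through above can be sketched in code. The hypothetical helper first_mismatch (not a library function) reports the index at which two strings stop matching, which is exactly where the comparison is decided.

```c
#include <stddef.h>

/* Hypothetical helper: returns the index of the first position at
 * which the two strings differ, or the index of the shared
 * terminating null byte when the strings are equal. */
size_t first_mismatch(const char *str1, const char *str2)
{
    size_t i = 0;

    /* Advance while str1 has characters left and they match str2 */
    while (str1[i] != '\0' && str1[i] == str2[i])
        i++;

    return (i);
}
```

For "Hello" and "Hi", this returns 1, the index of e and i; for two equal strings, it returns the index of their terminating null bytes.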

In fact, this subtraction is not limited to unequal strings - it yields the right answer for equal strings, too. Recall that the function stops upon encountering either unmatched characters or the end of one string. When strcmp stops at the terminating null byte of the first string, and the second string is exactly the same length, the function returns the difference of the ASCII values of both strings’ null bytes. The null byte is encoded in ASCII as value 0 - hence, 0 - 0 equals, well, 0!

With this newfound knowledge, strcmp's return value can be summarized more precisely:

  • If the strings differ, and the first unmatched character in str1 is less than its counterpart in str2, the function returns a negative value (in a subtraction-based implementation, the difference between the two characters' ASCII values).
  • If every character in str1 matches its counterpart in str2, the function returns 0, the difference of both strings' terminating null bytes.
  • If the strings differ, and the first unmatched character in str1 is greater than its counterpart in str2, the function returns a positive value (in a subtraction-based implementation, the difference between the two characters' ASCII values).
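
One caveat: the C standard promises only the sign of strcmp's return value, so an implementation is free to return, say, -1 or 1 rather than the character difference. When code needs a predictable value, one option is to collapse the result to its sign. Below is a minimal sketch; compare_sign is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: collapses strcmp's result to -1, 0, or 1 so
 * the caller never depends on the exact magnitude. */
int compare_sign(const char *str1, const char *str2)
{
    int result = strcmp(str1, str2);

    /* (result > 0) and (result < 0) each evaluate to 0 or 1 */
    return ((result > 0) - (result < 0));
}
```

With this wrapper, comparing "Hello" against "Hi" gives -1 on any conforming implementation, whatever magnitude the underlying strcmp happens to return.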

DECLARATION

The function strcmp is declared as follows:

/**
* strcmp - Compares two strings.
* @str1: A pointer to the first string to compare.
* @str2: A pointer to the second string to compare.
*
* Return: If str1 < str2, a negative value.
*         If str1 == str2, 0.
*         If str1 > str2, a positive value.
*/
int strcmp(const char *str1, const char *str2);

STRCMP - HOW

To use the function strcmp, include the C standard library header <string.h>.

#include <string.h>

Once <string.h> has been included, you can call the function strcmp on any two strings directly.

Example (note that the headers <stdio.h> and <stdlib.h> are additionally included here for the usage of printf and EXIT_SUCCESS, respectively):

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str1 = "Holberton";
    char *str2 = "School";
    int difference;

    /* Compare str1 against str2 */
    difference = strcmp(str1, str2);
    printf("%s < %s and the byte difference is %d.\n", str1, str2, difference);

    /* Compare str1 against itself */
    difference = strcmp(str1, str1);
    printf("%s == %s and the byte difference is %d.\n", str1, str1, difference);

    /* Compare str2 against str1 */
    difference = strcmp(str2, str1);
    printf("%s > %s and the byte difference is %d.\n", str2, str1, difference);

    return (EXIT_SUCCESS);
}
$ gcc main.c -o strcmp
$ ./strcmp
Holberton < School and the byte difference is -11.
Holberton == Holberton and the byte difference is 0.
School > Holberton and the byte difference is 11.

STRCMP - WHEN

Strings are abundant in C programs, and you’ll likely often find yourself needing to compare them - strcmp is a quick and easy way to do so.

String comparison particularly comes into play for programs handling user input. For example, imagine that we are writing a user account system. As part of this system, we have a C program that takes a given username and prints that user’s information.

$ cat main.c
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char username[16];

    /* Get username from user */
    printf("Enter your username: ");
    scanf("%15s", username); /* at most 15 characters plus the null byte */

    if (/* if username equals "bdov_" */)
    {
        printf("User: bdov_\n");
        printf("Member since: September 4, 2018\n");
        printf("Favorite color: Blue\n");
        printf("Favorite baseball team: Yankees\n");
    }

    else if (/* if username equals "poppyCPO" */)
    {
        printf("User: poppyCPO\n");
        printf("Member since: January 29, 2017\n");
        printf("Favorite color: Red\n");
        printf("Favorite baseball team: Giants\n");
    }

    else
    {
        printf("User %s does not exist :(\n", username);
    }

    return (EXIT_SUCCESS);
}

How are we going to match the given username? Here comes strcmp to the rescue:

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char username[16];

    /* Get username from user */
    printf("Enter your username: ");
    scanf("%15s", username); /* at most 15 characters plus the null byte */

    if (strcmp(username, "bdov_") == 0)
    {
        printf("User: bdov_\n");
        printf("Member since: September 4, 2018\n");
        printf("Favorite color: Blue\n");
        printf("Favorite baseball team: Yankees\n");
    }

    else if (strcmp(username, "poppyCPO") == 0)
    {
        printf("User: poppyCPO\n");
        printf("Member since: January 29, 2017\n");
        printf("Favorite color: Red\n");
        printf("Favorite baseball team: Giants\n");
    }

    else
    {
        printf("User %s does not exist :(\n", username);
    }

    return (EXIT_SUCCESS);
}
$ gcc main.c -o login
$ ./login
Enter your username: bdov_
User: bdov_
Member since: September 4, 2018
Favorite color: Blue
Favorite baseball team: Yankees
$ ./login
Enter your username: poppyCPO
User: poppyCPO
Member since: January 29, 2017
Favorite color: Red
Favorite baseball team: Giants
$ ./login
Enter your username: bbaraban
User bbaraban does not exist :(

By checking if strcmp returns 0 for the comparison of the user input and a username, we can seamlessly match any user, while catching any non-existent ones with an error message. Cool!
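
As the number of users grows, the chain of else-if branches can give way to a loop over a list of names. The following is only a sketch under assumptions: the known_users array and the find_user function are hypothetical, with the two usernames borrowed from the example above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical list of known usernames, taken from the example above */
static const char *const known_users[] = {"bdov_", "poppyCPO"};

/* Hypothetical helper: returns the index of username in known_users,
 * or -1 if no entry matches exactly. */
int find_user(const char *username)
{
    size_t count = sizeof(known_users) / sizeof(known_users[0]);
    size_t i;

    for (i = 0; i < count; i++)
    {
        /* strcmp returns 0 only when every character matches */
        if (strcmp(username, known_users[i]) == 0)
            return ((int)i);
    }

    return (-1);
}
```

A return value of -1 plays the role of the final else branch: no stored username matched the input. Note that the comparison is case-sensitive, so "BDOV_" would not match "bdov_".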

ADVANCED 100 - BUT WHY NOT ==?

At this point, you may be wondering - strcmp seems like a pretty convoluted way to compare two variables. Couldn’t I just use the good ol’ equality operator, ==? Doesn’t the following work just as well?

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str1 = "Holberton";
    char *str2 = "Holberton";

    /* 1 represents true */
    printf("*str1 == *str2 -> %d\n", *str1 == *str2);

    return (EXIT_SUCCESS);
}
$ gcc main.c -o equality
$ ./equality
*str1 == *str2 -> 1

Well, in the above, you’re correct about the output... but not for the right reasons.

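The equality operator == compares just two values. Crucially, remember - strings are arrays of characters, and the variables str1 and str2 are no more than pointers to the first character of each string. Dereferencing them, as in *str1 == *str2, compares only those first characters. In this case, the first character of both strings is 'H', so the equality operator returns true. Dropping the asterisks does not help, either: str1 == str2 compares the two addresses, not the characters stored there.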

If your goal is to compare the entirety of two strings, however, a simple usage of the equality operator will not achieve what you want.

$ cat main.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    char *str1 = "Holberton";
    char *str2 = "Holberton School";

    /* 1 represents true */
    printf("*str1 == *str2 -> %d\n", *str1 == *str2);

    return (EXIT_SUCCESS);
}
$ gcc main.c -o equality
$ ./equality
*str1 == *str2 -> 1

As you might be beginning to see, a proper comparison of the entirety of both strings requires looping and using the equality operator to compare the characters at each corresponding index. In fact, this is exactly what strcmp does! And on this topic of implementation...
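
To make the distinction concrete, here is a small sketch contrasting the two kinds of comparison. The helper names same_address and same_contents are hypothetical.

```c
#include <string.h>

/* Compares addresses only: true when both pointers refer to the very
 * same location in memory. */
int same_address(const char *a, const char *b)
{
    return (a == b);
}

/* Compares contents: true when every character matches, up to and
 * including the terminating null byte. */
int same_contents(const char *a, const char *b)
{
    return (strcmp(a, b) == 0);
}
```

Given two separate arrays that both hold "Holberton", same_address reports 0 (they live at different addresses) while same_contents reports 1.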

ADVANCED 101 - IMPLEMENTATION

Now that’s what I call a lead-in. I present my implementation of the function strcmp.

/**
* _strcmp - Compares two strings.
* @s1: A pointer to the first string to be compared.
* @s2: A pointer to the second string to be compared.
*
* Return: If s1 < s2, the negative difference of the first unmatched characters.
*         If s1 == s2, 0.
*         If s1 > s2, the positive difference of the first unmatched characters.
*/
int _strcmp(char *s1, char *s2)
{
    while (*s1 && *s2 && *s1 == *s2)
    {
        s1++;
        s2++;
    }

    return (*s1 - *s2);
}

source: https://github.com/bdbaraban/holbertonschool-low_level_programming/blob/master/0x05-pointers_arrays_strings/3-strcmp.c

As I alluded to above, to implement strcmp, I iterate over both strings simultaneously, character by character, continuing as long as I have not reached the end of either string and the characters at the current position of both strings are identical.

Upon breaking, I return the difference in value between the current characters of s1 and s2. This will be a positive value if the value of s1’s character is greater than its s2 counterpart, a negative value for the reverse scenario, or 0 if I have reached the terminating null bytes of both strings.

Of course, this is just one, personal implementation of the function strcmp. There are multiple ways to do so; in fact, I encourage, no, challenge, you to find another way to write this function!
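
As a starting point for that challenge, here is one alternative sketch. It uses array indexing instead of pointer arithmetic, and it reads each character as an unsigned char, which is how the C standard describes the comparison. The name strcmp_indexed is hypothetical.

```c
#include <stddef.h>

/* Hypothetical alternative: index-based comparison that treats each
 * character as an unsigned char. */
int strcmp_indexed(const char *s1, const char *s2)
{
    const unsigned char *a = (const unsigned char *)s1;
    const unsigned char *b = (const unsigned char *)s2;
    size_t i = 0;

    /* Advance while s1 has characters left and they match s2 */
    while (a[i] != '\0' && a[i] == b[i])
        i++;

    return ((int)a[i] - (int)b[i]);
}
```

On ASCII input, it agrees with _strcmp: comparing "Hello" with "Hi" yields -4. The two can differ in sign only for characters outside the ASCII range, where a plain char may be negative.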

NOTE

Examples in this article were compiled and run on a Linux Ubuntu 18.04 LTS machine with GNU GCC version 7.3.0.
Written by:

Brennan Baraban, Cohort 7 (SF Campus)