Issue 1019: String functions and trailing null inclusion

Authors: Joseph Myers
Date: 2025-09-24
Submitted against: C23
Status: Open

The definitions of "string" and "wide string" in C23 7.1.1 (Definitions of terms) are clear that the terminating null character or null wide character, respectively, is included:

A string is a contiguous sequence of characters terminated by and including the first null character.

A wide string is a contiguous sequence of wide characters terminated by and including the first null wide character.

However, the specifications of some functions that take two strings as arguments are written on the assumption that the terminating null character is not included in one or both strings.

The specification for strpbrk (C23 7.26.5.5) says:

The strpbrk generic function returns a pointer to the character, or a null pointer if no character from s2 occurs in s1.

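With the terminating null character understood as being part of s2 in accordance with the definition, "if no character from s2 occurs in s1" is not a possible condition, since the terminating null character of s1 would always match the terminating null character of s2.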
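As an illustration, assuming a hosted implementation whose <string.h> follows the conventional reading, the following sketch shows strpbrk returning a null pointer when none of the non-null characters of s2 appear in s1. Under the literal reading that outcome could not occur. The function names `strpbrk_no_match` and `check_strpbrk_examples` are hypothetical, introduced only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true when strpbrk reports that no character
   from s2 occurs in s1 by returning a null pointer. If the terminating
   null characters of both strings took part in the search, this could
   never be true: the null ending s1 would match the null ending s2. */
bool strpbrk_no_match(const char *s1, const char *s2)
{
    return strpbrk(s1, s2) == NULL;
}

/* Checks the behavior that implementations conventionally provide. */
void check_strpbrk_examples(void)
{
    const char *s = "hello";
    assert(strpbrk(s, "ol") == s + 2);  /* first 'l', at index 2 */
    assert(strpbrk_no_match(s, "xyz")); /* null pointer, not s + 5 */
}
```

Calling `check_strpbrk_examples()` completes silently on such an implementation; a pointer to the terminating null character of `s` would indicate the literal reading instead.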

The specification of strspn (C23 7.26.5.7) says:

The strspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters from the string pointed to by s2.

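As commonly understood, however, the length returned never includes the terminating null character of s1, even in the case where every other character of s1 is found in s2. If s2 were taken to include its own terminating null character, the literal wording would count the null character of s1 as part of the initial segment as well.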
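A short sketch of the conventional strspn results, assuming a hosted implementation; the function name `check_strspn_examples` is hypothetical. The first check is the case described above: every character of "aab" is in "ab", yet the result is 3, not 4.

```c
#include <assert.h>
#include <string.h>

/* Conventional strspn behavior: the count stops at the terminating
   null character of s1, so that null character is never counted. */
void check_strspn_examples(void)
{
    assert(strspn("aab", "ab") == 3);   /* not 4: the null of s1 is excluded */
    assert(strspn("abcab", "ab") == 2); /* stops at 'c' */
    assert(strspn("abc", "") == 0);     /* empty separator set matches nothing */
}
```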

The specification of strtok (C23 7.26.5.9) says:

The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

and

The strtok function then searches from there for a character that is contained in the current separator string. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer.

The latter paragraph must be assuming that at least one of s1 and s2 does not include its trailing null character, because "no such character is found" would not be possible if the null character were included in both strings. The former paragraph is not intended to treat the null character of s1 as a character not contained in s2, so it must be assuming that s1 does not include its trailing null character. If s1 were considered to include it, the latter paragraph would imply that s2 is not considered to include it, and the null character of s1 would then always be found as a character not contained in s2.
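The following sketch, assuming a hosted implementation with the conventional behavior, shows the two outcomes the strtok wording is meant to describe: a string made only of separators yields a null pointer rather than an empty token at the null character, and a final token runs to the end of s1. The function name `check_strtok_examples` is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Conventional strtok behavior: neither terminating null character
   takes part in either search. */
void check_strtok_examples(void)
{
    char only_seps[] = "   ";
    assert(strtok(only_seps, " ") == NULL); /* no empty token at the null */

    char text[] = "  a,b  ";
    char *tok = strtok(text, " ,");
    assert(tok != NULL && strcmp(tok, "a") == 0);
    tok = strtok(NULL, " ,");
    assert(tok != NULL && strcmp(tok, "b") == 0);
    assert(strtok(NULL, " ,") == NULL);     /* only separators remain */

    char one[] = "abc";
    tok = strtok(one, ",");                 /* no separator present */
    assert(tok == one && strcmp(tok, "abc") == 0);
    assert(strtok(NULL, ",") == NULL);      /* token ran to the end */
}
```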

The same issues apply to wcspbrk (C23 7.31.4.6.4), wcsspn (C23 7.31.4.6.6) and wcstok (C23 7.31.4.6.8).
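The wide-character counterparts behave the same way in practice. A sketch, assuming a hosted implementation providing the standard three-argument wcstok; the function name `check_wide_examples` is hypothetical.

```c
#include <assert.h>
#include <wchar.h>

/* Conventional behavior of wcspbrk, wcsspn and wcstok: the terminating
   null wide characters take no part in the searches. */
void check_wide_examples(void)
{
    const wchar_t *s = L"hello";
    assert(wcspbrk(s, L"ol") == s + 2);
    assert(wcspbrk(s, L"xyz") == NULL); /* not a pointer to the null */
    assert(wcsspn(L"aab", L"ab") == 3); /* null of s1 not counted */

    wchar_t text[] = L" a b ";
    wchar_t *state;  /* wcstok keeps its position here, not in static storage */
    wchar_t *tok = wcstok(text, L" ", &state);
    assert(tok != NULL && wcscmp(tok, L"a") == 0);
    tok = wcstok(NULL, L" ", &state);
    assert(tok != NULL && wcscmp(tok, L"b") == 0);
    assert(wcstok(NULL, L" ", &state) == NULL);
}
```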

Suggested correction

In C23 7.26.5.5 (The strpbrk generic function), change the Description specification:

The strpbrk generic function locates the first occurrence in the string pointed to by s1 of any character from the string pointed to by s2, excluding the terminating null character of s2.

In C23 7.26.5.7 (The strspn function), change the Description specification:

The strspn function computes the length of the maximum initial segment of the string pointed to by s1 which consists entirely of characters from the string pointed to by s2, excluding the terminating null character of s2.
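For strpbrk and strspn the suggested wording excludes only the terminating null character of s2. That suffices: once it is excluded, the null character ending s1 can never be "a character from s2", so no separate exclusion for s1 is needed. A minimal sketch under that reading, not a conforming library implementation; `in_set`, `sketch_strpbrk`, `sketch_strspn` and `check_sketches` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if c is one of the characters of set, excluding the terminating
   null character of set. The loop stops at set's null before comparing,
   so in_set('\0', set) is always false. */
static int in_set(char c, const char *set)
{
    for (; *set != '\0'; ++set)
        if (*set == c)
            return 1;
    return 0;
}

/* strpbrk under the suggested wording: reaching the null of s1 means
   that no character from s2 occurs in s1. */
const char *sketch_strpbrk(const char *s1, const char *s2)
{
    for (; *s1 != '\0'; ++s1)
        if (in_set(*s1, s2))
            return s1;
    return NULL;
}

/* strspn under the suggested wording: the scan stops at the null of s1
   without a special case, because in_set never matches '\0'. */
size_t sketch_strspn(const char *s1, const char *s2)
{
    size_t n = 0;
    while (in_set(s1[n], s2))
        ++n;
    return n;
}

/* Cross-checks the sketches against the library functions. */
void check_sketches(const char *s1, const char *s2)
{
    assert(sketch_strpbrk(s1, s2) == strpbrk(s1, s2));
    assert(sketch_strspn(s1, s2) == strspn(s1, s2));
}
```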

In C23 7.26.5.9 (The strtok function), change the Description specification:

The first call in the sequence searches the string pointed to by s1 for the first character that is not contained in the current separator string pointed to by s2, excluding the terminating null characters of both strings. If no such character is found, then there are no tokens in the string pointed to by s1 and the strtok function returns a null pointer. If such a character is found, it is the start of the first token.

The strtok function then searches from there for a character that is contained in the current separator string, excluding the terminating null characters of both strings. If no such character is found, the current token extends to the end of the string pointed to by s1, and subsequent searches for a token will return a null pointer.
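The two searches in the suggested strtok wording can be sketched as follows, with both terminating null characters excluded so that reaching the null of s1 ends each search. This is an illustration under stated assumptions, not a conforming implementation: `is_separator`, `sketch_strtok` and `check_sketch_strtok` are hypothetical names, and returning a null pointer when the first call passes a null pointer is an assumption (for strtok itself that case is not defined).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* True if c is in seps, excluding the terminating null character of
   seps (so it is always false for c == '\0'). */
static int is_separator(char c, const char *seps)
{
    for (; *seps != '\0'; ++seps)
        if (*seps == c)
            return 1;
    return 0;
}

/* Like strtok, keeps its position in static storage (not thread-safe). */
char *sketch_strtok(char *s1, const char *s2)
{
    static char *next = NULL;  /* where the next search starts */

    if (s1 == NULL)
        s1 = next;
    if (s1 == NULL)            /* assumption: no sequence started yet */
        return NULL;

    /* First search: skip separators. Reaching the null of s1 means
       there are no (more) tokens. */
    while (is_separator(*s1, s2))
        ++s1;
    if (*s1 == '\0') {
        next = s1;
        return NULL;
    }

    /* Second search: find the end of the token, stopping at the null
       of s1 rather than treating it as a separator. */
    char *end = s1;
    while (*end != '\0' && !is_separator(*end, s2))
        ++end;
    if (*end == '\0') {
        next = end;            /* token runs to the end; later calls return NULL */
    } else {
        *end = '\0';           /* overwrite the separator to end the token */
        next = end + 1;
    }
    return s1;
}

void check_sketch_strtok(void)
{
    char text[] = "  a,b  ";
    char *tok = sketch_strtok(text, " ,");
    assert(tok != NULL && strcmp(tok, "a") == 0);
    tok = sketch_strtok(NULL, " ,");
    assert(tok != NULL && strcmp(tok, "b") == 0);
    assert(sketch_strtok(NULL, " ,") == NULL);

    char only_seps[] = "   ";
    assert(sketch_strtok(only_seps, " ") == NULL);
}
```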

In C23 7.31.4.6.4 (The wcspbrk generic function), change the Description specification:

The wcspbrk generic function locates the first occurrence in the wide string pointed to by s1 of any wide character from the wide string pointed to by s2, excluding the terminating null wide character of s2.

In C23 7.31.4.6.6 (The wcsspn function), change the Description specification:

The wcsspn function computes the length of the maximum initial segment of the wide string pointed to by s1 which consists entirely of wide characters from the wide string pointed to by s2, excluding the terminating null wide character of s2.

In C23 7.31.4.6.8 (The wcstok function), change the Description specification:

The first call in the sequence searches the wide string pointed to by s1 for the first wide character that is not contained in the current separator wide string pointed to by s2, excluding the terminating null wide characters of both wide strings. If no such wide character is found, then there are no tokens in the wide string pointed to by s1 and the wcstok function returns a null pointer. If such a wide character is found, it is the start of the first token.

The wcstok function then searches from there for a wide character that is contained in the current separator wide string, excluding the terminating null wide characters of both wide strings. If no such wide character is found, the current token extends to the end of the wide string pointed to by s1, and subsequent searches in the same wide string for a token return a null pointer. If such a wide character is found, it is overwritten by a null wide character, which terminates the current token.