What is the best way to match a string to a specified format?
The format that I want to match the string to is "from:<%s>" or "FROM:<%s>". The %s can be any length of characters representing an email address.
I have been using sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>", output). But it doesn't catch the case where the last ">" is missing. Is there a clean way to check if the input string is correctly formatted?
Solution 1:[1]
You can't directly tell whether trailing literal characters in a format string are matched; there's no direct way for sscanf() to report their absence. However, there are a couple of tricks that'll do the job:
Option 1:
int n = 0;
if (sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>%n", email, &n) != 1)
    …error…
else if (n == 0)
    …missing >…
Option 2:
char c = '\0';
if (sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]%c", email, &c) != 2)
    …error: malformed prefix or > missing…
else if (c != '>')
    …error: something other than > after email address…
Note that the 'from' scan-set will match ROFF or MorfROM or <FROM:morf as a prefix to the email address. That's probably too generous. Indeed, because f, o and m are all in the scan-set, it would also swallow the start of an address made of those letters: it would match from:<foofoomoo of from:<[email protected]>, which is a much more serious problem, especially as you throw the whole of the matched material away. You should probably capture the value and be more specific:
char c = '\0';
char from[5];
if (sscanf(input, "%4[fromFROM]:<%[@:-,.A-Za-z0-9]%c", from, email, &c) != 3)
    …error…
else if (strcasecmp(from, "FROM") != 0)
    …not from…
else if (c != '>')
    …something other than > after email address…
Or you can compare using strcmp() with from and FROM if that's what you want. The options here are legion. Be aware that strcasecmp() is a POSIX function (declared in <strings.h>), not standard C; Microsoft provides the equivalent _stricmp() (stricmp() is the older, deprecated name).
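The strcmp() variant mentioned above could be wrapped up roughly as follows. This is only a sketch: parse_from and the 100-byte email buffer are names and sizes assumed for illustration, and the scan-set is written with the '-' first so that it is taken literally rather than as part of a range. It does not detect junk after the closing >.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): returns 1 and fills
   email when input starts with "from:<addr>" or "FROM:<addr>", 0 otherwise.
   The 100-byte capacity is an assumption; %99 keeps sscanf within it. */
int parse_from(const char *input, char email[100])
{
    char from[5];
    char c = '\0';

    /* Three conversions must succeed: keyword, address, closing character. */
    if (sscanf(input, "%4[fromFROM]:<%99[-@:,.A-Za-z0-9]%c", from, email, &c) != 3)
        return 0;

    /* Accept exactly "from" or "FROM"; mixed case such as "From" is rejected. */
    if (strcmp(from, "from") != 0 && strcmp(from, "FROM") != 0)
        return 0;

    /* The character after the address must be the closing '>'. */
    return c == '>';
}
```

Called as parse_from(input, email), it returns 0 for an input whose > is missing, because the %c conversion then has nothing left to read.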
Solution 2:[2]
Use "%n". It records the offset of the scan of input[], if scanning got that far.
Use it to:
Detect scan success that includes the >.
Detect extra junk.
A check of the return value of sscanf() is not needed.
Also use a width limit.
char output[100];
int n = 0;
// sscanf(input, "%*[fromFROM:<]%[@:-,.A-Za-z0-9]>", output);
sscanf(input, "%*[fromFROM]:<%99[@:-,.A-Za-z0-9]>%n", output, &n);
//                            ^^ width
if (n == 0 || input[n] != '\0') {
    puts("Error, scan incomplete or extra junk");
} else {
    puts("Success");
}
If trailing white-space, like a '\n', is OK, use " %n".
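As a rough illustration of the " %n" variant, the check could be packaged like this. The function name and the buffer size are assumptions made for the sketch, and the scan-set lists the '-' first so it is treated as a literal character.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 when input is "from:<addr>" or
   "FROM:<addr>" optionally followed by white-space only, else 0. */
int matches_allowing_trailing_space(const char *input)
{
    char output[100];
    int n = 0;

    /* The space before %n skips any trailing white-space (including '\n').
       n is only written if scanning gets past the '>'. */
    sscanf(input, "%*[fromFROM]:<%99[-@:,.A-Za-z0-9]> %n", output, &n);

    /* n == 0: the '>' was never reached; input[n] != '\0': extra junk. */
    return n > 0 && input[n] == '\0';
}
```

With this form a line read by fgets() can be checked without first stripping its newline.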
Solution 3:[3]
Regarding the first part of the string, if you want to accept only FROM:< or from:<, then you can simply use the function strncmp with both possibilities. Note, however, that this means that, for example, From:< will not be accepted. In your question, you implied that this is how you want your program to behave, but I'm not sure if this really is the case.
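If mixed-case prefixes such as From:< should be accepted after all, one portable possibility is a small case-insensitive prefix comparison built on tolower(). The helper below is a hypothetical sketch, not part of the program in this answer.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when line begins with prefix, ignoring case.
   Stops safely if line is shorter than prefix, because the terminating
   '\0' of line never equals a non-null prefix character. */
bool has_prefix_ignore_case( const char *line, const char *prefix )
{
    for ( size_t i = 0; prefix[i] != '\0'; i++ )
    {
        if ( tolower( (unsigned char)line[i] ) != tolower( (unsigned char)prefix[i] ) )
            return false;
    }

    return true;
}
```

A call such as has_prefix_ignore_case( line, "from:<" ) would then take the place of the two strncmp comparisons.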
Generally, I wouldn't recommend using the function sscanf for such a complex task, because that function is not very flexible. Also, in ISO C, it is not guaranteed that character ranges are supported when using the %[] format specifier (although most common platforms probably do support it). Therefore, I would recommend checking the individual parts of the string "manually":
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdbool.h>
bool is_valid_string( const char *line )
{
    const char *p;

    //verify that string starts with "from:<" or "FROM:<"
    if (
        strncmp( line, "from:<", 6 ) != 0
        &&
        strncmp( line, "FROM:<", 6 ) != 0
    )
    {
        return false;
    }

    //verify that there are no invalid characters before the `>`
    for ( p = line + 6; *p != '>'; p++ )
    {
        if ( *p == '\0' )
            return false;

        if ( isalpha( (unsigned char)*p ) )
            continue;

        if ( isdigit( (unsigned char)*p ) )
            continue;

        if ( strchr( "@:-,.", *p ) != NULL )
            continue;

        return false;
    }

    //jump past the '>' character
    p++;

    //verify that we are now at the end of the string
    if ( *p != '\0' )
        return false;

    return true;
}

int main( void )
{
    char line[200];

    //read one line of input
    if ( fgets( line, sizeof line, stdin ) == NULL )
    {
        printf( "Input failure!\n" );
        exit( EXIT_FAILURE );
    }

    //remove newline character
    line[strcspn(line,"\n")] = '\0';

    //call function and print result
    if ( is_valid_string ( line ) )
        printf( "VALID\n" );
    else
        printf( "INVALID\n" );
}
This program has the following behavior (each input line is followed by the program's output):
This is an invalid string.
INVALID
from:<[email protected]
INVALID
from:<[email protected]>
VALID
FROM:<[email protected]
INVALID
FROM:<[email protected]>
VALID
FROM:<john.doe@example!!!!.com>
INVALID
FROM:<[email protected]>invalid
INVALID
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
