Check if a String is a Valid URL in C

To check whether a string looks like a valid URL in C, you can use standard string functions to test for common URL prefixes, or use POSIX regular expressions to check the overall structure. Both are lightweight format checks rather than full URL parsing.


Example 1: Using String Comparison

This example demonstrates how to check whether a string starts with “http://” or “https://”, the usual prefixes for web URLs. It tests only the prefix, so anything that follows the prefix is accepted.

main.c

#include <stdio.h>
#include <string.h>

int main(void) {
    char url[] = "https://www.example.com";
    
    // Check if the URL starts with "http://" or "https://"
    if (strncmp(url, "http://", 7) == 0 || strncmp(url, "https://", 8) == 0) {
        printf("Valid URL\n");
    } else {
        printf("Invalid URL\n");
    }
    
    return 0;
}

Explanation:

  1. char url[]: Declares and initializes the URL string to be validated.
  2. strncmp(): Compares the beginning of the URL with “http://” (7 characters) and “https://” (8 characters) to ensure the string starts with one of these prefixes.
  3. The if condition checks if either comparison returns 0, indicating a match.
  4. If the condition is true, printf() outputs “Valid URL”; otherwise, it outputs “Invalid URL”.

Output:

Valid URL
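
The prefix check can also be wrapped in a reusable function. The sketch below is one possible way to do that and is not part of the original example: the names starts_with() and has_http_prefix() are hypothetical, and the rule that at least one character must follow the prefix is an added assumption, so that a bare “http://” is rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true when s begins with prefix. */
static bool starts_with(const char *s, const char *prefix) {
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Hypothetical helper: true when url starts with "http://" or "https://"
   AND has at least one character after the prefix (an added assumption). */
bool has_http_prefix(const char *url) {
    if (url == NULL) {
        return false;
    }
    if (starts_with(url, "http://")) {
        return url[7] != '\0';   /* 7 = length of "http://" */
    }
    if (starts_with(url, "https://")) {
        return url[8] != '\0';   /* 8 = length of "https://" */
    }
    return false;
}
```

With this sketch, has_http_prefix("https://www.example.com") returns true, while has_http_prefix("http://") and has_http_prefix("ftp://example.com") return false. The comparison is case-sensitive, as in the example above.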

Example 2: Using POSIX Regular Expressions

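This example uses the POSIX regex library to check the URL structure. We compile a regular expression that requires the string to start with “http://” or “https://”, followed by a host name made of letters, digits, dots, and hyphens, an optional port, and an optional path. The pattern checks the shape of the string only; it does not verify that the host name is well formed or that it exists.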

main.c

#include <stdio.h>
#include <regex.h>

int main(void) {
    char url[] = "http://www.example.com/path";
    regex_t regex;
    int ret;

    // Regular expression pattern for basic URL validation:
    // ^(https?://)         -> URL must start with "http://" or "https://"
    // ([a-zA-Z0-9.-]+)      -> Domain name with letters, digits, dots, or hyphens
    // (:[0-9]+)?           -> Optional port number
    // (/[a-zA-Z0-9./?=&%-]*)?$ -> Optional path and query parameters
    char *pattern = "^(https?://)([a-zA-Z0-9.-]+)(:[0-9]+)?(/[a-zA-Z0-9./?=&%-]*)?$";

    // Compile the regular expression with extended syntax
    ret = regcomp(&regex, pattern, REG_EXTENDED);
    if (ret) {
        printf("Could not compile regex\n");
        return 1;
    }

    // Execute the regular expression on the URL string
    ret = regexec(&regex, url, 0, NULL, 0);
    if (!ret) {
        printf("Valid URL\n");
    } else {
        printf("Invalid URL\n");
    }

    // Free the compiled regular expression
    regfree(&regex);
    return 0;
}

Explanation:

  1. char url[]: Stores the URL string to be validated.
  2. regex_t regex: Declares a variable to hold the compiled regular expression.
  3. pattern: Holds the regex that specifies the expected URL format. It requires the string to begin with “http://” or “https://”, followed by a host name made of letters, digits, dots, and hyphens, an optional port, and an optional path.
  4. regcomp(): Compiles the regex pattern into a regex_t structure and returns 0 on success. The REG_EXTENDED flag enables extended regular expression syntax.
  5. regexec(): Executes the compiled regex against the URL string. A return value of 0 indicates a match, meaning the string fits the pattern; REG_NOMATCH indicates that it does not.
  6. regfree(): Releases the memory allocated for the compiled regular expression.

Output:

Valid URL
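
The same steps can be collected into a function that reports a match, a non-match, and a regex failure separately. This is a minimal sketch under stated assumptions, not a definitive validator: the function name matches_url_pattern() and its 1 / 0 / -1 return convention are hypothetical. It reuses the pattern from the example, adds the REG_NOSUB flag because only a yes/no answer is needed, and uses regerror() to turn a compile error code into a message.

```c
#include <regex.h>
#include <stdio.h>

/* Hypothetical helper.
   Returns 1 if url fits the pattern, 0 if it does not, -1 on a regex error. */
int matches_url_pattern(const char *url) {
    /* Same pattern as in the example above. */
    const char *pattern =
        "^(https?://)([a-zA-Z0-9.-]+)(:[0-9]+)?(/[a-zA-Z0-9./?=&%-]*)?$";
    regex_t regex;
    char errbuf[128];

    /* REG_NOSUB: no submatch offsets are needed, only match / no match. */
    int ret = regcomp(&regex, pattern, REG_EXTENDED | REG_NOSUB);
    if (ret != 0) {
        /* Convert the error code into a readable message. */
        regerror(ret, &regex, errbuf, sizeof errbuf);
        fprintf(stderr, "regcomp failed: %s\n", errbuf);
        return -1;
    }

    ret = regexec(&regex, url, 0, NULL, 0);
    regfree(&regex);   /* release the compiled pattern on every path */

    if (ret == 0) {
        return 1;
    }
    if (ret == REG_NOMATCH) {
        return 0;
    }
    return -1;         /* any other value is an execution error */
}
```

With this sketch, “http://www.example.com/path” and “https://example.com:8080/a?b=c” return 1, while “example.com” and “http://” return 0. A string such as “http://...” also returns 1, because the host part of the pattern accepts any run of letters, digits, dots, and hyphens.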

Conclusion

In this tutorial, we learned how to check if a string is a valid URL in C by exploring two different approaches:

  1. String Comparison: Uses strncmp() to verify that the URL begins with common prefixes like “http://” or “https://”.
  2. Regular Expressions: Utilizes the POSIX regex library to validate the overall structure of the URL, ensuring it meets specific format criteria.