Developer Blog
Articles about Using Microsoft Developer Tools

Converting Text to a URL "Slug"

Wednesday, May 12, 2010 10:01 AM by jonwood

NOTE: This article has been updated and moved to: Converting Text to a URL-Friendly Slug.

By now, many of you have seen URLs that contain text similar to the page's title. For example, the URL may look like http://www.domain.com/this-is-my-best-article.aspx instead of http://www.domain.com/bestarticle.aspx. Text that has been converted from a regular string so that it can appear within a URL this way is called a "slug."

Not only is a slug a little more human-readable, but it can also help indicate to search engines like Google and Bing what keywords are important to your page.

There's no built-in .NET function to convert a string to a slug. I found a few examples on the web but didn't find any I really liked. So I decided to roll my own in C#. Listing 1 shows my ConvertTextToSlug() method. It takes any string and makes it safe to include as part of a URL.

Initially, I started by looking for an official, comprehensive list of characters that are not valid within a URL. But after thinking about it, I decided the result looked cleaner if I got rid of all punctuation, whether it could appear in a URL or not. So my code rejects some characters that could legitimately be included in a URL.

/// <summary>
/// Creates a "slug" from text that can be used as part of a valid URL.
/// 
/// Whitespace is converted to hyphens. All other characters that are
/// not letters or digits are removed, including punctuation that is
/// perfectly valid in a URL, to keep the result mostly text. Steps are
/// taken to prevent leading, trailing, and consecutive hyphens.
/// </summary>
/// <param name="s">String to convert to a slug</param>
/// <returns>The lowercase slug</returns>
public static string ConvertTextToSlug(string s)
{
  StringBuilder sb = new StringBuilder();
  bool wasHyphen = true;
  foreach (char c in s)
  {
    if (char.IsLetterOrDigit(c))
    {
      // Use the invariant culture so the slug does not vary by locale
      sb.Append(char.ToLowerInvariant(c));
      wasHyphen = false;
    }
    else if (char.IsWhiteSpace(c) && !wasHyphen)
    {
      sb.Append('-');
      wasHyphen = true;
    }
  }
  // Avoid trailing hyphens
  if (wasHyphen && sb.Length > 0)
    sb.Length--;
  return sb.ToString();
}

Listing 1: ConvertTextToSlug() method. 

Some examples I found on the web used regular expressions. My routine is simpler. It just iterates through each character in the string, appending it to the result if it's either a letter or a digit. If I encounter whitespace, I append a hyphen (-). All other characters are discarded.

The code takes steps to prevent consecutive hyphens, keeping the result looking cleaner. It also takes steps to prevent leading and trailing hyphens.

As you can see, it's a very simple routine. But it seems to produce good results. Of course, if you decide to name your documents this way, it'll be up to you to ensure you correctly handle different titles that resolve to the same slug.

Categories:   C# .NET

Ensuring a Path Exists

Monday, May 03, 2010 9:26 AM by jonwood

NOTE: This article has been updated and moved to: Ensuring a Path Exists.

I was recently working on a C++/MFC application that needed to back up a bunch of files. Copying files is easy enough, but I had to create the target folder if it didn't exist. Creating a folder is also easy. However, CreateDirectory() will not create intermediate directories.

That is, if I need to create the directory D:\Documents\Financials, CreateDirectory() will fail if D:\Documents does not already exist. To build this directory, I must first create D:\Documents before I can create D:\Documents\Financials. Clearly, I needed a routine that will ensure a path of any depth exists, and will create any missing folders.

Note that SHCreateDirectory() is documented as doing just this. However, the documentation also states this function "might be altered or unavailable in subsequent versions of Windows." I didn't really care for the sound of that, so I decided to write my own routine. Although it required a little thought, the routine I came up with is reasonably simple and quite short.

Listing 1 shows my EnsurePathExists() routine. It takes a path string and determines if that path exists. If it does not exist, the routine creates it. It returns a Boolean value, which is false if any portion of the path could not be created.

// Forward declaration of the helper defined below
bool DirectoryExists(LPCTSTR lpszPath);

// Ensures the given path exists, creating it if needed
bool EnsurePathExists(LPCTSTR lpszPath)
{
  CString sPath;
  // Nothing to do if path already exists
  if (DirectoryExists(lpszPath))
    return true;
  // Ignore trailing backslash
  int nLen = (int)_tcslen(lpszPath);
  // Check the length first so an empty string is not indexed at -1
  if (nLen > 0 && lpszPath[nLen - 1] == '\\')
    nLen--;
  // Skip past drive specifier
  int nCurrLen = 0;
  if (nLen >= 3 && lpszPath[1] == ':' && lpszPath[2] == '\\')
    nCurrLen = 2;
  // We can't create root so skip past any root specifier
  while (lpszPath[nCurrLen] == '\\')
    nCurrLen++;
  // Test each component of this path, creating directories as needed
  while (nCurrLen < nLen)
  {
    // Parse next path component
    LPCTSTR psz = _tcschr(lpszPath + nCurrLen, '\\');
    if (psz != NULL)
      nCurrLen = (int)(psz - lpszPath);
    else
      nCurrLen = nLen;
    // Ensure this path exists
    sPath.SetString(lpszPath, nCurrLen);
    if (!DirectoryExists(sPath))
      if (!::CreateDirectory(sPath, NULL))
        return false;
    // Skip over current backslash
    if (lpszPath[nCurrLen] != '\0')
      nCurrLen++;
  }
  return true;
}

// Returns true if the specified path exists and is a directory
bool DirectoryExists(LPCTSTR lpszPath)
{
  DWORD dw = ::GetFileAttributes(lpszPath);
  return (dw != INVALID_FILE_ATTRIBUTES &&
    (dw & FILE_ATTRIBUTE_DIRECTORY) != 0);
}

Listing 1: EnsurePathExists() function.

The code processes a single component, or layer, of the path at a time, either ensuring it exists or creating it if it does not. It uses my helper routine DirectoryExists() to determine if each component exists. If it does not, that component is created using CreateDirectory().
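To make the component-by-component walk easier to follow, here is a minimal, portable sketch of just the parsing step. It is not part of the original routine: PathPrefixes() is a hypothetical helper that uses std::string instead of LPCTSTR and CString, touches no files, and simply returns each partial path that EnsurePathExists() would test, in order.

```cpp
#include <string>
#include <vector>

// Hypothetical helper for illustration only: returns each partial path
// that would be tested (and created if missing), from shallowest to deepest.
std::vector<std::string> PathPrefixes(const std::string& path)
{
    std::vector<std::string> prefixes;
    size_t len = path.size();
    // Ignore trailing backslash
    if (len > 0 && path[len - 1] == '\\')
        len--;
    // Skip past drive specifier such as "D:"
    size_t curr = 0;
    if (len >= 3 && path[1] == ':' && path[2] == '\\')
        curr = 2;
    // The root cannot be created, so skip past any root specifier
    while (curr < len && path[curr] == '\\')
        curr++;
    while (curr < len)
    {
        // Find the backslash that ends the next component, if any
        size_t pos = path.find('\\', curr);
        curr = (pos == std::string::npos || pos > len) ? len : pos;
        // Everything up to this point is the next path to test
        prefixes.push_back(path.substr(0, curr));
        // Skip over the backslash that ended this component
        if (curr < path.size())
            curr++;
    }
    return prefixes;
}
```

Called with D:\Documents\Financials\2010, this sketch returns D:\Documents, then D:\Documents\Financials, then D:\Documents\Financials\2010. The real routine performs the DirectoryExists() test and the CreateDirectory() call at each of those steps.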

To save time, EnsurePathExists() starts by testing for the existence of the entire path. If it already exists, the function simply returns.

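That's about all there is to it. Most of the details are just in making sure each component is properly parsed and tested. The code skips the root folder because you cannot create a root folder. The code also skips multiple leading backslashes, as might be seen in paths that refer to network locations. Note, however, that the server and share names that follow those backslashes are still tested like ordinary folders, so network paths may need additional handling unless the full path already exists.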