Developer Blog
Articles about Using Microsoft Developer Tools

Black Belt Coder and the Future of this Blog

Friday, December 10, 2010 8:31 PM by jonwood

After a couple of years of working on a particular project whenever I had some extra time, that project is finally going live: the Black Belt Coder website.

Black Belt Coder will be a website of free developer articles and source code downloads. There is still much more work to do but I've done enough to put it on the Internet.

In addition to the planned features that still need to be implemented, I also need to keep writing articles for the site. I am also soliciting article submissions. So if you're a developer with something to say, please visit http://www.blackbeltcoder.com/submit/.

While this blog hasn't seen a whole lot of activity recently, the new site will get a lot of attention over the coming years.

Most of what I would have posted on this blog will now go to the new site instead. There may still be a place for this blog, but I haven't decided yet. I'll need to give that some more thought.

I invite you to stop by as I get the site fully operational. And be sure to let me know what you think.

Categories:   General

Redirecting to WWW on ASP.NET 4.0 and IIS7

Monday, June 14, 2010 8:23 AM by jonwood

NOTE: This article has been updated and moved to: Redirecting to WWW on ASP.NET 4.0 and IIS7.

These days, most domains work both with and without the "www" prefix. However, this can actually hurt your ranking on search engines like Google.

The problem is that search engines like Google consider domain.com to be a different domain than www.domain.com. So as you try to build your search-engine ranking by getting more and more links to your domain, splitting those links between the two forms waters down their value.

It is better to have every link use exactly the same form of your domain. To this end, it is common to redirect requests to domain.com to www.domain.com. If someone leaves off the prefix, the redirect will cause their browser to add it. And any links saved and published will use the form of the domain with the prefix.

I recently needed to implement this redirect for a couple of ASP.NET 4.0/IIS 7 sites but ran into a little difficulty. At first, it was recommended that I edit the .htaccess file, but that simply did not work. Apparently, it only works on older versions.

Next, I was told to download IIS 7 Remote Manager and make the changes there. Well, I assume that would work, but downloading, installing, and learning how to use a very large program just to perform a simple redirect seemed like overkill, to put it mildly.

If your site uses ASP.NET 4.0 and IIS 7, or later, there is a fairly painless way to redirect requests to the "www" version of your site by simply editing the web.config file.

Listing 1 shows the text that needs to be added somewhere within your <configuration> section.

<system.webServer>
  <rewrite>
    <rules>
      <clear />
      <rule name="WWW Rewrite" enabled="true">
        <match url="(.*)" />
        <conditions>
          <add input="{HTTP_HOST}" negate="true"
            pattern="^www\.([.a-zA-Z0-9]+)$" />
        </conditions>
        <action type="Redirect" url="http://www.{HTTP_HOST}/{R:0}"
          appendQueryString="true" redirectType="Permanent" />
      </rule>
    </rules>
  </rewrite>
</system.webServer>

Listing 1: Web.config setting to redirect domain.com to www.domain.com

Note that you will most likely see squiggly lines under the <rewrite> tag with a message that the tag is invalid. I got this message but, in fact, it worked just fine.

I found some information on the web about why the editor flags this, but I haven't yet been able to spend time chasing down a warning for something that appears to work just fine.

I'm a developer and this is a little out of my area of expertise, so other folks may be able to shed more light on the details of this approach. Still, if you come at it from the point of view that I had, this is a great way to perform an important SEO function without digging into the depths of IIS.

Categories:   ASP.NET

Validation Controls Lost Their Red Color

Friday, June 04, 2010 2:27 AM by jonwood

I was recently finishing up a new website (http://www.trailcalendar.com) and I noticed that all the validation and validation summary controls in my project no longer appeared in red.

Normally, these controls will display with "color:Red" as part of their style attribute unless you override the ForeColor property. Even if you have CSS styles in effect that change the text color, the inline attribute will override it and the control will appear red. But, all of a sudden, this was no longer being included in my output.

I could not figure this one out. I created a new test project from scratch and I got the same thing (no red color). Yet, when I looked at my older projects, all validation controls were including the color style attribute. What had changed?

It turns out that what had changed is that I recently upgraded to Visual Studio 2010 and ASP.NET 4.0. It appears ASP.NET 4.0 has changed the way that it renders some of the output. Most of these changes are great as they result in more concise HTML. (Verbose HTML has always been an issue with ASP.NET.)

Among these changes, menus are now output as lists instead of tables, attributes like border="0" are no longer emitted, and the error text of validation controls is no longer set to red! Note that I think these are great changes and highly recommend the upgrade. In an effort to make the HTML more concise, VS 2010 also gives you the option of preventing your tag IDs from growing from something like "LinkButton1" into "ContentPlaceHolder1_ContentPlaceHolder1_Repeater1_LinkButton1_0".

So, now that I know what is causing this, what is the solution? Well, it turns out there is a compatibility setting for this. The controlRenderingCompatibilityVersion can be set to "3.5" to produce the same rendering I'm familiar with. This setting goes in your web.config file and is demonstrated in Listing 1.

<?xml version="1.0"?>
<configuration>
  <system.web>
    <compilation debug="false" targetFramework="4.0" />
    <pages controlRenderingCompatibilityVersion="3.5" />
  </system.web>
</configuration>

Listing 1: The controlRenderingCompatibilityVersion Setting.

It turns out that the Visual Studio Conversion Wizard adds this setting automatically when upgrading a project. This makes sense because a large project might have many places that no longer display correctly under the new rendering. To be honest, I'm just not sure what happened in my case and don't recall whether I ran the Conversion Wizard or not.

I'm still not entirely clear on what the expectation is here. I love the option of producing more compact HTML, but I still want my validation text to appear red, and I would rather use the newer ASP.NET 4.0 rendering. So I will probably go through and set the ForeColor property to "Red" on every validation control in my project.
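
Alternatively, rather than editing every control by hand, something like the following sketch (untested, but the Page.Validators collection makes it straightforward) could set the color in one place, such as Page_PreRender in a base page:

protected void Page_PreRender(object sender, EventArgs e)
{
  // Page.Validators contains every validation control on the page.
  // BaseValidator is a WebControl, so ForeColor is available.
  foreach (IValidator validator in Page.Validators)
  {
    BaseValidator control = validator as BaseValidator;
    if (control != null && control.ForeColor.IsEmpty)
      control.ForeColor = System.Drawing.Color.Red;
  }
}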

Either way, I thought I would post about this in case anyone else was bitten by this issue. It certainly took me by surprise.

Categories:   ASP.NET

Converting Text to a URL "Slug"

Wednesday, May 12, 2010 10:01 AM by jonwood

NOTE: This article has been updated and moved to: Converting Text to a URL-Friendly Slug.

By now, many of you have seen URLs that contain text similar to the page's title. For example, the URL may look like http://www.domain.com/this-is-my-best-article.aspx instead of http://www.domain.com/bestarticle.aspx. Text converted from a regular string so that it can appear within a URL this way is called a "slug."

Not only is a slug a little more human-readable, but it can also help indicate to search engines like Google and Bing what keywords are important to your page.

There's no built-in .NET function to convert a string to a slug. I found a few examples on the web but didn't find any I really liked. So I decided to roll my own in C#. Listing 1 shows my ConvertTextToSlug() method. It takes any string and makes it safe to include as part of a URL.

Initially, I started by looking for an official, comprehensive list of characters that are not valid within a URL. But after thinking about it, I decided the result looked cleaner if I got rid of all punctuation, whether or not it could legally appear in a URL. So my code also drops punctuation characters that could legitimately be included in a URL.

/// <summary>
/// Creates a "slug" from text that can be used as part of a valid URL.
/// 
/// Whitespace is converted to hyphens. Punctuation, even when it is
/// perfectly valid in a URL, is simply dropped to keep the result
/// mostly text. Steps are taken to prevent leading, trailing,
/// and consecutive hyphens.
/// </summary>
/// <param name="s">String to convert to a slug</param>
/// <returns>The slug form of the given string</returns>
public static string ConvertTextToSlug(string s)
{
  StringBuilder sb = new StringBuilder();
  // Start as though a hyphen was just written to prevent a leading hyphen
  bool wasHyphen = true;
  foreach (char c in s)
  {
    if (char.IsLetterOrDigit(c))
    {
      sb.Append(char.ToLower(c));
      wasHyphen = false;
    }
    else if (char.IsWhiteSpace(c) && !wasHyphen)
    {
      sb.Append('-');
      wasHyphen = true;
    }
  }
  // Avoid trailing hyphens
  if (wasHyphen && sb.Length > 0)
    sb.Length--;
  return sb.ToString();
}

Listing 1: ConvertTextToSlug() method. 

Some examples I found on the web used regular expressions. My routine is simpler. It just iterates through each character in the string, appending it to the result if it's either a letter or a digit. If I encounter whitespace, I append a hyphen (-) instead.

The code takes steps to prevent consecutive hyphens, keeping the result looking cleaner. It also takes steps to prevent leading and trailing hyphens.

As you can see, it's a very simple routine. But it seems to produce good results. Of course, if you decide to name your documents this way, it'll be up to you to ensure you correctly handle different titles that resolve to the same slug.
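
For example, here are a couple of hypothetical calls and the slugs they produce (traced by hand against the code in Listing 1):

// Hypothetical inputs, traced against Listing 1
string slug1 = ConvertTextToSlug("Converting Text to a URL \"Slug\"");
// slug1 == "converting-text-to-a-url-slug"
string slug2 = ConvertTextToSlug("  Ensuring   a Path Exists!  ");
// slug2 == "ensuring-a-path-exists"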

Categories:   C# .NET

Ensuring a Path Exists

Monday, May 03, 2010 9:26 AM by jonwood

NOTE: This article has been updated and moved to: Ensuring a Path Exists.

I was recently working on a C++/MFC application that needed to back up a bunch of files. Copying files is easy enough, but I had to create the target folder if it didn't exist. Creating a folder is also easy. However, CreateDirectory() will not create intermediate directories.

That is, if I need to create the directory D:\Documents\Financials, CreateDirectory() will fail if D:\Documents does not already exist. To build this directory, I must first create D:\Documents before I can create D:\Documents\Financials. Clearly, I needed a routine that will ensure a path of any depth exists, and will create any missing folders.

Note that SHCreateDirectory() is documented as doing just this. However, the documentation also states this function "might be altered or unavailable in subsequent versions of Windows." I didn't really care for the sound of that, so I decided to write my own routine. Although it required a little thought, the routine I came up with is reasonably simple and quite short.

Listing 1 shows my EnsurePathExists() routine. It takes a path string and determines if that path exists. If it does not exist, the routine will create it. It returns a Boolean value, which is false if a portion of the directory could not be created.

// Ensures the given path exists, creating it if needed
bool EnsurePathExists(LPCTSTR lpszPath)
{
  CString sPath;
  // Nothing to do if path already exists
  if (DirectoryExists(lpszPath))
    return true;
  // Ignore trailing backslash
  int nLen = _tcslen(lpszPath);
  if (lpszPath[nLen - 1] == '\\')
    nLen--;
  // Skip past drive specifier
  int nCurrLen = 0;
  if (nLen >= 3 && lpszPath[1] == ':' && lpszPath[2] == '\\')
    nCurrLen = 2;
  // We can't create root so skip past any root specifier
  while (lpszPath[nCurrLen] == '\\')
    nCurrLen++;
  // Test each component of this path, creating directories as needed
  while (nCurrLen < nLen)
  {
    // Parse next path component
    LPCTSTR psz = _tcschr(lpszPath + nCurrLen, '\\');
    if (psz != NULL)
      nCurrLen = (int)(psz - lpszPath);
    else
      nCurrLen = nLen;
    // Ensure this path exists
    sPath.SetString(lpszPath, nCurrLen);
    if (!DirectoryExists(sPath))
      if (!::CreateDirectory(sPath, NULL))
        return false;
    // Skip over current backslash
    if (lpszPath[nCurrLen] != '\0')
      nCurrLen++;
  }
  return true;
}
// Returns true if the specified path exists and is a directory
bool DirectoryExists(LPCTSTR lpszPath)
{
  DWORD dw = ::GetFileAttributes(lpszPath);
  return (dw != INVALID_FILE_ATTRIBUTES &&
    (dw & FILE_ATTRIBUTE_DIRECTORY) != 0);
}

Listing 1: EnsurePathExists() function.

The code processes a single component, or layer, of the path at a time, either ensuring it exists or creating it if it does not. It uses my helper routine DirectoryExists() to determine if each component exists. If it does not, that component is created using CreateDirectory().

To save time, EnsurePathExists() starts by testing for the existence of the entire path. If it already exists, the function simply returns.

That's about all there is to it. Most of the details are just in making sure each component is properly parsed and tested. The code skips the root folder because you cannot create a root folder. The code also strips multiple leading backslashes, as might be seen in paths that refer to network locations.

Abbreviating URLs

Thursday, April 29, 2010 7:35 AM by jonwood

NOTE: This article has been updated and moved to: Abbreviating URLs.

Recently, I had a case where an ASP.NET page displayed the user's URL in a side column. This worked fine except that I found some users had very long URLs, which didn't look right.

It occurred to me that I could simply truncate the visible URL while still keeping the underlying link the same. However, when I truncated the URL by trimming excess characters, I realized it could be done more intelligently.

For example, consider the URL http://www.domain.com/here/is/one/long/url/page.aspx. If I wanted to keep it within 40 characters, I could trim it to http://www.domain.com/here/is/one/long/u. The problem is that this abbreviation could be more informative. For example, is it a directory or a page? And, if it's a page, what kind? And what exactly does the "u" at the end stand for?

Wouldn't it be a little better if I instead abbreviated this URL to http://www.domain.com/.../url/page.aspx? We've lost a few characters to the three dots that show information is missing, but we can still see the domain, and the page name and type.

The code in Listing 1 abbreviates a URL in this way. The UrlHelper class contains just a single, static method, LimitLength(). This method takes a URL string and a maximum length, and attempts to abbreviate the URL so that it will fit within the specified number of characters as described above.

public class UrlHelper
{
  public static char[] Delimiters = { '/', '\\' };
  /// <summary>
  /// Attempts to intelligently shorten the length of a URL. No attempt is
  /// made to shorten to fewer than 5 characters.
  /// </summary>
  /// <param name="url">The URL to be tested</param>
  /// <param name="maxLength">The maximum length of the result string</param>
  /// <returns></returns>
  public static string LimitLength(string url, int maxLength)
  {
    if (maxLength < 5)
      maxLength = 5;
    if (url.Length > maxLength)
    {
      // Remove protocol
      int i = url.IndexOfAny(new char[] { ':', '.' });
      if (i >= 0 && url[i] == ':')
        url = url.Remove(0, i + 1);
      // Remove leading delimiters
      i = 0;
      while (i < url.Length && (url[i] == Delimiters[0]
        || url[i] == Delimiters[1]))
        i++;
      if (i > 0)
        url = url.Remove(0, i);
      // Remove trailing delimiter
      if (url.Length > maxLength && (url.EndsWith("/") || url.EndsWith("\\")))
        url = url.Remove(url.Length - 1);
      // Remove path segments until url is short enough or no more segments:
      //
      // domain.com/abc/def/ghi/jkl.htm
      // domain.com/.../def/ghi/jkl.htm
      // domain.com/.../ghi/jkl.htm
      // domain.com/.../jkl.htm
      if (url.Length > maxLength)
      {
        i = url.IndexOfAny(Delimiters);
        if (i >= 0)
        {
          string first = url.Substring(0, i + 1);
          string last = url.Substring(i);
          bool trimmed = false;
          do
          {
            i = last.IndexOfAny(Delimiters, 1);
            if (i < 0 || i >= (last.Length - 1))
              break;
            last = last.Substring(i);
            trimmed = true;
          } while ((first.Length + 3 + last.Length) > maxLength);
          if (trimmed)
            url = String.Format("{0}...{1}", first, last);
        }
      }
    }
    return url;
  }
}

Listing 1: UrlHelper class.

If the specified maximum length is less than five, LimitLength() simply changes it to five as there is no point in attempting to shorten a URL to less than the length of the protocol (http://).
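
As a quick, hypothetical example (the result below is traced by hand against Listing 1; note that the code also strips the protocol and any leading slashes):

// Hypothetical call, traced against Listing 1
string shortUrl = UrlHelper.LimitLength(
  "http://www.domain.com/here/is/one/long/url/page.aspx", 40);
// shortUrl == "www.domain.com/.../long/url/page.aspx" (37 characters)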

That's all there is to it. I hope some of you find this code helpful.

Categories:   C# .NET | ASP.NET

Encoding Query Arguments

Thursday, April 29, 2010 7:06 AM by jonwood

NOTE: This article has been updated and moved to: Encrypting Query Arguments.

When passing variables between pages in ASP.NET, you have a few techniques you can choose from. One of the simplest is to use query arguments (e.g. http://www.domain.com/page.aspx?arg1=val1&arg2=val2). In ASP.NET, query arguments are easy to implement and use.

If you spend time browsing sites like Amazon.com, you'll see these query arguments causing the URLs to grow quite long. Long URLs don't generally cause a problem; however, there are some potential problems with query arguments.

For one thing, they are completely visible to the user. If you need to pass sensitive variables, then this could cause problems. For another thing, users can easily modify these values. For example, let's say you have a page that displays the current user's information. If a user ID is passed as a query argument, the user could easily edit that ID, possibly causing information for another user to be displayed. The potential security concerns here are pretty obvious.

Still, query arguments can be so convenient I decided to throw together a class that allows me to use them without the potential issues described above. In order to prevent the arguments from being seen by the user, the arguments are encrypted into a single argument. And in order to prevent the user from tampering with the values, the encrypted value includes a checksum that can detect if the data has been tampered with or corrupted.

Listing 1 shows my EncryptedQueryString class. By inheriting from Dictionary<string, string>, my class is a dictionary class. You can add any number of key/value items to the dictionary and then call ToString() to produce an encrypted string that contains all the values and a simple checksum. The string returned can then be passed to a page as a single query argument.

To restore the values, you can call the constructor that accepts an encrypted string. This constructor extracts the data from the encrypted string and adds it to the dictionary. Note that if this constructor finds an invalid or missing checksum, nothing is added to the dictionary. This prevents the calling code from working with questionable data.

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Web;
public class EncryptedQueryString : Dictionary<string, string>
{
  // Change the following keys to ensure uniqueness
  // Must be 8 bytes
  protected byte[] _keyBytes =
    { 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17, 0x18 };
  // Must be at least 8 characters
  protected string _keyString = "ABC12345";
  // Name for checksum value (unlikely to be used in user arguments)
  protected string _checksumKey = "__$$";
  /// <summary>
  /// Creates an empty dictionary
  /// </summary>
  public EncryptedQueryString()
  {
  }
  /// <summary>
  /// Creates a dictionary from the given, encrypted string
  /// </summary>
  /// <param name="encryptedData"></param>
  public EncryptedQueryString(string encryptedData)
  {
    // Decrypt string
    string data = Decrypt(encryptedData);
    // Parse out key/value pairs and add to dictionary
    string checksum = null;
    string[] args = data.Split('&');
    foreach (string arg in args)
    {
      int i = arg.IndexOf('=');
      if (i != -1)
      {
        string key = arg.Substring(0, i);
        string value = arg.Substring(i + 1);
        if (key == _checksumKey)
          checksum = value;
        else
          base.Add(HttpUtility.UrlDecode(key), HttpUtility.UrlDecode(value));
      }
    }
    // Clear contents if valid checksum not found
    if (checksum == null || checksum != ComputeChecksum())
      base.Clear();
  }
  /// <summary>
  /// Returns an encrypted string that contains the current dictionary
  /// </summary>
  /// <returns></returns>
  public override string ToString()
  {
    // Build query string from current contents
    StringBuilder content = new StringBuilder();
    foreach (string key in base.Keys)
    {
      if (content.Length > 0)
        content.Append('&');
      content.AppendFormat("{0}={1}",  HttpUtility.UrlEncode(key),
        HttpUtility.UrlEncode(base[key]));
    }
    // Add checksum
    if (content.Length > 0)
      content.Append('&');
    content.AppendFormat("{0}={1}", _checksumKey, ComputeChecksum());
    return Encrypt(content.ToString());
  }
  /// <summary>
  /// Returns a simple checksum for all keys and values in the collection
  /// </summary>
  /// <returns></returns>
  protected string ComputeChecksum()
  {
    int checksum = 0;
    foreach (KeyValuePair<string, string> pair in this)
    {
      checksum += pair.Key.Sum(c => c - '0');
      checksum += pair.Value.Sum(c => c - '0');
    }
    return checksum.ToString("X");
  }
  /// <summary>
  /// Encrypts the given text
  /// </summary>
  /// <param name="text">Text to be encrypted</param>
  /// <returns></returns>
  protected string Encrypt(string text)
  {
    try
    {
      byte[] keyData = Encoding.UTF8.GetBytes(_keyString.Substring(0, 8));
      DESCryptoServiceProvider des = new DESCryptoServiceProvider();
      byte[] textData = Encoding.UTF8.GetBytes(text);
      MemoryStream ms = new MemoryStream();
      CryptoStream cs = new CryptoStream(ms,
        des.CreateEncryptor(keyData, _keyBytes), CryptoStreamMode.Write);
      cs.Write(textData, 0, textData.Length);
      cs.FlushFinalBlock();
      return GetString(ms.ToArray());
    }
    catch (Exception)
    {
      return String.Empty;
    }
  }
  /// <summary>
  /// Decrypts the given encrypted text
  /// </summary>
  /// <param name="text">Text to be decrypted</param>
  /// <returns></returns>
  protected string Decrypt(string text)
  {
    try
    {
      byte[] keyData = Encoding.UTF8.GetBytes(_keyString.Substring(0, 8));
      DESCryptoServiceProvider des = new DESCryptoServiceProvider();
      byte[] textData = GetBytes(text);
      MemoryStream ms = new MemoryStream();
      CryptoStream cs = new CryptoStream(ms,
        des.CreateDecryptor(keyData, _keyBytes), CryptoStreamMode.Write);
      cs.Write(textData, 0, textData.Length);
      cs.FlushFinalBlock();
      return Encoding.UTF8.GetString(ms.ToArray());
    }
    catch (Exception)
    {
      return String.Empty;
    }
  }
  /// <summary>
  /// Converts a byte array to a string of hex characters
  /// </summary>
  /// <param name="data"></param>
  /// <returns></returns>
  protected string GetString(byte[] data)
  {
    StringBuilder results = new StringBuilder();
    foreach (byte b in data)
      results.Append(b.ToString("X2"));
    return results.ToString();
  }
  /// <summary>
  /// Converts a string of hex characters to a byte array
  /// </summary>
  /// <param name="data"></param>
  /// <returns></returns>
  protected byte[] GetBytes(string data)
  {
    // GetString() encodes the hex-numbers with two digits
    byte[] results = new byte[data.Length / 2];
    for (int i = 0; i < data.Length; i += 2)
      results[i / 2] = Convert.ToByte(data.Substring(i, 2), 16);
    return results;
  }
}

Listing 1: EncryptedQueryString class.

So, for example, a page that sends encrypted arguments to another page could contain code something like what is shown in Listing 2. This code constructs an empty EncryptedQueryString object, adds a couple of values to the dictionary, and then passes the resulting string as a single query argument to page.aspx.

protected void Button1_Click(object sender, EventArgs e)
{
  EncryptedQueryString args = new EncryptedQueryString();
  args["arg1"] = "val1";
  args["arg2"] = "val2";
  Response.Redirect(String.Format("page.aspx?args={0}", args.ToString()));
}

Listing 2: Code that passes encrypted query arguments.

Finally, Listing 3 shows code that could go in page.aspx to extract the encrypted values from the single argument.

protected void Page_Load(object sender, EventArgs e)
{
  EncryptedQueryString args =
    new EncryptedQueryString(Request.QueryString["args"]);
  Label1.Text = string.Format("arg1={0}, arg2={1}", args["arg1"], args["arg2"]);
}

Listing 3: Code to extract encrypted query arguments.

And that's all there is to it. Be sure to add error checking in case the expected dictionary entries are not there (either because they were not provided, or because an invalid checksum caused the EncryptedQueryString class to clear all items from the dictionary).
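
For example, the receiving page might guard against missing or tampered values with something like this rough sketch (using the same hypothetical argument names as Listing 2):

protected void Page_Load(object sender, EventArgs e)
{
  EncryptedQueryString args =
    new EncryptedQueryString(Request.QueryString["args"] ?? String.Empty);
  // TryGetValue is available because EncryptedQueryString is a Dictionary
  string arg1, arg2;
  if (args.TryGetValue("arg1", out arg1) && args.TryGetValue("arg2", out arg2))
    Label1.Text = String.Format("arg1={0}, arg2={1}", arg1, arg2);
  else
    Label1.Text = "Invalid or missing query arguments.";
}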

Also, be sure to customize the two keys near the top of Listing 1 so that people who read this article won't be able to decrypt your values.

Query arguments aren't always the best choice. As mentioned, you may choose to use Session variables or other techniques, depending on your requirements. But query arguments are straightforward and easy to implement. Using the class I've presented here, they can also be reasonably secure.

Categories:   C# .NET | ASP.NET

Domains and the New .CO TLD

Thursday, April 01, 2010 11:38 AM by jonwood

Yesterday, I purchased my shortest domain yet: PADFILE.COM. It cost me $80.00, which seems kind of expensive but I have plans for it and there are domains selling for many, many times more than that.

As .COM domains become more and more scarce, it has me thinking about the industry that has grown up around domains and the fact that there are a limited number of them available. It doesn’t seem that long ago that I had no knowledge of domains. As someone who likes to build unique websites and make small businesses out of them, I now spend quite a bit of time searching for, buying, and thinking about Internet domains.

Recently, I received a notice that a new top-level domain (TLD) is now available: .CO. This isn’t really a new domain; .CO has been used in the past as the country-code domain for Colombia. But it is scheduled to become available worldwide, and registrars want you to read .CO as an abbreviation for “Company,” “Corporation,” and “Commerce” rather than “Colombia.”

I’m not sure this is a good move. The argument given is that we are running out of .COM domain names, which is true. But if there is a domain XYZ.COM, what is the point of XYZ.CO? Well, many people would say both should be owned by the same company, XYZ. In that case, we don’t have a new domain available; we just have one more domain that the company needs to pay for. If, instead, you think a new company should be able to register XYZ.CO, then we really end up with results that are bound to confuse users.

Worse, one reason to own XYZ.CO is that, if XYZ.COM is a popular website, people could easily mistype the address by leaving off the final letter and land on XYZ.CO instead. So cybersquatters will likely try to use .CO to gain traffic this way, in addition to visits that could result from the confusion I described above. This issue is being addressed by allowing organizations with an existing global trademark first access to .CO domains exactly matching their trademarks. But, currently, .CO domains are running around $300/year instead of the $10/year charged for most domains.

It will be interesting to see how this plays out. The release of the new .CO domains has been delayed due, in part, to some controversy about the decision. I have no question .CO domains will be popular if it goes as planned—I’m sure I’ll be getting some of my own. But I really question the wisdom of this decision. I strongly suspect it is more about making more money than it is about addressing the fact that we are starting to run out of .COM domains.

Go Daddy has posted more information about this at http://www.godaddy.com/tlds/co-domain.aspx?ci=19152.

Categories:   General

Quick-and-Dirty, Buy-Now Buttons in ASP.NET

Sunday, February 14, 2010 6:34 AM by jonwood

NOTE: This article has been updated and moved to: Quick-and-Dirty, Buy-Now Buttons in ASP.NET.

I posted previously about an issue when incorporating PayPal Buy-Now buttons on an ASP.NET web form. Basically, after presenting a few hacks, I pointed out that you could simply place the form items directly within your ASP.NET form. (See that post for more info.)

However, for quick and dirty Buy Now buttons, there is a far simpler approach. You can simply use an anchor link and provide parameters as query arguments. Listing 1 demonstrates this technique.

<a href="https://www.paypal.com/cgi-bin/webscr
  ?cmd=_xclick&business=MyEmail
  &item_name=Widget
  &amount=29.00
  &undefined_quantity=1
  &currency_code=USD">
<img src="http://www.paypal.com/en_US/i/btn/x-click-but23.gif"
  border="0" alt="Buy Now Using PayPal" />
</a>

Listing 1: Simple Implementation of PayPal Buy Now Button

Note that the href value of the a tag should all go on a single line. I wrapped the text here only so it would fit within the page. MyEmail should be replaced with the email address associated with your PayPal account.

As you can see, we provide several bits of information. After our account email, we provide an item name, the price (amount), and the optional currency code.

The undefined_quantity parameter allows the user to enter the quantity, and PayPal will calculate the total based on the price you specified and the quantity entered by the user. Alternatively, you can instead say quantity=5 to fix the quantity so that the user cannot edit it.

Although that should be all you need for a simple Buy-Now button, Table 1 lists some additional arguments you can include.

Argument Description
business Email address associated with seller’s PayPal account
quantity Quantity of items being sold
undefined_quantity Allows user to edit quantity
item_name Name of item
item_number Optional item number
amount Price of each item (without currency symbol)
undefined_amount Allows user to edit the amount (good for donations)
shipping Price of shipping
currency_code Code for type of currency (Default appears to be USD)
first_name Customer’s first name
last_name Customer’s last name
address1 Customer’s first address line
address2 Customer’s second address line
city Customer’s city
state Customer’s state
zip Customer’s zip code
email Customer’s email address
night_phone_a Customer’s telephone area code
night_phone_b Customer’s telephone prefix
night_phone_c Remainder of customer’s telephone number

Table 1: Additional Query Arguments

The arguments listed in Table 1 are not exhaustive. Other arguments are available as well. For the simple task I’m describing, this list should be more than enough.

Of course, you also have the option of programmatically forming this link and then using code to redirect to it. This allows you, for example, to set the quantity based on a value entered by the user on your own site.
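
Here’s a rough sketch of what that might look like (the BuyButton_Click handler, the QuantityTextBox control, and the values shown are placeholders):

protected void BuyButton_Click(object sender, EventArgs e)
{
  // Hypothetical: read the quantity from a TextBox on the page
  int quantity;
  if (!Int32.TryParse(QuantityTextBox.Text, out quantity) || quantity < 1)
    quantity = 1;
  // Build the same sort of link shown in Listing 1, but programmatically,
  // using a fixed quantity instead of undefined_quantity
  string payPalUrl = String.Format(
    "https://www.paypal.com/cgi-bin/webscr?cmd=_xclick" +
    "&business={0}&item_name={1}&amount={2}&quantity={3}&currency_code=USD",
    HttpUtility.UrlEncode("MyEmail"),   // seller's PayPal email
    HttpUtility.UrlEncode("Widget"),    // item name
    "29.00",                            // price per item
    quantity);
  Response.Redirect(payPalUrl);
}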

Note that there are some potential downsides to this technique. For starters, the link is fully visible for anyone to see. Of course, it won’t include your PayPal password so that type of information is not exposed. But your account email is visible.

Users can also save your web page to their computer, and then edit the link. So, for example, they could change the price, load the edited page, and click the link. So you need to verify the correct amount was paid when processing orders.

Nonetheless, for a simple Buy-Now button, this technique works great and couldn’t be simpler to implement.

Parsing HTML Tags in C#

Sunday, February 07, 2010 10:46 AM by jonwood

NOTE: This article has been updated and moved to: Parsing HTML Tags in C#.

The .NET framework provides a plethora of tools for generating HTML markup, and for both generating and parsing XML markup. However, it provides very little in the way of support for parsing HTML markup.

I had some pretty old code (written in classic Visual Basic) for spidering websites and I had ported it over to C#. Spidering generally involves parsing out all the links on a particular web page and then following those links and doing the same for those pages. Spidering is how companies like Google scour the Internet.

My ported code worked pretty well, but it wasn’t very forgiving. For example, I had a website that allowed users to enter a URL of a page that had a link to our site in return for a free promotion. The code would scan the given URL for a backlink. However, sometimes it would report there was no backlink when there really was.

The error occurred when the user's web page contained syntax errors, such as an attribute value with no closing quote. My code would skip ahead past large amounts of markup, looking for that quote.

So I rewrote the code to be more flexible—as most browsers are. In the case of attribute values missing closing quotes, my code assumes the value has terminated whenever it encounters a line break. I made other changes as well, primarily designed to make the code simpler and more robust.

Listing 1 is the HtmlParser class I came up with. Note that there are many ways you can parse HTML. My code is only interested in tags and their attributes and does not look at text that comes between tags. This is perfect for spidering links in a page.

The ParseNext() method is called to find the next occurrence of a tag and returns an HtmlTag object that describes the tag. The caller indicates the type of tag it wants information about (or “*” if it wants information about all tags).

Parsing HTML markup is fairly simple. As I mentioned, much of my time was spent making the code handle markup errors intelligently. There were a few other special cases as well. For example, if the code finds a <script> tag, it automatically scans to the closing </script> tag, if any. This is because scripts can include HTML markup characters that would confuse the parser, so I just jump over them. I take similar action with HTML comments and have special handling for !DOCTYPE tags as well.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace HtmlParser
{
  public class HtmlTag
  {
    /// <summary>
    /// Name of this tag
    /// </summary>
    public string Name { get; set; }
    /// <summary>
    /// Collection of attribute names and values for this tag
    /// </summary>
    public Dictionary<string, string> Attributes { get; set; }
    /// <summary>
    /// True if this tag contained a trailing forward slash
    /// </summary>
    public bool TrailingSlash { get; set; }
  };
  public class HtmlParser
  {
    protected string _html;
    protected int _pos;
    protected bool _scriptBegin;
    public HtmlParser(string html)
    {
      Reset(html);
    }
    /// <summary>
    /// Resets the current position to the start of the current document
    /// </summary>
    public void Reset()
    {
      _pos = 0;
    }
    /// <summary>
    /// Sets the current document and resets the current position to the
    /// start of it
    /// </summary>
    /// <param name="html"></param>
    public void Reset(string html)
    {
      _html = html;
      _pos = 0;
    }
    /// <summary>
    /// Indicates if the current position is at the end of the current
    /// document
    /// </summary>
    public bool EOF
    {
      get { return (_pos >= _html.Length); }
    }
    /// <summary>
    /// Parses the next tag that matches the specified tag name
    /// </summary>
    /// <param name="name">Name of the tags to parse ("*" = parse all
    /// tags)</param>
    /// <param name="tag">Returns information on the next occurrence
    /// of the specified tag or null if none found</param>
    /// <returns>True if a tag was parsed or false if the end of the
    /// document was reached</returns>
    public bool ParseNext(string name, out HtmlTag tag)
    {
      tag = null;
      // Nothing to do if no tag specified
      if (String.IsNullOrEmpty(name))
        return false;
      // Loop until match is found or there are no more tags
      while (MoveToNextTag())
      {
        // Skip opening '<'
        Move();
        // Examine first tag character
        char c = Peek();
        if (c == '!' && Peek(1) == '-' && Peek(2) == '-')
        {
          // Skip over comments
          const string endComment = "-->";
          _pos = _html.IndexOf(endComment, _pos);
          NormalizePosition();
          Move(endComment.Length);
        }
        else if (c == '/')
        {
          // Skip over closing tags
          _pos = _html.IndexOf('>', _pos);
          NormalizePosition();
          Move();
        }
        else
        {
          // Parse tag
          bool result = ParseTag(name, ref tag);
          // Because scripts may contain tag characters,
          // we need special handling to skip over
          // script contents
          if (_scriptBegin)
          {
            const string endScript = "</script";
            _pos = _html.IndexOf(endScript, _pos,
              StringComparison.OrdinalIgnoreCase);
            NormalizePosition();
            Move(endScript.Length);
            SkipWhitespace();
            if (Peek() == '>')
              Move();
          }
          // Return true if requested tag was found
          if (result)
            return true;
        }
      }
      return false;
    }
    /// <summary>
    /// Parses the contents of an HTML tag. The current position should
    /// be at the first character following the tag's opening less-than
    /// character.
    /// 
    /// Note: We parse to the end of the tag even if this tag was not
    /// requested by the caller. This ensures subsequent parsing takes
    /// place after this tag
    /// </summary>
    /// <param name="name">Name of the tag the caller is requesting,
    /// or "*" if caller is requesting all tags</param>
    /// <param name="tag">Returns information on this tag if it's one
    /// the caller is requesting</param>
    /// <returns>True if data is being returned for a tag requested by
    /// the caller or false otherwise</returns>
    protected bool ParseTag(string name, ref HtmlTag tag)
    {
      // Get name of this tag
      string s = ParseTagName();
      // Special handling
      bool doctype = _scriptBegin = false;
      if (String.Compare(s, "!DOCTYPE", true) == 0)
        doctype = true;
      else if (String.Compare(s, "script", true) == 0)
        _scriptBegin = true;
      // Is this a tag requested by caller?
      bool requested = false;
      if (name == "*" || String.Compare(s, name, true) == 0)
      {
        // Yes, create new tag object
        tag = new HtmlTag();
        tag.Name = s;
        tag.Attributes = new Dictionary<string, string>();
        requested = true;
      }
      // Parse attributes
      SkipWhitespace();
      while (Peek() != '>')
      {
        if (Peek() == '/')
        {
          // Handle trailing forward slash
          if (requested)
            tag.TrailingSlash = true;
          Move();
          SkipWhitespace();
          // If this is a script tag, it was closed
          _scriptBegin = false;
        }
        else
        {
          // Parse attribute name
          s = (!doctype) ? ParseAttributeName() : ParseAttributeValue();
          SkipWhitespace();
          // Parse attribute value
          string value = String.Empty;
          if (Peek() == '=')
          {
            Move();
            SkipWhitespace();
            value = ParseAttributeValue();
            SkipWhitespace();
          }
          // Add attribute to collection if requested tag
          if (requested)
          {
            // This tag replaces existing tags with same name
            if (tag.Attributes.Keys.Contains(s))
              tag.Attributes.Remove(s);
            tag.Attributes.Add(s, value);
          }
        }
      }
      // Skip over closing '>'
      Move();
      return requested;
    }
    /// <summary>
    /// Parses a tag name. The current position should be the first
    /// character of the name
    /// </summary>
    /// <returns>Returns the parsed name string</returns>
    protected string ParseTagName()
    {
      int start = _pos;
      while (!EOF && !Char.IsWhiteSpace(Peek()) && Peek() != '>')
        Move();
      return _html.Substring(start, _pos - start);
    }
    /// <summary>
    /// Parses an attribute name. The current position should be the
    /// first character of the name
    /// </summary>
    /// <returns>Returns the parsed name string</returns>
    protected string ParseAttributeName()
    {
      int start = _pos;
      while (!EOF && !Char.IsWhiteSpace(Peek()) && Peek() != '>'
        && Peek() != '=')
        Move();
      return _html.Substring(start, _pos - start);
    }
    /// <summary>
    /// Parses an attribute value. The current position should be the
    /// first non-whitespace character following the equal sign.
    /// 
    /// Note: We terminate the name or value if we encounter a new line.
    /// This seems to be the best way of handling errors such as values
    /// missing closing quotes, etc.
    /// </summary>
    /// <returns>Returns the parsed value string</returns>
    protected string ParseAttributeValue()
    {
      int start, end;
      char c = Peek();
      if (c == '"' || c == '\'')

      {
        // Move past opening quote
        Move();
        // Parse quoted value
        start = _pos;
        _pos = _html.IndexOfAny(new char[] { c, '\r', '\n' }, start);
        NormalizePosition();
        end = _pos;
        // Move past closing quote
        if (Peek() == c)
          Move();
      }
      else
      {
        // Parse unquoted value
        start = _pos;
        while (!EOF && !Char.IsWhiteSpace(c) && c != '>')
        {
          Move();
          c = Peek();
        }
        end = _pos;
      }
      return _html.Substring(start, end - start);
    }
    /// <summary>
    /// Moves to the start of the next tag
    /// </summary>
    /// <returns>True if another tag was found, false otherwise</returns>
    protected bool MoveToNextTag()
    {
      _pos = _html.IndexOf('<', _pos);
      NormalizePosition();
      return !EOF;
    }
    /// <summary>
    /// Returns the character at the current position, or a null
    /// character if we're at the end of the document
    /// </summary>
    /// <returns>The character at the current position</returns>
    public char Peek()
    {
      return Peek(0);
    }
    /// <summary>
    /// Returns the character at the specified number of characters
    /// beyond the current position, or a null character if the
    /// specified position is at the end of the document
    /// </summary>
    /// <param name="ahead">The number of characters beyond the
    /// current position</param>
    /// <returns>The character at the specified position</returns>
    public char Peek(int ahead)
    {
      int pos = (_pos + ahead);
      if (pos < _html.Length)
        return _html[pos];
      return (char)0;
    }
    /// <summary>
    /// Moves the current position ahead one character
    /// </summary>
    protected void Move()
    {
      Move(1);
    }
    /// <summary>
    /// Moves the current position ahead the specified number of characters
    /// </summary>
    /// <param name="ahead">The number of characters to move ahead</param>
    protected void Move(int ahead)
    {
      _pos = Math.Min(_pos + ahead, _html.Length);
    }
    /// <summary>
    /// Moves the current position to the next character that is
    /// not whitespace
    /// </summary>
    protected void SkipWhitespace()
    {
      while (!EOF && Char.IsWhiteSpace(Peek()))
        Move();
    }
    /// <summary>
    /// Normalizes the current position. This is primarily for handling
    /// conditions where IndexOf(), etc. return negative values when
    /// the item being sought was not found
    /// </summary>
    protected void NormalizePosition()
    {
      if (_pos < 0)
        _pos = _html.Length;
    }
  }
}

Listing 1: The HtmlParser class.

Using the class is very easy. Listing 2 shows sample code that scans a web page for all the HREF values in A (anchor) tags. It downloads a URL and loads the contents into an instance of the HtmlParser class. It then calls ParseNext() with a request to return information about all A tags.

When ParseNext() returns, tag is set to an instance of the HtmlTag class with information about the tag that was found. This class includes a collection of attribute values, which my code uses to locate the value of the HREF attribute.

When ParseNext() returns false, the end of the document has been reached.

  protected void ScanLinks(string url)
  {
    // Download page (WebClient requires using System.Net)
    WebClient client = new WebClient();
    string html = client.DownloadString(url);
    // Scan links on this page
    HtmlTag tag;
    HtmlParser parse = new HtmlParser(html);
    while (parse.ParseNext("a", out tag))
    {
      // See if this anchor links to us
      string value;
      if (tag.Attributes.TryGetValue("href", out value))
      {
        // value contains URL referenced by this link
      }
    }
  }

Listing 2: Code that demonstrates using the HtmlParser class
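
As a small variation on Listing 2, passing "*" instead of a tag name returns every tag in the document. Here's a hypothetical snippet (it assumes html holds the downloaded markup, as in Listing 2):

// List the name and attribute count of every tag in the document
HtmlParser parse = new HtmlParser(html);
HtmlTag tag;
while (parse.ParseNext("*", out tag))
{
  Console.WriteLine("<{0}> with {1} attribute(s)",
    tag.Name, tag.Attributes.Count);
}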

While I’ll probably find a few tweaks and fixes required to this code, it seems to work well. I found similar code on the web but didn’t like it. My code is fairly simple, does not rely on large library routines, and seems to perform well. I hope you are able to benefit from it.

Categories:   C# .NET