Developer Blog
Articles about Using Microsoft Developer Tools

Quick-and-Dirty, Buy-Now Buttons in ASP.NET

Sunday, February 14, 2010 6:34 AM by jonwood

NOTE: This article has been updated and moved to: Quick-and-Dirty, Buy-Now Buttons in ASP.NET.

I posted previously about an issue when incorporating PayPal Buy-Now buttons on an ASP.NET web form. Basically, after presenting a few hacks, I pointed out that you could simply place the form items directly within your ASP.NET form. (See that post for more info.)

However, for quick and dirty Buy Now buttons, there is a far simpler approach. You can simply use an anchor link and provide parameters as query arguments. Listing 1 demonstrates this technique.

<a href="https://www.paypal.com/cgi-bin/webscr
  ?cmd=_xclick&business=MyEmail
  &item_name=Widget
  &amount=29.00
  &undefined_quantity=1
  &currency_code=USD">
<img src="http://www.paypal.com/en_US/i/btn/x-click-but23.gif"
  border="0" alt="Buy Now Using PayPal" />
</a>

Listing 1: Simple Implementation of PayPal Buy Now Button

Note that the href value of the a tag should all go on a single line. I wrapped the text here only so it would fit within the page. MyEmail should be replaced with the email address associated with your PayPal account.

As you can see, the link provides several pieces of information: the account email, an item name, the price (amount), and the optional currency code.

The undefined_quantity parameter allows the user to enter the quantity, and PayPal calculates the total from the price you specified and the quantity the user entered. Alternatively, you can specify quantity=5 to fix the quantity so that the user cannot edit it.

Although that should be all you need for a simple Buy-Now button, Table 1 lists some additional arguments you can include.

Argument            Description
business            Email address associated with seller’s PayPal account
quantity            Quantity of items being sold
undefined_quantity  Allows user to edit quantity
item_name           Name of item
item_number         Optional item number
amount              Price of each item (without currency symbol)
undefined_amount    Allows user to edit the amount (good for donations)
shipping            Price of shipping
currency_code       Code for type of currency (Default appears to be USD)
first_name          Customer’s first name
last_name           Customer’s last name
address1            Customer’s first address line
address2            Customer’s second address line
city                Customer’s city
state               Customer’s state
zip                 Customer’s zip code
email               Customer’s email address
night_phone_a       Customer’s telephone area code
night_phone_b       Customer’s telephone prefix
night_phone_c       Remainder of customer’s telephone number

Table 1: Additional Query Arguments

The arguments listed in Table 1 are not exhaustive. Other arguments are available as well. For the simple task I’m describing, this list should be more than enough.

Of course, you also have the option of programmatically forming this link and then using code to redirect to it. This allows you, for example, to set the quantity based on a value entered by the user on your own site.
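As a rough sketch of forming the link in code, something like the following could work. The BuyNowLink class and its Build method are names I made up for illustration; only the query arguments come from Listing 1 and Table 1. Each value is passed through Uri.EscapeDataString so that characters such as spaces or @ survive in the query string.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not part of any PayPal library
public static class BuyNowLink
{
    // Builds a Buy-Now URL with a fixed quantity from the given values
    public static string Build(string business, string itemName,
        decimal amount, int quantity, string currencyCode)
    {
        var args = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("cmd", "_xclick"),
            new KeyValuePair<string, string>("business", business),
            new KeyValuePair<string, string>("item_name", itemName),
            // Invariant culture keeps the decimal separator a '.'
            // regardless of the server's regional settings
            new KeyValuePair<string, string>("amount",
                amount.ToString("0.00", CultureInfo.InvariantCulture)),
            new KeyValuePair<string, string>("quantity",
                quantity.ToString(CultureInfo.InvariantCulture)),
            new KeyValuePair<string, string>("currency_code", currencyCode)
        };

        var sb = new StringBuilder("https://www.paypal.com/cgi-bin/webscr");
        char separator = '?';
        foreach (var arg in args)
        {
            // Escape each value so it is safe inside a query string
            sb.Append(separator)
              .Append(arg.Key)
              .Append('=')
              .Append(Uri.EscapeDataString(arg.Value));
            separator = '&';
        }
        return sb.ToString();
    }
}
```

In a web form's code-behind, the result could then be passed to Response.Redirect(), with the quantity taken from a control on your own page.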

Note that there are some potential downsides to this technique. For starters, the link is fully visible for anyone to see. Of course, it won’t include your PayPal password so that type of information is not exposed. But your account email is visible.

Users can also save your web page to their computer, and then edit the link. So, for example, they could change the price, load the edited page, and click the link. So you need to verify the correct amount was paid when processing orders.

Nonetheless, for a simple Buy-Now button, this technique works great and couldn’t be simpler to implement.

Parsing HTML Tags in C#

Sunday, February 07, 2010 10:46 AM by jonwood

NOTE: This article has been updated and moved to: Parsing HTML Tags in C#.

The .NET framework provides a plethora of tools for generating HTML markup, and for both generating and parsing XML markup. However, it provides very little in the way of support for parsing HTML markup.

I had some pretty old code (written in classic Visual Basic) for spidering websites and I had ported it over to C#. Spidering generally involves parsing out all the links on a particular web page and then following those links and doing the same for those pages. Spidering is how companies like Google scour the Internet.
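To make the idea concrete, here is a minimal sketch of the crawl loop itself. It is not my original spidering code; the Spider class, the Crawl method, and the getLinks callback are hypothetical names, and the callback stands in for whatever downloads a page and extracts its links.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a breadth-first spider
public static class Spider
{
    // Visits pages breadth-first from startUrl, following the links that
    // getLinks reports for each page, and stops after maxPages pages
    public static List<string> Crawl(string startUrl,
        Func<string, IEnumerable<string>> getLinks, int maxPages)
    {
        var visited = new List<string>();
        var seen = new HashSet<string> { startUrl };
        var queue = new Queue<string>();
        queue.Enqueue(startUrl);

        while (queue.Count > 0 && visited.Count < maxPages)
        {
            string url = queue.Dequeue();
            visited.Add(url);
            foreach (string link in getLinks(url))
            {
                // HashSet.Add returns false for URLs already queued,
                // which keeps the spider from looping between pages
                if (seen.Add(link))
                    queue.Enqueue(link);
            }
        }
        return visited;
    }
}
```

The set of seen URLs matters because pages commonly link back to each other, and the page limit keeps the crawl bounded.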

My ported code worked pretty well, but it wasn’t very forgiving. For example, I had a website that allowed users to enter a URL of a page that had a link to our site in return for a free promotion. The code would scan the given URL for a backlink. However, sometimes it would report there was no backlink when there really was.

The error occurred when the user’s web page contained syntax errors, such as an attribute value with no closing quote. My code would skip ahead past large amounts of markup, looking for that quote.

So I rewrote the code to be more flexible—as most browsers are. In the case of attribute values missing closing quotes, my code assumes the value has terminated whenever it encounters a line break. I made other changes as well, primarily designed to make the code simpler and more robust.
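The line-break rule can be shown in isolation. The following is a standalone sketch of the same idea rather than an excerpt from the parser; QuoteRecovery and ReadQuotedValue are hypothetical names used only here.

```csharp
using System;

// Hypothetical standalone illustration of the line-break rule
public static class QuoteRecovery
{
    // Reads a quoted attribute value whose opening quote is at index
    // 'start'. The value ends at the matching quote or, if the quote is
    // missing, at the first line break. 'next' receives the index where
    // parsing should resume.
    public static string ReadQuotedValue(string html, int start, out int next)
    {
        char quote = html[start];
        int valueStart = start + 1;
        int end = html.IndexOfAny(new[] { quote, '\r', '\n' }, valueStart);
        if (end < 0)
            end = html.Length;   // nothing found: value runs to end of document

        // Step over the terminator only if it really is the closing quote
        next = (end < html.Length && html[end] == quote) ? end + 1 : end;
        return html.Substring(valueStart, end - valueStart);
    }
}
```

Given the markup `<a href="page.html` followed by a line break, this returns page.html instead of swallowing everything up to the next quote in the document.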

Listing 1 is the HtmlParser class I came up with. Note that there are many ways you can parse HTML. My code is only interested in tags and their attributes and does not look at text that comes between tags. This is perfect for spidering links in a page.

The ParseNext() method finds the next occurrence of a tag and, through its out parameter, returns an HtmlTag object that describes the tag. The method itself returns true if a tag was found. The caller indicates the type of tag it wants information about (or “*” if it wants information about all tags).

Parsing HTML markup is fairly simple. As I mentioned, much of my time was spent making the code handle markup errors intelligently. There were a few other special cases as well. For example, if the code finds a <script> tag, it automatically scans to the closing </script> tag, if any. This is because scripts can include HTML markup characters that can confuse the parser, so I just jump over them. I take similar action with HTML comments and have special handling for !DOCTYPE tags as well.
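The script-skipping step can be sketched on its own as follows. This is a simplified standalone version, not the parser's exact code; ScriptSkipper and SkipScript are hypothetical names.

```csharp
using System;

// Hypothetical standalone illustration of skipping script contents
public static class ScriptSkipper
{
    // Given the index just past an opening <script> tag, returns the
    // index just past the closing </script> tag, or the document length
    // if the script is never closed
    public static int SkipScript(string html, int pos)
    {
        const string endScript = "</script";
        int end = html.IndexOf(endScript, pos,
            StringComparison.OrdinalIgnoreCase);
        if (end < 0)
            return html.Length;

        // Find the '>' that finishes the closing tag
        end = html.IndexOf('>', end + endScript.Length);
        return (end < 0) ? html.Length : end + 1;
    }
}
```

Because the search looks only for the closing tag, a less-than sign inside the script (as in `a < b`) is never mistaken for the start of a tag.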

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace HtmlParser
{
  public class HtmlTag
  {
    /// <summary>
    /// Name of this tag
    /// </summary>
    public string Name { get; set; }
    /// <summary>
    /// Collection of attribute names and values for this tag
    /// </summary>
    public Dictionary<string, string> Attributes { get; set; }
    /// <summary>
    /// True if this tag contained a trailing forward slash
    /// </summary>
    public bool TrailingSlash { get; set; }
  };
  public class HtmlParser
  {
    protected string _html;
    protected int _pos;
    protected bool _scriptBegin;
    public HtmlParser(string html)
    {
      Reset(html);
    }
    /// <summary>
    /// Resets the current position to the start of the current document
    /// </summary>
    public void Reset()
    {
      _pos = 0;
    }
    /// <summary>
    /// Sets the current document and resets the current position to the
    /// start of it
    /// </summary>
    /// <param name="html"></param>
    public void Reset(string html)
    {
      _html = html;
      _pos = 0;
    }
    /// <summary>
    /// Indicates if the current position is at the end of the current
    /// document
    /// </summary>
    public bool EOF
    {
      get { return (_pos >= _html.Length); }
    }
    /// <summary>
    /// Parses the next tag that matches the specified tag name
    /// </summary>
    /// <param name="name">Name of the tags to parse ("*" = parse all
    /// tags)</param>
    /// <param name="tag">Returns information on the next occurrence
    /// of the specified tag or null if none found</param>
    /// <returns>True if a tag was parsed or false if the end of the
    /// document was reached</returns>
    public bool ParseNext(string name, out HtmlTag tag)
    {
      tag = null;
      // Nothing to do if no tag specified
      if (String.IsNullOrEmpty(name))
        return false;
      // Loop until match is found or there are no more tags
      while (MoveToNextTag())
      {
        // Skip opening '<'
        Move();
        // Examine first tag character
        char c = Peek();
        if (c == '!' && Peek(1) == '-' && Peek(2) == '-')
        {
          // Skip over comments
          const string endComment = "-->";
          _pos = _html.IndexOf(endComment, _pos);
          NormalizePosition();
          Move(endComment.Length);
        }
        else if (c == '/')
        {
          // Skip over closing tags
          _pos = _html.IndexOf('>', _pos);
          NormalizePosition();
          Move();
        }
        else
        {
          // Parse tag
          bool result = ParseTag(name, ref tag);
          // Because scripts may contain tag characters,
          // we need special handling to skip over
          // script contents
          if (_scriptBegin)
          {
            const string endScript = "</script";
            _pos = _html.IndexOf(endScript, _pos,
              StringComparison.OrdinalIgnoreCase);
            NormalizePosition();
            Move(endScript.Length);
            SkipWhitespace();
            if (Peek() == '>')
              Move();
          }
          // Return true if requested tag was found
          if (result)
            return true;
        }
      }
      return false;
    }
    /// <summary>
    /// Parses the contents of an HTML tag. The current position should
    /// be at the first character following the tag's opening less-than
    /// character.
    /// 
    /// Note: We parse to the end of the tag even if this tag was not
    /// requested by the caller. This ensures subsequent parsing takes
    /// place after this tag
    /// </summary>
    /// <param name="name">Name of the tag the caller is requesting,
    /// or "*" if caller is requesting all tags</param>
    /// <param name="tag">Returns information on this tag if it's one
    /// the caller is requesting</param>
    /// <returns>True if data is being returned for a tag requested by
    /// the caller or false otherwise</returns>
    protected bool ParseTag(string name, ref HtmlTag tag)
    {
      // Get name of this tag
      string s = ParseTagName();
      // Special handling
      bool doctype = _scriptBegin = false;
      if (String.Compare(s, "!DOCTYPE", true) == 0)
        doctype = true;
      else if (String.Compare(s, "script", true) == 0)
        _scriptBegin = true;
      // Is this a tag requested by caller?
      bool requested = false;
      if (name == "*" || String.Compare(s, name, true) == 0)
      {
        // Yes, create new tag object
        tag = new HtmlTag();
        tag.Name = s;
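        // HTML attribute names are case-insensitive, so "HREF" and
        // "href" must refer to the same entry
        tag.Attributes = new Dictionary<string, string>(
          StringComparer.OrdinalIgnoreCase);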
        requested = true;
      }
      // Parse attributes
      SkipWhitespace();
      // Stop at end of document so an unterminated tag cannot loop forever
      while (!EOF && Peek() != '>')
      {
        if (Peek() == '/')
        {
          // Handle trailing forward slash
          if (requested)
            tag.TrailingSlash = true;
          Move();
          SkipWhitespace();
          // If this is a script tag, it was closed
          _scriptBegin = false;
        }
        else
        {
          // Parse attribute name
          s = (!doctype) ? ParseAttributeName() : ParseAttributeValue();
          SkipWhitespace();
          // Parse attribute value
          string value = String.Empty;
          if (Peek() == '=')
          {
            Move();
            SkipWhitespace();
            value = ParseAttributeValue();
            SkipWhitespace();
          }
          // Add attribute to collection if requested tag
          if (requested)
          {
            // A later attribute replaces an earlier one with the same name
            tag.Attributes[s] = value;
          }
        }
      }
      // Skip over closing '>'
      Move();
      return requested;
    }
    /// <summary>
    /// Parses a tag name. The current position should be the first
    /// character of the name
    /// </summary>
    /// <returns>Returns the parsed name string</returns>
    protected string ParseTagName()
    {
      int start = _pos;
      while (!EOF && !Char.IsWhiteSpace(Peek()) && Peek() != '>')
        Move();
      return _html.Substring(start, _pos - start);
    }
    /// <summary>
    /// Parses an attribute name. The current position should be the
    /// first character of the name
    /// </summary>
    /// <returns>Returns the parsed name string</returns>
    protected string ParseAttributeName()
    {
      int start = _pos;
      while (!EOF && !Char.IsWhiteSpace(Peek()) && Peek() != '>'
        && Peek() != '=')
        Move();
      return _html.Substring(start, _pos - start);
    }
    /// <summary>
    /// Parses an attribute value. The current position should be the
    /// first non-whitespace character following the equal sign.
    /// 
    /// Note: We terminate the name or value if we encounter a new line.
    /// This seems to be the best way of handling errors such as values
    /// missing closing quotes, etc.
    /// </summary>
    /// <returns>Returns the parsed value string</returns>
    protected string ParseAttributeValue()
    {
      int start, end;
      char c = Peek();
      if (c == '"' || c == '\'')
      {
        // Move past opening quote
        Move();
        // Parse quoted value
        start = _pos;
        _pos = _html.IndexOfAny(new char[] { c, '\r', '\n' }, start);
        NormalizePosition();
        end = _pos;
        // Move past closing quote
        if (Peek() == c)
          Move();
      }
      else
      {
        // Parse unquoted value
        start = _pos;
        while (!EOF && !Char.IsWhiteSpace(c) && c != '>')
        {
          Move();
          c = Peek();
        }
        end = _pos;
      }
      return _html.Substring(start, end - start);
    }
    /// <summary>
    /// Moves to the start of the next tag
    /// </summary>
    /// <returns>True if another tag was found, false otherwise</returns>
    protected bool MoveToNextTag()
    {
      _pos = _html.IndexOf('<', _pos);
      NormalizePosition();
      return !EOF;
    }
    /// <summary>
    /// Returns the character at the current position, or a null
    /// character if we're at the end of the document
    /// </summary>
    /// <returns>The character at the current position</returns>
    public char Peek()
    {
      return Peek(0);
    }
    /// <summary>
    /// Returns the character at the specified number of characters
    /// beyond the current position, or a null character if the
    /// specified position is at the end of the document
    /// </summary>
    /// <param name="ahead">The number of characters beyond the
    /// current position</param>
    /// <returns>The character at the specified position</returns>
    public char Peek(int ahead)
    {
      int pos = (_pos + ahead);
      if (pos < _html.Length)
        return _html[pos];
      return (char)0;
    }
    /// <summary>
    /// Moves the current position ahead one character
    /// </summary>
    protected void Move()
    {
      Move(1);
    }
    /// <summary>
    /// Moves the current position ahead the specified number of characters
    /// </summary>
    /// <param name="ahead">The number of characters to move ahead</param>
    protected void Move(int ahead)
    {
      _pos = Math.Min(_pos + ahead, _html.Length);
    }
    /// <summary>
    /// Moves the current position to the next character that is
    /// not whitespace
    /// </summary>
    protected void SkipWhitespace()
    {
      while (!EOF && Char.IsWhiteSpace(Peek()))
        Move();
    }
    /// <summary>
    /// Normalizes the current position. This is primarily for handling
    /// conditions where IndexOf(), etc. return negative values when
    /// the item being sought was not found
    /// </summary>
    protected void NormalizePosition()
    {
      if (_pos < 0)
        _pos = _html.Length;
    }
  }
}

Listing 1: The HtmlParser class.

Using the class is very easy. Listing 2 shows sample code that scans a web page for all the HREF values in A (anchor) tags. It downloads a URL and loads the contents into an instance of the HtmlParser class. It then calls ParseNext() with a request to return information about all A tags.

When ParseNext() returns, tag is set to an instance of the HtmlTag class with information about the tag that was found. This class includes a collection of attribute values, which my code uses to locate the value of the HREF attribute.

When ParseNext() returns false, the end of the document has been reached.

  protected void ScanLinks(string url)
  {
    // Download page (WebClient is in the System.Net namespace)
    string html;
    using (WebClient client = new WebClient())
      html = client.DownloadString(url);
    // Scan links on this page
    HtmlTag tag;
    HtmlParser parse = new HtmlParser(html);
    while (parse.ParseNext("a", out tag))
    {
      // See if this anchor links to us
      string value;
      if (tag.Attributes.TryGetValue("href", out value))
      {
        // value contains URL referenced by this link
      }
    }
  }

Listing 2: Code that demonstrates using the HtmlParser class
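For the backlink check mentioned earlier, the value of each HREF still has to be compared against your own site. Here is one possible sketch of that comparison; BacklinkCheck and LinksToHost are hypothetical names, and the host-only comparison is an assumption about what counts as a backlink. It resolves relative links against the page's own URL before comparing.

```csharp
using System;

// Hypothetical helper for deciding whether a link points at our site
public static class BacklinkCheck
{
    // Returns true if 'href', found on the page at 'pageUrl', resolves
    // to an http or https URL whose host is 'ourHost'
    public static bool LinksToHost(string pageUrl, string href, string ourHost)
    {
        // Resolve relative links such as "about.html" against the page URL
        Uri resolved;
        if (!Uri.TryCreate(new Uri(pageUrl), href, out resolved))
            return false;

        // Ignore mailto:, javascript:, and other non-web schemes
        if (resolved.Scheme != Uri.UriSchemeHttp &&
            resolved.Scheme != Uri.UriSchemeHttps)
            return false;

        return String.Equals(resolved.Host, ourHost,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside the loop in Listing 2, this could be called with the url argument, the value read from the HREF attribute, and your own host name.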

While I’ll probably find a few tweaks and fixes required to this code, it seems to work well. I found similar code on the web but didn’t like it. My code is fairly simple, does not rely on large library routines, and seems to perform well. I hope you are able to benefit from it.

Categories:   C# .NET