Developer Blog
Articles about Using Microsoft Developer Tools

Reading and Writing CSV Files in MFC

Thursday, January 21, 2010 6:39 PM by jonwood

NOTE: This article has been updated and moved to: Reading and Writing CSV Files in MFC.

I recently needed to import and export CSV files in an MFC application. A CSV (Comma-Separated Values) file is a plain-text file in which each row contains one or more fields, separated by commas.

CSV files are probably best known for their use by Microsoft Excel. They provide a convenient format for sharing spreadsheet data between applications, particularly when you consider that having your application work directly with native Excel files would be a very complex task.

The CSV file format is not complex; mostly, it just requires some simple parsing. The one trick is handling a data field that contains a comma. Since commas delimit fields, such a field is enclosed in double quotes. And since double quotes then have special meaning, a field that contains a double quote is also enclosed in double quotes, and each double quote in the data is written as a pair of double quotes.
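To make the quoting rule concrete, here is a minimal sketch in standard C++ (no MFC), assuming a field is quoted only when it contains a comma or a double quote, as described above. `EscapeCsvField` is a hypothetical helper, not part of the CCSVFile class.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: applies the CSV quoting rule to a single field.
// The field is quoted only if it contains a comma or a double quote,
// and each embedded double quote is doubled.
std::string EscapeCsvField(const std::string& field)
{
    if (field.find_first_of(",\"") == std::string::npos)
        return field;  // No special characters: write as-is

    std::string out = "\"";
    for (char ch : field)
    {
        if (ch == '"')
            out += '"';  // Double each quote character
        out += ch;
    }
    out += '"';
    return out;
}
```

For example, `Smith, John` is written as `"Smith, John"`, and `12" ruler` is written as `"12"" ruler"`.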

My header file is shown in Listing 1 and my source file is shown in Listing 2.

#pragma once
#include "afx.h"
class CCSVFile : public CStdioFile
{
public:
  enum Mode { modeRead, modeWrite };
  CCSVFile(LPCTSTR lpszFilename, Mode mode = modeRead);
  ~CCSVFile(void);
  bool ReadData(CStringArray &arr);
  void WriteData(CStringArray &arr);
#ifdef _DEBUG
  Mode m_nMode;
#endif
};

Listing 1: CSVFile.h

The class is very simple: besides the constructor and destructor, it contains only two methods.

Note that it is up to the caller to ensure that only ReadData() is called when the constructor specified read mode, and only WriteData() is called when it specified write mode. To help enforce this, debug builds (where _DEBUG is defined) assert if the wrong method is called.

#include "StdAfx.h"
#include "CSVFile.h"
CCSVFile::CCSVFile(LPCTSTR lpszFilename, Mode mode)
  : CStdioFile(lpszFilename, (mode == modeRead) ?
    CFile::modeRead|CFile::shareDenyWrite|CFile::typeText   
    :
    CFile::modeWrite|CFile::shareDenyWrite|CFile::modeCreate|CFile::typeText)
{
#ifdef _DEBUG
  m_nMode = mode;
#endif
}
CCSVFile::~CCSVFile(void)
{
}
bool CCSVFile::ReadData(CStringArray &arr)
{
  // Verify correct mode in debug build
  ASSERT(m_nMode == modeRead);
  // Read next line
  CString sLine;
  if (!ReadString(sLine))
    return false;
  LPCTSTR p = sLine;
  int nValue = 0;
  // Parse values in this line
  while (*p != '\0')
  {
    CString s;  // String to hold this value
    if (*p == '"')
    {
      // Bump past opening quote
      p++;
      // Parse quoted value
      while (*p != '\0')
      {
        // Test for quote character
        if (*p == '"')
        {
          // Found one quote
          p++;
          // If pair of quotes, keep one
          // Else interpret as end of value
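          if (*p != '"')
          {
            // Skip the comma after the closing quote, but never
            // step past the terminating null character
            if (*p != '\0')
              p++;
            break;
          }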
        }
        // Add this character to value
        s.AppendChar(*p++);
      }
    }
    else
    {
      // Parse unquoted value
      while (*p != '\0' && *p != ',')
      {
        s.AppendChar(*p++);
      }
      // Advance to next character (if not already end of string)
      if (*p != '\0')
        p++;
    }
    // Add this string to value array
    if (nValue < arr.GetCount())
      arr[nValue] = s;
    else
      arr.Add(s);
    nValue++;
  }
  // Trim off any unused array values
  if (arr.GetCount() > nValue)
    arr.RemoveAt(nValue, arr.GetCount() - nValue);
  // We return true if ReadString() succeeded--even if no values
  return true;
}
void CCSVFile::WriteData(CStringArray &arr)
{
  static TCHAR chQuote = '"';
  static TCHAR chComma = ',';
  // Verify correct mode in debug build
  ASSERT(m_nMode == modeWrite);
  // Loop through each string in array
  for (int i = 0; i < arr.GetCount(); i++)
  {
    // Separate this value from previous
    if (i > 0)
      WriteString(_T(","));
    // We need special handling if string contains
    // comma or double quote
    bool bComma = (arr[i].Find(chComma) != -1);
    bool bQuote = (arr[i].Find(chQuote) != -1);
    if (bComma || bQuote)
    {
      // Use WriteString() for all output rather than Write(), so every
      // character goes through the same text-mode conversion
      WriteString(_T("\""));
      if (bQuote)
      {
        // Pairs of quotes interpreted as single quote
        CString s = arr[i];
        s.Replace(_T("\""), _T("\"\""));
        WriteString(s);
      }
      else
      {
        WriteString(arr[i]);
      }
      WriteString(_T("\""));
    }
    else
    {
      WriteString(arr[i]);
    }
  }
  WriteString(_T("\n"));
}

Listing 2: CSVFile.cpp

There are many ways to go about parsing text. I was most comfortable stepping through the text manually, character by character, so that’s what the code does.
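For readers working outside MFC, the same character-by-character approach can be sketched in standard C++. This is a hypothetical port rather than the author's code: `ParseCsvLine` is an invented name, it returns a `std::vector<std::string>` instead of filling a `CStringArray`, and, like `ReadData()`, it works on one line at a time, so quoted fields containing line breaks are not supported.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical portable counterpart of CCSVFile::ReadData(): splits one line
// of CSV text into fields by stepping through it character by character.
std::vector<std::string> ParseCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string::size_type i = 0;
    while (i < line.size())
    {
        std::string value;  // String to hold this value
        if (line[i] == '"')
        {
            ++i;  // Bump past opening quote
            while (i < line.size())
            {
                if (line[i] == '"')
                {
                    ++i;
                    // A pair of quotes keeps one; a lone quote ends the value
                    if (i >= line.size() || line[i] != '"')
                        break;
                }
                value += line[i++];
            }
            // Skip the character after the closing quote (normally the comma)
            if (i < line.size())
                ++i;
        }
        else
        {
            // Unquoted value runs up to the next comma
            while (i < line.size() && line[i] != ',')
                value += line[i++];
            if (i < line.size())
                ++i;  // Skip the comma
        }
        fields.push_back(value);
    }
    return fields;
}
```

As in the listing, a quote ends a quoted value only when it is not immediately followed by a second quote, so `a,"b,c","say ""hi"""` yields the three fields `a`, `b,c`, and `say "hi"`.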

As implied by the names, ReadData() is used to read from a CSV file, while WriteData() is used to write to one. Each line of data is stored in a CStringArray, which is passed by reference to both methods. The caller must call the appropriate method once for each line. ReadData() returns false when the end of the file is reached.

As mentioned, the code is fairly simple, but I’ve used it on several occasions and have even ported it to C#. Perhaps you will find it useful as well.

Categories:   C++/MFC

Create an RSS Feed in ASP.NET

Monday, January 18, 2010 4:52 PM by jonwood

NOTE: This article has been updated and moved to: Create an RSS Feed in ASP.NET.

Recently, it occurred to me that one of my websites would probably benefit from an RSS feed. However, I didn’t really understand what RSS feeds were. I understood their basic purpose but had no clue as to how they worked. With words like “syndication” being tossed around when describing RSS feeds, I had imagined they involved some sort of code that continually sent data to some mystical location.

Fortunately, understanding RSS feeds is very easy, and creating your own RSS feed in ASP.NET is a breeze. RSS stands for Really Simple Syndication. It provides a standard for you to make information available to anyone who wants to request your feed. One of my sites is a shareware site and I thought a feed would allow users to stay in contact with my site and make it more likely that they would return. Moreover, an RSS feed allows them to do this without signing up or even giving me their email address.

These days, it’s getting easier for users to use feeds because more and more software supports them. For example, when you enter the URL of a feed into Microsoft Internet Explorer, the information is now formatted specifically for feeds. Windows Live Mail also has direct support for feeds. There are also a number of websites that can help you subscribe to and view RSS feeds.

An RSS feed is simply an XML file on your site that conforms to the RSS specification. Since feeds are meant to be constantly updated, you would normally want to generate this file on the fly when it is requested, and ASP.NET makes this very easy to do.

Listing 1 shows my feed file. This is a normal, everyday ASPX file, and what you see makes up its entire contents. The first thing to notice is the OutputCache directive on the second line. With OutputCache, requests for this page within the given duration simply return a copy of the previous results. The duration is in seconds, so with Duration="120", if two requests for this page occur within two minutes, the code does not run again for the second request; ASP.NET simply returns the same data it returned for the first request. Since the page runs potentially lengthy code and can put a substantial load on the database, this keeps the site from getting bogged down under heavy traffic.

<%@ Page Language="C#" AutoEventWireup="true" %>
<%@ OutputCache Duration="120" VaryByParam="none" %>
<%@ Import Namespace="System.Xml" %>
<%@ Import Namespace="System.Data" %>
<%@ Import Namespace="System.Data.SqlClient" %>
<%@ Import Namespace="SoftCircuits" %>
<script runat="server">
  
  /// <summary>
  /// Create RSS Feed of newest submissions
  /// </summary>
  /// <param name="sender"></param>
  /// <param name="e"></param>
  protected void Page_Load(object sender, EventArgs e)
  {
    // Clear any previous response
    Response.Clear();
    Response.ContentType = "text/xml";
    // Write the XML directly to the response output stream
    XmlTextWriter writer = new XmlTextWriter(Response.OutputStream,
      Encoding.UTF8);
    writer.WriteStartDocument();
    // The mandatory rss tag
    writer.WriteStartElement("rss");
    writer.WriteAttributeString("version", "2.0");
    // The channel tag contains RSS feed details
    writer.WriteStartElement("channel");
    writer.WriteElementString("title", "File Parade's Newest Submissions");
    writer.WriteElementString("link", "http://www.fileparade.com");
    writer.WriteElementString("description",
      "The latest freeware and shareware downloads from File Parade.");
    writer.WriteElementString("copyright",
      String.Format("Copyright {0} SC Web Group. All rights reserved.", DateTime.Today.Year));
    // File Parade image    
    writer.WriteStartElement("image");
    writer.WriteElementString("url",
      "http://www.fileparade.com/Images/logo88x31.png");
    writer.WriteElementString("title",
      "File Parade Freeware and Trialware Downloads");
    writer.WriteElementString("link",
      "http://www.fileparade.com");
    writer.WriteEndElement();
    // Objects needed for connecting to the SQL database
    using (SqlDataReader reader = DataHelper.ExecProcDataReader("GetRssFeed"))
    {
      // Loop through each item and add them to the RSS feed
      while (reader.Read())
      {
        writer.WriteStartElement("item");
        writer.WriteElementString("title",
          EncodeString(String.Format("{0} {1} by {2}",
          reader["Title"], reader["Version"],
          reader["Company"])));
        writer.WriteElementString("description",
          EncodeString((string)reader["Description"]));
        writer.WriteElementString("link",
          String.Format("http://www.fileparade.com/Listing.aspx?id={0}",
          reader["ID"]));
        // RSS 2.0 expects RFC 822 dates, e.g. "Mon, 18 Jan 2010 16:52:00 GMT"
        writer.WriteElementString("pubDate",
          ((DateTime)reader["ReleaseDate"]).ToUniversalTime().ToString("r"));
        writer.WriteEndElement();
      }
    }
    // Close all tags
    writer.WriteEndElement();
    writer.WriteEndElement();
    writer.WriteEndDocument();
    writer.Flush();
    writer.Close();
    // Terminate response
    Response.End();
  }
  protected string EncodeString(string s)
  {
    s = HttpUtility.HtmlEncode(s);
    return s.Replace("\r\n", "<br />\r\n");
  }
</script>

Listing 1: RSS Feed

Next are my declarations to import the needed namespaces. Nothing special here, just the declarations needed for XML and database access. Note that this code won’t run for you as listed. It imports my SoftCircuits namespace, which contains some in-house routines for database access. You’ll need to replace this with your own database code, which makes sense since you’ll be returning your own data.

The core of the code is in the Page_Load event handler, which runs when the page is requested (or, with output caching, when no cached copy is available). The first step is to clear any content already written to the response. Remember, we are creating an XML document and we don’t want any other content to be returned. Next, we set the ContentType to text/xml so the user agent knows what type of content we are returning.

From here, we create an XmlTextWriter attached to our output stream and start creating our output. We start with the mandatory rss element and its version attribute, which identify our content as an RSS feed. Next, we add the channel element and its required title, link, and description elements, which describe our feed. Then I add some optional elements: a copyright notice, and an image element that specifies a small logo and related data.
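The element structure written above can also be sketched without XmlTextWriter. The following C++ sketch is illustrative only: `FeedItem`, `XmlEscape`, `Element`, and `BuildRss` are hypothetical names, the optional copyright and image elements are omitted, and `XmlEscape` handles only the characters that matter in element text (`&`, `<`, `>`), which is the kind of escaping an XML writer applies for you.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feed item; the field names are illustrative only.
struct FeedItem
{
    std::string title;
    std::string link;
    std::string description;
};

// Escapes the characters that are special in XML element text.
std::string XmlEscape(const std::string& s)
{
    std::string out;
    for (char ch : s)
    {
        switch (ch)
        {
        case '&': out += "&amp;"; break;
        case '<': out += "&lt;";  break;
        case '>': out += "&gt;";  break;
        default:  out += ch;      break;
        }
    }
    return out;
}

// Writes <name>escaped value</name>, similar in spirit to WriteElementString().
std::string Element(const std::string& name, const std::string& value)
{
    return "<" + name + ">" + XmlEscape(value) + "</" + name + ">";
}

// Builds a minimal RSS 2.0 document: the rss root element, a channel with
// its required title, link, and description, and one item per entry.
std::string BuildRss(const std::string& title, const std::string& link,
                     const std::string& description,
                     const std::vector<FeedItem>& items)
{
    std::string xml = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n";
    xml += "<rss version=\"2.0\"><channel>";
    xml += Element("title", title);
    xml += Element("link", link);
    xml += Element("description", description);
    for (const FeedItem& item : items)
    {
        xml += "<item>";
        xml += Element("title", item.title);
        xml += Element("link", item.link);
        xml += Element("description", item.description);
        xml += "</item>";
    }
    xml += "</channel></rss>";
    return xml;
}
```

Calling `BuildRss()` with a channel description of `Latest & greatest` produces `<description>Latest &amp; greatest</description>`, showing the escaping at work.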

After that, we can finally start to output our actual data. My code uses an in-house method called DataHelper.ExecProcDataReader, which calls a stored procedure to obtain my data. You will need to replace this with your own code to return whatever data you are syndicating. My routine simply returns a SqlDataReader, and I loop through each row it returns, writing an item element for each one.

Note that I modify my text fields before writing them. In my case, this text is submitted by various authors, and I don’t want them to include their own HTML markup. So I call HtmlEncode, which causes any markup to appear as written instead of altering the layout or formatting or creating links. I then insert my own markup by placing <br /> before each newline, so that line breaks appear for the user. I should point out that WriteElementString() also escapes the string it writes for XML, so markup in the data cannot disturb the XML structure. That layer of escaping is undone when the feed is parsed, so you only need to encode the data yourself if, as here, you want to control how it is rendered.
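Here is a rough C++ equivalent of the listing's EncodeString(), offered as a sketch under assumptions: it escapes only `&`, `<`, `>`, and `"`, whereas HttpUtility.HtmlEncode may encode additional characters, and it assumes CRLF line breaks, as the C# code does.

```cpp
#include <cassert>
#include <string>

// Hypothetical equivalent of EncodeString(): escapes the main HTML-special
// characters, then inserts <br /> before each CRLF pair.
std::string EncodeString(const std::string& s)
{
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        switch (s[i])
        {
        case '&': out += "&amp;";  break;
        case '<': out += "&lt;";   break;
        case '>': out += "&gt;";   break;
        case '"': out += "&quot;"; break;
        case '\r':
            // Add our own line-break markup ahead of each CRLF pair
            if (i + 1 < s.size() && s[i + 1] == '\n')
                out += "<br />";
            out += '\r';
            break;
        default: out += s[i]; break;
        }
    }
    return out;
}
```

For example, `a<b>` followed by CRLF and `c` becomes `a&lt;b&gt;<br />` followed by CRLF and `c`. When WriteElementString() then writes that value, the markup is escaped a second time (for example, `&lt;` becomes `&amp;lt;`); the feed reader's XML parser removes that outer layer, leaving the HTML shown here.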

Finally, we close the remaining open elements, flush and close the XML writer, and end the response. Again, we are creating an XML document, and this last step prevents any other output from accidentally being included in the response.

If you’re like me, you may be a little surprised at how easy this really is. To let someone check your feed, you simply give them the URL of this page. Using software that supports feeds, they can have instant access to your data in a convenient format and, of course, are more likely to return to your site when they need more information.

Categories:   ASP.NET

Implementing Word Wrap in C#

Sunday, January 10, 2010 1:44 PM by jonwood

NOTE: This article has been updated and moved to: Implementing Word Wrap in Email Messages.

The .NET platform makes it easy to send emails from your code. However, it was bothering me the other day that my emails had no word wrap.

In most cases, modern email readers will word wrap when email lines are too long. But there are still some email readers around that won’t. The common convention is to wrap email lines, limiting their length to about 65-75 characters (RFC 5322 says lines should be no more than 78 characters, excluding the line break). So I decided it was worth implementing word wrap in my code.

As rich as it is, the .NET platform does not appear to have any routines for implementing word wrap. I found some sample code online but, while the code was fairly simple (which is good), I didn’t think it was very efficient.

The .NET platform provides many routines for parsing text and extracting substrings, etc., but these generally involve allocating and copying lots of memory. So my approach was to write simple C# code that would word wrap the text without unnecessarily allocating additional objects.

Of course, I will need a new string in order to save my results. And since I’ll be building that string line-by-line, I used the StringBuilder class for this. The StringBuilder class allows you to more efficiently build a string without allocating new strings each time you make a change. Listing 1 is the code I came up with.

protected const string _newline = "\r\n";
/// <summary>
/// Word wraps the given text to fit within the specified width.
/// </summary>
/// <param name="text">Text to be word wrapped</param>
/// <param name="width">Width, in characters, to which the text
/// should be word wrapped</param>
/// <returns>The modified text</returns>
public static string WordWrap(string text, int width)
{
  int pos, next;
  StringBuilder sb = new StringBuilder();
  // Lucidity check
  if (width < 1)
    return text;
  // Parse each line of text
  for (pos = 0; pos < text.Length; pos = next)
  {
    // Find end of line
    int eol = text.IndexOf(_newline, pos);
    if (eol == -1)
      next = eol = text.Length;
    else
      next = eol + _newline.Length;
    // Copy this line of text, breaking into smaller lines as needed
    if (eol > pos)
    {
      do
      {
        int len = eol - pos;
        if (len > width)
          len = BreakLine(text, pos, width);
        sb.Append(text, pos, len);
        sb.Append(_newline);
        // Trim whitespace following break
        pos += len;
        while (pos < eol && Char.IsWhiteSpace(text[pos]))
          pos++;
      } while (eol > pos);
    }
    else sb.Append(_newline); // Empty line
  }
  return sb.ToString();
}
/// <summary>
/// Locates position to break the given line so as to avoid
/// breaking words.
/// </summary>
/// <param name="text">String that contains line of text</param>
/// <param name="pos">Index where line of text starts</param>
/// <param name="max">Maximum line length</param>
/// <returns>The modified line length</returns>
public static int BreakLine(string text, int pos, int max)
{
  // Find last whitespace in line
  int i = max - 1;
  while (i >= 0 && !Char.IsWhiteSpace(text[pos + i]))
    i--;
  if (i < 0)
    return max; // No whitespace found; break at maximum length
  // Find start of whitespace
  while (i >= 0 && Char.IsWhiteSpace(text[pos + i]))
    i--;
  // Return length of text before whitespace
  return i + 1;
}

Listing 1: Word Wrap Code

The code starts by extracting each line from the original text. It does this by locating the hard-coded line breaks. Note that my code searches for carriage return, line feed pairs (“\r\n”). Some platforms may only use “\n” or other variations for new lines, but the carriage return, line feed pair works in most cases on Windows systems. You can change the _newline constant if you want the code to look for something else.

The code then copies each line to the result string. If a line is too long to fit within the specified width, it is further broken into smaller lines. Each time through the loop, if the line needs to be broken, the BreakLine method is called to locate the last whitespace that fits within the maximum line length, so that the line is broken between words instead of in the middle of one.

While the string object provides the LastIndexOf() method, which could be used to locate the last space character, I manually coded the loop myself so that I could use Char.IsWhiteSpace() to support all whitespace characters defined on the current system. If no whitespace is found, the line is simply broken at the maximum line length.

As each line is broken, the code removes any whitespace at the break. This avoids trailing spaces on the current line and leading spaces on the next line. Although there is normally only one space between words, the code also handles cases where there are more.

As each new line is created, a carriage return, line feed pair is also added to separate it from the next. Note the special case for an empty line, in which the code just writes the carriage return, line feed pair.
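For comparison, here is the same algorithm as a C++ sketch. It is a hypothetical port, not code from the original article: `std::isspace` stands in for `Char.IsWhiteSpace()` (in the default "C" locale it recognizes only ASCII whitespace, so it is narrower), `std::string` plays the role of `StringBuilder`, and `BreakLine()` keeps its index one past the character being examined only so it can stay unsigned; the break positions are the same.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Line separator searched for and emitted, as in the C# listing
const std::string kNewline = "\r\n";

// Stand-in for Char.IsWhiteSpace(); uses the current C locale
static bool IsSpace(char ch)
{
    return std::isspace(static_cast<unsigned char>(ch)) != 0;
}

// Returns how many characters of the line starting at pos to keep, breaking
// before the last whitespace run within max characters (max if none found).
std::string::size_type BreakLine(const std::string& text,
                                 std::string::size_type pos,
                                 std::string::size_type max)
{
    std::string::size_type i = max;  // One past the character examined
    while (i > 0 && !IsSpace(text[pos + i - 1]))
        --i;
    if (i == 0)
        return max;  // No whitespace found; break at maximum length
    // Back up over the whitespace so it is not left at the end of the line
    while (i > 0 && IsSpace(text[pos + i - 1]))
        --i;
    return i;
}

std::string WordWrap(const std::string& text, std::string::size_type width)
{
    if (width == 0)
        return text;  // Nothing sensible to do
    std::string result;
    result.reserve(text.size());  // Rough pre-size, like a StringBuilder
    std::string::size_type next = 0;
    for (std::string::size_type pos = 0; pos < text.size(); pos = next)
    {
        // Find the end of this hard line
        std::string::size_type eol = text.find(kNewline, pos);
        if (eol == std::string::npos)
            next = eol = text.size();
        else
            next = eol + kNewline.size();

        if (eol == pos)
        {
            result += kNewline;  // Empty line
            continue;
        }
        // Copy this line, breaking it into smaller lines as needed
        while (pos < eol)
        {
            std::string::size_type len = eol - pos;
            if (len > width)
                len = BreakLine(text, pos, width);
            result.append(text, pos, len);
            result += kNewline;
            // Skip whitespace following the break
            pos += len;
            while (pos < eol && IsSpace(text[pos]))
                ++pos;
        }
    }
    return result;
}
```

For example, `WordWrap("The quick brown fox", 10)` returns `The quick` and `brown fox`, each followed by a carriage return, line feed pair.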

There’s nothing complex about this code, but I took a little extra time to make it efficient. Note that the word wrap is based on the number of characters and not the display width. If you were, for example, word wrapping text output to the screen or printer, the code should probably test different line lengths measured on a device context in order to determine the display length.
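As a sketch of that idea, one approach is to measure successively longer prefixes of the line until one no longer fits, then use the resulting character count as the max argument to a BreakLine()-style search for whitespace. The names `MeasureFn` and `FitCount` are hypothetical; in a Windows program the callback might call GetTextExtentPoint32() on the target device context (or TextRenderer.MeasureText() in .NET). Measuring every prefix costs O(n²) per line, which keeps the sketch simple but may be too slow for long text.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical callback that returns the display width of a string.
using MeasureFn = std::function<int(const std::string&)>;

// Returns how many characters of text, starting at pos and stopping at end,
// fit within maxWidth display units. Returns at least 1 when pos < end, so a
// caller always makes progress even if a single character is too wide.
std::string::size_type FitCount(const std::string& text,
                                std::string::size_type pos,
                                std::string::size_type end,
                                int maxWidth, const MeasureFn& measure)
{
    std::string::size_type n = 0;
    // Measure successively longer prefixes until the next one is too wide
    while (pos + n < end && measure(text.substr(pos, n + 1)) <= maxWidth)
        ++n;
    return (n == 0 && pos < end) ? 1 : n;
}
```

With a proportional measure in which `i` is 1 unit wide and other characters are 3, `FitCount("iiiiiiwww", 0, 9, 10, measure)` returns 7: six narrow characters plus one wide one fit within 10 units, where a character-based limit would have allowed only 10 characters regardless of width.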

Categories:   C# .NET