Developer Blog
Articles about Using Microsoft Developer Tools

Shallow and Deep Object Copying

Sunday, April 12, 2009 1:23 PM by jonwood

In .NET, class objects are reference types. Assigning one object variable to another does not copy the object; it simply causes both variables to reference the same object.
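As a minimal sketch of that behavior (the Point class and the method name are hypothetical, used only for illustration), modifying the object through one variable is visible through the other:

```csharp
using System;

// Hypothetical class used only for this illustration.
class Point
{
    public int X;
}

static class ReferenceDemo
{
    // Returns the value seen through 'a' after modifying the object through 'b'.
    public static int ModifyThroughAlias()
    {
        Point a = new Point();
        a.X = 1;
        Point b = a;   // copies the reference, not the object
        b.X = 42;      // changes the single object both variables refer to
        return a.X;
    }

    static void Main()
    {
        Console.WriteLine(ModifyThroughAlias()); // prints 42
    }
}
```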

Sometimes, a copy is required. For example, maybe two routines need to start with the same data but then change that data independently from each other. Copying the data ensures that changes made by one routine will not impact the data being used by the other routine.

When using .NET, two types of copies are possible: shallow and deep. In the case of a shallow copy, a new object is created and each member from the original object is assigned to the corresponding member of the new object. In the case of value members, this is a copy in the truest sense. However, with objects that contain reference members, this does not produce a true copy.

One example of a reference type is a string. When you assign one string variable to another, both variables will reference the same string data. The characters of the strings are not truly copied. So if a class contains reference members, a shallow copy does not create a true copy of all class members.

For many cases, a shallow copy is sufficient. Note that strings are immutable and cannot be changed. When you create a shallow copy of an object that contains strings, and then modify a string in the new object, that would create a new string and would not have any impact on the original string in the original object. Note that other data types such as arrays, class objects, and arrays of class objects can be quite a bit more complicated than strings.
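The difference between a string member and an array member can be sketched as follows. The Settings class and its member names are assumptions made for this example; the shallow copy uses MemberwiseClone().

```csharp
using System;

// Hypothetical class for illustration; not one of the article's listings.
class Settings
{
    public string Name;
    public int[] Values;

    // Shallow copy: value members are copied, reference members are shared.
    public Settings ShallowCopy()
    {
        return (Settings)MemberwiseClone();
    }
}

static class ShallowDemo
{
    public static Settings MakeOriginal()
    {
        Settings s = new Settings();
        s.Name = "original";
        s.Values = new int[] { 1, 2, 3 };
        return s;
    }

    static void Main()
    {
        Settings a = MakeOriginal();
        Settings b = a.ShallowCopy();

        b.Name = "changed";  // b now references a different string; a.Name is untouched
        b.Values[0] = 99;    // the array is shared, so a.Values[0] changes too

        Console.WriteLine(a.Name);      // prints original
        Console.WriteLine(a.Values[0]); // prints 99
    }
}
```

Assigning a new string only redirects the copy's reference, while writing into the shared array changes data that both objects see.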

A deep copy is when a copy is created that contains none of the original data. A true copy of each member is created. A deep copy doesn’t need to do anything special with members that are value types. But for reference data types, the new object must reference copies of that data instead of the original data.

There is nothing special about how either type of copy is performed. Consider Listing 1. This code declares a class called MyClass, and then shows a short method called Test that performs both a shallow and a deep copy of an object of that class.

protected class MyClass
{
   public int i;
   public int j;
   public string message;
}
private void Test()
{
   MyClass mc1;
   MyClass mc2;
   mc1 = new MyClass();
   mc1.i = 5;
   mc1.j = 10;
   mc1.message = "Hello, World!";
   // Shallow copy
   mc2 = new MyClass();
   mc2.i = mc1.i;
   mc2.j = mc1.j;
   mc2.message = mc1.message;
   // Deep copy
   mc2 = new MyClass();
   mc2.i = mc1.i;
   mc2.j = mc1.j;
   mc2.message = String.Copy(mc1.message);
}

Listing 1: Shallow and deep copying of an object.

The shallow copy does nothing special. It simply assigns each member from one object to the other. For value members, the deep copy uses the same code. However, for the one reference member, message, the code must create a copy of the string data. (Note that additional steps would be required to perform a deep copy of objects whose reference members refer to further objects, such as class objects or arrays.)
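Those additional steps might look like the following sketch. The Customer and Address classes are hypothetical and only illustrate the idea: each reference member gets its own copy, one level at a time.

```csharp
using System;

// Hypothetical classes for illustration only.
class Address
{
    public string City;
}

class Customer
{
    public int Id;
    public int[] Scores;
    public Address Home;

    public Customer DeepCopy()
    {
        Customer c = new Customer();
        c.Id = Id;                              // value member: plain assignment
        if (Scores != null)
            c.Scores = (int[])Scores.Clone();   // new array; int elements are copied by value
        if (Home != null)
        {
            c.Home = new Address();             // new nested object
            c.Home.City = Home.City;            // strings are immutable, so sharing is safe
        }
        return c;
    }
}

static class DeepDemo
{
    static void Main()
    {
        Customer a = new Customer();
        a.Id = 1;
        a.Scores = new int[] { 10, 20 };
        a.Home = new Address();
        a.Home.City = "Salt Lake City";

        Customer b = a.DeepCopy();
        b.Scores[0] = 99;
        b.Home.City = "Provo";

        Console.WriteLine(a.Scores[0]);  // prints 10
        Console.WriteLine(a.Home.City);  // prints Salt Lake City
    }
}
```

Note that Array.Clone() is itself a shallow copy, so an array of class objects would also need each element copied.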

Now that I’ve hopefully explained the difference between a shallow and a deep copy, let’s take a look at some of the tools the .NET Framework provides to perform these tasks.

protected class MyClass : ICloneable
{
   public int i;
   public int j;
   public string message;
   public object Clone()
   {
      return MemberwiseClone();
   }
}
private void Test()
{
   MyClass mc1 = new MyClass();
   mc1.i = 5;
   mc1.j = 10;
   mc1.message = "Hello, World!";
   // Shallow copy
   MyClass mc2 = (MyClass)mc1.Clone();
}

Listing 2: Using MemberwiseClone() to perform a shallow copy.

Listing 2 uses MemberwiseClone() to perform a shallow copy. MemberwiseClone() is protected and so cannot be called directly from Test. Instead, I’ve modified MyClass to implement the ICloneable interface and implemented the one ICloneable method, Clone. (ICloneable does not specify whether Clone() performs a deep or a shallow copy; here I use it to implement a shallow copy.) The Test method calls this new method to perform the shallow copy. Since Clone() returns type object, a type cast is required.

To perform a deep copy, Listing 3 also implements the ICloneable interface. This listing just modifies the code in the Clone() method to perform a deep copy.

protected class MyClass : ICloneable
{
   public int i;
   public int j;
   public string message;
   public object Clone()
   {
      MyClass mc = new MyClass();
      mc.i = i;
      mc.j = j;
      if (message != null)
         mc.message = String.Copy(message);
      return mc;
   }
}
private void Test()
{
   MyClass mc1 = new MyClass();
   mc1.i = 5;
   mc1.j = 10;
   mc1.message = "Hello, World!";
   // Deep copy
   MyClass mc2 = (MyClass)mc1.Clone();
}

Listing 3: Using ICloneable to perform a deep copy.

The actual code in the Clone() method should be familiar by now. The main advantage to implementing it this way is that it is implemented as part of the class, where it can easily be modified and called from anywhere in your application.

Nothing too complex here, although the concept behind a shallow and deep copy can be confusing to some. Hopefully, I’ve shed some light on this topic and demonstrated how you might approach the issue using .NET.

Categories:   C# .NET

BackgroundWorker.ReportProgress is Asynchronous

Sunday, April 12, 2009 12:12 PM by jonwood

I never noticed this before but the BackgroundWorker.ReportProgress method returns before the control’s ProgressChanged event has completed. It may return before the ProgressChanged event has even started!

For those not familiar with the BackgroundWorker control, this control simplifies creating a worker thread, especially for the purpose of keeping the user interface responsive while the worker thread performs a lengthy process.

One issue it simplifies relates to the fact that the worker thread cannot directly access the form or its controls because those objects were created by the UI thread. Instead, code running in the worker thread can call the control’s ReportProgress method, which raises the control’s ProgressChanged event. You can pass information to ReportProgress that describes the current state of the lengthy process, and the handler for the ProgressChanged event can use that data to update your form controls.

I had been using this approach for a lengthy operation that could run for days. A lot was going on so I was passing an instance of a custom class that contained various bits of progress information. But, at one point, I saw that the progress information being displayed was not correct. On further inspection, I could see that my worker thread was updating the progress information object before the ProgressChanged event handler had a chance to display that information.

It is very easy to get caught up with multi-threading issues as some things are just not very intuitive. When I called the ReportProgress method, I had just assumed that it would not return until the ProgressChanged event had completed. But I was wrong.

Thinking about it, the way this control works makes sense. If, instead, the worker thread was blocked until the event handler had returned, some of the benefits of a worker thread would be lost, as that thread would sit idle during that time. Also, note that the ReportProgress method is overloaded. One version simply takes an integer argument. Since integers are passed by value, there would be no reason to suspend the worker thread when using this version of the ReportProgress method.

The other version takes an object in addition to the integer argument. That’s the version I was using. Since class objects are reference types, the ProgressChanged event handler receives a reference to the same object the worker thread is using, so any changes the worker thread makes to that object are visible to the handler.

At first thought, I wondered if maybe I should resolve this by blocking the worker thread somehow until the event had run to completion. But, as I’ve already pointed out, this eliminates some of the advantage of having a worker thread in the first place. A much simpler solution is to simply make a copy of my progress class object. This way, the worker thread can modify its copy as needed while the ProgressChanged event is reading its copy, perhaps both at the same time.

Note that I only required a “shallow” copy. In the case of value members, a shallow copy will create a true copy of those members. In the case of reference members, the copy is actually a reference to the same object. The only reference members in my case were strings. Since strings are immutable and cannot be changed, if my code updated one of these members, that would create a new string and not affect the original one referenced in the object passed to ReportProgress.

protected class ProgressInfo
{
   public int current;
   public int total;
   public string message;
   public object Clone()
   {
      return (ProgressInfo)MemberwiseClone();
   }
}
private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
   ProgressInfo info = new ProgressInfo();
   info.total = 1000;
   for (info.current = 0; info.current < info.total; info.current++)
   {
      info.message = String.Format("Processing item {0}",
         info.current + 1);
      backgroundWorker1.ReportProgress(0, info.Clone());
      //
      // Further processing on this item
      //
   }
}

Listing 1: Passing a copy of an object to BackgroundWorker.ReportProgress

Listing 1 shows some sample code. The ProgressInfo class declares a Clone() method, which calls MemberwiseClone(). MemberwiseClone() performs a shallow copy of the object. Note that this method is protected and, therefore, can only be called from a method of the class (or a derived class). This is why it was necessary to create the additional, public, “wrapper” method in my class, which my worker thread can call.

Using this code, my ProgressChanged event handler can take its time displaying the progress data and will not be affected by my background worker thread making changes to its copy of that data at the same time.
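The effect can be reproduced deterministically without any threads. The sketch below is only a simulation, not BackgroundWorker itself: a list stands in for events that have been queued but not yet handled, and the Progress class, Snapshot() and Run() are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the ProgressInfo class in the listing above.
class Progress
{
    public int Current;
    public string Message;

    public Progress Snapshot()
    {
        return (Progress)MemberwiseClone();  // shallow copy
    }
}

static class DeferredDemo
{
    // Simulates a worker that reports three times before any handler runs.
    // Returns the Current values the delayed handlers would observe.
    public static List<int> Run(bool passCopy)
    {
        List<Progress> pending = new List<Progress>();  // stands in for queued events
        Progress info = new Progress();

        for (info.Current = 1; info.Current <= 3; info.Current++)
        {
            info.Message = "Processing item " + info.Current;
            pending.Add(passCopy ? info.Snapshot() : info);
        }

        // The "handlers" run only after the worker has moved on.
        List<int> seen = new List<int>();
        foreach (Progress p in pending)
            seen.Add(p.Current);
        return seen;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run(false))); // prints 4,4,4
        Console.WriteLine(string.Join(",", Run(true)));  // prints 1,2,3
    }
}
```

When the same object is passed every time, each delayed handler sees only the worker's latest state. With a copy per report, each handler sees the state as it was when reported.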

Categories:   C# .NET

What are Kilobytes, Megabytes, Gigabytes, Etc?

Friday, April 03, 2009 5:16 AM by jonwood

Okay, we are all familiar with kilobytes, megabytes, gigabytes and the like. Most of us know that a kilobyte is approximately a thousand bytes. And that is accurate enough for most purposes. But what if you need to deal with these numbers a bit more accurately?

In the binary convention traditionally used for memory sizes, a kilobyte is exactly 1,024 bytes. So why not 1,000 bytes? Computers represent numbers in binary, so memory sizes and address ranges are naturally powers of two. 1,000 is not a power of two, but 1,024 (2 to the 10th power) is.

Just as a million is a thousand times a thousand, a megabyte is a kilobyte times a kilobyte, or 1,048,576 bytes. Again, close enough to a million for many purposes but not exactly a million.

Here are the values for some power-of-two numbers.

Unit         Abbreviation   Value
1 Kilobyte   KB             1,024 Bytes
1 Megabyte   MB             1,048,576 Bytes
1 Gigabyte   GB             1,073,741,824 Bytes
1 Terabyte   TB             1,099,511,627,776 Bytes
1 Petabyte   PB             1,125,899,906,842,624 Bytes
1 Exabyte    EB             1,152,921,504,606,846,976 Bytes
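Each value in the table is a power of 1,024, that is, 2 raised to a multiple of 10. As a small sketch in C# (to match the other listings on this blog; the ByteUnits class and PowerOf1024 method are hypothetical names), the values can be generated with a bit shift:

```csharp
using System;

static class ByteUnits
{
    // Returns 1,024 raised to the given power (1 = KB, 2 = MB, ... 6 = EB).
    // Valid for powers 1 through 6; 2^60 still fits in a signed 64-bit long.
    public static long PowerOf1024(int power)
    {
        return 1L << (10 * power);
    }

    static void Main()
    {
        string[] names = { "KB", "MB", "GB", "TB", "PB", "EB" };
        for (int i = 0; i < names.Length; i++)
            Console.WriteLine("1 {0} = {1} bytes", names[i], PowerOf1024(i + 1));
    }
}
```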

    I was working on an MFC application recently and needed to define some large numbers, which I felt would be more efficient as powers of two. I ended up defining the following C++ macros.

    #define KB(n) (((UINT64)0x400)*((UINT64)(n)))
    #define MB(n) (((UINT64)0x100000)*((UINT64)(n)))
    #define GB(n) (((UINT64)0x40000000)*((UINT64)(n)))
    #define TB(n) (((UINT64)0x10000000000)*((UINT64)(n)))
    #define PB(n) (((UINT64)0x4000000000000)*((UINT64)(n)))
    #define EB(n) (((UINT64)0x1000000000000000)*((UINT64)(n)))

    Macros to make it easy to declare large power-of-two numbers.

    These macros make it easy to declare a large number exactly. For example, if I want to define four gigabytes, I can simply write GB(4).

    Nothing complicated here. But if you need to make use of these numbers, it’s nice to have a table of actual values along with some handy helper macros.

    Categories:   General | C++/MFC