CHAPTER 2 ■ DATA MANIPULATION
Table 2-1. Character-Encoding Classes
Encoding Scheme          Class             Create Using
ASCII                    ASCIIEncoding     GetEncoding(20127) or the ASCII property
Default                  Encoding          GetEncoding(0) or the Default property
UTF-7                    UTF7Encoding      GetEncoding(65000) or the UTF7 property
UTF-8                    UTF8Encoding      GetEncoding(65001) or the UTF8 property
UTF-16 (big-endian)      UnicodeEncoding   GetEncoding(1201) or the BigEndianUnicode property
UTF-16 (little-endian)   UnicodeEncoding   GetEncoding(1200) or the Unicode property
Windows OS               Encoding          GetEncoding(1252)
Once you have an Encoding object of the appropriate type, you convert a UTF-16–encoded Unicode
string to a byte array of encoded characters using the GetBytes method. Conversely, you convert a byte
array of encoded characters to a string using the GetString method.
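As a minimal sketch of the lookup and conversion calls described above (the class and method names EncodingLookupSketch, EncodeToHex, and RoundTrip are hypothetical and exist only for illustration), the following obtains encodings by code page and shows that the two UTF-16 variants differ only in byte order:

```csharp
using System;
using System.Text;

public static class EncodingLookupSketch
{
    // Encode text with the encoding registered under the given code page
    // and return the bytes as hyphen-separated hexadecimal pairs.
    public static string EncodeToHex(int codePage, string text)
    {
        Encoding encoding = Encoding.GetEncoding(codePage);
        return BitConverter.ToString(encoding.GetBytes(text));
    }

    // Encode text to bytes and decode it again with the same encoding.
    public static string RoundTrip(Encoding encoding, string text)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    public static void Main()
    {
        Console.WriteLine(EncodeToHex(1200, "A"));   // UTF-16 little-endian: 41-00
        Console.WriteLine(EncodeToHex(1201, "A"));   // UTF-16 big-endian:    00-41
        Console.WriteLine(EncodeToHex(65001, "A"));  // UTF-8:                41
        Console.WriteLine(RoundTrip(Encoding.UTF8, "Area = \u03A0r^2"));
    }
}
```

The sketch uses only code pages 1200, 1201, and 65001. Other code pages from Table 2-1, such as 1252, may not be available on every runtime without extra configuration, so they are left out here.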
The Code
The following example demonstrates the use of some encoding classes:
using System;
using System.IO;
using System.Text;

namespace Apress.VisualCSharpRecipes.Chapter02
{
    class Recipe02_02
    {
        public static void Main()
        {
            // Create a file to hold the output.
            using (StreamWriter output = new StreamWriter("output.txt"))
            {
                // Create and write a string containing the Greek capital
                // letter pi (U+03A0).
                string srcString = "Area = \u03A0r^2";
                output.WriteLine("Source Text : " + srcString);

                // Write the UTF-16 encoded bytes of the source string.
                byte[] utf16String = Encoding.Unicode.GetBytes(srcString);
                output.WriteLine("UTF-16 Bytes: {0}",
                    BitConverter.ToString(utf16String));

                // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
                byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
                byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

                // Write the UTF-8 and ASCII encoded byte arrays.
                output.WriteLine("UTF-8 Bytes: {0}",
                    BitConverter.ToString(utf8String));
                output.WriteLine("ASCII Bytes: {0}",
                    BitConverter.ToString(asciiString));

                // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
                // string and write.
                output.WriteLine("UTF-8 Text : {0}",
                    Encoding.UTF8.GetString(utf8String));
                output.WriteLine("ASCII Text : {0}",
                    Encoding.ASCII.GetString(asciiString));
            }

            // Wait to continue.
            Console.WriteLine("\nMain method complete. Press Enter");
            Console.ReadLine();
        }
    }
}
Usage
Running the code will generate a file named output.txt. If you open this file in a text editor that supports
Unicode, you will see the following content:
Source Text : Area = Πr^2
UTF-16 Bytes: 41-00-72-00-65-00-61-00-20-00-3D-00-20-00-A0-03-72-00-5E-00-32-00
UTF-8 Bytes: 41-72-65-61-20-3D-20-CE-A0-72-5E-32
ASCII Bytes: 41-72-65-61-20-3D-20-3F-72-5E-32
UTF-8 Text : Area = Πr^2
ASCII Text : Area = ?r^2
Notice that using UTF-16 encoding, each character occupies 2 bytes, but because most of the
characters are standard characters, the high-order byte is 0. (The use of little-endian byte ordering
means that the low-order byte appears first.) This means that most of the characters are encoded using
the same numeric values across all three encoding schemes. However, the numeric value for the symbol
pi (A0-03 in UTF-16, CE-A0 in UTF-8, and 3F in ASCII) is different in each of the encodings. The value of pi
requires more than 1 byte to represent. UTF-8 encoding uses 2 bytes, but ASCII has no equivalent character
and so replaces pi with the code 3F. As you can see in the ASCII text version of the string, 3F is the code
for a question mark (?).
■ Caution If you convert Unicode characters to ASCII or a specific code page–encoding scheme, you risk losing
data. By default, any Unicode character that cannot be represented in the target scheme is replaced with a
question mark (?), as the ASCII output in the preceding example shows.
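One way to detect this kind of data loss before it happens is to request an encoding that throws instead of substituting a character. The following is a minimal sketch under that assumption. The class and method names (StrictAsciiSketch, CanEncodeAsAscii) are hypothetical, and the approach relies on the Encoding.GetEncoding overload that accepts encoder and decoder fallback objects:

```csharp
using System;
using System.Text;

public static class StrictAsciiSketch
{
    // Returns true if every character in text can be encoded as ASCII.
    public static bool CanEncodeAsAscii(string text)
    {
        // Request an ASCII encoding that throws an exception rather than
        // silently replacing unrepresentable characters with '?'.
        Encoding strictAscii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderExceptionFallback(),
            new DecoderExceptionFallback());

        try
        {
            strictAscii.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // At least one character has no ASCII representation.
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(CanEncodeAsAscii("Area = r^2"));        // True
        Console.WriteLine(CanEncodeAsAscii("Area = \u03A0r^2"));  // False
    }
}
```

Whether to throw, substitute, or pre-validate depends on the application. The sketch only illustrates that the substitution behavior is a configurable fallback rather than a fixed property of the encoding.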
Notes
The Encoding class also provides the static method Convert to simplify the conversion of a byte array
from one encoding scheme to another without the need to manually perform an interim conversion to
UTF-16. For example, the following statement converts the ASCII-encoded bytes contained in the
asciiString byte array directly from ASCII encoding to UTF-8 encoding:
byte[] utf8String = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, asciiString);
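The same method works for any pair of encodings. As a small hedged sketch (ConvertSketch and Utf16ToUtf8Hex are hypothetical names used only for illustration), the following converts UTF-16 bytes directly to UTF-8 bytes:

```csharp
using System;
using System.Text;

public static class ConvertSketch
{
    // Encode text as UTF-16 (little-endian), convert those bytes to UTF-8
    // with Encoding.Convert, and return the result as hexadecimal pairs.
    public static string Utf16ToUtf8Hex(string text)
    {
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        byte[] utf8Bytes = Encoding.Convert(
            Encoding.Unicode, Encoding.UTF8, utf16Bytes);
        return BitConverter.ToString(utf8Bytes);
    }

    public static void Main()
    {
        // U+03A0 is A0-03 in UTF-16 little-endian and CE-A0 in UTF-8.
        Console.WriteLine(Utf16ToUtf8Hex("\u03A0"));  // CE-A0
    }
}
```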
2-3. Convert Basic Value Types to Byte Arrays
Problem
You need to convert basic value types to byte arrays.
Solution
The static methods of the System.BitConverter class provide a convenient mechanism for converting
most basic value types to and from byte arrays. An exception is the decimal type. To convert a decimal
type to or from a byte array, you need to use a System.IO.MemoryStream object.
How It Works
The static method GetBytes of the BitConverter class provides overloads that take most of the standard
value types and return the value encoded as an array of bytes. Support is provided for the bool, char,
double, short, int, long, float, ushort, uint, and ulong data types. BitConverter also provides a set of
static methods that support the conversion of byte arrays to each of the standard value types. These are
named ToBoolean, ToUInt32, ToDouble, and so on.
Unfortunately, the BitConverter class does not provide support for converting the decimal type.
Instead, write the decimal type to a MemoryStream instance using a System.IO.BinaryWriter object, and
then call the MemoryStream.ToArray method. To create a decimal type from a byte array, create a
MemoryStream object from the byte array and read the decimal type from the MemoryStream object using a
System.IO.BinaryReader instance.
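As a minimal sketch of the GetBytes and To* pairing described above (BitConverterSketch and its method names are hypothetical, chosen only for illustration), the following round-trips an int and a bool through byte arrays:

```csharp
using System;

public static class BitConverterSketch
{
    // Convert an int to a 4-byte array and back again.
    public static int RoundTripInt32(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        // The second argument is the zero-based offset at which to start reading.
        return BitConverter.ToInt32(bytes, 0);
    }

    // Convert a bool to a 1-byte array and back again.
    public static bool RoundTripBoolean(bool value)
    {
        byte[] bytes = BitConverter.GetBytes(value);
        return BitConverter.ToBoolean(bytes, 0);
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.GetBytes(123456789).Length);  // 4
        Console.WriteLine(BitConverter.GetBytes(true).Length);       // 1
        Console.WriteLine(RoundTripInt32(123456789));                // 123456789
        Console.WriteLine(RoundTripBoolean(true));                   // True

        // The byte order of multi-byte values follows the machine
        // architecture, which BitConverter.IsLittleEndian reports.
        Console.WriteLine(BitConverter.IsLittleEndian);
    }
}
```

Because the byte order depends on the platform, byte arrays produced this way are only portable between machines that share the same endianness, unless the bytes are reordered explicitly.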
The Code
The following example demonstrates the use of BitConverter to convert a bool type and an int type to
and from a byte array. The second argument to each of the ToBoolean and ToInt32 methods is a zero-based
offset into the byte array where the BitConverter should start taking the bytes to create the data
value. The code also shows how to convert a decimal type to a byte array using a MemoryStream object and
a BinaryWriter object, as well as how to convert a byte array to a decimal type using a BinaryReader
object to read from the MemoryStream object.
using System;
using System.IO;

namespace Apress.VisualCSharpRecipes.Chapter02
{
    class Recipe02_03
    {
        // Create a byte array from a decimal.
        public static byte[] DecimalToByteArray (decimal src)
        {
            // Create a MemoryStream as a buffer to hold the binary data.
            using (MemoryStream stream = new MemoryStream())
            {
                // Create a BinaryWriter to write binary data to the stream.
                using (BinaryWriter writer = new BinaryWriter(stream))
                {
                    // Write the decimal to the BinaryWriter/MemoryStream.
                    writer.Write(src);

                    // Return the byte representation of the decimal.
                    return stream.ToArray();
                }
            }
        }

        // Create a decimal from a byte array.
        public static decimal ByteArrayToDecimal (byte[] src)
        {
            // Create a MemoryStream containing the byte array.
            using (MemoryStream stream = new MemoryStream(src))
            {
                // Create a BinaryReader to read the decimal from the stream.
                using (BinaryReader reader = new BinaryReader(stream))
                {
                    // Read and return the decimal from the
                    // BinaryReader/MemoryStream.
                    return reader.ReadDecimal();
                }
            }
        }

        public static void Main()