
2-2. Encode a String Using Alternate Character Encoding



CHAPTER 2 ■ DATA MANIPULATION



Table 2-1. Character-Encoding Classes

Encoding Scheme          Class             Create Using
ASCII                    ASCIIEncoding     GetEncoding(20127) or the ASCII property
Default                  Encoding          GetEncoding(0) or the Default property
UTF-7                    UTF7Encoding      GetEncoding(65000) or the UTF7 property
UTF-8                    UTF8Encoding      GetEncoding(65001) or the UTF8 property
UTF-16 (big-endian)      UnicodeEncoding   GetEncoding(1201) or the BigEndianUnicode property
UTF-16 (little-endian)   UnicodeEncoding   GetEncoding(1200) or the Unicode property
Windows OS               Encoding          GetEncoding(1252)
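The following is a minimal sketch that checks several code page numbers from Table 2-1 against the matching static properties. Default, UTF-7, and 1252 are deliberately left out because their behavior depends on the runtime. On .NET Core, Default is always UTF-8 and code page 1252 requires an additional encoding provider, and UTF-7 is marked obsolete in .NET 5 and later. The class name EncodingLookup is hypothetical.

```csharp
using System;
using System.Text;

static class EncodingLookup
{
    // Code page numbers and static properties taken from Table 2-1.
    public static readonly (int CodePage, Encoding Property)[] Pairs =
    {
        (20127, Encoding.ASCII),
        (65001, Encoding.UTF8),
        (1200, Encoding.Unicode),
        (1201, Encoding.BigEndianUnicode),
    };

    public static void Main()
    {
        foreach (var (codePage, property) in Pairs)
        {
            // Look the encoding up by number and compare it with the property.
            Encoding byNumber = Encoding.GetEncoding(codePage);
            Console.WriteLine("{0,5}  {1,-10}  matches property: {2}",
                codePage, byNumber.WebName, byNumber.CodePage == property.CodePage);
        }
    }
}
```

Both forms return an Encoding object. The property is shorter to write, and GetEncoding is useful when the code page number arrives as data, for example from a configuration file.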



Once you have an Encoding object of the appropriate type, you convert a UTF-16-encoded Unicode string to a byte array of encoded characters using the GetBytes method. Conversely, you convert a byte array of encoded characters to a string using the GetString method.
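This short sketch shows GetBytes and GetString together. It also shows how the two UTF-16 rows of Table 2-1 differ only in byte order. The helper name Hex is hypothetical.

```csharp
using System;
using System.Text;

static class Utf16ByteOrder
{
    // Hypothetical helper: encode text and format the bytes as hex pairs.
    public static string Hex(Encoding encoding, string text) =>
        BitConverter.ToString(encoding.GetBytes(text));

    public static void Main()
    {
        string text = "A\u03A0";  // 'A' followed by Greek capital pi
        Console.WriteLine(Hex(Encoding.Unicode, text));          // 41-00-A0-03
        Console.WriteLine(Hex(Encoding.BigEndianUnicode, text)); // 00-41-03-A0

        // GetString reverses GetBytes when the same encoding is used for both.
        byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(text);
        Console.WriteLine(Encoding.BigEndianUnicode.GetString(bigEndian) == text); // True
    }
}
```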



The Code

The following example demonstrates the use of some encoding classes:

using System;
using System.IO;
using System.Text;

namespace Apress.VisualCSharpRecipes.Chapter02
{
    class Recipe02_02
    {
        public static void Main()
        {
            // Create a file to hold the output.
            using (StreamWriter output = new StreamWriter("output.txt"))
            {
                // Create and write a string containing the symbol for pi
                // (Greek capital letter pi, U+03A0).
                string srcString = "Area = \u03A0r^2";
                output.WriteLine("Source Text : " + srcString);

                // Write the UTF-16 encoded bytes of the source string.
                byte[] utf16String = Encoding.Unicode.GetBytes(srcString);
                output.WriteLine("UTF-16 Bytes: {0}",
                    BitConverter.ToString(utf16String));

                // Convert the UTF-16 encoded source string to UTF-8 and ASCII.
                byte[] utf8String = Encoding.UTF8.GetBytes(srcString);
                byte[] asciiString = Encoding.ASCII.GetBytes(srcString);

                // Write the UTF-8 and ASCII encoded byte arrays.
                output.WriteLine("UTF-8 Bytes: {0}",
                    BitConverter.ToString(utf8String));
                output.WriteLine("ASCII Bytes: {0}",
                    BitConverter.ToString(asciiString));

                // Convert UTF-8 and ASCII encoded bytes back to UTF-16 encoded
                // string and write.
                output.WriteLine("UTF-8 Text : {0}",
                    Encoding.UTF8.GetString(utf8String));
                output.WriteLine("ASCII Text : {0}",
                    Encoding.ASCII.GetString(asciiString));
            }

            // Wait to continue.
            Console.WriteLine("\nMain method complete. Press Enter");
            Console.ReadLine();
        }
    }
}



Usage

Running the code will generate a file named output.txt. If you open this file in a text editor that supports Unicode, you will see the following content:

Source Text : Area = Πr^2
UTF-16 Bytes: 41-00-72-00-65-00-61-00-20-00-3D-00-20-00-A0-03-72-00-5E-00-32-00
UTF-8 Bytes: 41-72-65-61-20-3D-20-CE-A0-72-5E-32
ASCII Bytes: 41-72-65-61-20-3D-20-3F-72-5E-32
UTF-8 Text : Area = Πr^2
ASCII Text : Area = ?r^2



Notice that with UTF-16 encoding, each character occupies 2 bytes. Because most of the characters are standard ASCII characters, the high-order byte is 0. (Little-endian byte ordering means that the low-order byte appears first.) As a result, most of the characters are encoded using the same numeric values across all three encoding schemes. However, the numeric value for the pi symbol is different in each encoding. The pi character (U+03A0) requires more than 1 byte to represent. UTF-8 encoding uses 2 bytes (CE-A0), but ASCII has no direct equivalent and so replaces pi with the code 3F. As you can see in the ASCII text version of the string, 3F is the code for a question mark (?).
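To see where the 2 UTF-8 bytes CE-A0 come from, here is a sketch that hand-encodes a code point in the range U+0080 to U+07FF. A code point in this range uses the two-byte pattern 110xxxxx 10xxxxxx. The sketch then checks the result against Encoding.UTF8. The class and method names are hypothetical, and the sketch only covers the two-byte case.

```csharp
using System;
using System.Text;

static class Utf8TwoByte
{
    // Encodes a code point in U+0080..U+07FF as 110xxxxx 10xxxxxx,
    // where the x bits are the 11 significant bits of the code point.
    public static byte[] Encode(int codePoint)
    {
        if (codePoint < 0x80 || codePoint > 0x7FF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        byte lead = (byte)(0xC0 | (codePoint >> 6));     // top 5 bits
        byte trail = (byte)(0x80 | (codePoint & 0x3F));  // low 6 bits
        return new[] { lead, trail };
    }

    public static void Main()
    {
        // U+03A0 = 011 1010 0000: top 5 bits 01110, low 6 bits 100000.
        Console.WriteLine(BitConverter.ToString(Encode(0x03A0)));                   // CE-A0
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes("\u03A0"))); // CE-A0
    }
}
```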



■ Caution If you convert Unicode characters to ASCII or to a specific code-page encoding scheme, you risk losing data. Any Unicode character that cannot be represented in the target scheme is lost. By default, it is replaced with a substitute character (for ASCII, a question mark, as the preceding output shows).
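If silent substitution is not acceptable, one option is to request an encoding that throws instead. The sketch below uses the Encoding.GetEncoding overload that accepts an EncoderFallback and a DecoderFallback. It passes the built-in exception fallbacks so that unmappable characters raise EncoderFallbackException. The class name StrictAscii and the TryEncode method are hypothetical.

```csharp
using System;
using System.Text;

static class StrictAscii
{
    // ASCII (code page 20127) that throws rather than substituting '?'.
    static readonly Encoding Strict = Encoding.GetEncoding(
        20127, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

    // Returns true and the encoded bytes only if every character maps to ASCII.
    public static bool TryEncode(string text, out byte[] bytes)
    {
        try
        {
            bytes = Strict.GetBytes(text);
            return true;
        }
        catch (EncoderFallbackException)
        {
            bytes = Array.Empty<byte>();
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(TryEncode("Area = r^2", out _));       // True
        Console.WriteLine(TryEncode("Area = \u03A0r^2", out _)); // False
    }
}
```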



Notes

The Encoding class also provides the static method Convert to simplify the conversion of a byte array from one encoding scheme to another without the need to manually perform an interim conversion to UTF-16. For example, the following statement converts the ASCII-encoded bytes contained in the asciiString byte array directly from ASCII encoding to UTF-8 encoding:

byte[] utf8String = Encoding.Convert(Encoding.ASCII, Encoding.UTF8, asciiString);
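As a further illustration, this sketch uses Encoding.Convert to re-encode the UTF-16 bytes from the recipe as UTF-8. It then confirms that the result matches encoding the original string with UTF-8 directly. The class and method names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text;

static class ConvertDemo
{
    // Re-encodes UTF-16 (little-endian) bytes as UTF-8 in a single call.
    public static byte[] Utf16ToUtf8(byte[] utf16Bytes) =>
        Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

    public static void Main()
    {
        string srcString = "Area = \u03A0r^2";
        byte[] utf16 = Encoding.Unicode.GetBytes(srcString);
        byte[] utf8 = Utf16ToUtf8(utf16);

        Console.WriteLine(BitConverter.ToString(utf8)); // 41-72-65-61-20-3D-20-CE-A0-72-5E-32

        // Same bytes as encoding the string with UTF-8 directly.
        Console.WriteLine(utf8.SequenceEqual(Encoding.UTF8.GetBytes(srcString))); // True
    }
}
```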



2-3. Convert Basic Value Types to Byte Arrays

Problem

You need to convert basic value types to byte arrays.



Solution

The static methods of the System.BitConverter class provide a convenient mechanism for converting most basic value types to and from byte arrays. The exception is the decimal type. To convert a decimal value to or from a byte array, you need to use a System.IO.MemoryStream object.



How It Works

The static method GetBytes of the BitConverter class provides overloads that take most of the standard value types and return the value encoded as an array of bytes. Support is provided for the bool, char, double, short, int, long, float, ushort, uint, and ulong data types. BitConverter also provides a set of static methods that support the conversion of byte arrays to each of the standard value types. These are named ToBoolean, ToUInt32, ToDouble, and so on.
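The sketch below round-trips a bool, an int, and a double through BitConverter. It also points out one detail to keep in mind: GetBytes uses the byte order of the current machine, which BitConverter.IsLittleEndian reports. ToBigEndian is a hypothetical helper, not part of BitConverter, that gives a fixed byte order regardless of hardware.

```csharp
using System;

static class BitConverterDemo
{
    // Hypothetical helper: always return the int's bytes in big-endian order.
    public static byte[] ToBigEndian(int value)
    {
        byte[] bytes = BitConverter.GetBytes(value);  // machine byte order
        if (BitConverter.IsLittleEndian)
            Array.Reverse(bytes);
        return bytes;
    }

    public static void Main()
    {
        byte[] b = BitConverter.GetBytes(true);
        byte[] i = BitConverter.GetBytes(0x12345678);
        byte[] d = BitConverter.GetBytes(3.5);

        // The second argument is the zero-based offset to start reading from.
        Console.WriteLine(BitConverter.ToBoolean(b, 0));              // True
        Console.WriteLine(BitConverter.ToInt32(i, 0) == 0x12345678);  // True
        Console.WriteLine(BitConverter.ToDouble(d, 0) == 3.5);        // True
        Console.WriteLine(BitConverter.ToString(ToBigEndian(0x12345678))); // 12-34-56-78
    }
}
```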

Unfortunately, the BitConverter class does not support the decimal type. Instead, write the decimal value to a MemoryStream instance using a System.IO.BinaryWriter object, and then call the MemoryStream.ToArray method. To create a decimal value from a byte array, create a MemoryStream object from the byte array and read the decimal value from it using a System.IO.BinaryReader instance.
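For comparison, here is a sketch of a different technique that the recipe does not use. decimal.GetBits returns the four 32-bit integers (low, middle, high, flags) that make up a decimal, and the decimal(int[]) constructor rebuilds the value from them. BitConverter can then handle each integer. DecimalBytes, ToBytes, and FromBytes are hypothetical names. The byte layout follows the machine's byte order because BitConverter is used.

```csharp
using System;
using System.Globalization;

static class DecimalBytes
{
    // Packs the four int parts from decimal.GetBits into 16 bytes.
    public static byte[] ToBytes(decimal value)
    {
        int[] parts = decimal.GetBits(value);
        byte[] result = new byte[16];
        for (int i = 0; i < 4; i++)
            Buffer.BlockCopy(BitConverter.GetBytes(parts[i]), 0, result, i * 4, 4);
        return result;
    }

    // Unpacks 16 bytes into four ints and rebuilds the decimal.
    public static decimal FromBytes(byte[] bytes)
    {
        if (bytes == null || bytes.Length != 16)
            throw new ArgumentException("Expected 16 bytes.", nameof(bytes));

        int[] parts = new int[4];
        for (int i = 0; i < 4; i++)
            parts[i] = BitConverter.ToInt32(bytes, i * 4);
        return new decimal(parts);
    }

    public static void Main()
    {
        byte[] bytes = ToBytes(-1.2300m);
        Console.WriteLine(bytes.Length); // 16
        // The scale survives the round trip, so trailing zeros are kept.
        Console.WriteLine(FromBytes(bytes).ToString(CultureInfo.InvariantCulture)); // -1.2300
    }
}
```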






The Code

The following example demonstrates the use of BitConverter to convert a bool value and an int value to and from a byte array. The second argument to each of the ToBoolean and ToInt32 methods is a zero-based offset into the byte array where the BitConverter should start taking the bytes to create the data value. The code also shows how to convert a decimal value to a byte array using a MemoryStream object and a BinaryWriter object. It then shows how to convert a byte array back to a decimal value by using a BinaryReader object to read from the MemoryStream object.

using System;
using System.IO;

namespace Apress.VisualCSharpRecipes.Chapter02
{
    class Recipe02_03
    {
        // Create a byte array from a decimal.
        public static byte[] DecimalToByteArray(decimal src)
        {
            // Create a MemoryStream as a buffer to hold the binary data.
            using (MemoryStream stream = new MemoryStream())
            {
                // Create a BinaryWriter to write binary data to the stream.
                using (BinaryWriter writer = new BinaryWriter(stream))
                {
                    // Write the decimal to the BinaryWriter/MemoryStream.
                    writer.Write(src);

                    // Return the byte representation of the decimal.
                    return stream.ToArray();
                }
            }
        }

        // Create a decimal from a byte array.
        public static decimal ByteArrayToDecimal(byte[] src)
        {
            // Create a MemoryStream containing the byte array.
            using (MemoryStream stream = new MemoryStream(src))
            {
                // Create a BinaryReader to read the decimal from the stream.
                using (BinaryReader reader = new BinaryReader(stream))
                {
                    // Read and return the decimal from the
                    // BinaryReader/MemoryStream.
                    return reader.ReadDecimal();
                }
            }
        }

        public static void Main()


