8.2 Row and Columnar Layouts
Fig. 8.4 Cycles and cache misses for cache accesses with increasing working sets. (a) Sequential
Access. (b) Random Access
[Plot omitted: CPU cycles per element (logarithmic axis, 1.0 to 1000.0) over the size of the
accessed area in bytes (16 KB to 256 MB), for sequential and random access, with markers
[L1], [L2] and [L3].]
are stored as strings directly in memory and that we do not need to store any
additional data. As an example, let us look at the simple world population example:
Id   Name          Country     City
1    Paul Smith    Australia   Sydney
2    Lena Jones    USA         Washington
3    Marc Winter   Germany     Berlin
As discussed above, the database must transform its two-dimensional table into a
one-dimensional series of bytes for the operating system to write them to memory.
The classical and obvious approach is a row- or record-based layout. In this case,
all attributes of a tuple are stored consecutively and sequentially in memory. In
other words, the data is stored tuple-wise. Considering our example table, the data
would be stored as follows: “1, Paul Smith, Australia, Sydney; 2, Lena
Jones, USA, Washington; 3, Marc Winter, Germany, Berlin”.
8 Data Layout in Main Memory
Fig. 8.5 Illustration of memory accesses for row-based and column-based operations on row and
columnar data layouts.
In contrast, in a columnar layout, the values of one column are stored
together, column by column. The resulting layout in memory for our example
would be: “1, 2, 3; Paul Smith, Lena Jones, Marc Winter; Australia, USA,
Germany; Sydney, Washington, Berlin”.
The columnar layout is especially effective for set-based reads. In other words, it
is useful for operations that work on many rows but only on a notably smaller subset
of all columns, as the values of one column can be read sequentially, e.g. when
performing aggregate calculations. However, for operations on single tuples or for
inserting new rows, a row-based layout is beneficial, because all attributes of a tuple
are stored consecutively in memory. The different access
patterns for row-based and column-based operations are illustrated in Fig. 8.5.
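The two layouts and their access patterns can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a Python list stands in for the one-dimensional memory region, every value occupies exactly one slot, and the helper names (to_row_layout, to_column_layout, column_positions, tuple_positions) are hypothetical and not taken from any database system.

```python
# Example table from the text: (Id, Name, Country, City).
rows = [
    (1, "Paul Smith", "Australia", "Sydney"),
    (2, "Lena Jones", "USA", "Washington"),
    (3, "Marc Winter", "Germany", "Berlin"),
]
n_rows, n_cols = len(rows), len(rows[0])


def to_row_layout(table):
    """Flatten tuple by tuple: all attributes of one row are adjacent."""
    return [value for row in table for value in row]


def to_column_layout(table):
    """Flatten column by column: all values of one attribute are adjacent."""
    return [value for column in zip(*table) for value in column]


def column_positions(layout_kind, col):
    """Slots touched when scanning one column (a set-based read)."""
    if layout_kind == "row":
        return [r * n_cols + col for r in range(n_rows)]  # strided
    return [col * n_rows + r for r in range(n_rows)]      # contiguous


def tuple_positions(layout_kind, row):
    """Slots touched when reading one complete tuple."""
    if layout_kind == "row":
        return [row * n_cols + c for c in range(n_cols)]  # contiguous
    return [c * n_rows + row for c in range(n_cols)]      # strided


row_layout = to_row_layout(rows)
column_layout = to_column_layout(rows)

# Scanning the Country column (index 2) in both layouts.
print(column_positions("row", 2))     # slots spread over the whole region
print(column_positions("column", 2))  # one contiguous block
```

Scanning a column touches neighbouring slots only in the columnar layout, while reading a whole tuple touches neighbouring slots only in the row layout, which mirrors the trade-off described above.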
Currently, row-oriented architectures are widely used for OLTP workloads
while column stores are widely utilized in OLAP scenarios like data warehousing,
which typically involve a smaller number of highly complex queries over the
complete data set.
8.3 Benefits of a Columnar Layout
As mentioned above, there are use cases where a row-based table layout can be
more efficient. Nevertheless, many advantages speak in favor of the usage of a
columnar layout in an enterprise scenario.
First, when analyzing the workloads enterprise databases are facing, it turns out
that the actual workloads are more read-oriented and dominated by set
processing [KKG+11].
Second, despite the fact that hardware technology develops very rapidly and the
size of available main memory constantly grows, the use of efficient compression
techniques is still important in order to (a) keep as much data in main memory as
possible and to (b) minimize the amount of data that has to be read from memory
to process queries as well as the data transfer between non-volatile storage
media and main memory.
Using column-based table layouts enables the use of efficient compression
techniques leveraging the high data locality in columns (see Chap. 7). These
techniques mainly exploit the similarity of the data stored in a column. Dictionary
encoding can be applied to row-based as well as column-based table layouts, whereas
other techniques like prefix encoding, run-length encoding, cluster encoding or indirect
encoding directly leverage the benefits of columnar table layouts.
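As a rough illustration of why similar values stored next to each other compress well, the following Python sketch applies dictionary encoding and then run-length encoding to a single column. The column contents are made up for this example, the function names are hypothetical, and the sketch ignores bit-level packing of the value IDs.

```python
def dictionary_encode(column):
    """Replace each value by an integer ID into a sorted dictionary."""
    dictionary = sorted(set(column))
    value_to_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_to_id[value] for value in column]


def run_length_encode(ids):
    """Collapse runs of equal IDs into [id, run_length] pairs."""
    runs = []
    for value_id in ids:
        if runs and runs[-1][0] == value_id:
            runs[-1][1] += 1
        else:
            runs.append([value_id, 1])
    return runs


def run_length_decode(runs):
    """Expand [id, run_length] pairs back into the full ID sequence."""
    return [value_id for value_id, length in runs for _ in range(length)]


# Hypothetical Country column with repeated, clustered values.
country = ["Germany", "Germany", "Germany", "USA", "USA", "Australia"]

dictionary, ids = dictionary_encode(country)
runs = run_length_encode(ids)

print(dictionary)  # distinct values, stored once
print(ids)         # one small integer per row
print(runs)        # one pair per run of equal values

# Decoding restores the original column.
restored = [dictionary[i] for i in run_length_decode(runs)]
```

Run-length encoding only pays off here because equal values of the same attribute sit next to each other; in a row layout the IDs of one column would be interleaved with the other attributes of each tuple.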
Third, using columnar table layouts enables very fast column scans as they can
sequentially scan the memory, allowing, e.g., on-the-fly calculations of aggregates.
Consequently, storing pre-calculated aggregates in the database can be avoided,
thus minimizing redundancy and complexity of the database.
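A minimal sketch of such an on-the-fly aggregation is shown below. It assumes two hypothetical columns that do not appear in the example table: a dictionary-encoded country_ids column and a numeric amounts column, both held in contiguous arrays from Python's standard array module. The function name sum_by_group is likewise only illustrative.

```python
from array import array

# Hypothetical columns, each stored contiguously.
country_ids = array("i", [0, 1, 1, 2, 0, 1])           # encoded group key
amounts = array("d", [10.0, 20.0, 5.0, 7.5, 2.5, 1.0])  # values to sum


def sum_by_group(group_ids, values, n_groups):
    """Compute per-group sums in one sequential pass over two columns."""
    totals = [0.0] * n_groups
    for group_id, value in zip(group_ids, values):
        totals[group_id] += value
    return totals


# The aggregate is computed at query time; nothing is pre-calculated or stored.
totals = sum_by_group(country_ids, amounts, n_groups=3)
print(totals)
```

Only the two columns involved in the query are read, and both are read front to back, so no separately maintained aggregate table is needed for this query.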
8.4 Hybrid Table Layouts
As stated above, set processing operations are dominating enterprise workloads.
Nevertheless, each concrete workload is different and might favor a row-based or a
column-based layout. Hybrid table layouts combine the advantages of both worlds,
allowing single attributes of a table to be stored column-oriented while other
attributes are grouped into a row-based layout [GKP+11]. The optimal combination
depends heavily on the actual workload and can be calculated by layout algorithms.
As an illustrative example, consider attributes that inherently belong
together in commercial applications, e.g. quantity and measuring unit or payment
conditions in accounting. The idea of the hybrid layout is that if a set of attributes
is processed together, it makes sense from a performance point of view to physically
store them together. Considering the example table provided in Sect. 8.2 and
assuming that the attributes Id and Name are often processed together, we
can outline the following hybrid data layout for the table: “1, Paul Smith; 2,
Lena Jones; 3, Marc Winter; Australia, USA, Germany; Sydney,
Washington, Berlin”. This hybrid layout may decrease the number of cache
misses caused by the expected workload, resulting in increased performance.
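The hybrid layout from this example can be reproduced with a small Python sketch. It assumes that a layout is described by a list of attribute groups, where each group is stored contiguously and row by row within the group. The function to_hybrid_layout and its parameters are hypothetical and are not meant to reflect the storage engine described in [GKP+11].

```python
columns = ["Id", "Name", "Country", "City"]
rows = [
    (1, "Paul Smith", "Australia", "Sydney"),
    (2, "Lena Jones", "USA", "Washington"),
    (3, "Marc Winter", "Germany", "Berlin"),
]


def to_hybrid_layout(table, column_names, groups):
    """Store each attribute group contiguously, row by row within a group."""
    index = {name: i for i, name in enumerate(column_names)}
    layout = []
    for group in groups:
        positions = [index[name] for name in group]
        for row in table:
            layout.extend(row[p] for p in positions)
    return layout


# Id and Name are assumed to be processed together, as in the text.
hybrid = to_hybrid_layout(rows, columns, [("Id", "Name"), ("Country",), ("City",)])
print(hybrid)

# The two pure layouts are the extreme cases of the same description.
row_layout = to_hybrid_layout(rows, columns, [tuple(columns)])
column_layout = to_hybrid_layout(rows, columns, [(name,) for name in columns])
```

Describing a layout as a list of groups makes the design space explicit: one group containing all attributes yields the row layout, one group per attribute yields the columnar layout, and everything in between is a hybrid.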
The usage of hybrid layouts can be beneficial but also introduces new questions,
such as how to find the optimal layout for a given workload or how to react to a
changing workload.
8.5 Self Test Questions
1. If DRAM can be accessed randomly at the same cost, why are consecutive accesses
usually faster than strided accesses?
(a) With consecutive memory locations, the probability that the next requested
location has already been loaded in the cache line is higher than with
randomized/strided access. Furthermore, the memory page for consecutive
accesses is probably already in the TLB.
(b) The bigger the size of the stride, the higher the probability that two values
are both in one cache line.
(c) Loading consecutive locations is not faster, since the CPU performs better
on prefetching random locations than on prefetching consecutive locations.
(d) With modern CPU technologies like TLBs, caches and prefetching, all
three access methods expose the same performance.
References
[BCR10] T.W. Barr, A.L. Cox, S. Rixner, Translation caching: skip, don’t walk (the Page
Table). ACM SIGARCH Comput. Arch. News 38(3), 48–59 (2010)
[BT09] V. Babka, P. Tuma, Investigating cache parameters of x86 family processors.
Comput. Perform. Eval. Benchmarking, 77–96 (2009)
[GKP+11] M. Grund, J. Krueger, H. Plattner, A. Zeier, S. Madden, P. Cudre-Mauroux, HYRISE:
A hybrid main memory storage engine, in VLDB (2011)
[KKG+11] J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P.
Dubey, A. Zeier, Fast updates on read-optimized databases using multi-core CPUs, in
PVLDB (2011)
[SKP12] D. Schwalb, J. Krueger, H. Plattner, Cache conscious column organization in
in-memory column stores. Technical Report 60, Hasso-Plattner-Institute (2012)
[SS95] R.H. Saavedra, A.J. Smith, Measuring cache and TLB performance and their effect on
benchmark runtimes. IEEE Trans. Comput. 44(10), 1223–1235 (1995)