1 Introduction
Resource Planning (ERP) systems were rather dry with no intersections to modern
technologies as used by Google, Twitter, Facebook, and several others.
The team decided to take a radically new approach to ERP systems. To start
from scratch, the enabling technologies and possibilities of upcoming
computer systems first had to be identified. With this foundation, they designed a
completely new system based on two major trends in hardware technologies:
• Massively parallel systems with an increasing number of Central Processing
Units (CPUs) and CPU-cores
• Increasing main memory volumes
To leverage the parallelism of modern hardware, substantial changes had to be
made. Current systems were already parallel with respect to their ability to handle
thousands of concurrent users. However, the underlying applications were not
exploiting parallelism.
Exploiting hardware parallelism is difficult. Patterson and Hennessy [PH12] discuss
what changes have to be made to make an application run in parallel, and explain
why it is often very hard to change sequential applications to use multiple cores
efficiently.
For the first prototypes, the team decided to look more closely into accounting
systems. In 2006, computers were not yet capable of keeping big companies’ data
completely in memory. So, the decision was made to concentrate on rather small
companies in the first place. It was clear that the progress in hardware development
would continue and that the advances would automatically enable the systems to
keep bigger volumes of data in memory.
Another important design decision was the complete removal of materialized
aggregates. In 2006, ERP systems depended heavily on pre-computed
aggregates. With the computing power of upcoming systems, the new design was
not only capable of increasing the granularity of aggregates, but of completely
removing them.
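The contrast between a materialized aggregate and an aggregate computed on demand can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not the design of the system described here; the `LineItem` record, the class names, and the account names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    """One accounting line item (hypothetical structure for illustration)."""
    account: str
    amount_cents: int


class MaterializedTotals:
    """Classic approach: a pre-computed total per account, updated on every insert."""

    def __init__(self):
        self.items = []
        self.totals = defaultdict(int)

    def insert(self, item):
        self.items.append(item)
        # Extra write on every insert to keep the stored aggregate in sync.
        self.totals[item.account] += item.amount_cents

    def total(self, account):
        return self.totals[account]


class OnTheFlyTotals:
    """Aggregate-free approach: store only line items and scan them when asked."""

    def __init__(self):
        self.items = []

    def insert(self, item):
        self.items.append(item)

    def total(self, account):
        # The aggregate is derived from the line items at query time.
        return sum(i.amount_cents for i in self.items if i.account == account)
```

Both classes answer the same question. The second one trades a scan at query time for simpler inserts and the freedom to aggregate along any dimension, which is the trade-off that fast in-memory scans are meant to make affordable.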
As the new system keeps every bit of the processed information in memory,
disks are only used for archiving, backup, and recovery. The primary persistence is
the Dynamic Random Access Memory (DRAM), which is made possible by
increased capacities and data compression.
To evaluate the new approach, several bachelor projects and master projects
implemented new applications using in-memory database technology over the next
several years. Ongoing research focuses on the most promising findings of these
projects as well as completely new approaches to enterprise computing with an
enhanced user experience in mind.
1.3 Learning Map
The learning map (see Fig. 1.1) gives a brief overview of the parts of the
learning material and the respective chapters in these parts. In this graph, you can
easily see what the prerequisites for a chapter are and which contents will follow.
Fig. 1.1 Learning map
1.4 Self Test Questions
1. Rely on Disks
Does an in-memory database still rely on disks?
(a) Yes, because disk is faster than main memory when doing complex
calculations
(b) No, data is kept in main memory only
(c) Yes, because some operations can only be performed on disk
(d) Yes, for archiving, backup, and recovery
References
[BMK09] P.A. Boncz, S. Manegold, M.L. Kersten, Database Architecture Evolution: Mammals
Flourished long before Dinosaurs became Extinct. PVLDB 2(2), 1648–1653 (2009)
[KNF+12] A. Kemper, T. Neumann, F. Funke, V. Leis, H. Mühe, Hyper: adapting columnar
main-memory data management for transactional and query processing. IEEE Data
Eng. Bull. 35(1), 46–51 (2012)
[PH12] D.A. Patterson, J.L. Hennessy, Computer Organization and Design: The Hardware/Software
Interface, Revised 4th edn. The Morgan Kaufmann Series in Computer
Architecture and Design (Academic Press, San Francisco, CA, USA, 2012)
[Pla09] H. Plattner, A common database approach for OLTP and OLAP using an in-memory
column database, in SIGMOD Conference, ed. by U. Çetintemel, S. Zdonik, D. Kossmann
(ACM, New York, 2009), pp. 1–2
Part I
The Future of Enterprise Computing
Chapter 2
New Requirements for Enterprise Computing
When thinking about developing a completely new database management system
for enterprise computing, the question arises whether such a system is actually
needed. The answer is yes: modern companies have
changed dramatically. Nowadays companies are more data-driven than ever
before. For example, during manufacturing a much larger amount of data is
produced, e.g. by assembly line sensors or manufacturing robots. Furthermore,
companies process data on a much larger scale, e.g. competitor behavior and price
trends, to support management decisions. Data volumes will continue to
grow in the future. There are two major requirements for a modern database
management system:
• Data from various sources has to be combined in a single database management system, and
• This data has to be analyzed in real time to support interactive decision making.
The following sections outline use cases for modern enterprises and derive
associated requirements for a completely new enterprise data management system.
2.1 Processing of Event Data
Event data influences enterprises today more and more. Event data is characterized
by the following aspects:
• Each event dataset itself is small (some bytes or kilobytes) compared to the size
of traditional enterprise data, such as all data contained in a single sales order,
and
• The number of generated events for a specific entity is high compared to the
number of entities, e.g. hundreds or thousands of events are generated for a single
product.
In the following, use cases of event data in modern enterprises are outlined.
H. Plattner, A Course in In-Memory Data Management,
DOI: 10.1007/978-3-642-36524-9_2, © Springer-Verlag Berlin Heidelberg 2013
2.1.1 Sensor Data
Sensors are used to supervise the function of more and more systems today. One
example is the tracking and tracing of sensitive goods, such as pharmaceuticals,
clothes, or spare parts. Here, packages are equipped with Radio-Frequency
Identification (RFID) tags or two-dimensional bar codes, the so-called data matrix.
Each product is virtually represented by an Electronic Product Code (EPC), which
describes the manufacturer of a product, the product category, and a unique serial
number. As a result, each product can be uniquely identified by its EPC. In
contrast, traditional one-dimensional bar codes can only be used for identification
of classes of products due to their limited domain set. Once a product passes
through a reader gate, a reading event is captured. The reading event consists of
the current reading location, timestamp, the current business step, e.g. receiving,
unpacking, repacking or shipping, and further related details. All events are stored
in decentralized event repositories.
Real-Time Tracking of Pharmaceuticals
For example, approx. 15 billion prescription-based pharmaceuticals are produced
in Europe. Tracking each of them results in approx. 8,000 read event notifications
per second. These events build the basis for anti-counterfeiting techniques. For
example, the route of a specific pharmaceutical can be reconstructed by analyzing
all relevant reading events. In-memory technology enables tracing of 10
billion events in less than 100 ms.
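Reconstructing the route of a single item from its reading events can be sketched as a filter followed by a sort on the timestamp. The snippet below is a minimal illustration in Python; the `ReadEvent` fields follow the description of a reading event given above, while the EPC strings, locations, and the function name `reconstruct_route` are made up for the example.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ReadEvent:
    epc: str             # Electronic Product Code of the item that was read
    location: str        # location of the reader gate
    timestamp: datetime  # time of the read
    business_step: str   # e.g. "receiving", "unpacking", "repacking", "shipping"


def reconstruct_route(events, epc):
    """Return the chronologically ordered (location, business_step) pairs for one EPC."""
    relevant = [e for e in events if e.epc == epc]
    relevant.sort(key=lambda e: e.timestamp)
    return [(e.location, e.business_step) for e in relevant]


# Hypothetical events, deliberately stored out of order.
events = [
    ReadEvent("epc-0001", "wholesaler", datetime(2013, 1, 3, 9, 0), "receiving"),
    ReadEvent("epc-0002", "factory", datetime(2013, 1, 1, 8, 5), "shipping"),
    ReadEvent("epc-0001", "factory", datetime(2013, 1, 1, 8, 0), "shipping"),
    ReadEvent("epc-0001", "pharmacy", datetime(2013, 1, 5, 14, 30), "receiving"),
]

route = reconstruct_route(events, "epc-0001")
```

A gap or an unexpected location in such a route is the kind of signal an anti-counterfeiting check could look for. A real system would evaluate this as a scan over the event store rather than over a Python list.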
Formula One Racing Cars
Formula One racing cars also generate large volumes of sensor data. These sports cars
are equipped with up to 600 individual sensors, each recording tens to hundreds of
events per second. Capturing sensor data for a two-hour race produces gigabytes or even
terabytes of sensor data, depending on its granularity. The challenge is to capture,
process, and analyze the acquired data during the race to optimize the car parameters instantly, e.g. to detect part faults or to optimize fuel consumption or top speed.
2.1.2 Analysis of Game Events
Personalized content in online games is a success factor for the gaming industry.
The German company Bigpoint is a provider of browser games with more than 200
million active users.1 Their browser games generate a steady stream of more than
1 Bigpoint GmbH, http://www.bigpoint.net/