2.6 Production and Distribution Planning
2.6.1 Production Planning
Production planning identifies the current demand for certain products and adjusts the production rate accordingly. It analyzes several indicators, such as the users' historic buying behavior, upcoming promotions, and stock levels at manufacturers and wholesalers. Production planning algorithms are computationally complex; the required calculations are comparable to those found in BI systems. With an in-memory database, these calculations can be performed directly on the latest transactional data. As a result, the algorithms are more accurate with respect to current stock levels or production issues, allowing faster reactions to unexpected incidents.
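As a rough illustration, the following Python sketch derives a production rate from a demand forecast and current stock. The function name, parameters, and the simple gap-closing heuristic are assumptions made for this example, not part of any specific planning system.

# Minimal sketch of demand-driven production rate adjustment.
# All names and the heuristic are illustrative assumptions.
def plan_production_rate(forecast_demand: float,
                         stock_level: float,
                         target_stock: float,
                         periods: int = 1) -> float:
    """Units to produce per period so that stock converges to the target."""
    # Cover forecast demand and close the gap to the target stock,
    # spread over the planning horizon; never plan a negative rate.
    stock_gap = target_stock - stock_level
    return max(0.0, forecast_demand + stock_gap / periods)

# Example: demand of 1,000 units, 200 in stock, target buffer of 500,
# planned over three periods.
print(plan_production_rate(1000, 200, 500, periods=3))  # 1100.0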
2.6.2 Available to Promise Check
The Available-to-Promise (ATP) check validates the availability of certain goods. It analyzes whether the amounts of sold and manufactured goods are in balance. With rising numbers of products and sold goods, the complexity of the check increases. In certain situations it can be advantageous to withdraw goods already promised to certain customers and reassign them to customers with a higher priority. ATP checks can also take additional data into account, e.g., fees for delayed or canceled deliveries, or the costs of express delivery if the manufacturer is not able to send out all goods in time.
Due to their long processing time, ATP checks have traditionally been executed on top of pre-aggregated totals, e.g., stock level aggregates per day. Using in-memory databases enables ATP checks to be performed on the latest data without pre-aggregated totals. Thus, manufacturing and rescheduling decisions can be taken on real-time data. Furthermore, removing aggregates simplifies the overall system architecture significantly, while adding flexibility.
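A minimal Python sketch of this idea, assuming a toy list of raw stock movements (the schema and all names are invented for illustration and do not reflect a particular vendor's implementation):

# ATP check computed directly on raw stock movements instead of
# pre-aggregated daily totals; the data layout is an assumption.
from datetime import date

# Each movement: (product, date, quantity); positive quantities are
# production/receipts, negative ones are goods already promised.
movements = [
    ("bolt-m8", date(2013, 5, 1),  500),
    ("bolt-m8", date(2013, 5, 2), -200),
    ("bolt-m8", date(2013, 5, 3), -150),
]

def available_to_promise(product: str, requested_qty: int) -> bool:
    """Check whether the requested quantity can still be promised."""
    # Aggregation happens on demand over the raw events, so the check
    # always reflects the latest transactional data.
    available = sum(qty for prod, _, qty in movements if prod == product)
    return available >= requested_qty

print(available_to_promise("bolt-m8", 100))  # True, 150 units remain
print(available_to_promise("bolt-m8", 200))  # False

Because aggregation is computed on demand, a rescheduling decision made a second ago is already visible to the next check.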
2.7 Self Test Questions
1. Compression Factor
What is the average compression factor for accounting data in an in-memory
column-oriented database?
(a) 100x
(b) 10x
(c) 50x
(d) 5x
2. Data Explosion
Consider the Formula 1 race car tracking example: each race car has 512 sensors, and each sensor records 32 events per second, where each event is 64 bytes in size. How much data is produced by an F1 team if the team has two cars in the race and the race takes 2 h? For easier calculation, assume 1,000 bytes = 1 kB, 1,000 kB = 1 MB, and 1,000 MB = 1 GB.
(a) 14 GB
(b) 15.1 GB
(c) 32 GB
(d) 7.7 GB
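The arithmetic behind this question can be checked by multiplying out the given numbers with the simplified unit conversions, for example in Python:

# Verify the data-explosion arithmetic with the simplified units
# from the question (1,000 bytes = 1 kB, and so on).
cars = 2
sensors_per_car = 512
events_per_second = 32
bytes_per_event = 64
seconds = 2 * 60 * 60  # a 2 h race

total_bytes = (cars * sensors_per_car * events_per_second
               * bytes_per_event * seconds)
print(total_bytes / 1_000_000_000)  # ~15.1 GB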
Chapter 3
Enterprise Application Characteristics
3.1 Diverse Applications
An enterprise data management system should be able to handle data coming from several different source types.
• Transactional data comes from different applications, e.g., Enterprise Resource Planning (ERP) systems.
• The sources for event processing and stream data are machines and sensors, typically high-volume systems.
• Real-time analytics usually leverages structured data for transactional reporting, classical analytics, planning, and simulation.
• Finally, text analytics is typically based on unstructured data coming from the web, social networks, log files, support systems, etc.
3.2 OLTP Versus OLAP
An enterprise data management system should be able to handle transactional and analytical query types, which differ in several dimensions. Typical queries for Online Transaction Processing (OLTP) are the creation of sales orders, invoices, or accounting data, the display of a sales order for a single customer, or the display of customer master data. Online Analytical Processing (OLAP) consists of analytical queries. Typical OLAP-style queries are dunning (payment reminders), cross-selling (selling additional products or services to a customer), operational reporting, or analyzing history-based trends.
Because these query types have long been considered significantly different, it was argued that the data management system should be split into two separate systems handling OLTP and OLAP queries separately. In the literature, it is claimed that OLTP workloads are write-intensive, whereas OLAP workloads are read-only, and that the two workloads rely on "Opposing Laws of Database Physics" [Fre95].
Yet, research in current enterprise systems has shown that this statement is not true [KGZP10, KKG+11]. The main difference between systems handling these query types is that OLTP systems process many single-select or highly selective queries returning only a few tuples, whereas OLAP systems calculate aggregations over only a few columns of a table, but across a large number of tuples.
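The contrast can be made concrete with two SQL statements against the same hypothetical sales table. The sketch below uses Python's built-in sqlite3 module only to keep the example self-contained and runnable; the table and its contents are invented.

# OLTP-style versus OLAP-style access on one hypothetical table.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (id INTEGER, customer TEXT, "
            "product TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "ACME", "bolt", 120.0),
    (2, "ACME", "nut", 80.0),
    (3, "Globex", "bolt", 200.0),
])

# OLTP-style: highly selective, returns a few complete tuples.
print(con.execute("SELECT * FROM sales WHERE id = 2").fetchall())

# OLAP-style: aggregates a single column across many tuples.
print(con.execute("SELECT product, SUM(amount) FROM sales "
                  "GROUP BY product").fetchall())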
For the synchronization of the analytical system with the transactional system(s), a cost-intensive ETL (Extract-Transform-Load) process is required. The ETL process is time-consuming and relatively complex: all changes have to be extracted from the outside source or sources if there are several, the data is transformed to fit analytical needs, and it is loaded into the target database.
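As a rough sketch of these three steps, the following Python fragment mimics a toy ETL run; the record layout, currency rates, and the transformation rule are invented for illustration.

# Toy illustration of the Extract, Transform, and Load steps.

# Extract: changed rows pulled from the transactional source(s).
changed_rows = [
    {"order_id": 17, "amount": "120.50", "currency": "EUR"},
    {"order_id": 18, "amount": "99.00",  "currency": "USD"},
]

def transform(row: dict) -> tuple:
    # Transform: reshape to the analytical schema, e.g., cast amounts
    # and normalize to a reporting currency (fixed rates assumed).
    rate = {"EUR": 1.0, "USD": 0.9}[row["currency"]]
    return (row["order_id"], round(float(row["amount"]) * rate, 2))

# Load: insert the transformed tuples into the analytical target.
warehouse_facts = [transform(r) for r in changed_rows]
print(warehouse_facts)  # [(17, 120.5), (18, 89.1)]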
3.3 Drawbacks of the Separation of OLAP from OLTP
While the separation of the database into two systems allows for specific workload
optimizations in both systems, it also has a number of drawbacks:
• The OLAP system does not have the latest data, because the latency between the systems can range from minutes to hours, or even days. Consequently, many decisions have to rely on stale data instead of the latest information.
• To achieve acceptable performance, OLAP systems work with predefined, materialized aggregates, which reduce the query flexibility of the user.
• Data redundancy is high. Similar information is stored in both systems, just differently optimized.
• The schemas of the OLTP and OLAP systems are different, which introduces complexity for applications using both of them and for the ETL process synchronizing data between the systems.
3.4 The OLTP Versus OLAP Access Pattern Myth
The workload analysis of multiple real customer systems reveals that OLTP and OLAP systems are not as different as expected. For OLTP systems, the lookup rate is only 10 % higher than for OLAP systems. The number of inserts is slightly higher on the OLTP side; however, OLAP systems also face inserts, as they have to update their data permanently. A further observation is that the number of updates in OLTP systems is not very high [KKG+11]. In high-tech companies it is about 12 %, which means that about 88 % of all tuples saved in the transactional database are never updated. In other industry sectors, research showed even lower update rates, e.g., less than 1 % in banking and discrete manufacturing [KKG+11].
This fact leads to the assumption that updating as such, or alternatively deleting the old tuple, inserting the new one, and keeping track of changes in a "side note" as is done in current systems, is no longer necessary. Instead, changed or deleted tuples can be inserted with corresponding time stamps or invalidation flags. An additional benefit of this insert-only approach is that the complete transactional data history and each tuple's life cycle are saved in the database automatically. More details about the insert-only approach are provided in Chap. 26.
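A minimal Python sketch of the insert-only idea, where a monotonic version counter stands in for a commit time stamp (the row layout and all names are invented for illustration):

# Insert-only versioning: rows are only ever appended; logical updates
# and deletes become new rows with a version and an invalidation flag.
from itertools import count

_version = count()
history = []  # the table

def upsert(key, value, deleted=False):
    history.append({"key": key, "value": value,
                    "version": next(_version), "deleted": deleted})

def current_value(key):
    """Latest visible version of a tuple; older versions stay queryable."""
    versions = [r for r in history if r["key"] == key]
    if not versions:
        return None
    latest = max(versions, key=lambda r: r["version"])
    return None if latest["deleted"] else latest["value"]

upsert("order-42", "open")
upsert("order-42", "shipped")           # logical update = new version
upsert("order-42", None, deleted=True)  # logical delete = invalidation row
print(current_value("order-42"))        # None; the full history remains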
The further observation that the workloads are not that different after all leads to the vision of reuniting the two systems and combining OLTP and OLAP data in one system.
3.5 Combining OLTP and OLAP Data
The main benefit of the combination is that both transactional and analytical queries can be executed on the same machine using the same set of data as a "single source of truth", and ETL processing becomes obsolete. Using modern hardware, pre-computed aggregates and materialized views can be eliminated, as data aggregation can be executed on demand and views can be provided virtually. With expected response times of analytical queries below one second, it is possible to perform analytical query processing on the transactional data directly, anytime and anywhere. By dropping the pre-computation of aggregates and the materialization of views, applications and data structures can be simplified, as the management of aggregates and views (building, maintaining, and storing them) is no longer necessary.
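The shift from maintained aggregates to on-demand aggregation can be sketched as follows, with sqlite3 again serving only as a stand-in for an in-memory column store and the table invented for the example:

# On-demand aggregation instead of a maintained totals table: there is
# no "customer_totals" aggregate to build, refresh, and keep in sync.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE line_items (customer TEXT, amount REAL)")
con.executemany("INSERT INTO line_items VALUES (?, ?)",
                [("ACME", 10.0), ("ACME", 5.0), ("Globex", 7.5)])

# The total is computed from the base data at query time, so it always
# reflects the latest inserts.
for row in con.execute("SELECT customer, SUM(amount) "
                       "FROM line_items GROUP BY customer"):
    print(row)  # ('ACME', 15.0) and ('Globex', 7.5)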
A mixed workload combines the characteristics of OLAP and OLTP workloads. The queries in the workload can involve full row operations or retrieve only a small number of columns. Queries can be simple or complex, pre-determined or ad hoc. This includes analytical queries that now run on the latest transactional data and are able to see real-time changes.
3.6 Enterprise Data Characteristics
By analyzing enterprise data, special data characteristics were identified. Most interestingly, many attributes of a table are not used at all, even though tables can be very wide: on average, 55 % of columns per company are unused, and tables with up to hundreds of columns exist. Many of the columns that are used have a low cardinality of values, i.e., there are very few distinct values. Further, in many columns NULL or default values are dominant, so the entropy (information content) of these columns is very low (near zero).
These characteristics facilitate the efficient use of compression techniques, resulting in lower memory consumption and better query performance, as will be seen in later chapters.
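To see why low cardinality helps, consider dictionary encoding, a common column-store compression technique; the simplified Python sketch below replaces string values with small integer codes.

# Simplified dictionary encoding of a low-cardinality column; real
# column stores add bit-packing and further compression on top.
column = ["Germany", "Germany", "USA", "Germany", "USA", "Germany"]

# Build a dictionary of distinct values and replace each value by its
# small integer code.
dictionary = sorted(set(column))            # ['Germany', 'USA']
code_of = {v: i for i, v in enumerate(dictionary)}
encoded = [code_of[v] for v in column]      # [0, 0, 1, 0, 1, 0]

print(dictionary, encoded)
# With only two distinct values, each code needs just one bit instead
# of a multi-byte string, and scans compare integers, not strings.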