The app should let users upload pictures privately, or publicly for all other users to see. Users should see
thumbnails of the pictures they’re allowed to see, together with metadata, such as a title and a
description.
Communication between the client application and the back-end functionality is based on APIs.
You’ll use JavaScript to build the client application so that it can run in a web page on a desktop
or a mobile device. If you’re a mobile developer, it will be easy to implement the same client for
different devices; for example, using AWS Mobile Hub to kick-start a native app, as described
in chapter 7.
Note
This example uses both client-side (running in the browser) and server-side (running in Lambda
functions) code. Because the code running in the browser is JavaScript, the Lambda function
examples are also provided in JavaScript. The implementation of those functions in Python is
left as an exercise for you to do on your own, because it doesn’t change the architecture or the
logic of the application.
For this type of application, you expect uploads of new content to be far less frequent than the
number of times that content is accessed by users. For efficiency, you can create a static index of
the public and private pictures users can see, so that you can use this index to display the
pictures, instead of running queries on a database. For example, the index can be a file
containing all the information you need to display the pictures in the client application. You can
use any structured format for that, such as XML or YAML. I use JSON because it’s natively
supported in JavaScript.
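As a rough illustration of such an index, the following sketch serializes a list of picture entries into a JSON document that the client can load with a single request. The field names and the buildIndex helper are assumptions made for this example, not a format prescribed by the application.

```javascript
// Hypothetical helper: turns picture entries into the text of a static index file.
function buildIndex(entries) {
  // Newest first, so the client can render the list without sorting it again.
  // ISO 8601 timestamps in the same format compare correctly as plain strings.
  const sorted = [...entries].sort((a, b) =>
    a.uploadDate < b.uploadDate ? 1 : a.uploadDate > b.uploadDate ? -1 : 0
  );
  return JSON.stringify({ items: sorted }, null, 2);
}

// Made-up sample entries.
const indexText = buildIndex([
  { objectKey: 'a.jpg', thumbnailKey: 'a-thumb.jpg', title: 'Sunset',
    description: 'Evening light', uploadDate: '2016-05-01T10:00:00.000Z' },
  { objectKey: 'b.jpg', thumbnailKey: 'b-thumb.jpg', title: 'Harbor',
    description: 'Boats at rest', uploadDate: '2016-05-02T09:30:00.000Z' },
]);

// The client only needs JSON.parse to use the index.
const index = JSON.parse(indexText);
console.log(index.items.map((item) => item.title));
```

Because the file is plain JSON, the browser can use it as soon as it's downloaded, without any query to the database.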
Tip
Caching is an important optimization that can improve the scalability of your applications and
reduce the latency that users experience. Caching is a common architectural pattern in software
and hardware implementations: multiple caches exist inside the CPUs we normally use, caches
are in the databases (for example, for common retrieved data), caches are in the network stacks
(for example, for DNS results), and so on. You should always consider what data you can safely
cache, and for how long, when building your applications.
11.1.1. Simplifying the implementation
When implementing a new application, you should look at any service or feature that you can
use so that you can reduce the development footprint and the time to market of your solution.
This is even more important if you take a “lean” approach, as suggested, for example, in the
book The Lean Startup by Eric Ries (Crown Business, 2011). According to this practice, you
want to quickly release a Minimum Viable Product (MVP) to your users, an early
implementation with enough features to validate (or learn and change) your product and
business model, and then rapidly iterate with new features on top of that.
Let’s use a lean approach here and replace functionalities you need with services that can
provide them, making a few implementation decisions that will simplify the architecture to
build:
• For the repository of pictures, thumbnails, and index files, you can use Amazon S3. That
  way, you can directly upload or download pictures or index files using the S3 API, and
  you don’t need to implement that yourself.
• For metadata, such as the title or description or to store the links to the pictures and their
  thumbnails, you can use Amazon DynamoDB, a NoSQL database service. Again, you can
  use the DynamoDB API to read or update content metadata without implementing those
  APIs from scratch.
• To react to changes in the files repository (Amazon S3) or the database (Amazon
  DynamoDB), you can use Lambda functions triggered by events.
Following these decisions, the architecture in figure 11.1 can be mapped to a technical
implementation to become what you see in figure 11.2. Of the eight Lambda functions that we
needed to implement in our initial assessment, only three are left in this simplified architecture.
The other five functions are replaced by direct usage of S3 and DynamoDB APIs.
Figure 11.2. Mapping the media-sharing app to a technical implementation using Amazon S3, Amazon DynamoDB, and AWS Lambda.
The development is much simpler than the implementation in figure 11.1 because most functions are directly implemented by the
AWS services themselves, such as picture upload and download (using the S3 API) and metadata read and update (using the
DynamoDB API).
One question that should be on your mind is whether you’re sure you can replace all those
functions by natively using Amazon S3 or Amazon DynamoDB. To answer that, you need to
check whether the functions and the security features satisfy the requirements of your
implementation. You’ll see that as we build the app.
To authenticate and authorize the client application to use AWS APIs, you can use Amazon
Cognito (figure 11.3). As you learned in chapter 6, AWS IAM roles allow fine-grained control of
access to AWS resources; for example, using policy variables to limit client access to the S3
bucket and the DynamoDB table. You’ll use those features again in this implementation.
Figure 11.3. Using Amazon Cognito, you can give secure and finely controlled access to AWS resources, such as the S3 bucket and the
DynamoDB table, allowing the client application to directly use S3 and DynamoDB APIs.
You can now map the building blocks shown in figure 11.3 to the new, ready-to-use
implementation domains that they’re part of, such as Amazon S3, Amazon DynamoDB, or AWS
Lambda (figure 11.4). Let’s see the new simplified architecture in more detail.
Figure 11.4. Using the new mapping, the building blocks of the architecture are mapped to the implementation domains they’re part
of, such as Amazon S3, Amazon DynamoDB, or AWS Lambda.
The front end to the client application is now all based on the S3 API (for manipulating pictures
and files) and DynamoDB API (for the metadata). In particular, you’re using the S3 PUT Object
API to upload or update content and the GET Object API to download it. With Amazon DynamoDB,
you’re using GetItem to retrieve an item by primary key and UpdateItem to update it.
The client doesn’t need direct access to DynamoDB PutItem to create a new item in the
database, because when a new piece of content is uploaded on the S3 bucket,
the extractAndUpdateMetadata Lambda function in the back end will read the custom metadata
from the S3 object and insert the new item with that information in the DynamoDB table.
The buildThumbnails Lambda function will react to the same event (new or updated file) to
create a small thumbnail of the picture that you can use to visualize the content in the client. The
thumbnail is stored in the same S3 bucket, with a different prefix.
Finally, the updateContentIndex Lambda function is triggered by a change in the metadata table
on DynamoDB to keep static index files on the S3 bucket updated with all changes.
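To make the role of custom metadata more concrete, here is a minimal sketch of the parameters a client could pass to an S3 PUT Object call. Bucket, Key, Body, ContentType, and Metadata are parameter names used by the AWS SDK for JavaScript; the buildUploadParams helper, the bucket name, the key, and the title and description metadata names are assumptions for this example.

```javascript
// Hypothetical helper: parameters for an S3 PUT Object call made by the client.
function buildUploadParams(bucket, key, file, info) {
  return {
    Bucket: bucket,
    Key: key,
    Body: file.data,
    ContentType: file.type,
    // User-defined metadata is stored with the object (sent as x-amz-meta-* headers),
    // so a function in the back end can read it when the upload event arrives.
    Metadata: { title: info.title, description: info.description },
  };
}

// Made-up values; in the browser, file.data would be the picture selected by the user.
const params = buildUploadParams(
  'example-media-bucket',
  'example/sunset.jpg',
  { data: 'image-bytes', type: 'image/jpeg' },
  { title: 'Sunset', description: 'Evening light' }
);
console.log(params.Metadata);
```

With this approach, a single upload carries both the picture and the information that the back end needs to create the item in the database.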
Note
The Lambda functions are triggered in the back end by events and don’t need to be directly
accessible by clients. Security-wise, this is good because you’re exposing well-proven AWS APIs
to the clients and not your own custom implementations.
11.1.2. Consolidating functions
When you start designing the architecture of your application, you create different modules for
different functionalities. But when you move into the implementation phase, you may find that
several of those modules (functions, in the case of AWS Lambda) are tied together by the data
they use or by the way they’re used by the application.
The extractAndUpdateMetadata and buildThumbnails Lambda functions are triggered by the
same event (new or updated content in the S3 bucket) and can be tied together directly. For
example, the first function can asynchronously invoke the second before terminating, as shown
in figure 11.5.
Figure 11.5. When two functions are triggered by the same event, you can decide to group them together. For example, one function
can asynchronously invoke the other before terminating, as in the case
of extractAndUpdateMetadata and buildThumbnails here.
Continuing with the implementation of those two functions, both need data from the same S3 object as input:
• buildThumbnails needs the picture file to create the thumbnail.
• extractAndUpdateMetadata needs the object metadata to put that information in the
  database.
Amazon S3 has two operations that you can use to read an object:
• GET Object, to read the whole object (both the file and its metadata).
• HEAD Object, to retrieve only the metadata without the file.
The two functions would need to read the same S3 object twice, once with the GET and once with
the HEAD, but this approach is not optimal at scale, where you can have thousands or even
millions of objects.
In this case, my suggestion is to create a single function that will use the same input to create the
thumbnail and process the metadata (figure 11.6).
Figure 11.6. Grouping extractAndUpdateMetadata and buildThumbnails Lambda functions in a
single contentUpdated function to optimize storage access and read the S3 object only once
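The idea of reading the object only once can be sketched as follows. This is a simplified illustration, not the final implementation: the three dependencies (getObject, buildThumbnail, saveMetadata) are hypothetical and injected as parameters so that the example runs without AWS. In a real function they would wrap calls to Amazon S3, an image library, and Amazon DynamoDB.

```javascript
// Sketch of a combined handler with injected (hypothetical) dependencies.
async function contentUpdated(key, deps) {
  // A single GET returns both the file body and the object metadata.
  const object = await deps.getObject(key);
  // Both tasks work on the data that was already read.
  const [thumbnail, item] = await Promise.all([
    deps.buildThumbnail(object.body),
    deps.saveMetadata(key, object.metadata),
  ]);
  return { thumbnail, item };
}

// In-memory stand-ins that count how many times the object is read.
let reads = 0;
const fakeDeps = {
  getObject: async (key) => {
    reads += 1;
    return { body: 'image-bytes', metadata: { title: 'Sunset' } };
  },
  buildThumbnail: async (body) => `thumb-of-${body}`,
  saveMetadata: async (key, metadata) => ({ objectKey: key, ...metadata }),
};

const pending = contentUpdated('public/content/id-1/sunset.jpg', fakeDeps);
pending.then((result) => console.log(reads, result.item.title));
```

Passing the dependencies as a parameter is only a convenience for experimenting with the flow locally; the point is that the thumbnail and the metadata are both derived from one read.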
How small should your function be?
Grouping more functions together is an architectural decision that you should evaluate when you
create an event-driven application. You probably have two opposite effects to balance with your
decisions:
• Having more and smaller functions can improve the modularity of your application.
  Smaller functions also have a quicker startup time on AWS Lambda for the first
  invocation when the container running the functions is deployed under the hood.
• Having fewer and bigger functions can simplify code reuse and optimize (as in our case)
  the flow of data to avoid reading or writing again to the same database or file.
11.1.3. Evolving an event-driven architecture
One of the advantages of event-driven applications, and reactive architectures in general, is that
you link the logic (in the functions) to the data flow instead of building a centralized workflow.
For example, you may want to add the option for users to delete content from the media-sharing
app.
To add this functionality, you need to have a new delete API for your clients. But Amazon S3
already has an implementation for that! It’s the DELETE Object API. You only need to manage the
deleted file event from the S3 bucket in the contentUpdated Lambda function and keep the
content index updated in case of deletion in the updateContentIndex function (figure 11.7).
Figure 11.7. Clients can use the S3 DELETE Object API to delete content. You need to manage the deletion event in the Lambda
functions to keep the metadata updated in the database and in the content index.
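As a sketch of how the function could tell an upload from a deletion, the following helper inspects one record of an S3 event. The eventName and s3.object.key fields are part of the S3 event format; the planForRecord helper and the action names it returns are assumptions made for this example.

```javascript
// Hypothetical helper: decide what to do with one record of an S3 event.
function planForRecord(record) {
  // Keys arrive URL-encoded in the event, with spaces replaced by "+".
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));
  if (record.eventName.startsWith('ObjectRemoved:')) {
    // The object is gone: remove its metadata (and thumbnail) as well.
    return { action: 'deleteMetadata', key };
  }
  if (record.eventName.startsWith('ObjectCreated:')) {
    // New or updated object: build the thumbnail and store the metadata.
    return { action: 'updateContent', key };
  }
  return { action: 'ignore', key };
}

// Made-up sample record for a deleted picture.
console.log(planForRecord({
  eventName: 'ObjectRemoved:Delete',
  s3: { object: { key: 'public/content/id-1/my+photo.jpg' } },
}));
```

Deleting the item from the database then triggers the updateContentIndex function in the same way as any other change to the table.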
Adding a new feature to an event-driven application is much easier than in a procedural
approach because you can focus on the relations between resources. When you use a similar
approach in your projects, you’ll often find that adding certain features will be easier—as you
experienced with deleting content—because of how data is modeled and how you can react to
changes. Sometimes this won’t be the case, and adding a feature will be complex. In that case,
my suggestion is to look again at the data you have and see if a different approach to how data is
stored (files, relational, or NoSQL databases) can simplify the overall flow and the
implementation of the new feature.
You now have a better idea of how to map functions into software modules and of the services
you can use to make your implementation quick but effective. Next, let’s see how we structure
our data to support the event-driven approach we’re undertaking.
11.2. DEFINING AN OBJECT NAMESPACE FOR AMAZON S3
In the S3 bucket, you have the content (the pictures), the thumbnails, and the static indexes that
your Lambda functions keep updated. You need to have a public index for all public content
(that is the same for all users) and a private index for each user for their own private content.
Amazon S3 isn’t a hierarchical repository, but in defining the keys you want to use for those
objects, you can choose a hierarchical syntax that can allow you to do the following:
• Trigger the contentUpdated Lambda function only when needed
• Give access to public and private content to only the right users via Amazon Cognito and
  IAM roles
Warning
You should carefully avoid endless loops of events, in which a function triggered by an event
changes a resource (for example, an S3 bucket or a DynamoDB table) in a way that triggers
the same function again.
My proposed hierarchical syntax for S3 keys is depicted in figure 11.8. In the bucket, you have
two main prefixes, public/ and private/, to maintain a strong separation between public and
private content, which can be mapped into IAM roles.
Figure 11.8. A hierarchical syntax for the S3 keys used in the S3 bucket, to protect access via IAM roles and allow events with
predefined prefixes to trigger the correct Lambda functions
Each of those two prefixes has a space for content/, as uploaded by the clients; for thumbnails/,
which are created by the contentUpdated Lambda function; and for the static index/ files.
The main difference between the private/ and public/ spaces is that there is a single public index
file but a specific private index file for each user. The {identityId} part of the keys is to be
replaced by the actual ID given by Amazon Cognito to the users upon their first login.
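A few small helpers can keep this key syntax in one place. The following sketch follows the paths listed in table 11.1; the function names and the sample identity ID are made up for this example.

```javascript
// Hypothetical helpers that build S3 keys following the hierarchical syntax.
const space = (isPublic) => (isPublic ? 'public' : 'private');

function contentKey(isPublic, identityId, fileName) {
  return `${space(isPublic)}/content/${identityId}/${fileName}`;
}

function thumbnailKey(isPublic, identityId, fileName) {
  return `${space(isPublic)}/thumbnails/${identityId}/${fileName}`;
}

function indexKey(isPublic, identityId) {
  // One shared index for public content, one index per user for private content.
  return isPublic
    ? 'public/index/content.json'
    : `private/index/${identityId}/content.json`;
}

// Made-up identity ID, only to show the resulting keys.
const sampleId = 'us-east-1:1234';
console.log(contentKey(true, sampleId, 'sunset.jpg'));
console.log(thumbnailKey(false, sampleId, 'sunset.jpg'));
console.log(indexKey(false, sampleId));
```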
For each path in S3, different users (authenticated or not by Amazon Cognito) and Lambda
functions can read or write, as described in table 11.1.
Table 11.1. Who can read or write in the different S3 paths
S3 path | Who can read | Who can write
--- | --- | ---
public/index/content.json | All users (authenticated or not) | The updateContentIndex Lambda function
public/content/{identityId}/* | All users (authenticated or not) and the contentUpdated Lambda function | Authenticated users with the same identityId
public/thumbnails/{identityId}/* | All users (authenticated or not) | The contentUpdated Lambda function
private/index/{identityId}/content.json | Authenticated users with the same identityId | The contentUpdated Lambda function
private/content/{identityId}/* | Authenticated users with the same identityId and the contentUpdated Lambda function | Authenticated users with the same identityId
private/thumbnails/{identityId}/* | Authenticated users with the same identityId | The contentUpdated Lambda function
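To give an idea of how the rows for authenticated users in table 11.1 could translate into an IAM policy, here is a sketch that builds a policy document as a JavaScript object. The ${cognito-identity.amazonaws.com:sub} policy variable and the s3:GetObject and s3:PutObject actions are real; the helper name and the bucket name are assumptions, and the policy is a starting point to review rather than a complete configuration. The permissions for the Lambda functions would go in their own roles and are not shown.

```javascript
// IAM policy variable that resolves to the caller's Amazon Cognito identity ID.
// Single quotes keep JavaScript from treating it as a template placeholder.
const SUB = '${cognito-identity.amazonaws.com:sub}';

// Hypothetical helper: S3 permissions for authenticated users.
function authenticatedUserPolicy(bucket) {
  const arn = (path) => 'arn:aws:s3:::' + bucket + '/' + path;
  return {
    Version: '2012-10-17', // policy variables require this version
    Statement: [
      {
        // Read: everything public, plus the user's own private files.
        Effect: 'Allow',
        Action: ['s3:GetObject'],
        Resource: [
          arn('public/index/content.json'),
          arn('public/content/*'),
          arn('public/thumbnails/*'),
          arn('private/index/' + SUB + '/content.json'),
          arn('private/content/' + SUB + '/*'),
          arn('private/thumbnails/' + SUB + '/*'),
        ],
      },
      {
        // Write: only the user's own content, public or private.
        Effect: 'Allow',
        Action: ['s3:PutObject'],
        Resource: [
          arn('public/content/' + SUB + '/*'),
          arn('private/content/' + SUB + '/*'),
        ],
      },
    ],
  };
}

console.log(JSON.stringify(authenticatedUserPolicy('example-media-bucket'), null, 2));
```

Note that users can't write thumbnails or index files: those paths are left to the Lambda functions, which keeps clients from tampering with derived data.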
Table 11.2 lists the S3 prefixes that will trigger a Lambda function and the corresponding
function name.
Table 11.2. Prefixes in the event sources for the Lambda functions
Prefix on S3 | Lambda function
--- | ---
public/content/ | contentUpdated
private/content/ | contentUpdated
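The prefixes in table 11.2 are configured on the event source, but a function can also double-check the key it receives as a defensive measure against the event loops mentioned earlier. The following check is a sketch under that assumption; the shouldProcess name is made up.

```javascript
// The only prefixes that should cause the contentUpdated logic to run.
const TRIGGER_PREFIXES = ['public/content/', 'private/content/'];

// Hypothetical guard: true only for keys uploaded by clients.
function shouldProcess(key) {
  return TRIGGER_PREFIXES.some((prefix) => key.startsWith(prefix));
}

// A thumbnail written by the function itself must not start another round.
console.log(shouldProcess('public/content/id-1/sunset.jpg'));
console.log(shouldProcess('public/thumbnails/id-1/sunset.jpg'));
```

Because thumbnails and index files live under different prefixes, writing them never matches the triggers, and the function can't end up invoking itself.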
11.3. DESIGNING THE DATA MODEL FOR AMAZON DYNAMODB
DynamoDB tables don’t have a fixed schema; when you create a table, you only need to define the
primary key, which can be a single Partition Key or a composite key with a Partition Key and a
Sort Key. In this case, you can use a content table with a composite key, with identityId as
Partition Key and the objectKey as Sort Key (table 11.3). Both attributes are strings.
Table 11.3. DynamoDB content table
Attribute | Type | Description
--- | --- | ---
identityId | Partition Key (String) | Part of the Primary Key. The identityId of the user, as provided by Amazon Cognito. Only authenticated users with the same identityId can read and write an item in the table.
objectKey | Sort Key (String) | Part of the Primary Key. The key of the object on Amazon S3.
thumbnailKey | Attribute (String) | The key of the thumbnail on Amazon S3.
isPublic | Attribute (Boolean) | The content is publicly shared (“true”) or not (“false”).
title | Attribute (String) | A title for the content.
description | Attribute (String) | A description for the content.
uploadDate | Attribute (String) | The full date of the upload, including time, as taken from S3 metadata.
uploadDay | Attribute (String) | The day (without time) of the upload, taken from S3 metadata, used by a global secondary index to quickly query for recent uploads.
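An item that follows table 11.3 could be assembled from the S3 object key and its metadata roughly as shown here. This is a sketch under a few assumptions: the buildContentItem helper is hypothetical, the key follows the syntax in table 11.1, the custom metadata contains both title and description, and the upload date comes from the object’s last-modified time.

```javascript
// Hypothetical helper: build an item for the content table from an S3 object.
function buildContentItem(objectKey, metadata, lastModified) {
  // Expected key layout: {public|private}/content/{identityId}/{file name}
  const [space, kind, identityId, ...rest] = objectKey.split('/');
  const validSpace = space === 'public' || space === 'private';
  if (!validSpace || kind !== 'content' || rest.length === 0) {
    throw new Error(`Unexpected key: ${objectKey}`);
  }
  const uploadDate = lastModified.toISOString(); // full date and time, in UTC
  return {
    identityId,
    objectKey,
    thumbnailKey: [space, 'thumbnails', identityId, ...rest].join('/'),
    isPublic: space === 'public',
    title: metadata.title,
    description: metadata.description,
    uploadDate,
    uploadDay: uploadDate.slice(0, 10), // YYYY-MM-DD, used by the index
  };
}

// Made-up sample values.
const item = buildContentItem(
  'public/content/us-east-1:1234/sunset.jpg',
  { title: 'Sunset', description: 'Evening light' },
  new Date('2016-05-02T09:30:00Z')
);
console.log(item);
```

Deriving identityId and isPublic from the key, rather than trusting values sent by the client, keeps the item consistent with the path the user was actually allowed to write.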
To query for public content, you need to create a Global Secondary Index (GSI) that’s composed
of a Partition Key (which you can query only by value) and a Sort Key (which you can query by
range and use to sort the results).
For the public content, you may want to keep the most recent uploads in the public index, and
query the database only if users start to look for older content (for example, browsing
by range). In this case, you can use a subset of the uploadDate (for example,
the uploadDay without the time) as Partition Key of the index, and then the full uploadDate as
Sort Key, as in table 11.4. In this way, you can get the N most recent uploads from today, and if
that isn’t enough, you can query for yesterday’s uploads, and so on.
Table 11.4. DynamoDB Global Secondary Index (GSI) for public content lookups
Attribute | Type | Description
--- | --- | ---
uploadDay | Attribute (String) | Partition Key for the index. The day (without time) of the upload, taken from S3 metadata.
uploadDate | Attribute (String) | Sort Key for the index. The full date of the upload, including time, as taken from S3 metadata.
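The day-by-day lookup on this index can be sketched with two small helpers. The parameters are in the shape accepted by a DynamoDB Query through the SDK’s DocumentClient; the helper names and the index name are assumptions, and any filtering on isPublic is left out of this example.

```javascript
// Hypothetical helper: Query parameters for the most recent uploads of one day.
function recentUploadsQuery(day, limit) {
  return {
    TableName: 'content',
    IndexName: 'uploadDay-uploadDate-index', // hypothetical index name
    KeyConditionExpression: 'uploadDay = :day',
    ExpressionAttributeValues: { ':day': day },
    ScanIndexForward: false, // descending Sort Key: newest uploadDate first
    Limit: limit,
  };
}

// Hypothetical helper: the day before a YYYY-MM-DD day, computed in UTC.
function previousDay(day) {
  const date = new Date(`${day}T00:00:00Z`);
  date.setUTCDate(date.getUTCDate() - 1);
  return date.toISOString().slice(0, 10);
}

// If today's results are not enough, repeat the query for the day before.
const today = '2016-05-01';
console.log(recentUploadsQuery(today, 20));
console.log(recentUploadsQuery(previousDay(today), 20));
```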