Chapter 11. Building a media-sharing application



The app should let users upload pictures either privately or publicly, with public pictures visible to all other users. Users should see thumbnails of the pictures they’re allowed to see, together with metadata such as a title and a description.

Communication between the client application and the back-end functionality is based on APIs.

You’ll use JavaScript to build the client application so that it can run in a web page on a desktop

or a mobile device. If you’re a mobile developer, it will be easy to implement the same client for

different devices; for example, using AWS Mobile Hub to kick-start a native app, as described

in chapter 7.



Note



This example uses both client-side (running in the browser) and server-side (running in Lambda

functions) code. Because the code running in the browser is JavaScript, the Lambda function

examples are also provided in JavaScript. The implementation of those functions in Python is

left as an exercise for you to do on your own, because it doesn’t change the architecture or the

logic of the application.



For this type of application, you expect uploads of new content to be far less frequent than reads of that content by users. For efficiency, you can create a static index of

the public and private pictures users can see, so that you can use this index to display the

pictures, instead of running queries on a database. For example, the index can be a file

containing all the information you need to display the pictures in the client application. You can

use any structured format for that, such as XML or YAML. I use JSON because it’s natively

supported in JavaScript.
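To make the idea concrete, here is a minimal sketch of such a JSON index: an array kept sorted by upload date so the client can render it directly. The field names are assumptions for now; they mirror the DynamoDB attributes defined later in section 11.3.

```javascript
// A minimal content index as the client could consume it: a JSON array,
// newest uploads first, holding only what the UI needs to render thumbnails.
// Field names are assumptions, mirroring the attributes in section 11.3.
function addToIndex(index, entry) {
  // Replace any existing entry for the same object, then keep the array
  // sorted by uploadDate, newest first, so the client never has to sort.
  const updated = index.filter((e) => e.objectKey !== entry.objectKey);
  updated.push(entry);
  updated.sort((a, b) => (a.uploadDate < b.uploadDate ? 1 : -1));
  return updated;
}

const index = addToIndex([], {
  objectKey: 'public/content/us-east-1:example/cat.jpg',
  thumbnailKey: 'public/thumbnails/us-east-1:example/cat.jpg',
  title: 'A cat',
  description: 'My cat sleeping',
  uploadDate: '2016-05-21T10:15:00Z',
});

// The file stored on S3 is just the serialized array.
const contentJson = JSON.stringify(index);
```

The index file itself is then uploaded to S3 like any other object, which is exactly what the updateContentIndex function will do.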



Tip



Caching is an important optimization that can improve the scalability of your applications and

reduce the latency that users experience. Caching is a common architectural pattern in software

and hardware implementations: multiple caches exist inside the CPUs we normally use, caches are in databases (for example, for commonly retrieved data), caches are in network stacks (for example, for DNS results), and so on. You should always consider what data you can safely cache, and for how long, when building your applications.
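In the client, the same pattern can be applied to the content index: serve a cached copy while it is fresh and re-download it from S3 only after it expires. A minimal in-memory sketch of a cache with a time-to-live (the clock is injectable only to make the behavior easy to demonstrate):

```javascript
// A minimal in-memory cache with a time-to-live (TTL). An entry is served
// until it is older than ttlMs; after that, get() returns undefined and the
// caller re-fetches (for example, re-downloads the index file from S3).
function makeCache(ttlMs, now = Date.now) {
  const entries = new Map();
  return {
    get(key) {
      const e = entries.get(key);
      if (!e || now() - e.storedAt > ttlMs) return undefined; // missing or expired
      return e.value;
    },
    set(key, value) {
      entries.set(key, { value, storedAt: now() });
    },
  };
}
```

How long the TTL should be is exactly the "what can I cache, and for how long" question: a stale index only delays new thumbnails appearing, so a TTL of a minute or so is usually safe for this app.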



11.1.1. Simplifying the implementation



When implementing a new application, you should look at any service or feature that you can

use so that you can reduce the development footprint and the time to market of your solution.

This is even more important if you take a “lean” approach, as suggested, for example, in the

book The Lean Startup by Eric Ries (Crown Business, 2011). According to this practice, you

want to quickly release a Minimum Viable Product (MVP) to your users: an early implementation with enough features to validate (or learn and change) your product and business model, and then rapidly iterate with new features on top of that.

Let’s use a lean approach here and replace functionalities you need with services that can

provide them, making a few implementation decisions that will simplify the architecture to

build:

- For the repository of pictures, thumbnails, and index files, you can use Amazon S3. That way, you can directly upload or download pictures or index files using the S3 API, and you don’t need to implement that yourself.
- For metadata, such as the title or description, or to store the links to the pictures and their thumbnails, you can use Amazon DynamoDB, a NoSQL database service. Again, you can use the DynamoDB API to read or update content metadata without implementing those APIs from scratch.
- To react to changes in the file repository (Amazon S3) or the database (Amazon DynamoDB), you can use Lambda functions triggered by events.



Following these decisions, the architecture in figure 11.1 can be mapped to a technical

implementation to become what you see in figure 11.2. Of the eight Lambda functions that we

needed to implement in our initial assessment, only three are left in this simplified architecture.

The other five functions are replaced by direct usage of S3 and DynamoDB APIs.

Figure 11.2. Mapping the media-sharing app to a technical implementation using Amazon S3, Amazon DynamoDB, and AWS Lambda.

The development is much simpler than the implementation in figure 11.1 because most functions are directly implemented by the

AWS services themselves, such as picture upload and download (using the S3 API) and metadata read and update (using the

DynamoDB API).



One question that should be on your mind is whether you’re sure you can replace all those

functions by natively using Amazon S3 or Amazon DynamoDB. To answer that, you need to

check whether the functions and the security features satisfy the requirements of your

implementation. You’ll see that as we build the app.



To authenticate and authorize the client application to use AWS APIs, you can use Amazon

Cognito (figure 11.3). As you learned in chapter 6, AWS IAM roles allow a fine-grained control of

access to AWS resources; for example, using policy variables to limit client access to the S3

bucket and the DynamoDB table. You’ll use those features again in this implementation.

Figure 11.3. Using Amazon Cognito, you can give secure and finely controlled access to AWS resources, such as the S3 bucket and the

DynamoDB table, allowing the client application to directly use S3 and DynamoDB APIs.
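As a reminder of the chapter 6 technique, a statement in the policy attached to the Cognito authenticated role can use the `${cognito-identity.amazonaws.com:sub}` policy variable to confine each user to their own prefix. A sketch (the bucket name is a placeholder, and the exact action list depends on your requirements):

```json
{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
  "Resource": [
    "arn:aws:s3:::MEDIA-BUCKET/private/content/${cognito-identity.amazonaws.com:sub}/*",
    "arn:aws:s3:::MEDIA-BUCKET/public/content/${cognito-identity.amazonaws.com:sub}/*"
  ]
}
```

The variable is resolved by IAM at request time to the caller's Cognito identity ID, which is why the key namespace defined in section 11.2 embeds that ID in the object keys.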



You can now map the building blocks shown in figure 11.3 to the new, ready-to-use

implementation domains that they’re part of, such as Amazon S3, Amazon DynamoDB, or AWS

Lambda (figure 11.4). Let’s see the new simplified architecture in more detail.



Figure 11.4. Using the new mapping, the building blocks of the architecture are mapped to the implementation domains they’re part

of, such as Amazon S3, Amazon DynamoDB, or AWS Lambda.



The front end to the client application is now all based on the S3 API (for manipulating pictures

and files) and the DynamoDB API (for the metadata). In particular, you’re using the S3 PUT Object to upload or update content and the GET Object to download it. With Amazon DynamoDB, you’re using GetItem, to retrieve an item by primary key, and UpdateItem to update it.

The client doesn’t need direct access to DynamoDB PutItem to create a new item in the

database, because when a new piece of content is uploaded on the S3 bucket,

the extractAndUpdateMetadata Lambda function in the back end will read the custom metadata

from the S3 object and insert the new item with that information in the DynamoDB table.

The buildThumbnails Lambda function will react to the same event (new or updated file) to

create a small thumbnail of the picture that you can use to visualize the content in the client. The

thumbnail is stored in the same S3 bucket, with a different prefix.

Finally, the updateContentIndex Lambda function is triggered by a change in the metadata table

on DynamoDB to keep static index files on the S3 bucket updated with all changes.
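The core of extractAndUpdateMetadata is a mapping from the S3 object (its key plus its user-defined metadata) to a DynamoDB item. A sketch of that mapping as a pure function; the metadata field names are assumptions, and the table schema it targets is the one defined in section 11.3:

```javascript
// Sketch of the mapping done by extractAndUpdateMetadata: from an S3 object
// key and its user-defined metadata (sent as x-amz-meta-* headers on upload;
// the names here are hypothetical) to an item for the DynamoDB content table.
function buildContentItem(objectKey, s3Metadata, lastModifiedIso) {
  // Keys look like "public/content/<identityId>/<fileName>" (section 11.2).
  const [visibility, , identityId, ...rest] = objectKey.split('/');
  return {
    identityId,                              // Partition Key
    objectKey,                               // Sort Key
    thumbnailKey: [visibility, 'thumbnails', identityId, ...rest].join('/'),
    isPublic: visibility === 'public',
    title: s3Metadata.title || '',
    description: s3Metadata.description || '',
    uploadDate: lastModifiedIso,             // full timestamp, from S3
    uploadDay: lastModifiedIso.slice(0, 10), // day only, for the GSI
  };
}
```

In the real function, this item is then written with the DynamoDB PutItem API; keeping the mapping pure makes it easy to test without AWS.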



Note



The Lambda functions are triggered in the back end by events and don’t need to be directly

accessible by clients. Security-wise, this is good because you’re exposing well-proven AWS APIs

to the clients and not your own custom implementations.



11.1.2. Consolidating functions



When you start designing the architecture of your application, you create different modules for

different functionalities. But when you move into the implementation phase, you may find that

several of those modules (functions, in the case of AWS Lambda) are tied together by the data

they use or by the way they’re used by the application.

The extractAndUpdateMetadata and buildThumbnails Lambda functions are triggered by the

same event (new or updated content in the S3 bucket) and can be tied together directly. For

example, the first function can asynchronously invoke the second before terminating, as shown

in figure 11.5.
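A sketch of that chaining: the first function re-invokes the second asynchronously (InvocationType 'Event') with the same payload. The Lambda client is injected here so the pattern can be shown without AWS credentials, and the function name is a placeholder:

```javascript
// extractAndUpdateMetadata can chain buildThumbnails by invoking it
// asynchronously before terminating: with InvocationType 'Event', the call
// returns as soon as the invocation is queued, without waiting for a result.
// The Lambda client is injected so the logic can be exercised with a stub.
function chainThumbnails(lambda, s3Event) {
  return lambda.invoke({
    FunctionName: 'buildThumbnails',   // placeholder function name
    InvocationType: 'Event',           // asynchronous invocation
    Payload: JSON.stringify(s3Event),  // forward the original S3 event
  });
}
```

In the real handler, `lambda` would be an AWS SDK Lambda client; the S3 event passes through unchanged, so buildThumbnails sees exactly what a direct S3 trigger would have sent it.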

Figure 11.5. When two functions are triggered by the same event, you can decide to group them together. For example, one function

can asynchronously invoke the other before terminating, as in the case

of extractAndUpdateMetadata and buildThumbnails here.



Continuing with the implementation of those two functions, both need the same data as input:

- buildThumbnails needs the picture file to create the thumbnail.
- extractAndUpdateMetadata needs the object metadata to put that information in the database.

But Amazon S3 has two operations that you can use:

- GET Object, to read the whole object: file and metadata.
- HEAD Object, to retrieve only the metadata, without the file.

The two functions would need to read the same S3 object twice, once with the GET and once with the HEAD, but this approach is not optimal at scale, where you can have thousands or even millions of objects.

In this case, my suggestion is to create a single function that will use the same input to create the

thumbnail and process the metadata (figure 11.6).

Figure 11.6. Grouping extractAndUpdateMetadata and buildThumbnails Lambda functions in a

single contentUpdated function to optimize storage access and read the S3 object only once
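A sketch of the consolidated function: one GET Object call supplies both the body (for the thumbnail) and the metadata (for the database). The storage and database operations are injected placeholders here, and the real AWS SDK calls are asynchronous; the synchronous shape below only keeps the data flow visible:

```javascript
// Consolidated contentUpdated sketch: a single GET Object read feeds both
// steps, instead of a GET in buildThumbnails plus a HEAD in
// extractAndUpdateMetadata. getObject, saveThumbnail, and saveItem are
// injected placeholders for the S3 and DynamoDB calls.
function contentUpdated(deps, bucket, key) {
  const obj = deps.getObject(bucket, key);     // one read: body + metadata
  deps.saveThumbnail(bucket, key, obj.body);   // was buildThumbnails
  deps.saveItem(key, obj.metadata);            // was extractAndUpdateMetadata
  return obj;
}
```

Counting the calls to getObject is the whole point of the consolidation: one read per uploaded object, regardless of how many downstream steps consume it.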



How small should your function be?



Grouping more functions together is an architectural decision that you should evaluate when you

create an event-driven application. You probably have two opposite effects to balance with your

decisions:

- Having more, smaller functions can improve the modularity of your application. Smaller functions also have a quicker startup time on AWS Lambda for the first invocation, when the container running the function is deployed under the hood.
- Having fewer, bigger functions can simplify code reuse and optimize (as in our case) the flow of data, to avoid reading or writing the same database or file again.



11.1.3. Evolving an event-driven architecture



One of the advantages of event-driven applications, and reactive architectures in general, is that

you link the logic (in the functions) to the data flow instead of building a centralized workflow.

For example, you may want to add the option for users to delete content from the media-sharing

app.

To add this functionality, you need to have a new delete API for your clients. But Amazon S3

already has an implementation for that! It’s the DELETE Object API. You only need to manage the

deleted file event from the S3 bucket in the contentUpdated Lambda function and keep the

content index updated in case of deletion in the updateContentIndex function (figure 11.7).



Figure 11.7. Clients can use the S3 DELETE Object API to delete content. You need to manage the deletion event in the Lambda

functions to keep the metadata updated in the database and in the content index.
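Inside contentUpdated, supporting deletion comes down to branching on the eventName carried in each S3 notification record; the `ObjectCreated:*` and `ObjectRemoved:*` values are the ones S3 uses in its event notifications. A sketch of that dispatch:

```javascript
// contentUpdated handles both uploads and deletions by inspecting the
// eventName in each S3 event record: ObjectCreated:* events upsert the item
// and thumbnail, ObjectRemoved:* events clean them up.
function actionFor(record) {
  const name = record.eventName || '';
  if (name.startsWith('ObjectCreated:')) return 'upsert'; // new or updated content
  if (name.startsWith('ObjectRemoved:')) return 'delete'; // remove item + thumbnail
  return 'ignore';                                        // anything unexpected
}
```

The deletion branch removes the DynamoDB item and the thumbnail; the resulting change in the table then triggers updateContentIndex, which rewrites the index, with no new code path needed there beyond handling a removed item.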



Adding a new feature to an event-driven application is much easier than in a procedural

approach because you can focus on the relations between resources. When you use a similar

approach in your projects, you’ll often find that adding certain features will be easier—as you

experienced with deleting content—because of how data is modeled and how you can react to

changes. Sometimes this won’t be the case, and adding a feature will be complex. In that case,

my suggestion is to look again at the data you have and see if a different approach to how data is

stored (files, relational, or NoSQL databases) can simplify the overall flow and the

implementation of the new feature.

You now have a better idea of how to map functions into software modules and of the services

you can use to make your implementation quick but effective. Next, let’s see how we structure

our data to support the event-driven approach we’re undertaking.



11.2. DEFINING AN OBJECT NAMESPACE FOR AMAZON S3

In the S3 bucket, you have the content (the pictures), the thumbnails, and the static indexes that

your Lambda functions keep updated. You need to have a public index for all public content

(that is the same for all users) and a private index for each user for their own private content.

Amazon S3 isn’t a hierarchical repository, but in defining the keys you want to use for those

objects, you can choose a hierarchical syntax that can allow you to do the following:

- Trigger the contentUpdated Lambda function only when needed
- Give access to public and private content to only the right users via Amazon Cognito and IAM roles



Warning



You should carefully avoid the possibility of endless loops of events: a function triggered by an event could change something on a resource (for example, an S3 bucket or a DynamoDB table) that in turn triggers the same function again.
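In this app, the risk is concrete: contentUpdated writes under thumbnails/, and updateContentIndex writes under index/, so only keys under a content/ prefix should trigger processing. Configuring the S3 event source with those prefixes (see table 11.2) enforces this; a guard in code is a cheap extra safety net. A sketch:

```javascript
// Guard against event loops: the Lambda functions themselves write under
// thumbnails/ and index/, so only keys under a content/ prefix are processed.
// The S3 trigger prefix filter (table 11.2) enforces the same rule upstream.
function shouldProcess(key) {
  return key.startsWith('public/content/') || key.startsWith('private/content/');
}
```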



My proposed hierarchical syntax for S3 keys is depicted in figure 11.8. In the bucket, you have

two main prefixes, public/ and private/, to maintain a strong separation between public and

private content, which can be mapped into IAM roles.

Figure 11.8. A hierarchical syntax for the S3 keys used in the S3 bucket, to protect access via IAM roles and allow events with

predefined prefixes to trigger the correct Lambda functions



Each of those two prefixes has space for content/, as uploaded by the clients, for the thumbnails/ that are created by the contentUpdated Lambda function, and for the static index/ files.



The main difference between the public/ and private/ spaces is that there is a single public index file, but a specific private index file for each user. The {identityId} part of the keys is to be replaced by the actual ID given by Amazon Cognito to the users upon their first login.
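The whole namespace can be captured in a few small helpers shared by the client and the Lambda functions, so keys are never assembled ad hoc. A sketch (the function names are assumptions):

```javascript
// Helpers encoding the hierarchical key syntax of figure 11.8.
// 'space' is 'public' or 'private'; identityId comes from Amazon Cognito.
function contentKey(space, identityId, fileName) {
  return `${space}/content/${identityId}/${fileName}`;
}
function thumbnailKey(space, identityId, fileName) {
  return `${space}/thumbnails/${identityId}/${fileName}`;
}
function indexKey(space, identityId) {
  // One shared index for public content, one per user for private content.
  return space === 'public'
    ? 'public/index/content.json'
    : `private/index/${identityId}/content.json`;
}
```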

For each path in S3, different users (authenticated or not by Amazon Cognito) and Lambda

functions can read or write, as described in table 11.1.

Table 11.1. Who can read or write in the different S3 paths

| S3 Path | Who can read | Who can write |
|---|---|---|
| public/index/content.json | All users (authenticated or not) | The updateContentIndex Lambda function |
| public/content/{identityId}/* | All users (authenticated or not) and the contentUpdated Lambda function | Authenticated users with the same identityId |
| public/thumbnails/{identityId}/* | All users (authenticated or not) | The contentUpdated Lambda function |
| private/index/{identityId}/content.json | Authenticated users with the same identityId | The updateContentIndex Lambda function |
| private/content/{identityId}/* | Authenticated users with the same identityId and the contentUpdated Lambda function | Authenticated users with the same identityId |
| private/thumbnails/{identityId}/* | Authenticated users with the same identityId | The contentUpdated Lambda function |



Table 11.2 lists the S3 prefixes that will trigger a Lambda function and the corresponding

function name.

Table 11.2. Prefixes in the event sources for the Lambda functions

| Prefix on S3 | Lambda function |
|---|---|
| public/content/ | contentUpdated |
| private/content/ | contentUpdated |

11.3. DESIGNING THE DATA MODEL FOR AMAZON DYNAMODB

DynamoDB tables don’t have a fixed schema; when you create a table, you need to define the

primary key, which can be a single Partition Key or a composite key with a Partition Key and a

Sort Key. In this case, you can use a content table with a composite key, with identityId as

Partition Key and the objectKey as Sort Key (table 11.3). Both attributes are strings.



Table 11.3. DynamoDB content table

| Attribute | Type | Description |
|---|---|---|
| identityId | Partition Key (String) | Part of the Primary Key. The identityId of the user, as provided by Amazon Cognito. Only authenticated users with the same identityId can read and write an item in the table. |
| objectKey | Sort Key (String) | Part of the Primary Key. The key of the object on Amazon S3. |
| thumbnailKey | Attribute (String) | The key of the thumbnail on Amazon S3. |
| isPublic | Attribute (Boolean) | The content is publicly shared (“true”) or not (“false”). |
| title | Attribute (String) | A title for the content. |
| description | Attribute (String) | A description for the content. |
| uploadDate | Attribute (String) | The full date of the upload, including time, as taken from S3 metadata. |
| uploadDay | Attribute (String) | The day (without time) of the upload, taken from S3 metadata, used by a global secondary index to quickly query for recent uploads. |

To query for public content, you need to create a Global Secondary Index (GSI) that’s composed

of a Partition Key (which you can query only by value) and a Sort Key (which you can query by

range and use to sort the results).

For the public content, you may want to keep the most recent uploads in the public index, and

eventually query the database only if users start to look for old content (for example, browsing

by range). In this case, you can use a subset of the uploadDate (for example,

the uploadDay without the time) as Partition Key of the index, and then the full uploadDate as

Sort Key, as in table 11.4. In this way, you can get the most recent N uploads today, and if that

isn’t enough, you can query for yesterday’s uploads, and so on.
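That query strategy walks the uploadDay partitions backward until enough items are collected. A sketch of the loop; `queryDay` stands in for a DynamoDB Query on the GSI (Partition Key uploadDay, results sorted by uploadDate descending):

```javascript
// Collect the most recent uploads by querying one uploadDay partition at a
// time, moving back one day until 'wanted' items are found or 'maxDays'
// partitions have been checked. queryDay is a stand-in for a DynamoDB Query
// on the GSI, returning that day's items newest-first.
function recentUploads(queryDay, startDay, wanted, maxDays) {
  const results = [];
  let day = new Date(startDay + 'T00:00:00Z');
  for (let i = 0; i < maxDays && results.length < wanted; i++) {
    results.push(...queryDay(day.toISOString().slice(0, 10)));
    day = new Date(day.getTime() - 24 * 60 * 60 * 1000); // previous day
  }
  return results.slice(0, wanted);
}
```

Bounding the loop with maxDays keeps a quiet period (days with no uploads) from turning into an unbounded scan.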

Table 11.4. DynamoDB Global Secondary Index (GSI) for public content lookups

| Attribute | Type | Description |
|---|---|---|
| uploadDay | Attribute (String) | Partition Key for the index. The day (without time) of the upload, taken from S3 metadata. |
| uploadDate | Attribute (String) | Sort Key for the index. The full date of the upload, including time, as taken from S3 metadata. |


