DynamoDB Local Secondary Index

A local secondary index essentially gives DynamoDB tables an additional sort key by which to query data. I recently needed to add a local secondary index to an existing DynamoDB table through CloudFormation so that I could query objects by the modified column value without scanning. Unfortunately, this isn't as easy as simply adding the LocalSecondaryIndexes element to the CloudFormation script.

Adding a local secondary index requires replacing the table, which creates the challenge of preserving the table data during the upgrade.

Here's what I learned:

  • LocalSecondaryIndex CloudFormation YAML Syntax
  • How to troubleshoot the error Property AttributeDefinitions is inconsistent with the KeySchema of the table and the secondary indexes.
  • How to take DynamoDB table backups with the AWS CLI
  • A work-around to prevent the error CloudFormation cannot update a stack when a custom-named resource requires replacing....
  • Basic JSON parsing with jq

Initial Table Schema: Jobs Table

The original schema for the Jobs table consisted of a generated jobId partition key and jobName sort key. The system is used for invoicing, so its important for users to quickly find a Job by its ID or name so they made a good choice fot they key schema.

Here is the CloudFormation code that originally defined the Jobs table

# app.cloudformation.yaml
JobsDynamoDBTable:
	Type: 'AWS::DynamoDB::Table'
	Properties:
		TableName: Jobs
		AttributeDefinitions:
			-
				AttributeName: jobId
				AttributeType: S
			-
				AttributeName: name
				AttributeType: S
		KeySchema:
			-
				AttributeName: jobId
				KeyType: HASH
			-
				AttributeName: name
				KeyType: RANGE
		ProvisionedThroughput:
			ReadCapacityUnits: 1
			WriteCapacityUnits: 1

Recently, the client asked for some reporting to be added to the application that requires querying jobs by their last modified date. No problem, that's what local secondary indexes were built for.

Add the LocalSecondaryIndex to the CloudFormation Script

The LocalSecondaryIndexes element for DynamoDB in CloudFormation is straight-forward. Provide a descriptive key name, alternate key schema, and Projection (columns that should be available in the index).

LocalSecondaryIndexes:
  -
    IndexName: jobId-modified-index
    KeySchema:
      -
        AttributeName: jobId
        KeyType: HASH
      -
        AttributeName: modified
        KeyType: RANGE
    Projection:
      ProjectionType: ALL

An attempt to deploy this change will result in an error:

An error occurred: JobsDynamoDBTable - Property AttributeDefinitions is inconsistent with the KeySchema of the table and the secondary indexes.

Of course! The AttributeDefinitions must be updated to include the modified property. Once updated, the entire JobsDynamoDBTable CloudFormation resource element now looked like this:

# app.cloudformation.yaml
JobsDynamoDBTable:
	Type: 'AWS::DynamoDB::Table'
	Properties:
		TableName: Jobs
		AttributeDefinitions:
			-
				AttributeName: jobId
				AttributeType: S
			-
				AttributeName: name
				AttributeType: S
			# Required for LocalSecondaryIndex
			-
				AttributeName: modified
				AttributeType: N
		KeySchema:
			-
				AttributeName: jobId
				KeyType: HASH
			-
				AttributeName: name
				KeyType: RANGE
		# Make modified available as a sort key
		LocalSecondaryIndexes:
			-
				IndexName: jobId-modified-index
				KeySchema:
					-
						AttributeName: jobId
						KeyType: HASH
					-
						AttributeName: modified
						KeyType: RANGE
				Projection:
					ProjectionType: ALL # Make All columns available in index
		ProvisionedThroughput:
			ReadCapacityUnits: 1
			WriteCapacityUnits: 1

Though the CloudFormation template now seems complete, deploying fails:

An error occurred: JobsDynamoDBTable - CloudFormation cannot update a stack when a custom-named resource requires replacing. Rename Jobs and update the stack again.

Replace the a DynamoDB Resource

CloudFormation refuses to add the index to the DynamoDB table, so it must be replaced with a new table. However, with application data in the table, it can't simply be deleted. The plan, then, is to create a new table called Jobs.v1 with the index, then to move data from Jobs to Jobs.v1 and finally (optionally) delete the original Jobs table.

Retain the Original Table

A rename of the table will result in the original Jobs table being deleted. To prevent this, add DeletionPolicy: Retain to table object in your CloudFormation script and deploy. For safety sake, make a backup as well. Backups can be created in the AWS Console or via the CLI.

Create a DynamoDB Backup with the AWS CLI

Here's an example of how to create a backup using the AWS CLI:

$ aws dynamodb create-backup \
  --region=YOUR_REGION_HERE \
  --table-name Jobs \
  --backup-name Jobs-$(date '+%Y%m%d_%H%M%S')
{
  "BackupDetails": {
    "BackupCreationDateTime": 1548578121.641,
    "BackupName": "Jobs-20190127_033734",
    "BackupStatus": "CREATING",
    "BackupType": "USER",
    "BackupSizeBytes": 123456,
    "BackupArn": "arn:aws:dynamodb:us-east-1:123456789012:table/Jobs/backup/01548578121641-f46bes63"
  }
}

It's important to let the backup finish before proceeding with further changes, but the aws dynamodb backup command returns almost instantly because the backup operation is asynchronous.

Waiting on the backup to complete can be done with a simple script and a little help from the jq utility.

First, parse out the ARN using jq:

$ backup_arn=$(aws dynamodb create-backup \
  --region=YOUR_REGION_HERE \
  --table-name Jobs \
  --backup-name Jobs-$(date '+%Y%m%d_%H%M%S')) \
| jq '.BackupDetails.BackupArn' -r) # -r for "raw" returns value w/ out quotes

Next, loop until the backup is no longer in a CREATING state:

$ backup_status="CREATING"
$ while [ "$backup_status" = "CREATING" ]; do \
  echo "... Waiting for backup to complete, backup_status="$backup_status \
  backup_status=$(aws dynamodb describe-backup \
    --backup-arn $backup_arn \
    --region=YOUR_REGION_HERE \
  | jq '.BackupDescription.BackupDetails.BackupStatus' -r)\
  sleep 2 \
done \
echo "Backup complete"
... Waiting for backup to complete, backup_status=CREATING
... Waiting for backup to complete, backup_status=CREATING
... Waiting for backup to complete, backup_status=CREATING
Backup complete

$

Rename the Table

With the Retain policy and backup in place it's safe to rename the Jobs table and deploy. The first time I did this, I had forgotten to deploy the DeletionPolicy: Retain change in my CloudFormation and deleted the original table, along with all the data. Take backups!

The entire table definition now looks like this:

# app.cloudformation.yaml
JobsDynamoDBTableV1: # Renamed
	Type: 'AWS::DynamoDB::Table'
	# Retain the table when deleting from CF Stack!
	DeletionPolicy: Retain
	Properties:
		TableName: Jobs.v1 # Renamed
		AttributeDefinitions:
			-
				AttributeName: jobId
				AttributeType: S
			-
				AttributeName: name
				AttributeType: S
			# Required for LocalSecondaryIndex
			-
				AttributeName: modified
				AttributeType: N
		KeySchema:
			-
				AttributeName: jobId
				KeyType: HASH
			-
				AttributeName: name
				KeyType: RANGE
		# Make modified available as a sort key
		LocalSecondaryIndexes:
			-
				IndexName: jobId-modified-index
				KeySchema:
					-
						AttributeName: jobId
						KeyType: HASH
					-
						AttributeName: modified
						KeyType: RANGE
				Projection:
					ProjectionType: ALL
		ProvisionedThroughput:
			ReadCapacityUnits: 1
			WriteCapacityUnits: 1

Before deploying, update references to the Jobs table and JobsDynamoDBTable resource to Jobs.v1 and JobsDynamoDBTableV1, respectively throughout your application and CloudFormation script.

Deploy and Migrate

The deployment will consist of two steps: The deployment its self, then the migration of the Jobs DynamoDB table to the Jobs.v1 table.

The deployment of the CloudFormation template is simple:

aws cloudformation deploy --template-file app.cloudformation.yaml --stack-name app

Once completed, the migration from the Jobs table to Jobs.v1 can begin.

Copy All Rows from one DynamoDB Table to Another via CLI

To copy all the rows from one DynamoDB table to another uses two primary commands with the AWS CLI: aws dynamodb scan to retrieve rows from the source table and aws dynamodb batch-write-item to write records to the destination.

Limitations of batch-write-item

Complicating matters is that batch-write-item can perform a maximum of 25 operations per call. Since the source table contains more than 25 records, we'll have to loop through.

Paginating Reads, 25 at a Time

Our migration script will retrieve 25 items at a time, write them to a file, then use batch-write-item to copy them to the destination. To paginate through the data, both the --page-size and --max-items commands must be set to 25:

aws dynamodb scan --profile hvh --region=us-east-1 \
  --table-name Jobs \
  --page-size 25 \
  --max-items 25

This call can be wrapped in a loop to page through the data and the variables needed parsed using jq:

# migrate.sh
set -e
page=1
while : ; do # do .. while equivalent
  echo "  Scanning page $page, up to 25 records in Jobs table"
  scan_result=$(aws dynamodb scan --profile hvh --region=us-east-1 \
    --table-name Jobs \
    --page-size 25 \
    --max-items 25 \
    $starting_token)
  echo "$scan_result"

  # Get pagination token:
  next_token=$(echo "$scan_result" | jq '.NextToken' -r)
  # Set starting_token for next iteration (blank on first iteration)
  starting_token="--starting-token $next_token"

  # Break loop, if no more pages
  [[ "$next_token" != "null" ]] || break
  # Increment page count
  ((page++))
done

Transforming DynamoDB Scan Results to PutRequest Elements

Each item returned from aws dynamodb scan must be transformed into a PutRequest element for compatibility with batch-write-item. Each set of 25 scan results can be written to a temporary file then passed to batch-write-item to perform the insert. The entire, finished script now looks like this:

# migrate.sh
set -e
page=1

echo "Copying all rows from Jobs to Jobs.v1, 25 rows at a time"
while : ; do # do .. while equivalent
    echo "  Scanning page $page, up to 25 records in Jobs table"
    scan_result=$(aws dynamodb scan --profile hvh --region=us-east-1 \
        --table-name Jobs \
        --page-size 25 \
        --max-items 25 \
        $starting_token)

    # Transform the scan result to a PutRequest:
    put_req_json=$(echo $scan_result | jq '[.Items | .[] | { PutRequest: { Item: . } }]')
    # Get number of records:
    rec_count=$(echo $put_req_json | jq 'length')
    # Get pagination token:
    next_token=$(echo "$scan_result" | jq '.NextToken' -r)
    # Set starting_token for next iteration (blank on first iteration)
    starting_token="--starting-token $next_token"

    echo "  Writing $rec_count records to temporary file"
    tmp_file_name="Jobs.v1.$page.json"
    echo '{
        "Jobs.v1": '"${put_req_json}"'
    }' > "Jobs.v1.$page.json"

    echo "  Writing $rec_count to Jobs.v1"
    aws dynamodb batch-write-item \
        --region=YOUR_REGION_HERE \
        --request-items file://"$tmp_file_name" \
    | sed 's/^/  /' # Indent output

    echo "  Deleting temporary file"
    rm "$tmp_file_name"

    # Break loop, if no more pages
    [[ "$next_token" != "null" ]] || break
    # Increment page count
    ((page++))
done

echo "Copied all rows from Jobs to Jobs.v1"

Clean Up: Delete the Original Table (optional)

The original Job table is no longer in use, and may be deleted safely at this point:

aws dynamodb delete-table \
  --region=YOUR_REGION_HERE \
  --table-name Jobs

That's it! The original Jobs table has been migrated to Jobs.v1 with a Local Secondary Index applied and the original table removed.

Broader Topics Related to DynamoDB & CloudFormation: Add a Local Secondary Index to Existing Table

AWS Command Line Interface (CLI)

The AWS CLI is a tool to manage AWS services via command line

AWS CloudFormation

AWS CloudFormation is a service to model AWS infrastructure as code

AWS DynamoDB

AWS DynamoDB is a 'NoSQL' key-value and document database

Database as Code

A technique to version-control and automate the deployment of database schemas

Infrastructure as Code (IaC)

A technique to version-control and automate the deployment of virtualized infrastructure

DynamoDB & CloudFormation: Add a Local Secondary Index to Existing Table Knowledge Graph