Transferring Information from Amazon S3 to Aurora PostgreSQL

An alternative approach is to split the S3 file into chunks, parse them out to a file using the method outlined here, and then connect to the RDS Postgres instance through a named pipe, as suggested in the AWS documentation.


Solution:

Although this is an old post, I hope it will be useful to anyone searching for this information. Most of the content comes from the official AWS documentation, with a few extra notes of mine to help.

Here is the drill:

Establish a connection with the Postgres instance on RDS.
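For example, with the psql client (the endpoint, user, and database name below are placeholders for your own values):

psql --host=mycluster.cluster-abcdefghijkl.us-east-1.rds.amazonaws.com --port=5432 --username=postgres --dbname=postgres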

Run the following command at the psql prompt to create the extension needed for importing data from S3:

CREATE EXTENSION aws_s3 CASCADE;
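You can check that the extension and its aws_commons dependency were created with a catalog query such as:

SELECT extname, extversion FROM pg_extension WHERE extname LIKE 'aws%';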

Create the database:

CREATE DATABASE students;
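Note that extensions are installed per database, so if the table is going to live in students, switch to that database and create the extension there as well, for example:

\c students
CREATE EXTENSION aws_s3 CASCADE;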

Create a table, assuming your file has the following three columns:

CREATE TABLE t1 (col1 varchar(80), col2 varchar(80), col3 varchar(80));

Make sure the number of columns in your S3 bucket's CSV file matches the table, unless you deliberately intend to exclude certain columns, which is not covered in this post.
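For example, a CSV that matches the table above could look like this (sample values only):

"val1a","val1b","val1c"
"val2a","val2b","val2c"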

To attach an IAM role that allows the PostgreSQL DB cluster to read from your S3 bucket, you can use the console (a CLI alternative is sketched after these steps):

  1. Access the Amazon RDS console by logging into the AWS Management Console at https://console.aws.amazon.com/rds/.
  2. Choose the name of the PostgreSQL DB cluster to display its details.
  3. On the Connectivity & security tab, under Manage IAM roles, choose the role to add under Add IAM roles to this instance.
  4. Under Feature, choose s3Import.
  5. Choose Add role.
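If you prefer the CLI, the same association can be made with something along these lines (the cluster identifier and role ARN are placeholders):

aws rds add-role-to-db-cluster \
    --db-cluster-identifier my-aurora-cluster \
    --feature-name s3Import \
    --role-arn arn:aws:iam::123456789012:role/MyS3ImportRole \
    --region us-east-1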

The first statement uses the aws_commons.create_s3_uri function to build a URI for a specific file in an S3 bucket, made up of the bucket name, the path to the file, and the Region where the bucket is located; the \gset psql command then stores the result in a variable named s3_uri. The second statement imports the contents of the file at that URI into the table t1 with the aws_s3.table_import_from_s3 function, passing the s3_uri variable and specifying that the file is in CSV format.
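Put together, the two statements look roughly like this at the psql prompt (replace the bucket name, file path, and Region with your own):

SELECT aws_commons.create_s3_uri(
   'my-bucket',
   'full/path/to/file.csv',
   'us-east-1'
) AS s3_uri \gset

SELECT aws_s3.table_import_from_s3(
   't1',
   '',
   '(format csv)',
   :'s3_uri'
);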

When building the S3 URI, give the full/path/to/file.csv as the file path without prepending the bucket name, and pass the aws_region (such as us-east-1), not an Availability Zone (such as us-east-1a).

You should get something like:

1000 rows imported into relation "t1" from file ...

Test with a sample query:

SELECT * FROM t1 LIMIT 5;

HTH.
