Postgres: Benchmarking concurrently safe upsert queries
In my previous post I covered how to build concurrently safe upsert queries.
This post serves as a continuation in which the different approaches are battle tested to expose the performance of each one:
- Advisory lock
- On conflict update
- On conflict do nothing
- CTE queries
Without further ado, let’s start:
The application
In order to emulate the scenario described in the previous post, an HTTP API is required, which connects to a database to perform the queries. The application is written in Go and can be found here.
The application is plain and simple, containing only a few files:
- cmd/main.go: Responsible for bootstrapping the Postgres connection and HTTP server.
- server/server.go: Exposes the different endpoints, performs body parsing and error checking.
- domain/domain.go: Exposes the UpsertedRow struct that will be returned by Postgres.
- postgres/postgres.go: Contains a function for each approach listed above, all with the same signature so they can be consumed easily in the server file. Yes, in a real-world app, do not inject database handlers directly into the transport layer.
Pre-requisites
Since we are going to use Postgres, an instance needs to be set up; for convenience this example uses Docker:
docker run --name db -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres:latest
A file named seed.psql is also present at the root of the repository:
CREATE DATABASE customer_svc;
CREATE TABLE customers(id SERIAL, customer_id UUID NOT NULL PRIMARY KEY);
Running the application
The application can be run in a container as well:
docker build -t pg-upsert-app:local .
docker run -p 8080:8080 --link db --name app -i pg-upsert-app:local
Benchmarking
For the benchmark, the tool vegeta is used, which allows running HTTP load tests against the API.
The test suite is just a bash script that repeats similar logic for each approach to be tested; for example:
UPSERT_LOCK=$(echo '{
method: "POST",
url: "http://app:8080/upsert-lock",
body:"{\"id\": \"<ruuid>\"}" | @base64,
header: {"Content-Type": ["application/json"]}
}' | sed "s/<ruuid>/$(uuidgen | tr '[:upper:]' '[:lower:]')/g")
echo "EXECUTING FOR LOCKS"
jq -ncM "$UPSERT_LOCK" \
| vegeta attack -format=json -duration=40s -connections=20 -rate=100 | vegeta encode \
| vegeta report -type="hist[0,2ms,4ms,6ms,8ms,10ms,15ms]"
This translates to the following:
- Echo a POST request towards http://app:8080/upsert-lock with the payload {"id": "some-uuid-generated-at-runtime"}. This uuid is repeated for each call performed in the given scenario.
- Parse the echoed string as JSON and use it as the input for the vegeta tool.
- Execute requests for 40 seconds with 20 concurrent connections at a rate of 100 requests per second; in other words, 4000 requests distributed across 20 connections over 40 seconds.
- The results will be aggregated into different buckets determined by hist[0,2ms,4ms,6ms,8ms,10ms,15ms].
The execution of this test is also dockerized so there is no need to download additional tools:
docker build -t benchmark:local -f benchmark/Dockerfile benchmark/.
docker run --link app -t benchmark:local
In order to avoid results influenced by the warm-up phase of the app, the test suite was executed twice and only the results from the last execution are shown below.
Advisory lock
As the previous post indicates, this approach is the least scalable. Introducing a custom application lock in the database is not a good design model, and the results show it:
EXECUTING FOR LOCKS
Bucket # % Histogram
[0s, 2ms] 974 24.35% ##################
[2ms, 4ms] 2771 69.27% ###################################################
[4ms, 6ms] 244 6.10% ####
[6ms, 8ms] 10 0.25%
[8ms, 10ms] 0 0.00%
[10ms, 15ms] 1 0.03%
[15ms, +Inf] 0 0.00%
Not only that but the code is also more complex than other solutions:
func (repo *pgCustomerRepo) UpsertCustomerLock(ctx context.Context, id uuid.UUID) (res domain.UpsertedRow, err error) {
tx, err := repo.dbHandler.BeginTx(ctx, nil)
if err != nil {
return
}
_, err = tx.ExecContext(ctx, "SELECT pg_advisory_xact_lock($1)", hash(id.String()))
if err != nil {
return
}
r := tx.QueryRowContext(ctx, "SELECT ctid, xmax, id FROM customers WHERE customer_id = $1", id)
err = r.Scan(&res.CTID, &res.XMAX, &res.ID)
	if err == sql.ErrNoRows { // no existing row: fall through to the insert
q := "INSERT INTO customers(customer_id) VALUES($1) RETURNING ctid, xmax, id"
row := tx.QueryRowContext(ctx, q, id)
err = row.Scan(&res.CTID, &res.XMAX, &res.ID)
}
	if err == nil {
		err = tx.Commit()
	} else {
		// Roll back, but surface the original error instead of the rollback result.
		_ = tx.Rollback()
	}
return
}
func hash(s string) int64 {
h := fnv.New64a()
h.Write([]byte(s))
return int64(h.Sum64())
}
- Create the transaction.
- Get a deterministic int64 for the given uuid and take the lock (subsequent calls will wait here until the transaction ends).
- Get the customer.
- If the customer does not exist, create it and return it.
- Commit the transaction.
Upsert on conflict
This is most likely the most common solution for this kind of case. The results are better than the previous ones, but it is still not a good design.
Remember, updating on conflict will cause the following side effects:
- The old row is removed and a new one is added.
- It modifies the row's transaction id: an "old" row that got updated will carry a new transaction id.
- Triggers affecting the table are fired.
For a better refresher, click here.
EXECUTING FOR UPSERT CONFLICT
Bucket # % Histogram
[0s, 2ms] 1777 44.42% #################################
[2ms, 4ms] 1938 48.45% ####################################
[4ms, 6ms] 158 3.95% ##
[6ms, 8ms] 72 1.80% #
[8ms, 10ms] 33 0.83%
[10ms, 15ms] 18 0.45%
[15ms, +Inf] 4 0.10%
The code here is simpler, which makes this approach feel easier:
func (repo *pgCustomerRepo) UpsertCustomerConflict(ctx context.Context, id uuid.UUID) (res domain.UpsertedRow, err error) {
query :=
"INSERT INTO customers(customer_id) VALUES($1) " +
"ON CONFLICT (customer_id) DO UPDATE SET customer_id = excluded.customer_id " +
"RETURNING ctid, xmax, id"
row := repo.dbHandler.QueryRowContext(ctx, query, id)
err = row.Scan(&res.CTID, &res.XMAX, &res.ID)
return
}
- Insert the customer.
- If the customer exists, update (delete + create) the row, overriding the id with the same id.
- Get back the data.
Upsert do nothing
The following approach has its pros and cons. The pros are that no side effects are caused in the database and the application is not doing any kind of locking. This approach can be referred to as write first, read later.
There are two ways to achieve this technique:
- Same batch: write then read; one round trip.
- Two batches: the first one writes, the second one reads; two round trips.
If two batches are used, performance ends up similar to the locking scenario, since several round trips to the database are needed.
The example displayed below shows how to achieve it in only one batch; but then again, this is part of the cons. By design, Postgres does not allow sending two parameterized statements in the same batch, so the parameters have to be resolved before sending the query, which can lead to SQL injection.
The example below bypasses this limitation by avoiding parameters; this is not good practice, as it can introduce security vulnerabilities when dealing with parameters such as arbitrary strings.
func (repo *pgCustomerRepo) UpsertCustomerDoNothing(ctx context.Context, id uuid.UUID) (res domain.UpsertedRow, err error) {
query :=
"INSERT INTO customers(customer_id) VALUES($1) ON CONFLICT DO NOTHING;" +
"SELECT ctid, xmax, id FROM customers WHERE customer_id = $1"
	// Resolve $1 by hand; only tolerable here because id is a parsed uuid.
	query = strings.ReplaceAll(query, "$1", fmt.Sprintf("'%s'", id.String()))
row := repo.dbHandler.QueryRowContext(ctx, query)
err = row.Scan(&res.CTID, &res.XMAX, &res.ID)
return
}
- Insert the customer; if the customer exists, do nothing.
- Fetch the customer.
The performance when this is accomplished in one batch is good, but don't do it: security is more important than premature optimization.
EXECUTING FOR UPSERT DO NOTHING
Bucket # % Histogram
[0s, 2ms] 3426 85.65% ################################################################
[2ms, 4ms] 572 14.30% ##########
[4ms, 6ms] 1 0.03%
[6ms, 8ms] 0 0.00%
[8ms, 10ms] 1 0.03%
[10ms, 15ms] 0 0.00%
[15ms, +Inf] 0 0.00%
CTE queries
Lastly, this approach is not the most pleasant to read, but it has some pros:
- No side effects on the database.
- No custom application locking.
- Parameterized queries.
- Works with database engines that do not support ON CONFLICT DO ...
EXECUTING FOR UPSERT CTE
Bucket # % Histogram
[0s, 2ms] 2950 73.75% #######################################################
[2ms, 4ms] 1034 25.85% ###################
[4ms, 6ms] 14 0.35%
[6ms, 8ms] 2 0.05%
[8ms, 10ms] 0 0.00%
[10ms, 15ms] 0 0.00%
[15ms, +Inf] 0 0.00%
The code is displayed below:
func (repo *pgCustomerRepo) UpsertCustomerCte(ctx context.Context, id uuid.UUID) (res domain.UpsertedRow, err error) {
query :=
"WITH " +
"search AS (SELECT ctid, xmax, id FROM customers WHERE customer_id = $1 LIMIT 1)," +
"add AS (INSERT INTO customers (customer_id) SELECT $1 WHERE NOT EXISTS(SELECT id from search) RETURNING ctid, xmax, id)" +
"SELECT ctid, xmax, id from add UNION ALL SELECT ctid, xmax, id from search"
row := repo.dbHandler.QueryRowContext(ctx, query, id)
err = row.Scan(&res.CTID, &res.XMAX, &res.ID)
return
}
- Prepare the search statement.
- Prepare the add statement, which will only add a customer if search does not return one.
- Execute both statements with a union; only one of the statements will return the customer.
Recap
In this post, different approaches were battle tested; there is no holy grail or golden rule when it comes to database modelling.
The results shown here do not mean that your project has to use CTE queries. The healthier approach is always to ask yourself and the team:
- Do we have this problem?
- Why do we have this problem?
- Is there anything we can do to get rid of this problem?
- Hold a brainstorming session; avoid premature optimization and over-engineering.
- Make a decision.
Keeping this in mind will probably help you avoid falling into design traps or solutions that were not needed in the first place.
As a last note, if you found this useful, do not hesitate to share it on your favorite social media. See you next time!
2020-05-13 17:40 +0200