Insert, on duplicate update in PostgreSQL?

Several months ago I learned from an answer on Stack Overflow how to perform multiple updates at once in MySQL using the following syntax:

INSERT INTO table (id, field, field2)
VALUES (1, A, X), (2, B, Y), (3, C, Z)
ON DUPLICATE KEY UPDATE field=VALUES(field), field2=VALUES(field2);

I’ve now switched over to PostgreSQL and apparently this is not correct. It’s referring to all the correct tables so I assume it’s a matter of different keywords being used but I’m not sure where in the PostgreSQL documentation this is covered.

To clarify, I want to insert several things and if they already exist to update them.

Popular Answers:

  1. Since version 9.5, PostgreSQL has UPSERT support through the ON CONFLICT clause, with the following syntax (similar to MySQL’s):

    INSERT INTO the_table (id, column_1, column_2)
    VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z')
    ON CONFLICT (id) DO UPDATE
        SET column_1 = excluded.column_1,
            column_2 = excluded.column_2;
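
    As a hedged sketch of two common variations on the statement above (not part of the original answer): `DO NOTHING` skips rows that already exist, and a `WHERE` clause on `DO UPDATE` avoids rewriting rows whose values have not changed. The table definition is an assumption added for illustration. `ON CONFLICT (id)` needs a primary key or unique index on `id` so that PostgreSQL can infer the conflict target.

    ```sql
    -- Assumed table definition (hypothetical, for illustration only).
    CREATE TABLE IF NOT EXISTS the_table (
        id       integer PRIMARY KEY,
        column_1 text,
        column_2 text
    );

    -- Variation 1: keep existing rows untouched, insert only the new ones.
    INSERT INTO the_table (id, column_1, column_2)
    VALUES (1, 'A', 'X'), (4, 'D', 'W')
    ON CONFLICT (id) DO NOTHING;

    -- Variation 2: update only rows whose values actually differ,
    -- and return the rows that were inserted or updated.
    INSERT INTO the_table (id, column_1, column_2)
    VALUES (1, 'A2', 'X2'), (5, 'E', 'V')
    ON CONFLICT (id) DO UPDATE
        SET column_1 = excluded.column_1,
            column_2 = excluded.column_2
        WHERE (the_table.column_1, the_table.column_2)
              IS DISTINCT FROM (excluded.column_1, excluded.column_2)
    RETURNING id, column_1, column_2;
    ```

    Note that a single multi-row INSERT with `DO UPDATE` must not contain the same `id` twice, since one statement cannot update the same row more than once.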

    For versions before 9.5, searching PostgreSQL’s mailing list archives for “upsert” leads to an example in the manual that possibly does what you want:

    Example 38-2. Exceptions with UPDATE/INSERT

    This example uses exception handling to perform either UPDATE or INSERT, as appropriate:

    CREATE TABLE db (a INT PRIMARY KEY, b TEXT);

    CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
    $$
    BEGIN
        LOOP
            -- first try to update the key
            -- note that "a" must be unique
            UPDATE db SET b = data WHERE a = key;
            IF found THEN
                RETURN;
            END IF;
            -- not there, so try to insert the key
            -- if someone else inserts the same key concurrently,
            -- we could get a unique-key failure
            BEGIN
                INSERT INTO db(a,b) VALUES (key, data);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing, and loop to try the UPDATE again
            END;
        END LOOP;
    END;
    $$
    LANGUAGE plpgsql;

    SELECT merge_db(1, 'david');
    SELECT merge_db(1, 'dennis');

    There’s possibly an example of how to do this in bulk, using CTEs in 9.1 and above, in the hackers mailing list:

    WITH foos AS (SELECT (UNNEST(%foo[])).*),
    updated AS (UPDATE foo SET foo.a = foos.a ... RETURNING foo.id)
    INSERT INTO foo
    SELECT foos.* FROM foos LEFT JOIN updated USING(id)
    WHERE updated.id IS NULL;

    See a_horse_with_no_name’s answer for a clearer example.

  2. With PostgreSQL 9.1 this can be achieved using a writeable CTE (common table expression):

    WITH new_values (id, field1, field2) AS (
        VALUES (1, 'A', 'X'),
               (2, 'B', 'Y'),
               (3, 'C', 'Z')
    ),
    upsert AS (
        UPDATE mytable m
        SET field1 = nv.field1,
            field2 = nv.field2
        FROM new_values nv
        WHERE m.id = nv.id
        RETURNING m.*
    )
    INSERT INTO mytable (id, field1, field2)
    SELECT id, field1, field2
    FROM new_values
    WHERE NOT EXISTS (SELECT 1 FROM upsert up WHERE up.id = new_values.id)

    See these blog entries:


    Note that this solution does not prevent a unique key violation but it is not vulnerable to lost updates.
    See the follow up by Craig Ringer on dba.stackexchange.com

  3. I was looking for the same thing when I came here, but the lack of a generic “upsert” function bothered me a bit, so I thought you could just pass the update and insert SQL as arguments to that function from the manual.

    That would look like this:

    CREATE FUNCTION upsert (sql_update TEXT, sql_insert TEXT) RETURNS VOID
    LANGUAGE plpgsql AS
    $$
    DECLARE
        affected INTEGER;
    BEGIN
        LOOP
            -- first try to update
            EXECUTE sql_update;
            -- check if a row was updated; EXECUTE does not set FOUND,
            -- so read the row count instead
            GET DIAGNOSTICS affected = ROW_COUNT;
            IF affected > 0 THEN
                RETURN;
            END IF;
            -- not found so insert the row
            BEGIN
                EXECUTE sql_insert;
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- do nothing and loop
            END;
        END LOOP;
    END;
    $$;

    And perhaps, to do what you initially wanted to do (a batch “upsert”), you could use Tcl to split the sql_update and loop over the individual updates. The performance hit will be very small; see http://archives.postgresql.org/pgsql-performance/2006-04/msg00557.php

    The highest cost is executing the query from your code; on the database side the execution cost is much smaller.

  4. There is no simple command to do it.

    The most correct approach is to use a function, like the one from the docs.

    Another solution (although not that safe) is to do an update with RETURNING, check which rows were updated, and insert the rest of them.

    Something along the lines of:

    update table set column = x.column
    from (values (1,'aa'),(2,'bb'),(3,'cc')) as x (id, column)
    where table.id = x.id
    returning id;

    assuming id:2 was returned:

    insert into table (id, column) values (1, 'aa'), (3, 'cc'); 

    Of course it will fail sooner or later in a concurrent environment, as there is a clear race condition here, but usually it will work.

    Here’s a longer and more comprehensive article on the topic.

  5. Personally, I’ve set up a “rule” attached to the insert statement. Say you had a “dns” table that recorded dns hits per customer on a per-time basis:

    CREATE TABLE dns (
        "time" timestamp without time zone NOT NULL,
        customer_id integer NOT NULL,
        hits integer
    );

    You wanted to be able to re-insert rows with updated values, or create them if they didn’t exist already, keyed on the customer_id and the time. Something like this:

    CREATE RULE replace_dns AS
        ON INSERT TO dns
        WHERE (EXISTS (SELECT 1 FROM dns
                       WHERE ((dns."time" = new."time")
                         AND (dns.customer_id = new.customer_id))))
        DO INSTEAD
        UPDATE dns SET hits = new.hits
        WHERE ((dns."time" = new."time") AND (dns.customer_id = new.customer_id));

    Update: This has the potential to fail if simultaneous inserts are happening, as it will generate unique_violation exceptions. However, the non-terminated transaction will continue and succeed, and you just need to repeat the terminated transaction.

    However, if there are tons of inserts happening all the time, you will want to put a table lock around the insert statements: SHARE ROW EXCLUSIVE locking will prevent any operations that could insert, delete or update rows in your target table. However, updates that do not update the unique key are safe, so if no operation will do this, use advisory locks instead.

    Also, the COPY command does not use RULES, so if you’re inserting with COPY, you’ll need to use triggers instead.
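
    The two locking options mentioned above could be sketched roughly as follows. This is only an illustration, assuming the plain `dns` table without the rule installed; the timestamp, customer id, hit count, and the choice of the customer id as advisory-lock key are all made-up values.

    ```sql
    -- Variant 1: serialize writers with a table lock.
    BEGIN;
    -- SHARE ROW EXCLUSIVE conflicts with itself and with the ROW EXCLUSIVE
    -- lock taken by INSERT/UPDATE/DELETE, so concurrent writers wait here.
    LOCK TABLE dns IN SHARE ROW EXCLUSIVE MODE;

    UPDATE dns
    SET hits = 42
    WHERE "time" = TIMESTAMP '2024-01-01 00:00' AND customer_id = 7;

    INSERT INTO dns ("time", customer_id, hits)
    SELECT TIMESTAMP '2024-01-01 00:00', 7, 42
    WHERE NOT EXISTS (
        SELECT 1 FROM dns
        WHERE "time" = TIMESTAMP '2024-01-01 00:00' AND customer_id = 7
    );
    COMMIT;

    -- Variant 2: advisory lock keyed on the customer (hypothetical key choice).
    -- This only helps if every writer takes the same advisory lock first.
    BEGIN;
    SELECT pg_advisory_xact_lock(7);
    -- ... same UPDATE and INSERT as in variant 1 ...
    COMMIT;
    ```

    The transaction-level advisory lock is released automatically at COMMIT or ROLLBACK, so there is no separate unlock call.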

  6. I use this merge function:

    CREATE OR REPLACE FUNCTION merge_tabla(key INT, data TEXT) RETURNS void AS
    $BODY$
    BEGIN
        IF EXISTS(SELECT a FROM tabla WHERE a = key) THEN
            UPDATE tabla SET b = data WHERE a = key;
            RETURN;
        ELSE
            INSERT INTO tabla(a,b) VALUES (key, data);
            RETURN;
        END IF;
    END;
    $BODY$
    LANGUAGE plpgsql
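
    In the function above, the existence check and the INSERT are separate statements, so two concurrent sessions can both find no row and both attempt the insert. On PostgreSQL 9.5 or later, one possible alternative is a single ON CONFLICT statement. The sketch below is not the answerer’s code: the function name `merge_tabla_upsert` and the parameter names are hypothetical, and it assumes `tabla.a` carries a primary key or unique constraint.

    ```sql
    -- Assumed table layout, matching the columns used in the answer.
    CREATE TABLE IF NOT EXISTS tabla (a INT PRIMARY KEY, b TEXT);

    -- Hypothetical variant of merge_tabla built on ON CONFLICT (9.5+).
    CREATE OR REPLACE FUNCTION merge_tabla_upsert(p_key INT, p_data TEXT)
    RETURNS void AS
    $BODY$
    BEGIN
        INSERT INTO tabla (a, b)
        VALUES (p_key, p_data)
        ON CONFLICT (a) DO UPDATE SET b = excluded.b;
    END;
    $BODY$
    LANGUAGE plpgsql;

    SELECT merge_tabla_upsert(1, 'first');   -- inserts the row
    SELECT merge_tabla_upsert(1, 'second');  -- updates it: b is now 'second'
    ```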
  7. I customized the “upsert” function above, for when you want to INSERT AND REPLACE:

    CREATE OR REPLACE FUNCTION upsert(sql_insert text, sql_update text) RETURNS void AS
    $BODY$
    BEGIN
        -- first try to insert, and update afterwards.
        -- Note: the insert has the pk and the update does not...
        EXECUTE sql_insert;
        RETURN;
    EXCEPTION WHEN unique_violation THEN
        EXECUTE sql_update;
        IF FOUND THEN
            RETURN;
        END IF;
    END;
    $BODY$
    LANGUAGE plpgsql VOLATILE COST 100;

    ALTER FUNCTION upsert(text, text) OWNER TO postgres;

    Then, to execute it, do something like this:

    SELECT upsert($$INSERT INTO ...$$,$$UPDATE... $$) 

    It is important to wrap each statement in double-dollar quotes to avoid compiler errors.

    • check the speed…
  8. Similar to the most-liked answer, but works slightly faster:

    WITH upsert AS (
        UPDATE spider_count SET tally=1 WHERE date='today' RETURNING *
    )
    INSERT INTO spider_count (spider, tally)
    SELECT 'Googlebot', 1
    WHERE NOT EXISTS (SELECT * FROM upsert)

    (source: http://www.the-art-of-web.com/sql/upsert/)

  9. According to the PostgreSQL documentation of the INSERT statement, handling the ON DUPLICATE KEY case is not supported. That part of the syntax is a proprietary MySQL extension.

  10. I have the same issue for managing account settings as name-value pairs. The design criterion is that different clients could have different sets of settings.

    My solution, similar to JWP’s, is to bulk erase and replace, generating the merge record within your application.

    This is pretty bulletproof and platform independent, and since there are never more than about 20 settings per client, this is only 3 fairly low-load db calls – probably the fastest method.

    The alternative of updating individual rows, checking for exceptions, then inserting (or some combination thereof) is hideous code, slow, and often breaks because (as mentioned above) non-standard SQL exception handling changes from db to db, or even from release to release.

    # This is pseudo-code - within the application:
    BEGIN TRANSACTION - get transaction lock
    SELECT all current name value pairs where id = $id into a hash record
    create a merge record from the current and update record
      (set intersection where shared keys in new win, and empty values in new are deleted).
    DELETE all name value pairs where id = $id
    COPY/INSERT merged records
    END TRANSACTION
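
    On the database side, the pseudo-code above might translate into something like the following. This is a sketch under assumptions rather than the answerer’s actual code: the `client_settings` table, its column names, the client id 42, and the use of an advisory lock as the “transaction lock” are all hypothetical, and the merge step itself happens in the application between the SELECT and the DELETE.

    ```sql
    -- Hypothetical settings table holding one name-value pair per row.
    CREATE TABLE IF NOT EXISTS client_settings (
        client_id     integer NOT NULL,
        setting_name  text    NOT NULL,
        setting_value text,
        PRIMARY KEY (client_id, setting_name)
    );

    BEGIN;
    -- Serialize writers for this client (assumed locking strategy).
    SELECT pg_advisory_xact_lock(42);

    -- Call 1: read the current pairs into the application.
    SELECT setting_name, setting_value
    FROM client_settings
    WHERE client_id = 42;

    -- (the application merges current and new values here)

    -- Call 2: erase the client's existing pairs.
    DELETE FROM client_settings WHERE client_id = 42;

    -- Call 3: write the merged set back.
    INSERT INTO client_settings (client_id, setting_name, setting_value)
    VALUES (42, 'theme', 'dark'),
           (42, 'language', 'en');
    COMMIT;
    ```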
  11. CREATE OR REPLACE FUNCTION save_user(_id integer, _name character varying) RETURNS boolean AS
    $BODY$
    BEGIN
        UPDATE users SET name = _name WHERE id = _id;
        IF FOUND THEN
            RETURN true;
        END IF;
        BEGIN
            INSERT INTO users (id, name) VALUES (_id, _name);
        EXCEPTION WHEN OTHERS THEN
            UPDATE users SET name = _name WHERE id = _id;
        END;
        RETURN TRUE;
    END;
    $BODY$
    LANGUAGE plpgsql VOLATILE STRICT
  12. For merging small sets, using the above function is fine. However, if you are merging large amounts of data, I’d suggest looking into http://mbk.projects.postgresql.org

    The current best practice that I’m aware of is:

    1. COPY new/updated data into temp table (sure, or you can do INSERT if the cost is ok)
    2. Acquire Lock [optional] (advisory is preferable to table locks, IMO)
    3. Merge. (the fun part)
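
    The three steps above could be sketched like this. It is a minimal illustration, assuming a target table `mytable (id, field1, field2)` like the one used in an earlier answer; the staging table name and the advisory-lock key 12345 are hypothetical, and a plain INSERT stands in for COPY so that the example is self-contained.

    ```sql
    -- Assumed target table (hypothetical definition).
    CREATE TABLE IF NOT EXISTS mytable (
        id     integer PRIMARY KEY,
        field1 text,
        field2 text
    );

    BEGIN;

    -- Step 1: load new/updated data into a temp table
    -- (for large loads, COPY into staging instead of INSERT).
    CREATE TEMP TABLE staging (LIKE mytable) ON COMMIT DROP;
    INSERT INTO staging (id, field1, field2)
    VALUES (1, 'A', 'X'), (2, 'B', 'Y'), (3, 'C', 'Z');

    -- Step 2 (optional): take an advisory lock so that concurrent
    -- merges using the same key run one at a time.
    SELECT pg_advisory_xact_lock(12345);

    -- Step 3a: update the rows that already exist in the target.
    UPDATE mytable m
    SET field1 = s.field1,
        field2 = s.field2
    FROM staging s
    WHERE m.id = s.id;

    -- Step 3b: insert the rows that are still missing.
    INSERT INTO mytable (id, field1, field2)
    SELECT s.id, s.field1, s.field2
    FROM staging s
    WHERE NOT EXISTS (SELECT 1 FROM mytable m WHERE m.id = s.id);

    COMMIT;
    ```

    On 9.5 or later, steps 3a and 3b could presumably be collapsed into one `INSERT ... SELECT ... ON CONFLICT (id) DO UPDATE` statement.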
  13. Edit: This does not work as expected. Unlike the accepted answer, this produces unique key violations when two processes repeatedly call upsert_foo concurrently.

    Eureka! I figured out a way to do it in one query: use UPDATE ... RETURNING to test if any rows were affected:

    CREATE TABLE foo (k INT PRIMARY KEY, v TEXT);

    CREATE FUNCTION update_foo(k INT, v TEXT) RETURNS SETOF INT AS $$
        UPDATE foo SET v = $2 WHERE k = $1 RETURNING $1
    $$ LANGUAGE sql;

    CREATE FUNCTION upsert_foo(k INT, v TEXT) RETURNS VOID AS $$
        INSERT INTO foo SELECT $1, $2 WHERE NOT EXISTS (SELECT update_foo($1, $2))
    $$ LANGUAGE sql;

    The UPDATE has to be done in a separate procedure because, unfortunately, this is a syntax error:

    ... WHERE NOT EXISTS (UPDATE ...) 

    Now it works as desired:

    SELECT upsert_foo(1, 'hi');
    SELECT upsert_foo(1, 'bye');
    SELECT upsert_foo(3, 'hi');
    SELECT upsert_foo(3, 'bye');
  14. UPDATE will return the number of modified rows. If you use JDBC (Java), you can then check this value against 0 and, if no rows have been affected, fire INSERT instead. If you use some other programming language, the number of modified rows can probably still be obtained; check its documentation.

    This may not be as elegant, but you have much simpler SQL that is easier to use from the calling code. By contrast, if you write the ten-line script in PL/pgSQL, you should probably have a unit test of one kind or another just for it alone.

  15. Sometimes I need it in objects for xmlhttp calls, so I do it like this:

    timestamp : parseInt(new Date().getTime()/1000, 10) 
  16. var d = new Date(); console.log(d.valueOf()); 
  17. Get TimeStamp In JavaScript

    In JavaScript, a timestamp is the number of milliseconds that have passed since January 1, 1970.

    If you don’t intend to support < IE8, you can use

    new Date().getTime();
    +new Date();
    Date.now();

    to directly get the timestamp without having to create a new Date object.

    To return the required timestamp

    new Date("11/01/2018").getTime() 
  18. // if you need 10 digits
    alert('timestamp ' + ts());
    function ts() { return parseInt(Date.now()/1000); }

  19. var my_timestamp = ~~(Date.now()/1000);

  20. function getTimeStamp() {
        var now = new Date();
        return ((now.getMonth() + 1) + '/' + (now.getDate()) + '/' + now.getFullYear() + " " +
                now.getHours() + ':' +
                ((now.getMinutes() < 10) ? ("0" + now.getMinutes()) : (now.getMinutes())) + ':' +
                ((now.getSeconds() < 10) ? ("0" + now.getSeconds()) : (now.getSeconds())));
    }
  21. There are many ways to do it:

    Date.now()
    new Date().getTime()
    new Date().valueOf()

    To get the timestamp in seconds, convert it using:

    Math.floor(Date.now() / 1000) 
  22. Here is another solution to generate a timestamp in JavaScript – including a padding method for single numbers – using day, month, year, hour, minute and seconds in its result (working example at jsfiddle):

    // Prefix single-digit numbers with the string "0" (not the number 0).
    var pad = function(int) { return int < 10 ? "0" + int : int; };
    var timestamp = new Date();
    timestamp.day = [
        pad(timestamp.getDate()),
        pad(timestamp.getMonth() + 1), // getMonth() returns 0 to 11.
        timestamp.getFullYear()
    ];
    timestamp.time = [
        pad(timestamp.getHours()),
        pad(timestamp.getMinutes()),
        pad(timestamp.getSeconds())
    ];
    timestamp.now = parseInt(timestamp.day.join("") + timestamp.time.join(""), 10);
    alert(timestamp.now);
  23. To get time, month, day, year separately this will work

    var currentTime = new Date();
    var month = currentTime.getMonth() + 1;
    var day = currentTime.getDate();
    var year = currentTime.getFullYear();
  24. time = Math.round(((new Date()).getTime()-Date.UTC(1970,0,1))/1000); 

Tags: postgresql, upsert