Extensibility is PostgreSQL's DNA, lies in its original design.
This design philosophy grants PostgreSQL a lot of unique capabilities, one of them is its extension system. With Postgres extension, 3rd parties can extend the Postgres capabilities without touching any Postgres core.
Today most production Postgres deployments run some extensions. Below we present those most commonly used ones.
Extension | Capability |
---|---|
pg_stat_statements | Collect execution stats |
PostGIS | Process geospatial data |
postgres_fdw | Query external PostgreSQL data |
uuid-ossp | Generate UUID |
pg_cron | Schedule job inside database |
timescaledb | Process time-series data |
pgvector | Process vectorized data |
pg_stat_statements
pg_stat_statements provides a means for tracking planning and execution statistics of all SQL statements executed by a server. When pg_stat_statements is active, it tracks statistics across all databases of the server.
The statistics gathered by the module are made available via a view named pg_stat_statements
.
Note that the pg_stat_statements extension only tracks queries that have been executed since it was enabled. If you want to track all queries, you should enable the extension at server start-up by adding the following line to your postgresql.conf file:
To find the top 10 queries by total execution time:
PostGIS
PostGIS extends the PostgreSQL by adding support storing, indexing and querying geographic data. PostGIS is the most complex Postgres extension and a testimony of the Postgres powerful extension system.
To find the nearest city to a given point:
-
Say we have the following table of cities with their locations represented by points. Note, the
location
column has a GEOMETRY type which is provided by the PostGIS extension. -
To find the nearest place to a given point, you can use the ST_Distance function to calculate the distance between the point and each place in the table, and then sort the results by distance. For example, the following command finds the nearest city to the point (-74.005941, 40.712784), which is the location of New York City:
This query calculates the distance between each city in the cities table and the point (-74.005941, 40.712784), and sorts the results by distance using the <->
operator. The LIMIT 1 clause returns only the nearest city.
Note that the ST_Distance function returns the distance between two points in meters by default. You can convert the result to a different unit of measurement by using the appropriate PostGIS function, such as ST_Distance_Sphere for distance in kilometers.
postgres_fdw
postgres_fdw can be used to access data stored in external PostgreSQL servers. postgres_fdw
is the successor of the old dblink
extension. postgres_fdw
provides more transparent and standards-compliant syntax for accessing remote tables, and can give better performance in many cases.
With postgres_fdw
, you can query other Postgres database.
-
Create a new database that you want to query. For example, let's create a database named my_other_database:
-
Connect to the database where you want to create the foreign table (in this example, we'll use the default postgres database).
-
Create a user mapping for the user that will access the remote database. For example, if you want to use the same user that you are currently connected as, you can run the following command:
-
Create a foreign server definition using the postgres_fdw extension.
This command creates a server definition named
my_other_database_server
using the postgres_fdw foreign data wrapper and the dbname option set tomy_other_database
. -
Create a foreign table definition in the local database that maps to a table in the remote my_other_database database.
This command creates a foreign table named my_other_table in the local database that maps to a table named my_table in the public schema of the my_other_database database.
-
Use the foreign table in queries just like you would a regular table.
You can also join the foreign table with local tables in your queries, just like you would with regular tables.
Note that when using postgres_fdw to query a remote database on the same PostgreSQL instance, you may need to adjust the postgresql.conf file and restart the PostgreSQL server to enable access to the pg_hba.conf file for the other database.
uuid-ossp
uuid-ossp provides functions to generate universally unique identifiers (UUIDs) using one of several standard algorithms. Postgres already has built-in function gen_random_uuid()
to generate a version 4 (random) UUID. If you want to generate other UUID version, you need to use uuid-ossp
.
To generate a version 5 UUID:
This command generates a UUID version 5 based on the namespace identifier for URLs (uuid_ns_url()) and the name string 'example.com'. The output will look something like this: f1f5d9f0-2a4c-5f24-9536-3f1f69e68a7e
.
You can also create your own namespace identifier using the uuid-ossp function uuid_ns_create().
This command creates a namespace identifier using the name 'example' and returns it as a UUID.
You can then use this namespace identifier with uuid_generate_v5() to generate UUIDs based on that namespace and a name string.
Note that UUID version 5 is recommended for use in applications where security is a concern, as it is generated using a SHA-1 hash of the namespace identifier and name string, which is less susceptible to collisions than other UUID versions.
pg_cron
pg_cron is a simple cron-based job scheduler that runs inside the database as an extension. It uses the same syntax as regular cron, but it allows you to schedule PostgreSQL commands directly from the database.
The schedule uses the standard cron syntax.
-
Create a new cron job by running the following command
-
Verify that the cron job has been created
-
View the status of running and recently completed job
timescaledb
timescaledb provides optimized storage and querying of time-series data.
-
Create a hypertable
A hypertable is a special type of table in TimescaleDB that is designed for storing and querying time-series data. You can create a hypertable using the CREATE_HYPERTABLE function.
-
Insert some data into the sensor_data table
-
Query the data
TimescaleDB provides a number of optimized functions for working with time-series data, such as time_bucket for aggregating data into time intervals. For example, to calculate the average value for each hour of data, you can run the following query:
pgvector
pgvector is an extension for PostgreSQL that provides support for vector processing. It allows you to perform vectorized operations on groups of data, which can provide significant performance improvements for certain types of queries.
To get the nearest neighbors to a vector:
-
Create a new table with a vector column
-
Insert vectors
-
Query the nearest neighbors to a vector
Neon recently announced a simlar extension pg_embedding
. It claims to be 20x faster than pgvector.
Summary
Postgres extension is a key differentiator from its main alternative MySQL. If the business requires geospatial processing, then Postgres is the only choice thanks to the PostGIS extension. For a complete comparison, please read Postgres vs. MySQL.