MySQL vs PostgreSQL
MySQL | PostgreSQL | |
---|---|---|
VS
|
||
http://www.mysql.com/ | http://postgresql.org |
MySQL vs PostgreSQL is a decision many must make when approaching open-source relational databases management systems. Both are time-proven solutions that compete strongly with proprietary database software. MySQL has long been assumed to be the faster but less full-featured of the two database systems, while PostgreSQL was assumed to be a more densely featured database system often described as an open-source version of Oracle. MySQL has been popular among various software projects because of its speed and ease of use, while PostgreSQL has had a close following from developers who come from an Oracle or SQL Server background.
These assumptions, however, are mostly outdated and incorrect. MySQL has come a long way in adding advanced functionality while PostgreSQL dramatically improved its speed within the last few major releases. Many, however, are unaware of the convergence and still hold on to stereotypes based on MySQL 4.1 and PostgreSQL 7.4. The current versions are MySQL 5.5 and PostgreSQL 9.1.
Contents[hide]
|
[edit] Architecture
PostgreSQL is a unified database server with a single storage engine. MySQL has two layers, an upper SQL layer and a set of storage engines. When comparing the two it's typically necessary to specify which storage engines are being used with MySQL because that greatly affects suitability, performance and (even basic) feature availability. The most commonly used storage engines in MySQL are InnoDB for almost full ACID support and high performance on large workloads with lots of concurrency and MyISAM for lower concurrency workloads or higher concurrency read-mostly workloads that don't need ACID properties. Applications can combine multiple storage engines as required to exploit the advantages of each.
[edit] Performance
Database systems can be optimized according to the environment they run in. Thus, it is very difficult to give an accurate comparison in performance without paying attention to configuration and environment. PostgreSQL and MySQL both employ various technologies to improve performance.
[edit] Beginnings
MySQL began development with a focus on speed while PostgreSQL began development with a focus on features and standards. Thus, MySQL was often regarded as the faster of the two. The default configuration for both is tuned to run on small systems, and it's common for people performing benchmark tests to either not change the defaults, or properly tune only the one they are most familiar with. Either action will usually give misleading results. Furthermore, both DBMS's will do better in benchmarks related to their original strengths (i.e. MySQL fast in simple stuff, PostgreSQL more reliable and faster in complex stuff).
[edit] Raw Speed
[edit] PostgreSQL
PostgreSQL provides significant performance features
- efficient executor for both static SQL or parameterised SQL
- advanced cost-based optimizer, with many plan choices and adaptive statistics collection
- indexing: partial, functional, multiple-index-combining, 4 different kinds of index
- TOAST data compression
- improved cache management in versions 8.1 and 8.2
- huge scalability on write intensive workloads from 8.2+
- asynchronous commit ("MyIsam for Postgres")
- Asynchronous Replication built-in from 9.0+
- Synchronous Replication built-in from 9.1
The 8.x releases have added more than 75 new discrete performance features. These have been added as a result of a multi-year project to improve performance by steadily identifying and removing key bottlenecks in scalability, as well as adding low-level tuning and architectural features.
[1] shows feature set added for the 8.3 release, for example.
PostgreSQL can compress and decompress its data on the fly with a fast compression scheme to fit more data in an allotted disk space. The advantage of compressed data, besides saving disk space, is that reading data takes less IO, resulting in faster data reads.
PostgreSQL supports one storage engine, with tight integration between that storage engine and the rest of the database. Options like asynchronous commit can be set on a per-transaction, per-user or whole system basis, allowing different transaction types to co-exist efficiently without the need to select storage engine types once for each table ahead of time.
By default, PostgreSQL comes tuned to run on a shared server, so has low performance settings. When running on a dedicated server performance can be improved by changes to a few key parameters.
[edit] MySQL:core
MySQL 5.1 natively supports 9 storage engines [2]:
- MyISAM
- InnoDB
- NDB Cluster
- MERGE
- MEMORY (HEAP)
- FEDERATED
- ARCHIVE
- CSV
- BLACKHOLE
However, the federated and blackhole engines are not actually "storage" engines (for example, "blackhole" does not store anything).
There are several externally-developed storage engines, some of the most popular are:
- Aria (crash-safe MyISAM)
- XtraDB (feature enhanced InnoDB fork)
- NitroEDB
- BrightHouse
MySQL has several custom and community storage engines under development:
- PrimeBase XT
- FederatedODBC
- FederatedX
- IBM DB2
- memcached
In some distributions, the default storage engine is MyISAM, which is not transaction safe. Setting the default engine to a transactional engine such as InnoDB is, however, trivial. Beginning with MySQL 5.5.1 InnoDB is the default storage engine.
MySQL has a query cache that does simple string matching before the parser to see whether a query has been processed recently and rapidly returns the result to the client application if it has, without the need to do any of the traditional database work. This is of considerable value to many read-mostly workloads. Cached queries are removed whenever any table involved in the query is changed so its usefulness declines as the rate of data changes increases.
The query cache runs on a single thread and must consider each select, so it may eventually become a performance bottleneck at some point beyond 8 cores, but that's not usually the case. It can be turned off easily to check this and to see whether its small overhead is worthwhile for the particular workload.
MySQL also supports network protocol-level compression which is an option that can be turned on by the client if the server allows it. This compresses everything to and from the server.
[edit] MySQL:MyISAM
MyISAM is the traditional MySQL storage engine and is often better for read-mostly workloads. MySQL's MyISAM engine performs faster than PostgreSQL on simple queries and when concurrency is low or follows certain patterns (e.g. count(*) is very fast). MyISAM's speed comes at the cost of not supporting transactions, foreign keys, and not offering guaranteed data durability.
In old versions, MyISAM required exclusive access for SELECT or data-changing operations. Recent versions default to allowing updates at the end concurrently if there are no deleted records and offer the option to ignore deleted record free space and always operate concurrently, or to disable concurrent inserts.
MyISAM supports leading prefix compression for keys and supports a read only compressed table form that compresses data as well.
System tables always use the MyISAM storage engine. This has been used to criticise it for the potential of losing system table information, but some claim that this doesn't happen "in practice" as it is unlikely that such generally infrequently updated tables will be in the process of being written to at the time of a crash.
[edit] MySQL:InnoDB
InnoDB is an ACID compliant, transactional storage engine using MVCC technology. It's the normal choice for most modern applications using MySQL.
The InnoDB storage engine stores the data with the primary key, so primary key lookups are fast. Good choice of primary key for physical optimisation can be very useful; in cases where it's undesirable or where the desired primary key produces poor physical performance a simple integer can be used. An internal integer primary key is the default if no primary key or unique column is present.
The InnoDB engine automatically generates hash index entries when processing SELECTs. This feature can be turned off if necessary; some workloads perform better without it.
The InnoDB engine has an insert buffer that caches updates to secondary index entries and applies them in the background. This can significantly speed up inserts, reducing the number of physical writes required by combining many updates. If a secondary index page has outstanding updates when it is needed for a query the updates will be merged first. As of version 5.5 the insert buffer is also used as a buffer for other types of writes, improving the performance of UPDATE queries as well.
With the InnoDB installed via Plugin MySQL 5.1 supports on-the-fly compression of InnoDB tables.
Beginning with this release of the InnoDB Plugin, you can use the attributes ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE in the CREATE TABLE and ALTER TABLE commands to request InnoDB to compress each page to 1K, 2K, 4K, 8K, or 16K bytes.
— InnoBASE OY
, 3.2. Specifying Compression
Despite major changes in ownership in recent years, InnoDB performance has received continuing development attention.
With InnoDB, the secondary key lookup is slow, because the leaves of its secondary indices are the primary keys. So it has to do 2 lookups when using a secondary index: one for the secondary index itself, and one for the primary key.
[edit] MySQL:NDB Cluster
NDB is a high performance, highly available, (mostly) in-memory storage engine. Non-indexed attributes may be stored on disk. Data and logs are automatically flushed to disk periodically to mitigate data loss in the event of a cluster failure. NDB is used heavily in Telecommunications applications where uptime and real time performance is critical. NDB transparently organizes records into fragments which are distributed evenly to all nodes in the cluster. NDB uses synchronous replication internally to ensure records are committed to at all nodes within a logical node group before returning a COMMIT. This two phase commit gives NDB the ability to support sub-second failover. Further to this NDB also supports automatic node recovery. NDB supports the online introduction of additional nodes to the cluster without impacting availability of the application.
Significant Limitation: The distributed nature of the tables makes NDB perform poorly for complex JOIN operation as compared to traditional storage engines. This has been resolved in version ndb-7.2.1 and later with Adaptive Query Localization (distributed pushed-down join, formerly SPJ)
PostgreSQL: PostgreSQL has no comparable solution.
[edit] MySQL:Archive
MySQL supports on-the-fly compression since version 5.0 with the ARCHIVE storage engine. Archive is a write-once, read-many storage engine, designed for historical data. It compresses data up to 90%. It does not support indexes. In version 5.1 Archive engine can be used with partitioning.
[edit] MySQL:Aria
Aria is a fully ACID compliant, storage engine being incrementally developed by Monty Widenius and a team of other former MySQL developers. The incremental development model means that the full ACID feature set is gradually evolving.
[edit] Multi-Processor
Historically MySQL has focused more on scale-out than scaleup and PostgreSQL is often considered to scale better with large numbers of cores or uncommonly high concurrency levels.
With the increased availability of multi-core systems MySQL has been working to improve this, through both new development (5.5 and 5.6) and the incremental development of 5.0 via bug fixes that remove bottlenecks. The continuous incremental improvements without major or minor version number changes makes it vital to compare with recent builds of the server, not just looking at the main version number. A partial list of versions with significant performance improvements is 5.0.30, 5.0.54 and 5.0.58 and any benchmark result older than those versions should be considered to be particular to older installations, especially in the areas of concurrency and scalability.
Along with MySQL's internal work on concurrency
many patches from Google and Percona hold promise and some have been incorporated into the standard MySQL server in version 5.4.
[edit] Asynchronous I/O
PostgreSQL supports a full fledged asynchronous API for use by client applications. It is reported to increase performance by up to 40% in some cases. MySQL lacks Async support, though some drivers have been created to attempt to overcome this deficiency (perl ruby).
[edit] COUNT(*)
Transactional databases which implement MVCC such as PostgreSQL and InnoDB perform COUNT(*) in a way that is very slow compared to non-transactional databases like MyISAM. The MyISAM engine in MySQL uses an index scan for COUNT(*) and also caches the result of the count, thus it is much faster. PostgreSQL and InnoDB require a table scan to locate all visible rows. These MVCC capable engines implement COUNT(*) this way because MVCC stores transaction visibility in the row data as opposed to the index. With MVCC capable databases, caching the COUNT(*) would result in incorrect data being returned. PostgreSQL 9.2 will have index-only scan support which uses the visibility map feature to determine whether a row is visible to the current transaction rather than visiting the page. This means dramatically faster COUNT(*) results. PostgreSQL 9.2 index-only scans PostgreSQL Slow Count() Workaround InnoDB COUNT(*)
[edit] Benchmarks
Speed comparisons tend to depend on specific workloads and be used long after the versions involved are out of date, for example, tweakers.net and jamonation tests that precede many improvements that were implemented in MySQL 5.0 and the subsequent MySQL and Oracle releases.
It's still true that in some benchmarks PostgreSQL may scale better with large numbers of cores at high concurrency levels. Equally, there are workloads where the same is true for MySQL, so the usual advice to test your own application continues to apply. PostgreSQL excels at workloads with a significant write workload[citation needed].
A new benchmark for MySQL 5.0.51 and MySQL 5.1.30 with InnoDB 1.0.3 (MySQL 5.4 contains the patches from InnoDB 1.0.3) compared with PostgreSQL 8.3.7 suggests that MySQL and PostgreSQL are almost equal in terms of scalability, by at least one standard of measure.
While benchmarks can be an indicator of performance, MySQL performance in many subquery / join statements is behind PostgreSQL due to the inferior Query Optimizer (a much wider benchmark would show PostgreSQL as a clear winner in this case).[citation needed]. Recent enhancements to the MySQL optimizer provide large gains in the performance of subquery / join statements.
[edit] ACID Compliance
ACID stands for Atomicity, Consistency, Isolation and Durability. This model is used to judge data integrity across database management systems. Most database systems achieve ACID compliance by using transactions AND enforcing triggers/ foreign keys (it's forgetting about this part that prevents consistency, as in InnoDB).
PostgreSQL is fully ACID compliant.
InnoDB engine is fully ACID compliant, but using InnoDB in MySQL fails ACIDity, because MySQL doesn't propagate the triggers of foreign keys for this engine. InnoDB is a storage engine, it does not execute the queries. Until MySQL is ACID compliant, outside of the storage engines, any storage engine running on it will not conform.
Some other MySQL engines are said to be ACID compliant, like Falcon / Cluster.
PostgreSQL is acknowledged as having a more rigorous approach to robustness and data integrity.
[edit] Features
PostgreSQL and MySQL both have an impressive array of features that increase data integrity, functionality, and performance. The features included in a database may help improve performance, ease of use, functionality, or stability.
[edit] Ease of use
A "gotcha" is a feature or function which works as advertised - but not as expected.
— http://sql-info.de/mysql/gotchas.html
PostgreSQL supporters claim that MySQL has more "gotchas"[3] than PostgreSQL[4], due to its deviation from the SQL standard, and its various functional limitations which may not seem intuitively obvious to a new user. While retaining backwards compatibility, MySQL has introduced various SQL modes that increase standards compliance and produce results that may be more expected by those unfamiliar with MySQL.
In certain aspects, such as case sensitivity in CHAR fields, MySQL's
behaviour [...] is the diametrical opposite of the default behaviour of most other databases. (Tested: DB2 8.1, Firebird 1.5.1, Oracle 8.1.7 and PostgreSQL 7.4.3 [...])
— http://sql-info.de/mysql/gotchas.html#1_4
[edit] Insert Ignore / Replace
MySQL supports the statements, 'INSERT IGNORE', which inserts if a row doesn't exist, and 'REPLACE', which replaces the current row.
Currently PostgreSQL supports neither of these statements and suggests using stored procedures to get around the lack of these statements. However, there are major shortcomings:
it can only insert a single value at a time. This is a major performance limitation, and also suffers concurrency issues. INSERT IGNORE and REPLACE handle multi-valued inserted much more gracefully.
— Robin Johnson
, PostgreSQL Vs. MySQL for INSERT IGNORE + REPLACE - stored procedures, savepoints and beyond
A similar MySQL feature INSERT ... ON DUPLICATE UPDATE
is also missing from PostgreSQL and requires use of stored procedures or RULE
s which can work only on one row at a time.
PostgreSQL will feature the MERGE clause in a future version [5], likely in 9.1, which follows the SQL:2008 standard. This can do the same as MySQL's non-standard 'INSERT IGNORE','REPLACE' and 'INSERT ... ON DUPLICATE UPDATE' statements but with more granularity.
[edit] Constraints
Both PostgreSQL and MySQL support Not-Null, Unique, and Primary Key constraints. Foreign Key constraints are supported by PostgreSQL and by MySQL's InnoDB storage engine, but not other engines. However MySQL silently ignores the CHECK constraint which PostgreSQL has supported for a long time.
InnoDB tables support checking of foreign key constraints... For other storage engines, MySQL Server parses and silently ignores the FOREIGN KEY and REFERENCES syntax in CREATE TABLE statements.
—MySQL AB
,MySQL 5.1 Reference Manual:: 12.1.9. CREATE TABLE Syntax
For MySQL engines not supporting foreign key constraints or to implement constraints between tables in differing engines, triggers can be used to check the constraint, although this is not a guarantee as it's prone to race conditions.
Also, the support of ForeignKeys in MySQL is not yet complete, as cascade actions are not considered as SQL statements and thus do not fire triggers etc. (they only cascade further)
[edit] Default Values
PostgreSQL allows for any function marked as IMMUTABLE or STABLE to be used as the default value for a column. Currently, NOW() is the only function that can be used as a default value in a MySQL table -- and can only be applied to one column per table, on TIMESTAMP columns only.
[edit] Stored Procedures
MySQL supports stored procedures, per se; PostgreSQL supports stored functions, which are in practice very similar.
The first query language for PostgreSQL, PL/pgSQL, is similar to Oracle's PL/SQL. PostgreSQL supports SQL:2003 PSM stored procedures as well as many other general purpose programming languages such as Perl (PL/Perl), Python (PL/Python), TCL (PL/Tcl), Java (PL/Java) and C (PL/C).
MySQL follows the SQL:2003 syntax for stored routines, which is also used by IBM's DB2.
—MySQL AB
,MySQL 5.1 Reference Manual:: 18 Stored Procedures and Functions
Via the plugin interface MySQL supports external language stored procedures in Java, Perl, XML-RPC with more language plugins in the works.
[edit] Triggers
Both PostgreSQL and MySQL support triggers. A PostgreSQL trigger can execute ANY user-defined function from any of its procedural languages, not just PL/pgsql.
MySQL triggers are activated by SQL statements only. They are not activated by changes in tables made by APIs that do not transmit SQL statements to the MySQL Server; in particular, they are not activated by updates made using the NDB API.
—MySQL AB
,MySQL 5.1 Reference Manual:: 19 Triggers
MySQL triggers are also not activated by cascading updates and deletes even when caused by a SQL statement (this is against the standard), since those are a feature of the InnoDB engine rather than the database as a whole. In these cases, MySQL silently ignores the triggers without issuing a warning (i.e. in other dbms's you must be extra careful to not pop triggers accidentally, in MySQL you must be extra careful that many of your statements will not activate the triggers, such as cascades, etc.).
PostgreSQL also supports "rules," which allow operating on the query syntax tree, and can do some operations more simply that are traditionally done by triggers. However, the rule system is on the way out in favour of (more powerful) triggers.
Syntax for definition of triggers in PostgreSQL isn't as straightforward as in MySQL. PostgreSQL requires separate definition of a function with specific data type returned (this is the general behaviour of PostgreSQL = more strict vs MySQL = less strict. The syntax in itself is as straightforward but the implementation is stricter and much more powerful (any user function), thus a bit harder to learn). On the upside, PostgreSQL supports multiple actions per trigger using OR (e.g. BEFORE INSERT OR UPDATE).
MySQL does NOT support multiple triggers of the same type (i.e. maximum one ON UPDATE BEFORE and one ON UPDATE AFTER trigger) on the same table, whereas PostgreSQL does (If multiple triggers of the same kind are defined for the same event, they will be fired in alphabetical order by name. > from the manual).
PostgreSQL, as of 9.1, supports TRIGGERS on views, MySQL does not.
[edit] Replication and High Availability
Replication is a database management system's ability to duplicate its stored data for the purposes of backup safety and is one way to prevent database downtime. PostgreSQL and MySQL both support replication:
[edit] PostgreSQL
PostgreSQL has built-in asynchronous replication as of 9.0, consisting of streaming replication [6] and hot standby [7]. PostgreSQL streams the write-ahead log data which replays activity on the database guaranteeing identical results. MySQL's older form of statement based replication could introduce slave inconsistency when executing non-deterministic statements. MySQL now warns when using potentially unsafe statements, switching to a ROW image based log format.
There are several packages that also provide replication in PostgreSQL:
- PGCluster
- Slony-I
- DBBalancer
- pgpool
- PostgreSQL table comparator
- SkyTools
- Sequoia
- Bucardo
- Mammoth Replicator
- Cybercluster
- GridSQL (shared-nothing)
- rubyrep (asynchronous, master-master)
- bondreplicate (asynchronous, master-master)
It is a common misconception that these "third-party packages" are somehow less well integrated. Slony, for example, was designed and built by Jan Wieck, a PostgreSQL core team member, and has a number of other members of the PostgreSQL community involved in its ongoing design and maintenance. However, Slony is considerably slower and uses more resources than built-in replication, as it uses SQL and triggers rather than binary log shipping to replicate the data across servers. That may make it less suitable for larger cluster deployments with high performance demands.
[edit] Slony-I Replication Weakness
Slony-I, the most widely used PostgreSQL replication tool, is inherently inferior to MySQL's built in replication for a number of reasons. First, it uses SQL and triggers to replicate the data across servers. This is considerably slower than MySQL's binary log shipping and makes the communication costs much higher. Second, Slony-I's communication costs grow quadratically in relation to the number of servers in the replication pool (Order(n^2)). This makes it inherently unusable for larger clusters. If we conservatively figure that Slony-I's SQL/trigger method takes twice as much communication as MySQL's binary log shipping, we can easily see how poorly this would work for larger clusters in the real world.
With two servers:
MySQL: 2 = 2
PostgreSQL: 2*2^2 = 8
With 4 servers:
MySQL: 4 = 4
PostgreSQL: 2*4^2 = 32
With 12 servers:
MySQL: 12 = 12
PostgreSQL: 2*12^2 = 288.
While Slony-I is adequate for high availability with two servers, its communication costs are simply prohibitive for scaling out.
Note from Jan Wieck: I don't quite understand how the author of the above came up with those numbers. It seems to refer to the SYNC events and their confirmations. Slony does create these internal housekeeping messages on all nodes and they are transported to all other nodes. But it doesn't transport all user data changes back and forth in that fashion. These are small (maybe 30-100 byte messages) that occur once every couple of seconds. Claiming that with 2 nodes it has 4 times MySQL's communication volume or with 12 nodes 24 times that is outright wrong. Also, Slony does allow cascading of nodes, where one replica will act as a multiplier, serving more replicas in a star like cluster configuration. There is no need to pull directly from the origin.
Slony-I is also difficult to administer.
PGCluster is not useful for situations where high-performance and a decent amount of writes are to be expected. This is because it is a synchronous replication system which waits until a write has happened on all machines in the cluster rather. However, in situations that have very few writes and require data to be absolutely consistent across each database, PGCluster can be a good tool to use.
[edit] PostgreSQL Synchronous Replication
PostgreSQL 9.1 comes with support for synchronous replication. It thus allows for the popular star-topology configurations with one write and multiple read-only slaves with slaves capable of stepping up in case the primary goes down.
[edit] MySQL
MySQL ships with support for asynchronous replication. In this form of replication a log of events is transmitted to the slaves. These slaves must apply the statements or rows in this log to each of the slave servers independently. This has the limitation that each slave may have a slightly different view of the data depending upon the length of their lag in transferring and applying this log. Prior to 5.1 non-deterministic statements in this log can cause records to be inserted or updated differently on each slave [8]
Starting with version 5.1, MySQL supports two forms of replication; statement based replication (SBR) and row based replication (RBR). SBR, used prior to 5.1, collects SQL queries which affect changes to the database in a binary log which the slave servers subscribe to for their changes. RBR instead records the incremental row changes themselves in the binary log that are then applied to the slave. RBR is used automatically when non-deterministic queries are executed on the master. The storage engines NDB and InnoDB, in certain cases, only support replication using this new row based binlog format.
[9]
Starting with version 5.5 MySQL supports semi-synchronous [10] replication in which the master confirms receipt of the log by at least one slave before returning from a COMMIT. This allows for greater data integrity and simplified slave promotion in the event of a catastrophic failure of the master.
Within the NDB storage engine MySQL supports scalable synchronous replication between ndbd nodes using a two phase commit protocol which does not rely upon a binary log. The two phase commit protocol still exhibits excellent performance due to the in-memory nature of NDB. Once an update has been committed to memory on two nodes it is considered durable. [11] Replication between two NDB clusters or between NDB and tables in another engine are possible via the asynchronous replication that comes standard with MySQL.
In addition to the built-in asynchronous and semi-synchronous replication methods provided by MySQL Server, additional 3rd party solutions exists to provide synchronous replication.
- Tungsten Replicator
- Galera Replication
[edit] DataTypes
PostgreSQL does not have an unsigned integer data type (which is not a standard one and can be replaced by a SQL DOMAIN in PG), but it has a much richer data type support in several aspects: standards compliance, the logically fundamental data type BOOLEAN, IP address and networks, user-defined data types mechanism, built-in and contributed data types.
PostgreSQL allows columns of a table to be defined as variable-length multidimensional arrays. Arrays of any built-in or user-defined base type, enum type, or composite type can be created. Arrays of domains are not yet supported. [12]
MySQL does not have network IP address data types that PostgreSQL has but does provide INET_ATON() and INET_NTOA() functions to convert IPv4 addresses to and from integers, which are easily stored.
Also, PostgreSQL supports a range of IP-related functions like checking whether a range is part of another range, etc.
[edit] Subqueries
Both MySQL and PostgreSQL support subqueries. Support for them is more recent in MySQL and performance is still being improved for some types of subquery that may have already been optimised in PostgreSQL. It should also be noted that MySQL does not allow for a subquery in a view. This feature was removed after 4.1 and as of 5.5 was still not available. Workarounds include nested views, joins and hacking the source code.
However, it must be extremely clear that there remains a lot of basic subqueries with major optimization problems in current MySQL versions (5.5.x), which HAVE to be worked around..
Also, be careful as the way "NOT IN" works is not the same across DBMS's (i.e. you need inner/outer value null checks in mysql, they're integrated in postgresql, etc.).
[edit] Joins
Both MySQL and PostgreSQL support joins. As for subqueries, the support is more recent in MySQL too, and the performance in many cases is still behind that of PostgreSQL, also requires workarounds etc.
There are even cases where a MySQL (innodb) join will crashinstead of just being slow (maybe someone has that experience with a recent version of PostgreSQL, if yes please comment).
[edit] Advanced Indexing
Advanced indexing methods allow database systems to optimize queries to achieve greater performance.
Index Type | MySQL | PostgreSQL |
---|---|---|
Hash indexes | InnoDB, NDB and MEMORY engines support Hash indexes | PostgreSQL supports Hash indexes, though as of 8.1 they are never faster than b-tree indexes [13] |
Multiple Indexes | MySQL supports multiple indexes per table and can use one for each alias of a table; since 5.0 it will also use index merge to use multiple indexes for a single alias. | PostgreSQL supports multiple indexes per query. |
Full-Text Indexes |
MySQL comes with full-text search for InnoDB and MyISAM storage engines. Prior to version 5.6 only the MyISAM storage engine supported this feature. [14] A 3rd party add-on to MySQL, Sphinx Fulltext Search Engine allows it to support full-text searches on storage engines which do not natively support it. |
PostgreSQL 8.2 has full text search in the tsearch2 module. PostgreSQL 8.3 integrates tsearch2 into the core: "TSearch2, our cutting-edge full text search tool, has been fully integrated into the core code, and also has a cleaner API." [15] |
Partial Indexes | MySQL does not support partial indexes. |
PostgreSQL supports partial indexes:
|
Prefix Indexes | MySQL supports prefix indexes. Prefix indexes cover the first N characters of a string column, making the index much smaller than one that covers the entire width of the column, yet still provide good performance characteristics. | With PostgreSQL, prefix indexes are a particular case of Expression Indexes (see below). |
Multi-column Indexes | MySQL is limited to 16 columns per index. [16] And not all storage engines provide multi-column indexes. | PostgreSQL is limited to 32 columns per index. [17] |
Bitmap Indexes | MySQL has no bitmap indexes but achieves similar functionality using its "index_merge" feature. | PostgreSQL supports the ability to combine multiple indexes at query time using bitmap indexes. |
Expression Indexes | Expression Indexes can be emulated in MySQL by adding a precomputed column and using a trigger to maintain it. | PostgreSQL allows you to create indexes based on expressions (which may include calls to immutable functions). This is very handy in case there is a table with relatively stable data (not a lot of inserts / updates) and will often be running a query which involves an expensive calculation - the expression itself can be indexed thus eliminating the need of computing it at query runtime. |
Non-blocking CREATE INDEX | Dependent on the storage engine. Some engines (such as NDB Cluster and InnoDB Plugin) support online add/drop index (no locks taken). If the engine doesn't support online add/drop index, a write exclusive lock is required and the table copied. | PostgreSQL supports the ability to create indexes without locking the table for writes. |
Covering Indexes | MySQL supports covering indexes, which allow data to be selected by scanning the index alone without touching the table data. This is advantageous with large tables that have many millions of rows. | Covering indexes were added to PostgreSQL 9.1 |
[edit] Partitioning
MySQL Supports several forms of horizontal partitioning.
- RANGE
- LIST
- HASH
- KEY
- Composite partitioning using a combination of RANGE or LIST with HASH or KEY subpartitions
- MySQL supports a total of 1024 partitions + subpartitions per table.
PostgreSQL only supports RANGE and LIST partitioning[18].
- HASH partitioning is supported via immutable functions.
- Composite partitioning is also supported.
- PostgreSQL partitions are tables that inherit from a master table.
- I have (thus far) been unable to find a specific technical or practical upper limit to the number of partitions supported in PostgreSQL. Anecdotally, the practical limit is less than the technical limit.
[edit] Other features
In PostgreSQL, there is no built-in mechanism for limiting database size, mostly due to the risk it implies. This is another reason, after popularity, why the most of the web hosting companies are using MySQL[citation needed].
Also, PgAgent a scheduling agent for PostgreSQL allows for scheduled processes.
[edit] Licensing
PostgreSQL comes with an MIT-style license, which fits the Free Software Definition and Open Source Definition, and conforms to both the Debian Free Software Guidelines and the Copyfree Standard.
MySQL's source code is available under terms of the GNU General Public License, which also fits the Free Software and Open Source definitions and conforms to the Debian Free Software Guidelines (but not to the Copyfree Standard). It is also available under a proprietary license agreement, which is typically intended for use by those who wish to release software incorporating MySQL code without having to release the source code for the entire application. In practical terms, this means that MySQL can be distributed with or without source code, as can PostgreSQL, but to distribute without source code in the case of MySQL requires paying Oracle for a MySQL Commercial License.
Even the MySQL client library is GPL (not LGPL), which means that to use (and therefore link to) the MySQL client library the program must either itself be GPL, must use one of a broad range of FOSS licenses including BSD and LGPL, or must have a commercial license from Oracle.
[edit] Development
MySQL is owned and sponsored by a single for-profit firm, Oracle. MySQL AB holds copyrights to most of the codebase. MySQL's corporate management has drawn criticism for mismanagement of its development.
By contrast, PostgreSQL is not controlled by any single company, but relies on a global community of developers and companies to develop it. It does, however, enjoy both software development help and resource contributions from businesses who make use of PostgreSQL database technologies, such as EnterpriseDB. Corporate sponsors are considered contributors roughly like any other, however, within PostgreSQL's community-driven development model.
MySQL is an open-source PRODUCT.
Postgres is an open-source PROJECT.
— Greg Sabino Mullane
,Postgres is not for sale (reprint of original blog post)
One criticism of the MySQL development model has been the historical reluctance of its corporate development team to accept patches from external sources. This has prompted some to say MySQL is not a "true" open source project. Nontrivial improvements from Google and Percona have been accepted into the main codebase recently, though how significant a change in external development policy this represents is yet to be seen.
Furthermore, PostgreSQL's development team is much more accessible than that of MySQL, and they will go as far as to provide you with a patch if there really is a problem with the engine.
On the other hand, the MySQL team will routinely degrade bugs (to "not a bug") without providing any solution, and accept the fact that there's a problem and not fix it for 5 years, etc[citation needed].
[edit] Culture
[edit] Community
MySQL's community is supported in part by the company's Community Relations Team. MySQL AB has sponsored an annual User's Conference and Expo since 2003.
PostgreSQL is a fully community supported open source project, with no singular corporate sponsorship. Instead, companies whose business models depend on PostgreSQL are accepted as members of the community, and code from corporate contributors is accepted under the same terms as from any other external contributor.
Both also have large numbers of enthusiastic supporters who are willing to assist on a voluntary basis.
[edit] Support Services
As a business product, MySQL's corporate sponsor provides its own official support for the server, and there are independent support providers available as well. A MySQL blog consolidator aggregates information about independent support providers, and many are invited to MySQL's Users Conference. One even holds its own sub-conference in association with the main event.
PostgreSQL is a project with many sponsors and developers, and is not controlled by any one company. A realistic choice of support is available from a range of Professional Support Companies.
[edit] Name
When the ANSI SQL standard was written, its author explained that the official standards-compliant pronunciation of SQL is "ess queue ell". The names of both MySQL and PostgreSQL reflect the pronunciation specified by the SQL standard's author.
MySQL is officially pronounced "my ess queue ell", though those unfamiliar with this often call it "my sequel" instead -- especially if their previous DBMS experience centered around Microsoft SQL Server (pronounced either "sequel server" or "ess queue ell server").
Because MySQL is a corporate software product, MySQL AB has complete control over the name of the project. As a result of this, and the desire for a consistent brand identity, the MySQL name is likely to remain static.
PostgreSQL is pronounced "post gress queue ell", formed by combining Postgres (the name of the original database management system from which PostgreSQL is descended) with SQL. PostgreSQL is a true portmanteau, in that it not only combines the spellings and pronunciations of two words, but also their meanings: it is the Postgres DBMS updated to use SQL. Some people refer it as "pgsql". The mispronunciation "post gree sequel", and related abbreviation "post gree", may be largely due to MS SQL Server DBA influence as well, though it is a very rare error amongst PostgreSQL users.
There has been talk of going back to the original Postgres name, though how much traction the idea has had is debatable. The PostgreSQL wiki provides an overview of the debate, including pros and cons for such a name change and alternatives.
[edit] Popularity
MySQL is widely popular among various open-source web development packages. The MyISAM engine is often the only database engine offered by webhosting providers. Many web developers use MySQL. Thus, MySQL became widely popular in web development, and MySQL calls itself "The world's most popular open source database," a grounded claim.
Part of the reason for this popularity is a common perception that MySQL is "easier" to use than other databases -- particularly PostgreSQL. That perception arose years ago, and has fed itself by word of mouth, such that whether it is still true or not is likely to have little or nothing to do with MySQL's current reputation for being comparatively easy to use. In fact, in recent years PostgreSQL has made significant changes [citation needed] that have now caused the perception that they have "closed the gap", and may even have improved its ease of use beyond that of MySQL, though the validity of such claims is as open to question as those of MySQL.
PostgreSQL was not available on Windows. First windows native version was 8.0. This was an advantage for MySQL.
PostgreSQL has a good reputation with Oracle developers, which is most certainly related to its much stricter and reliable approach, like that of Oracle. The difficulties and controversy around MariaDB, Drizzle and corporate takeovers have left many people re-considering their database strategy.[citation needed]
[edit] Links
- PostgreSQL vs. MySQL - Pros & Cons by the CEO of EnterpriseDB Postgres Event organized by the NYC MySQL Group at Oracle's NYC headquarters
- 720p HD-Video of the PostgreSQL vs. MySQL event at Oracle
- Cross Compare of PostgreSQL 8.4, SQL Server 2008, MySQL 5.1
(Note that many of these links are rather old and may not be accurate)
- Is MySQL really this bad? - Ars Technica
- Comparison Matrix (includes Apache Derby and One$DB)
- http://www-css.fnal.gov/dsg/external/freeware/pgsql-vs-mysql.html
- http://www.devx.com/dbzone/Article/20743
- Comparison of MySQL 5.5 and PostgreSQL 9.0 replication
[edit] Pro PostgreSQL
-
- Related article on Command Prompt: PostgreSQL vs MySQL @ Oracle NYC Head Quarters NYC (my brain just broke)
- PostgreSQL Connection strings
- Why PostgreSQL Instead of MySQL in 2009
- Transactional DDL in PostgreSQL: A Competitive Analysis
- http://article.gmane.org/gmane.comp.lang.ruby.rails/12576
- http://www.sitepoint.com/article/site-mysql-postgresql-1
- http://feedlounge.com/blog/2005/11/20/switched-to-postgresql/
[edit] Pro MySQL
- Hi-Def Video of the HuffingtonPost CTO discussing Enterprise MySQL: Heavy-Traffic Management and Ultra-Availability at the NYC MySQL Group at Sun Microsystems
See Also the Following Articles
- Python vs Ruby
- Rsync vs zsync
- Gnome Shell vs Unity
- H.264 vs VP8
- SalixOS vs Slackware
- Slax vs Ubuntu
- Lubuntu vs Xubuntu
- Amarok vs Winamp
- Software
- Free Open Source Software
- All articles with unsourced statements
- Articles with unsourced statements
All articles with unsourced statements
Articles with unsourced statements
Free Open Source Software
Software
Beginning with this release of the InnoDB Plugin, you can use the attributes ROW_FORMAT=COMPRESSED or KEY_BLOCK_SIZE in the CREATE TABLE and ALTER TABLE commands to request InnoDB to compress each page to 1K, 2K, 4K, 8K, or 16K bytes.
, 3.2. Specifying Compression
A "gotcha" is a feature or function which works as advertised - but not as expected.
behaviour [...] is the diametrical opposite of the default behaviour of most other databases. (Tested: DB2 8.1, Firebird 1.5.1, Oracle 8.1.7 and PostgreSQL 7.4.3 [...])
it can only insert a single value at a time. This is a major performance limitation, and also suffers concurrency issues. INSERT IGNORE and REPLACE handle multi-valued inserted much more gracefully.
, PostgreSQL Vs. MySQL for INSERT IGNORE + REPLACE - stored procedures, savepoints and beyond
InnoDB tables support checking of foreign key constraints... For other storage engines, MySQL Server parses and silently ignores the FOREIGN KEY and REFERENCES syntax in CREATE TABLE statements.
,MySQL 5.1 Reference Manual:: 12.1.9. CREATE TABLE Syntax
MySQL follows the SQL:2003 syntax for stored routines, which is also used by IBM's DB2.
,MySQL 5.1 Reference Manual:: 18 Stored Procedures and Functions
MySQL triggers are activated by SQL statements only. They are not activated by changes in tables made by APIs that do not transmit SQL statements to the MySQL Server; in particular, they are not activated by updates made using the NDB API.
,MySQL 5.1 Reference Manual:: 19 Triggers
A partial index is an index built over a subset of a table; the subset is defined by a conditional expression (called the predicate of the partial index). The index contains entries for only those table rows that satisfy the predicate. Partial indexes are a specialized feature, but there are several situations in which they are useful.
One major reason for using a partial index is to avoid indexing common values. Since a query searching for a common value (one that accounts for more than a few percent of all the table rows) will not use the index anyway, there is no point in keeping those rows in the index at all. This reduces the size of the index, which will speed up queries that do use the index. It will also speed up many table update operations because the index does not need to be updated in all cases.
,PostgreSQL 8.2.6 Documentation: Chapter 11. Indexes
MySQL is an open-source PRODUCT.
Postgres is an open-source PROJECT.
,Postgres is not for sale (reprint of original blog post)