An Overview of SQL Data Types
I have been asked to do more posts on databases, so I’m returning to the world of relational databases this week. I posted an introduction to databases a few months back that you can take a look at for a 10,000-foot view. The next logical step seems to be to take a look at the types of data available across the SQL standard and the more common relational database systems out there.
Note: This is intended as a general overview. While I do have some experience with other database systems, the vast majority of my background is with SQL Server. Hopefully, I didn’t get any of my facts wrong in the comparisons, but if I did let me know and I’ll correct it.
There are several general categories that the data types can placed in to: Numerical, String, Date/Time, Binary, and Other.
Numerical
The numerical data types comprise, funnily enough, number data. This generally falls into two main groups: integer types and decimal point types. Integer data types hold only whole numbers and decimal point types hold anything with a decimal place.
Integer Types
Integer Type | Range | SQL Server | Oracle | MySQL | Postgres |
BIT | 0 to 1 | BIT | NUMBER(1) | X | BOOLEAN |
TINYINT | -128 to 127 | TINYINT | NUMBER(3) | TINYINT | X |
MEDIUMINT | -8,388,608 to 8,388,607 | X | X | MEDIUMINT | X |
SMALLINT | -32,768 to 32,767 | SMALLINT | NUMBER(5) | SMALLINT | SMALLINT |
INTEGER (INT) | -2,147,483,648 to 2,147,483,647 | INT | NUMBER(10) | INT | INT |
BIGINT | -9,223,372,036,854,775,808 to 9,223,372,036, 854,775,807 | BIGINT | NUMBER(19) | BIGINT | BIGINT |
It should be noted that in addition to these, MySQL also supports unsigned versions of each of these, which basically changes the range to be 0 to twice the listed top end. Also, Oracle only has one integer type, NUMBER, with a size value after it.
Decimal Point Types
Most of these systems break these down into two subcategories of types: Fixed point (also called Exact Value) types and Floating point (also called Approximate Value) types. Without delving into the semantic differences between the two which would probably require an advanced math degree to truly understand (at least in my brain), I’ll group them together in one lump for this introductory overview.
Also note that in my discussion, I’m using the US-culture definition of decimal points and commas. I realize that how numbers are formatted depends on where in the world you are and in many cultures, the comma and the decimal are swapped from what I’m used to.
For the most part, the range of numbers each type can hold depends on how you define it. This is because when you define the field, you tell it what level of precision (total number of digits) and scale (the number of digits after the decimal place) you want to be applied. So instead this will be an overview of the various types available and the name each system gives it.
SQL Server Type | Oracle Type | MySQL Type | Postgres Type | Usage |
FLOAT | BINARY_DOUBLE | DOUBLE | DOUBLE | Approximate |
REAL | BINARY_FLOAT | FLOAT | DOUBLE | Approximate |
DECIMAL | NUMBER | DECIMAL | NUMERIC | Exact |
NUMERIC | NUMBER | DECIMAL | NUMERIC | Exact |
MONEY and SMALLMONEY | X | X | MONEY | Monetary, fixed to 4 decimal places |
If you want to delve into the nuances of each type and when to use it, I’ll leave that as an exercise for your own brain.
String
Most of the systems support CHAR, VARCHAR, and TEXT.
CHAR is used for fixed-length string storage. It generally doesn’t see much use these days as most people tend to stick with VARCHAR unless you know the field’s value is always going to be of a certain length. VARCHAR is used for variable-length string storage, with only the max size being defined. TEXT, while not deprecated, generally doesn’t see as much use either as again, people tend to use VARCHAR(max) instead of it. It used to be reserved for extremely large fields of text, but in most systems, VARCHAR can support the same data sizes and is often more performant, so TEXT has largely gone by the wayside.
The basic three aside, let’s focus on the aberrations and oddities of each of the systems. Most of these oddities arose in the early days of computing when hard drive storage space was at a premium and extremely expensive. As a result, vendors tried all kinds of things to improve the efficiency of character data storage, which was by far the most costly in terms of space and performance requirements.
SQL Server
The main thing to remember with SQL Server is that in older versions of the systems (pre-SQL Server 2019), CHAR and VARCHAR only supported 8-bit character sets (essentially the ASCII character set depending on how it was set up) and did not support the full Unicode character set. If you wished to use Unicode characters in SQL Server pre-2019, then you had to use the NCHAR, NVARCHAR, and NTEXT variations, which supported the full 16-bit character sets. Starting with 2019, however, this was remedied and full UTF-8 character sets are supported in CHAR/VARCHAR if you set up your database to do so. This is one of the biggest gotchas when working with SQL Server and has come back to burn many developers as any system they are developing expands from US-centric to world-centric in nature. The simple solution is to move to SQL Server 2019 or later or to Azure SQL, which have full Unicode support out of the box.
Oracle
Oracle has a couple of oddities. First, it doesn’t have a TEXT type. Instead, Oracle calls it LONG for some reason. But it’s essentially the same. Also, Oracle has a variation of VARCHAR called VARCHAR2. It’s essentially the same as VARCHAR but supports a couple of additional Oracle-specific standards beyond the standard ANSI definition for VARCHAR. Most Oracle developers just use VARCHAR2.
Oracle provides a couple of other types for large-character data storage: CLOB/NCLOB. These types were essentially designed to be TEXT (LONG), but have twice the storage capacity (4 GB vs 2 GB of data).
MySQL
MySQL provides a couple of additional string types: TINYTEXT, MEDIUMTEXT, and LONGTEXT. These essentially worked the same as TEXT, but have different maximum storage lengths.
Postgres
Postgres really doesn’t have any oddities with string It’s… odd.
Dates and Times
Dates and times are another one where there’s the ANSI-defined standard for SQL, then there’s what the various vendors have put into place.
SQL Server
Type | Range | Usage |
DATE | 0001-01-01 to 9999-12-31 | Date only field |
DATETIME | 1753-01-01 00:00:00 to 9999-12-31 23:59:59.997 | Date and time together. Same as TIMESTAMP in other systems |
DATETIME2 | 0000-01-01 00:00:00 to 9999-12-31 23:59:59.9999999 | Same as DATETIME, but with more precision |
DATETIMEOFFSET | 0000-01-01 00:00:00 to 9999-12-31 23:59:59.9999999 | Same as DATETIME2, but adds time zone awareness |
SMALLDATETIME | 1900-01-01 00:00:00 to 2079-06-06 23:59:59 | Subset of DATETIME |
TIME | 00:00:00.0000000 to 23:59:59.9999999 | Time only field |
It should be noted that SQL Server does have a data type called TIMESTAMP. However, this is a read-only field that stores a date-time value of when that record was created or last updated. It is not updatable and is in no way related to the TIMESTAMP data type that other systems (and the ANSI standard) support.
Oracle
Type | Range | Usage |
DATE | 0001-01-01 to 9999-12-31 | Date only field |
TIMESTAMP | 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999 | Date and time together |
TIMESTAMP WITH TIME ZONE | 0001-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999 | Same as TIMESTAMP, but adds time zone awareness |
TIMESTAMP WITH LOCAL TIME ZONE | 0000-01-01 00:00:00.000000 to 9999-12-31 23:59:59.9999999 | Same as TIMESTAMP, but adds time zone awareness |
Oracle also adds two additional data types called INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND which can be used to store an amount of time between two TIMESTAMP values.
MySQL
Type | Range | Usage |
DATE | 1000-01-01 to 9999-12-31 | Date only field |
TIMESTAMP | 1970-01-01 00:00:01 to 2038-01-19 03:14:07 | Date and time together |
DATETIME | 1000-01-01 00:00:00.000000 to 9999-12-31 23:59:59.999999 | Same as TIMESTAMP, but adds more precision |
Postgres
Type | Range | Usage |
DATE | 4713 BC to 294276 AD | Date only field |
TIME | 00:00:00 to 24:00:00 | Time only field |
TIMESTAMP | 4713 BC 00:00:00.000000 to 294276 AD 23:59:59.999999 | Date and time together |
In Postgres, TIMESTAMP and TIME support are stored with or without time zone information. Postgres also offers the INTERVAL data type to store the difference between two TIMESTAMP values.
Binary
While it’s generally a bad idea to store binary data in a relational database, data types are provided by each system to allow it. SQL Server provides BINARY, VARBINARY, and IMAGE. Oracle provides BINARY, VARBINARY, BLOB, RAW, and LONG RAW. MySQL provides BINARY, VARBINARY, and BLOB. And Postgres offers the BYTEA data type.
These types basically all work the same way. They are designed to store byte data from a file or stream input. The biggest difference is the supported sizes in each type and how it handles the data storage.
Miscellaneous
There are a lot of miscellaneous data types in each system to handle types of data that don’t readily fall into one of the other categories, or that could be put into another data type but the vendor has provided something more specific to that type. For instance: spatial geometry data such as latitude, longitude, and altitude or 3D coordinates; JSON and XML data; UUID/GUID values; enumerations; and so forth.
Conclusion
The main takeaway is this. Every one of the major relational databases supports the core set of data types and what you learn on one system can easily be transferred to one of the others with only minor tweaks. Learn the core principles and you can take that with you to pretty much any job where you need to work with a relational database system.