Data Type

Polars uses Arrow data types internally. These types are defined by the Apache Arrow project, which specifies a cross-platform, language-agnostic columnar data format. The format lets different systems and languages exchange data efficiently, without serializing and deserializing it. Arrow data types cover many common cases, such as integers, floating-point numbers, strings, dates, and times. Polars also supports the more complex types String, Categorical, and Object.

| Group | Type | Comment |
|---|---|---|
| Numeric | Int8 | 8-bit signed integer. |
| Numeric | Int16 | 16-bit signed integer. |
| Numeric | Int32 | 32-bit signed integer. |
| Numeric | Int64 | 64-bit signed integer. |
| Numeric | UInt8 | 8-bit unsigned integer. |
| Numeric | UInt16 | 16-bit unsigned integer. |
| Numeric | UInt32 | 32-bit unsigned integer. |
| Numeric | UInt64 | 64-bit unsigned integer. |
| Numeric | Float32 | 32-bit floating point. |
| Numeric | Float64 | 64-bit floating point. |
| Nested | Struct | A struct type, similar to Vec<Series>, that packs multiple columns into a single column. The columns do not have to share the same type. |
| Nested | List | A list type, backed by the Arrow LargeList type. |
| Temporal | Date | A date, stored as an i32 counting the days since 00:00:00 UTC on January 1, 1970 [1]. The representable range is roughly 5,877,600 BC to 5,881,600 AD. |
| Temporal | Datetime(TimeUnit, Option<PlSmallStr>) | A date and time. The first parameter is the time unit and the second is the time zone. Usually defined as Datetime(TimeUnit::Milliseconds, None), which stores an i64 counting the milliseconds since 00:00:00 UTC on January 1, 1970 [1]. The representable range is roughly 292 million BC to 292 million AD, so an i64 is more than sufficient for millisecond timestamps in practice. |
| Temporal | Duration(TimeUnit) | A time difference. Duration(TimeUnit::Milliseconds) is the type returned when subtracting one Date/Datetime from another. |
| Temporal | Time | A time of day, stored as the number of nanoseconds since midnight. |
| Other | Boolean | Boolean, stored internally as packed bits. |
| Other | String | String data, backed by the Arrow LargeUtf8 type. |
| Other | Binary | Arbitrary binary data. |
| Other | Object | A data type with limited support that can hold any value. |
| Other | Categorical | A categorical variable, similar to a factor in R. Available only with the crate feature dtype-categorical. |

For a detailed description of the Arrow types, see: https://arrow.apache.org/docs/format/Columnar.html

[1] This moment (January 1, 1970, 00:00:00 UTC) is known as the UNIX epoch.
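
As a rough check on the Date and Datetime ranges listed in the table, the sketch below converts the extreme i32 day counts and i64 millisecond counts into approximate calendar years. It uses only the Rust standard library, not Polars itself. The helper names (`approx_year_from_days`, `approx_year_from_millis`) are hypothetical, and the average Gregorian year length of 365.2425 days is an assumption made for the estimate.

```rust
// Average length of a Gregorian year in days (assumption used for the estimate).
const DAYS_PER_YEAR: f64 = 365.2425;
// Milliseconds in one day.
const MS_PER_DAY: f64 = 86_400_000.0;

/// Approximate calendar year reached `days` days after 1970-01-01.
fn approx_year_from_days(days: f64) -> f64 {
    1970.0 + days / DAYS_PER_YEAR
}

/// Approximate calendar year reached `ms` milliseconds after the UNIX epoch.
fn approx_year_from_millis(ms: f64) -> f64 {
    approx_year_from_days(ms / MS_PER_DAY)
}

fn main() {
    // Date: an i32 count of days since the epoch.
    println!("Date min year     ~ {:.0}", approx_year_from_days(i32::MIN as f64));
    println!("Date max year     ~ {:.0}", approx_year_from_days(i32::MAX as f64));

    // Datetime(TimeUnit::Milliseconds, None): an i64 count of milliseconds.
    println!("Datetime min year ~ {:.0}", approx_year_from_millis(i64::MIN as f64));
    println!("Datetime max year ~ {:.0}", approx_year_from_millis(i64::MAX as f64));

    // Duration: subtracting two millisecond timestamps gives a millisecond count.
    let start_ms: i64 = 0; // 1970-01-01T00:00:00 UTC
    let end_ms: i64 = 86_400_000; // 1970-01-02T00:00:00 UTC
    println!("Duration          = {} ms", end_ms - start_ms);
}
```

Negative results correspond to years before the common era. The figures are estimates only, since they ignore calendar details such as the absence of a year zero.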

Float32 and Float64 comply with the IEEE 754 standard, with the following caveats:

Polars operations do not guarantee any particular behavior regarding the sign of zero or NaN, nor do they guarantee the payload of NaN values. This is not limited to arithmetic operations. Before sorting and grouping, all zeros are normalized to +0 and all NaNs to a positive NaN without payload, so that equality checks are valid. Any NaN compares equal to any other NaN, and NaN is greater than every non-NaN value.
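
The normalization and ordering rules described above can be sketched in plain Rust. This is an illustration of the idea only, not the code Polars actually uses. The names `canonicalize`, `compare`, and `CANONICAL_NAN_BITS` are hypothetical, and the chosen canonical NaN bit pattern (a positive quiet NaN with an all-zero payload) is an assumption.

```rust
use std::cmp::Ordering;

// Positive quiet NaN with an empty payload (assumed canonical form).
const CANONICAL_NAN_BITS: u64 = 0x7ff8_0000_0000_0000;

/// Map every zero to +0.0 and every NaN to one canonical NaN.
fn canonicalize(x: f64) -> f64 {
    if x.is_nan() {
        f64::from_bits(CANONICAL_NAN_BITS)
    } else if x == 0.0 {
        0.0 // -0.0 == 0.0 is true, so both zeros land here
    } else {
        x
    }
}

/// Total order in which NaN equals NaN and is greater than everything else.
fn compare(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither value is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(&b).unwrap(),
    }
}

fn main() {
    let mut values = vec![f64::NAN, 1.5, -0.0, f64::INFINITY, 0.0, -f64::NAN, -2.0];

    // Sorting with the total order places all NaNs at the end.
    values.sort_by(|a, b| compare(*a, *b));
    println!("sorted: {:?}", values);

    // After canonicalization, the raw bits can serve as grouping keys:
    // both zeros share one key, and all NaNs share another.
    let keys: Vec<u64> = values.iter().map(|v| canonicalize(*v).to_bits()).collect();
    println!("group keys: {:x?}", keys);
}
```

Comparing canonicalized bit patterns is what makes a simple equality check usable for grouping, since the ordinary `==` operator on floats treats NaN as unequal to itself.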

In the IEEE 754 floating-point standard, zero and NaN (not a number) both carry a sign, so there are +0 and -0 as well as positive and negative NaNs. Positive and negative zero are numerically equal, but they can behave differently in some calculations, such as division or the limits of mathematical functions. The binary representation of a NaN also contains a "payload" section: the bits other than the sign and exponent bits. The payload can store additional information, for example details about a mathematical operation whose result is undefined. In most cases the payload is unused, so many operations do not clearly define how a NaN payload is handled.
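
The sign and payload bits described above can be inspected directly with the Rust standard library. The following is a minimal sketch for f64. The helpers `sign_bit` and `fraction_field` are hypothetical names, and the payload value `0x2a` is arbitrary. The sketch treats all 52 fraction bits as the payload, matching the description above. On common platforms the highest of those bits also marks the NaN as quiet.

```rust
// Bit layout of an f64: 1 sign bit, 11 exponent bits, 52 fraction bits.
const SIGN_MASK: u64 = 1 << 63;
const EXP_MASK: u64 = 0x7ff << 52;
const FRACTION_MASK: u64 = (1 << 52) - 1;

/// True if the sign bit is set (works for zeros and NaNs too).
fn sign_bit(x: f64) -> bool {
    x.to_bits() & SIGN_MASK != 0
}

/// Everything other than the sign and exponent bits.
fn fraction_field(x: f64) -> u64 {
    x.to_bits() & FRACTION_MASK
}

fn main() {
    let pos_zero = 0.0_f64;
    let neg_zero = -0.0_f64;

    // The two zeros compare equal but differ in their sign bit.
    println!("equal: {}", pos_zero == neg_zero);
    println!("sign bits: {} {}", sign_bit(pos_zero), sign_bit(neg_zero));

    // Division exposes the difference: the results have opposite signs.
    println!("1/+0 = {}, 1/-0 = {}", 1.0 / pos_zero, 1.0 / neg_zero);

    // Build a NaN by hand: all exponent bits set, quiet bit set, payload 0x2a.
    let tagged_nan = f64::from_bits(EXP_MASK | (1 << 51) | 0x2a);
    println!("is NaN: {}", tagged_nan.is_nan());
    println!("fraction field: {:#x}", fraction_field(tagged_nan));

    // Setting the sign bit yields a negative NaN with the same payload.
    let negative_nan = f64::from_bits(tagged_nan.to_bits() | SIGN_MASK);
    println!("negative NaN sign bit: {}", sign_bit(negative_nan));
}
```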