Expression Methods
| Method | Meaning |
|---|---|
| .alias(name:&str) | Rename a column |
| .floor_div(rhs: Expr) | Floor division by rhs: divide and round the quotient down toward negative infinity (for non-negative results this discards the fractional part). |
| .pow(e) | Raise to the power e, where e is the exponent |
| .sqrt() | Square root |
| .cbrt() | Cube root |
| .cos() .sin() .cot() .tan() .arccos() .arcsin() .arctan() .arctan2() .cosh() .sinh() .tanh() .arccosh() .arcsinh() .arctanh() | Trigonometric, inverse trigonometric, and hyperbolic functions |
| .degrees() | Convert radians to degrees |
| .radians() | Convert degrees to radians |
| .shuffle(seed: Option<u64>) | Randomly shuffle, with seed as the random seed. |
| .sample_n(n: Expr, with_replacement: bool, shuffle: bool, seed: Option<u64>) -> Expr | Randomly sample n elements. with_replacement = true samples with replacement (see the note on sampling with replacement below the table); shuffle controls whether the result is shuffled after sampling; seed is the random seed. |
| .std(ddof:u8) | Standard deviation; ddof is the degrees-of-freedom offset, so the divisor is n - ddof (see the note on degrees of freedom below the table). |
| .var(ddof:u8) | Variance; ddof is the degrees-of-freedom offset, as for .std() |
| .min() | Minimum value; NaN values are ignored |
| .max() | Maximum value; NaN values are ignored |
| .nan_min() | Minimum value, propagating NaN: returns NaN if the series contains NaN |
| .nan_max() | Maximum value, propagating NaN: returns NaN if the series contains NaN |
| .mean() | Arithmetic mean |
| .median() | Median |
| .sum() | Arithmetic sum |
| .eq(E) | Equality comparison (==); comparing null with null yields null, not true. See Null and None and NaN |
| .eq_missing(E) | Equality comparison (==) in which null == null yields true |
| .neq(E) | Inequality comparison (!=); a null operand yields null |
| .neq_missing(E) | Inequality comparison (!=) in which null and null are considered equal |
| .lt(E) | Conditional operation < |
| .gt(E) | Conditional operation > |
| .gt_eq(E) | Conditional operation >= |
| .lt_eq(E) | Conditional operation <= |
| .not() | Logical negation of a boolean expression |
| .is_null()/.is_not_null() | Test whether each value is null / is not null |
| .drop_nulls()/.drop_nans() | Discard null or NaN values in the series |
| .n_unique() | Count the number of unique items |
| .first() | Return the first element |
| .last() | Return the last element |
| .head(length:Option<usize>) | First few elements |
| .tail(length:Option<usize>) | Last few elements |
| .implode() | Convert Series into a List. |
| .explode() | Unpack a List |
| .agg_groups() | Return a list of group indices, see: agg_groups example, Page 10 |
| .filter(predicate:E) | Predicate is an expression that returns a boolean array |
| .slice(offset:Expr,length:Expr) | Index based on the slice described by offset and length |
| .append(other:Expr,upcast:bool) | Append the series of other to self. Upcast indicates whether to upcast, automatically converting to a larger data type. |
| .unique() | Remove duplicate values, but does not guarantee the original order |
| .unique_stable() | Remove duplicate values while preserving the original order; more resource-intensive than .unique(). |
| .arg_unique() | Return the indices of the first occurrence of each unique value |
| .arg_min() | Return the index of the minimum value |
| .name() | Return ExprNameNameSpace, a type that can operate on multiple field names. |
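
The difference between .eq and .eq_missing in the table can be modelled with plain `Option` values. The following is a minimal std-only sketch of the semantics described above, not Polars code: the function names `eq_propagating` and `eq_missing_sketch` are hypothetical, and `None` stands in for a null value.

```rust
/// Models `.eq`: if either operand is null, the result is null (None).
fn eq_propagating(a: Option<i32>, b: Option<i32>) -> Option<bool> {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None, // null on either side propagates to the result
    }
}

/// Models `.eq_missing`: null is treated as an ordinary value,
/// so null == null is true and null == 1 is false.
fn eq_missing_sketch(a: Option<i32>, b: Option<i32>) -> bool {
    a == b
}
```

Under these assumptions, `eq_propagating(None, None)` is `None`, while `eq_missing_sketch(None, None)` is `true`.
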

Sampling with replacement: if "with_replacement" is true, each draw is independent, and an element that has already been selected can be selected again in later draws. This is called "sampling with replacement." If "with_replacement" is false, a selected element is removed from the pool and cannot be drawn again. This is called "sampling without replacement."
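
The two sampling modes can be illustrated without any external crate. This is only a sketch of the idea, not how Polars implements .sample_n: `XorShift64` and `sample_n_sketch` are hypothetical helpers, the generator is a toy one chosen to keep the example self-contained, and returning `None` for an impossible request is a choice made here for illustration.

```rust
/// Toy xorshift64 generator so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from a zero state
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    /// Pseudo-random index in 0..bound (bound must be non-zero).
    fn next_index(&mut self, bound: usize) -> usize {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 % bound as u64) as usize
    }
}

/// Draw `n` items from `pool`. Returns None when the request cannot be
/// satisfied (empty pool, or n > pool.len() without replacement).
fn sample_n_sketch(pool: &[i32], n: usize, with_replacement: bool, seed: u64) -> Option<Vec<i32>> {
    let mut rng = XorShift64::new(seed);
    if with_replacement {
        if pool.is_empty() && n > 0 {
            return None;
        }
        // Every draw sees the full pool, so repeats are possible.
        Some((0..n).map(|_| pool[rng.next_index(pool.len())]).collect())
    } else {
        if n > pool.len() {
            return None;
        }
        // Each drawn element is removed, so it cannot be drawn again.
        let mut remaining = pool.to_vec();
        let mut out = Vec::with_capacity(n);
        for _ in 0..n {
            let i = rng.next_index(remaining.len());
            out.push(remaining.swap_remove(i));
        }
        Some(out)
    }
}
```

With replacement, `n` may exceed the pool size because elements can repeat; without replacement, drawing `pool.len()` items yields a permutation of the pool.
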

Degrees of freedom: in statistics, the sample standard deviation is usually computed with n-1 degrees of freedom (where n is the sample size). This is known as Bessel's correction. It reduces bias, bringing the sample estimate closer to the population standard deviation. The correction is needed because the deviations are measured from the sample mean, which is itself computed from the same data, so the deviations are not fully independent. When ddof is 1 (the usual choice), the .std() method uses n-1 as the denominator, where n is the sample size. If you set ddof to 0, the .std() method uses n as the denominator. When the data are a sample rather than the whole population, this value tends to underestimate the population standard deviation.
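
The effect of ddof on the denominator can be checked with a few lines of plain Rust. This is a sketch of the formula described above, not the Polars implementation: `var_ddof` and `std_ddof` are hypothetical names, and returning `None` when n <= ddof is a choice made here for illustration.

```rust
/// Variance with divisor n - ddof. Returns None when n <= ddof,
/// because the divisor would be zero or negative.
fn var_ddof(xs: &[f64], ddof: u8) -> Option<f64> {
    let n = xs.len();
    let ddof = ddof as usize;
    if n <= ddof {
        return None;
    }
    let mean = xs.iter().sum::<f64>() / n as f64;
    // Sum of squared deviations from the sample mean.
    let ss: f64 = xs.iter().map(|x| (x - mean).powi(2)).sum();
    Some(ss / (n - ddof) as f64)
}

/// Standard deviation is the square root of the variance.
fn std_ddof(xs: &[f64], ddof: u8) -> Option<f64> {
    var_ddof(xs, ddof).map(f64::sqrt)
}
```

For the data `[2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]` the mean is 5 and the sum of squared deviations is 32, so ddof = 0 gives a variance of 32/8 = 4 (standard deviation 2), while ddof = 1 gives 32/7, which is slightly larger.
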