复杂聚合和自定义函数

  • 简单聚合:输入单个Series,输出只有一个元素的Series。
  • 复杂聚合:输入多个Series,输出多行和多列。

复杂聚合需要自定义函数计算出我们想要的结果。

DataFrame复杂聚合

DataFrame.group_by(["date"])?.apply(F) 组合可完成复杂聚合。 F为自定义函数,要求F满足 |x: DataFrame| -> Result<DataFrame, PolarsError> 以分组后的数据封装进Dataframe,这样可以获得所有字段,返回Dataframe,每一组可以返回多行多列,但要求不同组返回的Dataframe的字段顺序、名称、类型一致。在返回的Dataframe中需要自己维护组别字段。

#![allow(unused)]
fn main() {
let mut employee_df: DataFrame = df!(
    "Name" => ["Lao Li", "Lao Li", "Lao Li", "Lao Li", "Lao Zhang", "Lao Zhang", "Lao Zhang", "Lao Zhang", "Lao Wang", "Lao Wang", "Lao Wang", "Lao Wang"],
    "employee_ID" => ["Employee01", "Employee01", "Employee01", "Employee01", "Employee02", "Employee02", "Employee02", "Employee02", "Employee03", "Employee03", "Employee03", "Employee03"],
    "date" => ["August", "September", "October", "November", "August", "September", "October", "November", "August", "September", "October", "November"],
    "score" => [83, 24, 86, 74, 89, 59, 48, 79, 51, 71, 44, 90]
)?;

let f = |x: DataFrame| -> Result<DataFrame, PolarsError> {
    let col1: &Series = x.column("Name")?;
    let col2: &Series = x.column("employee_ID")?;
    let col3: &Series = x.column("score")?;
    let group_id = x.column("date")?.str()?.get(0).unwrap();
    // do something; We get those results below;
    let group_field = Series::new("group".into(), vec![group_id, group_id, group_id]);
    let res_field1 = Series::new("field1".into(), vec!["a1,1", "a2,1", "a3,1"]);
    let res_field2 = Series::new("field2".into(), vec!["a1,2", "a2,2", "a3,2"]);
    let res_field3 = Series::new("field3".into(), vec!["a1,3", "a2,3", "a3,3"]);
    let result = DataFrame::new(vec![group_field, res_field1, res_field2, res_field3])?;
    return Ok(result);
};

let res = employee_df.group_by(["date"])?.apply(f)?;//一次聚合返回了3行3列,要求是对于不同组schema必须一致(字段顺序,名称,类型)
println!("{}", res);
}

Output

shape: (12, 4)
┌───────────┬────────┬────────┬────────┐
│ group     ┆ field1 ┆ field2 ┆ field3 │
│ ---       ┆ ---    ┆ ---    ┆ ---    │
│ str       ┆ str    ┆ str    ┆ str    │
╞═══════════╪════════╪════════╪════════╡
│ August    ┆ a1,1   ┆ a1,2   ┆ a1,3   │
│ August    ┆ a2,1   ┆ a2,2   ┆ a2,3   │
│ August    ┆ a3,1   ┆ a3,2   ┆ a3,3   │
│ November  ┆ a1,1   ┆ a1,2   ┆ a1,3   │
│ November  ┆ a2,1   ┆ a2,2   ┆ a2,3   │
│ …         ┆ …      ┆ …      ┆ …      │
│ September ┆ a2,1   ┆ a2,2   ┆ a2,3   │
│ September ┆ a3,1   ┆ a3,2   ┆ a3,3   │
│ October   ┆ a1,1   ┆ a1,2   ┆ a1,3   │
│ October   ┆ a2,1   ┆ a2,2   ┆ a2,3   │
│ October   ┆ a3,1   ┆ a3,2   ┆ a3,3   │
└───────────┴────────┴────────┴────────┘