PostgreSQL查询优化中对Having和Group By子句的简化处理分析

网友投稿 421 2024-01-03

PostgreSQL查询优化中对Having和Group By子句的简化处理分析

这篇文章主要介绍“PostgreSQL查询优化中对Having和Group By子句的简化处理分析”，在日常操作中，相信很多人在PostgreSQL查询优化中对Having和Group By子句的简化处理分析问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”PostgreSQL查询优化中对Having和Group By子句的简化处理分析”的疑惑有所帮助！接下来，请跟着小编一起来学习吧！

一、基本概念

简化Having语句

把Having中的约束条件,如满足可以提升到Where条件中的,则移动到Where子句中,否则仍保留在Having语句中.这样做的目的是因为Having过滤在Group by之后执行,如能把Having中的过滤提升到Where中,则可以提前执行"选择"运算,减少Group by的开销.

以下语句,条件dwbh=1002提升到Where中执行:testdb=# explain verbose select a.dwbh,a.xb,count(*) testdb-# from t_grxx a testdb-# group by a.dwbh,a.xb testdb-# having count(*) >= 1 and dwbh = 1002;QUERY PLAN ----------------------------------------------------------------------------- GroupAggregate (cost=15.01..15.06 rows=1 width=84) Output: dwbh, xb, count(*) Group Key: a.dwbh, a.xb Filter: (count(*) >=1) -- count(*) >= 1仍保留在Having中 -> Sort (cost=15.01..15.02 rows=2 width=76) Output: dwbh, xb Sort Key: a.xb -> Seq Scan on public.t_grxx a (cost=0.00..15.00rows=2 width=76) Output: dwbh, xb Filter: ((a.dwbh)::text =1002::text) -- 提升到Where中,扫描时过滤Tuple (10 rows)

如存在Group by & Grouping sets则不作处理:

testdb=# explain verbose testdb-# select a.dwbh,a.xb,count(*) testdb-# from t_grxx a testdb-# group by testdb-# grouping sets ((a.dwbh),(a.xb),()) testdb-# having count(*) >= 1 and dwbh = 1002 testdb-# order by a.dwbh,a.xb;QUERY PLAN ------------------------------------------------------------------------------- Sort (cost=28.04..28.05 rows=3 width=84) Output: dwbh, xb, (count(*)) Sort Key: a.dwbh, a.xb -> MixedAggregate (cost=0.00..28.02 rows=3 width=84) Output: dwbh, xb, count(*) Hash Key: a.dwbh Hash Key: a.xb Group Key: () Filter: ((count(*) >=1) AND ((a.dwbh)::text = 1002::text)) -- 扫描数据表后再过滤 -> Seq Scan on public.t_grxx a (cost=0.00..14.00 rows=400 width=76) Output: dwbh, grbh, xm, xb, nl (11 rows)

简化Group by语句如Group by中的字段列表已包含某个表主键的所有列,则该表在Group by语句中的其他列可以删除,这样的做法有利于提升在Group by过程中排序或Hash的性能,减少不必要的开销.

testdb=# explain verbose select a.dwbh,a.dwmc,count(*) testdb-# from t_dwxx a testdb-# group by a.dwbh,a.dwmc testdb-# having count(*) >= 1;QUERY PLAN -------------------------------------------------------------------------- HashAggregate (cost=13.20..15.20 rows=53 width=264) Output: dwbh, dwmc, count(*) Group Key: a.dwbh, a.dwmc -- 分组键为dwbh & dwmc Filter:(count(*) >= 1) -> Seq Scan on public.t_dwxx a (cost=0.00..11.60 rows=160 width=256) Output: dwmc, dwbh, dwdz (6rows) testdb=# alter table t_dwxx add primary key(dwbh); -- 添加主键 ALTER TABLE testdb=# explain verbose select a.dwbh,a.dwmc,count(*) from t_dwxx a group by a.dwbh,a.dwmc having count(*) >= 1; QUERY PLAN ----------------------------------------------------------------------- HashAggregate (cost=1.05..1.09rows=1 width=264) Output: dwbh, dwmc, count(*) Group Key: a.dwbh -- 分组键只保留dwbh Filter:(count(*) >= 1) -> Seq Scan on public.t_dwxx a (cost=0.00..1.03 rows=3 width=256) Output: dwmc, dwbh, dwdz (6 rows)二、源码解读

相关处理的源码位于文件subquery_planner.c中,主函数为subquery_planner,代码片段如下:

/* * In some cases we may want to transfer a HAVING clause into WHERE. We * cannot do so if the HAVING clause contains aggregates (obviously) or * volatile functions (since a HAVING clause is supposed to be executed * only once per group). We also cant do this if there are any nonempty * grouping sets; moving such a clause into WHERE would potentially change * the results, if any referenced column isnt present in all the grouping * sets. (If there are only empty grouping sets, then the HAVING clause * must be degenerate as discussed below.) * * Also, it may be that the clause is so expensive to execute that were * better off doing it only once per group, despite the loss of * selectivity. This is hard to estimate short of doing the entire * planning process twice, so we use a heuristic: clauses containing * subplans are left in HAVING. Otherwise, we move or copy the HAVING * clause into WHERE, in hopes of eliminating tuples before aggregation * instead of after. * * If the query has explicit grouping then we can simply move such a * clause into WHERE; any group that fails the clause will not be in the * output because none of its tuples will reach the grouping or * aggregation stage. Otherwise we must have a degenerate (variable-free) * HAVING clause, which we put in WHERE so that query_planner() can use it * in a gating Result node, but also keep in HAVING to ensure that we * dont emit a bogus aggregated row. (This could be done better, but it * seems not worth optimizing.) * * Note that both havingQual and parse->jointree->quals are in * implicitly-ANDed-list form at this point, even though they are declared * as Node *. */ newHaving = NIL; foreach(l, (List *) parse->havingQual)//存在Having条件语句{ Node *havingclause = (Node *) lfirst(l);//获取谓词 if((parse->groupClause && parse->groupingSets) || contain_agg_clause(havingclause) || contain_volatile_functions(havingclause) || contain_subplans(havingclause)) {/* keep it in HAVING */ //如果有Group&&Group Sets语句 //保持不变newHaving = lappend(newHaving, havingclause); }else if (parse->groupClause && !parse->groupingSets) { /* move it to WHERE */ //只有group语句,可以加入到jointree的条件中 parse->jointree->quals = (Node *) lappend((List *) parse->jointree->quals, havingclause); } else//既没有group也没有grouping set,拷贝一份到jointree的条件中 { /* put a copy in WHERE, keep it in HAVING */parse->jointree->quals = (Node *) lappend((List*) parse->jointree->quals, copyObject(havingclause)); newHaving = lappend(newHaving, havingclause); } } parse->havingQual = (Node *) newHaving;//调整having子句 /* Remove any redundant GROUP BY columns */ remove_useless_groupby_columns(root);//去掉group by中无用的数据列

remove_useless_groupby_columns

/* * remove_useless_groupby_columns * Remove any columns in the GROUP BY clause that are redundant due to * being functionally dependent on other GROUP BY columns. * * Since some other DBMSes do not allow references to ungrouped columns, its * not unusual to find all columns listed in GROUP BY even though listing the * primary-key columns would be sufficient. Deleting such excess columns * avoids redundant sorting work, so its worth doing. When we do this, we * must mark the plan as dependent on the pkey constraint (compare the * parsers check_ungrouped_columns() and check_functional_grouping()). * * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE * index as the determining columns. But as with check_functional_grouping(), * theres currently no way to represent dependency on a NOT NULL constraint, * so we consider only the pkey for now. */ staticvoid remove_useless_groupby_columns(PlannerInfo *root) { Query *parse = root->parse;//查询树 Bitmapset **groupbyattnos;//位图集合Bitmapset **surplusvars;//位图集合 ListCell *lc; int relid; /* No chance to do anything if there are less than two GROUP BY items */ if (list_length(parse->groupClause) < 2)//如果只有1个ITEMS,无需处理 return; /* Dont fiddle with the GROUP BY clause if the query has grouping sets */ if (parse->groupingSets)//存在Grouping sets,不作处理 return; /* * Scan the GROUP BY clause to find GROUP BY items that are simple Vars. * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k * that are GROUP BY items. */ //用于分组的属性 groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) * (list_length(parse->rtable) +1)); foreach(lc, parse->groupClause) { SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);Var *var = (Var *) tle->expr; /* * Ignore non-Vars and Vars from other query levels. * * XXX in principle, stable expressions containing Vars could also be * removed, if all the Vars are functionally dependent on other GROUP * BY items. But its not clear that such cases occur often enough to * be worth troubling over. */ if (!IsA(var, Var) || var->varlevelsup > 0) continue; /* OK, remember we have this Var */ relid = var->varno; Assert(relid <= list_length(parse->rtable)); groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],var->varattno - FirstLowInvalidHeapAttributeNumber); }/* * Consider each relation and see if it is possible to remove some of its * Vars from GROUP BY. For simplicity and speed, we do the actual removal * in a separate pass. Here, we just fill surplusvars[k] with a bitmapset * of the column attnos of RTE k that are removable GROUP BY items. */surplusvars =NULL; /* dont allocate array unless required */ relid = 0; //如某个Relation的分组键中已含主键列,去掉其他列 foreach(lc, parse->rtable) { RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc); Bitmapset *relattnos; Bitmapset *pkattnos; Oid constraintOid; relid++;/* Only plain relations could have primary-key constraints */ if (rte->rtekind != RTE_RELATION) continue; /* Nothing to do unless this rel has multiple Vars in GROUP BY */ relattnos = groupbyattnos[relid]; if(bms_membership(relattnos) != BMS_MULTIPLE)continue; /* * Cant remove any columns for this rel if there is no suitable * (i.e., nondeferrable) primary key constraint. */pkattnos = get_primary_key_attnos(rte->relid,false, &constraintOid); if (pkattnos == NULL) continue; /* * If the primary key is a proper subset of relattnos then we have * some items in the GROUP BY that can be removed. */ if(bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1) {/* * To easily remember whether weve found anything to do, we dont * allocate the surplusvars[] array until we find something. */ if (surplusvars == NULL) surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) * (list_length(parse->rtable) +1)); /* Remember the attnos of the removable columns */ surplusvars[relid] = bms_difference(relattnos, pkattnos); /* Also, mark the resulting plan as dependent on this constraint */parse->constraintDeps = lappend_oid(parse->constraintDeps, constraintOid); } }/* * If we found any surplus Vars, build a new GROUP BY clause without them. * (Note:this may leave some TLEs with unreferenced ressortgroupref * markings, but thats harmless.) */ if (surplusvars != NULL) { List*new_groupby = NIL;foreach(lc, parse->groupClause) { SortGroupClause *sgc = lfirst_node(SortGroupClause, lc); TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);Var *var = (Var *) tle->expr; /* * New list must include non-Vars, outer Vars, and anything not * marked as surplus. */ if (!IsA(var, Var) || var->varlevelsup > 0|| !bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber, surplusvars[var->varno])) new_groupby = lappend(new_groupby, sgc); } parse->groupClause = new_groupby; } }

到此，关于“PostgreSQL查询优化中对Having和Group By子句的简化处理分析”的学习就结束了，希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习，快去试试吧！若想继续学习更多相关知识，请继续关注网站，小编会继续努力为大家带来更多实用的文章！

标签：优化方法日常学习代码

暂时没有评论，来抢沙发吧~

PostgreSQL查询优化中对Having和Group By子句的简化处理分析

PostgreSQL查询优化中对Having和Group By子句的简化处理分析

如何通过微信小程序 API 优化企业管理与服务，提升数字化转型效率？

如何在数字化转型中利用移动应用开发提升企业敏捷性与市场适应性？

企业如何通过vue小程序开发满足高效运营与合规性需求

最近发表

更多内容

小程序SDK

Finclip技术文档

小程序开发

小程序容器

小程序框架

Finclip小程序平台

Finclip用户投稿

车联网

推荐文章

小程序SDK是什么意思？小程序sdk和插件有什么区别？

小程序支付功能怎么实现？

企业app开发流程是什么？

app运营模式有哪些？

小程序多端引流怎么做？

小程序生态分析的机会和威胁

Flutter入门这一篇效率文章就够了

原生与跨平台解决方案分析,跨平台软件开发技术方案

热更新技术：让软件更新变得更加轻松快速

解决方案

银行解决方案

证券解决方案

互联网解决方案

政企OA解决方案

科技解决方案

loT解决方案

信任解决方案

热评文章

AppCan:基于混合模式的移动应用开发,移动混合模

Hybrid App混合模式开发的了解

小程序容器技术助力券商数字营销突围，小程序容器化的意

用mpvue开发微信小程序基础知识（vue.js开发

小程序多端框架全面测评对比，强烈推荐！

券商app架构 - 解析券商应用程序的构建与设计