PostgreSQL中remove_useless_joins的实现逻辑是怎样的

网友投稿 342 2024-01-04

PostgreSQL中remove_useless_joins的实现逻辑是怎样的

这篇文章主要介绍“PostgreSQL中remove_useless_joins的实现逻辑是怎样的”，在日常操作中，相信很多人在PostgreSQL中remove_useless_joins的实现逻辑是怎样的问题上存在疑惑，小编查阅了各式资料，整理出简单好用的操作方法，希望对大家解答”PostgreSQL中remove_useless_joins的实现逻辑是怎样的”的疑惑有所帮助！接下来，请跟着小编一起来学习吧！

query_planner代码片段:

//... /* * Remove any useless outer joins. Ideally this would be done during * jointree preprocessing, but the necessary information isnt available * until weve built baserel data structures and classified qual clauses. */joinlist = remove_useless_joins(root, joinlist);//清除无用的外连接 /* * Also, reduce any semijoins with unique inner rels to plain inner joins. * Likewise, this cant be done until now for lack of needed info. */ reduce_unique_semijoins(root);//消除半连接 /* * Now distribute "placeholders" to base rels as needed. This has to be * done after join removal because removal could change whether a * placeholder is evaluable at a base rel. */ add_placeholders_to_base_rels(root);//在"base rels"中添加PH //...一、数据结构

PlaceHolderVar上一小节已介绍过PHInfo

/* * Placeholder node for an expression to be evaluated below the top level * of a plan tree. This is used during planning to represent the contained * expression. At the end of the planning process it is replaced by either * the contained expression or a Var referring to a lower-level evaluation of * the contained expression. Typically the evaluation occurs below an outer * join, and Var references above the outer join might thereby yield NULL * instead of the expression value. * * Although the planner treats this as an expression node type, it is not * recognized by the parser or executor, so we declare it here rather than * in primnodes.h. */ typedef struct PlaceHolderVar {Expr xpr; Expr *phexpr;/* the represented expression */ Relids phrels; /* base relids syntactically within expr src */ Index phid; /* ID for PHV (unique within planner run) */ Index phlevelsup; /* > 0 if PHV belongs to outer query */ } PlaceHolderVar;

SpecialJoinInfo

/* * "Special join" info. * * One-sided outer joins constrain the order of joining partially but not * completely. We flatten such joins into the planners top-level list of * relations to join, but record information about each outer join in a * SpecialJoinInfo struct. These structs are kept in the PlannerInfo nodes * join_info_list. * * Similarly, semijoins and antijoins created by flattening IN (subselect) * and EXISTS(subselect) clauses create partial constraints on join order. * These are likewise recorded in SpecialJoinInfo structs. * * We make SpecialJoinInfos for FULL JOINs even though there is no flexibility * of planning for them, because this simplifies make_join_rel()s API. * * min_lefthand and min_righthand are the sets of base relids that must be * available on each side when performing the special join. lhs_strict is * true if the special joins condition cannot succeed when the LHS variables * are all NULL (this means that an outer join can commute with upper-level * outer joins even if it appears in their RHS). We dont bother to set * lhs_strict for FULL JOINs, however. * * It is not valid for either min_lefthand or min_righthand to be empty sets; * if they were, this would break the logic that enforces join order. * * syn_lefthand and syn_righthand are the sets of base relids that are * syntactically below this special join. (These are needed to help compute * min_lefthand and min_righthand for higher joins.) * * delay_upper_joins is set true if we detect a pushed-down clause that has * to be evaluated after this join is formed (because it references the RHS). * Any outer joins that have such a clause and this join in their RHS cannot * commute with this join, because that would leave noplace to check the * pushed-down clause. (We dont track this for FULL JOINs, either.) * * For a semijoin, we also extract the join operators and their RHS arguments * and set semi_operators, semi_rhs_exprs, semi_can_btree, and semi_can_hash. * This is done in support of possibly unique-ifying the RHS, so we dont * bother unless at least one of semi_can_btree and semi_can_hash can be set * true. (You might expect that this information would be computed during * join planning; but its helpful to have it available during planning of * parameterized table scans, so we store it in the SpecialJoinInfo structs.) * * jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching * the inputs to make it a LEFT JOIN. So the allowed values of jointype * in a join_info_list member are only LEFT, FULL, SEMI, or ANTI. * * For purposes of join selectivity estimation, we create transient * SpecialJoinInfo structures for regular inner joins; so it is possible * to have jointype == JOIN_INNER in such a structure, even though this is * not allowed within join_info_list. We also create transient * SpecialJoinInfos with jointype == JOIN_INNER for outer joins, since for * cost estimation purposes it is sometimes useful to know the join size under * plain innerjoin semantics. Note that lhs_strict, delay_upper_joins, and * of course the semi_xxx fields are not set meaningfully within such structs. */ typedef struct SpecialJoinInfo { NodeTag type; Relids min_lefthand; /* base relids in minimum LHS for join */ Relids min_righthand; /* base relids in minimum RHS for join */ Relids syn_lefthand; /* base relids syntactically within LHS */ Relids syn_righthand; /* base relids syntactically within RHS */ JoinType jointype; /* always INNER, LEFT, FULL, SEMI, or ANTI */ bool lhs_strict; /* joinclause is strict for some LHS rel */ bool delay_upper_joins; /* cant commute with upper RHS */ /* Remaining fields are set only for JOIN_SEMI jointype: */ bool semi_can_btree; /* true if semi_operators are all btree */ bool semi_can_hash; /* true if semi_operators are all hash */ List *semi_operators; /* OIDs of equality join operators */ List *semi_rhs_exprs; /* righthand-side expressions of these ops */ } SpecialJoinInfo;二、源码解读

remove_useless_joins清除无用的连接,比如以下的SQL语句:

select t1.dwbh from t_grxx t1 left join t_dwxx t2 on t1.dwbh = t2.dwbh;

左连接,而且t_dwxx.dwbh唯一,这样的连接是不需要的连接,直接查询t_grxx即可.

从执行计划来看,PG只对t_grxx进行扫描:testdb=# explain verbose select t1.dwbh from t_grxx t1left join t_dwxx t2 on t1.dwbh = t2.dwbh; QUERY PLAN -------------------------------------------------------------------- Seq Scan on public.t_grxx t1 (cost=0.00..14.00 rows=400 width=38) Output: t1.dwbh (2 rows)

源代码如下:

/* * remove_useless_joins * Check for relations that dont actually need to be joined at all, * and remove them from the query. * * We are passed the current joinlist and return the updated list. Other * data structures that have to be updated are accessible via "root". */ List* remove_useless_joins(PlannerInfo *root,List *joinlist) { ListCell *lc; /* * We are only interested in relations that are left-joined to, so we can * scan the join_info_list to find them easily. */ restart: foreach(lc, root->join_info_list)//遍历连接信息链表{ SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); int innerrelid; int nremoved;/* Skip if not removable */ if(!join_is_removable(root, sjinfo))//判断是否可以清除连接 continue; /* * Currently, join_is_removable can only succeed when the sjinfos * righthand is a single baserel. Remove that rel from the query and * joinlist. */innerrelid = bms_singleton_member(sjinfo->min_righthand); remove_rel_from_query(root, innerrelid, bms_union(sjinfo->min_lefthand, sjinfo->min_righthand));//从查询中删除相应的Rel /* We verify that exactly one reference gets removed from joinlist */ nremoved = 0; joinlist = remove_rel_from_joinlist(joinlist, innerrelid, &nremoved);if (nremoved != 1) elog(ERROR, "failed to find relation %d in joinlist", innerrelid); /* * We can delete this SpecialJoinInfo from the list too, since its no * longer of interest. */ //更新连接链表信息 root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);/* * Restart the scan. This is necessary to ensure we find all * removable joins independently of ordering of the join_info_list * (note that removal of attr_needed bits may make a join appear * removable that did not before). Also, since we just deleted the * current list cell, wed have to have some kluge to continue the * list scan anyway. */ goto restart; } returnjoinlist; }

reduce_unique_semijoins

把可以简化的半连接转化为内连接.

比如以下的SQL语句:select t1.* from t_grxx t1 where dwbh IN (select t2.dwbh from t_dwxx t2);

由于子查询"select t2.dwbh from t_dwxx t2"的dwbh是PK,子查询提升后,t_grxx的dwbh只对应t_dwxx唯一的一条记录,因此可以把半连接转换为内连接,执行计划如下:

testdb=# explain verbose select t1.* fromt_grxx t1 where dwbh IN (select t2.dwbhfromt_dwxx t2); QUERY PLAN ----------------------------------------------------------------------------- Hash Join (cost=1.07..20.10 rows=6 width=176) Output: t1.dwbh, t1.grbh, t1.xm, t1.xb, t1.nl Inner Unique:true Hash Cond: ((t1.dwbh)::text = (t2.dwbh)::text) -> Seq Scan on public.t_grxx t1 (cost=0.00..14.00 rows=400 width=176) Output: t1.dwbh, t1.grbh, t1.xm, t1.xb, t1.nl -> Hash (cost=1.03..1.03 rows=3 width=38) Output: t2.dwbh -> Seq Scanon public.t_dwxx t2 (cost=0.00..1.03 rows=3 width=38) Output: t2.dwbh (10 rows)

跟踪分析:

(gdb) n 199 reduce_unique_semijoins(root); (gdb) step reduce_unique_semijoins (root=0x1702968) at analyzejoins.c:520 520 for (lc = list_head(root->join_info_list); lc != NULL; lc = next) (gdb) n 522 SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc); (gdb)

查看SpecialJoinInfo内存结构:

528 next = lnext(lc); (gdb) p *sjinfo $1 = {type = T_SpecialJoinInfo, min_lefthand = 0x1749818, min_righthand = 0x1749830, syn_lefthand = 0x1749570, syn_righthand = 0x17495d0, jointype = JOIN_SEMI, lhs_strict = true, delay_upper_joins = false, semi_can_btree = true, semi_can_hash = true, semi_operators = 0x17496c8, semi_rhs_exprs = 0x17497b8}

内表(innerrel,即t_dwxx)如支持唯一性,则可以考虑把半连接转换为内连接

550 if (!rel_supports_distinctness(root, innerrel)) ... 575 root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo); ...

源代码如下:

/* * reduce_unique_semijoins * Check for semijoins that can be simplified to plain inner joins * because the inner relation is provably unique for the join clauses. * * Ideally this would happen during reduce_outer_joins, but we dont have * enough information at that point. * * To perform the strength reduction when applicable, we need only delete * the semijoins SpecialJoinInfo from root->join_info_list. (We dont * bother fixing the join type attributed to it in the query jointree, * since that wont be consulted again.) */void reduce_unique_semijoins(PlannerInfo *root) { ListCell *lc; ListCell *next;/* * Scan the join_info_list to find semijoins. We cant use foreach * because we may delete the current cell. */ for(lc = list_head(root->join_info_list); lc !=NULL; lc = next) { SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);//特殊连接信息,先前通过deconstruct函数生成int innerrelid; RelOptInfo *innerrel; Relids joinrelids;List*restrictlist; next = lnext(lc);/* * Must be a non-delaying semijoin to a single baserel, else we arent * going to be able to do anything with it. (Its probably not * possible for delay_upper_joins to be set on a semijoin, but we * might as well check.) */ if(sjinfo->jointype != JOIN_SEMI || sjinfo->delay_upper_joins)continue; if(!bms_get_singleton_member(sjinfo->min_righthand, &innerrelid))continue; innerrel = find_base_rel(root, innerrelid);/* * Before we trouble to run generate_join_implied_equalities, make a * quick check to eliminate cases in which we will surely be unable to * prove uniqueness of the innerrel. */ if(!rel_supports_distinctness(root, innerrel))continue; /* Compute the relid set for the join we are considering */joinrelids = bms_union(sjinfo->min_lefthand, sjinfo->min_righthand);/* * Since were only considering a single-rel RHS, any join clauses it * has must be clauses linking it to the semijoins min_lefthand. We * can also consider EC-derived join clauses. */restrictlist = list_concat(generate_join_implied_equalities(root, joinrelids, sjinfo->min_lefthand, innerrel), innerrel->joininfo);/* Test whether the innerrel is unique for those clauses. */ if(!innerrel_is_unique(root, joinrelids, sjinfo->min_lefthand, innerrel, JOIN_SEMI, restrictlist,true)) continue; /* OK, remove the SpecialJoinInfo from the list. */ root->join_info_list = list_delete_ptr(root->join_info_list, sjinfo);//删除特殊连接信息} }

add_placeholders_to_base_rels把PHV分发到base rels中,代码较为简单

/* * add_placeholders_to_base_rels * Add any required PlaceHolderVars to base rels targetlists. * * If any placeholder can be computed at a base rel and is needed above it, * add it to that rels targetlist. This might look like it could be merged * with fix_placeholder_input_needed_levels, but it must be separate because * join removal happens in between, and can change the ph_eval_at sets. There * is essentially the same logic in add_placeholders_to_joinrel, but we cant * do that part until joinrels are formed. */void add_placeholders_to_base_rels(PlannerInfo *root) { ListCell *lc;foreach(lc, root->placeholder_list)//遍历PH链表{ PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(lc); Relids eval_at = phinfo->ph_eval_at; int varno;if(bms_get_singleton_member(eval_at, &varno) && bms_nonempty_difference(phinfo->ph_needed, eval_at))//添加到需要的RelOptInfo中{ RelOptInfo *rel = find_base_rel(root, varno); rel->reltarget->exprs = lappend(rel->reltarget->exprs, copyObject(phinfo->ph_var));/* reltargets cost and width fields will be updated later */} } }

到此，关于“PostgreSQL中remove_useless_joins的实现逻辑是怎样的”的学习就结束了，希望能够解决大家的疑惑。理论与实践的搭配能更好的帮助大家学习，快去试试吧！若想继续学习更多相关知识，请继续关注网站，小编会继续努力为大家带来更多实用的文章！

标签：数据判断学习 root 日常

暂时没有评论，来抢沙发吧~

PostgreSQL中remove_useless_joins的实现逻辑是怎样的

PostgreSQL中remove_useless_joins的实现逻辑是怎样的

react 前端框架如何驱动企业数字化转型与创新发展

微前端架构如何改变企业的开发模式与效率提升

轻量级前端框架助力开发者提升项目效率与性能

最近发表

更多内容

小程序SDK

Finclip技术文档

小程序开发

小程序容器

小程序框架

Finclip小程序平台

Finclip用户投稿

车联网

推荐文章

小程序SDK是什么意思？小程序sdk和插件有什么区别？

小程序支付功能怎么实现？

企业app开发流程是什么？

app运营模式有哪些？

小程序多端引流怎么做？

小程序生态分析的机会和威胁

Flutter入门这一篇效率文章就够了

原生与跨平台解决方案分析,跨平台软件开发技术方案

热更新技术：让软件更新变得更加轻松快速

解决方案

银行解决方案

证券解决方案

互联网解决方案

政企OA解决方案

科技解决方案

loT解决方案

信任解决方案

热评文章

AppCan:基于混合模式的移动应用开发,移动混合模

Hybrid App混合模式开发的了解

小程序容器技术助力券商数字营销突围，小程序容器化的意

用mpvue开发微信小程序基础知识（vue.js开发

小程序多端框架全面测评对比，强烈推荐！

券商app架构 - 解析券商应用程序的构建与设计