PostgreSQL利用递归优化求稀疏列唯一值的方法
  • 作者:admin
  • 发表时间:2021-06-02 07:52
  • 来源:未知

在数据库中经常会碰到一些表的列是稀疏列,只有很少的值,例如性别字段,一般就只有2种不同的值。

但是当我们求这些稀疏列的唯一值时,如果表的数据量很大,速度还是会很慢。

例如:

创建测试表

bill=# create table t_sex (sex char(1), otherinfo text);
CREATE TABLE
bill=# insert into t_sex select 'm', generate_series(1,10000000)||'this is test';
INSERT 0 10000000
bill=# insert into t_sex select 'w', generate_series(1,10000000)||'this is test';
INSERT 0 10000000

 

查询:

可以看到下面的查询速度很慢。

bill=# select count(distinct sex) from t_sex;
 count
-------
   2
(1 row)

Time: 8803.505 ms (00:08.804)
bill=# select sex from t_sex t group by sex;
 sex
-----
 m
 w
(2 rows)

Time: 1026.464 ms (00:01.026)

 

那么我们对该字段加上索引又是什么情况呢?

速度依然没有明显

bill=# create index idx_sex_1 on t_sex(sex);
CREATE INDEX
bill=# select count(distinct sex) from t_sex;
 count
-------
   2
(1 row)

Time: 8502.460 ms (00:08.502)
bill=# select sex from t_sex t group by sex;
 sex
-----
 m
 w
(2 rows)

Time: 572.353 ms

 

的变化,可以看到执行计划已经使用Index Only Scan了。

bill=# explain select count(distinct sex) from t_sex;
                     QUERY PLAN
----------------------------------------------------------------------------------------------
 Aggregate (cost=371996.44..371996.45 rows=1 width=8)
  -> Index Only Scan using idx_sex_1 on t_sex (cost=0.44..321996.44 rows=20000000 width=2)
(2 rows)

 

同样的SQL我们看看在Oracle中性能如何?

创建测试表:

SQL> create table t_sex (sex char(1), otherinfo varchar2(100));

Table created.

SQL> insert into t_sex select 'm', rownum||'this is test' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> insert into t_sex select 'w', rownum||'this is test' from dual connect by level <=10000000;

10000000 rows created.

SQL> commit;

Commit complete.

 

性能测试:

SQL> set lines 1000 pages 2000
SQL> set autotrace on
SQL> set timing on

SQL> select count(distinct sex) from t_sex;

COUNT(DISTINCTSEX)
------------------
         2

Elapsed: 00:00:01.58

Execution Plan
----------------------------------------------------------
Plan hash value: 3915432945

----------------------------------------------------------------------------
| Id | Operation     | Name | Rows | Bytes | Cost (%CPU)| Time   |