`
315634034
  • 浏览: 5151 次
  • 性别: Icon_minigender_1
  • 来自: 长沙
最近访客 更多访客>>
社区版块
存档分类
最新评论

oracle随机查询

 
阅读更多

从Oracle8i开始Oracle提供采样表扫描特性。

Oracle访问数据的基本方法有:
1.全表扫描
2.采样表扫描

全表扫描(Full table Scan)
全表扫描返回表中所有的记录。
执行全表扫描,Oracle读表中的所有记录,考查每一行是否满足WHERE条件。Oracle顺序的读分配给该表的每一个数据块,这样全表扫描能够受益于多块读.
每个数据块Oracle只读一次.

采样表扫描(sample table scan)
采样表扫描返回表中随机采样数据。
这种访问方式需要在FROM语句中包含SAMPLE选项或者SAMPLE BLOCK选项.

SAMPLE选项:
当按行采样来执行一个采样表扫描时,Oracle从表中读取特定百分比的记录,并判断是否满足WHERE子句以返回结果。

SAMPLE BLOCK选项:
使用此选项时,Oracle读取特定百分比的BLOCK,考查结果集是否满足WHERE条件以返回满足条件的纪录.

Sample_Percent:
Sample_Percent是一个数字,定义结果集中包含记录占总记录数量的百分比。
Sample值应该在[0.000001,99.999999]之间。

1.使用SAMPLE选项

SQL> select * from employee SAMPLE(30);

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=25 Bytes=2175)
   1    0   TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=25 Bytes=2175)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        880  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          3  rows processed


SQL> select * from employee SAMPLE(20);

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=16 Bytes=1392)
   1    0   TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=16 Bytes=1392)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        839  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          2  rows processed     
                 

2.使用SAMPLE BLOCK选项

 

SQL> SELECT * FROM employee SAMPLE BLOCK (50); 

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7566 JONES      MANAGER         7839 02-APR-81       2975                    20
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7782 CLARK      MANAGER         7839 09-JUN-81       2450                    10
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30

10 rows selected.


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=41 Bytes=3567)
   1    0   TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=41 Bytes=3567)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          4  consistent gets
          0  physical reads
          0  redo size
       1162  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         10  rows processed

SQL>
       

3.采样前n条记录的查询

也可以使用dbms_random包实现

 

SQL> select * from ( 
  2  select * from employee
  3  order by dbms_random.value )
  4  where rownum <= 4;    

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7654 MARTIN     SALESMAN        7698 28-SEP-81       1250       1400         30
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7369 SMITH      CLERK           7902 17-DEC-80        800                    20
      7788 SCOTT      ANALYST         7566 19-APR-87       3000                    20


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE
   1    0   COUNT (STOPKEY)
   2    1     VIEW
   3    2       SORT (ORDER BY STOPKEY)
   4    3         TABLE ACCESS (FULL) OF 'EMPLOYEE'




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          3  consistent gets
          0  physical reads
          0  redo size
        927  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          4  rows processed
                

对比一下SAMPLE选项

 

SQL> SELECT * FROM employee SAMPLE (40); 

     EMPNO ENAME      JOB              MGR HIREDATE         SAL       COMM     DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
      7499 ALLEN      SALESMAN        7698 20-FEB-81       1600        300         30
      7521 WARD       SALESMAN        7698 22-FEB-81       1250        500         30
      7698 BLAKE      MANAGER         7839 01-MAY-81       2850                    30
      7839 KING       PRESIDENT            17-NOV-81       5000                    10
      7844 TURNER     SALESMAN        7698 08-SEP-81       1500          0         30


Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=33 Bytes=2871)
   1    0   TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=33 Bytes=2871)




Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        961  bytes sent via SQL*Net to client
        503  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed

SQL> 
      

主要注意以下几点:

1.sample只对单表生效,不能用于表连接和远程表
2.sample会使SQL自动使用CBO


===============================
【1】方法一:通过dbms_random.random

select   *   from  ( select   *   from  largetable  order   by  dbms_random.random)  where  rownum  <=   20000 ;

【2】方法二:通过dbms_random.value
select   *   from  ( select   *   from  largetable  order   by  dbms_random.value)  where  rownum  <=   20000 ;

【3】方法三:通过采样表扫描
select   *   from  ( select   *   from  largetable sample( 10 ))  where  rownum  <=   20000 ;

下面我们通过实践来比较这3种方法的效率,首先我们创建一个包含有10W条记录的表用于实验:
create   table  LARGETABLE
(
  ID       
NUMBER   not   null   primary   key ,
  BIRTHDAY DATE 
not   null
)

接下来我们插入10W条数据
create   or   replace   procedure  random_insert  as
  i         
number ;
  startDate date :
=  sysdate;

begin

  
for  i  in   1  ..  100000  loop
    
insert   into  largetable  values  (i, startDate  +   1 );
  
end  loop  commit ;

end ;

在SQL*PLUS下设置显示SQL语句执行时间:set timing on,让后分别运行上述三条语句:

第一个的执行时间为 00: 00: 16: 04
第二个的执行时间为 00: 00: 54: 04
第三个的执行时间为 00: 00: 08: 07

从这里我们可以看出在进行数据随机抽取时,采用sample的方法效率是最高的。为了保证每次随机查询的数据尽量不重复,我们可以把sample中的百分比提高一些(例如从10%提高到20%)。

关于Oracle Sample的介绍和用法,请参考eygle他老人家的一篇文章介绍:如何从结果集中获得随机结果

分享到:
评论

相关推荐

Global site tag (gtag.js) - Google Analytics