This Greenplum database fundamentals training deck covers Greenplum's PostgreSQL-compatible client functionality, its added parallel-processing technology, its data-warehouse and BI features, and related topics.

    Greenplum Database Fundamentals Training
    Wei.Li
    Senior Data Architect
    Alpine Solution
    2010/05
    Greenplum Database
    A massively parallel processing (MPP) DBMS
    Based on PostgreSQL 8.2
    Same client functionality as PostgreSQL
    Adds parallel-processing technology
    Adds data-warehouse and BI features:
    External tables / parallel loading
    Resource management
    Query optimizer enhancements
    S1002 Network Configuration
    Greenplum Basic Architecture
    Client interfaces and programs
    Master Host
    Each segment stores a portion of the user data:
     A system can have many segments
     Users cannot access segments directly
     All access to segments goes through the Master
     A database listener process (postgres) accepts connections from the Master
    Interconnect layer between the Greenplum databases:
     Inter-process coordination and management
     Built on a gigabit Ethernet fabric
     Configured as a private internal network
     Supports two protocols: TCP or UDP
    Greenplum High-Availability Architecture
    Table partitioning
    Logically divides a large table into multiple parts; queries that filter on the partition key scan less data, improving system performance.
    Speeds up queries against specific subsets of the data
    Also simplifies database maintenance and updates
    Two types:
    Range partitioning (date or numeric ranges), e.g. by date or price
    List partitioning, e.g. by region or product
    A partitioned table in Greenplum is a parent table inheriting to child tables, with CHECK constraints routing rows to the correct child
    Each child partition is still distributed across the segments according to the table's distribution policy
    Partitioning is a very useful optimization: for a year of transactions partitioned by trade date, a one-day query scans roughly 1/365 of the data
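    As a sketch of the range partitioning described above (the table and column names are illustrative, not from the deck), a year of transactions partitioned by day might be declared as:

    ```sql
    -- Hypothetical example: one year of transactions, range-partitioned by day.
    CREATE TABLE trans (
        trans_id   bigint,
        trans_date date,
        amount     numeric(12,2)
    )
    DISTRIBUTED BY (trans_id)          -- child partitions inherit this distribution
    PARTITION BY RANGE (trans_date)
    (
        START (date '2010-01-01') INCLUSIVE
        END   (date '2011-01-01') EXCLUSIVE
        EVERY (INTERVAL '1 day')
    );
    ```

    A query filtering on trans_date then scans only the matching child partitions.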
    Compressed storage
    Supports ZLIB and QUICKLZ compression, with ratios of up to 10:1
    Compressed tables must be append-only
    Compression does not necessarily hurt performance: it spends CPU to reduce I/O
    Syntax:
    CREATE TABLE foo (a int, b text)
     WITH (appendonly=true, compresstype=zlib, compresslevel=5);
    Row and column orientation
    Greenplum supports row- or column-oriented storage
    Column orientation currently requires append-only tables
          If typical queries read only a few of a table's columns, column orientation is more efficient; if they read most columns, row orientation is more efficient
    Syntax:
          CREATE TABLE sales2 (LIKE sales) WITH (appendonly=true, orientation=column);
    Performance comparison:
    Test 1: all columns are read, so the row-oriented table is faster.
     select * from dw_ods.s1_sema_scmcaccp_row where crdacct_nbr= '4033930000166380411';
     41 seconds
     select * from dw_ods.s1_sema_scmcaccp_col where crdacct_nbr= '4033930000166380411';
     116 seconds
    Test 2: only a few columns are read, so the column-oriented table is faster.
     select crdacct_status from dw_ods.s1_sema_scmcaccp_row where crdacct_nbr= '4033930000166380411';
     35 seconds
     select crdacct_status from dw_ods.s1_sema_scmcaccp_col where crdacct_nbr= '4033930000166380411';
     3 seconds
    External table loading
    Characteristics of external tables:
    Read-only
    Data resides outside the database
    Support SELECT, JOIN, SORT, and other commands, much like regular tables
    Advantages of external tables:
    Parallel loading
    ETL flexibility
    Error tolerance for badly formatted rows
    Support for multiple data sources
    Two kinds:
    External tables: file-based
    Web tables: based on URLs or commands
    High-speed data loading via external tables
    Using its parallel dataflow engine, Greenplum can operate on external tables directly with SQL
    Data loading is fully parallel
    Characteristics of external-table loading:
    Parallel data loading gives the best performance
    Can process files stored remotely
    Uses the HTTP protocol
    200 MB/s data distribution rate per gpfdist
    Starting the gpfdist file distribution daemon:
     gpfdist -d /var/load_files/expenses -p 8080 -l /home/gpadmin/log &
    External table definition:
    CREATE EXTERNAL TABLE ext_expenses    ( name text, date date,  amount float4, description text )
    LOCATION ('gpfdist://etlhost:8081/*','gpfdist://etlhost:8082/*')
    FORMAT 'TEXT' (DELIMITER '|')
    ENCODING 'UTF-8'
    LOG ERRORS INTO ext_expenses_loaderrors
    SEGMENT REJECT LIMIT 10000 ROWS;
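    The definition above is the file-based form. For the command-based web tables mentioned earlier, a minimal sketch might look like this (the log path and table name are assumptions, not from the deck):

    ```sql
    -- Hypothetical web table: EXECUTE runs a command on the cluster and treats
    -- its output lines as rows of the table.
    CREATE EXTERNAL WEB TABLE ext_log_lines (line text)
    EXECUTE 'cat /var/log/app/current.log' ON MASTER
    FORMAT 'TEXT';
    ```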
    Load good rows and catch poorly formatted rows, such as:
    rows with missing or extra attributes
    rows with attributes of the wrong data type
    rows with invalid client encoding sequences
    Does not apply to constraint errors:
    PRIMARY KEY, NOT NULL, CHECK or UNIQUE constraints
    Optional error handling clause for external tables:
    [LOG ERRORS INTO error_table] SEGMENT REJECT LIMIT count [ROWS | PERCENT]
    ( PERCENT based on gp_reject_percent_threshold parameter )
    Example
    CREATE EXTERNAL TABLE ext_customer
       (id int, name text, sponsor text)
       LOCATION ( 'gpfdist://filehost:8081/*.txt' )
       FORMAT 'TEXT' ( DELIMITER '|' NULL ' ')
       LOG ERRORS INTO err_customer SEGMENT REJECT LIMIT 5 ROWS;
    Data resides outside the database
    No database statistics for external table data
    Not meant for frequent or ad-hoc access
    Can manually set rough statistics in pg_class:
    UPDATE pg_class SET reltuples=400000, relpages=400 WHERE relname='myexttable';
    COPY (a PostgreSQL command)
    Supports loading and unloading
    Optimized for loading a large number of rows
    Loads all rows in one command (not parallel)
    Loads data from a file or from standard input
    Supports error handling, as do external tables
    EXAMPLE
    COPY mytable FROM '/data/myfile.csv' WITH CSV HEADER; (file is read on the Master)
    \COPY mytable FROM '/data/myfile.csv' WITH CSV HEADER; (file is read on the local client)
    COPY country FROM '/data/gpdb/country_data'
    WITH DELIMITER '|' LOG ERRORS INTO err_country
    SEGMENT REJECT LIMIT 10 ROWS;
    Drop indexes and recreate after load
     Increase maintenance_work_mem parameter to speed up CREATE INDEX operations
    Run ANALYZE after load
    Run VACUUM after load errors, DELETEs, UPDATEs, etc.
    Do not use ODBC INSERT to load large volumes of data
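    The loading recommendations above can be combined; a sketch under assumed table and index names (sales, idx_sales_date):

    ```sql
    -- Sketch of a bulk-load sequence following the guidelines above.
    DROP INDEX IF EXISTS idx_sales_date;          -- drop indexes before the load
    SET maintenance_work_mem = '512MB';           -- speed up the CREATE INDEX below
    COPY sales FROM '/data/sales.dat' WITH DELIMITER '|';
    CREATE INDEX idx_sales_date ON sales (date);  -- recreate indexes after the load
    ANALYZE sales;                                -- refresh planner statistics
    VACUUM sales;                                 -- reclaim space after load errors
    ```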
    Two typical ways of managing resource queues
    Resource Queue Limits
    ACTIVE THRESHOLD EXAMPLE:
    CREATE RESOURCE QUEUE adhoc ACTIVE THRESHOLD 10 IGNORE THRESHOLD 1000.0;
    COST THRESHOLD EXAMPLES:
    CREATE RESOURCE QUEUE batch1 COST THRESHOLD 1000000.0 NOOVERCOMMIT;
    CREATE RESOURCE QUEUE batch1 COST THRESHOLD 1e+6;
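    A queue only takes effect once roles are assigned to it; a minimal sketch using the adhoc queue created above (the analyst role is an assumption):

    ```sql
    -- Hypothetical: route a role's queries through the adhoc resource queue.
    CREATE ROLE analyst LOGIN;
    ALTER ROLE analyst RESOURCE QUEUE adhoc;  -- analyst's queries now queue via adhoc
    ```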
    Greenplum Performance Monitor
     Highly interactive web-based performance monitoring
     Real-time and historic views of:
    Resource utilization
    Queries and query internals
    Each active segment is dumped in parallel
     Dump files created in segment data directory by default
     Supports compression (gzip)
     Ensure sufficient disk space for dump files
     A backup set is identified by a timestamp key
    Example:
    Back up a database:
          gp_dump gpdb
    Back up a database, and create dump files in a centralized location on all hosts:
           gp_dump --gp-d=/home/gpadmin/backups gpdb
    Back up a particular schema only:
          gp_dump -n myschema mydatabase
    Back up a single segment instance only (by noting the dbid of the segment instance):
          gp_dump --gp-s=i[5] gpdb
    On the master host
    gp_catalog_1_<dbid>_<timestamp>
    gp_cdatabase_1_<dbid>_<timestamp>
    gp_dump_1_<dbid>_<timestamp>
    gp_dump_status_1_<dbid>_<timestamp>
    On the segment hosts
    gp_dump_0_<dbid>_<timestamp>
    gp_dump_status_0_<dbid>_<timestamp>
    Use gp_restore command
     Need timestamp key from gp_dump
     Make sure dump files are placed on correct segment hosts
     Make sure database exists before restoring
     Database-level server configuration settings are not restored
    Examples
    Restore a Greenplum database using backup files created by gp_dump:
    gp_restore --gp-k=2005103112453 -d gpdb
    Restore a single segment instance only (by noting the dbid of the segment instance):
    gp_restore --gp-k=2005103112453 -d gpdb --gp-s=i[5]
    gpcrondump
    Calls to gp_dump
    Can be called directly or can schedule using CRON
    Send email notifications
    Flexible dump options
    Can copy configuration files
    Can dump system catalogs
    Can dump global objects
    Can include a post-dump script
    gpdbrestore
    Restores from gpcrondump files
    Can restore from an archive host – no need to pre-place dump files on segments
    Greenplum also supports pg_dump and pg_restore
    Useful for migrating data to/from other DBMS
    pg_dump creates a single dump file
    Can be slow on very large databases
    Run at low-usage times
    Supports compression
    Can dump data as INSERT or COPY commands
    gp-syntax option includes DISTRIBUTED BY statements in DDL
    Dump a database called mydb into a SQL-script file:
     pg_dump mydb > db.sql
     To reload such a script into a (freshly created) database named newdb:
     psql -d newdb -f db.sql
     Dump a Greenplum database in tar file format and include distribution policy information:
     pg_dump -Ft --gp-syntax mydb > db.tar
     To dump a database into a custom-format archive file:
     pg_dump -Fc mydb > db.dump
     To reload an archive file into a (freshly created) database named newdb:
     pg_restore -d newdb db.dump
     To dump a single table named mytab:
     pg_dump -t mytab mydb > db.sql
     To specify an upper-case or mixed-case name in -t and related switches, you need to double-quote the name; else it will be folded to lower case. But double quotes are special to the shell, so in turn they must be quoted. Thus, to dump a single table with a mixed-case name, you need something like:
     pg_dump -t '"MixedCaseName"' mydb > mytab.sql
    Client tools
    pgAdmin3
    A graphical tool for administration and for SQL execution, analysis, and monitoring
    psql
     A command-line tool for operation and administration
    pgAdmin3 for GPDB
    pgAdmin3 is the leading graphical Open Source management, development and administration tool for PostgreSQL
    Greenplum has contributed extensive GPDB-specific enhancements
    With GPDB 3.3, Greenplum ships and supports this tool
    PSQL
    Connect through the master
    Connection information
    database name (-d | PGDATABASE)
    master host name (-h | PGHOST)
    master port (-p | PGPORT)
    user name (-U | PGUSER)
    First time connections
    template1 database
    default superuser account (gpadmin)
    Issuing SQL Statements
    Interactive mode: psql mydatabase    mydatabase=# SELECT * FROM foo;
    Non-interactive mode (single command): psql mydatabase -ac "SELECT * FROM foo;"
    Non-interactive mode (multiple commands): psql mydatabase -af /home/lab1/sql/createdb.sql
    (Use semi-colon (;) to denote end of a statement)
    Common PSQL Meta Commands
    \? (help on psql meta-commands)
    \h (help on SQL command syntax)
    \dt (show tables)
    \dtS (show system tables)
    \dg or \du (show roles)
    \l (show databases)
    \c db_name (connect to this database)
    \q (quit psql)
    \! (enter shell mode)
    \df (show functions)
    \dn (show schemas)
    SET search_path = …
    \timing
    postgresql.conf file
    Located in master or segment instance’s data directory
    Used to set configuration parameters on the system level
    Parameters that are using the default setting are commented out (#)
    Requires a restart (or reload using gpstop -u) for changes to take effect
    Viewing Parameter Settings
    View a specific parameter setting Example: SHOW search_path;
    View all parameter settings   Example: SHOW ALL;
    Set a parameter   Example: SET search_path = public;
        SET client_encoding = gb18030;
    Configuring Host-Based Authentication
    Client authorization:
    Whether a connection from a given client host is allowed
    Whether the user may connect to the requested database
    pg_hba.conf file:
    Controls access based on host address, database, and/or database user account
    Located in the data directories of the master and segment instances
    Given a default configuration at system initialization
    Default Master Host pg_hba.conf
    Local connections allowed for Greenplum superuser
    Remote connections not allowed
    EXAMPLE
    # TYPE    DATABASE    USER    CIDR-ADDRESS    METHOD
      local     all     gpadmin                 ident sameuser
      local     all     all                     ident sameuser
      host      all     gpadmin     127.0.0.0/0     trust
    Configuring pg_hba.conf
    EXAMPLE
    # TYPE    DATABASE    USER    CIDR-ADDRESS    METHOD
      local all all                         trust
      host all   all  127.0.0.1/32    trust
      host all   all  ::1/128         trust
      host all   all  10.0.0.206/32   trust
          host carddw etl 21.104.138.12/32 md5
          host gpadmin all 21.104.138.0/24 md5
    gpstop -u applies the settings without restarting the database
    SQL syntax
    For details, see "Gp sql language".
    Notes:
    When DELETE or UPDATE joins two tables, both tables must have the same distribution.
    For example:
    DELETE FROM tablea USING tableb WHERE tablea.id = tableb.id;
    UPDATE tablea AS a
       SET "desc" = b."desc"
      FROM tableb AS b
     WHERE a.id = b.id;
    For these statements tablea and tableb must use the same distribution; if necessary, change it with ALTER TABLE ... SET DISTRIBUTED BY.
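    A minimal sketch of aligning two tables' distributions for such a join (all names are illustrative):

    ```sql
    -- Give both tables the same distribution key so the UPDATE/DELETE join
    -- is co-located on the segments.
    CREATE TABLE tablea (id int, "desc" text) DISTRIBUTED BY (id);
    CREATE TABLE tableb (id int, "desc" text) DISTRIBUTED BY (id);
    -- If an existing table is distributed differently, realign it:
    ALTER TABLE tableb SET DISTRIBUTED BY (id);
    ```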
    Data types
    Common data types:
    CHAR, VARCHAR, TEXT
    SMALLINT, INTEGER, BIGINT
    NUMERIC, REAL, DOUBLE PRECISION
    TIMESTAMP, DATE, TIME
    BOOLEAN
    Array types, e.g. integer[]
    For other data types, see the documentation
    All system catalogs in pg_catalog schema
    Standard PostgreSQL system catalogs (pg_*)
    Greenplum-specific catalogs:
     gp_configuration
     gp_distribution_policy
     gp_id
     gp_version_at_initdb
     pg_resqueue
     pg_exttable
     pg_tables
     pg_class
     pg_stat_activity
    To list all system catalog tables in psql: \dtS
    To list all system views in psql: \dvS
    For other catalogs, see the documentation
    Functions
    Date functions:
    extract(day | month | year … FROM date)
    SELECT date + '1 day'::interval, date + '1 month'::interval;
    SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40'); Result: 16
    SELECT date_trunc('hour', TIMESTAMP '2001-02-16 20:38:40'); Result: 2001-02-16 20:00:00
    pg_sleep(seconds);
    System date variables:
    current_date
    current_time
    current_timestamp
    now()
    timeofday() changes within a transaction; the functions above are fixed for the duration of the transaction
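    The difference can be seen inside a single transaction; a small sketch:

    ```sql
    -- Inside one transaction, now()/current_timestamp are frozen at transaction
    -- start, while timeofday() keeps advancing with the wall clock.
    BEGIN;
    SELECT now(), timeofday();
    SELECT pg_sleep(1);
    SELECT now(), timeofday();  -- now() is unchanged; timeofday() is ~1 s later
    COMMIT;
    ```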
    Functions
    String functions:
    substr / length / lower / upper / trim / replace / position
    rpad / lpad
    to_char
    || (string concatenation)
    substring(string FROM pattern)
    ~ / ~*, LIKE, SIMILAR TO (pattern matching)
    Miscellaneous:
    CASE … WHEN / COALESCE
    NULLIF
    generate_series
    IN / NOT IN / EXISTS / ANY / ALL
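    Several of the functions above combined in one query (the values are illustrative):

    ```sql
    SELECT coalesce(NULL, 'fallback') AS c,   -- first non-null argument
           nullif('a', 'a')           AS n,   -- NULL when both arguments are equal
           lpad('42', 5, '0')         AS p,   -- pads to '00042'
           'gp' || '-' || 'db'        AS s,   -- string concatenation
           g                                  -- 1, 2, 3 from generate_series
    FROM   generate_series(1, 3) AS g;
    ```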
    Stored procedures
    Greenplum supports building functions in SQL, Python, Perl, and C; the following focuses on SQL stored procedures.
    A stored procedure is a single transaction; calls to sub-procedures also run inside that transaction
    Stored-procedure structure:
    CREATE FUNCTION somefunc() RETURNS integer AS $$
    DECLARE
        quantity integer := 30;
    BEGIN
    RETURN ....;
    END;
    $$ LANGUAGE plpgsql;
    Assignment
    Assign to a variable or row/record as follows: identifier := expression
    Example: user_id := 20;
    Execute a query with no result: PERFORM query;
    An example:
    PERFORM create_mv('cs_session_page_requests_mv', my_query);
    Stored procedures
    Dynamic SQL
    EXECUTE command-string [INTO [STRICT] target];
    SELECT INTO
    Example: SELECT id INTO var_id FROM tablea;
    Obtaining the result status:
    GET DIAGNOSTICS variable = item [, ...];
    An example:
     GET DIAGNOSTICS integer_var = ROW_COUNT;
    SQL status variables:
    SQLERRM, SQLSTATE
    Control structures
    IF ... THEN ... ELSEIF ... THEN ... ELSE
    LOOP, EXIT, CONTINUE, WHILE, FOR
    Returning from a function
    Two commands return data from a function: RETURN and RETURN NEXT.
    Syntax: RETURN expression;
    Setting callbacks
    EXEC SQL WHENEVER condition action;
     condition can be one of:
    SQLERROR, SQLWARNING, NOT FOUND
    Stored procedures
    Exception handling:
    EXCEPTION WHEN unique_violation
    THEN
    -- do nothing
    END;
    Catching all other errors:
    EXCEPTION  WHEN OTHERS THEN
                RAISE NOTICE 'an EXCEPTION is about to be raised';
                RAISE EXCEPTION 'NUM:%, DETAILS:%', SQLSTATE, SQLERRM;
    END;
    Errors and messages
    RAISE level 'format' [, expression [, ...]];
    Levels:
    INFO: informational output
    NOTICE: a notice message
    EXCEPTION: raises an error and aborts the stored procedure
    Example: RAISE NOTICE 'Calling cs_create_job(%)', v_job_id;
    Set performance expectations
    acceptable response times, queries per minute, etc.
    Benchmarks
    Know your baseline hardware performance
    throughput / capacity
    Know your workload
    heavy usage times
    resource contention
    data contention
    Focus your optimizations
    Hardware Issues / Failed Segments
    Resource Allocation
    Contention Between Concurrent Workloads
    Inaccurate Database Statistics
    Uneven Data Distribution
    SQL Formulation
    Database Design
    Disk failures
    Host failures
    Network failures
    OS not tuned for Greenplum
    Disk Capacity
     70% maximum recommended
     VACUUM after updates, deletes and loads
    VACUUM configuration parameters
     max_fsm_relations = tables + indexes + system tables
     max_fsm_pages = 16 * max_fsm_relations
    Greenplum resource queues
    limit active queries in the system
    limit the size of a query a particular user can run
    Perform admin tasks at low usage times
    Data loading, ETL
    VACUUM, ANALYZE
    Backups
    Design applications to prevent lock conflicts
    Concurrent sessions not updating the same data at the same time
    Resource-related Configuration Parameters
    work_mem = 32MB
    maintenance_work_mem = 64MB
    shared_buffers = 125MB
    Database statistics used by the query planner
    Run ANALYZE after
    Data loads
    Restores from backups
    Changes to schema (adding indexes, etc.)
    Inserts, Updates, or Deletes
    Can configure statistics collection
     default_statistics_target = 25
     gp_analyze_relative_error = .25
     on specific table columns
    ALTER TABLE name ALTER column SET STATISTICS #
    Table Distribution Key Considerations
    Even data distribution
    Local vs. distributed operations
    Even query processing
    Checking for data skew
    gpskew -t schema.table
    Unix system utilities (gpssh): gpssh -f seg_host -> sar 1 100
    Rebalancing a Table
      CREATE TABLE sales_temp (LIKE sales) DISTRIBUTED BY (date, total, customer);
      INSERT INTO sales_temp SELECT * FROM sales;
      DROP TABLE sales;
      ALTER TABLE sales_temp RENAME TO sales;
    Data Type Selection
    smallest size possible to fit data
    INTEGER, not NUMERIC(11,2) for integer fields
    same data type across tables for join columns
    consider hash of wide join keys, using BYTEA instead of CHAR(100)
     varchar or text for character data
    Denormalization (star schema)
    Table Partitioning
    Indexes
    Use sparingly in Greenplum Database
    Try the workload without indexes first
    Can improve performance of OLTP type workloads
    Other index considerations:
    Avoid on frequently updated columns
    Avoid overlapping indexes
    Use bitmap indexes where applicable instead of B-tree
    Drop indexes for loads
    Consider a clustered index
    Configuring Index Usage:
    enable_indexscan = on | off
    General Considerations
    Know your data
    Minimize returned rows
    Avoid unnecessary columns/tables in result set
    Avoid large sorts if possible
    Match data types in joins
    Greenplum-specific Considerations
    Join on common distribution key columns when possible
    Consider data distribution policy and query predicates
    System Catalog Tables and Views
    pg_stat_activity
    pg_locks / pg_class
    Database Logs
    Located in master (and segment) data directories
    UNIX system utilities (gpssh)
    Examine query plans to identify tuning opportunities
    What to look for?
    Plan operations that are taking exceptionally long
    Are the planner’s estimates close to reality? (EXPLAIN ANALYZE)
    Is the planner applying selective predicates early?
    Is the planner choosing the best join order?
    Is the planner selectively scanning partitioned tables?
    Is the planner choosing hash aggregate and hash join operations where applicable?
    Is there sufficient work memory?
    To see the plan for a query
    EXPLAIN <query>
    EXPLAIN ANALYZE <query>
    Query plans are read from bottom to top
    Motions (Gather, Redistribute, Broadcast)
    Joins, sorts, aggregations
    Table scans
    The following metrics are given for each operation
    cost (units of disk page fetches)
    rows (rows output by this node)
    width (bytes of the rows produced by this node)
    EXPLAIN SELECT * FROM names WHERE name = 'Joelle';     
    --------------------------- QUERY PLAN -------------------------
    Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
        -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
                    Filter: name::text ~~ 'Joelle'::text
    EXPLAIN ANALYZE SELECT * FROM names WHERE name = 'Joelle';
    ---------------------- QUERY PLAN ------------------------
    Gather Motion 2:1 (slice1) (cost=0.00..20.88 rows=1 width=13)
      recv: Total 1 rows with 0.305 ms to first row, 0.537 ms to end.
       -> Seq Scan on 'names' (cost=0.00..20.88 rows=1 width=13)
         Total 1 rows (seg0) with 0.255 ms to first row, 0.486 ms to end.
             Filter: name::text ~~ 'Joelle'::text
    22.548 ms elapsed
    MVCC (multi-version concurrency control)
    Greenplum supports all transaction isolation levels defined in the SQL standard
    INSERT/COPY acquire locks at the row-level
    UPDATE/DELETE acquire locks at the table-level
    Can use LOCK command to acquire specific locks
    ACCESS SHARE (SELECT, ANALYZE)
    ROW SHARE (SELECT FOR UPDATE, SELECT FOR SHARE)
    ROW EXCLUSIVE (INSERT, COPY)
    SHARE UPDATE EXCLUSIVE (VACUUM)
    SHARE (CREATE INDEX)
    SHARE ROW EXCLUSIVE
    EXCLUSIVE (UPDATE/DELETE)
    ACCESS EXCLUSIVE (ALTER TABLE, DROP TABLE, REINDEX, CLUSTER, and VACUUM FULL)
    Lock conflicts caused by:
    Concurrent transactions accessing the same object
    Resource queue locks
    Transaction deadlocks between segments (rare)
    Query pg_locks system table to see current locks
    EXAMPLE:
    SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query
    FROM pg_locks l, pg_class c, pg_stat_activity a
    WHERE l.relation=c.oid AND l.pid=a.procpid
    ORDER BY c.relname;
    Transactions bundle multiple statements into one ‘all-or-nothing’ operation
    Transaction commands
    BEGIN or START TRANSACTION
    END or COMMIT
    ROLLBACK
    SAVEPOINT and ROLLBACK TO SAVEPOINT
    Autocommit mode in psql
    \set AUTOCOMMIT on|off
    Two-phase commit transactions not supported
    PREPARE TRANSACTION
    COMMIT PREPARED
    ROLLBACK PREPARED
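    The transaction commands above can be combined; a small sketch with an assumed accounts table:

    ```sql
    -- SAVEPOINT lets part of a transaction be undone without aborting all of it.
    BEGIN;
    INSERT INTO accounts VALUES (1, 100.00);
    SAVEPOINT before_risky;
    UPDATE accounts SET balance = balance - 500 WHERE id = 1;
    ROLLBACK TO SAVEPOINT before_risky;  -- undo only the UPDATE
    COMMIT;                              -- the INSERT still commits
    ```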
    Collects information about database activity
    Server configuration parameters
    stats_start_collector = on
    stats_block_level = off
    stats_row_level = off
    stats_queue_level = off
    stats_command_string = on
    To see statistics views and tables in catalog:
     \dtvS pg_stat*
    Linux: ps ax | grep postgres
    Solaris: ps -ef | grep postgres
     pargs <process_id>
    Greenplum Master Instance
       postgres database listener process
    postgres: <sub_process_name>
    postgres: seqserver process
    postgres: <user> <database> <con#> <host> <cmd#><slice#>
    Greenplum Segment Instance
       postgres database listener process
     postgres: <sub_process_name>
    Appendix
    About Window Functions
    Constructing a Window Specification
    OVER clause
    WINDOW clause
    Built-in Window Functions
    New class of function allowed only in the SELECT list
    Returns a value per row (unlike aggregate functions)
    Results interpreted in terms of the current row and its corresponding window partition or frame
    Characterized by the use of the OVER clause
    Defines the window partitions (groups of rows) to apply the function
    Defines ordering of data within a window
    Defines the positional or logical framing of a row in respect to its window
    All window functions have an OVER() clause
    Specifies the ‘window’ of data to which the function applies
    Defines:
     Window partitions (PARTITION BY clause)
     Ordering within a window partition (ORDER BY clause)
     Framing within a window partition (ROWS/RANGE clauses)
    SELECT * , 
    row_number()
    OVER()
    FROM sale
    ORDER BY cn;
    SELECT * , 
    row_number()
    OVER(PARTITION BY cn)
    FROM sale
    ORDER BY cn;
    SELECT vn, sum(prc*qty)
    FROM sale
    GROUP BY vn
    ORDER BY 2 DESC;
    SELECT vn, sum(prc*qty), rank()
    OVER (ORDER BY sum(prc*qty) DESC)
    FROM sale
    GROUP BY vn
    ORDER BY 2 DESC;
    Window Framing: “Box car” Average
    Useful for multiple window function queries
    Define and name a window specification
    Reuse window specification throughout the query
    EXAMPLE:
    SELECT
    RANK() OVER (ORDER BY pn), 
    SUM(prc*qty) OVER (ORDER BY pn),
    AVG(prc*qty) OVER (ORDER BY pn)
    FROM sale;
    SELECT
    RANK() OVER (w1),
    SUM(prc*qty) OVER (w1),
    AVG(prc*qty) OVER (w1)
    FROM sale
    WINDOW w1 AS (ORDER BY pn);
    cume_dist()
    dense_rank()
    first_value(expr)
    lag(expr [,offset] [,default])
    last_value(expr)
    lead(expr [,offset] [,default])
    ntile(expr)
    percent_rank()
    rank()
    row_number()
    * Any aggregate function (used with the OVER clause) can    also be used as a window function
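    For instance, lag() and lead() against the sale table used in the earlier examples (reading pn as product number and prc as price is an assumption):

    ```sql
    SELECT pn, prc,
           lag(prc)  OVER (ORDER BY pn) AS prev_prc,  -- price from the previous row
           lead(prc) OVER (ORDER BY pn) AS next_prc   -- price from the next row
    FROM sale;
    ```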
    Q&A
     

    "Greenplum Database Fundamentals Training PPT" was uploaded by user south on 2016-03-05 as a training-course PPT.