【数据库系统概念】期末复习笔记
选择题
1. 背景知识:
- SQL server:Microsoft;My SQL/Oracle:Oracle;DB2:IBM。
- 开源的:MySQL and PostgreSQL。
- 问哪个是“domestic”数据库,就是问国产数据库。
- C对应ODBC,Java对应JDBC。
- 关系型数据库能够管理多种类型,如音视频、网页等。
- xml管理半结构化数据,也就是有嵌套的(大概率出现xml的句子是正确的)
2. ACID
- a:要么全做,要么完全不做
- d:只要提交成功,即使系统崩溃,修改也能保存
- i:事务执行时感受不到其他事务的存在
- 程序员负责:一致性(c)
- 恢复系统负责:原子性(a),持久性(d)
- 并发控制系统负责:原子性(a),隔离性(i)
3. 三级模式(/抽象)两级映射
- 1. 逻辑独立性定义:对于app来说,不依赖逻辑模式,就算逻辑模式改了,app不必被重写
- 2. 物理数据独立性定义:逻辑层不必知道复杂的物理层结构。对于app来说,物理模式更改后,app不必重写。
- 3. 物理层(描述数据怎样被存储)-内模式;逻辑层(描述存储什么数据及其之间关系)-模式+外模式;视图层-看不到整个数据库
- 4. 需求分析、概念模式设计(画ER图)、逻辑模式设计(定primary key)、物理模式设计(建立index)
4. 查询优化分为:基于代价优化,启发式优化
5. 实例(instance,“particular moment”)与模式的区别
6. 查询执行引擎执行查询计算计划
7. 2PL
- strict 2PL:X锁在提交后释放
- r 2PL:所有锁都要在提交后释放
- 2PL能保证冲突可串行化,不是无级联回滚的,不避免死锁
- s 2PL可以保证无级联回滚
-
锁的升级与降级:第一阶段可升级,第二阶段可降级
8. 故障分类
- 1. 事务故障
- 1. 逻辑错误——事务内部错误,如找不到数据、非法输入
- 2. 系统错误——死锁
- 2. 系统崩溃——硬件故障或OS/DB存在漏洞
- 3. 磁盘故障——磁盘坏掉了
9. 查询过程:翻译-优化-执行
10. 文件组织:
- 定长记录、变长记录。与建表时的声明无关。变长记录有定长部分和变长部分。
- 没有主键:heap。需建立多级记录
- 顺序文件组织,为降低每次元组增删时的开销,需隔段时间整理顺序。目的:为了高效处理按某个搜索码的顺序排序的记录
- 同一个文件里能装下不同关系表:multitable clustering。适用:需要频繁多表连接时
11. 安全控制
- 1. grant,revoke
12. 关系代数六个基本操作:投影,选择,union,different,笛卡尔,rename
13. 关系表达式的等价
- 1. 等价规则太多了,必背不过来。主要是认识符号的含义(如下),然后当场画表举反例。
13. SQL纠错
- select后面没被聚集的,一定是分组条件,否则错
- 主键只能有一组
- 外键约束,先删去谁
14. DDL、DML
15. Armstrong公里(6条)
16. 全部概化,部分概化,正交特化,重叠特化
17. 如何求属性集闭包 y 包含于{A,b,C}
18. 数据模型:“概念的集合”
19. 数据字典:包含元数据
20. reference integrity(参照完整性) → foreign key
可能不考的内容列在此处:
- 死锁处理(预防死锁:等待死亡、伤害等待;死锁检测:等待图画法)。
- 代价评估表格
- BCNF分解
第十四章 索引
考法:
- 创建索引的sql语句
- create index 索引名称 on 表名(属性名)
- 判断索引类型(聚集/非聚集,稠密/稀疏)并解释原因
- 话术:
-
是稠密索引。因为对于被索引文件的 balance 属性上的每个值(共有 7 个),在索引中都有一个对应的索引项。
-
是non- clustering index。因为被索引文件中记录的排列顺序与索引项指定的查询顺序不一致。
- 根据SQL判断某索引有无成效并解释原因
- 左前缀原则导致导致索引失效
- 一看insert/delete,必然拖慢
- 一看update,都有可能,看索引要不要修改了
- 一看selete,要么加快要么无效,绝不可能拖慢
- 自己设计合适的索引
- where子句涉及到的查询属性(e.g. budget)、连接属性(dept_name)、 group-by子句中的分组属性( e.g. building)
-
select building, avg (salary) as avg_salary from instructor natual_join department where budget>20000 group by building having avg (salary) > 42000 order by dept_name desc
例题1:来源:yw老师ppt
Consider the following tables:
factory(factoryID, name, manager, account)
employee(employeeID, name, age, factoryID)
airconditioner(serialID, date, model, price, factoryID)
(1) Consider the following SQL query. In addition to the primary indices on the primary keys of the tables, on which attributes in the tables the indices can be further defined to speed up the query?
select A.factoryID, avg(employeeID)
from factory as A, employee as B
where A.factoryID = B.factoryID and age>20 and account>30,000
group by B.factoryID
答案:employee表的factoryID,有利于加速两表连接,也有利于group by;
employee表的age,有利于从employee表中查找满足查询条件的目标元组
factory的account,有利于从factory表中查找满足查询条件的目标元组
(2) For the following query
update airconditioner
set price = price + 100
where model =GL2020
A nonclustered index is defined on the attribute model. Does this index speed up or slow down the operation, and why?
答案:会加快该操作,因为该索引的建立有利于从airconditioner表中查找需要update的目标元组,并且该索引无需修改。
(3) For the following delete operation
delete
from airconditioner
where model=HX8302 and price>1500
A primary/clustered index has been defined on the primary key serialID. Does this index speed up or slow down the operation, and why?
答案:会拖慢该操作,因为该索引无法被利用,并且删除掉某些元组后需要修改索引,产生额外开销。
(4) if a multiple-attribute index (i.e. composite index) is defined on airconditioner(serialID, date, model, price, factoryID) by create index DMFIndex on airconditioner(date, model, factoryID) ,can the index DMFIndex speed up the following queries,why?
(i) select serialID
from airconditioner
where date=‘2021-11-30’ and model=‘Haier’
(ii) select serialID
from airconditioner
where date=‘2021-11-30’ and price>2000
(ii) select serialID
from airconditioner
where date=‘2021-11-30’ and factoryID=1040
(iv) select serialID
from airconditioner
where model=‘Haier’ and factoryID=1040
答案:前三个能,最后一个不能。写原因时写明用了索引中的几个属性,几个生效,关键词“左前缀”,最后写结论。
例题2:来源:期末题
Considering the Banking database in Figure 1, answer the following questions.
(1) The attribute account_number in the table account is defined as the primary key and a
clustering index is created on it. Some tuples/records are then inserted into the table account,
the records in data file storing account are organized as a heap file, sequential file, or
multitable clustering file, and why?
答案:顺序文件。因为已经定义主键和聚集索引,那么后续插入的元组就会按照主键的顺序排序。
(2) For the table account, after account_number has been defined as the primary key, another
index BNIdx is defined on the attribute branch_name and organized as a B+-tree. Then, the
index BNIdx is a primary/clustering index or secondary/non-clustering index, and why? (3
points)
(3) For the following SQL query, in addition to the existing primary indices on the primary
keys of the tables, on which attributes the indices can be further defined to speed up the
query? (3 points)
select branch_city, sum(amount)
from branch inner join loan on branch_name
where assets>1000
group by branch_city
答案:loan表的branchname,可以加速两表连接;
branch表的assets,可以加速查找满足查询条件的目标元组;
branch表的branchcity,可以加速group by分组。
(4) Give a SQL statement to define a composite index on combined search key
(branch_name, amount) on the table loan.
Can this index be efficiently used for the following query, and why? (4 points)
select *
from loan
where amount>100
答案:查询涉及到的属性是amount,不满足左前缀原则,该索引并不生效,所以不能被该查询利用。
例题3:来源:df老师ppt录像
例题4(画图题):来源:df老师ppt录像
第十八章 多粒度锁
(即意向锁)
相容性表
表头:IS IX S SIX X 这个顺序是从弱到强,记法:🐍,小手手,⭐
第一行四个√,对角线三个√
自上而下加锁,自下而上解锁
例题1
考虑三级多粒度锁,tuple-table-DB,判断以下SQL语句会导致在每级加什么锁?
1.select ID from student where age > 0
⇧ 自上而下加锁:IS-IS-S
2.select ID from student
⇧ 自上而下加锁:IS-S-
3.update student set ID = ID + 1 where age > 20
⇧ 自上而下加锁:IX-IX-X
4.delete from student where age > 20
⇧ 同上
5.update student set ID = ID + 1
⇧ 自上而下加锁:IX-X-
6.delete from student
⇧ 同上
7.insert into student(ID, sname, age) values(2022211103, bingtangxueli, 20)
⇧ 自上而下加锁:IX-X-
例题2
In the University database system, it is assumed that multiple granularity locks are granted on three-level data items, i.e. the database, tables, and tuples
(1) If a transaction T1 is now modifying the table Course by,
update Course
set credits=credits+1
where title=’DBS’
, then what types of locks on the database, tables and tuples should it hold?
(2) When T1 is modifying Course, transaction T2 wants to access Course by
select *
from Course
where title in {‘OS’, ‘DBS’, ‘Compiler’}
, then what types of three-level locks should T2 request? Among these lock requests, which will be granted, and which will be refused?
第十八章 并发
考法:
- 判断调度是否满足2PL
- 话术
-
不是符合两阶段锁协议的并发调度。因为事务 T2 在 unlock(B)后又 lock(A),不符合协议规定的锁增长和锁缩减的两个阶段。
- 给定多个事务,构造符合2PL的调度(没见过例题,暂且不详)
例题(貌似):
构造符合2PL的调度
答案:
第十七章 事务
考法:
- 画前驱图(precedence graph)
-
除了读后读,其余都有依赖边
-
- 判断并发调度是否可串行
- 话术:非冲突可串行的。因为前驱图中存在环路。
- 判断并发调度是否是可恢复recoverable、 无级联回滚cascadeless
-
可恢复调度( eeuveralle schedule):我读你,你交我再交
-
话术:
无级联调度(cascadelcss schedule):你交我再读,我不会回滚 -
话术
第十六章 查询优化
第一问写SQL,第二问用启发式优化,画出优化后的查询树。
步骤:
- 画出树的雏形
- 拆选择,尽可能下移:1.连接条件到树杈处停止。2.其他选择条件到叶子停止。
- 投影下移,必要时复制。
- 将笛卡尔积转变成带条件连接
例题:来源:df老师ppt录像
* * * 以 下 是 期 中 之 前 的 考 点 * * *
第七章 NF
考法:
求闭包,候选键(用属性对的格式写,小括号逗号),判断范式级别,求Fc,求3NF分解
- 判断范式级别和原因
- 原因
- 1NF:存在非键属性对候选码的部份依赖,违反2NF,所以是1NF
- 2NF:非键属性对候选码都是部分依赖,所以满足2NF;存在非键属性对候选码的传递依赖,违反3NF,所以最高级别是2NF
- 3NF:存在键属性对候选码的部份依赖(或传递依赖,看情况)
- 判断方法:
- 2NF:只看右非左键,左完全
- 3NF:要么左超,要么右键
- BCNF:仅左超
- 原因
- 求Fc:
- 拆:把右侧有多个字母的拆开
- 删:多余的式子删除(判断多余的方法:去掉该式子后,左侧符号求闭包,如果包含了右侧,则该式子多余)
- 去:去掉左侧能内部推的冗余字母
- 合:把左侧相同的式子合到一起
- 经验表明,最容易错的是“合”这一步
- 求3NF分解
- 求候选键
- 求Fc
- 给Fc写成分组,如果有包含关系,被包含的去掉
- 如果没用分组包含候选键,候选键作为一个分组加入
第六章 ER图与关系模式的相互转换
composite attribute, e.g. <firstname, lastname>: 子属性转为表中单独属性
multivalued attribute: 转为单独表,表中属性包括:实体集主键+多值属性
many-to-many relationship set, 单独转表
many-to-one relationship sets with partial participation on the many-side, 联系可以单独转表,也可以不单独转表
many-to-one relationship sets with total participation on the many-side,联系不单独转表,归入many-side的表中
多元联系→ 可直接转为关系表,表中属性包括联系所关联的实体集的主键
weak entity sets:所依赖的强实体集的主键+弱实体集属性→ relational table
ISA联系转表:父类实体主键+子类实体全部属性
第二至五章 关系代数与SQL
关系运算
例题:来源:期中题
SQL - 改
涨工资
update instructor set salary = salary * 1.03 where salary > 100000 update instructor set salary = salary * 1.05 where salary <= 100000
case语句
update instructor set salary = case when salary <= 100000 then salary * 1.05 else salary * 1.03 end
算学分
update student set tot_cred = (select sum(credits) from takes, course where takes.course_id = course.course_id and student.ID = takes.ID and takes.gread <> 'F' and takes.gread is not null)
SQL-删
删所有元组,保留表
delete from instructor
删特定元组
delete from instructor where dept_name = 'biology'
delete from instructor where dept_name in (select dept_name from department where building = 'Watson')
delete from instructor where salary < ( select avg(salary) from instructor)
删列
alter table r drop A
删表
drop table r
SQL-增
插入元组
insert into student(103, xiaoming, csc, 10)
insert into student(SNO, SNAME, Dept_name, credit) Values (103, xiaoming, csc, 10)
插入表
insert into student select ID, name, dept_name, 0 from instructor
增列
alter table r add A D
SQL - 创建
创建表
create table instructor ( ID char(5), name varchar(20) not null, dept_name varchar(20), salary numeric(8, 2), primary key(ID), foreign key (dept_name) reference department; )
create table student ( ID varchar(5), name varchar(20) not null, dept_name varchar(20), tot_cred numeric(3, 0), peimary key(ID), foreign key (dept_name) references department)
create table takes ( ID varchar(5), course_id varchar(8), sec_id varchar(8), semester varchar(6), year numeric(4, 0), gread varchar(2), primary key(IDm course_id, sec_id, semester, year), foreign key (ID) references student, foreign key (course_id, sec_id, semester, year) references section)
Integrity Constraints(建表时的完整性约束)
data type:char,varchar,int
primary key(A)
foreign key(A) references r2
级联外键:foreign key(A) references r2 on update cascade on delete cascade
not null
unique:当题干说“候选键”时
check (P ): 例: check (semester in (’Fall’, ’Winter’, ’Spring’, ’Summer’))
创建索引
create index i on R(A)
创建视图
create view vname as (sql语句)
创建临时表
with newR(A,A2……) as (sql语句)
授予权限
grant select, update on employee to UserXiaoming
专题:SQL检验
专题:内外连接
例题1:期中题
对于如下数据库:
问题:
大意是让找出所有有积分的客户的信息。但是积分表中的客户(有积分)不一定出现在客户表中。也就是得到的表应该是:
李明 积分10 北京
王宁 积分5 null(因为王宁不在客户表中,只在积分表中)
所以,用积分表左外连接客户表。
例题2:期末题
对于如下数据库:
Employee(employeeID, employeename, age, address, sex, salary, deptID)
Department( deptID, deptname, managerID, managername)
DepartLocations(deptID, deptlocation)
Project (projectID, projectname, projectlocation, deptID)
Workson(employeeID, projectID, hours)
Dependent(employeeID, dependentname, sex, Birthdate, Relationship)
问题:
A new project is started, and its information is as follows: projectID=1021, projectname=’Building’,
projectlocation=’Shanghai’, deptID=’201’. Find out from the department ‘201’ all employees who currently do not work for any projects, and assign to them this new project ‘1021’. It is assumed that these selected employees will work for the new project for 500 hours.
Give SQL statements to modify the tables in the database, so as to record all the information about the new project ‘1021’ and the employees working on this project.
找出那些没出现在Workson表中的员工。
select employeeID
from Employee left outer join Workson
where projectID is null
然后将其插入到Workson表中。
注意,写别的连接都是不对的。如果只写join,是没有空值的。
专题:exists
框架:
-- 要查询“做了所有B的A”
select * from A
where not exists(
select * from B
where not exists(
select * from AB
where A.?=AB.? and B.?=AB.?
)
)
例题:
查找出选修了全部课程的学生
select * from S
where not exists(
select * from C
where and not exists(
select * from SC
where S.sno=SC.sno and C.cno=SC.cno
)
)
查询至少选修了C001号课程和C002号课程的学生信息
select * from S
where not exists(
select * from C
where C.cno in ('C001', 'C002') and not exists(
select * from SC
where S.sno=SC.sno and C.cno=SC.cno
)
)
-- 当然此题这么写是多余的,只是为了练习使用框架,直接union更好
查询至少选修了2022211103号学生选修的全部课程的学生信息
with C103(Cno) as (
select Cno from SC
where Sno=2022211103
)
select * from S
where not exists(
select * from C103
where and not exists(
select * from SC
where S.sno=SC.sno and Cxiaoming.cno=SC.cno
)
)
-- 实在想不明白了就先写子查询(103同学选的全部课程的课程编号)
查询至少选修了2022211103号学生选修的全部课程的学生Sno
(那么可以只用SC这一张表,用三遍,这就需要仔细考虑最内层的连接条件)
select Sno from SC as T1 -- T1代表S
where not exists(
select * from SC as T2 -- T2代表C
where T2.sno=2022211103 and not exists(
select * from SC as T3 -- T3代表SC
where T2.cno = T3.cno and T1.sno=T3.sno -- 所以T2和T3连接条件用cno,T1和T3连接条件用sno
)
)
查询选修张老师讲授所有课程的学生姓名
select Sname from S
where not exists(
select * from C
where C.teacher='张老师' and not exists(
select * from SC
where S.sno=SC.sno and C.cno=SC.cno
)
)
查询至少使用s1供应商所供应的全部零件的工程
with P1(pno) as (
select pno from SPJ where SPJ.sno='s1'
) -- s1供应商所供应的全部零件
select Jname from J
where not exists(
select * from P1
where not exists(
select * from SPJ
where J.Jno=SPJ.Jno and P1.pno=SPJ.pno
)
)
小结:找了这么多例题,我发现第一步不是先上框架,而是和其他的查询一样,先找出总共需要哪些表。有的题只需一张表T,一张表用三次。之后把框架摆上来,看A怎么从T中选出,B怎么从T中选出,连接条件怎么写……
距离考试还有3h的合理预测:不考
(⌒_⌒;)