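This part is essentially unchanged. It carries over the content of the 2022 edition, so reviewing the key question types here is enough:
Especially recommended for review:
In addition, there are new example questions: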
Q010-1. An internal credit scoring model, named ALPHA, was created by a former employee using 25 defined features and make recommendations on model performance improvements. The ALPHA model compared expected and actual defaults over the past 12 months.
(1) The best description of the ALPHA model is that it is an example of a(n):
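Explanation: the answer is A. The output is binary (1/0), so this is a logit model. Inputs with a known outcome are available, so it is a supervised model, which rules out option B. A CART model applies to tree structures, whereas the question describes a single-layer structure, which rules out option C.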
(2) For Records modeled with correct predictions and errors:
The model was able to correctly predict a default in 5,290 instances of the model prediction dataset after the completed data wrangling. The precision of the model is closest to:
Explanation: the answer is C. Precision P = TP/(TP + FP) = 5,290/(5,290 + 273) ≈ 0.951.
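The arithmetic above can be checked with a minimal sketch. The function name `precision` is a hypothetical helper for illustration; the counts (5,290 true positives, 273 false positives) are the ones quoted in the explanation.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Precision = TP / (TP + FP): the share of predicted defaults that were real."""
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        raise ValueError("precision is undefined when nothing is predicted positive")
    return true_positives / predicted_positives


# Counts taken from the explanation above.
p = precision(true_positives=5290, false_positives=273)
print(f"precision = {p:.3f}")
```

Note that precision only uses the predicted-positive column of the confusion matrix; false negatives would matter for recall, not here.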
(3) A colleague mentions that there are concerns in how long it takes the ALPHA model to complete its recommendations, and they discuss several potential methods to reduce computation time for the ALPHA model. The most appropriate method to resolve the computation problem is:
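Explanation: the answer is A. Principal component analysis reduces the number of variables needed to explain the variation in the data, and therefore the computation time needed to run the model, instead of using every parameter for every record. Option B: lowering the learning rate would actually increase the computational load, because it increases the number of iterations the model must run in order to learn the specified target. Option C: winsorization is used to manage outliers by replacing each outlier with the smallest or largest non-outlier data point, which effectively adds mass to the endpoints of the distribution. The data points still exist, however, so winsorization has no effect on the model's computational requirements.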
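The reasoning for options B and C can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the quadratic objective f(w) = (w - 3)^2 stands in for a real loss function, and the winsorization bounds are passed in by hand instead of being derived from the data.

```python
def gd_iterations(learning_rate: float, tol: float = 1e-6, max_iter: int = 100_000) -> int:
    """Count gradient-descent updates needed to minimize f(w) = (w - 3)^2 from w = 0."""
    w = 0.0
    for updates_done in range(max_iter):
        grad = 2.0 * (w - 3.0)
        if abs(grad) < tol:
            return updates_done
        w -= learning_rate * grad
    return max_iter


def winsorize(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


# Option B: a smaller learning rate needs more iterations, not fewer.
fast = gd_iterations(learning_rate=0.1)
slow = gd_iterations(learning_rate=0.01)
print(f"updates at lr=0.1: {fast}, at lr=0.01: {slow}")

# Option C: winsorizing changes values but keeps every record.
raw = [1, 48, 50, 52, 55, 400]
clamped = winsorize(raw, lower=48, upper=55)
print(len(raw), len(clamped), clamped)
```

Both results point the same way as the explanation: neither a lower learning rate nor winsorization shrinks the amount of work per run, whereas reducing the number of input variables does.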
Q010-2. A database from a large national weather provider that contains detailed weather data (temperature, humidity, rainfall, atmospheric pressure, etc.) at a very localized geographic level recorded by GPS coordinates for the past 36 months: The database contains a reference note that some geographic areas had their sensors upgraded to capture additional metrics that include a field to identify when that upgrade occurred.
(1) The type of error least likely to be generated by the weather dataset reference note is:
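Explanation: the answer is A. An invalidity error means the data fall outside a meaningful range, which makes them invalid. In this case, the sensors were upgraded to collect additional information, not to correct earlier records. Incompleteness errors (some fields have no data before the upgrade) and non-uniformity errors (the data granularity changed) do exist, however.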
(2) An analyst notes that many of the included data fields would likely be highly irrelevant to the analysis and begins the process of selecting a subset of data fields that he believes are applicable. The selection of a subset of data from the weather dataset is best described as:
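Explanation: the answer is B. The process of identifying and removing unneeded, irrelevant, or redundant features from a dataset is called feature selection, which matches the question. Option A: trimming is a way of handling outliers in a dataset by simply deleting the extreme values; it is also called truncation. Option C: feature engineering is the process of combining, consolidating, or creating new features that do not exist in the current weather dataset.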
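The three terms above act on different parts of the dataset, which a small sketch makes visible. The field names and values below are hypothetical and are not taken from the weather database in the question.

```python
# Hypothetical weather records; names and values are illustrative only.
records = [
    {"temperature_c": 21.0, "rainfall_mm": 0.0, "sensor_color": "grey"},
    {"temperature_c": 19.5, "rainfall_mm": 4.2, "sensor_color": "white"},
    {"temperature_c": 23.1, "rainfall_mm": 950.0, "sensor_color": "grey"},  # extreme value
]

# Feature selection: keep only the relevant columns; every row survives.
keep = ["temperature_c", "rainfall_mm"]
selected = [{name: row[name] for name in keep} for row in records]

# Trimming (truncation): drop the rows holding extreme values; columns are unchanged.
trimmed = [row for row in records if row["rainfall_mm"] <= 100.0]

# Feature engineering: derive a new column that was not in the data.
engineered = [{**row, "is_rainy": row["rainfall_mm"] > 0.0} for row in records]

print(selected[0])
print(len(records), len(trimmed))
print(engineered[1])
```

Feature selection removes columns, trimming removes rows, and feature engineering adds columns, which is why only feature selection describes choosing a subset of data fields.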
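A few small points are worth memorizing again:
Normalization is (X - min)/(max - min); standardization is (X - μ)/σ. Other knowledge points can be found in the related notes.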
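The two scaling formulas can be compared side by side in a minimal sketch. The function names and sample values are hypothetical, and the use of the population standard deviation (`statistics.pstdev`) for σ is an assumption; a sample standard deviation would give slightly different z-scores.

```python
from statistics import mean, pstdev


def normalize(values):
    """Min-max normalization: (X - min) / (max - min), rescaling to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("normalization is undefined when all values are equal")
    return [(x - lo) / (hi - lo) for x in values]


def standardize(values):
    """Standardization: (X - mu) / sigma, using the population sigma (an assumption)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("standardization is undefined when sigma is zero")
    return [(x - mu) / sigma for x in values]


data = [10, 20, 30, 40, 50]
scaled = normalize(data)
z_scores = standardize(data)
print(scaled)
print([round(z, 3) for z in z_scores])
```

Normalization pins the results to the interval [0, 1] and is sensitive to outliers through min and max, while standardization centers the data at 0 with unit standard deviation.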
(End)