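task2 [Pumpkin Book ML] Mathematical derivations for linear models (least squares estimation, generalized Rayleigh quotient, maximum likelihood ... (Part 4)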


The standard linear regression model:
$$y_i = w_0 + w_1 x_{i1} + \dots + w_p x_{ip} + \epsilon_i$$
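As a minimal sketch of the model above, the coefficients $w_0, \dots, w_p$ can be estimated by ordinary least squares. The data here is synthetic, and the names `w_true`, `X_design`, and `w_hat` are hypothetical, introduced only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 3

# Synthetic data (an assumption for illustration): y_i = w_0 + sum_j w_j x_ij + eps_i
X = rng.normal(size=(n, p))
w_true = np.array([2.0, 1.0, -0.5, 3.0])  # hypothetical [w_0, w_1, ..., w_p]
y = w_true[0] + X @ w_true[1:] + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so that w_0 is estimated together with w_1..w_p
X_design = np.column_stack([np.ones(n), X])

# Least squares estimate: minimizes ||X_design @ w - y||^2
w_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("estimated w:", w_hat)
```

With low noise, `w_hat` should land close to `w_true`.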
The GAM model framework:
$$y_i = w_0 + \sum_{j=1}^{p} f_j(x_{ij}) + \epsilon_i$$
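To make the additive structure concrete, the sketch below replaces each linear term $w_j x_{ij}$ with a function $f_j(x_{ij})$ built from a small basis. It is only an illustration under stated assumptions: each $f_j$ is an unpenalized cubic polynomial (practical GAM libraries typically use penalized splines instead), the data is synthetic, and `poly_basis` and `r2` are hypothetical helper names.

```python
import numpy as np

def poly_basis(x, degree=3):
    """Columns x, x^2, ..., x^degree (no constant; the intercept w_0 is shared)."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

def r2(y, y_hat):
    """Coefficient of determination."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)
n = 500
X = rng.uniform(-2, 2, size=(n, 2))
# Hypothetical ground truth: f_1(x) = x^2, f_2(x) = sin(x)
y = 1.0 + X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=n)

# Plain linear model: y ~ w_0 + w_1 x_1 + w_2 x_2
A_lin = np.column_stack([np.ones(n), X])
w_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
r2_linear = r2(y, A_lin @ w_lin)

# Additive model: y ~ w_0 + f_1(x_1) + f_2(x_2), each f_j expanded in its own basis
A_add = np.column_stack([np.ones(n), poly_basis(X[:, 0]), poly_basis(X[:, 1])])
w_add, *_ = np.linalg.lstsq(A_add, y, rcond=None)
r2_additive = r2(y, A_add @ w_add)

print("linear R^2:", r2_linear, "additive R^2:", r2_additive)
```

Because each feature gets its own nonlinear function while the terms are still summed, the additive fit can capture curvature that the linear model misses, yet the contribution of each feature remains separately inspectable.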
Advantages and limitations of the GAM model:
(1) Polynomial regression example:
sklearn.preprocessing.PolynomialFeatures(degree=2, *, interaction_only=False, include_bias=True, order='C'):
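import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X_arr = np.arange(6).reshape(3, 2)
print("Original X:\n", X_arr)

# Full degree-2 expansion: 1, a, b, a^2, ab, b^2
poly = PolynomialFeatures(2)
print("Degree-2 transform of X:\n", poly.fit_transform(X_arr))

# Interaction terms only: 1, a, b, ab (no squared terms)
poly = PolynomialFeatures(interaction_only=True)
print("Degree-2 transform of X:\n", poly.fit_transform(X_arr))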
Original X:
 [[0 1]
 [2 3]
 [4 5]]
Degree-2 transform of X:
 [[ 1.  0.  1.  0.  0.  1.]
 [ 1.  2.  3.  4.  6.  9.]
 [ 1.  4.  5. 16. 20. 25.]]
Degree-2 transform of X:
 [[ 1.  0.  1.  0.]
 [ 1.  2.  3.  6.]
 [ 1.  4.  5. 20.]]
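The transform on its own only builds the polynomial columns; to obtain a polynomial regression, those columns are passed to an ordinary linear regression. A minimal sketch, assuming synthetic one-dimensional data with a hypothetical quadratic ground truth, chains the two steps with scikit-learn's `make_pipeline`:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): y = 1 + 2x - 0.5x^2 + noise
x = rng.uniform(-3, 3, size=(100, 1))
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.1, size=100)

# include_bias=False: LinearRegression already fits the intercept w_0
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    LinearRegression(),
)
model.fit(x, y)

# make_pipeline names each step after its lowercased class name
lin = model.named_steps["linearregression"]
print("intercept:", lin.intercept_)
print("coefficients for [x, x^2]:", lin.coef_)
```

Setting `include_bias=False` avoids a redundant constant column, since the linear model fits its own intercept.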
(2) GAM example:
Install pygam: pip install pygam
from pygam import LinearGAM

# boston_data, boston and y are assumed to be defined earlier in the tutorial
gam = LinearGAM().fit(boston_data[boston.feature_names], y)
gam.summary()
LinearGAM
=============================================== ==========================================================
Distribution:                        NormalDist Effective DoF:                                    103.2423
Link Function:                     IdentityLink Log Likelihood:                                 -1589.7653
Number of Samples:                          506 AIC:                                             3388.0152
                                                AICc:                                            3442.7649
                                                GCV:                                               13.7683
                                                Scale:                                              8.8269
                                                Pseudo R-Squared:                                   0.9168
==========================================================================================================
Feature Function                  Lambda               Rank         EDoF         P > x        Sig. Code
================================= ==================== ============ ============ ============ ============
s(0)                              [0.6]                20           11.1         2.20e-11     ***
s(1)                              [0.6]                20           12.8         8.15e-02     .
s(2)                              [0.6]                20           13.4         2.59e-03     **
s(3)                              [0.6]                20           3.6          2.76e-01
s(4)                              [0.6]                20           11.3         1.11e-16     ***
s(5)                              [0.6]                20           10.2         1.11e-16     ***
s(6)                              [0.6]                20           10.4         8.22e-01
s(7)                              [0.6]                20           8.5          4.44e-16     ***
s(8)                              [0.6]                20           3.5          5.96e-03     **
s(9)                              [0.6]                20           3.5          1.33e-09     ***
s(10)                             [0.6]                20           1.8          3.26e-03     **
s(11)                             [0.6]                20           6.4          6.25e-02     .
s(12)                             [0.6]                20           6.6          1.11e-16     ***
intercept                                              1            0.0          2.23e-13     ***
==========================================================================================================
Significance codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: Fitting splines and a linear function to a feature introduces a model identifiability problem
         which can cause p-values to appear significant when they are not.

WARNING: p-values calculated in this manner behave correctly for un-penalized models or models with
         known smoothing parameters, but when smoothing parameters have been estimated, the p-values
         are typically lower than they should be, meaning that the tests reject the null too readily.
/home/leo/anaconda3/lib/python3.7/site-packages/ipykernel_launcher.py:3: UserWarning: KNOWN BUG: p-values computed in this summary are likely much smaller than they should be.

Please do not make inferences based on these values!

Collaborate on a solution, and stay up to date at:
github.com/dswah/pyGAM/issues/163

  This is separate from the ipykernel package so we can avoid doing imports until
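[1] Chen Xiru (陈希孺). 概率论与数理统计 (Probability Theory and Mathematical Statistics) [M]. University of Science and Technology of China Press, 2009.
[2] Bilibili video tutorial:
[3] Online Pumpkin Book:
[4] Open-source repository:
[5] Zhou Zhihua, 机器学习 (Machine Learning): version space
[6] Zhou Zhihua, 机器学习 (Machine Learning): distinguishing the concepts of hypothesis space and version space
[7] Linear discriminant analysis (LDA) explained in depth, from theory to code implementation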