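Original question: why couldn't I code Taipei as the first category and Tainan as the second myself (with Kaohsiung as the baseline)?
It turns out that Taipei and Tainan are text values, so they must be wrapped in single or double quotes, as in Region[i]=='Taipei' in the code below.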
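To illustrate the difference, here is a minimal sketch using a small made-up vector (the name `region` and its three values are assumptions for illustration, not the original data). Without quotes, R looks for an object named Taipei; with quotes, it compares against the string.

```r
# Hypothetical example vector, not the original data set
region <- c("Taipei", "Tainan", "Kaohsiung")

# Unquoted: R treats Taipei as an object name. Assuming no such object
# exists in the workspace, the comparison stops with an error, which
# tryCatch() turns into the placeholder string "error".
unquoted <- tryCatch(region == Taipei, error = function(e) "error")

# Quoted: R compares each element with the string "Taipei".
quoted <- region == "Taipei"

print(unquoted)
print(quoted)
```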
The full session and the solution follow.
> mydata = read.table(file.choose(), sep=',', header=T)
> mydata
   Sales Temperature    Region
1      5           1    Taipei
2     14           3    Taipei
3     18           4    Tainan
4      6           2    Taipei
5     33           5 Kaohsiung
6     47           8 Kaohsiung
7     67           9 Kaohsiung
8     16           4    Taipei
9     34           5    Tainan
10    41           6    Tainan
11     7           1    Taipei
12    10           2    Taipei
13    65           4 Kaohsiung
14    45           5 Kaohsiung
15    47           6 Kaohsiung
16    61           7 Kaohsiung
17    23           4    Tainan
18    16           3    Taipei
19     9           2    Taipei
20    10           3    Tainan
> attach(mydata)
> unique(Region)
[1] Taipei Tainan Kaohsiung
Levels: Kaohsiung Tainan Taipei
> n = length(Region)
> d1 = c(); d2 = c()
> for (i in 1:n)
+ {
+ if (Region[i]=='Taipei') {d1[i]=1; d2[i]=0}
+ else if (Region[i]=='Tainan') {d1[i]=0; d2[i]=1}
+ else if (Region[i]=='Kaohsiung') {d1[i]=0; d2[i]=0}
+ }
> mydata2 = cbind(mydata,d1,d2)
> out=lm(Sales ~ Temperature+d1+d2, data=mydata2)
> summary(out)
Call:
lm(formula = Sales ~ Temperature + d1 + d2, data = mydata2)

Residuals:
     Min       1Q   Median       3Q      Max
-13.9980  -3.5308  -0.3738   2.9668  22.0036

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   26.990     10.032   2.690  0.01608 *
Temperature    4.002      1.511   2.649  0.01751 *
d1           -25.619      7.537  -3.399  0.00367 **
d2           -19.397      5.766  -3.364  0.00395 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.563 on 16 degrees of freedom
Multiple R-squared:  0.8584, Adjusted R-squared:  0.8318
F-statistic: 32.32 on 3 and 16 DF,  p-value: 5.055e-07
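As an alternative sketch (not the method used above), the same dummy variables can be built without a loop, or left to lm() entirely by passing Region as a factor. The data frame here is retyped from the printout above so that the block runs on its own; the names fit_dummy and fit_factor are made up for illustration. relevel() sets Kaohsiung as the reference level explicitly, although it is already the first level alphabetically.

```r
# Data retyped from the printed mydata above (20 rows)
mydata <- data.frame(
  Sales = c(5, 14, 18, 6, 33, 47, 67, 16, 34, 41,
            7, 10, 65, 45, 47, 61, 23, 16, 9, 10),
  Temperature = c(1, 3, 4, 2, 5, 8, 9, 4, 5, 6,
                  1, 2, 4, 5, 6, 7, 4, 3, 2, 3),
  Region = c("Taipei", "Taipei", "Tainan", "Taipei", "Kaohsiung",
             "Kaohsiung", "Kaohsiung", "Taipei", "Tainan", "Tainan",
             "Taipei", "Taipei", "Kaohsiung", "Kaohsiung", "Kaohsiung",
             "Kaohsiung", "Tainan", "Taipei", "Taipei", "Tainan")
)

# Vectorized dummies: the quoted strings are compared element by element,
# and as.integer() turns TRUE/FALSE into 1/0. Kaohsiung rows get 0 in both.
mydata$d1 <- as.integer(mydata$Region == "Taipei")
mydata$d2 <- as.integer(mydata$Region == "Tainan")
fit_dummy <- lm(Sales ~ Temperature + d1 + d2, data = mydata)

# Factor approach: lm() creates the dummy columns itself, using the
# reference level (Kaohsiung) as the baseline.
mydata$Region <- relevel(factor(mydata$Region), ref = "Kaohsiung")
fit_factor <- lm(Sales ~ Temperature + Region, data = mydata)

print(coef(fit_dummy))
print(coef(fit_factor))
```

Both fits describe the same model, so the d1 and d2 coefficients correspond to the ones lm() labels RegionTaipei and RegionTainan.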