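The original question: why couldn't I assign Taipei as the first category and Tainan as the second myself (with Kaohsiung as the baseline)?
It turns out that Taipei and Tainan are text values, so they must be wrapped in single or double quotes when compared, e.g. Region[i]=='Taipei' in the loop below.
The code and the solution are as follows: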
> mydata = read.table(file.choose(), sep=',', header=T)
> mydata
   Sales Temperature    Region
1      5           1    Taipei
2     14           3    Taipei
3     18           4    Tainan
4      6           2    Taipei
5     33           5 Kaohsiung
6     47           8 Kaohsiung
7     67           9 Kaohsiung
8     16           4    Taipei
9     34           5    Tainan
10    41           6    Tainan
11     7           1    Taipei
12    10           2    Taipei
13    65           4 Kaohsiung
14    45           5 Kaohsiung
15    47           6 Kaohsiung
16    61           7 Kaohsiung
17    23           4    Tainan
18    16           3    Taipei
19     9           2    Taipei
20    10           3    Tainan
> attach(mydata)
> unique(Region)  
[1] Taipei    Tainan    Kaohsiung
Levels: Kaohsiung Tainan Taipei
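> n = length(Region)  # number of observations
> d1 = c(); d2 = c()  # d1 = 1 for Taipei, d2 = 1 for Tainan; Kaohsiung is the baseline (both 0)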
> for (i in 1:n)    
+ {
+ if (Region[i]=='Taipei') {d1[i]=1; d2[i]=0}
+ else if (Region[i]=='Tainan') {d1[i]=0; d2[i]=1}
+ else if (Region[i]=='Kaohsiung') {d1[i]=0; d2[i]=0}
+ }
> mydata2 = cbind(mydata,d1,d2)
> out=lm(Sales ~ Temperature+d1+d2, data=mydata2)
> summary(out)
Call:
lm(formula = Sales ~ Temperature + d1 + d2, data = mydata2)
Residuals:
     Min       1Q   Median       3Q      Max 
-13.9980  -3.5308  -0.3738   2.9668  22.0036 
Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept)   26.990     10.032   2.690  0.01608 * 
Temperature    4.002      1.511   2.649  0.01751 * 
d1           -25.619      7.537  -3.399  0.00367 **
d2           -19.397      5.766  -3.364  0.00395 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
Residual standard error: 8.563 on 16 degrees of freedom
Multiple R-squared: 0.8584,     Adjusted R-squared: 0.8318 
F-statistic: 32.32 on 3 and 16 DF,  p-value: 5.055e-07 
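
With Kaohsiung as the baseline, the d1 coefficient is the estimated difference in Sales between Taipei and Kaohsiung at the same Temperature, and d2 is the same difference for Tainan. As a rough sketch of an alternative to the loop, the same dummies can be built with vectorized comparisons, or R can generate them from a factor whose reference level is Kaohsiung. The data frame below is retyped from the listing above rather than read from the file, and the names fit_dummy and fit_factor are only illustrative.

```r
# Data retyped from the listing above (assumption: it matches the original CSV)
mydata <- data.frame(
  Sales = c(5, 14, 18, 6, 33, 47, 67, 16, 34, 41,
            7, 10, 65, 45, 47, 61, 23, 16, 9, 10),
  Temperature = c(1, 3, 4, 2, 5, 8, 9, 4, 5, 6,
                  1, 2, 4, 5, 6, 7, 4, 3, 2, 3),
  Region = c("Taipei", "Taipei", "Tainan", "Taipei", "Kaohsiung",
             "Kaohsiung", "Kaohsiung", "Taipei", "Tainan", "Tainan",
             "Taipei", "Taipei", "Kaohsiung", "Kaohsiung", "Kaohsiung",
             "Kaohsiung", "Tainan", "Taipei", "Taipei", "Tainan"),
  stringsAsFactors = FALSE
)

# Option 1: vectorized dummies; the region names are strings, so they are quoted
mydata$d1 <- as.integer(mydata$Region == "Taipei")
mydata$d2 <- as.integer(mydata$Region == "Tainan")
fit_dummy <- lm(Sales ~ Temperature + d1 + d2, data = mydata)

# Option 2: let lm() create the dummies from a factor,
# with Kaohsiung set explicitly as the reference level
mydata$Region <- relevel(factor(mydata$Region), ref = "Kaohsiung")
fit_factor <- lm(Sales ~ Temperature + Region, data = mydata)

print(coef(fit_dummy))
print(coef(fit_factor))
```

In the factor version the coefficients are named RegionTaipei and RegionTainan; they correspond to d1 and d2 respectively.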