自Oracle 184.108.40.206版本开始，一个新的隐含参数 - _datafile_write_errors_crash_instance 被引入到数据库中，通过这个参数名就可以了解到其含义：当发生数据文件写错误时，Crash数据库实例。
Fri Jan 13 19:32:21 2013
KCF: write/open error block=0xf1fa6 online=1
error=27063 txt: 'IBM AIX RISC System/6000 Error: 22: Invalid argument
Additional information: -1
Additional information: 557056'
Automatic datafile offline due to write error on
file 73: /dev/rods_gm05
鉴于很多用户遇到的困境，Oracle做出了修正，这一修正在MOS上以BUG形式被提交，其内容为： Bug 7691270 Crash the DB in case of write errors (rather than just offline files) 。
In patch set 220.127.116.11 a new behaviour for datafile write errors has been implemented. With this release ANY write error to a datafile will cause the instance to abort. Before 18.104.22.168 those errors usually led to an offline datafile if the database operates in archivelog mode (your production database do, don't they?!) and the datafile does not belong to the SYSTEM tablespace. Internal discussion found this behaviour not up-to-date and alligned with RAC systems and modern storages. Therefore it has been changed and a new underscore parameter got introduced.
This is the default setting´and the new behaviour beginning with Oracle 22.214.171.124
If you would like to revert to the pre-126.96.36.199 behaviour you'll have to set in your init.ora/spfile this parameter to false. But keep in mind that there's a reason why this has been changed.
当然如果我们不想尝试这个新特性，可以通过将 _DATAFILE_WRITE_ERRORS_CRASH_INSTANCE 设置为FALSE来屏蔽该行为。该参数是一个动态参数：
SQL> alter system set "_datafile_write_errors_crash_instance"=false;
20+ years ago a feature was added to Oracle to offline a datafile when there was an error writing a dirty buffer to it and it was not part of the system tablespace. At that time it made sense to do this since neither RAC or even
OPS was implemented and storage arrays did not exist. Then the most likelycause of an I/O error was a problem with the direct attached disk drive holding the datafile. By offlining the datafile the database might be able to continue running. Customers assumed that a disk failure would require restoring a backup and doing a media recovery so taking the file offline might improve availability. High availability was not expected.
Today almost all customers use highly available storage arrays accessible from multiple hosts. Now most I/O errors are either transient or are local to the host that encounters them. Real disk failures are hidden by the storage array redundancy. Customers expect a disk failure to have no effect on the operation of the database.
Unfortunately the code to offline a datafile on an I/O error is still there. The effect is that an error on one node in a cluster offlines the datafile and usually takes down the entire application on all nodes or even crashes all instances if the problem is with an undo tablespace. For example dismounting a file system on one node in a cluster makes that node get I/O errors for the files on that file system. This makes a mistake on one node take down the entire cluster.
Offlining a datafile on a write error is even a problem with single instance. Most I/O errors today will go away if the database is restarted on another machine or if the current machine is rebooted. However if the I/O error took a datafile offline, then the administrator must do a media recovery to make the application function again. This is an unusual procedure that takes awhile.
If the database instances do not crash it takes longer for the administrator to find out that the application is not working even though the database appears to be up and running. This is a problem with both RAC and single
Question: One concern is that a failed datafile write to a non-critical tablespace will bring down the database when it occurs in the only open instance.
It is true that there may be some situations where taking the file offline would be better. On the other hand there are cases where crashing in single instance is better because rebooting the server or restarting the instance will bring it up sooner with no need for manual intervention. Since we have to choose without knowing much about the system we have to base our choice on the odds of the failure being one case or the other. Twenty years ago a datafile was on one disk, almost all I/O errors were disk failures and a disk failure always meant doing media recovery. In that situation taking the datafile offline was clearly the right thing to do, even if the tablespace was critical to the application - it was going to need media recovery in any case.
Today systems are much different.- Storage arrays and mirroring mean that disk failures almost never require media recovery. I/O write errors usually stop happening when the system is reinitialized.
- Many customers have mechanisms like CRS to automatically restart the database, possibly on a different node.
Now it is much more likely that restarting the instance will resolve the problem without doing any media recovery, and it will happen automatically. The chance that the application can continue running with the offline datafile has always been slight, but when media recovery was going to be required anyway there was no harm in trying to offline the file. Now there is a lot of harm in offlining the file since it prevents automatic recovery and requires an administrator to perform tasks he is unfamiliar with. Today crashing the instance has a better chance of getting the application running sooner.