Had a panic after a recent upgrade when we noticed that gc cr block lost had a 600x increase. Fortunately, Ram Cheruvattath had run across this previously and directed me to
False increase of ‘Global Cache Blocks Lost’ or ‘gc blocks lost’ after upgrade to 12c (Doc ID 2096299.1)
SQL> ed Wrote file afiedit.sql 1 SELECT sysdate snap_time,a.inst_ID "INSTANCE", A.VALUE "GC BLOCKS LOST", 2 B.VALUE "GC CUR BLOCKS SERVED", 3 C.VALUE "GC CR BLOCKS SERVED", 4 A.VALUE/(B.VALUE+C.VALUE) RATIO 5 FROM GV$SYSSTAT A, GV$SYSSTAT B, GV$SYSSTAT C 6 WHERE A.NAME='gc blocks lost' AND 7 B.NAME='gc current blocks served' AND 8 C.NAME='gc cr blocks served' and 9 B.INST_ID=a.inst_id AND 10 C.INST_ID = a.inst_id 11* order by 2 SQL> / SNAP_TIME INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED RATIO ----------------- ---------- -------------- -------------------- ------------------- ---------- 20160810 14:01:08 1 19373 6732648 7025417 .00140812 20160810 14:01:08 2 16649 7331907 5754179 .001272267 20160810 14:01:08 3 85182 10767159 8645896 .004387872 20160810 14:01:08 4 37808 8047542 14586125 .001670432 20160810 14:01:08 5 37987 8310080 15634197 .001586475 20160810 14:01:08 6 1877 2385405 713257 .000605745 6 rows selected. SQL> / SNAP_TIME INSTANCE GC BLOCKS LOST GC CUR BLOCKS SERVED GC CR BLOCKS SERVED RATIO ----------------- ---------- -------------- -------------------- ------------------- ---------- 20160810 14:01:26 1 19446 6744274 7031562 .001411602 20160810 14:01:26 2 16716 7344979 5759818 .001275563 20160810 14:01:26 3 85187 10771512 8650641 .004386074 20160810 14:01:26 4 37917 8074631 14623146 .001670516 20160810 14:01:26 5 38083 8336128 15713424 .001583522 20160810 14:01:26 6 1877 2385555 713323 .000605703 6 rows selected.
This upgrade included a migration to brand new hardware so I was suspicious that that was the case, but having so many bad cables would have been even more suspicious. A reporting database that didn’t include new hardware exhibited the same behavior so, bug it is.
Now this being said, on a completely different databases, one of the instances on an Active Data Guard standby was reporting 30K lost blocks a minute. So, something else was going on. This server was showing many packet reassembles failures:
> cat packets.sh while true; do x=`netstat -s | grep "packet reassembles failed"|awk '{print $1}'` echo `date` ": " `expr $x - $t`; t=$x sleep 5; done; ... Wed Aug 17 16:08:01 CDT 2016 : 265 Wed Aug 17 16:08:06 CDT 2016 : 138 Wed Aug 17 16:08:11 CDT 2016 : 211 Wed Aug 17 16:08:16 CDT 2016 : 1105 Wed Aug 17 16:08:21 CDT 2016 : 1649 Wed Aug 17 16:08:26 CDT 2016 : 1449 Wed Aug 17 16:08:31 CDT 2016 : 1896 Wed Aug 17 16:08:36 CDT 2016 : 2175 Wed Aug 17 16:08:41 CDT 2016 : 3152 Wed Aug 17 16:08:46 CDT 2016 : 3205 Wed Aug 17 16:08:51 CDT 2016 : 2801 Wed Aug 17 16:08:56 CDT 2016 : 2675 Wed Aug 17 16:09:01 CDT 2016 : 1328 Wed Aug 17 16:09:06 CDT 2016 : 451 Wed Aug 17 16:09:11 CDT 2016 : 89 ...
This issue was resolved by increasing the buffer size from 256 to 1M. Still need to figure out why there was such an increase in traffic for this to even manifest. But, stable for now.