{"id":19524,"date":"2019-10-09T14:20:53","date_gmt":"2019-10-09T05:20:53","guid":{"rendered":"https:\/\/labs.gree.jp\/blog\/?p=19524"},"modified":"2019-12-24T16:58:27","modified_gmt":"2019-12-24T07:58:27","slug":"post19524","status":"publish","type":"post","link":"https:\/\/labs.gree.jp\/blog\/2019\/10\/19524\/","title":{"rendered":"Machine check handler, Generic Hardware Error Source, and recent memory monitoring features in Linux"},"content":{"rendered":"<p>Hello, this is Sejima.<\/p>\n<p>I want a Surface Pro X. I want a Surface Duo and a Surface Neo, too. It is great that Microsoft now offers hardware this exciting. I am looking forward to it.<\/p>\n<p>It hardly matters, but I personally like how Pro, Duo, and Neo rhyme.<\/p>\n<h2>Introduction<\/h2>\n<p>I think of InnoDB as my main specialty, more or less, 
but I also dabble in the lower layers of servers. About a year ago, when we introduced Skylake-SP generation servers into our on-premises environment, I investigated a number of things.<\/p>\n<p>This topic is so niche that it is fair to ask who benefits (well, at least I do), but this area has been changing since around Skylake-SP, so I will write it down, partly as a memo to myself.<br \/>\nLet me say up front that I am an amateur in this field, so I would be grateful for comments and feedback from the Kernel\/VM community.<\/p>\n<p>Today I will talk about the Linux machine check handler, the Generic Hardware Error Source, and memory monitoring mechanisms.<\/p>\n<p>Note: this article is based on Ubuntu 16.04 LTS with kernel 4.15.<\/p>\n<p>To summarize up front, the main points are as follows.<\/p>\n<ul>\n<li>When a server product has its own memory monitoring feature, disabling EDAC and setting mce=ignore_ce appears to have long been recommended. For details, though, it is best to check with your vendor.<\/li>\n<li>Since Skylake-SP, ghes_edac is enabled on servers from a small number of vendors. The number of such vendors may grow.<\/li>\n<li>Even when ghes_edac is enabled, setting mce=ignore_ce may still be a good idea. Again, check with your vendor for details.<\/li>\n<li>I figured I might as well look into MCE, GHES, and GHES_EDAC myself, so I did.<\/li>\n<li>Setting ghes.disable=1 disables ghes_edac.<\/li>\n
<li>ghes.disable=1 is sometimes recommended as a workaround for a specific problem, but unless that applies, my understanding is that you probably do not need to disable ghes_edac.<\/li>\n<\/ul>\n<p>Let's get started.<\/p>\n<h2>Server memory monitoring features<\/h2>\n<p>Some server products have features that detect memory failures.<\/p>\n<p>For example, HPE servers have apparently shipped with HPE Advanced Memory Error Detection Technology (HP Advanced Memory Error Detection Technology) for more than 20 years. It was apparently able to predict uncorrectable errors from trends in correctable errors.<\/p>\n<p>The following technology brief explains it in detail.<\/p>\n<p><a href=\"https:\/\/support.hpe.com\/hpsc\/doc\/public\/display?docId=emr_na-c02878598\">HP Advanced Memory Error Detection 
Technology<\/a><\/p>\n<p>HPE is not alone: many server products include memory monitoring, and this monitoring can conflict with the Linux machine check handler and with hardware-driven EDAC drivers (chipset-specific EDAC modules).<\/p>\n<h2>Machine check handler<\/h2>\n<p>There is something called a Machine Check Exception. Seasoned engineers who have long worked with physical servers have probably seen one a few times. Others may have been unlucky enough to hit one on a Windows or Mac client.<\/p>\n<p>For details, see <a href=\"https:\/\/en.wikipedia.org\/wiki\/Machine-check_exception\">Machine-check exception on Wikipedia<\/a> and <a href=\"http:\/\/halobates.de\/mce.pdf\">Machine check handling on Linux<\/a>.<\/p>\n<p>On Ubuntu with kernel 4.15, arch\/x86\/kernel\/cpu\/mcheck\/mce.c appears to be built into the kernel with CONFIG_X86_MCE=y rather than as a kernel module. You can confirm from dmesg 
that it has been initialized, as follows.<\/p>\n<pre><code>$ dmesg | grep 'mce:'\n[    0.039613] mce: CPU supports 20 MCE banks\n$\n<\/code><\/pre>\n<p>What are the MCE banks mentioned here? The \u201cMachine check handling on Linux\u201d paper cited above says the following.<\/p>\n<blockquote><p>\n  In addition there are some more registers for each bank. A bank is a group of<br \/>\nerrors generated by a specific subsystem (like CPU, bus unit, cache, north<br \/>\nbridge).<\/p>\n<p>The number and meaning of banks is CPU dependent.<br \/>\nEach bank has a number of sub-errors that can be enabled or disabled<br \/>\nindividually. Normally a generic machine check handler enables all errors and<br \/>\nall banks<\/p>\n<p>A machine check bank also has a register for the address associated with the<br \/>\nerror.<\/p><\/blockquote>\n<p>Roughly speaking, a machine check bank can be thought of as the place that stores information about errors raised by the CPU, bus, cache, PCI-Express, and so on.<\/p>\n<p>The same paper also contains the following passage. There appear to be two kinds of machine check: the uncorrectable Machine Check Exception and the correctable silent machine check.<\/p>\n<blockquote><p>\n  There are two main kinds of machine check: machine check exceptions<br \/>\n(MCEs) and silent machine check. A machine check exception happens when<br \/>\nthere is an error that the hardware cannot correct. 
It will cause the CPU to<br \/>\ninterrupt the current program and call a special exception handler.<\/p>\n<p>With a silent machine check the hardware was able to correct the error, but<br \/>\nlogged the event to internal registers. There the event can be read by the<br \/>\noperating system or the firmware later. Silent machine checks don\u2019t need<br \/>\nimmediate software or administrator action, but it is useful to log and analyze<br \/>\nthem to get early cues about hardware problems.<\/p><\/blockquote>\n<p>So how does Linux handle these? The Linux machine check handler reportedly checks for corrected errors by polling (the text below also mentions CMCI), and the <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/Documentation\/x86\/x86_64\/boot-options.txt#L25-L33\">documentation for mce=ignore_ce<\/a> says the following.<\/p>\n<pre><code>   mce=ignore_ce\n        Disable features for corrected errors, e.g. polling timer\n        and CMCI.  All events reported as corrected are not cleared\n        by OS and remained in its error banks.\n        Usually this disablement is not recommended, however if\n        there is an agent checking\/clearing corrected errors\n        (e.g. 
BIOS or hardware monitoring applications), conflicting\n        with OS's error handling, and you cannot deactivate the agent,\n        then this option will be a help.\n<\/code><\/pre>\n<p>When is mce=ignore_ce useful? One case is when the server's BIOS or similar firmware has its own memory monitoring. If the OS also handles corrected errors, the monitoring built into the server may no longer be able to use them to predict memory failures, so setting mce=ignore_ce is considered desirable in that case.<\/p>\n<p>SUSE's documentation also covers mce=ignore_ce.<\/p>\n<p><a href=\"https:\/\/www.suse.com\/support\/kb\/doc\/?id=7022118\">Considerations for dealing with correctable memory error messages<\/a><\/p>\n<p>Incidentally, you can check whether mce=ignore_ce is in effect by looking under \/sys.<\/p>\n<pre><code>$ head `find \/sys 2>\/dev\/null | grep -i ignore_ce`\n==> \/sys\/devices\/system\/machinecheck\/machinecheck15\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck2\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck13\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck0\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck11\/ignore_ce <==\n1\n==> 
\/sys\/devices\/system\/machinecheck\/machinecheck9\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck7\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck5\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck3\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck14\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck1\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck12\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck10\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck8\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck6\/ignore_ce <==\n1\n==> \/sys\/devices\/system\/machinecheck\/machinecheck4\/ignore_ce <==\n1\n$<\/code><\/pre>\n<h2>ghes_edac<\/h2>\n<p>This is not specific to Ubuntu: with kernel 4.15 or later on Skylake-SP generation HPE servers, the built-in ghes_edac is now enabled instead of skx_edac.ko, the EDAC kernel module for Skylake-SP.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/commit\/5deed6b6a479ad5851d7ead6412dc6faa84a694e\">https:\/\/github.com\/torvalds\/linux\/commit\/5deed6b6a479ad5851d7ead6412dc6faa84a694e<\/a><\/p>\n<p>When drivers\/edac\/ghes_edac.c is in use, the hardware-driven EDAC driver (chipset-specific EDAC module) is disabled automatically.<br \/>\nAs the following article explains, ghes_edac.c is designed as a Firmware first EDAC driver.<\/p>\n<p><a 
href=\"https:\/\/lwn.net\/Articles\/539429\/\">https:\/\/lwn.net\/Articles\/539429\/<\/a><\/p>\n<blockquote><p>\n  There are currently 3 error mechanisms inside the Linux Kernel: edac, mcelog and ghes.<\/p>\n<p>Unfortunately, not all those error mechanisms will work at the same time, as accessing the error registers by the BIOS may interfere on reading them from OS.<\/p>\n<p>So, all those 3 mechanisms need to be integrated, in order to avoid such problems.<\/p>\n<p>This patch series adds a new EDAC driver that uses \"Firmware first\" APEI\/GHES as an error report mechanism. It automatically disables the hardware-driven EDAC drivers when GHES is enabled, preventing to have both OS and BIOS to read at the very same error mechanisms.<\/p><\/blockquote>\n<p>The <a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/Kconfig#L55-L75\">Kconfig in kernel 4.15<\/a> also describes ghes_edac. As far as I checked, Ubuntu's kernel 4.15 sets CONFIG_EDAC_GHES=y, so ghes_edac is built into the kernel rather than shipped as a module. However, as the help text below says, ghes_edac can still be disabled by setting ghes.disable=1.<\/p>\n<pre><code>config EDAC_GHES\n    bool \"Output ACPI APEI\/GHES BIOS detected errors via EDAC\"\n    depends on ACPI_APEI_GHES && (EDAC=y)\n    help\n      Not all machines support hardware-driven error report. Some of those\n      provide a BIOS-driven error report mechanism via ACPI, using the\n      APEI\/GHES driver. 
By enabling this option, the error reports provided\n      by GHES are sent to userspace via the EDAC API.\n\n      When this option is enabled, it will disable the hardware-driven\n      mechanisms, if a GHES BIOS is detected, entering into the\n      \"Firmware First\" mode.\n\n      It should be noticed that keeping both GHES and a hardware-driven\n      error mechanism won't work well, as BIOS will race with OS, while\n      reading the error registers. So, if you want to not use \"Firmware\n      first\" GHES error mechanism, you should disable GHES either at\n      compilation time or by passing \"ghes.disable=1\" Kernel parameter\n      at boot time.\n\n      In doubt, say 'Y'.\n<\/code><\/pre>\n<p>The Firmware First mode of ghes_edac.c seems to be useful because it avoids conflicts with the server's memory monitoring.<\/p>\n<p>Incidentally, when I disabled ghes_edac with ghes.disable=1, skx_edac.ko appeared to be loaded on Skylake-SP. If skx_edac.ko conflicts with the server's memory monitoring, it is presumably better to disable skx_edac.ko as well.<\/p>\n<p>Firmware First mode is described in the following Intel white paper.<\/p>\n<p><a href=\"https:\/\/firmware.intel.com\/sites\/default\/files\/resources\/A_Tour_beyond_BIOS_Implementing_APEI_with_UEFI_White_Paper.pdf\">https:\/\/firmware.intel.com\/sites\/default\/files\/resources\/A_Tour_beyond_BIOS_Implementing_APEI_with_UEFI_White_Paper.pdf<\/a><\/p>\n<blockquote><p>\n  APEI Error handling models<\/p>\n<p>APEI offers two error 
models, Firmware first model and OS Native model. Firmware 1st is used when the host firmware needs to initially examine the error and attempt recovery or corrective action in an OS transparent way. This model is also used when certain OEMs want more control over error handling before the OS takes control, such as for purposes of executing some management functions. In the Firmware 1st model, all errors are initially signaled to the host firmware via SMI or other General Purpose Input (GPI) events. Then host firmware analyzes and decides what to do, and at the end of the flow creates a detailed APEI error log with FRU information to OS. Finally, the host firmware will then signal the OS about the existence of the error via SCI, NMI, or other interrupts.<\/p>\n<p>The OS native model, on the other hand, provides handling of the error directly by the OS or OS level software by directly accessing the hardware registers and analyzing the error. This requires standard architecture in the hardware for providing error information in the hardware and signaling, for e.g. industry standard PCIe AER and x86 MCA architectures. This model takes the burden off of host firmware.<\/p>\n<p>Platforms can use combined model also, where some errors are handled firmware 1st, some natively and some both (a.k.a. parallel model). 
Many of the servers employ this combined model for better handling and managing the server better via remotely.<\/p><\/blockquote>\n<p>Very roughly: in the OS native model the OS reads errors directly from the hardware registers (error banks), whereas in the firmware first model the firmware reads the banks first.<\/p>\n<p>When ghes.c has been initialized in firmware first mode, you can confirm it with dmesg as follows.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/commit\/9fb0bfe1408d5506b7b83d13d1eed573fd71d67d\">https:\/\/github.com\/torvalds\/linux\/commit\/9fb0bfe1408d5506b7b83d13d1eed573fd71d67d<\/a><\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/ghes.c#L1235\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/ghes.c#L1235<\/a><\/p>\n<pre><code>$ dmesg | grep 'firmware first'\n[    1.172974] GHES: APEI firmware first mode is enabled by APEI bit and WHEA _OSC.\n$<\/code><\/pre>\n<p>Vendors that support ghes_edac still seem to be limited, and servers from most vendors appear to run with skx_edac.ko (a hardware-driven EDAC driver, that is, a chipset-specific EDAC module). The reason can be found in the following commit log.<\/p>\n<p><a 
href=\"https:\/\/github.com\/torvalds\/linux\/commit\/5deed6b6a479ad5851d7ead6412dc6faa84a694e#diff-1690ecd81c78e5312438ae4279550e8d\">https:\/\/github.com\/torvalds\/linux\/commit\/5deed6b6a479ad5851d7ead6412dc6faa84a694e#diff-1690ecd81c78e5312438ae4279550e8d<\/a><\/p>\n<pre><code>EDAC, ghes: Add platform check\n\nThe ghes_edac driver was introduced in 2013 [1], but it has not been\nenabled by any distro yet. This driver obtains error info from firmware\ninterfaces (APEI), which are not properly implemented on many platforms,\nas the driver says on load:\n\n  This EDAC driver relies on BIOS to enumerate memory and get error\n  reports. Unfortunately, not all BIOSes reflect the memory layout\n  correctly. So, the end result of using this driver varies from vendor\n  to vendor. If you find incorrect reports, please contact your hardware\n  vendor to correct its BIOS.\n\nTo get out from this situation, add a platform check to selectively\nenable the driver on platforms that are known to have proper APEI\nfirmware implementation.\n\n\"ghes_edac.force_load=1\" skips this platform check.\n\n[1]: https:\/\/lkml.kernel.org\/r\/cover.1360931635.git.mchehab@redhat.com\n<\/code><\/pre>\n<p>In other words, not every vendor's BIOS necessarily implements APEI correctly.<\/p>\n<p>On a server where ghes.c has been initialized in firmware first mode, the directories under \/sys\/devices look, for example, like this.<\/p>\n<pre><code>$ find \/sys -type d 2>\/dev\/null | grep -i -e ghes -e 
edac\n\/sys\/devices\/platform\/GHES.1\n\/sys\/devices\/platform\/GHES.1\/power\n\/sys\/devices\/platform\/GHES.65534\n\/sys\/devices\/platform\/GHES.65534\/power\n\/sys\/devices\/platform\/GHES.0\n\/sys\/devices\/platform\/GHES.0\/power\n\/sys\/devices\/system\/edac\n\/sys\/devices\/system\/edac\/power\n\/sys\/devices\/system\/edac\/mc\n\/sys\/devices\/system\/edac\/mc\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm5\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm5\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm1\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm1\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm6\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm6\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm2\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm2\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm0\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm0\/power\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm7\n\/sys\/devices\/system\/edac\/mc\/mc0\/dimm7\/power\n\/sys\/bus\/platform\/drivers\/GHES\n\/sys\/bus\/edac\n\/sys\/bus\/edac\/devices\n\/sys\/bus\/edac\/drivers\n\/sys\/module\/edac_core\n\/sys\/module\/edac_core\/parameters\n$<\/code><\/pre>\n<p>The \/sys\/devices\/platform\/GHES.* entries appear to be named after the Source Ids read from the HEST ACPI table. Specifically, the initialization seems to happen around the following places.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L222\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L222<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L253\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L253<\/a><br 
\/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L192\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L192<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L202\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L202<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L151\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L151<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L173\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/acpi\/apei\/hest.c#L173<\/a><\/p>\n<p>You can disassemble HEST to inspect it, as follows.<\/p>\n<pre><code>$ sudo cat \/sys\/firmware\/acpi\/tables\/HEST > HEST.dat\n$ iasl -d HEST.dat\nIntel ACPI Component Architecture\nASL+ Optimizing Compiler version 20160108-64\nCopyright (c) 2000 - 2016 Intel Corporation\nInput file HEST.dat, Length 0xE8 (232) bytes\nACPI: HEST 0x0000000000000000 0000E8 (v01 HPE    Server   00000001 INTL 00000001)\nAcpi Data Table [HEST] decoded\nFormatted output:  HEST.dsl - 6468 bytes\n$<\/code><\/pre>\n<p>In the example above the Source Ids are 0000, 0001, and FFFE (0xFFFE is 65534 in decimal), which appear to correspond to GHES.0, GHES.1, and GHES.65534 respectively.<\/p>\n<pre><code>$ grep -e 'Generic Hardware Error Source' -e '  Source Id' HEST.dsl\n[028h 0040   2]                Subtable Type : 0009[Generic Hardware Error Source]\n[02Ah 0042   2]                    Source Id : 0000\n[068h 0104   2]                Subtable Type : 0009[Generic Hardware Error Source]\n[06Ah 0106   2]                    Source Id : 0001\n[0A8h 0168   2]                Subtable Type : 0009 [Generic Hardware 
Error Source]\n[0AAh 0170   2]                    Source Id : FFFE\n$<\/code><\/pre>\n<p>ghes_edac does not seem to know in advance which Source Id corresponds to the memory controller.<br \/>\nLooking at the ghes_edac initialization code, it appears only to decode the DMI data according to the SMBIOS specification to read DIMM information, without paying attention to Source Ids.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/ghes_edac.c#L497\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/ghes_edac.c#L497<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L1023\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L1023<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L1035\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L1035<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L89-L135\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L89-L135<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L115\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/firmware\/dmi_scan.c#L115<\/a><br \/>\n<a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/ghes_edac.c#L84-L173\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/ghes_edac.c#L84-L173<\/a><\/p>\n<p>Then ghes_edac_report_mem_error() 
appears to determine, among other things, what kind of error occurred.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/ghes_edac.c#L234-L292\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/drivers\/edac\/ghes_edac.c#L234-L292<\/a><\/p>\n<p>Section Type is a field of the Generic Error Data Entry. Both are described in the ACPI and UEFI specifications discussed below.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/include\/acpi\/actbl1.h#L655-L658\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/include\/acpi\/actbl1.h#L655-L658<\/a><\/p>\n<h3>GHES (Generic Hardware Error Source)<\/h3>\n<p>The GHES specification is described in section 17.3.2.6 of the <a href=\"https:\/\/uefi.org\/sites\/default\/files\/resources\/ACPI_4.pdf\">ACPI Specification version 4.0<\/a>.<\/p>\n<blockquote><p>\n  17.3.2.6 Generic Hardware Error Source<\/p>\n<p>The platform may describe a generic hardware error source to OSPM using the Generic Hardware Error Source structure. A generic hardware error source is an error source that either notifies OSPM of the presence of an error using a non-standard notification mechanism or reports error information that is encoded in a non-standard format.<\/p>\n<p>Using the information in a Generic Hardware Error Source structure, OSPM configures an error handler to read the error data from an error status block \u2013 a range of memory set aside by the platform for recording error status information.<\/p>\n<p>As the generic hardware error source is non-standard, OSPM does not implement built-in support for configuration and control operations. 
The error source must be configured by system firmware during boot.<\/p><\/blockquote>\n<p>In other words, GHES is presumably what gets used when the firmware-first model is adopted among the APEI error handling models.<\/p>\n<h3>Generic Error Data Entry<\/h3>\n<p>This is described in Table 18-343, Generic Error Data Entry, of the <a href=\"https:\/\/uefi.org\/sites\/default\/files\/resources\/ACPI_6_1.pdf\">ACPI 6.1 spec<\/a>. The Section Type is at offset 0, and the Section Descriptor is described in the UEFI specification.<\/p>\n<h3>Section Descriptor<\/h3>\n<p>I found this described in section N.2.2, Section Descriptor, of the <a href=\"https:\/\/uefi.org\/sites\/default\/files\/resources\/UEFI%20Spec%202_6%20Errata%20A%20final.pdf\">UEFI spec<\/a>.<\/p>\n<p>In kernel 4.15, the Section Types corresponding to the UEFI Specification are defined as follows, covering CPUs, PCI-Express, and so on.<\/p>\n<p><a href=\"https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/include\/linux\/cper.h#L166-L216\">https:\/\/github.com\/torvalds\/linux\/blob\/v4.15\/include\/linux\/cper.h#L166-L216<\/a><\/p>\n<h2>With All of This in Mind<\/h2>\n<p>At present, my understanding is as follows.<\/p>\n<ul>\n<li>The memory monitoring built into server products sometimes reads Correctable Memory Errors and similar events from the MCE banks and makes use of that information.<\/li>\n<li>Consequently, there are cases where it is appropriate to set mce=ignore_ce so that the Linux machine check handler does not read corrected errors by polling. (Check with your vendor for details.)<\/li>\n<li>When a hardware-driven EDAC driver or chipset-specific EDAC module such as skx_edac.ko conflicts with the server product's memory monitoring, there are cases where disabling it is the better choice. (Check with your vendor for details.)<\/li>\n<li>ghes_edac is firmware-first and is deliberately designed not to conflict with the server product's memory monitoring.<\/li>\n<li>In kernel 4.15, ghes obtains error information from the Source Ids defined in HEST and handles it appropriately. That information presumably includes not only memory errors but also errors from PCI-Express and other sources.<\/li>\n<li>With ghes enabled, the kernel can presumably also obtain error information for PCI-Express and the like, so in an environment where ghes_edac is enabled there should be no need to set ghes.disable=1 unless something is seriously wrong. (Check with your vendor for details.)<\/li>\n<li>The benefit of being able to use the Generic Hardware Error Source is that the kernel can handle hardware errors without an EDAC module developed for a specific chipset or a specific CPU generation. That does, however, require the firmware and related components to be designed properly.<\/li>\n<\/ul>\n<p>Memory monitoring seems to have been improved originally to mitigate the impact of memory failures, which became a real problem once HPC and similar workloads started using large amounts of DRAM. Recent server CPUs can be populated with orders of magnitude more DRAM than in the past, so I think revisiting these settings may be worthwhile for stable operation.<\/p>\n<h2>In Closing<\/h2>\n<p>This was a fairly niche topic, but it was the first time in a while that I really had to use my head, so I decided to publish it. I hope it is useful to someone.<\/p>\n<p>Personally, I was quietly impressed that improvements like these keep accumulating at the specification level in BIOS, firmware, and UEFI.<\/p>\n<p>Next time, I would like to talk about MySQL.<\/p>\n<h2>References<\/h2>\n<ul>\n<li><a href=\"https:\/\/qiita.com\/timwata\/items\/0bca99a6c1f8ec3a3855\">ACPI Table<\/a><\/li>\n<li><a href=\"http:\/\/halobates.de\/mce.pdf\">Machine check handling on Linux<\/a><\/li>\n<li><a href=\"http:\/\/halobates.de\/lk10-mcelog.pdf\">mcelog: memory error handling in user space<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Hello, this is Sejima. I want a Surface Pro X. I want a Surface Duo and a Surface Neo, too. To think that Microsoft would come to offer such exciting hardware [&hellip;]<\/p>\n","protected":false},"author":137,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[9],"tags":[71],"class_list":["post-19524","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-info","tag-linux"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts\/19524","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/users\/137"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/comments?post=19524"}],"version-history":[{"count":3,"href":"https:\/\/labs.gree.jp\/blog\/wp-jso
n\/wp\/v2\/posts\/19524\/revisions"}],"predecessor-version":[{"id":20333,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts\/19524\/revisions\/20333"}],"wp:attachment":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/media?parent=19524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/categories?post=19524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/tags?post=19524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}