View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0000382 | AlmaLinux-8 | kernel | public | 2023-04-03 08:22 | 2023-04-04 09:36 |
Reporter | jiang | Assigned To | |||
Priority | urgent | Severity | crash | Reproducibility | always |
Status | new | Resolution | open | ||
Platform | x86_64 | OS | almalinux | OS Version | almalinux8.7 |
Summary | 0000382: Kernel crashes when more than 100 GRE tunnel devices are created and attached to a bridge device | ||||
Description | We create several GRE tunnels and attach them to a bridge device. The bridge device has an IP address equal to the GRE local IP. When we add more than 100 GRE tunnels, the kernel crashes. | ||||
Steps To Reproduce | There are two steps:
1. Boot AlmaLinux 8.7.
2. Run ./add_gre_devices.sh (my primary network interface is ens33, with address 192.168.131.191 and netmask 255.255.255.0).
The kernel then crashes. The add_gre_devices.sh script is as follows:
----------------------------------------------------START-------------------------------------------------------
#!/bin/bash
iptables -F
ip link del dev br0
ip link add name br0 type bridge
ip link set ens33 master br0
ip route del 192.168.131.0/24
ifconfig br0 192.168.131.191
ip route add 192.168.131.0/24 dev br0
for (( i = 1 ; i < 150; i++ ))
do
    IP=`expr $i`
    DevName=`expr $i`
    echo "ip link add name ap$DevName type gretap local 192.168.131.191 remote 192.168.131.$IP && ip link set ap$DevName up && ip link set ap$DevName master br0"
    ip link add name ap$DevName type gretap local 192.168.131.191 remote 192.168.131.$IP && ip link set ap$DevName up && ip link set ap$DevName master br0
done
-----------------------------------------------END--------------------------------------------
| ||||
Additional Information | On CentOS 7, CentOS 8, and AlmaLinux 8.7, we can reproduce this crash (the kernel reports a double fault). The vmcore-dmesg.txt is copied here:
-----------------------------------------------START------------------------------------------
[ 496.548662] BUG: stack guard page was hit at 00000000622361f6 (stack is 00000000673d50e5..00000000ed53eb7f)
[ 496.548669] kernel stack overflow (double-fault): 0000 [#1] SMP PTI
[ 496.548671] CPU: 1 PID: 20 Comm: ksoftirqd/1 Kdump: loaded Tainted: G ---------r- - 4.18.0-425.3.1.el8.x86_64 #1
[ 496.548674] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 496.548675] RIP: 0010:__skb_flow_dissect+0x39/0x15d0
[ 496.548677] Code: 89 cf 41 56 41 55 49 89 d5 41 54 49 89 f4 53 44 89 cb 48 83 e4 f0 48 81 ec c0 00 00 00 44 8b 5d 10 65 48 8b 04 25 28 00 00 00 <48> 89 84 24 b8 00 00 00 31 c0 4d 85 c0 0f 84 29 07 00 00 41 0f b7
[ 496.548680] RSP: 0018:ffff9cb240aeff40 EFLAGS: 00010282
[ 496.548684] RAX: f209f5ad326b0900 RBX: 0000000000000000 RCX: ffff9cb240af0060
[ 496.548685] RDX: ffffffff8cbb8e60 RSI: ffff910264172500 RDI: 0000000000000000
[ 496.548687] RBP: ffff9cb240af0030 R08: 0000000000000000 R09: 0000000000000000
[ 496.548688] R10: 0000000000000000 R11: 0000000000000000 R12: ffff910264172500
[ 496.548690] R13: ffffffff8cbb8e60 R14: ffff91026c3df000 R15: ffff9cb240af0060
[ 496.548692] FS: 0000000000000000(0000) GS:ffff910279e40000(0000) knlGS:0000000000000000
[ 496.548693] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 496.548695] CR2: ffff9cb240aeff38 CR3: 0000000060810004 CR4: 00000000000706e0
[ 496.548696] Call Trace:
[ 496.548697] __skb_get_hash+0x57/0x1f0
[ 496.548699] ? nft_do_chain+0x4d0/0x4e0 [nf_tables]
[ 496.548700] ip_tunnel_xmit+0x41e/0x770 [ip_tunnel]
[ 496.548701] ? kmalloc_reserve+0x2e/0x80
[ 496.548703] ? __gre_xmit+0x6c/0x1f0 [ip_gre]
[ 496.548704] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 496.548705] dev_hard_start_xmit+0xd7/0x240
[ 496.548707] sch_direct_xmit+0x9f/0x370
[ 496.548708] __dev_queue_xmit+0x958/0xb60
[ 496.548709] ? nft_do_chain_bridge+0x70/0x190 [nf_tables]
[ 496.548711] br_dev_queue_push_xmit+0xbc/0x190 [bridge]
[ 496.548712] br_forward_finish+0xaf/0xc0 [bridge]
[ 496.548714] ? br_fdb_offloaded_set+0x60/0x60 [bridge]
[ 496.548715] __br_forward+0x156/0x1c0 [bridge]
[ 496.548716] ? br_dev_queue_push_xmit+0x190/0x190 [bridge]
[ 496.548718] deliver_clone+0x32/0x50 [bridge]
[ 496.548719] maybe_deliver+0x91/0xd0 [bridge]
[ 496.548720] br_flood+0x93/0x130 [bridge]
[ 496.548722] br_dev_xmit+0x2f4/0x430 [bridge]
[ 496.548723] dev_hard_start_xmit+0xd7/0x240
[ 496.548724] __dev_queue_xmit+0x80c/0xb60
[ 496.548726] ? __alloc_skb+0xe5/0x1c0
[ 496.548727] arp_xmit+0x9d/0xb0
[ 496.548728] ? arp_send_dst.part.21+0x18/0x90
[ 496.548729] arp_solicit+0xf5/0x2d0
[ 496.548731] ? kmem_cache_alloc+0x13f/0x280
[ 496.548732] neigh_probe+0x4c/0x60
[ 496.548733] __neigh_event_send+0xa3/0x370
[ 496.548734] neigh_resolve_output+0x12f/0x1a0
[ 496.548736] ip_finish_output2+0x192/0x430
[ 496.548737] ? ipv4_confirm+0x3c/0xe0 [nf_conntrack]
[ 496.548738] ip_output+0x70/0xf0
[ 496.548739] ? __ip_finish_output+0x1d0/0x1d0
[ 496.548741] iptunnel_xmit+0x185/0x230
[ 496.548742] ip_tunnel_xmit+0x409/0x770 [ip_tunnel]
[ 496.548849] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 496.548851] dev_hard_start_xmit+0xd7/0x240
[ 496.548853] sch_direct_xmit+0x9f/0x370
[ 496.548854] __dev_queue_xmit+0x958/0xb60
[ 496.548855] ? nft_do_chain_bridge+0x70/0x190 [nf_tables]
[ 496.548857] br_dev_queue_push_xmit+0xbc/0x190 [bridge]
------------------------------------------------END---------------------------------------------
On AlmaLinux 9.1, the kernel does not crash, but prints "dead loop on virtual device br0, fix it urgently". | ||||
Tags | almalinux8, Bug, kernel | ||||
abrt_hash | |||||
URL | |||||
|
Sometimes vmcore-dmesg.txt reports the following instead:
[ 6946.688867] PANIC: double fault, error_code: 0x0
[ 6946.688870] CPU: 0 PID: 83263 Comm: ip Kdump: loaded Tainted: G ---------r- - 4.18.0-425.3.1.el8.x86_64 #1
[ 6946.688871] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 6946.688872] RIP: 0010:nft_do_chain+0x25/0x4e0 [nf_tables]
[ 6946.688873] Code: 00 00 00 00 00 0f 1f 44 00 00 55 b9 0a 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 81 ec b0 01 00 00 <48> 89 34 24 4c 8d 7c 24 50 65 48 8b 04 25 28 00 00 00 48 89 84 24
[ 6946.688874] RSP: 0000:ffffb7a0fffffe50 EFLAGS: 00010286
[ 6946.688875] RAX: 0000000000000000 RBX: ffff8debf48f7800 RCX: 000000000000000a
[ 6946.688876] RDX: 0000000000000014 RSI: ffff8decdb0f2550 RDI: ffffb7a100000040
[ 6946.688876] RBP: ffffb7a100000030 R08: ffff8decdd790000 R09: eaee172c602dc964
[ 6946.688877] R10: ffff8dece371ec00 R11: f488111c00000000 R12: ffffb7a100000040
[ 6946.688877] R13: ffff8decce460f00 R14: ffff8dece371ec00 R15: 000000000000002f
[ 6946.688878] FS: 0000000000000000(0000) GS:ffff8decf9e00000(0000) knlGS:0000000000000000
[ 6946.688879] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6946.688879] CR2: ffffb7a0fffffe48 CR3: 000000003586e006 CR4: 00000000003706f0
[ 6946.688880] Call Trace:
[ 6946.688880] <IRQ>
[ 6946.688880] ? fnhe_hashfun+0x2f/0xa0
[ 6946.688907] nft_do_chain_ipv4+0x66/0x90 [nf_tables]
[ 6946.688907] nf_hook_slow+0x44/0xd0
[ 6946.688907] __ip_local_out+0xd7/0x140
[ 6946.688908] ? ip_forward_options.cold.8+0x18/0x18
[ 6946.688908] ip_local_out+0x17/0x50
[ 6946.688909] iptunnel_xmit+0x185/0x230
[ 6946.688909] ip_tunnel_xmit+0x409/0x770 [ip_tunnel]
[ 6946.688909] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 6946.688910] dev_hard_start_xmit+0xd7/0x240
[ 6946.688910] sch_direct_xmit+0x9f/0x370
[ 6946.688911] __dev_queue_xmit+0x958/0xb60
[ 6946.688911] ? nft_do_chain_bridge+0x70/0x190 [nf_tables]
[ 6946.688912] br_dev_queue_push_xmit+0xbc/0x190 [bridge]
[ 6946.688912] br_forward_finish+0xaf/0xc0 [bridge]
[ 6946.688912] ? br_fdb_offloaded_set+0x60/0x60 [bridge]
[ 6946.688913] __br_forward+0x156/0x1c0 [bridge]
[ 6946.688914] ? br_dev_queue_push_xmit+0x190/0x190 [bridge]
[ 6946.688914] deliver_clone+0x32/0x50 [bridge]
[ 6946.688914] maybe_deliver+0x91/0xd0 [bridge]
[ 6946.688915] br_flood+0x93/0x130 [bridge]
[ 6946.688915] br_dev_xmit+0x2f4/0x430 [bridge]
[ 6946.688916] dev_hard_start_xmit+0xd7/0x240
[ 6946.688916] __dev_queue_xmit+0x80c/0xb60
[ 6946.688916] ? __alloc_skb+0xe5/0x1c0
[ 6946.688917] arp_xmit+0x9d/0xb0
[ 6946.688917] ? arp_send_dst.part.21+0x18/0x90
[ 6946.688918] arp_solicit+0xf5/0x2d0
[ 6946.688918] ? kmem_cache_alloc+0x13f/0x280
[ 6946.688918] neigh_probe+0x4c/0x60
[ 6946.688919] __neigh_event_send+0xa3/0x370
[ 6946.688919] neigh_resolve_output+0x12f/0x1a0
[ 6946.688920] ip_finish_output2+0x192/0x430
[ 6946.688920] ? ipv4_confirm+0x3c/0xe0 [nf_conntrack]
[ 6946.688921] ip_output+0x70/0xf0
[ 6946.688921] ? __ip_finish_output+0x1d0/0x1d0
[ 6946.688921] iptunnel_xmit+0x185/0x230
[ 6946.688922] ip_tunnel_xmit+0x409/0x770 [ip_tunnel]
[ 6946.688922] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 6946.688923] dev_hard_start_xmit+0xd7/0x240 |
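Both traces show the same frames recurring (gre_tap_xmit, dev_hard_start_xmit, br_dev_xmit, and so on). A rough way to confirm the repetition in a saved log is to count how often a given function appears as a confirmed frame. The sketch below is only an illustration: the helper name count_frames is hypothetical, and it assumes the log uses the "[ timestamp] func+0xOFF/0xLEN" layout seen above.

```shell
#!/bin/bash
# count_frames FUNC: read a dmesg-style call trace on stdin and print how many
# confirmed frames name FUNC. Frames the unwinder marks as unreliable are
# written "] ? func+0x..." and are deliberately not counted.
# (Hypothetical helper, not part of any kernel or distribution tooling.)
count_frames() {
    local func="$1"
    # -F: match the string literally; -o: print one line per match, so several
    # frames flattened onto a single line are still counted individually.
    grep -oF "] ${func}+0x" | wc -l | tr -d ' '
}
```

For example, `count_frames dev_hard_start_xmit < vmcore-dmesg.txt` prints the number of confirmed dev_hard_start_xmit frames in the dump; a count above one inside a single Call Trace indicates nested transmit calls.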
|
We have found that this crash is caused by a kernel stack overflow, produced by a dev_hard_start_xmit loop. Because our route is 192.168.131.0/24 dev br0, the ARP packet that precedes each GRE packet is sent out locally through dev_hard_start_xmit (which calls br_dev_xmit). br_dev_xmit then calls br_flood, which sends the packet to a gretap device again, so ip_tunnel_xmit is called again, and in turn dev_hard_start_xmit (and br_dev_xmit) is called again. Each pass nests deeper on the same stack until it overflows. |
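The condition described in this note (the tunnel's outer packets being routed through the same bridge that has the tunnel as a port) can be checked before attaching a tunnel. The following is a minimal sketch, not a fix: route_dev and loops_via_bridge are hypothetical helper names, and it assumes `ip route get` prints the egress interface after the word "dev", as current iproute2 versions do.

```shell
#!/bin/bash
# route_dev: read `ip route get ADDR` output on stdin and print the egress
# device, i.e. the word following "dev". (Hypothetical helper.)
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# loops_via_bridge REMOTE BRIDGE: succeed if the kernel would send packets for
# REMOTE out of BRIDGE. If a gretap device with that remote is also a port of
# BRIDGE, its encapsulated traffic re-enters the bridge. (Hypothetical helper.)
loops_via_bridge() {
    local remote="$1" bridge="$2"
    [ "$(ip route get "$remote" | route_dev)" = "$bridge" ]
}
```

With the setup from this report, `loops_via_bridge 192.168.131.1 br0 && echo "route to remote goes through br0"` would be expected to print the warning, since the only route to 192.168.131.0/24 points at br0.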