View Issue Details

ID: 0000382
Project: AlmaLinux-8
Category: kernel
View Status: public
Last Update: 2023-04-04 09:36
Reporter: jiang
Assigned To:
Priority: urgent
Severity: crash
Reproducibility: always
Status: new
Resolution: open
Platform: x86_64
OS: almalinux
OS Version: almalinux8.7
Summary: 0000382: Kernel crashes when more than 100 GRE tunnel devices are created and enslaved to a bridge device
Description: We create GRE tunnels (gretap devices) and attach them to a bridge device. The bridge device has an IP address equal to the tunnels' local IP address. When we add more than 100 GRE tunnels, the kernel crashes.
Steps To Reproduce: There are two steps:
1. Boot AlmaLinux 8.7.
2. Run ./add_gre_devices.sh (my primary network interface is ens33, with address 192.168.131.191 and netmask 255.255.255.0).

The kernel then crashes.

The add_gre_devices.sh script is as follows:
----------------------------------------------------START-------------------------------------------------------
#!/bin/bash
# Reproducer: enslave the uplink and 149 gretap devices to a single bridge
# whose IP address is also the tunnels' local address.

iptables -F
ip link del dev br0
ip link add name br0 type bridge
ip link set ens33 master br0
# Route the subnet through br0 and give br0 the address used as the tunnel local IP.
ip route del 192.168.131.0/24
ifconfig br0 192.168.131.191
ip route add 192.168.131.0/24 dev br0

# Create ap1..ap149, one gretap device per remote address 192.168.131.<i>.
for (( i = 1; i < 150; i++ ))
do
    echo "ip link add name ap$i type gretap local 192.168.131.191 remote 192.168.131.$i && ip link set ap$i up && ip link set ap$i master br0"
    ip link add name ap$i type gretap local 192.168.131.191 remote 192.168.131.$i && ip link set ap$i up && ip link set ap$i master br0
done

-----------------------------------------------END--------------------------------------------
Additional Information: On CentOS 7, CentOS 8 and AlmaLinux 8.7 we see this crash (the kernel reports a double fault). The vmcore-dmesg.txt is copied here:
-----------------------------------------------START------------------------------------------
[ 496.548662] BUG: stack guard page was hit at 00000000622361f6 (stack is 00000000673d50e5..00000000ed53eb7f)
[ 496.548669] kernel stack overflow (double-fault): 0000 [#1] SMP PTI
[ 496.548671] CPU: 1 PID: 20 Comm: ksoftirqd/1 Kdump: loaded Tainted: G ---------r- - 4.18.0-425.3.1.el8.x86_64 #1
[ 496.548674] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 496.548675] RIP: 0010:__skb_flow_dissect+0x39/0x15d0
[ 496.548677] Code: 89 cf 41 56 41 55 49 89 d5 41 54 49 89 f4 53 44 89 cb 48 83 e4 f0 48 81 ec c0 00 00 00 44 8b 5d 10 65 48 8b 04 25 28 00 00 00 <48> 89 84 24 b8 00 00 00 31 c0 4d 85 c0 0f 84 29 07 00 00 41 0f b7
[ 496.548680] RSP: 0018:ffff9cb240aeff40 EFLAGS: 00010282
[ 496.548684] RAX: f209f5ad326b0900 RBX: 0000000000000000 RCX: ffff9cb240af0060
[ 496.548685] RDX: ffffffff8cbb8e60 RSI: ffff910264172500 RDI: 0000000000000000
[ 496.548687] RBP: ffff9cb240af0030 R08: 0000000000000000 R09: 0000000000000000
[ 496.548688] R10: 0000000000000000 R11: 0000000000000000 R12: ffff910264172500
[ 496.548690] R13: ffffffff8cbb8e60 R14: ffff91026c3df000 R15: ffff9cb240af0060
[ 496.548692] FS: 0000000000000000(0000) GS:ffff910279e40000(0000) knlGS:0000000000000000
[ 496.548693] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 496.548695] CR2: ffff9cb240aeff38 CR3: 0000000060810004 CR4: 00000000000706e0
[ 496.548696] Call Trace:
[ 496.548697] __skb_get_hash+0x57/0x1f0
[ 496.548699] ? nft_do_chain+0x4d0/0x4e0 [nf_tables]
[ 496.548700] ip_tunnel_xmit+0x41e/0x770 [ip_tunnel]
[ 496.548701] ? kmalloc_reserve+0x2e/0x80
[ 496.548703] ? __gre_xmit+0x6c/0x1f0 [ip_gre]
[ 496.548704] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 496.548705] dev_hard_start_xmit+0xd7/0x240
[ 496.548707] sch_direct_xmit+0x9f/0x370
[ 496.548708] __dev_queue_xmit+0x958/0xb60
[ 496.548709] ? nft_do_chain_bridge+0x70/0x190 [nf_tables]
[ 496.548711] br_dev_queue_push_xmit+0xbc/0x190 [bridge]
[ 496.548712] br_forward_finish+0xaf/0xc0 [bridge]
[ 496.548714] ? br_fdb_offloaded_set+0x60/0x60 [bridge]
[ 496.548715] __br_forward+0x156/0x1c0 [bridge]
[ 496.548716] ? br_dev_queue_push_xmit+0x190/0x190 [bridge]
[ 496.548718] deliver_clone+0x32/0x50 [bridge]
[ 496.548719] maybe_deliver+0x91/0xd0 [bridge]
[ 496.548720] br_flood+0x93/0x130 [bridge]
[ 496.548722] br_dev_xmit+0x2f4/0x430 [bridge]
[ 496.548723] dev_hard_start_xmit+0xd7/0x240
[ 496.548724] __dev_queue_xmit+0x80c/0xb60
[ 496.548726] ? __alloc_skb+0xe5/0x1c0
[ 496.548727] arp_xmit+0x9d/0xb0
[ 496.548728] ? arp_send_dst.part.21+0x18/0x90
[ 496.548729] arp_solicit+0xf5/0x2d0
[ 496.548731] ? kmem_cache_alloc+0x13f/0x280
[ 496.548732] neigh_probe+0x4c/0x60
[ 496.548733] __neigh_event_send+0xa3/0x370
[ 496.548734] neigh_resolve_output+0x12f/0x1a0
[ 496.548736] ip_finish_output2+0x192/0x430
[ 496.548737] ? ipv4_confirm+0x3c/0xe0 [nf_conntrack]
[ 496.548738] ip_output+0x70/0xf0
[ 496.548739] ? __ip_finish_output+0x1d0/0x1d0
[ 496.548741] iptunnel_xmit+0x185/0x230
[ 496.548742] ip_tunnel_xmit+0x409/0x770 [ip_tunnel]
[ 496.548849] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 496.548851] dev_hard_start_xmit+0xd7/0x240
[ 496.548853] sch_direct_xmit+0x9f/0x370
[ 496.548854] __dev_queue_xmit+0x958/0xb60
[ 496.548855] ? nft_do_chain_bridge+0x70/0x190 [nf_tables]
[ 496.548857] br_dev_queue_push_xmit+0xbc/0x190 [bridge]
------------------------------------------------END---------------------------------------------
On AlmaLinux 9.1 the kernel does not crash, but it prints "dead loop on virtual device br0, fix it urgently".
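
The call trace above repeats the same run of frames (gre_tap_xmit, dev_hard_start_xmit, br_dev_xmit, ...). A minimal sketch for counting how often one function appears in a saved trace is shown here. The helper name count_frame is made up for this note, and it assumes frames are formatted as in the vmcore-dmesg.txt excerpt ("[ timestamp] symbol+0xoff/0xlen", with unreliable frames marked by "? ").

```shell
#!/bin/bash
# count_frame FUNC FILE (hypothetical helper, not part of any kernel tooling):
# print how many reliable call-trace frames in FILE belong to FUNC.
count_frame() {
    local func=$1 file=$2
    # Frames prefixed with "? " are stale stack contents, so drop them first,
    # then count fixed-string matches such as "] dev_hard_start_xmit+0x".
    grep -vF '] ? ' "$file" | grep -cF "] ${func}+0x"
}
```

For example, `count_frame dev_hard_start_xmit vmcore-dmesg.txt` gives a quick indication of how deep the transmit recursion went in the captured part of the stack; a count greater than one for gre_tap_xmit or br_dev_xmit points at the same kind of loop.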
Tags: almalinux8, Bug, kernel
abrt_hash:
URL:

Activities

jiang

2023-04-03 08:56

reporter   ~0000845

Sometimes vmcore-dmesg.txt reports this instead:
[ 6946.688867] PANIC: double fault, error_code: 0x0
[ 6946.688870] CPU: 0 PID: 83263 Comm: ip Kdump: loaded Tainted: G ---------r- - 4.18.0-425.3.1.el8.x86_64 #1
[ 6946.688871] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
[ 6946.688872] RIP: 0010:nft_do_chain+0x25/0x4e0 [nf_tables]
[ 6946.688873] Code: 00 00 00 00 00 0f 1f 44 00 00 55 b9 0a 00 00 00 48 89 e5 41 57 41 56 41 55 41 54 49 89 fc 53 48 83 e4 f0 48 81 ec b0 01 00 00 <48> 89 34 24 4c 8d 7c 24 50 65 48 8b 04 25 28 00 00 00 48 89 84 24
[ 6946.688874] RSP: 0000:ffffb7a0fffffe50 EFLAGS: 00010286
[ 6946.688875] RAX: 0000000000000000 RBX: ffff8debf48f7800 RCX: 000000000000000a
[ 6946.688876] RDX: 0000000000000014 RSI: ffff8decdb0f2550 RDI: ffffb7a100000040
[ 6946.688876] RBP: ffffb7a100000030 R08: ffff8decdd790000 R09: eaee172c602dc964
[ 6946.688877] R10: ffff8dece371ec00 R11: f488111c00000000 R12: ffffb7a100000040
[ 6946.688877] R13: ffff8decce460f00 R14: ffff8dece371ec00 R15: 000000000000002f
[ 6946.688878] FS: 0000000000000000(0000) GS:ffff8decf9e00000(0000) knlGS:0000000000000000
[ 6946.688879] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6946.688879] CR2: ffffb7a0fffffe48 CR3: 000000003586e006 CR4: 00000000003706f0
[ 6946.688880] Call Trace:
[ 6946.688880] <IRQ>
[ 6946.688880] ? fnhe_hashfun+0x2f/0xa0
[ 6946.688907] nft_do_chain_ipv4+0x66/0x90 [nf_tables]
[ 6946.688907] nf_hook_slow+0x44/0xd0
[ 6946.688907] __ip_local_out+0xd7/0x140
[ 6946.688908] ? ip_forward_options.cold.8+0x18/0x18
[ 6946.688908] ip_local_out+0x17/0x50
[ 6946.688909] iptunnel_xmit+0x185/0x230
[ 6946.688909] ip_tunnel_xmit+0x409/0x770 [ip_tunnel]
[ 6946.688909] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 6946.688910] dev_hard_start_xmit+0xd7/0x240
[ 6946.688910] sch_direct_xmit+0x9f/0x370
[ 6946.688911] __dev_queue_xmit+0x958/0xb60
[ 6946.688911] ? nft_do_chain_bridge+0x70/0x190 [nf_tables]
[ 6946.688912] br_dev_queue_push_xmit+0xbc/0x190 [bridge]
[ 6946.688912] br_forward_finish+0xaf/0xc0 [bridge]
[ 6946.688912] ? br_fdb_offloaded_set+0x60/0x60 [bridge]
[ 6946.688913] __br_forward+0x156/0x1c0 [bridge]
[ 6946.688914] ? br_dev_queue_push_xmit+0x190/0x190 [bridge]
[ 6946.688914] deliver_clone+0x32/0x50 [bridge]
[ 6946.688914] maybe_deliver+0x91/0xd0 [bridge]
[ 6946.688915] br_flood+0x93/0x130 [bridge]
[ 6946.688915] br_dev_xmit+0x2f4/0x430 [bridge]
[ 6946.688916] dev_hard_start_xmit+0xd7/0x240
[ 6946.688916] __dev_queue_xmit+0x80c/0xb60
[ 6946.688916] ? __alloc_skb+0xe5/0x1c0
[ 6946.688917] arp_xmit+0x9d/0xb0
[ 6946.688917] ? arp_send_dst.part.21+0x18/0x90
[ 6946.688918] arp_solicit+0xf5/0x2d0
[ 6946.688918] ? kmem_cache_alloc+0x13f/0x280
[ 6946.688918] neigh_probe+0x4c/0x60
[ 6946.688919] __neigh_event_send+0xa3/0x370
[ 6946.688919] neigh_resolve_output+0x12f/0x1a0
[ 6946.688920] ip_finish_output2+0x192/0x430
[ 6946.688920] ? ipv4_confirm+0x3c/0xe0 [nf_conntrack]
[ 6946.688921] ip_output+0x70/0xf0
[ 6946.688921] ? __ip_finish_output+0x1d0/0x1d0
[ 6946.688921] iptunnel_xmit+0x185/0x230
[ 6946.688922] ip_tunnel_xmit+0x409/0x770 [ip_tunnel]
[ 6946.688922] gre_tap_xmit+0x10b/0x180 [ip_gre]
[ 6946.688923] dev_hard_start_xmit+0xd7/0x240

jiang

2023-04-04 09:36

reporter   ~0000846

We have now found that this crash is caused by a stack overflow, which results from a dev_hard_start_xmit loop.
Because our route is 192.168.131.0/24 dev br0, the ARP packet that precedes each GRE packet is sent out locally through dev_hard_start_xmit (which then calls br_dev_xmit). br_dev_xmit calls br_flood, which sends this packet to a gretap device again, so ip_tunnel_xmit is called again, and that in turn calls dev_hard_start_xmit (and therefore br_dev_xmit) again. This recursion repeats until the stack overflows.
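
The precondition described in this note (the tunnels' outer packets are routed through the same bridge that the tunnels are enslaved to) can be checked with a small sketch like the following. The helper name route_dev is hypothetical, and it assumes that `ip route get` prints the egress interface as "dev <name>", as iproute2 normally does.

```shell
#!/bin/bash
# route_dev (hypothetical helper): read `ip route get` output on stdin and
# print the name of the egress device, i.e. the word that follows "dev".
route_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}
```

Running `ip route get 192.168.131.1 | route_dev` on the setup from the reproducer should print br0 if the route is as described, which means the encapsulated packets (and the ARP requests sent for their remote addresses) leave through the bridge and can be flooded back into the gretap ports.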

Issue History

Date Modified Username Field Change
2023-04-03 08:22 jiang New Issue
2023-04-03 08:25 jiang Tag Attached: almalinux8
2023-04-03 08:25 jiang Tag Attached: Bug
2023-04-03 08:25 jiang Tag Attached: kernel
2023-04-03 08:56 jiang Note Added: 0000845
2023-04-04 09:36 jiang Note Added: 0000846