View Issue Details

ID: 0000478
Project: AlmaLinux-9
Category: kernel
View Status: public
Last Update: 2024-09-04 14:05
Reporter: zbal1977
Assigned To:
Priority: normal
Severity: major
Reproducibility: always
Status: new
Resolution: open
Summary: 0000478: TCP connection/socket gets stuck and messages are delayed
Description:
Hi,

We have a client/server application which was developed a long time ago. It has been running in production for more than 10 years. The client is a Windows application written in C++, and the server-side component is written in Java 8.

This client/server software has been working fine on Linux servers for a long time. We currently use AlmaLinux 9, and it worked there as well until we updated the kernel.

When we update the Linux kernel from “5.14.0-362.13.1.el9_3.x86_64” to “kernel-5.14.0-427.31.1.el9_4.x86_64”, the application becomes unstable: the client drops the connection because it does not receive messages in time. We see delays, with the client simply waiting for the response from the server. On the server side, all messages from the client are delayed as well: the Java application receives them several seconds late. The issue is always reproducible with the new kernel, and it disappears when we go back to the old kernel. We ran the test for hours in both cases.

Restarting the clients and the server application does not make the issue go away. Only a full OS reboot clears it.

I can provide PCAP files captured with tcpdump for both the working and the non-working scenario. In the non-working capture you can see the delay and also a lot of TCP Window Full messages.
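
For anyone who wants to locate the delays without opening the captures in Wireshark, the following is a minimal sketch that lists large gaps between consecutive packets. It assumes the files are in the classic pcap format (not pcapng) and only reads the per-packet timestamps; it does not decode TCP, so it cannot detect Window Full conditions. The function name `find_gaps` and the 1-second threshold are illustrative choices, not part of our tooling.

```python
import struct

# Classic pcap magic numbers: byte order and timestamp resolution.
PCAP_FORMATS = {
    b"\xd4\xc3\xb2\xa1": ("<", 1e-6),  # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1e-6),  # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1e-9),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1e-9),  # big-endian, nanoseconds
}


def find_gaps(pcap_bytes, threshold_s=1.0):
    """Return (packet_index, gap_seconds) for every packet that arrives
    more than threshold_s after the previous packet in the capture."""
    if len(pcap_bytes) < 24:
        raise ValueError("too short to be a pcap file")
    magic = pcap_bytes[:4]
    if magic not in PCAP_FORMATS:
        raise ValueError("not a classic pcap file")
    endian, frac_unit = PCAP_FORMATS[magic]

    # Each record header: ts_sec, ts_frac, captured length, original length.
    record = struct.Struct(endian + "IIII")
    offset = 24  # skip the 24-byte global header
    previous = None
    index = 0
    gaps = []
    while offset + record.size <= len(pcap_bytes):
        ts_sec, ts_frac, incl_len, _orig_len = record.unpack_from(pcap_bytes, offset)
        timestamp = ts_sec + ts_frac * frac_unit
        if previous is not None and timestamp - previous > threshold_s:
            gaps.append((index, timestamp - previous))
        previous = timestamp
        offset += record.size + incl_len  # jump over the packet data
        index += 1
    return gaps


# Hypothetical usage with one of the attached files:
#   with open("non-working_tcp_packets.pcap", "rb") as f:
#       for index, gap in find_gaps(f.read(), threshold_s=1.0):
#           print(index, round(gap, 3))
```

The packet indices are zero-based, so they are one lower than the frame numbers Wireshark shows.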

Please investigate what changed between these two kernel versions. It seems there is an issue in the new kernel. Let us know what has changed since the "5.14.0-362.13.1.el9_3.x86_64" kernel version. An application that has worked on Linux for a long time has become unstable because of a kernel update.
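
One thing that might help narrow this down is a snapshot of the TCP-related kernel tunables taken under each kernel, so that the two can be compared. Below is a minimal sketch, assuming the standard `/proc/sys` layout. The list of keys is only a guess at what could be relevant to receive-window behaviour, and the helper names `snapshot` and `diff` are made up for this example.

```python
import os

# Assumed selection of tunables; extend as needed.
KEYS = [
    "net/ipv4/tcp_rmem",
    "net/ipv4/tcp_wmem",
    "net/ipv4/tcp_moderate_rcvbuf",
    "net/ipv4/tcp_window_scaling",
    "net/ipv4/tcp_adv_win_scale",
    "net/core/rmem_max",
    "net/core/wmem_max",
]


def snapshot(root="/proc/sys", keys=KEYS):
    """Read each tunable under root; unreadable or missing keys map to None."""
    values = {}
    for key in keys:
        try:
            with open(os.path.join(root, key)) as f:
                values[key] = f.read().split()
        except OSError:
            values[key] = None
    return values


def diff(old, new):
    """Return {key: (old_value, new_value)} for keys whose values differ."""
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

Running `snapshot()` once after booting each kernel and passing the two results to `diff` would show whether any of these defaults differ. An empty result would only rule out these particular settings, not a behavioural change inside the kernel itself.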

I already reported a bug on the kernel.org website. Link: https://bugzilla.kernel.org/show_bug.cgi?id=219221

You can find the PCAP files in the attachments.

Please analyze them and help us determine why the new kernel behaves differently and causes this weird behaviour. Can it be fixed in our application, or is a fix required in the kernel?

Thanks a lot!

Regards,

Zoltan
Steps To Reproduce:
- Start our Java 8 application, which opens a TCP socket listening on port 31421
- Start our client application, which connects to the server
- The client makes requests to the server by sending XML messages over the TCP socket
- The server sends the response
- After 20-30 minutes, the issue appears
- If more clients run at the same time, it appears a bit earlier
- Stopping/restarting all clients and the server application does not help; the TCP socket is still stuck. Only an OS reboot clears it, but after 20-30 minutes the issue starts again...
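
Since our application cannot be shared, a stand-in for the request/response pattern above can be sketched as a small echo server plus a client that times each round trip. This is not our application, and it is not confirmed that such a probe triggers the issue. All function names and the `<ping/>` payload are hypothetical. The idea would be to run the echo side on the AlmaLinux server and the measuring side from another host for well over 30 minutes under each kernel; `run_local_probe` is only a loopback self-check.

```python
import socket
import threading
import time


def serve_echo(listener):
    """Accept one connection and echo all data back until the peer closes."""
    conn, _addr = listener.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def recv_exact(sock, n):
    """Read exactly n bytes, since TCP may deliver a message in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def measure_round_trips(host, port, message, count, pause_s=0.0):
    """Send message count times and return the round-trip time of each."""
    rtts = []
    with socket.create_connection((host, port)) as sock:
        for _ in range(count):
            start = time.monotonic()
            sock.sendall(message)
            recv_exact(sock, len(message))
            rtts.append(time.monotonic() - start)
            time.sleep(pause_s)
    return rtts


def run_local_probe(count=5, message=b"<ping/>"):
    """Loopback self-check: echo server in a thread, client in this thread."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=serve_echo, args=(listener,), daemon=True)
    server.start()
    try:
        return measure_round_trips("127.0.0.1", port, message, count)
    finally:
        server.join(timeout=5)
        listener.close()
```

A sudden jump in the returned round-trip times from milliseconds to seconds would correspond to the delay we observe with the real client.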

Tags: No tags attached.
Attached Files
working_tcp_packets.pcap (73,869 bytes)

Activities

Issue History

Date Modified Username Field Change
2024-09-04 14:04 zbal1977 New Issue
2024-09-04 14:04 zbal1977 File Added: non-working_tcp_packets.pcap
2024-09-04 14:04 zbal1977 File Added: working_tcp_packets.pcap