-
Notifications
You must be signed in to change notification settings - Fork 50
Add comprehensive tests for SSL connection error handling during node failures #695
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Tests simulate various SSL connection failure scenarios including: - Bad file descriptor errors during send/recv - SSL handshake failures - Concurrent operations on closing connections - Node reboot scenarios with connection reset - Multiple socket error types (EPIPE, ECONNABORTED, ENOTCONN, etc.) All tests verify that the original error cause is preserved in last_error and included in ConnectionShutdown exception messages for better debugging. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Created comprehensive integration tests to verify SSL connection error handling in real-world scenarios like node reboots and network failures. Tests cover: - AsyncoreConnection SSL failures - AsyncioConnection SSL failures - Error message quality verification - Concurrent operations during connection closure Note: These tests require a cluster to run but serve as documentation for expected behavior and can be used when cluster is available. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
- Refactored mock request creation to reduce duplication - Added comments explaining intentional use of CERT_NONE in tests - Improved code maintainability and clarity Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Created detailed README documenting: - Background on the "Bad file descriptor" issue - Test structure and organization - How to run the tests - Implementation details - Contributing guidelines Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
| """Create a mock SSL connection for testing.""" | ||
| c = Connection(DefaultEndPoint('1.2.3.4'), **kwargs) | ||
| c._socket = Mock(spec=ssl.SSLSocket) | ||
| c._socket.send.side_effect = lambda x: len(x) |
| """Create a connection with pending requests.""" | ||
| c = Connection(DefaultEndPoint('1.2.3.4')) | ||
| c._socket = Mock(spec=ssl.SSLSocket) | ||
| c._socket.send.side_effect = lambda x: len(x) |
|
|
||
| import unittest | ||
| import ssl | ||
| import socket |
| import ssl | ||
| import socket | ||
| import errno | ||
| from threading import Thread, Event |
| import socket | ||
| import errno | ||
| from threading import Thread, Event | ||
| from unittest.mock import Mock, MagicMock, patch, PropertyMock |
| import errno | ||
| from threading import Thread, Event | ||
| from unittest.mock import Mock, MagicMock, patch, PropertyMock | ||
| from io import BytesIO |
| from cassandra.connection import ( | ||
| Connection, ConnectionShutdown, DefaultEndPoint, | ||
| ConnectionException | ||
| ) |
| """ | ||
|
|
||
| import unittest | ||
| import socket |
| import socket | ||
| import ssl | ||
| import errno | ||
| import threading |
| import ssl | ||
| import errno | ||
| import threading | ||
| import time |
| from cassandra.connection import ConnectionShutdown, DefaultEndPoint | ||
| from cassandra.cluster import NoHostAvailable | ||
|
|
||
| try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot drop AsyncoreConnection tests, it's not supported anymore in newer python versions
hence preventing those tests from running (or even breaks collection)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all AsyncoreConnection tests as requested. The asyncore module is deprecated in Python 3.6 and removed in Python 3.12+, which was preventing test collection.
Changes in commit 3cdee85:
- Removed AsyncoreConnection import and all 5 AsyncoreConnection tests
- Updated SSLErrorMessageQualityTest to only use AsyncioConnection
- Updated documentation to reflect 4 integration tests (down from 8)
The 14 unit tests remain unchanged and provide comprehensive coverage for SSL error scenarios.
Removed AsyncoreConnection from integration tests as asyncore module is deprecated in Python 3.6 and removed in Python 3.12+. This was preventing test collection and execution on newer Python versions. Changes: - Removed AsyncoreConnection import - Removed AsyncoreSSLConnectionFailureTest class (5 tests) - Updated SSLErrorMessageQualityTest to only use AsyncioConnection - Updated documentation to reflect 4 integration tests instead of 8 The unit tests (14 tests) remain unchanged and provide comprehensive coverage for SSL error handling scenarios. Co-authored-by: fruch <340979+fruch@users.noreply.github.com>
Description
When nodes reboot while using client encryption,
ConnectionShutdownexceptions with "Bad file descriptor" messages obscured the root cause. The driver already preserves the original error inlast_errorand includes it in exception messages - this was just untested.Changes:
Unit tests (
tests/unit/test_ssl_connection_errors.py): 14 tests covering SSL socket failuresIntegration tests (
tests/integration/standard/test_ssl_connection_failures.py): 4 tests for AsyncioConnectionDocumentation (
tests/unit/SSL_CONNECTION_TESTS.md): Test organization and contributing guidelinesVerification:
No driver code changes needed - existing error handling is correct.
Pre-review checklist
./docs/source/.Original prompt
This section details on the original issue you should resolve
<issue_title>Driver reported "[Errno 9] Bad file descriptor"</issue_title>
<issue_description>
Argus
Scylla version:
2026.1.0~dev-20251205.866c96f536b0with build-id2c38506085b888e1baa43f81d05dab12df5132c1During latest master runs driver reported following error:
It seems that such errors appeared each time while one of nodes been down
Also been spotted there:
https://argus.scylladb.com/tests/scylla-cluster-tests/a8cd6873-19c1-49c1-ab5a-dca25655ed6c
Kernel Version:
6.14.0-1017-awsExtra information
Installation details
Cluster size: 6 nodes (i7i.4xlarge)
Scylla Nodes used in this run:
OS / Image:
ami-0810c73586fe68036(aws: N/A)Test:
longevity-50gb-3days-testTest id:
38f90182-547d-4b60-973c-7e826b926708Test name:
scylla-master/tier1/longevity-50gb-3days-testTest method:
longevity_test.LongevityTest.test_custom_timeTest config file(s):
Logs:
longevity-tls-50gb-3d-master-db-node-38f90182-7
longevity-tls-50gb-3d-master-db-node-38f90182-3
core.scylla-longevity-tls-50gb-3d-master-db-node-38f90182-6-2025-12-06_07-36-03.zst
core.scylla-longevity-tls-50gb-3d-master-db-node-38f90182-6-2025-12-06_07-39-47.zst
core.scylla-longevity-tls-50gb-3d-master-db-node-38f90182-6-2025-12-06_07-43-23.zst
db-cluster-38f90182.tar.zst
**[schema-logs-38f90182.tar.zst](https://argus.scylladb.com/api/v1/tests/scylla-clu...
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.