fix(jitsi): re-allocate Jicofo focus on reconnect (non-blocking)

The lightweight soft-rejoin path (jSess.Rejoin) skips Jicofo focus
allocation and joins the MUC with a bare presence. After the client
leaves and Jicofo idle-terminates the now-empty conference
(session-terminate <expired/>), the room/focus is torn down. A bare
rejoin presence is then rejected by Prosody with
<presence type='error'><not-allowed/>. The library's JoinMUC nevertheless
matches a stale status-110 left in its stanza buffer and falsely reports
success.
The engine then waits forever for a session-initiate that never arrives
while actually being outside the room, so the client can never reconnect.
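
For illustration only, a minimal sketch of the false-success mode described
above. The stanza type, waitJoin, and the channel-as-buffer model are
hypothetical simplifications, not the library's actual JoinMUC code. The
"fresh" case assumes that a session built from scratch starts with no
leftover stanzas.

```go
package main

import (
	"errors"
	"fmt"
)

// stanza is a hypothetical, heavily simplified incoming presence.
type stanza struct {
	Type   string // "" for available presence, "error" for a rejection
	Status int    // MUC status code; 110 marks self-presence
	Cond   string // error condition, e.g. "not-allowed"
}

var errJoinRejected = errors.New("muc join rejected")

// waitJoin accepts the first status-110 presence it finds in the buffer,
// without checking whether it belongs to the current join attempt.
func waitJoin(buf <-chan stanza) error {
	for {
		select {
		case st := <-buf:
			if st.Status == 110 {
				return nil // treated as "we are in the room"
			}
			if st.Type == "error" {
				return fmt.Errorf("%w (%s)", errJoinRejected, st.Cond)
			}
		default:
			return errors.New("no join response buffered")
		}
	}
}

func main() {
	// Reused session: a self-presence from the previous join is still
	// buffered ahead of the rejection for the current attempt.
	reused := make(chan stanza, 4)
	reused <- stanza{Status: 110}
	reused <- stanza{Type: "error", Cond: "not-allowed"}
	fmt.Println("reused buffer:", waitJoin(reused))

	// Fresh session: only the current attempt's response is present.
	fresh := make(chan stanza, 4)
	fresh <- stanza{Type: "error", Cond: "not-allowed"}
	fmt.Println("fresh buffer:", waitJoin(fresh))
}
```

With the reused buffer the waiter returns nil although the join was
rejected; with the fresh buffer the rejection surfaces as an error.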

Re-establish the session from scratch via j.JoinMUC instead, which runs
dial -> focus allocation -> MUC join in the correct order (focus first,
so Jicofo recreates the room), exactly like the initial Connect, but
WITHOUT blocking on session-initiate. The fresh session-initiate is
awaited separately via WaitJingleReinitiate once a peer rejoins, so the
non-blocking reconnect contract is preserved.
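
A rough sketch of the ordering and the non-blocking contract, under stated
assumptions: the session type, joinFromScratch, and waitReinitiate are
hypothetical stand-ins that only record steps and model session-initiate as
a channel. They are not the real j.JoinMUC or WaitJingleReinitiate.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// session is a hypothetical stand-in for the XMPP/MUC session.
type session struct {
	steps    []string      // order in which setup steps ran
	initiate chan struct{} // signalled when session-initiate arrives
}

func (s *session) dial() error          { s.steps = append(s.steps, "dial"); return nil }
func (s *session) allocateFocus() error { s.steps = append(s.steps, "focus"); return nil }
func (s *session) joinMUC() error       { s.steps = append(s.steps, "muc-join"); return nil }

// joinFromScratch runs dial -> focus allocation -> MUC join and returns
// immediately afterwards; it never waits for session-initiate.
func joinFromScratch() (*session, error) {
	s := &session{initiate: make(chan struct{}, 1)}
	for _, step := range []func() error{s.dial, s.allocateFocus, s.joinMUC} {
		if err := step(); err != nil {
			return nil, err
		}
	}
	return s, nil
}

// waitReinitiate blocks until session-initiate arrives or ctx ends.
func (s *session) waitReinitiate(ctx context.Context) error {
	select {
	case <-s.initiate:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

func main() {
	s, err := joinFromScratch()
	if err != nil {
		fmt.Println("join failed:", err)
		return
	}
	fmt.Println("joined after steps:", s.steps)

	// No peer in the room yet: the separate wait simply times out.
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	if err := s.waitReinitiate(ctx); errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("no peer yet, still joined")
	}

	// A peer rejoins and the focus sends a fresh session-initiate.
	s.initiate <- struct{}{}
	fmt.Println("after peer rejoin:", s.waitReinitiate(context.Background()))
}
```

Keeping the wait in its own call means the join itself can succeed while
the room is still empty, which is the case the reconnect path has to handle.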

Verified on a live deployment: two consecutive reconnect cycles now
complete (bridge open sctp -> reconnected -> session opened) where the
old path hung after "waiting for session-initiate".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
e.barskov
2026-06-01 00:36:25 +04:00
parent 08e03d0803
commit bd1a95cac5

@@ -1329,19 +1329,37 @@ func (s *Session) reconnect(ctx context.Context) error {
 	s.resetPeerEpochs()
 	s.drainSendQueue()
-	jSess := s.jSess.Load()
-	if jSess == nil {
-		return s.reconnectFull(ctx)
+	// Re-establish the XMPP/MUC session from scratch rather than reusing the
+	// lightweight jSess.Rejoin (leave+join) path. Rejoin skips the Jicofo
+	// focus-allocation IQ, and Jitsi gates the MUC on focus: once the server
+	// is left alone in the room, Jicofo idle-terminates the conference
+	// (session-terminate <expired/>) and tears down the room, after which a
+	// bare presence is rejected with <presence type='error'><not-allowed/>.
+	// The library's JoinMUC then matches a stale status-110 still buffered in
+	// its stanza channel and falsely reports success, so we wait forever for a
+	// session-initiate that never comes while actually being outside the room.
+	//
+	// j.JoinMUC re-runs dial -> focus allocation -> MUC join in the correct
+	// order (focus first, so Jicofo recreates the room), exactly like the
+	// initial Connect, but WITHOUT blocking on session-initiate — preserving
+	// the non-blocking reconnect contract. We wait for the fresh
+	// session-initiate separately via WaitJingleReinitiate once a peer rejoins.
+	if old := s.jSess.Swap(nil); old != nil {
+		_ = old.Close()
 	}
-	// Rejoin MUC (leave + join) without waiting for session-initiate.
-	// This resets Jicofo's state for our participant so it will send
-	// a fresh session-initiate when another peer arrives.
 	logger.Infof("jitsi: rejoin %s/%s (non-blocking) ...", s.host, s.room)
-	if err := jSess.Rejoin(ctx, s.name); err != nil {
+	jSess, err := j.JoinMUC(ctx, j.Config{
+		Host:  s.host,
+		Room:  s.room,
+		Nick:  s.name,
+		Debug: logger.IsVerbose(),
+	})
+	if err != nil {
 		logger.Warnf("jitsi: rejoin failed: %v - full reconnect", err)
 		return s.reconnectFull(ctx)
 	}
+	s.jSess.Store(jSess)
 	// Wait for Jicofo to send session-initiate (when a peer joins the room).
 	logger.Infof("jitsi: waiting for session-initiate in %s/%s ...", s.host, s.room)