VoIP Mailing List Archives
Mailing list archives for the VoIP community |
ex.vitorino at gmail.com Guest
|
Posted: Wed Jun 11, 2008 5:23 am Post subject: [asterisk-users] 1.4.20.1 hang -- extra info + gdb hangs |
|
|
Here is an update:
1. We reviewed 'core show locks' with the help of russellb @ #asterisk-devs
last Friday.
2. The recommendation was to recompile Asterisk with DONT_OPTIMIZE and
get a stack trace with:
# gdb /usr/sbin/asterisk $(pidof asterisk)
(gdb) set pagination off
(gdb) thread apply all bt
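For anyone wanting to script the same capture, here is a minimal sketch (not something russellb suggested) that runs those two gdb commands non-interactively with a timeout. It assumes a gdb that supports the standard `-batch` and `-ex` options; the function names, the output path and the 60 second timeout are made up for illustration.

```python
import subprocess

def build_gdb_command(binary, pid):
    """Return a gdb argv that attaches to pid, dumps all thread
    backtraces and exits (same commands as the interactive session)."""
    return [
        "gdb", "-batch",
        "-ex", "set pagination off",
        "-ex", "thread apply all bt",
        binary, str(pid),
    ]

def dump_backtraces(binary, pid, out_path, timeout_s=60):
    """Run gdb in batch mode and write its output to out_path.

    Returns True if gdb finished within timeout_s seconds, False otherwise.
    timeout_s and out_path are illustrative assumptions.
    """
    cmd = build_gdb_command(binary, pid)
    with open(out_path, "w") as out:
        try:
            subprocess.run(cmd, stdout=out, stderr=subprocess.STDOUT,
                           timeout=timeout_s, check=False)
            return True
        except subprocess.TimeoutExpired:
            return False
```

A call would look like `dump_backtraces("/usr/sbin/asterisk", pid, "/tmp/asterisk-bt.txt")`. Note that if gdb itself ends up stuck in the kernel, the timeout may not be able to get rid of it either.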
We did reinstall Asterisk with the new compile flags back then and just
experienced another hang now (the weekend, Monday and Tuesday
were very low activity days).
Unfortunately, gdb seems to hang on startup, after printing what seems to be a
thread list. It never gets to the "reading symbols from..." steps. As such,
no gdb prompt -> no stack trace ! :-/
ps shows the gdb process as <defunct> and, as such, it responds to no signals;
Asterisk does not seem to respond to signals either... (maybe that's why gdb
hangs... I really do not know how gdb goes about attaching itself
to a running process)
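One way to check the "does not respond to signals" guess without gdb is to look at the per-thread state the Linux kernel exposes under /proc. The sketch below is only an illustration (the helper names are made up): it reads the `State:` line of each thread's status file. A thread in state `D` (uninterruptible sleep) is blocked inside the kernel and will not act on any signal, SIGKILL included, until the kernel call returns; `Z` is what ps reports as <defunct>.

```python
import os
import re

def parse_state(status_text):
    """Extract the one-letter state code from the text of a /proc status file."""
    match = re.search(r"^State:\s+(\S)", status_text, re.MULTILINE)
    return match.group(1) if match else None

def thread_states(pid, proc_root="/proc"):
    """Map each thread id of pid to its state code (Linux only)."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    states = {}
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "status")) as fh:
                states[int(tid)] = parse_state(fh.read())
        except OSError:
            # The thread may have exited between listdir() and open().
            continue
    return states
```

If `thread_states(pid)` shows one or more Asterisk threads sitting in `D` for a long time, that would be consistent with the process being stuck in a driver or other kernel code rather than in Asterisk's own locking.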
Again we have 'core show locks' + 'core show threads' output from Asterisk,
which we do not have the skills to read...
Lastly, the Asterisk log shows the following 12 times...
[Jun 11 09:41:07] ERROR[4837] chan_sip.c: SIP transaction failed:
588233f5261d52ac621587ca327b5083 at 192.168.161.40
[Jun 11 09:41:07] ERROR[4837] chan_sip.c: We could NOT get the channel
lock for SIP/000e08de4cbe-097555c8!
...then...
[Jun 11 09:41:19] WARNING[4837] chan_sip.c: Maximum retries exceeded
on transmission 588233f5261d52ac621587ca327b5083 at 192.168.161.40 for
seqno 102 (Critical Request)
...and finally about 1200 of these:
[Jun 11 09:42:59] WARNING[4842] chan_iax2.c: Max retries exceeded to
host 192.168.166.40 on IAX2/private-13779 (type = 6, subclass = 11,
ts=40022, seqno=10)
...with several "combinations" of:
- the number inside WARNING[xxx] -> 13 different
- the host IP: 192.168.166.40 and 192.168.170.40
- the iax channel -> 12 different
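The counts above can be reproduced from the log with something like the following sketch. It assumes each warning sits on a single line in the real log file (the lines are only wrapped in this mail) and that the message format is exactly the one quoted; the function name and the returned keys are made up for illustration.

```python
import re

# Matches the chan_iax2 warning quoted above; assumes one entry per line.
IAX_RETRY = re.compile(
    r"WARNING\[(?P<thread>\d+)\] chan_iax2\.c: Max retries exceeded to "
    r"host (?P<host>\S+) on (?P<channel>\S+)"
)

def summarize_iax_retries(lines):
    """Count matching warnings and collect the distinct thread ids,
    peer hosts and IAX2 channels they mention."""
    threads, hosts, channels = set(), set(), set()
    total = 0
    for line in lines:
        match = IAX_RETRY.search(line)
        if match is None:
            continue  # not a chan_iax2 "Max retries exceeded" warning
        total += 1
        threads.add(match.group("thread"))
        hosts.add(match.group("host"))
        channels.add(match.group("channel"))
    return {
        "total": total,
        "threads": sorted(threads),
        "hosts": sorted(hosts),
        "channels": sorted(channels),
    }
```

Feeding it an open log file, e.g. `summarize_iax_retries(open(path))`, gives the totals per category; which peers and channels show up most may help narrow down whether the retries are a cause or just a symptom of the hang.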
Until today, our gut feelings were:
1. The TC400B installation / usage change
(idea: Asterisk responds to no signals because it is waiting in
kernel space; maybe something's wrong with zaptel, wctc4xxp, or our HW?)
2. The activation of a voicemail account with MWI
We now have an extra possibility:
- This system exchanges IAX calls with several other systems
- The hanging one is running asterisk 1.4.20.1, but all the others
are running 1.4.19
- The changelog from 1.4.19 -> 1.4.20.1 includes several chan_iax
fixes --> could the absence of such fixes in this system's IAX peers
be leading it to hang?
Possibility:
3. Upgrade all peers to 1.4.20.1
Again, if anyone can chime in with their contribution, thanks in advance.
Question of the day: why on earth does gdb hang ?! (our guess: because
asterisk does not respond to signals... now why ?!)
Cheers,
--
exvito |
|
|
stotaro at totarotechn... Guest
|
Posted: Wed Jun 11, 2008 6:33 am Post subject: [asterisk-users] 1.4.20.1 hang -- extra info + gdb hangs |
|
|
On Wed, Jun 11, 2008 at 6:23 AM, Ex Vito <ex.vitorino at gmail.com> wrote:
Quote: | [previous message quoted in full; snipped]
|
Try switching from IAX to SIP.
Thanks,
Steve T |
|
|
ex.vitorino at gmail.com Guest
|
Posted: Wed Jun 11, 2008 12:05 pm Post subject: [asterisk-users] 1.4.20.1 hang -- extra info + gdb hangs |
|
|
On Wed, Jun 11, 2008 at 12:33 PM, Steve Totaro
<stotaro at totarotechnologies.com> wrote:
Quote: |
Try switching from IAX to SIP.
|
Steve, thanks for your suggestion... As you may understand, that is not an easy
decision to take and implement: we're peering with about 20 other systems
within a private network where routing/firewalls/QoS etc. have been set up
with IAX in mind -- it can be done, of course, but we already have better
"suspects"...
After a brief discussion over at IRC, we seriously suspect either the
TC400B or its driver, wctc4xxp (recall we're running the latest asterisk+zaptel).
To regain some stability and to try to "prove" that this is the source, we've
changed our g729 trunks to gsm and blacklisted wctc4xxp so as to ensure the
TC400B is not used at all. If this brings us back to a stable system, we can
say almost for certain that the issue is where we suspect it is: bad HW,
bad driver or, as a last resort, bad kernel... (again, fyi, latest CentOS 5.1)
If things go as we expect, we'll then give the asterisk-1.4-transcoder +
zaptel-1.4-transcoder branches a run while re-enabling wctc4xxp + g729 over
the IAX trunks -- hopefully they'll allow us to use the TC400B with
some stability -- and they'll also provide feedback to the developers, which is
obviously useful in the short and long run (let's have it shorter, shall we?)
As always, we're open to suggestions and/or further questions.
We'll keep posting our experience.
Cheers,
--
exvito |
|