
VoIP Mailing List Archives
Mailing list archives for the VoIP community

[asterisk-users] 1.4.20.1 hang -- three times in 1.5 days (TC400B at fault ?)


 
ex.vitorino at gmail.com
Posted: Fri Jun 06, 2008 7:01 am

Hi list,

Looking to share info and obtain peer feedback.
Current possibilities: bad config, bad hw or asterisk/zaptel bug.

System: HP Proliant DL380 G5
Installed HW: TE220B to PSTN, TE122 to ChannelBank and TC400B.
OS: CentOS 5.1 kernel 2.6.18-53.1.21.el5
Asterisk: 1.4.20.1
Zaptel: 1.4.11

Events / History
---------------------
May 29th
- started production in the evening
- TC400B was not on the system as it was not available by then
June 4th
- installed TC400B at the end of the day
- test IAX/G.729 calls ok
June 5th
- 10.40: hang
- 16.40: hang
- 19.00: rebuild asterisk with DEBUG_LOCK + THREAD
June 6th (today)
- 11.35h: hang

So, in short, the system has been hanging repeatedly since we installed
the TC400B (which is really bad, because we have already had to RMA
a TC400B twice for this system).

Detail
---------
When it hung about an hour ago we tested:
- FXS @ channelbank @ TE122 => voicemail FAILS
- FXS @ channelbank @ TE122 => PSTN @ TE220B WORKED ONCE
- FXS @ channelbank @ TE122 => SIP phone FAILS
- SIP phone => anywhere (voicemail, SIP, FXS @ channelbank, PSTN @ PRI) FAILS
- PSTN @ TE220B => anywhere FAILS
- IAX => anywhere FAILS

asterisk log has 12 of:
ERROR [14733] chan_sip.c: We could NOT get the channel lock for
SIP/000e08dfdc72-0a107670!
ERROR[14733] chan_sip.c: SIP transaction failed:
43e5ad5f6dc5b58c46c597cd2af0c31e at 192.168.161.40

...followed by thousands of:
NOTICE[30599] chan_iax2.c: Avoiding IAX destroy deadlock

(the log contains similar messages for yesterday's hangs)
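
To compare the message mix between hangs, the log can be tallied by level and source file. This is a minimal sketch, assuming lines shaped like the two excerpts above (an upper-case level, a bracketed thread id, then the source file); the function name `tally` is made up for this illustration and wrapped continuation lines are simply skipped.

```python
import re
from collections import Counter

# Matches e.g. "ERROR [14733] chan_sip.c: ..." or "NOTICE[30599] chan_iax2.c: ...",
# with or without a leading timestamp (assumed format, based on the excerpts above).
LINE_RE = re.compile(
    r"\b(?P<level>[A-Z]+)\s*\[(?P<tid>\d+)\]\s+(?P<src>[\w.]+):\s*(?P<msg>.*)"
)


def tally(lines):
    """Count log lines per (level, source file); non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            counts[(m.group("level"), m.group("src"))] += 1
    return counts
```

Calling `tally(open(path))` on the Asterisk log and printing `most_common()` should show whether the chan_sip lock errors always precede the flood of chan_iax2 notices.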

asterisk CPU usage is apparently zero
load is at about 3
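
A load of about 3 with no CPU usage may mean that some threads are stuck in uninterruptible sleep (state D), since Linux counts those tasks in the load average. The sketch below is one way to check, assuming a Linux /proc filesystem; the helper names `thread_state` and `blocked_threads` are invented for this example.

```python
import os


def thread_state(stat_line):
    """Return the one-letter state from a /proc/<pid>/stat style line.

    The command name sits in parentheses and may itself contain spaces or
    parentheses, so split on the LAST ')' rather than on whitespace.
    """
    return stat_line.rsplit(")", 1)[1].split()[0]


def blocked_threads(pid):
    """List the thread ids of `pid` that are in uninterruptible sleep ('D')."""
    blocked = []
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as fh:
                state = thread_state(fh.read())
        except OSError:
            continue  # the thread exited while we were scanning
        if state == "D":
            blocked.append(int(tid))
    return blocked
```

If `blocked_threads(<asterisk pid>)` returns thread ids during a hang, those threads are waiting inside the kernel, which would point towards a driver rather than asterisk itself.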

network access to the system is ok
dmesg kernel message buffer looks ok

CLI "core show locks" shows lots of info which we're not able to
decode (attached)

CLI "stop now" has no effect
kill <pid> has no effect
kill -9 <pid> leaves a <zombie> process
"shutdown -r now" leads to a kernel panic, probably while stopping zaptel,
because the TE122 and TE220B drivers were not unloaded (attached)

In Our Heads
------------------
- we suspect that the presence of the TC400B makes asterisk behave
  differently, leading to what we're now calling a hang (it is the
  apparent change in the system since it started misbehaving)
- as such we're considering removing the TC400B to see if the system
  stabilizes
- however, removing it may also remove the possibility of diagnosing
  this issue further and trying fixes
- of course, we're trying to manage customer expectations and
  satisfaction at the same time

Extra Context Info
------------------------
- system serves ~100 SIP extensions
- system peers with a dozen other systems within the VPN (dundi+iax)

Thanks in advance for any feedback or pointers that can help us identify,
work around and, ideally, fix this behaviour.

Cheers,
--
exvito
-------------- next part --------------
A non-text attachment was scrubbed...
Name: summary-log.txt.gz
Type: application/x-gzip
Size: 7202 bytes
Desc: not available
Url : http://lists.digium.com/pipermail/asterisk-users/attachments/20080606/30aa0ef0/attachment.bin

ex.vitorino at gmail.com
Posted: Fri Jun 06, 2008 8:27 am

On Fri, Jun 6, 2008 at 1:01 PM, Ex Vito <ex.vitorino at gmail.com> wrote:
Quote:
In Our Heads
------------------
- we're suspecting that the presence of the TC400B is making asterisk behave
in different ways that lead to what we're now calling a hang (that is the
apparent change in the system since it started mis-behaving)
- as such we're considering removing the TC400B to see if the system
stabilizes however removing it may remove the possibility of further
diagnosing this issue and trying fixes
- of course, we're trying to manage customer expectations and
satisfaction at the same time


...other possibility:

- instead of removing the TC400B, change the IAX trunk codec to
GSM instead of G.729... this would prevent the TC400B usage and
may lead to different (as in stable) behaviour

More troubleshooting ideas ?
--
exvito

ex.vitorino at gmail.com
Posted: Fri Jun 06, 2008 9:45 am

On Fri, Jun 6, 2008 at 3:16 PM, Shaun Ruffell <sruffell at digium.com> wrote:
Quote:

I'm soon going to petition for this interface to be merged into the trunk, so if you would like to try the branches out now and need any help, please contact me directly.


Thanks for your feedback, Shaun.

I've had some quick feedback from russellb @ #asterisk-dev, and our next
step is to get a full stack trace when the hang condition occurs.
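
Since a hang can strike when nobody is at the console, the stack trace capture could be scripted in advance. This is a rough sketch that attaches gdb to the running process and dumps every thread's backtrace to a file; it assumes gdb is installed and that the script runs as a user allowed to attach to asterisk. The helper names are made up for this example.

```python
import subprocess


def gdb_backtrace_cmd(pid):
    """Build a gdb command line that prints all thread backtraces and exits."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt full"]


def dump_backtrace(pid, out_path):
    """Run gdb against `pid`, writing its output to `out_path`.

    Returns gdb's exit status; a non-zero value usually means the attach failed.
    """
    with open(out_path, "w") as out:
        result = subprocess.run(
            gdb_backtrace_cmd(pid),
            stdout=out,
            stderr=subprocess.STDOUT,
            check=False,
        )
    return result.returncode
```

The trace is only readable if the binary keeps its debugging symbols, which is why an unoptimized build matters here.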

We've already rebuilt with DONT_OPTIMIZE and had a lucky time slot
to restart asterisk. So now we're hoping it fails again (ironic,
isn't it?) so we can move forward with the diagnosis.

Of course, future possibilities such as changing codecs or removing the
TC400B remain open. There are others too: I believe we enabled the first
voicemail account, as a test, on the same day that we installed the
TC400B. Could that be the change?

We're still open to peer feedback, of course.
Will post back later.
Cheers,
--
exvito

ex.vitorino at gmail.com
Posted: Fri Jun 06, 2008 11:19 am

On Fri, Jun 6, 2008 at 5:01 PM, Andres <andres at telesip.net> wrote:
Quote:

Quote:

Of course, future possibilities of changing codecs, removing the TC400B
or others are open (such as: I guess we enabled the 1st voicemail
account as test on the same day that we installed the TC400B -- could
it be the change ?)


Do you have MWI enabled? We are suspecting a similar SIP deadlock on a
system that may be caused by it. Although our version is 1.4.17. There
is some mention of it on: http://bugs.digium.com/view.php?id=10953


Yes, on the single test mailbox that is configured. And yes, we are already
considering disabling it as a future troubleshooting step...

BTW, our voicemail account is realtime ODBC
--
exvito
All times are GMT - 5 Hours
