
[asterisk-users] Asterisk 1.4 reliability problems

ben.willcox at british...
Posted: Tue Mar 18, 2008 4:40 am

Hello All,

We have been experiencing some ongoing reliability problems with
Asterisk for quite some time, and I am trying to find out if anyone else
has experienced the same problems.

We are running asterisk 1.4.17~dfsg-2+b1 on Debian Lenny, with a Digium
PRI card, and have approximately 120 sip peers, mostly Snom 360s, with a
few Grandstream GXP2000 and a handful of Handytone 486 units.

The symptoms, when they occur, are as follows:

-The inability to receive incoming calls on our ISDN PRI (callers get a
busy tone); this starts off intermittent but becomes permanent.

-Asterisk cli commands (e.g. sip show peers, show channels) work once,
but then no longer return any data until disconnecting and reconnecting
to the cli.

-Internal SIP calls stop working

-Calls remain stuck in queues, the queue members do not ring, and show
as Busy when issuing a 'queue show' command.

We've actually had these sorts of problems for many months now, which
originally started when we were running Asterisk 1.2 on Gentoo. We have
done a large amount of fault finding and testing, which has involved a
replacement ISDN card, a reinstall on completely different server hardware,
and changing to Asterisk 1.4 on Debian Lenny.

I believe there may be two separate issues here. We did track down one
problem to our cacti and nagios monitoring scripts, which were
connecting and disconnecting to the manager interface several times per
minute. This eventually caused asterisk to give the above symptoms; in
addition, asterisk would consume 100% cpu on the box and eventually need
a hard reboot of the server. I posted about this to the list a few weeks
ago, and it was confirmed that this could cause such a problem. After
stopping these services the problems were much reduced.

However, we have now completely disabled the manager interface
(enabled=no in manager.conf), and yesterday the problem occurred again -
a restart of asterisk got everything going again.

So really I'm at a loss as to where to go from here. A colleague of mine
also has the same problem at his site running Asterisk 1.4 on Debian
Lenny. He has never used the manager interface, and has completely
different server hardware and ISDN card, so I wonder if it's a
Debian-specific problem?

One option is to try reverting back to Asterisk 1.2, but that isn't
really a long-term solution. We also had major problems with 1.2 with
our Snom 360 phones, as with any Snom firmware > 6.2.2 there was a
serious problem whereby on hangup the channels were not cleared down,
meaning we had many outgoing ISDN calls held open for many hours until
we realised the problem. This problem does not occur in Asterisk 1.4,
although we have many log messages such as:

chan_sip.c: Remote host can't match request BYE to call <callid>

so I don't know if this is anything to worry about?

Any help would be gratefully received!

Thanks,
Ben

bwentdg at pipeline.com
Posted: Tue Mar 18, 2008 5:13 am

Curious: you mention "a number of problems" that have "gone on for months".
Question: have you reported ANY or ALL of them to DIGIUM, and if so,
what has been their response on each of these problems?


_______________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-users



support at drdos.info
Posted: Tue Mar 18, 2008 5:56 am

Ben Willcox wrote:
Quote:
Hello All,

One option is to try reverting back to Asterisk 1.2, but that isn't
really a long-term solution. We also had major problems with 1.2 with


Two things,

1.) On your queue setup, avoid using AgentCallbackLogin, it's known to
cause deadlocked channels.
2.) Restart the Asterisk service once a week. I do this via a CRON job
at 3am on Sundays.
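
For item 2, a weekly restart could be scheduled with an entry along these lines. This is only a sketch: the init script path /etc/init.d/asterisk, the root user field, and the helper name restart_cron_line are assumptions, not something stated in this thread.

```shell
#!/bin/sh
# Sketch: print a cron.d-style entry that restarts Asterisk once a week.
# Assumptions (not from the thread): init script path, running as root.

# cron.d fields: minute hour day-of-month month day-of-week user command
# "0 3 * * 0" means 03:00 every Sunday.
restart_cron_line() {
    init_script="${1:-/etc/init.d/asterisk}"   # assumed path, override via $1
    printf '0 3 * * 0 root %s restart\n' "$init_script"
}

# Print the entry so it can be reviewed before it is installed anywhere.
restart_cron_line
```

The printed line could be saved as a file under /etc/cron.d/ after review. A restart ends any calls in progress, which is presumably why a quiet hour such as 3am on Sunday is picked.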

Doug
--

Ben Franklin quote:

"Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety."

bwentdg at pipeline.com
Posted: Tue Mar 18, 2008 6:04 am

Could you clarify what you mean by a "Dead Locked Channel"?
That is not a term I am familiar with in the context of "channels";
databases yes, channels ???

Thx


support at drdos.info
Posted: Tue Mar 18, 2008 6:30 am

Al Baker wrote:
Quote:
Could you clarify what you mean by a "Dead Locked Channel" ?
That is not a term I am familiar with used in context to "channels",
databases yes, channels ???


A channel that is non-functional, but still shows up in the console
(core show channels, sip show channels, etc.) and is never released.
Channels within Asterisk are tied to technology types: IAX, SIP, ZAP, whatever.

I may have it incorrect; if so, someone will correct me.
Doug


--

Ben Franklin quote:

"Those who would give up Essential Liberty to purchase a little Temporary Safety, deserve neither Liberty nor Safety."

stotaro at totarotechn...
Posted: Tue Mar 18, 2008 6:40 am

I have seen this when banging on the AMI but you eliminated that.

Why not try a different OS such as CentOS for now? That would be my next step.

Thanks,
Steve Totaro

asterisk-list at puzzl...
Posted: Tue Mar 18, 2008 6:54 am

On Tue, 2008-03-18 at 07:04 -0400, Al Baker wrote:
Quote:
Could you clarify what you mean by a "Dead Locked Channel" ?
That is not a term I am familiar with used in context to "channels",
databases yes, channels ???

A channel got locked but never unlocked causing all sorts of funky
behavior. It's a bug. The developers have fixed a ton of these deadlocks
in 1.4 so it's usually a good plan to try the latest and greatest
version to see if the problem goes away.

I'm not very familiar with queue setups but Doug Lytle's advice sounds
like a plan. And try 1.4.19-rc2 to see if the deadlock problem persists.
If it does then please file a bug so it can be looked at.

Regards,
Patrick

gordon+asterisk at dro...
Posted: Tue Mar 18, 2008 7:05 am

On Tue, 18 Mar 2008, Steve Totaro wrote:

Quote:
Why not try a different OS such as CentOS for now? That would be my next step.

I wouldn't suggest chasing distros is the way to solve issues, especially
if you're happy with the hardware.

Personally, I'd go back to Debian, but stick to stable (Etch) and then
compile and install a custom kernel tailored exactly to your hardware,
then compile and install your own asterisk from source.

But only because that's what I do, and it works for me ...

Gordon

astmattf at gmail.com
Posted: Tue Mar 18, 2008 7:06 am

I would suggest upgrading to at least 1.4.18. I was able to run it for
about 2 weeks and almost one million calls before I could get it to
crash, and the 1.4.19RC2 seems to fix even more of the locking issues
as well. I know a lot of these problems still existed under 1.4.17.

MATT---


ben.willcox at british...
Posted: Tue Mar 18, 2008 10:40 am

Hi All,

Thanks for all the replies. Here are my responses to the responses:

On Tue, 2008-03-18 at 06:13 -0400, Al Baker wrote:
Quote:
Curious, you mention "a number of problems" that have "gone on for months"
Question: Have you reported ANY or ALL of them to DIGIUM and if so
what has been their response on each of these problems ?

We have been working very closely with the reseller that supplied us
with the system, and although we have made progress over this time and
they have given us a lot of technical support, I now feel that it will
be quicker to progress the current issues independently. I don't know if
the issues were escalated as far as Digium though.

Tzafrir Cohen wrote:
Quote:
The symptoms you mention suggest some sort of deadlock. Please enable
debug and the full log. Maybe this will provide some hints. But please
check that the full log is rotated in /etc/logrotate.d/asterisk .

Can you reproduce this situation? e.g.: by extensive usage of the
manager interface? If so, it might help for testing.

I will enable full debug logging. I suspect that we could reproduce the
original problem with the manager interface by stress testing it with
multiple connections, but I'm not sure if this is the same problem that
we are currently experiencing.

I also want to avoid causing problems on our production system, as it is
rather 'delicate' as far as the users are concerned at the moment.

Steve Totaro wrote:
Quote:
Why not try a different OS such as CentOS for now? That would be my
next step.

I have considered this, at least to establish whether it is a
Debian-specific problem, either with the asterisk packages themselves, or
some other configuration or package issue. I am umming and ahhing between
this and Gordon's suggestion below:

Gordon Henderson wrote:
Quote:
Personally, I'd go back to Debian, but stick to stable (Etch) and then
compile and install a custom kernel tailored exactly to your hardware,
then compile and install your own asterisk from source.

I'm thinking that this may be the way I should go, then I will have the
freedom to install any version of asterisk that I need, whilst also
keeping my favourite distro.

Doug Lytle wrote:
Quote:
Two things,

1.) On your queue setup, avoid using AgentCallbackLogin, it's known to
cause deadlocked channels.
2.) Restart the Asterisk service once a week. I do this via a CRON job
at 3am on Sundays.

We're actually not using Agents on our queues, just SIP channels, so
hopefully this is not the problem. We simulate 'agents' logging in and
out by pausing and unpausing queue members.

I am now going to add a cron job to restart asterisk daily, in the hope
that until the problem is resolved properly, at least it will help
relieve some of the pain by making it stable for a full 24hrs at a time.

Matt Florell wrote:
Quote:
I would suggest upgrading to at least 1.4.18. I was able to run it for
about 2 weeks and almost one million calls before I could get it to
crash, and the 1.4.19RC2 seems to fix even more of the locking issues
as well. I know a lot of these problems still existed under 1.4.17.

A million calls sounds good, but 2 weeks, not so good. It's a bit
disappointing to me that crashing /ever/ is acceptable; I had always had
the understanding that asterisk was supposed to be rock-solid. I suppose
it's some consolation that it's not just me that has problems!

Thanks for all the input. I think short term I will restart asterisk
daily, then the action plan is to revert back to Debian Etch, and then
install asterisk 1.4.18 from source, and hopefully this will improve
things.

Thanks,
Ben

atis at iq-labs.net
Posted: Tue Mar 18, 2008 10:45 am

I would suggest taking the latest 1.4 branch from SVN (or 1.4.19-rc3 when
it's out). There have been a few deadlocks fixed since rc2.

Recompile asterisk with DEBUG_THREADS enabled (in "make menuselect").

If you're not using the safe_asterisk script to start it, you should
also execute "ulimit -c unlimited" before launching asterisk.

When your asterisk is deadlocked, open the CLI and execute "core show
locks". Copy that output and submit it to bugs.digium.com - it will tell
the developers exactly where the problem is.

Then, do "killall -11 asterisk". It will dump asterisk to a core file,
and that might provide helpful information later. If you are asked for
backtraces, look in /tmp (or in the directory you launched
asterisk from) for the core file. Open that core file with "gdb
/usr/sbin/asterisk core.xxxx" and take a dump of "thread apply all bt
full" (make sure you run "set pagination off" in gdb before this).
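
The steps above could be scripted roughly as follows. This is a hedged sketch, not a tested procedure: it assumes the binary lives at /usr/sbin/asterisk (as in the gdb command above), that Asterisk was built with DEBUG_THREADS so that "core show locks" exists, and the function names and the DRY_RUN switch are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the evidence-gathering steps for a deadlocked Asterisk.
# Assumptions: binary path, DEBUG_THREADS build, core dumps enabled.
# DRY_RUN=1 (the default) prints each command instead of running it.

DRY_RUN="${DRY_RUN:-1}"
ASTERISK_BIN="${ASTERISK_BIN:-/usr/sbin/asterisk}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: ask the running instance for its lock table.
# -r connects to the running Asterisk, -x runs one CLI command and exits.
collect_locks() {
    run "$ASTERISK_BIN" -rx "core show locks"
}

# Step 2: send signal 11 (SIGSEGV) so the process dumps core.
force_core() {
    run killall -11 asterisk
}

# Step 3: full backtrace of every thread from the core file given as $1.
backtrace() {
    run gdb -batch -ex "set pagination off" \
        -ex "thread apply all bt full" "$ASTERISK_BIN" "$1"
}
```

With DRY_RUN=0, something like "collect_locks > locks.txt; force_core; backtrace /tmp/core.xxxx > backtrace.txt" would produce two files to attach to a bug report, where core.xxxx stands for whatever core file name actually appears.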

Regards,
Atis

On 3/18/08, Norman Franke <norman at myasd.com> wrote:
Quote:

Check around on bugs.digium.com. You'll find a number of issues reported
that sound similar. I'm hoping that 1.4.19 will fix a lot of stuff, since
the release candidates seem much more stable to me. I couldn't keep Asterisk
up for more than a few days before on 1.4.18. I've also applied a few
SIP-related patches from various bug reports and things are much, much more
stable.

1.4.17, which you mentioned, is also very buggy. 1.4.18 fixed many issues.

Norman Franke
Answering Service for Directors, Inc.
www.myasd.com


--
Atis Lezdins,
VoIP Project Manager / Developer,
atis at iq-labs.net
Skype: atis.lezdins
Cell Phone: +371 28806004
Cell Phone: +1 800 7300689
Work phone: +1 800 7502835

asterisk-list at puzzl...
Posted: Tue Mar 18, 2008 10:49 am

On Tue, 2008-03-18 at 11:05 -0400, Norman Franke wrote:
Quote:
I've also applied a few SIP-related patches from various bug reports
and things are much, much more stable.

Mind sharing which patches you have applied?

Thanks,
Patrick

stotaro at totarotechn...
Posted: Tue Mar 18, 2008 11:26 am

On Tue, Mar 18, 2008 at 8:05 AM, Gordon Henderson
<gordon+asterisk at drogon.net> wrote:
Quote:
On Tue, 18 Mar 2008, Steve Totaro wrote:

Quote:
Why not try a different OS such as CentOS for now? That would be my next step.

I wouldn't suggest chasing distros is the way to solve issues, especially
if you're happy with the hardware.

Personally, I'd go back to Debian, but stick to stable (Etch) and then
compile and install a custom kernel tailored exactly to your hardware,
then compile and install your own asterisk from source.

But only because that's what I do, and it works for me ...

Gordon

Well personally, I would go to 1.2.x unless there was some feature in
1.4 that is absolutely needed but the OP said that was not a long term
option. I have deployed ONE 1.4 system and that is because I had to,
no work arounds due to hardware (unless zaptel 1.4 plays nice with
Asterisk 1.2).

I will probably continue this train of thought (1.2.X is more
production ready) until these threads stop popping up on the list.

Thanks,
Steve Totaro

astmattf at gmail.com
Posted: Tue Mar 18, 2008 11:26 am

On 3/18/08, Ben Willcox <ben.willcox at british-gymnastics.org> wrote:
Quote:

A million calls sounds good, but 2 weeks, not so good. It's a bit
disappointing to me that crashing /ever/ is acceptable, I had always had
the understanding that asterisk was supposed to be rock-solid. I suppose
it's some consolation that its not just me that has problems!

Thanks for all the input. I think short term I will restart asterisk
daily, then the action plan is to revert back to Debian Etch, and then
install asterisk 1.4.18 from source, and hopefully this will improve
things.

Keep in mind that my tests go from 0 to 400 calls in about 1 minute
then they keep that volume for several hours, and I kept running them
for two weeks, and about 6 hours into the last test is when it
crashed. I should mention that 1.2.26.2 is what I still use on all of
my production servers and they will go for months without a crash.

As for rebooting nightly or weekly, that is something we do on a lot
of our high-volume servers just to be safe. When pushing Asterisk to
high concurrent call volumes it is a good idea to give it a fresh
start every day if you can. If Asterisk is being used as a standard
office PBX it should be able to run for months with no crashes.

MATT---

tzafrir.cohen at xorco...
Posted: Tue Mar 18, 2008 12:00 pm

Off-topic note:

On Tue, Mar 18, 2008 at 05:45:04PM +0200, Atis Lezdins wrote:

Quote:
If you're not using safe_asterisk script to start it, you should
execute also "ulimit -c unlimited" before launching asterisk..

Without -g (at least on Linux) Asterisk will refuse to generate core
dumps. With -g it will generate core files but will also set the ulimit
to "unlimited".

With safe_asterisk you have -g enabled by default, and hence ulimit -c
unlimited on by default.
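
As a small illustration of the note above, a launcher could pass -g explicitly when safe_asterisk is not used. This is a sketch only: the binary path and the function name start_with_core_dumps are assumptions, and DRY_RUN defaults to printing the command rather than starting anything.

```shell
#!/bin/sh
# Sketch: start Asterisk so that it is able to write a core file.
# Assumptions: binary path; DRY_RUN=1 (default) only prints the command.

DRY_RUN="${DRY_RUN:-1}"

start_with_core_dumps() {
    bin="${1:-/usr/sbin/asterisk}"   # assumed location, override via $1
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s -g\n' "$bin"
    else
        # -g asks Asterisk to dump core on a crash; per the note above,
        # it also raises the core size limit, so no separate ulimit call.
        "$bin" -g
    fi
}
```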

--
Tzafrir Cohen
icq#16849755 jabber:tzafrir.cohen at xorcom.com
+972-50-7952406 mailto:tzafrir.cohen at xorcom.com
http://www.xorcom.com iax:guest at local.xorcom.com/tzafrir
All times are GMT - 5 Hours. Page 1 of 2.