VoIP Mailing List Archives
Mailing list archives for the VoIP community
e at musinghalfwit.org Guest
Posted: Thu Mar 05, 2009 5:45 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Greetings,
I've been using FS in production on this rev (I realize it's pretty far
behind current) and it's been running well, save 1 issue.
The basic setup is an SBC with 2 GigE ports, 1 public and 1 private. I have
2 SIP profiles created, 1 per IP interface. This is being used to
terminate traffic to a provider, so calls go in only 1 direction. They come
into the private-side profile, get routed via the dialplan to the gateway
defined in the external profile, and go on to the vendor. Pretty simple.
I have noticed that under load (50 or so cps with ~800-900 bridged calls up)
some channels on the public side seem to get "stuck" over time. Due to
the nature of how this is being used, I would expect both SIP profiles to show
the same number of channels in use any time I do a 'sofia status' (or at least
be within a channel or 2 of each other). However, after a day of heavy use I had
a disparity of ~250 channels. These extra channels also seem to put some
continual load on the 'system cpu', as reported via top.
Of course, due to the load on the box I have to keep logging turned way
down, so I've been trying to troubleshoot it as best I can.
Last night I grabbed a core file and started in with GDB today. I found
the 120 or so threads that represented real active calls when I took the
core file. I also found ~250 threads that appeared to be stuck in the
CS_NEW state. The backtraces on all of them look the same; see below.
I walked through the code path by hand, based on the bt's, and I don't see how
this could be happening unless it's a locking issue. But as far as I can tell,
each session has its own mutex defined in the switch_core_session_t struct,
so I wouldn't think they would be stepping on each other. I also would have expected
that if it were something of a deadlock nature, it would stop processing calls
altogether.
I grabbed the commands from the .gdbinit (super handy btw!!) and have been trolling
through the variables to try to ascertain something about why these threads seem to
be stuck, but am not having much luck even coming up with a scenario to try
to replicate the issue.
If anyone has any pointers as to where I might look next, it would be greatly
appreciated.
We will be updating to the newest release soon; however, I was hoping to nail down
what is going on so I can systematically replicate it and verify by testing in the lab
that it is fixed, rather than just pushing the new release to production and hoping.
Thanks in advance for any tips/pointers anyone may have.
-e
......bt and bt full for a single "hung" thread
#0 0xb7fd5410 in __kernel_vsyscall ()
#1 0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
#4 0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
#5 0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
#6 0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
#7 0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#8 0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt full
#0 0xb7fd5410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
No locals.
#4 0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
exception = 0 '\0'
state = <value optimized out>
endstate = CS_NEW
endpoint_interface = <value optimized out>
driver_state_handler = (const switch_state_handler_table_t *) 0xb73b1720
application_state_handler = <value optimized out>
thread_id = 3085554955
env = {{__jmpbuf = {134603552, -1428248680, -1461722504, 9184, -1210273432, -1210014020}, __mask_was_saved = -1210034895, __saved_mask = {__val = {0, 3084988404, 3084937740, 3086469280, 9184, 1, 2976641592, 2833244792, 3086590960,
168036728, 3084937740, 2833244808, 3085923728, 1, 3086590960, 2833244840, 3086590960, 0, 134564192, 2833244840, 3085923728, 134564244, 3086590960, 2833244872, 3085887870, 134564240, 168036728, 3085458203, 3086590960, 2976606624,
134564192, 2833244904}}}}
sig = <value optimized out>
__func__ = "switch_core_session_run"
__PRETTY_FUNCTION__ = "switch_core_session_run"
#5 0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
session = (switch_core_session_t *) 0x95fe270
event = <value optimized out>
event_str = 0x0
val = <value optimized out>
__func__ = "switch_core_session_thread"
__PRETTY_FUNCTION__ = "switch_core_session_thread"
#6 0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
No locals.
#7 0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#8 0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
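With several hundred threads in a core file, reading backtraces one at a time gets tedious. As a minimal sketch, assuming the output of gdb's `thread apply all bt` has been saved as text, the Python below groups threads by their sequence of function names, so a large cluster of identical stacks (such as the switch_sleep / switch_core_session_run one above) stands out. The helper names (`frame_signature`, `split_threads`, `count_signatures`) are illustrative only and are not part of FreeSWITCH or gdb.

```python
import re
from collections import Counter

# Matches gdb frame lines such as:
#   #3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(\S+) \(")

# Matches the per-thread header gdb prints, e.g.
#   Thread 12 (Thread 0xb7... (LWP 1234)):
THREAD_HEADER_RE = re.compile(r"^Thread \d+ .*$", flags=re.MULTILINE)


def frame_signature(backtrace_text):
    """Return the function names of one backtrace, innermost frame first."""
    names = []
    for line in backtrace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            names.append(match.group(1))
    return tuple(names)


def split_threads(dump):
    """Split `thread apply all bt` output into one text chunk per thread."""
    return [chunk for chunk in THREAD_HEADER_RE.split(dump) if chunk.strip()]


def count_signatures(dump):
    """Count how many threads share each distinct stack of function names."""
    signatures = (frame_signature(chunk) for chunk in split_threads(dump))
    # Chunks without any frame lines (e.g. gdb banner text) are ignored.
    return Counter(sig for sig in signatures if sig)
```

Calling `count_signatures(text).most_common(5)` on the saved dump would list the five most common stacks with their thread counts, which is a quick way to separate the stuck threads from the ones carrying real calls.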
_______________________________________________
Freeswitch-users mailing list
Freeswitch-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
brian at freeswitch.org Guest
Posted: Thu Mar 05, 2009 6:01 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Well, the rules usually state that you try SVN trunk, then report a JIRA
if the problem persists, but since you're 2000+ revs behind, chances are
we already fixed this issue. Are you using bypass media?
/b
On Mar 5, 2009, at 4:38 PM, Eric Liedtke wrote:
Quote: | Greetings,
I've been using FS in production on this rev (I realize it's pretty far
behind current) and it's been running well, save 1 issue.
|
mrene_lists at avgs.ca Guest
Posted: Thu Mar 05, 2009 6:01 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Hi,
If you suspect a bug, the place to report it is JIRA. See: http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.
Also, can you upgrade to SVN trunk? A lot of fixes get committed
daily, so it's good to stay up to date.
As you seem familiar with GDB, you may symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions",
which will dump a list of UUIDs to session pointers.
In your JIRA, make sure you include "thread apply all bt",
"list_sessions" and "show channels" (this one goes in FS), but PLEASE
update to SVN trunk and test again to see if it still happens.
Also, are you using proxy/bypass media or just the default?
Math
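The gdb dumps requested above can also be collected from a core file non-interactively. This is a rough sketch in Python, assuming gdb is on the PATH; `-batch` and `-ex` are standard gdb options, while the function names and the binary/core paths are placeholders chosen for illustration. `list_sessions` only works when the FreeSWITCH .gdbinit macros are loaded, so it is left as an optional extra command rather than a default.

```python
import subprocess


def build_gdb_command(binary_path, core_path, extra_commands=()):
    """Build a gdb command line that dumps every thread's backtrace and exits."""
    command = ["gdb", "-batch", "-ex", "thread apply all bt"]
    for gdb_command in extra_commands:
        # For example "list_sessions", assuming the .gdbinit macros are loaded.
        command += ["-ex", gdb_command]
    return command + [binary_path, core_path]


def dump_backtraces(binary_path, core_path, extra_commands=()):
    """Run gdb against the core file and return its text output."""
    result = subprocess.run(
        build_gdb_command(binary_path, core_path, extra_commands),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

The output of `dump_backtraces(...)` can be saved to a file and attached to the JIRA report; "show channels" still has to be run separately from the FreeSWITCH console.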
e at musinghalfwit.org Guest
Posted: Thu Mar 05, 2009 6:25 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Yeah, I know. I didn't open a bug because my rev was so far behind. I
was just looking for any advice on where to poke next. Troubleshooting
this has been a fantastic introduction to some of the inner workings of
FreeSWITCH, so I was hoping to see it through and learn as I went.
To answer your question: no, we are not using bypass media.
-e
It seems fuzzy now but I think on Thu, Mar 05, 2009 at 04:52:43PM -0600, Brian West said:
Quote: | Well the rules usually state that you try SVN trunk then report a jira
if the problem persists but since you're 2000+ revs behind chances are
we already fixed this issue. Are you using bypass media?
/b
|
e at musinghalfwit.org Guest
Posted: Thu Mar 05, 2009 6:30 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Yup, as I mentioned to Brian, I didn't want to clog JIRA with a bug that's
been fixed or report against a rev that's 2k+ revs behind. I was trying to work
through it as a learning exercise. And yeah, I actually added a bunch of
stuff to the list_sessions function to spit out a variety of associated
variables for each session, looking for a pattern somewhere to clue me
in to what might be happening.
No proxy or bypass media here, just defaults.
I will keep at it, and once we update the production systems, if the
problem persists I will open a bug in JIRA with all the necessary
goodies.
Thanks
-e
It seems fuzzy now but I think on Thu, Mar 05, 2009 at 05:55:33PM -0500, Mathieu Rene said:
Quote: | Hi,
If you suspect a bug, the place to report it is JIRA. See: http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.
Also, can you upgrade to SVN trunk? A lot of fixes get committed
daily, so it's good to stay up to date.
As you seem familiar with GDB, you may symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions",
which will dump a list of UUIDs to session pointers.
In your JIRA, make sure you include "thread apply all bt",
"list_sessions" and "show channels" (this one goes in FS), but PLEASE
update to SVN trunk and test again to see if it still happens.
Also, are you using proxy/bypass media or just the default?
Math
|
nik.middleton at noble... Guest
Posted: Thu Mar 05, 2009 6:45 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Well, if it's any consolation, I have a 4-day-ish-old copy of SVN and I
have around 200 of these hung calls, though after an hour or so they did
seem to clear.
That said, FS made 138,330 call attempts today, not too shabby, and
throughout, the call quality was as good as on the first one. Not sure how
to debug this one.
Version: FreeSWITCH Version 1.0.trunk (12276)
brian at freeswitch.org Guest
Posted: Thu Mar 05, 2009 6:48 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
I would update... We fixed a few bugs related to hung calls in the
past 24 hours.
/b
On Mar 5, 2009, at 5:39 PM, Nik Middleton wrote:
Quote: | Well, if it's any consolation, I have a 4-day-ish-old copy of SVN and I
have around 200 of these hung calls, though after an hour or so they did
seem to clear.
That said, FS made 138,330 call attempts today, not too shabby, and
throughout, the call quality was as good as on the first one. Not sure how
to debug this one.
Version: FreeSWITCH Version 1.0.trunk (12276)
|
anthony.minessale at g... Guest
Posted: Thu Mar 05, 2009 11:07 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)
If they went away by themselves, they must not have been hung?
On Thu, Mar 5, 2009 at 5:39 PM, Nik Middleton <nik.middleton@noblesolutions.co.uk> wrote:
Quote: | Well if it's any consolation, I have a 4 day ish old copy of SVN and I
have around 200 of these hung calls, though after an hour or so they did
seem to clear.
That said, FS made 138,330 call attempts today, not too shabby, and
throughout the call quality was as good as the first one. Not sure how
to debug this one.
Version: FreeSWITCH Version 1.0.trunk (12276)
-----Original Message-----
From: freeswitch-users-bounces@lists.freeswitch.org
[mailto:freeswitch-users-bounces@lists.freeswitch.org] On Behalf Of Eric Liedtke
Sent: 05 March 2009 23:23
To: freeswitch-users@lists.freeswitch.org
Subject: Re: [Freeswitch-users] Hung Channels (SVN Rev 10231)
Yup, as I mentioned to Brian, I didn't want to clog JIRA with a bug that's
been fixed or report against a rev 2k+ revs behind. I was trying to work
through it as a learning exercise. And yeah, I actually added a bunch of
stuff to the list_sessions function to spit out a variety of associated
variables for each session, looking for a pattern somewhere to clue me
into what might be happening.
No proxy or bypass media here, just defaults.
I will keep at it and once we update the production systems, if the
problem persists I will open a bug in JIRA with all the necessary
goodies.
Thanks
-e
It seems fuzzy now but I think on Thu, Mar 05, 2009 at 05:55:33PM -0500, Mathieu Rene said:
Quote: | Hi,
If you suspect a bug, the place to report it is JIRA. See:
http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.
Also, can you upgrade to svn trunk? A lot of fixes get committed
daily, so it's good to stay up to date.
As you seem familiar with GDB, you may symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions",
which will dump a list of uuids to session pointers.
In your JIRA, make sure you include "thread apply all bt",
"list_sessions" and "show channels" (this one goes in FS), but PLEASE
update to svn trunk and test again to see if it still happens.
Also, are you using proxy/bypass media or just the default?
Math
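
[Editor's sketch, not part of the original message.] Once the output of "thread apply all bt" has been saved to a text file, a short script can group threads by call stack to show how many share the same pattern, for example threads sleeping directly inside switch_core_session_run. This is only a rough sketch: it assumes GDB's usual "Thread N (...)" headers and "#N 0xADDR in func (...)" frame lines, and the helper names group_threads_by_stack, has_call and count_sleeping_in_session_run are hypothetical, not part of FreeSWITCH or GDB.

```python
import re
from collections import Counter

# "Thread 12 (Thread 0xb7... (LWP 4242)):" starts a new thread section.
THREAD_RE = re.compile(r"^Thread\s+(\d+)\b")
# "#3  0xb7ee02cd in switch_sleep (t=1000) at ..." or "#0  main () at ..."
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def group_threads_by_stack(bt_text):
    """Count how many threads share each sequence of function names."""
    stacks = {}
    current = None
    for line in bt_text.splitlines():
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        # Wrapped continuation lines do not start with '#', so they are skipped.
        if frame and current is not None:
            stacks[current].append(frame.group(2))
    return Counter(tuple(stack) for stack in stacks.values())


def has_call(stack, callee, caller):
    """True if `callee` appears in the frame directly above `caller`."""
    return any(
        stack[i] == callee and stack[i + 1] == caller
        for i in range(len(stack) - 1)
    )


def count_sleeping_in_session_run(bt_text):
    """Number of threads where switch_core_session_run called switch_sleep."""
    counts = group_threads_by_stack(bt_text)
    return sum(
        n for stack, n in counts.items()
        if has_call(stack, "switch_sleep", "switch_core_session_run")
    )
```

Calling count_sleeping_in_session_run on the saved backtrace text gives a number that can be compared against the channel disparity seen in 'sofia status'. A thread that matches this pattern is not necessarily hung, since healthy sessions may pass through the same sleep, so the count is a starting point rather than a diagnosis.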
|
--
Anthony Minessale II
FreeSWITCH http://www.freeswitch.org/
ClueCon http://www.cluecon.com/
AIM: anthm
MSN:anthony_minessale@hotmail.com
GTALK/JABBER/PAYPAL:anthony.minessale@gmail.com
IRC: irc.freenode.net #freeswitch
FreeSWITCH Developer Conference
sip:888@conference.freeswitch.org
iax:guest@conference.freeswitch.org/888
googletalk:conf+888@conference.freeswitch.org
pstn:213-799-1400 |
|
|
|
anthony.minessale at g... Guest
|
Posted: Thu Mar 05, 2009 11:10 pm Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231) |
|
|
in your case you will have no choice but to update.
Please do a fresh checkout as the build system has also drastically changed.
On Thu, Mar 5, 2009 at 5:19 PM, Eric Liedtke <e@musinghalfwit.org> wrote: