VoIP Mailing List Archives
Mailing list archives for the VoIP community
[Freeswitch-users] Hung Channels (SVN Rev 10231)


 
e at musinghalfwit.org
Guest





Posted: Thu Mar 05, 2009 5:45 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Greetings,

I've been using FS in production on this rev (I realize it's pretty far
behind current) and it's been running well, save 1 issue.

The basic setup is an SBC with 2 GigE ports, 1 public and 1 private. I have
2 SIP profiles created, 1 per IP interface. This is being used to
terminate traffic to a provider, so calls go in only 1 direction. They come
into the private-side profile, get routed via the dialplan to the gateway
defined in the external profile, and go on to the vendor. Pretty simple.

I have noticed that under load (50 or so cps with ~800-900 bridged calls up),
some channels on the public side seem to get "stuck" over time. Given how
this box is used, I would expect both SIP profiles to show the same number of
channels in use any time I do a 'sofia status' (or at least be within a channel
or 2 of each other). However, after a day of heavy use I had a disparity of
~250 channels. These extra channels also seem to put some continual load on
the 'system' CPU, as reported via top.
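
As a rough sanity check on those numbers, here is a small Python sketch. It applies Little's law (concurrent calls = arrival rate x average hold time) to the figures in the post, then estimates what fraction of calls would have to get stuck to strand ~250 channels in a day. The steady 50 cps for a full 24 hours is an assumption made purely for illustration, not something that was measured, and the function names are hypothetical.

```python
def average_hold_seconds(concurrent_calls: float, calls_per_second: float) -> float:
    """Little's law: L = lambda * W, so the average hold time is W = L / lambda."""
    return concurrent_calls / calls_per_second


def leak_fraction(stuck_channels: int, calls_per_second: float, seconds: float) -> float:
    """Fraction of all call attempts in the window that ended up stuck."""
    return stuck_channels / (calls_per_second * seconds)


if __name__ == "__main__":
    # Figures from the post: ~50 cps with 800-900 bridged calls up.
    low = average_hold_seconds(800, 50)
    high = average_hold_seconds(900, 50)
    print(f"average hold time: {low:.0f}-{high:.0f} s")

    # ASSUMPTION: a constant 50 cps for 24 hours (the post only says "a day of heavy use").
    fraction = leak_fraction(250, 50, 24 * 3600)
    print(f"stuck fraction: {fraction:.6%} (about 1 call in {round(1 / fraction)})")
```

Under that assumption the average hold time works out to roughly 16-18 seconds and only about 1 call in 17,000 gets stuck, which would suggest a lab reproduction needs a long, high-volume run rather than a handful of test calls.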

Of course due to the load on the box I have to keep logging turned way
down. So I've been trying to troubleshoot it as best I can.

Last night I grabbed a core file and started in with GDB today. I found
the 120 or so threads that represented real active calls when I took the
core file. I also found ~250 threads that appeared to be stuck in the
CS_NEW state. The backtraces on all of them look the same; one is shown below.

I walked through the code path by hand, based on the backtraces, and I don't see how
this could be happening unless it's a locking issue. But as far as I can tell,
each session has its own mutex defined in the switch_core_session_t struct,
so I wouldn't think they would be stepping on each other. I also would have expected
that if it were a deadlock of some kind, it would stop processing calls
altogether.

I grabbed the commands from the .gdbinit (super handy btw!!) and have been trolling
through the variables to try to ascertain something about why these threads seem to
be stuck, but am not having much luck even coming up with a scenario to try
to replicate the issue.

If anyone has any pointers as to where I might look next it would be greatly
appreciated.

We will be updating to the newest release soon; however, I was hoping to nail down
what is going on so I can systematically replicate it and verify by testing in the lab
that it is fixed, rather than just pushing the new release to production and hoping.

Thanks in advance for any tips/pointers anyone may have.

-e

......bt and bt full for a single "hung" thread


#0 0xb7fd5410 in __kernel_vsyscall ()
#1 0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2 0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
#3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
#4 0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
#5 0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
#6 0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
#7 0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
#8 0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
(gdb) bt full
#0 0xb7fd5410 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7d4f1dc in usleep () from /lib/tls/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143
No locals.
#4 0xb7e9da03 in switch_core_session_run (session=0x95fe270) at src/switch_core_state_machine.c:462
exception = 0 '\0'
state = <value optimized out>
endstate = CS_NEW
endpoint_interface = <value optimized out>
driver_state_handler = (const switch_state_handler_table_t *) 0xb73b1720
application_state_handler = <value optimized out>
thread_id = 3085554955
env = {{__jmpbuf = {134603552, -1428248680, -1461722504, 9184, -1210273432, -1210014020}, __mask_was_saved = -1210034895, __saved_mask = {__val = {0, 3084988404, 3084937740, 3086469280, 9184, 1, 2976641592, 2833244792, 3086590960,
168036728, 3084937740, 2833244808, 3085923728, 1, 3086590960, 2833244840, 3086590960, 0, 134564192, 2833244840, 3085923728, 134564244, 3086590960, 2833244872, 3085887870, 134564240, 168036728, 3085458203, 3086590960, 2976606624,
134564192, 2833244904}}}}
sig = <value optimized out>
__func__ = "switch_core_session_run"
__PRETTY_FUNCTION__ = "switch_core_session_run"
#5 0xb7e9c765 in switch_core_session_thread (thread=0x9ada840, obj=0x95fe270) at src/switch_core_session.c:853
session = (switch_core_session_t *) 0x95fe270
event = <value optimized out>
event_str = 0x0
val = <value optimized out>
__func__ = "switch_core_session_thread"
__PRETTY_FUNCTION__ = "switch_core_session_thread"
#6 0xb7efd916 in dummy_worker (opaque=0x9ada840) at threadproc/unix/thread.c:138
No locals.
#7 0xb7e034fb in start_thread () from /lib/tls/i686/cmov/libpthread.so.0
No symbol table info available.
#8 0xb7d55e5e in clone () from /lib/tls/i686/cmov/libc.so.6
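
With a few hundred threads in a core file, checking by eye that "the backtraces on all of them look the same" gets tedious. Below is a minimal Python sketch that groups the threads in saved "thread apply all bt" output by their call-stack signature (the tuple of function names) and counts how many threads share each one. The regular expressions assume the usual GDB text layout ("Thread N (...):" headers followed by "#N ... in func (" frames), and the sample data is synthetic, with made-up thread IDs; neither comes from the original post.

```python
import re
from collections import Counter

# ASSUMPTION: standard GDB layout, e.g. "Thread 3 (Thread 0x... (LWP 103)):"
THREAD_RE = re.compile(r"^Thread (\d+) ")
# Matches "#3 0xb7ee02cd in switch_sleep (t=1000) at ..." and "#0 main () at ..."
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?([^\s(]+)\s*\(")


def group_backtraces(gdb_output: str) -> Counter:
    """Count threads per call-stack signature (a tuple of function names)."""
    signatures = Counter()
    frames = None  # frames of the thread being read; None until the first header
    for line in gdb_output.splitlines():
        if THREAD_RE.match(line):
            if frames:
                signatures[tuple(frames)] += 1
            frames = []
            continue
        match = FRAME_RE.match(line)
        if match and frames is not None:
            frames.append(match.group(1))
    if frames:
        signatures[tuple(frames)] += 1
    return signatures


# Synthetic sample for illustration only (thread IDs and the poll frame are made up).
SAMPLE = """\
Thread 3 (Thread 0xb0000003 (LWP 103)):
#0  0xb7fd5410 in __kernel_vsyscall ()
#1  0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143

Thread 2 (Thread 0xb0000002 (LWP 102)):
#0  0xb7fd5410 in __kernel_vsyscall ()
#1  0xb7d14cb6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2  0xb7ee02cd in switch_sleep (t=1000) at src/switch_time.c:143

Thread 1 (Thread 0xb0000001 (LWP 101)):
#0  0xb7fd5410 in __kernel_vsyscall ()
#1  0xb7d00000 in poll () from /lib/tls/i686/cmov/libc.so.6
"""

if __name__ == "__main__":
    for stack, count in group_backtraces(SAMPLE).most_common():
        print(f"{count:4d} thread(s): {' <- '.join(stack)}")
```

Lines that do not start with "#" (such as the locals printed by "bt full") are ignored, so the same function should also cope with "thread apply all bt full" output.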


_______________________________________________
Freeswitch-users mailing list
Freeswitch-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
brian at freeswitch.org
Guest





Posted: Thu Mar 05, 2009 6:01 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Well the rules usually state that you try SVN trunk then report a jira
if the problem persists but since you're 2000+ revs behind chances are
we already fixed this issue. Are you using bypass media?

/b

On Mar 5, 2009, at 4:38 PM, Eric Liedtke wrote:

Quote:
Greetings,

I've been using FS in production on this rev (I realize it's pretty
far
behind current) and it's been running well, save 1 issue.


mrene_lists at avgs.ca
Guest





Posted: Thu Mar 05, 2009 6:01 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Hi,

If you suspect a bug, the place to report it is JIRA. See: http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.

Also, can you upgrade to SVN trunk? A lot of fixes get committed
daily, so it's good to stay up to date.

As you seem familiar with GDB, you may symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions"
which will dump a list of uuids to session pointers.

In your JIRA, make sure you include the output of "thread apply all bt",
"list_sessions" and "show channels" (this last one is run in FS, not GDB), but
PLEASE update to SVN trunk and test again to see if it still happens.
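
One way to capture that GDB output non-interactively is to run gdb in batch mode against the binary and the core file. The Python sketch below only builds the command line, using gdb's standard -batch and -ex options; the binary and core paths are hypothetical placeholders, and "list_sessions" will only work if the FS .gdbinit macros are being loaded as described above. "show channels" is not part of this, since it is run in FS rather than in GDB.

```python
import shlex
from typing import List


def build_gdb_command(binary: str, core: str, gdb_commands: List[str]) -> List[str]:
    """Build an argv list for running gdb in batch mode on a core file."""
    argv = ["gdb", "-batch"]
    for command in gdb_commands:
        argv += ["-ex", command]  # each -ex runs one gdb command, in order
    argv += [binary, core]
    return argv


if __name__ == "__main__":
    # HYPOTHETICAL paths; substitute your own binary and core file.
    argv = build_gdb_command(
        "/usr/local/freeswitch/bin/freeswitch",
        "core.12345",
        ["thread apply all bt", "list_sessions"],
    )
    # Print the command so it can be reviewed before running it.
    print(shlex.join(argv))
```

Passing the resulting list to subprocess.run with stdout redirected to a file would give you something you can attach to the ticket as-is.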

Also, are you using proxy/bypass media or just the default?

Math

e at musinghalfwit.org
Guest





Posted: Thu Mar 05, 2009 6:25 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Yeah, I know ;) I didn't open a bug because my rev was so far behind. I
was just looking for any advice on where to poke next. Troubleshooting
this has been a fantastic introduction to some of the inner workings of
FreeSWITCH, so I was hoping to see it through and learn as I went.

To answer your question: no, we are not using bypass media.

-e

It seems fuzzy now, but I think on Thu, Mar 05, 2009 at 04:52:43PM -0600, Brian West said:
Quote:
Well the rules usually state that you try SVN trunk then report a jira
if the problem persists but since you're 2000+ revs behind chances are
we already fixed this issue. Are you using bypass media?

/b

e at musinghalfwit.org
Guest





Posted: Thu Mar 05, 2009 6:30 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Yup, as I mentioned to Brian, I didn't want to clog JIRA with a bug that's
been fixed or report against a rev that's 2k+ revs behind. I was trying to work
through it as a learning exercise. And yeah, I actually added a bunch of
stuff to the list_sessions function to spit out a variety of associated
variables for each session, looking for a pattern somewhere to clue me
in to what might be happening.

No proxy or bypass media here, just defaults.

I will keep at it, and once we update the production systems, if the
problem persists I will open a bug in JIRA with all the necessary
goodies.

Thanks
-e

It seems fuzzy now, but I think on Thu, Mar 05, 2009 at 05:55:33PM -0500, Mathieu Rene said:
Quote:
Hi,

If you suspect a bug, the place to report it is JIRA. See: http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.

Also, can you upgrade to SVN trunk? A lot of fixes get committed
daily, so it's good to stay up to date.

As you seem familiar with GDB, you may symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions"
which will dump a list of uuids to session pointers.

In your JIRA, make sure you include the output of "thread apply all bt",
"list_sessions" and "show channels" (this last one is run in FS, not GDB), but
PLEASE update to SVN trunk and test again to see if it still happens.

Also, are you using proxy/bypass media or just the default?

Math

nik.middleton at noble...
Guest





Posted: Thu Mar 05, 2009 6:45 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Well, if it's any consolation, I have a 4-day-ish-old copy of SVN and I
have around 200 of these hung calls, though after an hour or so they did
seem to clear.

That said, FS made 138,330 call attempts today, not too shabby, and
throughout, the call quality was as good as on the first one. Not sure how
to debug this one.

Version: FreeSWITCH Version 1.0.trunk (12276)

brian at freeswitch.org
Guest





Posted: Thu Mar 05, 2009 6:48 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

I would update... We fixed a few bugs related to hung calls in the
past 24 hours.

/b

On Mar 5, 2009, at 5:39 PM, Nik Middleton wrote:

Quote:
Well, if it's any consolation, I have a 4-day-ish-old copy of SVN and I
have around 200 of these hung calls, though after an hour or so they did
seem to clear.

That said, FS made 138,330 call attempts today, not too shabby, and
throughout, the call quality was as good as on the first one. Not sure how
to debug this one.

Version: FreeSWITCH Version 1.0.trunk (12276)


anthony.minessale at g...
Guest





Posted: Thu Mar 05, 2009 11:07 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

If they went away by themselves, they must not have been hung?

On Thu, Mar 5, 2009 at 5:39 PM, Nik Middleton <nik.middleton@noblesolutions.co.uk> wrote:
Quote:
Well, if it's any consolation, I have a 4-day-ish-old copy of SVN and I
have around 200 of these hung calls, though after an hour or so they did
seem to clear.

That said, FS made 138,330 call attempts today, not too shabby, and
throughout the call quality was as good as the first one. Not sure how
to debug this one.

Version: FreeSWITCH Version 1.0.trunk (12276)


-----Original Message-----
From: freeswitch-users-bounces@lists.freeswitch.org
[mailto:freeswitch-users-bounces@lists.freeswitch.org] On Behalf Of Eric Liedtke
Sent: 05 March 2009 23:23
To: freeswitch-users@lists.freeswitch.org
Subject: Re: [Freeswitch-users] Hung Channels (SVN Rev 10231)

Yup, as I mentioned to Brian, I didn't want to clog jira with a bug that's
been fixed or report against a rev 2k+ revs behind. I was trying to work
through it as a learning exercise. And yeah, I actually added a bunch of
stuff to the list_sessions function to spit out a variety of associated
variables for each session, looking for a pattern somewhere to clue me
into what might be happening.

No proxy or bypass media here, just defaults.

I will keep at it, and once we update the production systems, if the
problem persists I will open a bug in jira with all the necessary
goodies.

Thanks
-e

It seems fuzzy now but I think on Thu, Mar 05, 2009 at 05:55:33PM -0500, Mathieu Rene said:
Quote:
Hi,

If you suspect a bug, the place to report it is JIRA. See:
http://wiki.freeswitch.org/wiki/Reporting_Bugs
This gives the whole team a way of following up on issues.

Also, can you upgrade to svn trunk? A lot of fixes get committed
daily, so it's good to stay up to date.

As you seem familiar with GDB, you may symlink the .gdbinit file in
the support-d/ folder to your home directory.
This will give you some FS-specific macros such as "list_sessions",
which will dump a list of uuids to session pointers.

In your jira, make sure you include "thread apply all bt",
"list_sessions" and "show channels" (this one goes in FS), but PLEASE
update to svn trunk and test again to see if it still happens.

Also, are you using proxy/bypass media or just the default?

Math
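
The collection steps described above could be scripted roughly as follows. This is a hedged Python sketch, not an official FreeSWITCH tool: the function names (link_gdbinit, gdb_batch_command, collect_debug_info) and every path passed to them are assumptions for illustration. It relies only on gdb's standard -batch and -ex options, and on "list_sessions" being defined by the .gdbinit in support-d/ as stated in the message. "show channels" is not included because it is run at the FreeSWITCH console, not in gdb.

```python
import subprocess
from pathlib import Path
from typing import List


def link_gdbinit(source_tree: Path, home: Path) -> Path:
    """Symlink <source_tree>/support-d/.gdbinit to <home>/.gdbinit.

    An existing .gdbinit in the home directory is left untouched.
    """
    link = home / ".gdbinit"
    if not link.exists() and not link.is_symlink():
        link.symlink_to(source_tree / "support-d" / ".gdbinit")
    return link


def gdb_batch_command(binary: str, core: str) -> List[str]:
    """Build a non-interactive gdb invocation for the two gdb-side outputs."""
    return [
        "gdb", "-batch",
        "-ex", "thread apply all bt",
        "-ex", "list_sessions",  # macro from FreeSWITCH's .gdbinit, per the message above
        binary, core,
    ]


def collect_debug_info(binary: str, core: str, out_file: Path) -> None:
    """Run gdb against a core file and save its output for attaching to a report."""
    result = subprocess.run(
        gdb_batch_command(binary, core),
        capture_output=True, text=True, check=False,
    )
    out_file.write_text(result.stdout)
```

Building the command separately from running it keeps the gdb arguments easy to inspect before pointing the script at a production core file.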






--
Anthony Minessale II

FreeSWITCH http://www.freeswitch.org/
ClueCon http://www.cluecon.com/

AIM: anthm
MSN: anthony_minessale@hotmail.com
GTALK/JABBER/PAYPAL: anthony.minessale@gmail.com
IRC: irc.freenode.net #freeswitch

FreeSWITCH Developer Conference
sip:888@conference.freeswitch.org
iax:guest@conference.freeswitch.org/888
googletalk:conf+888@conference.freeswitch.org
pstn:213-799-1400
anthony.minessale at g...
Guest





Posted: Thu Mar 05, 2009 11:10 pm    Post subject: [Freeswitch-users] Hung Channels (SVN Rev 10231)

In your case you will have no choice but to update.
Please do a fresh checkout, as the build system has also changed drastically.


On Thu, Mar 5, 2009 at 5:19 PM, Eric Liedtke <e@musinghalfwit.org> wrote:
Quote:
Yeah I know ;) I didn't open a bug because my rev was so far behind. I
was just looking for any advice for where to poke next. Troubleshooting
this has been a fantastic introduction to some of the inner workings of
freeswitch, so I was hoping to see it through and learn as I went.

To answer your question: no, we are not using bypass media.

-e


It seems fuzzy now but I think on Thu, Mar 05, 2009 at 04:52:43PM -0600, Brian West said:
Quote:
Well the rules usually state that you try SVN trunk then report a jira
if the problem persists but since you're 2000+ revs behind chances are
we already fixed this issue.  Are you using bypass media?

/b

On Mar 5, 2009, at 4:38 PM, Eric Liedtke wrote:

Quote:
Greetings,

I've been using FS in production on this rev (I realize it's pretty far
behind current) and it's been running well, save 1 issue.







All times are GMT - 5 Hours