Sponsor: VoiceMeUp - Corporate & Wholesale VoIP Services

VoIP Mailing List Archives
Mailing list archives for the VoIP community
 SearchSearch 

[Freeswitch-users] Fwd: strange behaviour with scheduled jobs, uuid_transfer hangs up one of the legs if freeswitch is n


 
Post new topic   Reply to topic    VoIP Mailing List Archives Forum Index -> freeSWITCH Users
View previous topic :: View next topic  
Author Message
fraunhofer.lists.frees...
Guest





PostPosted: Fri Aug 07, 2009 10:38 pm    Post subject: [Freeswitch-users] Fwd: strange behaviour with scheduled job Reply with quote

Hello List, Hello *,

First of all the usual excuses: sorry for the bad english and the long
email, no native speaker and i really tried to make it shorter, but i
guess this would result in even more "check back"s than it already
does :)

we're currently running in a weird "lockup"-scenario in our loadtests.

Our setup is the following:

three freeswitch servers, let's call them A(-leg), M(aster), and
B(-leg) with the goal in mind to initiate calls on M which calls A,
play some file, bridge to B, limit call length and play (different)
prompts to A and B if they exceed that limit.

(Note that A and B work fine, regardless of the amount of load we put on them)
A and B are silly dialplan logic, accepting calls on a certain
extension after a random delay and playing moh. Before calling
playback to a localstream they call a lua script which schedules
hangup somewhere in future (which works flawlessly)

Calls are initiated on M using some hacked up loadgen-script issuing
http requests like
  originate [sofiaSyntaxToExtensionOn_A] 6000
. The 6000 extension on M has the following (xml) dialplan which
essentially does the following:
------
answer()
...playback file...
...set some callerid stuff
set bypass_media
bridge to extension 6009 on B
------
we use "execute_on_answer" on the b-leg to run a script which limits
the length of the call (doesn't matter if it's done via "export
nolocal" or "inlined" into the data part of the bridge application
"{execute_on_answer=lua ...}")

<action application="export" data="nolocal:execute_on_answer=lua
lua/schedule-hangup.lua ${uuid}" />

the lua script "schedula-hangup.lua" does essentially the following:

------
api = freeswitch.API();
local res = api:execute("sched_api", "+10 none lua
lua/c2c-hangup-timeout.lua " .. argv[1]);
------

the 10 seconds are just to speed up the time until it gets stuck.

this is where things start to go wrong. if I comment out the call to
the "schedule-hangup" script, everything works fine, even if it's
under heavy load.

c2c-hangup-timeout.lua does the following:
------------------
local sess = argv[1];
if(sess)
then
   freeswitch.consoleLog("INFO", "c2c-hangup-timeout.lua for uuid "
.. sess .. "\n");

   api = freeswitch.API();
   local stillValid = api:execute("uuid_getvar", sess .. "
Dummy-DoesChannelExists");
   if(stillValid:sub(1,4) == "-ERR")
   then
       log("session uuid " .. sess .. " disappeared (nothing bad)");
   else
       -- this is important!!! Otherwise the aleg get's just hung up!
       api:execute("uuid_media", sess);
       api:execute("uuid_transfer", sess .. " -both timeout");
   end
else -- /if(sess)
   log("called with nil session?");
end -- /if(sess)

------------------

i guess this needs some explanation:
we get the uuid of the channel as argument in argv[1]. We don't use
  local session = freeswitch.Session(uuid);
since if the channel referenced by "uuid" does not exist any longer,
freeswitch (or the lua bindings) try to interpret the uuid as an
"originate string" and can't figure out how to call that. So we use a
dummy api call to get some channel variable. If the channel does not
exist any longer (A or B already hung up), we get an error message
starting with "-ERR", otherwise the channel still exists (we get
"_unset_" as the value, if it's not set) and we continue by getting
freeswitch back in the media path (uuid_media) and then transferring
both legs to an extension called "timeout" which plays some prompt and
finally calls hangup().

If we don't do the uuid_media call, one of the legs gets hung up when
we transfer them to the extension. This looks like the following on
the console after issuing "uuid_transfer [uuid] -both timeout"
(extensions are not the same as in our loadgen example above)


--------------
2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1334 Hangup (*)
sofia/internal/1000 [CS_HIBERNATE] [BLIND_TRANSFER]
2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer
sofia/internal/1004@192.168.179.177:5060 to XML[timeout@default]
2009-07-23 19:57:19.865703 [INFO] mod_dialplan_xml.c:310 Processing
BFR1004->timeout in context default
API CALL [uuid_transfer(73812082-77b1-11de-b9f8-a10bb0eb9f69 -both
timeout)] output:
+OK

2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer (**)
sofia/internal/1000 to XML[timeout@default]
2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1084 Session
60 (sofia/internal/1000) Ended
2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1086 Close
Channel sofia/internal/1000 [CS_DESTROY]
-----------

note that it first does Hangup (denoted by *, no that's not an
asterisk Smile on extension 1000 and then tries to Transfer (**) the hung
up channel to the dial plan. this could be the same as in an earlier
post to the list "SIP re-invite / bypass_media // Phillip Jones //
Wed, 01 Jul 2009 13:30:53 -0700)"

This is why we do not directly call sched_transfer() but call a script
in between to do the uuid_media() call. I couldn't figure out how to
call that directly from the xml dialplan and/or how to check if the
channel still exists.

so... after using uuid_media(), both legs are transferred without an
(intermediate|bogus) hangup() call.

This only works fine if we've few concurrent calls. There is no magic
borderline where it starts to refuse work.

Some of the Symptoms are: traffic decreased to zero as no new channels
are successfully brought up, some of the signaling traffic is not
ACKed or OKed, scheduled jobs are not run.

if i read the output of "show channels" correctly, they're all stuck
in different applications like hangup(), some are calling lua but most
of them are in signaling_bridge(). Freeswitch is still responding on
the console and there's almost no load on the machine (no busy polling
or some other kind of running amok).

if i kill one of them using uuid_kill() or kill all of them
using"fsctl hupall" i get "Task was executed late by 866 seconds 12379
sched_api_function (none)" messages and the usual cleanup takes place.
As a quick hack i tried to schedule a uuid_kill() call 20 seconds
after the scheduling call to the lua script but that job is not
executed either.

So what am I doing wrong? Is it some deadlock where uuid_media() and
uuid_transfer()  are waiting for the other to finish?
Or some other silly simple thing i missed?

Thx in advance

 Benedikt.

_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
Back to top
pjintheusa at gmail.com
Guest





PostPosted: Sat Aug 08, 2009 5:57 am    Post subject: [Freeswitch-users] Fwd: strange behaviour with scheduled job Reply with quote

Hi there,

Not sure whether this helps but test this without set bypass_media. In
my setup I have noticed the leg A session ends when bypass_media is
true. Call/bridge continue successfully.

Phillip Jones

On Thu, Aug 6, 2009 at 1:28 PM, Benedikt
Fraunhofer<fraunhofer.lists.freeswitch-001@traced.net> wrote:
Quote:
Hello List, Hello *,

First of all the usual excuses: sorry for the bad english and the long
email, no native speaker and i really tried to make it shorter, but i
guess this would result in even more "check back"s than it already
does Smile

we're currently running in a weird "lockup"-scenario in our loadtests.

Our setup is the following:

three freeswitch servers, let's call them A(-leg), M(aster), and
B(-leg) with the goal in mind to initiate calls on M which calls A,
play some file, bridge to B, limit call length and play (different)
prompts to A and B if they exceed that limit.

(Note that A and B work fine, regardless of the amount of load we put on them)
A and B are silly dialplan logic, accepting calls on a certain
extension after a random delay and playing moh. Before calling
playback to a localstream they call a lua script which schedules
hangup somewhere in future (which works flawlessly)

Calls are initiated on M using some hacked up loadgen-script issuing
http requests like
  originate [sofiaSyntaxToExtensionOn_A] 6000
. The 6000 extension on M has the following (xml) dialplan which
essentially does the following:
------
answer()
...playback file...
...set some callerid stuff
set bypass_media
bridge to extension 6009 on B
------
we use "execute_on_answer" on the b-leg to run a script which limits
the length of the call (doesn't matter if it's done via "export
nolocal" or "inlined" into the data part of the bridge application
"{execute_on_answer=lua ...}")

<action application="export" data="nolocal:execute_on_answer=lua
lua/schedule-hangup.lua ${uuid}" />

the lua script "schedula-hangup.lua" does essentially the following:

------
api = freeswitch.API();
local res = api:execute("sched_api", "+10 none lua
lua/c2c-hangup-timeout.lua " .. argv[1]);
------

the 10 seconds are just to speed up the time until it gets stuck.

this is where things start to go wrong. if I comment out the call to
the "schedule-hangup" script, everything works fine, even if it's
under heavy load.

c2c-hangup-timeout.lua does the following:
------------------
local sess = argv[1];
if(sess)
then
   freeswitch.consoleLog("INFO", "c2c-hangup-timeout.lua for uuid "
.. sess .. "\n");

   api = freeswitch.API();
   local stillValid = api:execute("uuid_getvar", sess .. "
Dummy-DoesChannelExists");
   if(stillValid:sub(1,4) == "-ERR")
   then
       log("session uuid " .. sess .. " disappeared (nothing bad)");
   else
       -- this is important!!! Otherwise the aleg get's just hung up!
       api:execute("uuid_media", sess);
       api:execute("uuid_transfer", sess .. " -both timeout");
   end
else -- /if(sess)
   log("called with nil session?");
end -- /if(sess)

------------------

i guess this needs some explanation:
we get the uuid of the channel as argument in argv[1]. We don't use
  local session = freeswitch.Session(uuid);
since if the channel referenced by "uuid" does not exist any longer,
freeswitch (or the lua bindings) try to interpret the uuid as an
"originate string" and can't figure out how to call that. So we use a
dummy api call to get some channel variable. If the channel does not
exist any longer (A or B already hung up), we get an error message
starting with "-ERR", otherwise the channel still exists (we get
"_unset_" as the value, if it's not set) and we continue by getting
freeswitch back in the media path (uuid_media) and then transferring
both legs to an extension called "timeout" which plays some prompt and
finally calls hangup().

If we don't do the uuid_media call, one of the legs gets hung up when
we transfer them to the extension. This looks like the following on
the console after issuing "uuid_transfer [uuid] -both timeout"
(extensions are not the same as in our loadgen example above)


--------------
2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1334 Hangup (*)
sofia/internal/1000 [CS_HIBERNATE] [BLIND_TRANSFER]
2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer
sofia/internal/1004@192.168.179.177:5060 to XML[timeout@default]
2009-07-23 19:57:19.865703 [INFO] mod_dialplan_xml.c:310 Processing
BFR1004->timeout in context default
API CALL [uuid_transfer(73812082-77b1-11de-b9f8-a10bb0eb9f69 -both
timeout)] output:
+OK

2009-07-23 19:57:19.865703 [NOTICE] switch_ivr.c:1349 Transfer (**)
sofia/internal/1000 to XML[timeout@default]
2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1084 Session
60 (sofia/internal/1000) Ended
2009-07-23 19:57:19.865703 [NOTICE] switch_core_session.c:1086 Close
Channel sofia/internal/1000 [CS_DESTROY]
-----------

note that it first does Hangup (denoted by *, no that's not an
asterisk Smile on extension 1000 and then tries to Transfer (**) the hung
up channel to the dial plan. this could be the same as in an earlier
post to the list "SIP re-invite / bypass_media // Phillip Jones //
Wed, 01 Jul 2009 13:30:53 -0700)"

This is why we do not directly call sched_transfer() but call a script
in between to do the uuid_media() call. I couldn't figure out how to
call that directly from the xml dialplan and/or how to check if the
channel still exists.

so... after using uuid_media(), both legs are transferred without an
(intermediate|bogus) hangup() call.

This only works fine if we've few concurrent calls. There is no magic
borderline where it starts to refuse work.

Some of the Symptoms are: traffic decreased to zero as no new channels
are successfully brought up, some of the signaling traffic is not
ACKed or OKed, scheduled jobs are not run.

if i read the output of "show channels" correctly, they're all stuck
in different applications like hangup(), some are calling lua but most
of them are in signaling_bridge(). Freeswitch is still responding on
the console and there's almost no load on the machine (no busy polling
or some other kind of running amok).

if i kill one of them using uuid_kill() or kill all of them
using"fsctl hupall" i get "Task was executed late by 866 seconds 12379
sched_api_function (none)" messages and the usual cleanup takes place.
As a quick hack i tried to schedule a uuid_kill() call 20 seconds
after the scheduling call to the lua script but that job is not
executed either.

So what am I doing wrong? Is it some deadlock where uuid_media() and
uuid_transfer()  are waiting for the other to finish?
Or some other silly simple thing i missed?

Thx in advance

 Benedikt.

_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org


_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
Back to top
fraunhofer.lists.frees...
Guest





PostPosted: Sat Aug 08, 2009 7:02 am    Post subject: [Freeswitch-users] Fwd: strange behaviour with scheduled job Reply with quote

Hi Phillip,

2009/8/8 Phillip Jones <pjintheusa@gmail.com>:

Quote:
Not sure whether this helps but test this without set bypass_media. In
my setup I have noticed the leg A session ends when bypass_media is
true. Call/bridge continue successfully.

thx for that hint. unfortunately we can't do that due to the high
volume we anticipate
and i think i already tried that.

Note that it works in exactly this setup (with the uuid_media()-call)
if there are only very few calls handled. It just starts to refuse
work once enough jobs are scheduled and not executed.

Cheers
Benedikt

_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
Back to top
mike at jerris.com
Guest





PostPosted: Sat Aug 08, 2009 3:04 pm    Post subject: [Freeswitch-users] Fwd: strange behaviour with scheduled job Reply with quote

how many does it stop at? is it the same number each time?

Mike

On Aug 8, 2009, at 7:53 AM, Benedikt Fraunhofer wrote:

Quote:
Hi Phillip,

2009/8/8 Phillip Jones <pjintheusa@gmail.com>:

Quote:
Not sure whether this helps but test this without set bypass_media.
In
my setup I have noticed the leg A session ends when bypass_media is
true. Call/bridge continue successfully.

thx for that hint. unfortunately we can't do that due to the high
volume we anticipate
and i think i already tried that.

Note that it works in exactly this setup (with the uuid_media()-call)
if there are only very few calls handled. It just starts to refuse
work once enough jobs are scheduled and not executed.

Cheers
Benedikt


_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
Back to top
fraunhofer.lists.frees...
Guest





PostPosted: Sat Aug 08, 2009 4:30 pm    Post subject: [Freeswitch-users] Fwd: strange behaviour with scheduled job Reply with quote

Hello Mike,

2009/8/8 Michael Jerris <mike@jerris.com>:

Quote:
how many does it stop at?  is it the same number each time?

i tried to express that non-wisdom using the words
"This only works fine if we've few concurrent calls. There is no magic
borderline where it starts to refuse work."

this is surely not the best english, i admit. I just have no exact
"lower-bound" when it suddenly happens. My manual test-calls work only
in the beginning, once the loadgen is on for ~3min even the manual
test-calls do not work. Once it's locked up, it initiates calls
successfully (it rings) but (in my example) M doesn't ACK the answer
event it get's from A.

In my current setup with 100 call-legs i can repeatably make it
happen. Even better (for debugging), it does happen each time.

If someone is interested in the exact setup i can place a tarball
somewhere or - if someone could advise me with some gdb commands he
wants to have - i'll recompile with debugging symbols and get you
what you want. I'm definitely stuck here.

Thx.

Beni.

_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
Back to top
mike at jerris.com
Guest





PostPosted: Sat Aug 08, 2009 5:13 pm    Post subject: [Freeswitch-users] Fwd: strange behaviour with scheduled job Reply with quote

please open a bug on jira.freeswitch.org with the details of exactly
how to re-create this issue

Mike

On Aug 8, 2009, at 5:21 PM, Benedikt Fraunhofer wrote:

Quote:
Hello Mike,

2009/8/8 Michael Jerris <mike@jerris.com>:

Quote:
how many does it stop at? is it the same number each time?

i tried to express that non-wisdom using the words
"This only works fine if we've few concurrent calls. There is no magic
borderline where it starts to refuse work."

this is surely not the best english, i admit. I just have no exact
"lower-bound" when it suddenly happens. My manual test-calls work only
in the beginning, once the loadgen is on for ~3min even the manual
test-calls do not work. Once it's locked up, it initiates calls
successfully (it rings) but (in my example) M doesn't ACK the answer
event it get's from A.

In my current setup with 100 call-legs i can repeatably make it
happen. Even better (for debugging), it does happen each time.

If someone is interested in the exact setup i can place a tarball
somewhere or - if someone could advise me with some gdb commands he
wants to have - i'll recompile with debugging symbols and get you
what you want. I'm definitely stuck here.

Thx.

Beni.


_______________________________________________
FreeSWITCH-users mailing list
FreeSWITCH-users@lists.freeswitch.org
http://lists.freeswitch.org/mailman/listinfo/freeswitch-users
UNSUBSCRIBE:http://lists.freeswitch.org/mailman/options/freeswitch-users
http://www.freeswitch.org
Back to top
Display posts from previous:   
Post new topic   Reply to topic    VoIP Mailing List Archives Forum Index -> freeSWITCH Users All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


Powered by phpBB © 2001, 2005 phpBB Group

VoiceMeUp - Corporate & Wholesale VoIP Services