[asterisk-users] wct4xxp Excessive Interrupts Resulting in Unusable System or Card


 
scott.lykens at kmmsin...
Posted: Sun Jun 01, 2014 7:42 am

Hello all-

I have a Digium TE410P in an HP DL145 G2 dual-processor server. The card generates well over 100,000 interrupts per second (I have sometimes counted 160,000+ per second). This generally results in either the system becoming swamped and unusable, or the kernel disabling the IRQ the TE410P is on, which leaves the spans on that card unusable.
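For anyone wanting to reproduce the per-second figures, here is a minimal Python sketch that samples /proc/interrupts twice and reports the difference per IRQ. It is not the method used for the numbers above; the function names (`parse_interrupts`, `irq_rates`, `sample_rates`) are made up for illustration, and it assumes the usual layout of one count column per CPU after the IRQ label.

```python
import time


def parse_interrupts(text):
    """Return {irq_label: total count across all CPUs} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = []
        # Take the leading numeric columns (rows such as ERR/MIS have only one).
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            counts.append(int(field))
        totals[label.strip()] = sum(counts)
    return totals


def irq_rates(before, after, seconds):
    """Interrupts per second for every IRQ present in both samples."""
    return {irq: (after[irq] - before[irq]) / seconds
            for irq in after if irq in before}


def sample_rates(interval=1.0, path="/proc/interrupts"):
    """Read the file twice, `interval` seconds apart, and return per-IRQ rates."""
    with open(path) as f:
        before = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_interrupts(f.read())
    return irq_rates(before, after, interval)
```

Calling `sample_rates(1.0)` on the affected machine and looking up the card's IRQ label in the result should give a number comparable to the counts quoted above, as long as the kernel has not yet disabled the IRQ.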


I have confirmed that the card is good by placing it in an IBM server running FreePBX Distro, where it generates only 1,000 interrupts per second and works properly.


The system runs 64-bit Ubuntu 14.04 LTS with kernels 3.13.0-27-generic and 3.13.0-27-lowlatency. I have compiled and installed DAHDI from source, both 2.9.1.1 and 2.8.0, and I see the same result with the Ubuntu DAHDI package, which is based on 2.5.0. In the BIOS, I have disabled every extra device I can and reset the configuration data.


Most frequently the kernel disables the interrupt. Booting with the irqpoll option, as the error message suggests, does not always solve the problem and introduces other problems. See dmesg below:


(the "Not prepped yet!" message repeats *many* times)
[ 16.371739] wct4xxp 0000:81:01.0: Not prepped yet!
[ 16.371743] wct4xxp 0000:81:01.0: Not prepped yet!
[ 16.611991] irq 25: nobody cared (try booting with the "irqpoll" option)
[ 16.615221] CPU: 0 PID: 0 Comm: swapper/0 Tainted: GF O 3.13.0-27-generic #50-Ubuntu
[ 16.615224] Hardware name: HP ProLiant DL145 G2/K85NL, BIOS 2.14 10/20/2005
[ 16.615227] ffff880139ea6a9c ffff88013bc03e68 ffffffff817199c4 ffff880139ea6a00
[ 16.615231] ffff88013bc03e90 ffffffff810c19d2 ffff880139ea6a00 0000000000000019
[ 16.615235] 0000000000000000 ffff88013bc03ed0 ffffffff810c1e6c 000000008101b763
[ 16.615239] Call Trace:
[ 16.615241] <IRQ> [<ffffffff817199c4>] dump_stack+0x45/0x56
[ 16.615253] [<ffffffff810c19d2>] __report_bad_irq+0x32/0xd0
[ 16.615257] [<ffffffff810c1e6c>] note_interrupt+0x1ac/0x200
[ 16.615260] [<ffffffff810bf749>] handle_irq_event_percpu+0xd9/0x1d0
[ 16.615263] [<ffffffff810bf87d>] handle_irq_event+0x3d/0x60
[ 16.615267] [<ffffffff810c29ea>] handle_fasteoi_irq+0x5a/0x100
[ 16.615272] [<ffffffff81015cde>] handle_irq+0x1e/0x30
[ 16.615276] [<ffffffff8172c6cd>] do_IRQ+0x4d/0xc0
[ 16.615281] [<ffffffff81721e6d>] common_interrupt+0x6d/0x6d
[ 16.615283] <EOI> [<ffffffff810d63c1>] ? tick_nohz_idle_enter+0x41/0x70
[ 16.615289] [<ffffffff810d63bd>] ? tick_nohz_idle_enter+0x3d/0x70
[ 16.615292] [<ffffffff810beb48>] cpu_startup_entry+0x88/0x290
[ 16.615297] [<ffffffff81707e97>] rest_init+0x77/0x80
[ 16.615302] [<ffffffff81d35f70>] start_kernel+0x438/0x443
[ 16.615305] [<ffffffff81d35941>] ? repair_env_string+0x5c/0x5c
[ 16.615308] [<ffffffff81d35120>] ? early_idt_handlers+0x120/0x120
[ 16.615312] [<ffffffff81d355ee>] x86_64_start_reservations+0x2a/0x2c
[ 16.615315] [<ffffffff81d35733>] x86_64_start_kernel+0x143/0x152
[ 16.615317] handlers:
[ 16.615987] [<ffffffffa01d3420>] t4_interrupt_gen2 [wct4xxp]
[ 16.615987] Disabling IRQ #25
[ 17.607238] dahdi_echocan_mg2: Registered echo canceler 'MG2'
[ 17.608276] wct4xxp 0000:81:01.0: Span 1 configured for ESF/B8ZS
[ 17.608360] wct4xxp 0000:81:01.0: SPAN 1: Primary Sync Source
[ 17.708056] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 17.708065] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 17.736138] wct4xxp 0000:81:01.0: Span 2 configured for ESF/B8ZS
[ 17.808065] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 17.808073] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 17.864134] wct4xxp 0000:81:01.0: Span 3 configured for ESF/B8ZS
[ 17.908049] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 17.908058] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 17.992139] wct4xxp 0000:81:01.0: Span 4 configured for ESF/B8ZS
[ 18.008106] wct4xxp 0000:81:01.0: RCLK source set to span 1
[ 18.008114] wct4xxp 0000:81:01.0: Recovered timing mode, RCLK set to span 1
[ 20.208172] wct4xxp 0000:81:01.0: Setting yellow alarm span 1
[ 20.208212] wct4xxp 0000:81:01.0: RCLK source set to span 2
[ 20.208216] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 2
[ 20.308149] wct4xxp 0000:81:01.0: Setting yellow alarm span 2
[ 20.308180] wct4xxp 0000:81:01.0: RCLK source set to span 3
[ 20.308184] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 3
[ 20.408173] wct4xxp 0000:81:01.0: Setting yellow alarm span 3
[ 20.408200] wct4xxp 0000:81:01.0: RCLK source set to span 4
[ 20.408204] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 4
[ 25.601523] wct4xxp 0000:81:01.0: Span 1 configured for ESF/B8ZS
[ 25.601587] wct4xxp 0000:81:01.0: SPAN 1: Primary Sync Source
[ 25.601673] wct4xxp 0000:81:01.0: Span 4 configured for ESF/B8ZS
[ 25.608209] wct4xxp 0000:81:01.0: RCLK source set to span 4
[ 25.608215] wct4xxp 0000:81:01.0: System timing mode, RCLK set to span 4
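To pull the relevant facts out of a long dmesg capture like the one above, a small Python sketch along these lines may help. It only looks for the message formats visible in this log ("irq N: nobody cared", the "handlers:" list, and "Disabling IRQ #N"); the function name `summarize_disabled_irqs` is hypothetical, and other kernel versions may word or lay out these lines differently.

```python
import re

NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
DISABLING = re.compile(r"Disabling IRQ #(\d+)")
# Matches e.g. "[<ffffffffa01d3420>] t4_interrupt_gen2 [wct4xxp]"
HANDLER = re.compile(r"\[<[0-9a-f]+>\]\s+(\S+)\s+\[(\S+)\]\s*$")


def summarize_disabled_irqs(dmesg_text):
    """Map each reported IRQ to its registered handlers and disabled state."""
    result = {}
    current = None       # IRQ number of the report currently being read
    in_handlers = False  # True once the "handlers:" line has been seen
    for line in dmesg_text.splitlines():
        m = NOBODY_CARED.search(line)
        if m:
            current = int(m.group(1))
            result.setdefault(current, {"handlers": [], "disabled": False})
            in_handlers = False
            continue
        if current is None:
            continue  # not inside a "nobody cared" report
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        m = DISABLING.search(line)
        if m:
            irq = int(m.group(1))
            result.setdefault(irq, {"handlers": [], "disabled": False})
            result[irq]["disabled"] = True
            current, in_handlers = None, False
            continue
        if in_handlers:
            m = HANDLER.search(line)
            if m:
                # (function name, module name)
                result[current]["handlers"].append((m.group(1), m.group(2)))
    return result
```

Fed the text of the log above, this should report IRQ 25 as disabled with `t4_interrupt_gen2` from the `wct4xxp` module as its only handler.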



Checking /proc/interrupts shows that the card generated 100,000 interrupts without being serviced before the kernel disabled its IRQ. It also shows that the card is apparently on its own IRQ:


maintenance@sip:~$ cat /proc/interrupts
CPU0 CPU1
0: 46 0 IO-APIC-edge timer
1: 10 0 IO-APIC-edge i8042
7: 1 0 IO-APIC-edge
8: 0 0 IO-APIC-edge rtc0
9: 0 0 IO-APIC-fasteoi acpi
12: 4 0 IO-APIC-edge i8042
14: 0 0 IO-APIC-edge pata_amd
15: 0 0 IO-APIC-edge pata_amd
16: 304 0 IO-APIC-fasteoi nouveau
19: 1221 0 IO-APIC-fasteoi eth1
21: 8681 0 IO-APIC-fasteoi sata_nv
22: 0 0 IO-APIC-fasteoi ehci_hcd:usb1
23: 0 0 IO-APIC-fasteoi ohci_hcd:usb2
25: 100000 1 IO-APIC-fasteoi wct4xxp
NMI: 1 1 Non-maskable interrupts
LOC: 17884 19728 Local timer interrupts
SPU: 0 0 Spurious interrupts
PMI: 1 1 Performance monitoring interrupts
IWI: 1554 815 IRQ work interrupts
RTR: 0 0 APIC ICR read retries
RES: 6566 8577 Rescheduling interrupts
CAL: 220 4521 Function call interrupts
TLB: 638 504 TLB shootdowns
TRM: 0 0 Thermal event interrupts
THR: 0 0 Threshold APIC interrupts
MCE: 0 0 Machine check exceptions
MCP: 1 1 Machine check polls
ERR: 1
MIS: 0
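Whether an IRQ is shared can also be checked programmatically. The sketch below is a hedged illustration, not an established tool: it assumes the column layout shown in the output above (one count per CPU, then a single chip/type column such as `IO-APIC-fasteoi`, then a comma-separated handler list). Newer kernels print the chip and trigger type in separate columns, so the slice would need adjusting there. The names `irq_handlers` and `shared_irqs` are invented for this example.

```python
def irq_handlers(text):
    """Return {irq_number: (total_count, [handler names])} for numbered IRQs."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if not label.isdigit():
            continue  # skip summary rows such as NMI, LOC, ERR, MIS
        fields = rest.split()
        counts = [int(x) for x in fields[:ncpus]]
        # Assumption: exactly one chip/type column follows the counts.
        names = " ".join(fields[ncpus + 1:])
        handlers = [n.strip() for n in names.split(",") if n.strip()]
        table[int(label)] = (sum(counts), handlers)
    return table


def shared_irqs(table):
    """Return only the IRQs that have more than one registered handler."""
    return {irq: names for irq, (_, names) in table.items() if len(names) > 1}
```

On the output above, `shared_irqs(irq_handlers(text))` should come back empty, which matches the observation that wct4xxp is alone on IRQ 25.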



Any ideas on how I can further diagnose and pursue this? Google does not reveal much that is useful about this issue.


Thank you!

--
Scott L. Lykens
Keystone Medical Management Solutions, Inc.
+1 814 325-7500 x501 -- www.kmmsinc.com

webaccounts173 at jgoe...
Posted: Sun Jun 01, 2014 7:58 am

Just to be sure, what's the output of "vmstat 10 10"?

jg

--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --
New to Asterisk? Join us for a live introductory webinar every Thurs:
http://www.asterisk.org/hello

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-users

scott.lykens at kmmsin...
Posted: Sun Jun 01, 2014 9:27 am

Quote:
Just to be sure, what's the output of "vmstat 10 10"?


This is from within a minute or so of the system starting. Keep in mind that the TE410P’s IRQ is disabled, so the sys value does not represent actual use with the IRQ enabled.


maintenance@sip:~$ vmstat 10 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 7714300 42712 176292 0 0 458 37 369 369 1 3 91 4 0
0 0 0 7714336 42720 176324 0 0 0 4 194 396 0 0 99 0 0
0 0 0 7714676 42720 176324 0 0 0 5 197 397 0 0 100 0 0
0 0 0 7714732 42736 176324 0 0 0 8 216 443 0 0 99 0 0
0 0 0 7714736 42744 176324 0 0 0 2 195 395 0 0 99 0 0
0 0 0 7714736 42744 176324 0 0 0 0 200 420 0 0 99 0 0
0 0 0 7714712 42752 176324 0 0 0 4 205 414 0 0 99 0 0
0 0 0 7714760 42804 176324 0 0 0 23 216 430 0 0 98 2 0
0 0 0 7714756 42812 176324 0 0 0 4 201 409 0 0 99 0 0



Thank you.

--
Scott L. Lykens
Keystone Medical Management Solutions, Inc.
+1 814 325-7500 x501 -- www.kmmsinc.com

webaccounts173 at jgoe...
Posted: Sun Jun 01, 2014 10:01 am

Yes, I can see this. Another thing to check would be to start from a different OS (e.g. from a USB stick) and see how the card behaves on the otherwise same hardware.

Since your ProLiant G2 server is almost 10 years old, and the TE410P works with 3.3V only (http://www.digium.com/en/products/telephony-cards/digital/quad-span), it might be worth checking this.

jg


scott.lykens at kmmsin...
Posted: Sun Jun 01, 2014 10:53 am

On Jun 1, 2014, at 11:01 AM, jg <webaccounts173@jgoettgens.de> wrote:

Quote:
Yes, I can see this. Another thing to check would be to start from a different OS (e.g. from a USB stick) and see how the card behaves on the otherwise same hardware.

Since your ProLiant G2 server is almost 10 years old, and the TE410P works with 3.3V only (http://www.digium.com/en/products/telephony-cards/digital/quad-span), it might be worth checking this.

The server is equipped with a 3.3V PCI-X slot (https://h10057.www1.hp.com/ecomcat/hpcatalog/specs/provisioner/05/411095-421.htm).

It is an old server, but it has worked just fine hosting Asterisk for some time, and I prefer not to spend $2,000+ to replace both the server and the PCI card with more modern hardware. Admittedly, the TE410P was added to the equation only in the last several months, but only in the last few weeks has this become a problem to the point of affecting use. In fact, I was on a call Thursday morning for about an hour that was entirely SIP, but during that time the system started blocking and other users could no longer make calls, even though my call was unaffected.

The server is equipped with an AMD 8132 PCI-X bridge, which apparently is known for being difficult with regard to interrupts. Google reveals that a few drivers have workarounds related to this chipset and to a range of revisions that mine happens to fall into.
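For others wondering whether their machine has the same bridge, one rough way to check is to search the `lspci -nn` output for the chip name. The Python sketch below assumes pciutils is installed and that the device description contains the string "AMD-8132"; that string is my assumption about how the PCI ID database names the part, so adjust `needle` if your lspci prints something different. The function names are made up for this example.

```python
import subprocess


def find_devices(lspci_text, needle="AMD-8132"):
    """Return the lspci lines whose text mentions `needle` (case-insensitive)."""
    needle = needle.lower()
    return [line for line in lspci_text.splitlines() if needle in line.lower()]


def lspci_output():
    """Run `lspci -nn` and return its output; requires pciutils."""
    return subprocess.run(["lspci", "-nn"], capture_output=True,
                          text=True, check=True).stdout
```

`find_devices(lspci_output())` returns an empty list when no matching device is present. The revision, if lspci prints one, appears at the end of each returned line and can be compared against the revision ranges mentioned in driver workarounds.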

I will build a live-CD-based USB key later today and test the hardware independently of its present OS.

Thank you.

Scott
All times are GMT - 5 Hours