Split a table into several with Beautiful Soup [Python]
I need help with a problem I can't figure out.
I have an HTML table made of tr and td elements, for example:
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
</td>
</tr>
<tr>
<td colspan="2">
<br />
<h2>
Macros
</h2>
</td>
</tr>
<tr>
<td>
#define
</td>
<td>
<a class="el" href="#g3e3da223d2db3b49a9b6e3ee6f49f745">
SND_LSTINDIC
</a>
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
liste sons indication
<br />
</td>
</tr>
<tr>
<td colspan="2">
<br />
<h2>
Définition de type
</h2>
</td>
</tr>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
typedef void(*
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g73cba8bd62d629eb05495a5c1a7b2844">
f_sndChangeFunc
</a>
)(
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
i_eSound,
aBOOL
i_bStart,
aBYTE
i_byDisableModule)
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
Fonction rappel sur départ/arrêt bip.
<a href="#g73cba8bd62d629eb05495a5c1a7b2844">
</a>
<br />
</td>
</tr>
<tr>
<td colspan="2">
<br />
<h2>
Énumérations
</h2>
</td>
</tr>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
enum
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
{
}
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
identificateurs sons
<a href="group__Sound.html#g4ab7db37a42f244764583a63997489a8">
Plus de détails...
</a>
<br />
</td>
</tr>
</table>
I am trying to split this table into several tables: I would like to pull out
each title and create a new table containing the rows that follow it.
For this example, the expected result is:
<h2>
Macros
</h2>
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
</td>
</tr>
<tr>
<td colspan="2">
<br />
</td>
</tr>
<tr>
<td>
#define
</td>
<td>
<a class="el" href="#g3e3da223d2db3b49a9b6e3ee6f49f745">
SND_LSTINDIC
</a>
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
liste sons indication
<br />
</td>
</tr>
</table>
<h2>
Définition de type
</h2>
<table>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
typedef void(*
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g73cba8bd62d629eb05495a5c1a7b2844">
f_sndChangeFunc
</a>
)(
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
i_eSound,
aBOOL
i_bStart,
aBYTE
i_byDisableModule)
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
Fonction rappel sur départ/arrêt bip.
<a href="#g73cba8bd62d629eb05495a5c1a7b2844">
</a>
<br />
</td>
</tr>
</table>
<h2>
Énumérations
</h2>
<table>
<tr>
<td class="memItemLeft" nowrap="nowrap" align="right" valign="top">
enum
</td>
<td class="memItemRight" valign="bottom">
<a class="el" href="#g4ab7db37a42f244764583a63997489a8">
e_sndSound
</a>
{
}
</td>
</tr>
<tr>
<td class="mdescLeft">
</td>
<td class="mdescRight">
identificateurs sons
<a href="group__Sound.html#g4ab7db37a42f244764583a63997489a8">
Plus de détails...
</a>
<br />
</td>
</tr>
</table>
I use Python and BeautifulSoup 3 to parse my HTML. My first attempt was
this:
from BeautifulSoup import BeautifulSoup

# allHtml is a string holding the whole HTML page
soup = BeautifulSoup(allHtml)
for table in soup.findAll("table"):
    h2s = table.findAll("h2")
    if h2s:
        FirstH2 = True
        for h2 in h2s:
            h2.parent.replaceWithChildren()  # <td> deleted
            h2.parent.replaceWithChildren()  # <tr> deleted
            if FirstH2:
                h2.replaceWith(h2.prettify() + '<table>')
                # other method to add tags:
                # h2_tag_idx = h2.parent.contents.index(h2)
                # h2.parent.insert(h2_tag_idx + 1, '<b>OK</b>')
            else:
                h2.replaceWith('</table>' + h2.prettify() + '<table>')
            FirstH2 = False
print soup.prettify()
But it doesn't work: replaceWith inserts my tags as escaped text (HTML entities such as &lt;table&gt;) instead of as real tags.
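This is how Beautiful Soup is designed to behave: it edits a parse tree, not the raw text, so a string passed to replaceWith becomes a text node and its < and > are escaped when the tree is written out. For the same reason the tree cannot hold a lone opening "<table>" waiting for a later "</table>"; each new table has to be created as a complete element. A minimal illustration, assuming the current bs4 package (where the method is spelled replace_with) and Python's built-in html.parser; the two function names are hypothetical and only for illustration:

```python
from bs4 import BeautifulSoup


def replace_with_string(html):
    """Replace the first <h2> with the text "<table>" (the approach tried above)."""
    soup = BeautifulSoup(html, "html.parser")
    # A plain string becomes a NavigableString (a text node), not markup,
    # so its angle brackets are escaped when the tree is serialized.
    soup.h2.replace_with("<table>")
    return str(soup)


def insert_real_tag(html):
    """Insert an actual, empty <table> element right after the first <h2>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.new_tag("table")  # a Tag object, serialized as markup
    soup.h2.insert_after(table)
    return str(soup)


sample = "<div><h2>Macros</h2></div>"
print(replace_with_string(sample))  # <div>&lt;table&gt;</div>
print(insert_real_tag(sample))      # <div><h2>Macros</h2><table></table></div>
```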
I also tried collecting all the contents of the table, rebuilding several
tables from them and putting those back into the soup, but that failed too.
I also tried converting the table to a string, splitting it on a
delimiter and putting each sub-table back into the soup, but that failed as well.
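One way to make the rebuilding idea work is to stay inside the tree: create each new table as a real element and move the existing rows into it, instead of writing opening and closing tags as text. The sketch below assumes bs4 with Python's built-in html.parser; split_table is a hypothetical helper name, and it treats any row whose cell contains an <h2> as a title row. It differs slightly from the expected output above: title rows are dropped completely (including their leftover <br /> cell), rows before the first title are kept at the top of the first section's table, and the original table's attributes are copied onto every new table.

```python
from bs4 import BeautifulSoup


def split_table(soup, table):
    """Replace `table` with one <h2> + <table> pair per title row (hypothetical helper)."""
    # Only this table's own rows: skips rows of nested tables and works
    # whether or not the parser added a <tbody>.
    rows = [tr for tr in table.find_all("tr") if tr.find_parent("table") is table]

    leading = []   # rows seen before the first title row
    sections = []  # (h2, rows) pairs in document order
    for tr in rows:
        h2 = tr.find("h2")
        if h2 is not None:
            sections.append((h2.extract(), []))  # detach the title from its cell
        elif sections:
            sections[-1][1].append(tr)
        else:
            leading.append(tr)

    if not sections:
        return  # no titles: leave the table untouched
    sections[0][1][:0] = leading  # keep leading rows with the first section

    anchor = table
    for h2, section_rows in sections:
        new_table = soup.new_tag("table")
        new_table.attrs = dict(table.attrs)  # copy border, cellpadding, ...
        for tr in section_rows:
            new_table.append(tr.extract())   # move the row out of the old table
        anchor.insert_after(h2)
        h2.insert_after(new_table)
        anchor = new_table

    table.decompose()  # only the emptied title rows are left in it


html = (
    '<table border="0">'
    '<tr><td>x</td></tr>'
    '<tr><td colspan="2"><h2>Macros</h2></td></tr>'
    '<tr><td>#define</td><td>SND_LSTINDIC</td></tr>'
    '<tr><td colspan="2"><h2>Enums</h2></td></tr>'
    '<tr><td>enum</td></tr>'
    '</table>'
)
soup = BeautifulSoup(html, "html.parser")
for table in soup.find_all("table"):
    split_table(soup, table)
print(soup)
```

Because find_all returns a list that is built before the loop starts, the newly created tables are not split a second time.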
If anyone has an idea, I'd be very grateful!
Thanks in advance!