Beautiful Soup: Extracting hrefs from an HTML ordered list

I am trying to extract URLs from within an HTML ordered list using the Python module BeautifulSoup. My code returns a list of None values whose length equals the number of items in the ordered list, so I know I am in the right place in the document. What am I doing wrong?

The URL I am scraping is http://www.dailykos.com/story/2013/04/27/1203495/-GunFAIL-XV. Here are 5 of the 50 lines of the HTML list (apologies for the length):

> `<div id="body" class="article-body">
<ol>
<li><a href="http://www.wacotrib.com/news/city_of_waco/waco_police/grandfather-s-wound-is-a-gunshot-police-say/article_aeccbf93-4f81-5c3f-a304-bc91f6ba45a8.html">WACO, TX</a>, 3/18/13: Police responding to a domestic disturbance call found a man struggling to restrain his grandson, who was agitated and holding an AR-15. The cops shot grandpa. But that would totally never happen in a crowded theater.</li>
<li><a href="http://grossepointe.patch.com/articles/grosse-pointe-park-police-make-several-arrests">GROSSE POINTE PARK, MI</a>, 4/06/13: Grosse Pointe Park police arrested a 20-year-old Detroit man April 6 after he accidentally shot a 9mm handgun into the floor of a home in the 1000 block of Beaconsfield. The man was trying to make the gun safe when it discharged.</li>
<li><a href="http://ottawaherald.com/news/041613shooting">OTTAWA, KS</a>, 4/13/13: No one was injured when a “negligent” rifle shot rang out Saturday night inside a residence in the 1600 block of South Cedar Street in Ottawa. Dylan Spencer, 22, Ottawa, was arrested by Ottawa police about 7 p.m. on suspicion of unlawfully discharging an AR-15 rifle in his apartment, according to a police report. <a href="https://www.facebook.com/OttawaHerald/posts/000000000000000?comment_id=656061&amp;reply_comment_id=656512&amp;total_comments=3">The bullet</a> exited his apartment, passed through both walls of an occupied apartment and lodged into a utility pole. But of course, Dylan didn't think the gun was loaded. So it's cool.</li>
<li><a href="http://www.kobi5.com/component/zoo/item/klamath-falls-man-dead-after-a-shooting.html">KLAMATH FALLS, OR</a>, 4/13/13: An investigation into the shooting death of Lee Roy Myers, 47, has been ruled accidental. The Klamath County Major Crimes Team was called to investigate a shooting on Saturday, April 13. An autopsy concluded the cause of death was an accidental, self-inflicted handgun wound.</li>
<li><a href="http://westhampton-hamptonbays.patch.com/groups/police-and-fire/p/accidental-weapon-discharge">SOUTHAMPTON, NY</a>, 4/13/13: The report states that the detective visited the home and interviewed the man, who legally owned the Ruger 10/22 rifle. The man said he was cleaning the rifle when it accidentally discharged into his big toe. When the rifle was pointed in a downward angle, inertia caused the firing pin to strike the primer, which caused the rifle to fire, according to the incident report. The detective advised the man on safety techniques while cleaning his rifle. (Step one: unload it.)</li>`

And here is my code:

page= urllib2.urlopen(url)
soup = BeautifulSoup(page)
li=soup.select("ol > li")
for link in li:
    print (link.get('href'))

asked by user2330011, 28.04.2013


Answers (1)


You are iterating over the li elements, which have no href attribute, so get('href') returns None for each of them. The a tags inside them do have one:

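import urllib2  # Python 2 only; Python 3 moved urlopen to urllib.request
from bs4 import BeautifulSoup

url = "http://www.dailykos.com/story/2013/04/27/1203495/-GunFAIL-XV"

page = urllib2.urlopen(url).read()
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(page, "html.parser")
# Select the <a> tags that are direct children of each <li>; those carry the href.
links = soup.select("ol > li > a")
for link in links:
    print(link.get('href'))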
answered by alecxe, 28.04.2013
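
For anyone who wants to check the difference without fetching the page, here is a minimal offline sketch in Python 3. The HTML string is a trimmed, made-up stand-in for the article's list, and the example.com URLs and the variable names (HTML, li_hrefs, all_hrefs, first_hrefs) are placeholders that do not come from the original post. It also covers a case visible in the quoted HTML: the OTTAWA, KS item contains two links, so selecting "ol > li > a" returns both of them. If only the first link of each item is wanted, one possible approach is to call find("a") on each li.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the article's list; the example.com URLs are placeholders.
HTML = """
<div id="body" class="article-body">
<ol>
<li><a href="http://example.com/waco">WACO, TX</a>, 3/18/13: first item.</li>
<li><a href="http://example.com/ottawa">OTTAWA, KS</a>, 4/13/13: second item with
<a href="http://example.com/extra">a second link</a> in the text.</li>
</ol>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# <li> tags carry no href attribute, so Tag.get() returns None for each one.
li_hrefs = [li.get("href") for li in soup.select("ol > li")]

# Selecting the <a> children yields every link, including extra ones inside an item.
all_hrefs = [a.get("href") for a in soup.select("ol > li > a")]

# To keep only the first link of each item, look up the first <a> per <li>
# and skip items that contain no link at all.
first_hrefs = [
    li.find("a").get("href")
    for li in soup.select("ol > li")
    if li.find("a") is not None
]

print(li_hrefs)
print(all_hrefs)
print(first_hrefs)
```

The first list reproduces the symptom from the question (one None per list item), the second contains three URLs for two items, and the third contains exactly one URL per item.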