Data crawling

===Basic method===
{| class="wikitable"
|+
!Step
!Description
!How
|-
|Open the URL
|Open the document at the URL.
Note how Beautiful Soup is imported: the BeautifulSoup class comes from the bs4 package.
|<syntaxhighlight lang="python">
from bs4 import BeautifulSoup
from urllib.request import urlopen

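# url is the address of the page to open, given as a string.
# Opening it in a with statement closes the response automatically,
# so there is no need to call close() separately.
with urlopen(url) as 문서:
    ...  # commands go here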
</syntaxhighlight>
|-
|Connect to BeautifulSoup
|Pass the HTML document to a parser.
|<syntaxhighlight lang="python">
html = BeautifulSoup(문서, 'lxml')  # Use lxml as the parser; the name is passed as a string, and the lxml package must be installed.
</syntaxhighlight>
|-
|Find tags
|Write this inside the with block above, in place of the commands.
Use find_all() or find().
|<syntaxhighlight lang="python">
내용 = html.find('tag-to-find', class_='class-to-find')
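</syntaxhighlight>find() returns only the first match in the document, or None if nothing matches.
find_all() returns every match as a list.

The option is spelled class_ rather than class because class is a reserved keyword in Python.

A match is returned as the whole tag; to extract only its text, access the tag's .text attribute.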
|}
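
The steps in the table can be tried without a network connection. The following is a minimal sketch, not a definitive recipe: the sample_html string and its class names (intro, note, absent) are made up for illustration and stand in for the document returned by urlopen(), and Python's built-in 'html.parser' is swapped in for 'lxml' so that no extra parser needs to be installed.

<syntaxhighlight lang="python">
from bs4 import BeautifulSoup

# Hypothetical sample page, used here in place of a urlopen() response.
sample_html = """
<html><body>
  <p class="intro">First paragraph</p>
  <p class="intro">Second paragraph</p>
  <p class="note">A note</p>
</body></html>
"""

# Assumption: 'html.parser' (bundled with Python) instead of 'lxml'.
html = BeautifulSoup(sample_html, "html.parser")

# find() returns only the first matching tag, or None when nothing matches.
first = html.find("p", class_="intro")
missing = html.find("p", class_="absent")

# find_all() returns every matching tag in a list-like result.
every = html.find_all("p", class_="intro")

# .text strips the markup and keeps only the text inside the tag.
first_text = first.text
all_texts = [tag.text for tag in every]

print(first)       # the whole tag, markup included
print(first_text)  # text only
print(all_texts)
print(missing)
</syntaxhighlight>

Because find() can return None, it is safer to check the result before reading .text from it when the page layout is not known in advance.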
[[분류:파이썬:데이터 스크롤링]]