오답노트 (Wrong Answer Notes)
[MeCab] Morphological Analysis
MeCab
MeCab is a morphological analyzer: it splits text into morphemes and tags each one by looking it up in a dictionary (vocabulary).
In [ ]:
!git clone https://github.com/SOMJANG/Mecab-ko-for-Google-Colab.git
Cloning into 'Mecab-ko-for-Google-Colab'...
remote: Enumerating objects: 115, done.
remote: Counting objects: 100% (24/24), done.
remote: Compressing objects: 100% (20/20), done.
remote: Total 115 (delta 11), reused 10 (delta 3), pack-reused 91
Receiving objects: 100% (115/115), 1.27 MiB | 9.43 MiB/s, done.
Resolving deltas: 100% (50/50), done.
In [ ]:
!pwd
%cd Mecab-ko-for-Google-Colab
/content
/content/Mecab-ko-for-Google-Colab
In [ ]:
!bash install_mecab-ko_on_colab_light_220429.sh
Installing konlpy.....
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting konlpy
  Downloading konlpy-0.6.0-py2.py3-none-any.whl (19.4 MB)
Requirement already satisfied: numpy>=1.6 in /usr/local/lib/python3.7/dist-packages (from konlpy) (1.21.6)
Collecting JPype1>=0.7.0
  Downloading JPype1-1.4.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (453 kB)
Requirement already satisfied: lxml>=4.1.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (4.9.1)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from JPype1>=0.7.0->konlpy) (4.1.1)
Installing collected packages: JPype1, konlpy
Successfully installed JPype1-1.4.0 konlpy-0.6.0
Done
Installing mecab-0.996-ko-0.9.2.tar.gz.....
Downloading mecab-0.996-ko-0.9.2.tar.gz.......
from https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz
--2022-10-03 14:23:03--  https://bitbucket.org/eunjeon/mecab-ko/downloads/mecab-0.996-ko-0.9.2.tar.gz
HTTP request sent, awaiting response... 302 Found
[...]
HTTP request sent, awaiting response... 200 OK
Length: 1414979 (1.3M) [application/x-tar]
Saving to: ‘mecab-0.996-ko-0.9.2.tar.gz’
2022-10-03 14:23:04 (7.58 MB/s) - ‘mecab-0.996-ko-0.9.2.tar.gz’ saved [1414979/1414979]
Done
Unpacking mecab-0.996-ko-0.9.2.tar.gz.......
Done
Change Directory to mecab-0.996-ko-0.9.2.......
installing mecab-0.996-ko-0.9.2.tar.gz........
configure
make
make check
make install
ldconfig
Done
Change Directory to /content
Downloading mecab-ko-dic-2.1.1-20180720.tar.gz.......
from https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
--2022-10-03 14:24:40--  https://bitbucket.org/eunjeon/mecab-ko-dic/downloads/mecab-ko-dic-2.1.1-20180720.tar.gz
HTTP request sent, awaiting response... 302 Found
[...]
HTTP request sent, awaiting response... 200 OK
Length: 49775061 (47M) [application/x-tar]
Saving to: ‘mecab-ko-dic-2.1.1-20180720.tar.gz’
2022-10-03 14:24:41 (65.6 MB/s) - ‘mecab-ko-dic-2.1.1-20180720.tar.gz’ saved [49775061/49775061]
Done
Unpacking mecab-ko-dic-2.1.1-20180720.tar.gz.......
Done
Change Directory to mecab-ko-dic-2.1.1-20180720
Done
installing........
configure
make
make install
bash <(curl -s https://raw.githubusercontent.com/konlpy/konlpy/v0.6.0/scripts/mecab.sh)
https://github.com/konlpy/konlpy/issues/395#issue-1099168405 - 2022.01.11
Done
Install mecab-python
Successfully Installed
Now you can use Mecab
from konlpy.tag import Mecab
mecab = Mecab()
How to add a user dictionary: https://bit.ly/3k0ZH53
If "NameError: name 'Tagger' is not defined" occurs, restart the runtime.
Thanks to tana for posting the fix on their blog.
light version written by Dogdriip ( https://github.com/Dogdriip )
Thanks to combacsa for solving the problem.
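The closing message of the install script points to the KoNLPy wrapper (`from konlpy.tag import Mecab`), while the rest of this post uses the raw `MeCab` binding. A minimal sketch of the wrapper route, assuming the installation above succeeded. `analyze` is a hypothetical helper name; `morphs`, `pos`, and `nouns` are the KoNLPy tagger methods. The function returns `None` when KoNLPy or MeCab is unavailable, so it can be run safely outside Colab.

```python
def analyze(sentence):
    """Return morphemes, POS pairs, and nouns via KoNLPy's Mecab wrapper.

    Returns None if konlpy or the underlying MeCab install is missing.
    """
    try:
        from konlpy.tag import Mecab
    except ImportError:
        return None
    try:
        mecab = Mecab()  # loads the default mecab-ko-dic dictionary
    except Exception:
        # KoNLPy raises a generic exception when MeCab itself is not installed
        return None
    return {
        "morphs": mecab.morphs(sentence),  # list of morpheme strings
        "pos": mecab.pos(sentence),        # list of (morpheme, tag) pairs
        "nouns": mecab.nouns(sentence),    # list of nouns only
    }


result = analyze("이제 Colab에서 Mecab-ko-dic 사용이 가능합니다.")
print(result)
```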
In [ ]:
import MeCab
In [ ]:
tagger = MeCab.Tagger()
In [ ]:
# Test sentence: "You can now use Mecab-ko-dic on Colab."
sentence = '이제 Colab에서 Mecab-ko-dic 사용이 가능합니다.'
print(tagger.parse(sentence))
이제	MAG,성분부사|시간부사,F,이제,*,*,*,*
Colab	SL,*,*,*,*,*,*,*
에서	JKB,*,F,에서,*,*,*,*
Mecab	SL,*,*,*,*,*,*,*
-	SY,*,*,*,*,*,*,*
ko	SL,*,*,*,*,*,*,*
-	SY,*,*,*,*,*,*,*
dic	SL,*,*,*,*,*,*,*
사용	NNG,행위,T,사용,*,*,*,*
이	JKS,*,F,이,*,*,*,*
가능	NNG,정적사태,T,가능,*,*,*,*
합니다	XSA+EF,*,F,합니다,Inflect,XSA,EF,하/XSA/*+ᄇ니다/EF/*
.	SF,*,*,*,*,*,*,*
EOS
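The raw string returned by `tagger.parse` is awkward to work with directly. A small sketch of turning it into `(surface, tag)` pairs, assuming the default output format of one token per line with a tab between the surface form and the comma-separated features, ending with `EOS`. `parse_mecab_output` is a hypothetical helper name, not part of the MeCab binding.

```python
def parse_mecab_output(raw):
    """Convert MeCab's default text output into (surface, POS tag) pairs."""
    tokens = []
    for line in raw.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, features = line.partition("\t")
        tag = features.split(",")[0]  # the first feature is the POS tag
        tokens.append((surface, tag))
    return tokens


# Two lines taken from the output above, plus the EOS marker.
sample = (
    "이제\tMAG,성분부사|시간부사,F,이제,*,*,*,*\n"
    "Colab\tSL,*,*,*,*,*,*,*\n"
    "EOS\n"
)
print(parse_mecab_output(sample))  # [('이제', 'MAG'), ('Colab', 'SL')]
```

In the notebook this would be called as `parse_mecab_output(tagger.parse(sentence))`.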
Adding a MeCab User Dictionary
In [ ]:
# User dictionary location
%cd /content/mecab-ko-dic-2.1.1-20180720/
/content/mecab-ko-dic-2.1.1-20180720
In [ ]:
# Files in the user dictionary directory
!ls user-dic/
nnp.csv person.csv place.csv README.md
In [ ]:
# Check the dictionary contents
!cat user-dic/nnp.csv
대우,,,,NNP,*,F,대우,*,*,*,*,*
구글,,,,NNP,*,T,구글,*,*,*,*,*
In [ ]:
# Append an entry to the dictionary
!echo "Mecab-ko-dic,,,,NNP,*,T,Mecab-ko-dic,*,*,*,*,*" >> user-dic/nnp.csv
!cat user-dic/nnp.csv
대우,,,,NNP,*,F,대우,*,*,*,*,*
구글,,,,NNP,*,T,구글,*,*,*,*,*
Mecab-ko-dic,,,,NNP,*,T,Mecab-ko-dic,*,*,*,*,*
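Each line in `nnp.csv` follows the same 13-field pattern, and the only field that varies besides the word itself is the `T`/`F` flag. As far as I understand the mecab-ko-dic format, that flag records whether the last syllable ends in a final consonant (대우 is `F`, 구글 is `T`). The sketch below builds such a line; `make_nnp_entry` and `ends_with_final_consonant` are hypothetical helper names, and the flag is computed only for Hangul endings, so words ending in other scripts (such as `Mecab-ko-dic`) need it passed explicitly.

```python
def ends_with_final_consonant(word):
    """True if the last Hangul syllable of `word` has a final consonant."""
    offset = ord(word[-1]) - 0xAC00  # position within the Hangul syllable block
    if not 0 <= offset < 11172:
        raise ValueError("last character is not a Hangul syllable; pass the flag explicitly")
    return offset % 28 != 0  # 28 possible finals per syllable; index 0 means none


def make_nnp_entry(surface, final_consonant=None):
    """Build a proper-noun (NNP) line in the layout used by user-dic/nnp.csv."""
    if final_consonant is None:
        final_consonant = ends_with_final_consonant(surface)
    flag = "T" if final_consonant else "F"
    # The empty fields are left blank exactly as in the existing entries.
    return f"{surface},,,,NNP,*,{flag},{surface},*,*,*,*,*"


print(make_nnp_entry("대우"))  # 대우,,,,NNP,*,F,대우,*,*,*,*,*
print(make_nnp_entry("구글"))  # 구글,,,,NNP,*,T,구글,*,*,*,*,*
print(make_nnp_entry("Mecab-ko-dic", final_consonant=True))
```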
In [ ]:
!bash ./tools/add-userdic.sh
generating userdic...
nnp.csv
/content/mecab-ko-dic-2.1.1-20180720/tools/../model.def is not a binary model. reopen it as text mode...
reading /content/mecab-ko-dic-2.1.1-20180720/tools/../user-dic/nnp.csv ... done!
person.csv
/content/mecab-ko-dic-2.1.1-20180720/tools/../model.def is not a binary model. reopen it as text mode...
reading /content/mecab-ko-dic-2.1.1-20180720/tools/../user-dic/person.csv ... done!
place.csv
/content/mecab-ko-dic-2.1.1-20180720/tools/../model.def is not a binary model. reopen it as text mode...
reading /content/mecab-ko-dic-2.1.1-20180720/tools/../user-dic/place.csv ... done!
test -z "model.bin matrix.bin char.bin sys.dic unk.dic" || rm -f model.bin matrix.bin char.bin sys.dic unk.dic
/usr/local/libexec/mecab/mecab-dict-index -d . -o . -f UTF-8 -t UTF-8
reading ./unk.def ... 13
emitting double-array: 100% |###########################################|
reading ./NP.csv ... 342
reading ./EF.csv ... 1820
reading ./XSN.csv ... 124
reading ./Foreign.csv ... 11690
reading ./Place-station.csv ... 1145
reading ./user-place.csv ... 2
reading ./Person-actor.csv ... 99230
reading ./NNG.csv ... 208524
reading ./J.csv ... 416
reading ./MM.csv ... 453
reading ./VV.csv ... 7331
reading ./VX.csv ... 125
reading ./XPN.csv ... 83
reading ./Hanja.csv ... 125750
reading ./VA.csv ... 2360
reading ./NorthKorea.csv ... 3
reading ./user-person.csv ... 1
reading ./EP.csv ... 51
reading ./Wikipedia.csv ... 36762
reading ./NNBC.csv ... 677
reading ./XSV.csv ... 23
reading ./MAG.csv ... 14242
reading ./Person.csv ... 196459
reading ./NR.csv ... 482
reading ./XSA.csv ... 19
reading ./NNB.csv ... 140
reading ./ETM.csv ... 133
reading ./Inflect.csv ... 44820
reading ./CoinedWord.csv ... 148
reading ./VCP.csv ... 9
reading ./IC.csv ... 1305
reading ./XR.csv ... 3637
reading ./VCN.csv ... 7
reading ./Preanalysis.csv ... 5
reading ./NNP.csv ... 2371
reading ./ETN.csv ... 14
reading ./Symbol.csv ... 16
reading ./Group.csv ... 3176
reading ./MAJ.csv ... 240
reading ./Place.csv ... 30303
reading ./Place-address.csv ... 19301
reading ./user-nnp.csv ... 3
reading ./EC.csv ... 2547
emitting double-array: 100% |###########################################|
reading ./matrix.def ... 3822x2693
emitting matrix : 100% |###########################################|
done!
echo To enable dictionary, rewrite /usr/local/etc/mecabrc as \"dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic\"
To enable dictionary, rewrite /usr/local/etc/mecabrc as "dicdir = /usr/local/lib/mecab/dic/mecab-ko-dic"
In [ ]:
# Install the rebuilt dictionary
!sudo make install
make[1]: Entering directory '/content/mecab-ko-dic-2.1.1-20180720'
make[1]: Nothing to be done for 'install-exec-am'.
 /bin/mkdir -p '/usr/local/lib/mecab/dic/mecab-ko-dic'
 /usr/bin/install -c -m 644 model.bin matrix.bin char.bin sys.dic unk.dic left-id.def right-id.def rewrite.def pos-id.def dicrc '/usr/local/lib/mecab/dic/mecab-ko-dic'
make[1]: Leaving directory '/content/mecab-ko-dic-2.1.1-20180720'
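The `make install` output lists the files copied into the dictionary directory. If the tagger later fails to load, a quick sanity check is to confirm those files are actually present. A minimal sketch, assuming the file list and target path shown in the output above; `missing_dictionary_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the `install` line in the make output above.
EXPECTED_FILES = [
    "model.bin", "matrix.bin", "char.bin", "sys.dic", "unk.dic",
    "left-id.def", "right-id.def", "rewrite.def", "pos-id.def", "dicrc",
]


def missing_dictionary_files(dic_dir):
    """Return the expected dictionary files that are absent from dic_dir."""
    root = Path(dic_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


# An empty list means every file from the install step is in place.
print(missing_dictionary_files("/usr/local/lib/mecab/dic/mecab-ko-dic"))
```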
In [ ]:
# Re-create the tagger so it loads the rebuilt dictionary
tagger = MeCab.Tagger()
sentence = '이제 Colab에서 Mecab-ko-dic 사용이 가능합니다.'
print(tagger.parse(sentence))
이제	MAG,성분부사|시간부사,F,이제,*,*,*,*
Colab	SL,*,*,*,*,*,*,*
에서	JKB,*,F,에서,*,*,*,*
Mecab-ko-dic	NNP,*,T,Mecab-ko-dic,*,*,*,*,*
사용	NNG,행위,T,사용,*,*,*,*
이	JKS,*,F,이,*,*,*,*
가능	NNG,정적사태,T,가능,*,*,*,*
합니다	XSA+EF,*,F,합니다,Inflect,XSA,EF,하/XSA/*+ᄇ니다/EF/*
.	SF,*,*,*,*,*,*,*
EOS
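`Mecab-ko-dic` is now a single NNP token instead of five pieces. One way to see why a single dictionary line is enough: MeCab picks the lowest-cost path through a lattice of candidate matches. The toy below is a simplified sketch of that idea, not MeCab's actual algorithm. It keeps only per-word costs, ignores the connection costs that MeCab reads from `matrix.def`, and uses made-up cost values; `segment` is a hypothetical helper name.

```python
import math


def segment(text, lexicon, unknown_cost=10.0):
    """Split text into the cheapest sequence of lexicon entries.

    Characters not covered by the lexicon are allowed one at a time
    at `unknown_cost` each, so a segmentation always exists.
    """
    n = len(text)
    best = [math.inf] * (n + 1)  # best[i]: cheapest cost to cover text[:i]
    back = [0] * (n + 1)         # back[i]: start index of the last token
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(end):
            piece = text[start:end]
            if piece in lexicon:
                cost = lexicon[piece]
            elif end - start == 1:
                cost = unknown_cost
            else:
                continue
            if best[start] + cost < best[end]:
                best[end] = best[start] + cost
                back[end] = start
    # Walk the back-pointers from the end to recover the tokens.
    tokens = []
    pos = n
    while pos > 0:
        tokens.append(text[back[pos]:pos])
        pos = back[pos]
    return tokens[::-1]


lexicon = {"Mecab": 1.0, "-": 1.0, "ko": 1.0, "dic": 1.0}
print(segment("Mecab-ko-dic", lexicon))  # ['Mecab', '-', 'ko', '-', 'dic']

lexicon["Mecab-ko-dic"] = 1.0  # the analogue of the new user-dictionary entry
print(segment("Mecab-ko-dic", lexicon))  # ['Mecab-ko-dic']
```

Once the whole string exists as one entry, covering it with a single token costs less than stitching five pieces together, so the path through the new entry wins.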