Random Similarities
Euskara Magyar
Aim
To estimate the probability of drawing the same or a similar word in two randomly selected samples, one from Euskara (Basque) and one from Magyar (Hungarian).
Method
Faults/Failures
We failed to restrict the samples to fixed-size roots, for example roots of the form CVC.
We failed to consider such questions as "Are Euskara initial, medial and final phonemes really independent?".
We failed to consider such questions as "How does Magyar vowel harmony change things, if at all?".
Notes and Observations
The concept of a "random sample" does not just mean that samples are chosen aimlessly or haphazardly. It does NOT even necessarily imply that such a sample is a typical cross-section of its parent group. It refers only to a certain selection process of samples consisting of members from a parent population, in which care is taken to ensure every individual has the SAME CHANCE of being drawn in each sample. This is of the utmost importance.
It is nonsense to try to estimate probabilities of chance resemblances between languages using word lists that have already been selected based on some criteria of "similarity". But that is exactly what some do to "prove" just how likely chance resemblances are between unrelated languages. The sample word lists MUST be chosen randomly so that "we don't fool ourselves" when calculating frequencies and estimating probabilities!
Results
Magyar sample 1 Frequency Analysis
| Phoneme | Initial freq | Initial rel. freq. | Medial freq | Medial rel. freq. | Final freq | Final rel. freq. |
|---|---|---|---|---|---|---|
| a | 2 | 0.019801980 | 25 | 0.096899225 | 8 | 0.079207921 |
| b | 5 | 0.049504950 | 3 | 0.011627907 | 0 | 0.000000000 |
| c | 2 | 0.019801980 | 0 | 0.000000000 | 2 | 0.019801980 |
| d | 3 | 0.029702970 | 6 | 0.023255814 | 8 | 0.079207921 |
| e | 3 | 0.029702970 | 31 | 0.120155039 | 0 | 0.000000000 |
| f | 9 | 0.089108911 | 0 | 0.000000000 | 0 | 0.000000000 |
| g | 1 | 0.009900990 | 12 | 0.046511628 | 9 | 0.089108911 |
| h | 9 | 0.089108911 | 5 | 0.019379845 | 0 | 0.000000000 |
| i | 5 | 0.049504950 | 15 | 0.058139535 | 2 | 0.019801980 |
| j | 1 | 0.009900990 | 2 | 0.007751938 | 2 | 0.019801980 |
| k | 7 | 0.069306931 | 6 | 0.023255814 | 8 | 0.079207921 |
| l | 9 | 0.089108911 | 18 | 0.069767442 | 9 | 0.089108911 |
| m | 3 | 0.029702970 | 5 | 0.019379845 | 4 | 0.039603960 |
| n | 1 | 0.009900990 | 8 | 0.031007752 | 3 | 0.029702970 |
| o | 3 | 0.029702970 | 23 | 0.089147287 | 0 | 0.000000000 |
| p | 2 | 0.019801980 | 4 | 0.015503876 | 1 | 0.009900990 |
| r | 2 | 0.019801980 | 17 | 0.065891473 | 8 | 0.079207921 |
| s | 1 | 0.009900990 | 3 | 0.011627907 | 5 | 0.049504950 |
| t | 5 | 0.049504950 | 6 | 0.023255814 | 15 | 0.148514851 |
| u | 1 | 0.009900990 | 5 | 0.019379845 | 0 | 0.000000000 |
| v | 8 | 0.079207921 | 6 | 0.023255814 | 0 | 0.000000000 |
| y | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| z | 0 | 0.000000000 | 3 | 0.011627907 | 6 | 0.059405941 |
| á | 0 | 0.000000000 | 12 | 0.046511628 | 0 | 0.000000000 |
| é | 2 | 0.019801980 | 12 | 0.046511628 | 2 | 0.019801980 |
| í | 0 | 0.000000000 | 8 | 0.031007752 | 0 | 0.000000000 |
| ó | 1 | 0.009900990 | 1 | 0.003875969 | 2 | 0.019801980 |
| ô | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| ő | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| ö | 2 | 0.019801980 | 8 | 0.031007752 | 0 | 0.000000000 |
| ú | 0 | 0.000000000 | 2 | 0.007751938 | 0 | 0.000000000 |
| ű | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| ü | 1 | 0.009900990 | 3 | 0.011627907 | 1 | 0.009900990 |
| gy | 2 | 0.019801980 | 1 | 0.003875969 | 4 | 0.039603960 |
| ly | 1 | 0.009900990 | 0 | 0.000000000 | 0 | 0.000000000 |
| ny | 1 | 0.009900990 | 1 | 0.003875969 | 1 | 0.009900990 |
| ty | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| cs | 3 | 0.029702970 | 3 | 0.011627907 | 1 | 0.009900990 |
| zs | 1 | 0.009900990 | 1 | 0.003875969 | 0 | 0.000000000 |
| cz | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| sz | 5 | 0.049504950 | 3 | 0.011627907 | 0 | 0.000000000 |
| Totals | 101 | | 258 | | 101 | |

Grand Total 460
For example, the relative frequency for medial "a" is calculated as follows:
25/258 = 0.096899225
The others are calculated similarly using the respective column totals.
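The same division can be sketched in Python for a few rows of the Magyar table. This is only an illustration: the names `medial_counts`, `MEDIAL_TOTAL` and `relative_frequency` are made up for the sketch, and only three medial counts are copied in.

```python
# Medial-position counts for a few Magyar phonemes, copied from the table above.
medial_counts = {"a": 25, "e": 31, "o": 23}
MEDIAL_TOTAL = 258  # "Totals" entry for the Medial column


def relative_frequency(count, column_total):
    """Share of a column's phoneme tokens taken up by one phoneme."""
    return count / column_total


medial_rel_freq = {
    phoneme: relative_frequency(count, MEDIAL_TOTAL)
    for phoneme, count in medial_counts.items()
}

for phoneme, value in medial_rel_freq.items():
    print(f"{phoneme}: {value:.9f}")
```

Initial and final relative frequencies would use 101 as the column total instead of 258.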
Euskara sample 2 Frequency Analysis
| Phoneme | Initial freq | Initial rel. freq. | Medial freq | Medial rel. freq. | Final freq | Final rel. freq. |
|---|---|---|---|---|---|---|
| a | 18 | 0.178217822 | 79 | 0.146840149 | 27 | 0.267326733 |
| b | 5 | 0.049504950 | 14 | 0.026022305 | 0 | 0.000000000 |
| d | 3 | 0.029702970 | 20 | 0.037174721 | 0 | 0.000000000 |
| e | 13 | 0.128712871 | 54 | 0.100371747 | 12 | 0.118811881 |
| f | 0 | 0.000000000 | 1 | 0.001858736 | 0 | 0.000000000 |
| g | 7 | 0.069306931 | 17 | 0.031598513 | 0 | 0.000000000 |
| h | 3 | 0.029702970 | 4 | 0.007434944 | 0 | 0.000000000 |
| i | 9 | 0.089108911 | 53 | 0.098513011 | 17 | 0.168316832 |
| j | 5 | 0.049504950 | 0 | 0.000000000 | 0 | 0.000000000 |
| k | 6 | 0.059405941 | 39 | 0.072490706 | 1 | 0.009900990 |
| l | 5 | 0.049504950 | 22 | 0.040892193 | 0 | 0.000000000 |
| m | 3 | 0.029702970 | 9 | 0.016728625 | 0 | 0.000000000 |
| n | 6 | 0.059405941 | 24 | 0.044609665 | 9 | 0.089108911 |
| o | 3 | 0.029702970 | 24 | 0.044609665 | 9 | 0.089108911 |
| p | 1 | 0.009900990 | 9 | 0.016728625 | 0 | 0.000000000 |
| r | 0 | 0.000000000 | 67 | 0.124535316 | 2 | 0.019801980 |
| s | 3 | 0.029702970 | 12 | 0.022304833 | 0 | 0.000000000 |
| t | 1 | 0.009900990 | 32 | 0.059479554 | 0 | 0.000000000 |
| u | 2 | 0.019801980 | 26 | 0.048327138 | 18 | 0.178217822 |
| x | 2 | 0.019801980 | 2 | 0.003717472 | 0 | 0.000000000 |
| z | 5 | 0.049504950 | 14 | 0.026022305 | 4 | 0.039603960 |
| dd | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| ts | 0 | 0.000000000 | 2 | 0.003717472 | 0 | 0.000000000 |
| tx | 1 | 0.009900990 | 2 | 0.003717472 | 0 | 0.000000000 |
| tt | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| tz | 0 | 0.000000000 | 11 | 0.020446097 | 2 | 0.019801980 |
| ll | 0 | 0.000000000 | 0 | 0.000000000 | 0 | 0.000000000 |
| ñ | 0 | 0.000000000 | 1 | 0.001858736 | 0 | 0.000000000 |
| Totals | 101 | | 538 | | 101 | |

Grand Total 740
Initial Probabilities
| Euskara | Magyar (Model I) | Est. prob. (Model I) | Magyar (Model II) | Est. prob. (Model II) |
|---|---|---|---|---|
| a | a,á | 0.003529066 | a,á | 0.003529066 |
| b | b,v | 0.006371924 | b,p,v | 0.007352220 |
| d | d | 0.000882266 | d,t | 0.002352711 |
| e | e,é | 0.006371924 | e,é | 0.006371924 |
| f | f | 0.000000000 | f | 0.000000000 |
| g | g | 0.000686207 | g,k | 0.005489658 |
| h | h | 0.002646799 | h,g | 0.002940888 |
| i | i,í | 0.004411332 | i,í | 0.004411332 |
| j | j,h,gy,zs,s | 0.006862072 | j,h,gy,zs,s | 0.006862072 |
| k | k | 0.004117243 | k,g | 0.004705421 |
| l | l | 0.004411332 | l | 0.004411332 |
| m | m | 0.000882266 | m | 0.000882266 |
| n | n | 0.000588178 | n | 0.000588178 |
| o | o,ó,ö,ő,ô | 0.001764533 | o,ó,ö,ő,ô | 0.001764533 |
| p | p | 0.000196059 | p,b,v | 0.001470444 |
| r | r | 0.000000000 | r | 0.000000000 |
| s | sz | 0.001470444 | sz,zs,z | 0.001764533 |
| t | t | 0.000490148 | t,d | 0.000784237 |
| u | u,ú,ü,ű | 0.000392118 | u,ú,ü,ű | 0.000392118 |
| x | s | 0.000196059 | s,zs,z | 0.000392118 |
| z | sz | 0.002450740 | sz,z,zs,s | 0.003431036 |
| dd | gy | 0.000000000 | gy | 0.000000000 |
| ts | c | 0.000000000 | c,z | 0.000000000 |
| tx | cs,cz | 0.000294089 | cs,cz,zs,ty,z | 0.000392118 |
| tt | ty | 0.000000000 | cs,cz,ty | 0.000000000 |
| tz | c | 0.000000000 | c,z | 0.000000000 |
| ll | ly | 0.000000000 | j,ly | 0.000000000 |
| ñ | ny | 0.000000000 | ny | 0.000000000 |
| Prob. of an initial match | | 0.049014802 | | 0.060288207 |
For example, the probability of an Initial match between Euskara <a> and Magyar <a> or <á> is approximately
= prob. (Initial Euskara <a>) * ( prob.(Magyar <a>) + prob. (Magyar <á>) )
= 0.178217822 * ( 0.019801980 + 0.000000000 )
= 0.003529066
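The worked example extends to any row of the table: multiply the Euskara relative frequency by the summed relative frequencies of the Magyar phonemes accepted as matches. The Python sketch below does this for three rows only. The initial counts are copied from the two frequency tables, while the dictionary and function names are hypothetical and chosen just for this illustration.

```python
INITIAL_TOTAL = 101  # initial-position column total in both samples

# Initial counts copied from the frequency tables (only the rows used here).
euskara_initial = {"a": 18, "b": 5, "d": 3}
magyar_initial = {"a": 2, "á": 0, "b": 5, "p": 2, "v": 8, "d": 3, "t": 5}

# Magyar phonemes accepted as a match for each Euskara phoneme.
model_1 = {"a": ["a", "á"], "b": ["b", "v"], "d": ["d"]}
model_2 = {"a": ["a", "á"], "b": ["b", "p", "v"], "d": ["d", "t"]}


def match_probability(euskara_phoneme, correspondences):
    """P(Euskara phoneme) times P(any accepted Magyar phoneme)."""
    p_euskara = euskara_initial[euskara_phoneme] / INITIAL_TOTAL
    p_magyar = sum(
        magyar_initial[m] for m in correspondences[euskara_phoneme]
    ) / INITIAL_TOTAL
    return p_euskara * p_magyar


for phoneme in euskara_initial:
    p1 = match_probability(phoneme, model_1)
    p2 = match_probability(phoneme, model_2)
    print(f"{phoneme}: Model I {p1:.9f}, Model II {p2:.9f}")
```

Summing `match_probability` over every Euskara phoneme in the full table would give the "Prob of an initial match" figure for each model.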
Medial Probabilities
| Euskara | Magyar (Model I) | Est. prob. (Model I) | Magyar (Model II) | Est. prob. (Model II) |
|---|---|---|---|---|
| a | a,á | 0.021058471 | a,á | 0.021058471 |
| b | b,v | 0.000907755 | b,p,v | 0.001311201 |
| d | d | 0.000864528 | d,t | 0.001729057 |
| e | e,é | 0.016728625 | e,é | 0.016728625 |
| f | f | 0.000000000 | f | 0.000000000 |
| g | g | 0.001469698 | g,k | 0.002204547 |
| h | h | 0.000144088 | h,g | 0.000489899 |
| i | i,í | 0.008782168 | i,í | 0.008782168 |
| j | j,h,gy,zs,s | 0.000000000 | j,h,gy,zs,s | 0.000000000 |
| k | k | 0.001685830 | k,g | 0.005057491 |
| l | l | 0.002852944 | l | 0.002852944 |
| m | m | 0.000324198 | m | 0.000324198 |
| n | n | 0.001383245 | n | 0.001383245 |
| o | o,ó,ö,ő,ô | 0.005532982 | o,ó,ö,ő,ô | 0.005532982 |
| p | p | 0.000259359 | p,b,v | 0.000842915 |
| r | r | 0.008205815 | r | 0.008205815 |
| s | sz | 0.000259359 | sz,zs,z | 0.000605170 |
| t | t | 0.001383245 | t,d | 0.002766491 |
| u | u,ú,ü,ű | 0.001873145 | u,ú,ü,ű | 0.001873145 |
| x | s | 0.000043226 | s,zs,z | 0.000100862 |
| z | sz | 0.000302585 | sz,z,zs,s | 0.001008616 |
| dd | gy | 0.000000000 | gy | 0.000000000 |
| ts | c | 0.000000000 | c,z | 0.000043226 |
| tx | cs,cz | 0.000043226 | cs,cz,zs,ty,z | 0.000100862 |
| tt | ty | 0.000000000 | cs,cz,ty | 0.000000000 |
| tz | c | 0.000000000 | c,z | 0.000237745 |
| ll | ly | 0.000000000 | j,ly | 0.000000000 |
| ñ | ny | 0.000007204 | ny | 0.000007204 |
| Prob. of a medial match | | 0.074111697 | | 0.083246880 |
Final Probabilities
| Euskara | Magyar (Model I) | Est. prob. (Model I) | Magyar (Model II) | Est. prob. (Model II) |
|---|---|---|---|---|
| a | a,á | 0.021174395 | a,á | 0.021174395 |
| b | b,v | 0.000000000 | b,p,v | 0.000000000 |
| d | d | 0.000000000 | d,t | 0.000000000 |
| e | e,é | 0.002352711 | e,é | 0.002352711 |
| f | f | 0.000000000 | f | 0.000000000 |
| g | g | 0.000000000 | g,k | 0.000000000 |
| h | h | 0.000000000 | h,g | 0.000000000 |
| i | i,í | 0.003333007 | i,í | 0.003333007 |
| j | j,h,gy,zs,s | 0.000000000 | j,h,gy,zs,s | 0.000000000 |
| k | k | 0.000784237 | k,g | 0.001666503 |
| l | l | 0.000000000 | l | 0.000000000 |
| m | m | 0.000000000 | m | 0.000000000 |
| n | n | 0.002646799 | n | 0.002646799 |
| o | o,ó,ö,ő,ô | 0.001764533 | o,ó,ö,ő,ô | 0.001764533 |
| p | p | 0.000000000 | p,b,v | 0.000000000 |
| r | r | 0.001568474 | r | 0.001568474 |
| s | sz | 0.000000000 | sz,zs,z | 0.000000000 |
| t | t | 0.000000000 | t,d | 0.000000000 |
| u | u,ú,ü,ű | 0.001764533 | u,ú,ü,ű | 0.001764533 |
| x | s | 0.000000000 | s,zs,z | 0.000000000 |
| z | sz | 0.000000000 | sz,z,zs,s | 0.004313303 |
| dd | gy | 0.000000000 | gy | 0.000000000 |
| ts | c | 0.000000000 | c,z | 0.000000000 |
| tx | cs,cz | 0.000000000 | cs,cz,zs,ty,z | 0.000000000 |
| tt | ty | 0.000000000 | cs,cz,ty | 0.000000000 |
| tz | c | 0.000392118 | c,z | 0.001568474 |
| ll | ly | 0.000000000 | j,ly | 0.000000000 |
| ñ | ny | 0.000000000 | ny | 0.000000000 |
| Prob. of a final match | | 0.035780806 | | 0.042152730 |
Estimation of Probabilities
The probability of finding a random match on a single word, with no semantic leeway, is estimated as the product of the initial, medial and final match probabilities. This treats the three positions as independent.
Model I (little phonetic leeway):
= 0.049014802 * 0.074111697 * 0.035780806
= 0.000129976
~ 1/7694
Model II (more phonetic leeway):
= 0.060288207 * 0.083246880 * 0.042152730
= 0.000211556
~ 1/4727
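The two products can be checked with a short Python sketch. The three per-position probabilities are copied from the tables above. The names `position_match_probs` and `whole_word_match_probability` are invented for this illustration, and multiplying the three values assumes, as the calculation here does, that the positions are independent.

```python
from math import prod

# (initial, medial, final) match probabilities taken from the tables above.
position_match_probs = {
    "Model I": (0.049014802, 0.074111697, 0.035780806),
    "Model II": (0.060288207, 0.083246880, 0.042152730),
}


def whole_word_match_probability(probs):
    """Product of the per-position match probabilities."""
    return prod(probs)


results = {
    model: whole_word_match_probability(probs)
    for model, probs in position_match_probs.items()
}

for model, p in results.items():
    print(f"{model}: p = {p:.9f}, roughly 1 in {round(1 / p)}")
```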
If one allows for semantic leeway, i.e. more than one meaning is accepted in the search for matching words, then the probability of a "match" is calculated as follows:
Probability = 1 - (1 - p)^N
~ N*p for small p (by the Binomial Theorem)
where p = probability of a match with no semantic leeway and N = number of acceptable meanings allowed for a match.
Semantic leeway and the probability of a random match

| Model | No semantic leeway | 10 meanings | 100 meanings | 1000 meanings |
|---|---|---|---|---|
| Model I | 1/7694 | 1/770 | 1/77 | 1/8 |
| Model II | 1/4727 | 1/473 | 1/47 | 1/5 |
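The effect of semantic leeway can be explored with the sketch below, which evaluates both the exact expression 1 - (1 - p)^N and the approximation N*p for the two models. The function name `match_with_leeway` and the dictionary `single_meaning` are hypothetical. The results land close to the tabulated values, though the last digit can depend on which of the two expressions is used and on how the result is rounded.

```python
def match_with_leeway(p, n_meanings):
    """Probability of at least one match among n_meanings independent tries."""
    return 1 - (1 - p) ** n_meanings


# Single-meaning match probabilities from the section above.
single_meaning = {"Model I": 0.000129976, "Model II": 0.000211556}

for model, p in single_meaning.items():
    for n in (10, 100, 1000):
        exact = match_with_leeway(p, n)
        approx = n * p  # first-order binomial approximation
        print(
            f"{model}, N={n}: exact about 1/{1 / exact:.1f}, "
            f"N*p about 1/{1 / approx:.1f}"
        )
```

The approximation is closest when N*p is small, and the two expressions drift apart as N grows.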
Appendix A
Two Random samples of size 101
| Random number | Random Basque [pages 1-558] | Random number | Random Magyar [pages 299-637] |
|---|---|---|---|
| 26 | amaboskarren | 564 | pislogás |
| 453 | osalari | 346 | divat |
| 544 | ziega | 405 | felé |
| 537 | zatigabeki | 502 | lé |
| 283 | hizkabe | 495 | köpü |
| 128 | borrokalari | 508 | lég |
| 318 | itsasbeso | 454 | irat |
| 14 | aipagabetasun | 529 | tisztít |
| 258 | gutxienez | 351 | egyetem |
| 336 | jo | 479 | gördül |
| 335 | jeinutsu | 628 | világít |
| 404 | mendi | 546 | nyikorog |
| 429 | neska | 338 | csepered |
| 163 | eliza | 521 | hasad |
| 424 | naigabe | 330 | borotva |
| 225 | galtze | 439 | haza |
| 547 | zintzurkoi | 548 | ócska |
| 522 | urripen | 524 | lágy |
| 527 | xehatze | 328 | bocsánat |
| 266 | harraska | 630 | sugár |
| 274 | herabe | 567 | puha |
| 39 | ararteko | 528 | szorít |
| 23 | aldikada | 478 | fogni |
| 334 | jauspen | 599 | tehén |
| 364 | labur | 619 | váltakoz |
| 533 | zalekeria | 631 | viz |
| 369 | lapurreta | 632 | vizsla |
| 458 | panpoxtu | 451 | ihlet |
| 151 | duda | 513 | lök |
| 150 | domeka | 537 | munka |
| 476 | sakela | 401 | olvad |
| 114 | bezuza | 391 | metszés |
| 426 | nasai | 347 | donga |
| 236 | gerizatu | 483 | old |
| 55 | artegatzaile | 524 | lát |
| 320 | iturgintza | 394 | fehér |
| 27 | amaldeko | 336 | csal |
| 36 | apalketa | 525 | öreged |
| 162 | ekintza | 470 | kerek |
| 293 | idortu | 315 | barázda |
| 164 | elikatu | 633 | völgy |
| 46 | arkugiltza | 489 | kisér |
| 27 | amaitu | 510 | lepel |
| 76 | azkara | 587 | szemérem |
| 325 | izendun | 478 | fakad |
| 142 | desgiltzapetu | 366 | végez |
| 358 | korain | 431 | hajlít |
| 515 | uhertze | 465 | karc |
| 499 | tiraka | 345 | dij |
| 200 | etorki | 369 | vádol |
| 390 | lurreratu | 447 | húgy |
| 126 | bizkarraulki | 485 | tart |
| 330 | jakingarri | 436 | has |
| 46 | aritz | 392 | faggat |
| 168 | enbeita | 434 | hang |
| 377 | leherdura | 590 | szimat |
| 427 | negarmuxinkatu | 616 | vág |
| 305 | inoren | 335 | cirok |
| 29 | anaide | 476 | buj |
| 6 | afaltoki | 299 | abból |
| 349 | kilimakatu | 382 | ég |
| 350 | kirik | 500 | lak |
| 88 | barreiari | 377 | erkölcs |
| 354 | komeni | 568 | hág |
| 425 | narru | 604 | tíz |
| 309 | iraizte | 637 | zsír |
| 387 | losintxatu | 535 | mi |
| 28 | amiano | 411 | fivér |
| 363 | kuzkur | 555 | örül |
| 183 | erre | 573 | rombol |
| 506 | txakurkeria | 490 | kolomp |
| 529 | zabalkuntza | 368 | eleven |
| 487 | soineko | 426 | gyilok |
| 228 | garbigai | 452 | illen |
| 257 | gutiño | 508 | ledér |
| 475 | saiatu | 427 | gyök |
| 528 | xukadera | 335 | cím |
| 172 | eragozle | 551 | ordas |
| 314 | irtenarazi | 341 | csomó |
| 211 | eztabaidaka | 451 | igéz |
| 157 | egoki | 382 | éhes |
| 402 | meategi | 440 | hegy |
| 334 | jaungoikotu | 453 | ingó |
| 153 | edertu | 412 | fojt |
| 248 | gona | 434 | hanyag |
| 160 | ehogaitz | 513 | lyuk |
| 312 | irin | 361 | néz |
| 427 | negu | 416 | föld |
| 348 | kezkatze | 461 | jog |
| 177 | erdipurdika | 488 | kilenc |
| 19 | alaitu | 326 | beteg |
| 84 | baldindu | 364 | szegény |
| 442 | omen | 584 | szár |
| 307 | ipurdi | 615 | üzem |
| 34 | aobatez | 526 | rövid |
| 23 | aldiz | 609 | túr |
| 434 | odolisuriz | 611 | undok |
| 185 | errenta | 418 | fül |
| 2 | aberekeria | 505 | locsol |
| 251 | gorroto | 359 | kerít |
| 404 | mendeko | 305 | apaszt |
The Random numbers refer to page numbers in the dictionaries.
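How the random page numbers were generated is not stated. One possible way to draw 101 page numbers per dictionary is sketched below, purely as an assumption. Pages are drawn with replacement, since some page numbers occur more than once in the lists above. The function name `draw_pages` and the seed values are hypothetical.

```python
import random


def draw_pages(first_page, last_page, sample_size, seed=None):
    """Draw page numbers uniformly, with replacement, from an inclusive range."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return [rng.randint(first_page, last_page) for _ in range(sample_size)]


# Page ranges taken from the appendix headings; seeds are arbitrary.
basque_pages = draw_pages(1, 558, 101, seed=2003)
magyar_pages = draw_pages(299, 637, 101, seed=2004)

print(basque_pages[:5])
print(magyar_pages[:5])
```

Every page in the range has the same chance of being drawn on each draw, which is the property the notes on random sampling ask for.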
Copyright © 2003
Last updated 31 May 2003