مرکز منطقه ای اطلاع رسانی علوم و فناوری فصلنامه فناوری اطلاعات و ارتباطات ایران 2717-0411 12 45 2021 1 27 Improving Efficiency of Finding Frequent Subgraphs in Graph Stream Using gMatrix Summarization بهبود کارائی و دقت یافتن یال‌های پرتکرار در خلاصه سازی gMatrix از جریان گراف 95 114 fa مسعود کاظمی دانشجوی کارشناسی ارشد، دانشکده مهندسی کامپیوتر، دانشگاه صنعتی خواجه نصیرالدین طوسی سید حسین خواسته دانشگاه صنعتی خواجه نصیرالدین طوسی حمیدرضا رخصتی دانشگاه صنعتی خواجه نصیرالدین طوسی 2021 2 1 In many real-world frameworks, dealing with huge domains of nodes and online streaming edges are unavoidable. Transportation systems, IP networks and developed social medias are quintessential examples of such scenarios. One of the most important open problems while dealing with massive graph streams are finding frequent sub-graph. There are some approaches such as count-min for storing the frequent nodes, however performing these methods will result in inaccurate modelling of structures based on the main graph. Having said that, gMatrix is one of the recently developed approaches which can fairly save the important properties of the main graph. In this approach, different hash functions are utilized to store the basis of streams in the main graph. As a result, having the reverse of the hash functions will be extremely useful in calculation of the frequent subgraph. Though gMatrix mainly suffer from two problems. First, they are not really accurate due to high compression rate of the main graph and second, the complexity of returning a query is high. In this thesis, we have presented a new approach based on gMatrix which can reduce the amount of memory usage as well as returning the queries in less amount of time. The main contribution of the introduced approach is to reduce the dependency among the hash functions. This will result in less conflicts while creating the gMatrix later. In this study we have used Cosine Similarity in order to estimate the amount of dependency and similarity among hash functions. Our experimental results prove the higher performance in terms of algorithm and time complexity. در سیستم‌های کاربردی، گراف‌ها با دامنه وسیعی از راس‌ها وجود دارند و یال‌ها به سرعت زیادی در قالب جریان گراف تولید می‌شوند. یکی از مسائل موجود در جریان‌های گراف سنگین که به صورت لحظه‌ای وارد می‌شوند پیدا کردن زیرگراف‌های پرتکرار است. خلاصه‌های جریان مبتنی بر طرح، مانند count-min، اطلاعات گره‌های پرتکرار را با دقت قابل قبولی نگهداری می‌کنند ولی ساختار گراف اصلی را از دست می‌دهند. از بین این روش‌ها، gMatrix ساختاری می‌باشد که مشخصات گراف اصلی را نیز حفظ می‌کند. این روش از توابع درهم‌ساز مختلف، برای ذخیره‌ی خلاصه‌ی جریان گراف استفاده کرده و به کمک این توابع و معکوس آنها، زیرگراف‌های پرتکرار را به‌دست می‌آورد. به دلیل داشتن حجم کمتر از جریان اصلی، gMatrix معمولا به پرس و جوها با دقت بالایی پاسخ نمی‌دهد. همچنین این روش از مشکل مرتبه‌ی زمانیِ بالا در پاسخ به پرس‌ و جو‌‌ها هم رنج می‌برد. در این مقاله روش جدیدی ارائه شده است که به ازای هزینه‌ی کمِ حافظه‌ی مصرفی، زمان پاسخگویی به پرس و جو زیرگراف پرتکرار را به صورت چشم‌گیری کاهش می‌دهد. همچنین الگوریتم ارایه شده با افزایش استقلال بین توابع در هم سازی با استفاده از روش شباهت برداری کُساین، احتمال برخورد عناصر در هم سازی شده را کاهش می‌دهد. نتایج آزمایشات تجربی که به زبان C++ پیاده‌سازی شده است و بر روی داده‌های شبکه اجتماعی فرندستر اجرا شده است، نشان می‌دهد که روش پیشنهادی برای یافتن زیرگراف‌های پرتکرار پیچیدگی زمانی و دقت یافتن این زیر گراف‌ها را بهبود می‌بخشد.

http://jour.aicti.ir/fa/Article/Download/14720