Today I navigate to 44 HTML pages and save the source content of each one. Before navigating to all 44 pages, I now want to navigate to a single HTML page first, which I call the main HTML page, and only once I have its source content do I want to start processing the 44 pages.
First, I have a button click event handler:
private void toolStripButton3_Click(object sender, EventArgs e)
{
    //GetFirstHtmlInfo();
    //CheckqueueMainUrl();
    GetHtmls();
    CheckQueue();
}
For now I'm not using GetFirstHtmlInfo() or CheckqueueMainUrl(). With those two commented out, GetHtmls and CheckQueue work fine.
This is the GetHtmls method:
private Queue<Uri> myUrls = new Queue<Uri>();
private bool isBusy = false;
private void GetHtmls()
{
    for (int i = 0; i < 47; i++)
    {
        adrBarTextBox.Text = sourceUrl + i;
        targetHtmls = (combinedHtmlsDir + "\\Html" + i + ".txt");
        Uri targetUri = new Uri(sourceUrl + i);
        myUrls.Enqueue(targetUri);
    }
}
Then the CheckQueue method:
Uri uri;
private void CheckQueue()
{
    if (isBusy)
        return; // We're downloading some page right now, don't disturb
    isBusy = true; // OK, let's get started
    if (myUrls.Count == 0) // No more pages to download, we're done
    {
        isBusy = false;
        return;
    }
    uri = myUrls.Dequeue(); // Get one URL from queue
    getCurrentBrowser().Navigate(uri);
}
And finally the WebBrowser DocumentCompleted event handler:
int numberofpages = 0;
//DocumentCompleted
private void Form1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    // myUrls.Dequeue();
    // If the page loaded completely then do something
    int urlnumber = uri.ToString().IndexOf("pagenumber=");
    string number = uri.ToString().Substring(urlnumber + 11);
    int num = Int32.Parse(number);
    targetHtmls = (combinedHtmlsDir + "\\Html" + num + ".txt");
    if (num == 0)
    {
        // Page 0 carries the total page count in the hdnTotalPages field
        string firstTag = "hdnTotalPages";
        string endTag = "/>";
        string curr = getCurrentBrowser().DocumentText;
        int index = curr.IndexOf(firstTag);
        int index1 = curr.IndexOf(endTag, index);
        string pagesCount = curr.Substring(index + 22, ((index1 - 24) - index));
        numberofpages = int.Parse(pagesCount);
        for (int i = 0; i < numberofpages; i++)
        {
            adrBarTextBox.Text = sourceUrl + i;
            targetHtmls = (combinedHtmlsDir + "\\Html" + i + ".txt");
            Uri targetUri = new Uri(sourceUrl + i);
            myUrls.Enqueue(targetUri);
        }
    }
    StreamWriter writer = File.CreateText(targetHtmls);
    writer.Write(getCurrentBrowser().DocumentText);
    writer.Close();
    isBusy = false; // We're done
    CheckQueue(); // Check next page in queue
}
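The fixed offsets in this handler (+ 11, + 22, - 24) stop working as soon as the markup shifts by one character. Below is a minimal sketch of the same two extractions done with regular expressions. It assumes the hidden field is rendered as hdnTotalPages" value="N", which is what the offsets imply but is not confirmed by the page itself. PageParsing, TryGetPageNumber and TryGetTotalPages are hypothetical names, not part of my project.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PageParsing
{
    // Reads the digits that follow "pagenumber=" in the URL.
    public static bool TryGetPageNumber(Uri uri, out int pageNumber)
    {
        pageNumber = 0;
        Match m = Regex.Match(uri.ToString(), @"pagenumber=(\d+)", RegexOptions.IgnoreCase);
        return m.Success && int.TryParse(m.Groups[1].Value, out pageNumber);
    }

    // Assumption: the page contains  hdnTotalPages" value="N"
    // (attribute order and quoting guessed from the original offsets).
    public static bool TryGetTotalPages(string html, out int totalPages)
    {
        totalPages = 0;
        Match m = Regex.Match(html, "hdnTotalPages\"\\s+value=\"(\\d+)\"");
        return m.Success && int.TryParse(m.Groups[1].Value, out totalPages);
    }
}
```

Both methods return false instead of throwing when the pattern is missing, so the handler can skip a page it does not recognise.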
In the end I get 48 .txt files on my hard disk, and the variable numberofpages also holds 48.
My problem is:
Once I have the number 48, I want to use it as the bound of this loop:
    for (int i = 0; i < 47; i++)
I hard-coded 47, but I want to use numberofpages there, since it may be 47 now but could later be 48, 11 or 120.
How can I load only the first HTML page, extract the number of pages from it the way I do now, assign that number to the loop bound, and only then start the loop?
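One way this might be done, as a sketch only: put just page 0 in the queue at the start, and add the remaining pages after page 0 has loaded and the count is known. PageQueue and its members are hypothetical names. The class only manages the URLs and does not touch the WebBrowser.

```csharp
using System;
using System.Collections.Generic;

public class PageQueue
{
    private readonly Queue<Uri> pending = new Queue<Uri>();
    private readonly string sourceUrl;
    private bool expanded;

    public PageQueue(string sourceUrl)
    {
        this.sourceUrl = sourceUrl;
        // Seed the queue with page 0 only; the real bound is not known yet.
        pending.Enqueue(new Uri(sourceUrl + 0));
    }

    public int Count { get { return pending.Count; } }

    public Uri Next() { return pending.Dequeue(); }

    // Call once, after page 0 has loaded and the page count was read from it.
    // Pages 1 .. numberOfPages - 1 are added; page 0 is not queued again.
    public void Expand(int numberOfPages)
    {
        if (expanded) return;
        expanded = true;
        for (int i = 1; i < numberOfPages; i++)
            pending.Enqueue(new Uri(sourceUrl + i));
    }
}
```

With something like this, GetHtmls would no longer need the hard-coded 47: the button click would queue page 0 only, and the num == 0 branch of the completed handler would call Expand(numberofpages) before CheckQueue() runs again.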
- I also want to load one more HTML page and get its source content the same way I do now, but this page must be processed before I call GetHtmls(), and I need to be able to extract data from it in the completed event before any of the 47 pages are processed.
To put it more simply, I need to build three processes:
1. Load a single HTML page and, in the completed event, do something with it once it has finished loading.
2. Load only the first page of the loop for (int i = 0; i < 47; i++), whether the bound is 47, 48 or 122, and in the completed event do something with it once it has finished loading.
3. Run the whole loop for (int i = 0; i < 47; i++) as I do now, over all the pages from 0 to 47.
I tried this, but it didn't work and I ended up with a mess.
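To make the three processes concrete, here is a rough sketch of them as an explicit phase that the completed event advances. It is an illustration under assumptions, not working WinForms code: Phase, PhasedDownloader, Next and PageCompleted are hypothetical names, and the caller is expected to pass in the page count it parsed from the first list page.

```csharp
using System;
using System.Collections.Generic;

public enum Phase { MainPage, FirstListPage, RemainingPages, Done }

public class PhasedDownloader
{
    private readonly Queue<Uri> queue = new Queue<Uri>();
    private readonly string sourceUrl;

    public Phase Current { get; private set; }
    public int Pending { get { return queue.Count; } }

    public PhasedDownloader(Uri mainUrl, string sourceUrl)
    {
        this.sourceUrl = sourceUrl;
        Current = Phase.MainPage;
        queue.Enqueue(mainUrl); // process 1: the main page only
    }

    // Next URL to navigate to, or null when nothing is queued.
    public Uri Next()
    {
        return queue.Count > 0 ? queue.Dequeue() : null;
    }

    // Call from the completed event after handling the page that just loaded.
    // totalPages is only read when the first list page has finished.
    public void PageCompleted(int totalPages)
    {
        switch (Current)
        {
            case Phase.MainPage:
                // process 2: queue only the first list page
                queue.Enqueue(new Uri(sourceUrl + 0));
                Current = Phase.FirstListPage;
                break;
            case Phase.FirstListPage:
                // process 3: the bound is known now, queue the rest
                for (int i = 1; i < totalPages; i++)
                    queue.Enqueue(new Uri(sourceUrl + i));
                Current = queue.Count > 0 ? Phase.RemainingPages : Phase.Done;
                break;
            case Phase.RemainingPages:
                if (queue.Count == 0) Current = Phase.Done;
                break;
        }
    }
}
```

The completed event would switch on Current to decide what to do with the loaded page, then call PageCompleted and navigate to Next() if it is not null. Keeping the phase in one field avoids separate busy flags and queues for each process.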